Adam King [Mon, 13 Jan 2025 18:25:00 +0000 (13:25 -0500)]
service_spec: force ceph-exporter sock_dir to be unset or "/var/run/ceph/"
As discussed in https://tracker.ceph.com/issues/69475 this
setting is effectively useless as it only controls the directory
inside the container where the ceph-exporter will write out
its asok file, and has no influence over where it is on the
host where the ceph-exporter daemon is deployed. Given that any
custom value for the sock_dir setting would always have
been broken, we decided to skip writing a proper migration
step in cephadm to deal with this, and instead just force this
field to be unset.
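A minimal sketch of the rule named in the commit title, in Python. The
function name `check_sock_dir` and the constant `ALLOWED_SOCK_DIR` are
hypothetical, and rejecting other values with an error is only one possible
way to "force" the field; the message above does not say how the real
service spec code does it.

```python
from typing import Optional

# Hypothetical constant; the value itself is the one named in the commit title.
ALLOWED_SOCK_DIR = "/var/run/ceph/"


def check_sock_dir(sock_dir: Optional[str]) -> None:
    """Accept only an unset sock_dir or the single allowed value."""
    if sock_dir is None or sock_dir == ALLOWED_SOCK_DIR:
        return
    # Assumption: raising is one way to enforce the rule; the real code may differ.
    raise ValueError(
        f"sock_dir must be unset or {ALLOWED_SOCK_DIR!r}, got {sock_dir!r}"
    )
```

Under this sketch, `check_sock_dir(None)` and `check_sock_dir("/var/run/ceph/")`
pass silently, while any other value raises `ValueError`.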
Adam King [Thu, 9 Jan 2025 16:36:34 +0000 (11:36 -0500)]
cephadm: fix handling of ceph-exporter sock-dir
Fixes: https://tracker.ceph.com/issues/69475
It turns out the sock-dir for ceph-exporter only needs to
exist within the container, not on the host. Previous code,
including the validation function this commit removes
and previous patches trying to fix the ceph-exporter asok
file not appearing on the host, were all done assuming
it mattered what was on the host. This patch changes things
so all we do with the sock dir is mount it to /var/run/ceph/<fsid>
and don't worry about whether that dir exists on the host.
Additionally, the patch makes it so /var/run/ceph/<fsid> is
created during ceph-exporter deployment.
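The mount handling described above could be sketched as follows. This is an
illustration only: the helper name `exporter_sock_mount`, its parameters, and
the host-path-to-container-path dictionary layout are assumptions, not the
actual cephadm code. It reads the message as "create /var/run/ceph/<fsid> on
the host and mount it at the sock dir inside the container".

```python
import os
from typing import Dict


def exporter_sock_mount(
    fsid: str,
    sock_dir: str = "/var/run/ceph/",
    run_root: str = "/var/run/ceph",
) -> Dict[str, str]:
    """Return a {host_path: container_path} mapping for the asok directory."""
    # Host side: <run_root>/<fsid>, created at deployment time if missing.
    host_dir = os.path.join(run_root, fsid)
    os.makedirs(host_dir, exist_ok=True)
    # Container side: the sock dir only has to exist inside the container,
    # so no check is made for a matching directory on the host.
    return {host_dir: sock_dir}
```

The `run_root` parameter exists only so the sketch can be tried against a
temporary directory instead of the real /var/run/ceph.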
Yuri Weinstein [Mon, 13 Jan 2025 13:32:04 +0000 (05:32 -0800)]
Merge pull request #60278 from rzarzynski/wip-os-fastomapiter
os, osd: bring the lightweight OMAP iteration
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Matan Breizman <Matan.Brz@gmail.com>
Reviewed-by: Mark Kogan <mkogan@redhat.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
This commit adds:
1. workflow summary in the first section along with an image.
2. sub-section "Pushing to ceph-ci repository" to second section.
3. file doc/dev/developer_guide/testing_integration_tests/workflow.png
This commit updates the Lifecycle Settings section of the RGW Config Reference. In particular, it corrects a suggestion to decrease the number of parallel threads in the workers pool in order to get more aggressive (accelerated) per-bucket lifecycle processing. More aggressive lifecycle processing for a bucket containing a large number of objects is achieved by increasing, not decreasing, the number of parallel threads.
The current suggestion is misleading.
rgw/multisite: the create_bucket forward request omits the
request body, thus missing some data if specified inside
CreateBucketConfiguration xml on the non-master zone.
also, now that we perform cksum validation against empty payloads,
such a request would fail with -ERR_AMZ_CONTENT_SHA256_MISMATCH due
to a zero content-length but a non-empty payload hash.
this fix ensures that request body is forwarded during create_bucket
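The mismatch described above can be reproduced in isolation. The sketch below
is not RGW code: the helper `payload_hash_matches` and the sample payload are
made up for illustration. It only shows why a hash computed over the original
body cannot match a forwarded request whose body was dropped.

```python
import hashlib


def payload_hash_matches(declared_sha256: str, body: bytes) -> bool:
    """Compare a declared SHA-256 payload hash with the body actually received."""
    return hashlib.sha256(body).hexdigest() == declared_sha256


# Hypothetical payload standing in for a CreateBucketConfiguration document.
original_body = b"example-create-bucket-configuration"
declared = hashlib.sha256(original_body).hexdigest()

# Body forwarded intact: the declared hash matches.
forwarded_with_body = payload_hash_matches(declared, original_body)
# Body omitted (zero content-length): the declared hash no longer matches.
forwarded_without_body = payload_hash_matches(declared, b"")
```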
Casey Bodley [Fri, 18 Oct 2024 17:27:51 +0000 (13:27 -0400)]
rgw/rados: get_part_obj_state() fixes accounted_size when uncompressed
the part head objects don't have a RGW_ATTR_MANIFEST attribute, so
get_obj_state_impl() isn't able to set the correct
RGWObjState::accounted_size unless RGW_ATTR_COMPRESSION provides one.
get_part_obj_state() builds a fake manifest in memory to represent the
part and updates state.size accordingly, but it hadn't corrected the
value of state.accounted_size.
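A simplified Python model of the fix, assuming the behavior stated in the
message. The real code is C++ in RGW; `PartState`, `apply_part_size`, and the
`compression_orig_size` parameter are hypothetical names used only to show the
uncompressed case picking up the part size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartState:
    """Stand-in for the size fields of RGWObjState."""
    size: int = 0
    accounted_size: int = 0


def apply_part_size(
    state: PartState,
    part_size: int,
    compression_orig_size: Optional[int] = None,
) -> PartState:
    """Update both size fields for a part described by an in-memory manifest."""
    state.size = part_size
    if compression_orig_size is not None:
        # Compressed part: the compression attribute supplies the accounted size.
        state.accounted_size = compression_orig_size
    else:
        # Uncompressed part: accounted_size must follow the part size as well.
        state.accounted_size = part_size
    return state
```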
Matt Benjamin [Sat, 12 Oct 2024 17:49:29 +0000 (13:49 -0400)]
rgw_cksum: permit fallback to checksum-type from create-multipart, in upload-part
There appear to be workloads that provide a checksum algorithm in
create-multipart-upload, but do not provide (what must be) the
corresponding algorithm when uploading the parts. (complete-multipart-upload
has no checksum argument, the value is implicit.)
This behavior is inconsistent with at least some SDKs, but it is
possibly accepted behavior in AWS S3, and repeating the algorithm is not
logically necessary, since the originally supplied checksum type is already known.
Therefore, change the behavior of upload-part to fall back to a
checksum type that was sent with the corresponding create-multipart-upload
request, if one is present, rather than failing with InvalidRequest.
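The fallback can be sketched as a small selection function. This is a Python
illustration, not the RGW implementation: `effective_checksum_type` and its
arguments are hypothetical, and the sketch deliberately does not model what
happens when upload-part names a different algorithm than
create-multipart-upload, since the message does not describe that case.

```python
from typing import Optional


def effective_checksum_type(
    part_type: Optional[str],
    upload_type: Optional[str],
) -> Optional[str]:
    """Pick the checksum type to use for one upload-part request."""
    if part_type is not None:
        # The part request named an algorithm itself.
        return part_type
    # Fall back to the type recorded at create-multipart-upload, if any.
    return upload_type
```

With this sketch, a part uploaded without an algorithm inherits the one from
the multipart upload, and a part in an upload that never specified one still
resolves to `None`.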
Fixes: https://tracker.ceph.com/issues/68513
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 6b487a4c6dbadf3f470c8b12ddd5f2521c6920c6)
Matt Benjamin [Mon, 8 Jan 2024 02:33:07 +0000 (21:33 -0500)]
rgw: implement GetObjectAttributes
Implements the corresponding S3 operation, and
introduces a new Object::list_parts SAL interface to support it.
Includes changes from Casey Bodley <cbodley@redhat.com>:
* use uncompressed part size
* local variable shadowed a member variable and broke handling of
PartNumberMarker in request and response
Fixes: https://tracker.ceph.com/issues/64109
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
This commit updates the `is_device` function to correctly handle
loop devices.
The function now validates loop devices when they are explicitly
allowed, by checking their type (`loop`) in addition to `disk`
and `mpath`.
Changes include:
- Extending the type check to include `loop` in the list of
supported device types.
- Enhancing the docstring for better documentation of the
function's purpose and behavior.
These changes ensure that loop devices are properly recognized
and handled when configuring OSDs in ceph-volume.
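The extended type check might look roughly like this. It is a sketch under
assumptions: the function name `device_type_allowed`, the `allow_loop` flag,
and the `TYPE` key (as reported by lsblk-style output) are illustrative and
are not taken from the actual `is_device` implementation.

```python
from typing import Dict


def device_type_allowed(dev_info: Dict[str, str], allow_loop: bool = False) -> bool:
    """Return True if the device type is one that may back an OSD."""
    allowed_types = ["disk", "mpath"]
    if allow_loop:
        # Loop devices are accepted only when explicitly allowed.
        allowed_types.append("loop")
    return dev_info.get("TYPE") in allowed_types
```

For example, `device_type_allowed({"TYPE": "loop"})` is rejected by default
and accepted with `allow_loop=True`.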
Implement inheritance for ceph::io_exerciser::IoOp to allow better differentiation between the different Op types and allow more complex operations to be implemented.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
ceph-volume: add python hints to util.prepare.create_id()
This commit introduces type annotations to the `create_id` function in `ceph_volume.util.prepare`.
The parameters and return value are now typed as follows:
- `fsid` is a `str`.
- `json_secrets` is a `str`.
- `osd_id` is an optional `str` (`Optional[str]`).
- The function returns a `str`.
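The annotations listed above correspond to a signature along these lines. Only
the signature is illustrated: the body is intentionally omitted, and the
`None` default for `osd_id` is an assumption that the message does not
confirm.

```python
from typing import Optional, get_type_hints


def create_id(fsid: str, json_secrets: str, osd_id: Optional[str] = None) -> str:
    """Signature-only sketch; the real body lives in ceph_volume.util.prepare."""
    raise NotImplementedError("body omitted; only the type hints are shown")


# The annotations can be inspected without calling the function.
hints = get_type_hints(create_id)
```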
Vallari Agrawal [Wed, 25 Dec 2024 05:01:21 +0000 (10:31 +0530)]
mon/NVMeofGwMap: add delay to NVMEOF_GATEWAY_DELETING warning
Instead of immediately triggering, have this healthcheck trigger
after some time has elapsed. This delay can be configured by
mon_nvmeofgw_delete_grace.
Track the time when gateways go into DELETING state in a new
member var (of NVMeofGwMon) 'gws_deleting_time'.
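The grace-period check can be modeled in a few lines of Python. The monitor
code itself is C++; the function `gateways_past_grace`, the use of plain
float timestamps, and the strict "greater than" comparison are assumptions
made for this sketch. Only the names `gws_deleting_time` and
`mon_nvmeofgw_delete_grace` come from the message.

```python
from typing import Dict, List


def gateways_past_grace(
    gws_deleting_time: Dict[str, float],
    now: float,
    delete_grace: float,
) -> List[str]:
    """List gateways that have been in DELETING state longer than the grace period.

    gws_deleting_time maps a gateway id to the time (in seconds) at which it
    entered DELETING; delete_grace plays the role of mon_nvmeofgw_delete_grace.
    """
    return sorted(
        gw_id
        for gw_id, entered in gws_deleting_time.items()
        if now - entered > delete_grace
    )
```

The health warning would then be raised only when the returned list is
non-empty, instead of as soon as any gateway enters DELETING.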
Casey Bodley [Wed, 18 Dec 2024 16:28:02 +0000 (11:28 -0500)]
rgw: don't use merge_and_store_attrs() when recreating a bucket
https://github.com/ceph/ceph/pull/56583 recently fixed
merge_and_store_attrs() to preserve existing attrs, but this broke the
swift api's ability to remove container metadata. RGWCreateBucket
handles this merging itself with prepare_add_del_attrs(), so we should
just assign createparams.attrs to the bucket and store it with
bucket->put_info()
make the same change for RGWPutMetadataBucket which swift uses to
add/remove existing metadata
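The difference between merging and assigning can be shown with plain
dictionaries. This is a conceptual model only: `merge_attrs` and
`assign_attrs` are hypothetical helpers and the attribute names are made up;
they are not the RGW functions named above.

```python
from typing import Dict


def merge_attrs(existing: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Preserve existing attrs and overlay the new ones (keys are never dropped)."""
    merged = dict(existing)
    merged.update(new)
    return merged


def assign_attrs(existing: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Replace the stored attrs with the caller's already-prepared set."""
    return dict(new)


# Hypothetical container metadata; the caller has already removed "meta-color".
stored = {"meta-color": "blue", "meta-owner": "alice"}
prepared = {"meta-owner": "alice"}

after_merge = merge_attrs(stored, prepared)    # removed key comes back
after_assign = assign_attrs(stored, prepared)  # removal is honored
```

Because the caller has already computed the final attribute set, a second
merge on top of the stored attrs resurrects the key that was meant to be
removed, which matches the regression the message describes.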
This config option makes it possible to configure the delay before the
NVMEOF_GATEWAY_DELETING healthcheck warning is triggered. The warning is
raised when NVMeoF gateways are in the DELETING state
for too long (indicating a problem in namespace
load-balancing).
The default value for this config is 15 mins.
Xuehan Xu [Fri, 22 Nov 2024 08:38:02 +0000 (16:38 +0800)]
crimson/osd/replicated_backend: make sure the check on whether to send
ops to replica osds and the pg log append happens in the same
continuation
Since backfill relies on the pg log to discover new modifications, we
need to make sure backfill always discovers modifications that are not
sent to replica osds.