osd_scrub_queued_snaptrims_limit, introduced in PR#68737,
blocks the initiation of non-urgent scrubs on OSDs that
are overloaded with snap-trim operations.
ShreeJejurikar [Wed, 20 May 2026 07:18:03 +0000 (12:48 +0530)]
qa/rgw/bucket-logging: configure STS for assume-role test
Set rgw sts key and enable rgw s3 auth use sts, both needed by
test_bucket_logging_requester_assumed_role. Mirrors the existing
settings in qa/suites/rgw/verify/overrides.yaml.
Xuehan Xu [Fri, 15 May 2026 09:10:04 +0000 (17:10 +0800)]
crimson/os/seastore: also update the mappings copied by client
transactions when committing background rewriting transactions
With the 128-bit laddr key layout in place, SeaStore::rename would
involve copying mappings. These mappings must also be updated when
the logical extents they point to are rewritten.
This commit introduces performance counters for individual Ceph mgr modules.
These counters allow monitoring module behavior, debugging latency issues,
and identifying performance bottlenecks, all without modifying the modules themselves.
The following counters are now exposed under:
> ceph daemon mgr.<id> perf dump
Example structure:
"mgr_module_<module_name>": {
"notify_avg_usec": { <- Average time spent handling notify events
"avgcount": 0,
"sum": 0
},
"cmd_avg_usec": { <- Average time spent processing CLI/admin commands
"avgcount": 0,
"sum": 0
},
"serve_avg_usec": { <- Average time spent in module serve loop (if applicable)
"avgcount": 0,
"sum": 0
},
"alive": 1 <- Module is alive (1 = running, 0 = exited)
"cpu_usage": 0, <- CPU usage in percent
"mem_rss_change": 0, <- Memory RSS change in bytes
"mem_rss_current": 490737664 <- Memory RSS current in bytes
}
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
Conflicts:
src/mgr/ActivePyModules.cc - finisher.queue changed by 63859, adding py_module to the parameter list
src/mgr/PyModuleRegistry.cc - check_all_modules_started added by 63859
ceph-volume: OSD mapper lifecycle (LVM + raw) for activate
This adds small helpers so activate can consistently bring the OSD device
stack online (LVM lvchange, optional mapper open) and tear it down again,
with refresh in between. Same idea for the raw path. Crypto is handled
inside that flow when the OSD is encrypted.
Kefu Chai [Fri, 22 May 2026 11:01:17 +0000 (19:01 +0800)]
crimson/scrub: fix assert in PGScrubber::release_range() on interval change
when an interval change occurs while ScrubReserveRange is still
waiting to acquire background_process_lock, ChunkState::exit()
calls release_range() but blocked is not yet set. this triggers
ceph_assert(blocked) in release_range().
fix by checking if blocked is set before asserting. if blocked is
not set, the range was never reserved, so release_range() is a
no-op. ScrubReserveRange's finally block handles lock cleanup in
this case.
Jan Radon [Fri, 15 May 2026 13:42:08 +0000 (15:42 +0200)]
feat(rgw/kafka): add mTLS client certificate authentication for Kafka notifications
Add support for mutual TLS (mTLS) client certificate authentication
when publishing bucket notifications to Kafka brokers. RGW can now
present a client certificate and private key to authenticate with
brokers that require ssl.client.auth=required.
Changes:
- Add ssl-certificate-location, ssl-key-location, and ssl-key-password
topic attributes for configuring client certificates
- Validate that ssl_certificate and ssl_key are provided together
- Include ssl_key_password in connection identity (hash/equality)
- Add kafka-security.sh script for generating broker and client TLS certs
- Add mTLS test (test_notification_kafka_security_ssl_mtls) using
use_mtls=True flag on the existing SSL security path
- Update RGW notifications documentation with mTLS parameters
Fixes: http://tracker.ceph.com/issues/67427 Signed-off-by: Jan Radon <jan.fabian.radon@sap.com>
The new implementation retire an absent extent by constructing a real
empty extent and add it to the transaction's retired_set, instead of
creating a retired placeholder
osd/scrub: limit scrubbing under snap-trimming overload
When the snap-trim queues are long, scrubbing is likely to
make things worse. This change adds a new scrubbing restriction
for that case, and prevents periodic scrubs from starting when
the total snap-trim queue length across all PGs exceeds a
configurable threshold.
Maodi Ma [Wed, 5 Nov 2025 02:35:46 +0000 (02:35 +0000)]
common: enable AVX512+VPCLMULQDQ for crc32c performance on x86
- Add crc32_iscsi_by16_10 in src/isa-l into candidates for ceph_crc32c
- Add hardware capability check for AVX512 instr before register
- Add NASM feature check to ensure compatibility and to enable
AS_FEATURE_LEVEL in crc32_iscsi_by16_10.asm
ShreeJejurikar [Wed, 13 May 2026 13:05:39 +0000 (18:35 +0530)]
rgw/logging: use assumed-role ARN as Requester for STS requests
When a request is made with STS temporary credentials, the bucket logging
Requester field was being set to the underlying user ID instead of the
assumed-role ARN. Per the AWS S3 server-access-log spec, the Requester
field should contain the assumed-role ARN (e.g.
arn:aws:sts::<account>:assumed-role/<role>/<session>) for STS-credentialed
requests.
Detect TYPE_ROLE identities via s->auth.identity->get_identity_type() and
use the ARN returned by Identity::get_caller_identity() (already
implemented by RoleApplier in the expected AWS format) instead of falling
straight through to s->user->get_id(). Existing behavior for account- and
user-scoped requests is unchanged.
Kefu Chai [Wed, 20 May 2026 07:55:58 +0000 (15:55 +0800)]
crimson/osd: use store-specific max_object_size for the OSD-layer write check
is_offset_and_length_valid() checked write sizes against
osd_max_object_size (128 MiB), but SeaStore caps per-onode laddr space
at seastore_default_max_object_size (16 MiB). Writes between the two
limits pass the OSD check, reach SeaStore, and trip
prepare_data_reservation()'s ceph_assert(), crashing the OSD and its
replicas.
Add FuturizedStore::Shard::get_max_object_size() (returns
osd_max_object_size by default) and override it in SeaStore::Shard to
return min(osd_max_object_size, max_object_size). Convert
is_offset_and_length_valid() from a static function to a PGBackend
member that queries the store, so EFBIG reaches the client before the
write ever hits the store.
Kefu Chai [Wed, 20 May 2026 04:03:12 +0000 (12:03 +0800)]
crimson/osd: only unblock wait_for_active_blocker on replica when ACTIVE
ReplicaActive::react(ActivateCommitted) sets ACTIVE or PEERED before
calling on_activate_committed(). Without a guard, an unconditional
unblock() on the PEERED path resets the promise, causing ops that
arrive afterward to park indefinitely (until the next on_change()).
The primary already has this guard in on_activate_complete(); mirror it
on the replica side.
Kefu Chai [Wed, 20 May 2026 07:36:51 +0000 (15:36 +0800)]
crimson/seastore: reject oversized writes and zeros instead of aborting
prepare_data_reservation() ceph_assert()s the request fits within
seastore_default_max_object_size (16 MiB), but the OSD validates writes
against osd_max_object_size (128 MiB). Anything between the two limits
passes OSD validation then trips the assert, crashing the OSD and its
replicas.
_zero() already returned EIO for this case; mirror that in _write() and
fix _zero()'s off-by-one (>= should be >, matching the <= in the assert).