Kefu Chai [Tue, 26 May 2026 14:01:41 +0000 (22:01 +0800)]
crimson/seastore: make RecordSubmitter::wait_available() idempotent
Under sustained 4K randwrite workloads that roll journal segments
frequently, crimson-osd hits
```
crimson/os/seastore/journal/record_submitter.cc:198:
FAILED ceph_assert(!is_available())
```
and, in release builds without assertions, a downstream
`boost::throw_exception<std::length_error>` from
`seastar::shared_promise::get_shared_future()` called on a
disengaged `std::optional` in the same code path.
`RecordSubmitter::roll_segment()` arms wait_available_promise on entry,
then chains `journal_allocator.roll().safe_then(...)` whose continuation
sets the promise's value and resets the optional. The background
continuation can resolve before the subsequent `wait_available()` call
is entered -- the optional gets reset, `is_available()` becomes true
again, and `wait_available()`'s `assert(!is_available())` fires. The
This relies on a brittle invariant that is not part of seastar's
contract:
> .safe_then's continuation will not run before its outer call returns
Honour the documented contract instead. record_submitter.h
says:
> wait for available if cannot submit, should check
> is_available() again when the future is resolved.
The postcondition is "available when resolved"; the precondition
"unavailable when called" was incidental. Make `wait_available()`
idempotent: if `is_available()` is already true on entry, return a
ready future immediately. All three external callers
- `RecordSubmitter::roll_segment`
- `CircularBoundedJournal::submit_record`
- `SegmentedOolWriter::do_write`
re-check `is_available()` on the next iteration or in the chained
continuation and dispatch correctly.
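The following is a minimal, seastar-free sketch of the idempotent `wait_available()`. It rests on a simplified model. `Ready` and `SharedPromise` are hypothetical stand-ins for `seastar::future<>` and `seastar::shared_promise<>`. `RecordSubmitterModel`, `arm()` and `on_roll_done()` are illustrative names, not the real `RecordSubmitter` interface. The model reproduces the ordering described above, in which the roll continuation resolves and resets `wait_available_promise` before `wait_available()` is entered.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical stand-in for seastar::future<>: it only records whether the
// awaited event has happened.
struct Ready {
  std::shared_ptr<bool> done;
  bool resolved() const { return *done; }
};

// Hypothetical stand-in for seastar::shared_promise<>.
class SharedPromise {
 public:
  Ready get_shared_future() const { return Ready{done_}; }
  void set_value() { *done_ = true; }

 private:
  std::shared_ptr<bool> done_ = std::make_shared<bool>(false);
};

// Illustrative model of RecordSubmitter, not its real interface.
class RecordSubmitterModel {
 public:
  bool is_available() const { return !wait_available_promise.has_value(); }

  // roll_segment() arms the promise on entry.
  void arm() {
    assert(is_available());
    wait_available_promise.emplace();
  }

  // Continuation of journal_allocator.roll(): resolve, then disengage.
  void on_roll_done() {
    assert(!is_available());
    wait_available_promise->set_value();
    wait_available_promise.reset();
  }

  // Idempotent: if the continuation already ran, hand back a resolved
  // future instead of asserting !is_available() or calling
  // get_shared_future() on a disengaged optional.
  Ready wait_available() {
    if (is_available()) {
      return Ready{std::make_shared<bool>(true)};
    }
    return wait_available_promise->get_shared_future();
  }

 private:
  std::optional<SharedPromise> wait_available_promise;
};
```

In this model, calling `arm()` and then `on_roll_done()` before `wait_available()` would have tripped the old `assert(!is_available())`. With the change, that sequence yields an already-resolved future, and the caller re-checks `is_available()` as the header comment requires.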
Validated by running a 96-job fio randwrite benchmark with the fix
applied; pre-patch, the assert fires within ~30 min and kills the OSD.
Afreen Misbah [Wed, 27 May 2026 00:07:38 +0000 (05:37 +0530)]
mgr/dashboard: fix nested shell quoting in cephadm e2e start-cluster
with_libvirt wraps commands in sg libvirt -c "$1", adding an extra
shell layer. Nested double quotes inside the outer double-quoted
string caused the argument to be split — with_libvirt received a
truncated $1, producing "Unterminated quoted string" on the remote
shell.
Drop the unnecessary inner double quotes around cephadm shell
arguments since cephadm shell accepts the command as separate args.
Use single quotes for the grep pattern inside the double-quoted
string so it survives the sg subshell.
osd_scrub_queued_snaptrims_limit, introduced in PR#68737,
blocks the initiation of non-urgent scrubs on OSDs that
are overloaded with snap-trim operations.
ShreeJejurikar [Wed, 20 May 2026 07:18:03 +0000 (12:48 +0530)]
qa/rgw/bucket-logging: configure STS for assume-role test
Set rgw sts key and enable rgw s3 auth use sts, both needed by
test_bucket_logging_requester_assumed_role. Mirrors the existing
settings in qa/suites/rgw/verify/overrides.yaml.
Xuehan Xu [Fri, 15 May 2026 09:10:04 +0000 (17:10 +0800)]
crimson/os/seastore: also update the mappings copied by client
transactions when committing background rewriting transactions
With the 128-bit laddr key layout in place, SeaStore::rename would
involve copying mappings. These mappings must also be updated when
the logical extents they point to are rewritten.
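The requirement can be pictured with a toy model. It assumes that a mapping copied by a client transaction references the same physical extent as its source. `LbaMap`, `copy_mapping()` and `commit_rewrite()` are hypothetical names, and 64-bit integers stand in for SeaStore's 128-bit `laddr_t` and its LBA tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical, heavily simplified model: real SeaStore uses a 128-bit
// laddr_t and an LBA tree, not a std::map of 64-bit integers.
using laddr_t = std::uint64_t;
using paddr_t = std::uint64_t;
using LbaMap = std::map<laddr_t, paddr_t>;

// A client transaction (e.g. a rename) copies a mapping: the new laddr
// points at the same physical extent as the source laddr.
void copy_mapping(LbaMap& lba, laddr_t src, laddr_t dst) {
  lba[dst] = lba.at(src);
}

// Committing a background rewrite moves an extent from old_p to new_p.
// Every mapping still referencing old_p must follow it, including copies
// made by client transactions. Returns the number of mappings updated.
std::size_t commit_rewrite(LbaMap& lba, paddr_t old_p, paddr_t new_p) {
  std::size_t updated = 0;
  for (auto& entry : lba) {
    if (entry.second == old_p) {
      entry.second = new_p;
      ++updated;
    }
  }
  return updated;
}
```

If `commit_rewrite()` updated only the mapping the rewrite originally read, the copy at the new laddr would be left pointing at the retired physical location.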
This commit introduces performance counters for individual Ceph mgr modules.
These counters allow monitoring module behavior, debugging latency issues,
and identifying performance bottlenecks, all without modifying the modules themselves.
The following counters are now exposed under:
> ceph daemon mgr.<id> perf dump
Example structure:
"mgr_module_<module_name>": {
    "notify_avg_usec": {    <- Average time spent handling notify events
        "avgcount": 0,
        "sum": 0
    },
    "cmd_avg_usec": {       <- Average time spent processing CLI/admin commands
        "avgcount": 0,
        "sum": 0
    },
    "serve_avg_usec": {     <- Average time spent in module serve loop (if applicable)
        "avgcount": 0,
        "sum": 0
    },
    "alive": 1,             <- Module is alive (1 = running, 0 = exited)
    "cpu_usage": 0,         <- CPU usage in percent
    "mem_rss_change": 0,    <- Memory RSS change in bytes
    "mem_rss_current": 490737664    <- Current memory RSS in bytes
}
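For an `avgcount`/`sum` pair, the mean is `sum / avgcount`. Diffing two dumps gives the mean over just that interval. The sketch below is a minimal version of both calculations. It assumes `sum` is accumulated in microseconds, as the `_usec` suffix suggests. `AvgCounter` and the helper names are hypothetical, and JSON parsing is omitted.

```cpp
#include <cassert>
#include <cstdint>

// One averaged counter as it appears in `perf dump` output.
struct AvgCounter {
  std::uint64_t avgcount = 0;  // number of samples
  std::uint64_t sum = 0;       // total time, assumed to be in microseconds
};

// Lifetime mean since the counter was created; 0 when nothing was sampled.
double lifetime_avg_usec(const AvgCounter& c) {
  return c.avgcount ? static_cast<double>(c.sum) / c.avgcount : 0.0;
}

// Mean over the interval between two dumps of the same counter, which
// reflects recent latency rather than the whole module lifetime.
double interval_avg_usec(const AvgCounter& before, const AvgCounter& after) {
  const std::uint64_t n = after.avgcount - before.avgcount;
  return n ? static_cast<double>(after.sum - before.sum) / n : 0.0;
}
```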
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
Conflicts:
src/mgr/ActivePyModules.cc - finisher.queue changed by 63859, adding py_module to the parameter list
src/mgr/PyModuleRegistry.cc - check_all_modules_started added by 63859
ceph-volume: OSD mapper lifecycle (LVM + raw) for activate
This adds small helpers so activate can consistently bring the OSD device
stack online (LVM lvchange, optional mapper open) and tear it down again,
with a refresh in between. The raw path follows the same pattern. Crypto
is handled inside that flow when the OSD is encrypted.
Kefu Chai [Sun, 24 May 2026 08:25:46 +0000 (16:25 +0800)]
rgw: bump Apache Arrow submodule from 17.0.0 to 19.0.1
When WITH_SYSTEM_ARROW is false, Ceph builds Arrow from the bundled
src/apache submodule. Our CI uses ubuntu:jammy as the base image, which
does not package libarrow-dev, so the bundled path is always taken there.
Arrow 17.0.0 vendors a copy of Thrift whose download URLs are no longer
reachable, breaking CI builds that try to fetch them at configure time.
Bump arrow submodule to 19.0.1, the latest Arrow release that:
- builds successfully on ubuntu:jammy, and
- requires only CMake 3.22 (the version shipped by ubuntu:jammy)
See also
CMake version shipped by ubuntu:jammy
- https://packages.ubuntu.com/jammy/cmake
Kefu Chai [Fri, 22 May 2026 11:01:17 +0000 (19:01 +0800)]
crimson/scrub: fix assert in PGScrubber::release_range() on interval change
When an interval change occurs while ScrubReserveRange is still
waiting to acquire background_process_lock, ChunkState::exit()
calls release_range() but blocked is not yet set. This triggers
ceph_assert(blocked) in release_range().
Fix by checking whether blocked is set before asserting. If blocked is
not set, the range was never reserved, so release_range() is a
no-op. ScrubReserveRange's finally block handles lock cleanup in
this case.
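Here is a minimal model of the guarded `release_range()`. Only the `blocked` flag and the call sites mirror the commit message. `ScrubberModel`, `reserve_range()` and the boolean return value are hypothetical, added so the no-op path is observable.

```cpp
#include <cassert>

// Hypothetical model of the ChunkState / ScrubReserveRange interaction;
// only the `blocked` flag corresponds to the real PGScrubber state.
class ScrubberModel {
 public:
  // ScrubReserveRange sets `blocked` once background_process_lock is held.
  void reserve_range() { blocked = true; }

  // Called from ChunkState::exit(). Instead of ceph_assert(blocked), an
  // unreserved range is a no-op. Returns true if a range was released.
  bool release_range() {
    if (!blocked) {
      return false;  // interval changed before the reservation completed
    }
    blocked = false;
    return true;
  }

 private:
  bool blocked = false;
};
```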
Jan Radon [Fri, 15 May 2026 13:42:08 +0000 (15:42 +0200)]
feat(rgw/kafka): add mTLS client certificate authentication for Kafka notifications
Add support for mutual TLS (mTLS) client certificate authentication
when publishing bucket notifications to Kafka brokers. RGW can now
present a client certificate and private key to authenticate with
brokers that require ssl.client.auth=required.
Changes:
- Add ssl-certificate-location, ssl-key-location, and ssl-key-password
topic attributes for configuring client certificates
- Validate that ssl_certificate and ssl_key are provided together
- Include ssl_key_password in connection identity (hash/equality)
- Add kafka-security.sh script for generating broker and client TLS certs
- Add mTLS test (test_notification_kafka_security_ssl_mtls) using
use_mtls=True flag on the existing SSL security path
- Update RGW notifications documentation with mTLS parameters
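The validation and identity changes listed above can be sketched as follows. `KafkaConnId`, `client_cert_config_valid()` and `KafkaConnIdHash` are hypothetical names, not RGW's actual types. Including the key password in both equality and hashing presumably keeps two topics that share a broker and certificate, but differ in password, from reusing one connection.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical connection identity; fields follow the commit message,
// but the struct itself is illustrative.
struct KafkaConnId {
  std::string broker;
  std::optional<std::string> ssl_certificate;
  std::optional<std::string> ssl_key;
  std::optional<std::string> ssl_key_password;

  bool operator==(const KafkaConnId& o) const {
    return std::tie(broker, ssl_certificate, ssl_key, ssl_key_password) ==
           std::tie(o.broker, o.ssl_certificate, o.ssl_key, o.ssl_key_password);
  }
};

// A client certificate is useless without its key, and vice versa.
bool client_cert_config_valid(const KafkaConnId& id) {
  return id.ssl_certificate.has_value() == id.ssl_key.has_value();
}

struct KafkaConnIdHash {
  std::size_t operator()(const KafkaConnId& id) const {
    std::size_t h = std::hash<std::string>{}(id.broker);
    // boost::hash_combine-style mixing; absent values hash like "".
    auto mix = [&h](const std::optional<std::string>& v) {
      h ^= std::hash<std::string>{}(v.value_or("")) + 0x9e3779b9u +
           (h << 6) + (h >> 2);
    };
    mix(id.ssl_certificate);
    mix(id.ssl_key);
    mix(id.ssl_key_password);
    return h;
  }
};
```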
Fixes: http://tracker.ceph.com/issues/67427
Signed-off-by: Jan Radon <jan.fabian.radon@sap.com>
The new implementation retires an absent extent by constructing a real
empty extent and adding it to the transaction's retired_set, instead of
creating a retired placeholder.
osd/scrub: limit scrubbing under snap-trimming overload
When the snap-trim queues are long, scrubbing is likely to
make things worse. This change adds a new scrubbing restriction
for that case, and prevents periodic scrubs from starting when
the total snap-trim queue length across all PGs exceeds a
configurable threshold.
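The sketch below shows the gating rule as described. The total snap-trim queue length across all PGs is compared with the threshold, and only periodic (non-urgent) scrubs are held back. Only the option name `osd_scrub_queued_snaptrims_limit` comes from the commits. `ScrubKind`, `scrub_may_start()` and the per-PG vector are hypothetical. The commits do not say whether a special value disables the check, so the sketch does not model one.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical classification; the real scheduler has richer urgency levels.
enum class ScrubKind { periodic, urgent };

// Sum of queued snap-trim work across all PGs on this OSD.
std::uint64_t total_snaptrim_queue(const std::vector<std::uint64_t>& per_pg) {
  return std::accumulate(per_pg.begin(), per_pg.end(), std::uint64_t{0});
}

// Periodic scrubs may start only while the total does not exceed the limit.
bool scrub_may_start(ScrubKind kind,
                     const std::vector<std::uint64_t>& per_pg_snaptrim_queue,
                     std::uint64_t osd_scrub_queued_snaptrims_limit) {
  if (kind == ScrubKind::urgent) {
    return true;  // urgent scrubs are not subject to this restriction
  }
  return total_snaptrim_queue(per_pg_snaptrim_queue) <=
         osd_scrub_queued_snaptrims_limit;
}
```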
Maodi Ma [Wed, 5 Nov 2025 02:35:46 +0000 (02:35 +0000)]
common: enable AVX512+VPCLMULQDQ for crc32c performance on x86
- Add crc32_iscsi_by16_10 from src/isa-l to the candidates for ceph_crc32c
- Add a hardware capability check for AVX512 instructions before registering it
- Add a NASM feature check to ensure compatibility and to enable
AS_FEATURE_LEVEL in crc32_iscsi_by16_10.asm
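The selection logic can be sketched as follows. The widest implementation that both the build (assembler support) and the running CPU can handle is preferred, with fallbacks after it. This assumes the by16_10 path needs both AVX512 and VPCLMULQDQ, as the subject line suggests. `CpuFeatures`, `Crc32cImpl` and `choose_crc32c()` are hypothetical, and real code would fill the flags from CPUID. The portable bitwise CRC-32C is a standard reference version that any accelerated candidate must match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Pass crc = 0 for a fresh checksum; passing a previous result continues it.
std::uint32_t crc32c_sw(std::uint32_t crc, const unsigned char* p, std::size_t len) {
  crc = ~crc;
  while (len--) {
    crc ^= *p++;
    for (int k = 0; k < 8; ++k) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Hypothetical feature flags; real code would populate them from CPUID.
struct CpuFeatures {
  bool sse42 = false;
  bool avx512f = false;
  bool vpclmulqdq = false;
};

// Hypothetical labels for the candidate implementations.
enum class Crc32cImpl { isal_by16_10, sse42_path, portable };

// Prefer the widest implementation the build and the CPU both support.
Crc32cImpl choose_crc32c(const CpuFeatures& cpu, bool built_with_avx512_asm) {
  if (built_with_avx512_asm && cpu.avx512f && cpu.vpclmulqdq) {
    return Crc32cImpl::isal_by16_10;
  }
  if (cpu.sse42) {
    return Crc32cImpl::sse42_path;
  }
  return Crc32cImpl::portable;
}
```

Checking the assembler capability alongside the CPU flags mirrors the two guards in the commit: a CPU that supports AVX512 cannot use a routine the assembler could not build.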
ShreeJejurikar [Wed, 13 May 2026 13:05:39 +0000 (18:35 +0530)]
rgw/logging: use assumed-role ARN as Requester for STS requests
When a request is made with STS temporary credentials, the bucket logging
Requester field was being set to the underlying user ID instead of the
assumed-role ARN. Per the AWS S3 server-access-log spec, the Requester
field should contain the assumed-role ARN (e.g.
arn:aws:sts::<account>:assumed-role/<role>/<session>) for STS-credentialed
requests.
Detect TYPE_ROLE identities via s->auth.identity->get_identity_type() and
use the ARN returned by Identity::get_caller_identity() (already
implemented by RoleApplier in the expected AWS format) instead of falling
straight through to s->user->get_id(). Existing behavior for account- and
user-scoped requests is unchanged.
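Here is a minimal sketch of the Requester selection. In RGW, the identity type comes from `s->auth.identity->get_identity_type()` and the ARN from `Identity::get_caller_identity()`. Here both are folded into a hypothetical `RequestIdentity` struct, and `bucket_log_requester()` is an illustrative name.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified identity model.
enum class IdentityType { user, account, role };

struct RequestIdentity {
  IdentityType type;
  std::string caller_arn;  // e.g. arn:aws:sts::<account>:assumed-role/<role>/<session>
  std::string user_id;     // what s->user->get_id() would yield
};

std::string bucket_log_requester(const RequestIdentity& id) {
  if (id.type == IdentityType::role) {
    return id.caller_arn;  // STS credentials: log the assumed-role ARN
  }
  return id.user_id;  // account- and user-scoped requests: unchanged
}
```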