mgr/dashboard: "Access Denied" being shown on overview page for read-only user
Fixes: https://tracker.ceph.com/issues/76293
Signed-off-by: Devika Babrekar <devika.babrekar@ibm.com>
This commit removes centos9 from crimson's supported distros, in line
with the wider Ceph project moving from centos9 to rocky10. We have
established that crimson is compatible with rocky10. More details can be
found in this tracker: https://tracker.ceph.com/issues/75823.
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
osd/SnapMapper::update_snaps() to handle a missing OBJ_ record
by falling back to add_oid() instead of silently creating an
inconsistent state (OBJ_ without matching SNA_ entries). This
was observed on replicas that had recently recovered objects:
the snap mapper entries created during recovery were not visible
to a subsequent snap-trim repop's update_snaps() call, leaving
the clone with no snap mapper entries. Scrub would then detect
and report the inconsistency as an error.
Promote snap mapper remove_oid/clear_snaps logging to dout(10)
and add apply_op_stats tracing to aid diagnosis of any remaining
stat or snap mapper drift.
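The fallback described above can be illustrated with a toy in-memory model. This is a sketch only, not Ceph's SnapMapper: the class name, the two dictionaries standing in for OBJ_ and SNA_ records, and the simplified update_snaps() signature are all assumptions made for illustration.

```python
class ToySnapMapper:
    """Toy in-memory stand-in for a snap mapper; not Ceph's implementation."""

    def __init__(self):
        self.obj = {}  # "OBJ_" side: object id -> set of snap ids
        self.sna = {}  # "SNA_" side: snap id -> set of object ids

    def add_oid(self, oid, snaps):
        # Write the object record and every matching per-snap record together.
        self.obj[oid] = set(snaps)
        for snap in snaps:
            self.sna.setdefault(snap, set()).add(oid)

    def update_snaps(self, oid, new_snaps):
        new_snaps = set(new_snaps)
        if oid not in self.obj:
            # Missing object record: fall back to a full add so that both
            # sides are written, rather than only one of them.
            self.add_oid(oid, new_snaps)
            return
        old_snaps = self.obj[oid]
        for snap in old_snaps - new_snaps:
            # Drop per-snap records for snaps the object no longer belongs to.
            self.sna[snap].discard(oid)
            if not self.sna[snap]:
                del self.sna[snap]
        for snap in new_snaps - old_snaps:
            self.sna.setdefault(snap, set()).add(oid)
        self.obj[oid] = new_snaps

    def consistent(self):
        # Each (object, snap) pair must appear on both sides of the mapping.
        forward = {(o, s) for o, snaps in self.obj.items() for s in snaps}
        backward = {(o, s) for s, oids in self.sna.items() for o in oids}
        return forward == backward
```

In this model, calling update_snaps() for an object with no record leaves the mapper in the same state as add_oid() would, so a consistency check of the kind scrub performs finds matching entries on both sides.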
Gil Bregman [Tue, 5 May 2026 08:53:25 +0000 (11:53 +0300)]
mgr/dashboard: Allow empty port value when adding a listener in NVMEoF CLI
Fixes: https://tracker.ceph.com/issues/76410
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
rgw/test: add Journal mode support to bucket logging test suite
Add --logging-type flag to run the Python bucket logging test suite
in either Standard or Journal mode. The same tests run against both
logging types with no changes to test logic or assertions.
- Add --logging-type pytest CLI option (Standard default, Journal opt-in)
- Detect boto3 LoggingType extension availability at session startup
- Thread logging_type through helpers and test functions
- Add teuthology task YAML for Journal mode suite runs
- Install service-2.sdk-extras.json in the teuthology task when
logging_type is Journal (s3tests cleans it up after its own run,
so the file isn't available by the time our Journal job runs)
- Document Journal mode local usage in the test suite README
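A minimal sketch of how such a flag might be registered and threaded into a helper, assuming a conftest.py-style pytest hook. The helper name logging_enabled(), the LOGGING_TYPES tuple, and the choice to send the LoggingType field only in Journal mode are assumptions for illustration, not the test suite's actual code.

```python
LOGGING_TYPES = ("Standard", "Journal")


def pytest_addoption(parser):
    # conftest.py hook: register the flag, with Standard as the default.
    parser.addoption(
        "--logging-type",
        action="store",
        default="Standard",
        choices=LOGGING_TYPES,
        help="bucket logging type to run the suite against",
    )


def logging_enabled(target_bucket, target_prefix, logging_type="Standard"):
    # Hypothetical helper: build the LoggingEnabled part of the request.
    if logging_type not in LOGGING_TYPES:
        raise ValueError(f"unknown logging type: {logging_type}")
    config = {"TargetBucket": target_bucket, "TargetPrefix": target_prefix}
    if logging_type == "Journal":
        # Assumption: the extension field is only sent when opted in, so a
        # stock boto3 without the extension still accepts Standard requests.
        config["LoggingType"] = logging_type
    return config
```

A test would read the selected value with request.config.getoption("--logging-type") and pass it down to helpers like the one above, which keeps the test logic and assertions identical across both modes.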
The LibRadosAio.PoolEIOFlag test was unstable due to:
- Test Unreliability (Timing Dependency): The test used a fixed
iteration count for IO submission and finished too quickly, missing the
target error.
* Fix: The submission loop now runs continuously (time-bounded)
until the EIO error is reliably caught, eliminating the timing issue.
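The time-bounded loop can be sketched as follows. This is an illustration of the pattern rather than the test's code: submit_io is a hypothetical callable that returns 0 on success or a negative errno value, and the timeout value is arbitrary.

```python
import errno
import time


def submit_until_eio(submit_io, timeout_s=30.0, pause_s=0.0):
    """Keep submitting until submit_io() returns -EIO or the deadline passes.

    submit_io is a hypothetical callable returning 0 on success or a
    negative errno value on failure.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while time.monotonic() < deadline:
        attempts += 1
        if submit_io() == -errno.EIO:
            return attempts  # target error observed; stop submitting
        if pause_s:
            time.sleep(pause_s)
    raise TimeoutError(f"no EIO after {attempts} submissions")
```

Bounding the loop by elapsed time instead of an iteration count means a fast machine keeps submitting until the error shows up, while the deadline still prevents the test from running forever.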
Nitzan Mordechai [Wed, 26 Nov 2025 14:36:42 +0000 (14:36 +0000)]
aio_cxx: Fix mutual deadlock in PoolEIOFlag test
The LibRadosAio.PoolEIOFlag test was unstable and was hanging due to:
- Deadlock: The main thread held the shared mutex ('my_lock') while
calling thread join. This created a mutual deadlock.
* Fix: Mutex is unlocked before thread join using RAII scopes.
- also convert to std::jthread and drop the join
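The locking pattern behind this fix can be shown with a small Python analogue, where a `with` block plays the role of the RAII scope. It is a sketch of the pattern only; the function and variable names are hypothetical, and std::jthread's join-on-destruction has no direct counterpart here.

```python
import threading


def run_worker_and_join():
    lock = threading.Lock()
    state = {"ready": False, "seen": None}

    def worker():
        # The worker needs the same lock the main thread uses.
        with lock:
            state["seen"] = state["ready"]

    with lock:
        # Scope the lock to just the shared-state update.
        t = threading.Thread(target=worker)
        t.start()
        state["ready"] = True
    # The lock is released here, before join(). Joining while still holding
    # it would block forever: the worker waits for the lock, and the main
    # thread waits for the worker.
    t.join(timeout=5.0)
    return (not t.is_alive()), state["seen"]
```

Because the worker cannot take the lock until the main thread leaves the scoped block, it always observes the updated state, and the join completes promptly.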
seastore/omap_manager/btree: change omap manager funcs to coroutines
This commit changes funcs in BTree OMap manager to coroutines. Apart
from cleaner code that's easier to follow this is done to fix ASan
heap-use-after-free asserts.
Example QA job with the error: https://pulpito.ceph.com/shraddhaag-2026-04-20_07:04:25-crimson-rados-main-distro-debug-trial/164374/
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
seastore/omap_manager/btree: change node insert/del funcs to coroutines
This commit changes OMapLeafNode and OMapInnerNode funcs to coroutines
to improve readability and prevent any ASan heap-use-after-free asserts.
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
* refs/pull/67536/head:
qa/multisite: enable the multisite test for oidc.
rgw/oidc: plumb RGWObjVersionTracker through load/store for race detection
rgw/oidc rados: add rgwrados::oidcs namespace abstraction for cls_user for accounts.
rgw/rest-oidc: Forward all oidc mutation request to master zone.
rgw/oidc: add rgwrados::oidc interface to support multisite.
Matthew N. Heler [Wed, 17 Dec 2025 02:53:20 +0000 (20:53 -0600)]
qa/rgw: add teuthology support for target_by_bucket cloud transition
Add cloud_target_by_bucket and cloud_target_by_bucket_prefix options
to rgw_cloudtier.py and s3tests.py. Create new test suite to run
target_by_bucket-specific s3-tests.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
seastore/omap_manager/btree: prevent heap buffer overflow in log
This commit fixes a heap overflow in omap_btree_node_impl when
logging the full bufferlist. This issue was already tracked in
https://tracker.ceph.com/issues/71524. To prevent this from happening,
we log the length of the bufferlist instead of its full contents.
Matthew N. Heler [Mon, 20 Apr 2026 21:25:47 +0000 (16:25 -0500)]
rgw/cloud-transition: yield in cloud_tier_bucket_exists HEAD
The HEAD request used null_yield, so every attempt (including the
retries added by retry_on_busy) blocked the LC worker thread for
the full HTTP timeout instead of yielding.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
rgw/cloud-transition: check bucket existence before create
Add HEAD request to check if target bucket exists before attempting
to create it. This avoids unnecessary PUT requests when the bucket
already exists on the remote endpoint.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
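The check-before-create flow can be sketched with an in-memory stand-in for the remote endpoint. Everything here is hypothetical: the client interface, the FakeRemote class, and the decision to treat any status other than 200 or 404 as an error are assumptions for illustration and not RGW's behaviour.

```python
def ensure_target_bucket(client, bucket):
    """Create bucket only if a HEAD reports it absent; True if created.

    client is a hypothetical object with head_bucket(name) -> HTTP status
    and create_bucket(name); it is not an RGW or SDK interface.
    """
    status = client.head_bucket(bucket)
    if status == 200:
        return False  # bucket already exists: skip the PUT
    if status != 404:
        raise RuntimeError(f"HEAD {bucket} returned {status}")
    client.create_bucket(bucket)
    return True


class FakeRemote:
    """In-memory endpoint that records the requests it receives."""

    def __init__(self, existing=()):
        self.buckets = set(existing)
        self.requests = []

    def head_bucket(self, name):
        self.requests.append(("HEAD", name))
        return 200 if name in self.buckets else 404

    def create_bucket(self, name):
        self.requests.append(("PUT", name))
        self.buckets.add(name)
```

With FakeRemote(existing={"tier"}), a call for "tier" records only a HEAD, while a call for an unknown bucket records a HEAD followed by a PUT.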
Add per-bucket cloud tier targeting via new options target_by_bucket
and target_by_bucket_prefix, and use them in transition/restore to
derive the destination bucket name.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Alex Ainscow [Wed, 18 Mar 2026 09:22:26 +0000 (09:22 +0000)]
osd: PGLog Attach correct version to missing list when ignoring log entries
A previous fix for PR 66698 fixed an issue where log entries associated with
partial writes were being processed incorrectly (see that PR and associated
tracker for details). The fix was to ignore log entries that should not have
been present on the non-primary shard.
The problem with that approach is that in a more complex scenario, where the
log contained a partial write, followed by a full write AND the shard is
backfilling, then the missing list was being given the version prior to the
full write, rather than prior to the clone.
Our fix here corrects how the missing list version is calculated.
See the associated tracker for instructions on how to recreate the issue.
Fixes: https://tracker.ceph.com/issues/75211
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Alex Ainscow [Mon, 27 Apr 2026 13:24:45 +0000 (14:24 +0100)]
osd/test: Add EC peering test infrastructure and recovery test cases
This commit enhances the EC peering test framework and adds test cases
for erasure-coded pool recovery scenarios:
NOTE: Many of the test cases are disabled as they recreate certain
problems. Later commits will enable these tests and fix the production
issues, but under different PRs.
Test Infrastructure Improvements:
- Add MockStore wrapper with read error injection capabilities for testing
error handling in EC recovery
- Enhance ECPeeringTestFixture with recovery callback verification
- Add support for pg_upmap to better simulate OSD placement
- Implement write_attribute() for testing partial vs full stripe writes
- Add read_shard_object_info() to verify on-disk version consistency
- Improve logging with missing object stats (m=, u=, mbc=)
- Add support for doing object recovery in Fast EC
- Add set_config() helper for runtime configuration changes
- Preserve xinfo features when marking OSDs up/down
- Fix pg_temp handling for EC pools with optimizations
Patrick Donnelly [Tue, 28 Apr 2026 22:25:44 +0000 (15:25 -0700)]
doc/start/os-recommendations: update for Umbrella and future releases
Overhaul the OS recommendations documentation to reflect deployment
practices and map out the support matrices for upcoming releases through
Ceph X (24.x).
Key changes include:
* Emphasized container-based deployments: Added a new section strongly
recommending containerized deployments via `cephadm` over legacy
package-based installations to simplify upgrades and avoid host-level
dependency conflicts.
* Expanded support tables: Updated the Platforms and Container Hosts
tables to include Umbrella (21.x), Vampire (22.x), W (23.x), and
X (24.x). Removed EOL releases like Reef.
* Added EOL visibility: Included End-of-Life dates for Linux
distributions and anticipated EOL dates for Ceph releases to help
administrators plan lifecycle events.
* Updated OS targets: Added support tracking for Ubuntu 24.04 (Noble),
Ubuntu 26.04, Ubuntu 28.04, Rocky Linux 10, and Rocky Linux 11.
* Addressed CentOS transition: Added a warning that CentOS 10+ will no
longer be built or tested by upstream. Documented that Rocky Linux 10
is the new default container base image for Umbrella, while clarifying
that the bare-metal host OS can remain any supported distribution.
* Added horizontal upgrade guidance: Introduced a new section outlining
safe "horizontal" bare-metal OS upgrade paths (e.g., CentOS 9 to
Rocky 10, Ubuntu 22.04 to 24.04) so users can safely migrate their
nodes outside of Ceph version upgrade windows.
AI-Assisted: Gemini Pro, through numerous prompts
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
rgw/multisite: log concurrency state transitions in adj_concurrency
Replace the timer-based "OSD cluster is overloaded" warning with
state-transition logging. Also, log when concurrency is halved and
eventually recovered.
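State-transition logging of this kind can be sketched with a toy model. The class name, the halving and restore rules, and the log messages below are assumptions for illustration; they are not taken from adj_concurrency itself.

```python
class ConcurrencyGovernor:
    """Toy model: halve concurrency under overload, restore it afterwards."""

    def __init__(self, max_concurrency, log=print):
        self.max_concurrency = max_concurrency
        self.current = max_concurrency
        self.reduced = False
        self.log = log

    def adjust(self, overloaded):
        if overloaded and not self.reduced:
            # Transition: normal -> reduced.
            self.current = max(1, self.max_concurrency // 2)
            self.reduced = True
            self.log(f"concurrency halved to {self.current}")
        elif not overloaded and self.reduced:
            # Transition: reduced -> normal.
            self.current = self.max_concurrency
            self.reduced = False
            self.log(f"concurrency recovered to {self.current}")
        # No log line is emitted while the state is unchanged.
        return self.current
```

Logging only on transitions yields one line when the overload starts and one when it ends, regardless of how many times adjust() is called in between.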