Shubha Jain [Mon, 9 Mar 2026 07:11:14 +0000 (12:41 +0530)]
mgr/cephadm: fix upgrade order validation when using daemon_types with hosts
When both daemon_types and hosts filters are provided to
`ceph orch upgrade start`, the validation logic in
`_validate_upgrade_filters()` only checked earlier daemon
types on hosts outside the target host set.
This caused a bug where earlier daemon types running on the
target hosts were ignored, allowing upgrades to proceed out
of order. For example, a crash daemon upgrade could start on
a host even when mon daemons on that same host were still on
an older version.
This patch fixes the validation by checking earlier daemon
types on both:
* daemons running on the same hosts
* daemons running on other hosts
This ensures upgrade order enforcement remains correct when
host filters are applied.
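The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in `_validate_upgrade_filters()`: the `UPGRADE_ORDER` list, the `Daemon` tuple, and the `pending_earlier_daemons()` helper are hypothetical names chosen for the example.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

# Assumed ordering for illustration only; the real list lives in cephadm.
UPGRADE_ORDER = ["mgr", "mon", "crash", "osd", "mds", "rgw"]


class Daemon(NamedTuple):
    daemon_type: str
    hostname: str
    upgraded: bool


def earlier_types(daemon_types: Iterable[str]) -> Set[str]:
    """Return every type that precedes the earliest requested type."""
    first = min(UPGRADE_ORDER.index(t) for t in daemon_types)
    return set(UPGRADE_ORDER[:first])


def pending_earlier_daemons(
    daemons: Iterable[Daemon], daemon_types: Iterable[str], hosts: Set[str]
) -> Tuple[List[Daemon], List[Daemon]]:
    """Find not-yet-upgraded daemons of earlier types, on and off the target hosts."""
    need_first = earlier_types(daemon_types)
    pending = [d for d in daemons if d.daemon_type in need_first and not d.upgraded]
    on_target = [d for d in pending if d.hostname in hosts]
    elsewhere = [d for d in pending if d.hostname not in hosts]
    return on_target, elsewhere


daemons = [
    Daemon("mon", "host1", upgraded=False),
    Daemon("mon", "host2", upgraded=True),
    Daemon("crash", "host1", upgraded=False),
]
# The old mon on host1 is on a target host, so it must block the crash upgrade.
on_target, elsewhere = pending_earlier_daemons(daemons, ["crash"], {"host1"})
```

A validator built on this would reject the request when either list is non-empty, which covers the case the old check missed.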
seastore/omap_manager/btree: prevent heap buffer overflow in log
This commit fixes a heap overflow in omap_btree_node_impl when
logging the full bufferlist. This issue was already tracked in
https://tracker.ceph.com/issues/71524. To prevent this from happening,
we log the length of the bufferlist instead of its full contents.
osd/scrub: auto-correct accounting-only stat mismatches
When scrub detects a PG stats mismatch but no object-level
inconsistencies (all replicas agree on actual data), fix the
stats in place rather than reporting a scrub error.
Previously, a pure stat mismatch would log [ERR], increment
shallow_errors, and trigger OSD_SCRUB_ERRORS / PG_STATE_INCONSISTENT
health alerts — yet leave the stats unfixed unless a repair
scrub was manually initiated. The scrubber's own object count
is authoritative in this case.
Persistence of the corrected stats is deferred until the next
transaction that sets dirty_info, consistent with the existing
stats_invalid repair path.
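The decision described above might look like the following simplified sketch. The `ScrubResult` and `PGState` classes, the `pending_persist` flag, and the returned labels are hypothetical and exist only for this example; the real scrubber tracks far more state.

```python
from dataclasses import dataclass


@dataclass
class ScrubResult:
    scrubbed_objects: int   # count produced by the scrubber
    recorded_objects: int   # count stored in the PG stats
    object_errors: int      # object-level inconsistencies found


@dataclass
class PGState:
    num_objects: int
    shallow_errors: int = 0
    pending_persist: bool = False  # hypothetical: written with a later transaction


def finish_scrub(pg: PGState, result: ScrubResult) -> str:
    """Decide how to treat a stats mismatch at the end of a scrub."""
    if result.scrubbed_objects == result.recorded_objects:
        return "clean"
    if result.object_errors == 0:
        # Accounting-only mismatch: trust the scrubber's count and fix in place.
        pg.num_objects = result.scrubbed_objects
        pg.pending_persist = True
        return "stats-corrected"
    # Real object-level inconsistencies are still reported as scrub errors.
    pg.shallow_errors += 1
    return "error"
```

The point of the middle branch is that no error counter is touched, so no health alert is raised for a mismatch that the scrub can resolve by itself.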
Matthew N. Heler [Mon, 20 Apr 2026 21:25:47 +0000 (16:25 -0500)]
rgw/cloud-transition: yield in cloud_tier_bucket_exists HEAD
The HEAD request used null_yield, so every attempt (including the
retries added by retry_on_busy) blocked the LC worker thread for
the full HTTP timeout instead of yielding.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
rgw/cloud-transition: check bucket existence before create
Add HEAD request to check if target bucket exists before attempting
to create it. This avoids unnecessary PUT requests when the bucket
already exists on the remote endpoint.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
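The check-before-create flow can be illustrated with a small stand-in for the remote endpoint. `FakeRemote` and `ensure_bucket()` are hypothetical names for this sketch, and the handling of status codes other than 200 and 404 is an assumption rather than something the commit states.

```python
from typing import Iterable, List, Tuple


class FakeRemote:
    """In-memory stand-in for a remote S3 endpoint (illustration only)."""

    def __init__(self, existing: Iterable[str]) -> None:
        self.buckets = set(existing)
        self.requests: List[Tuple[str, str]] = []

    def head_bucket(self, name: str) -> int:
        self.requests.append(("HEAD", name))
        return 200 if name in self.buckets else 404

    def put_bucket(self, name: str) -> int:
        self.requests.append(("PUT", name))
        self.buckets.add(name)
        return 200


def ensure_bucket(remote: FakeRemote, name: str) -> bool:
    """Issue a PUT only when HEAD reports the bucket missing.

    Returns True if a PUT was sent.
    """
    status = remote.head_bucket(name)
    if status == 200:
        return False
    if status != 404:
        # Assumption: any other status is surfaced instead of guessed at.
        raise RuntimeError(f"unexpected HEAD status {status} for {name}")
    remote.put_bucket(name)
    return True
```

With an existing bucket, only the HEAD appears in `remote.requests`; the PUT is sent only on the first transition to a new target bucket.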
Add per-bucket cloud tier targeting via new options target_by_bucket
and target_by_bucket_prefix, and use them in transition/restore to
derive the destination bucket name
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Alex Ainscow [Wed, 18 Mar 2026 09:22:26 +0000 (09:22 +0000)]
osd: PGLog Attach correct version to missing list when ignoring log entries
A previous fix in PR 66698 addressed an issue where log entries associated with
partial writes were being processed incorrectly (see that PR and associated
tracker for details). The fix was to ignore log entries that should not have
been present on the non-primary shard.
The problem with that approach shows up in a more complex scenario: when the
log contains a partial write followed by a full write AND the shard is
backfilling, the missing list was given the version prior to the
full write, rather than prior to the clone.
This fix corrects how the missing list version is calculated.
See the associated tracker for instructions on how to recreate the issue.
Fixes: https://tracker.ceph.com/issues/75211
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Alex Ainscow [Mon, 27 Apr 2026 13:24:45 +0000 (14:24 +0100)]
osd/test: Add EC peering test infrastructure and recovery test cases
This commit enhances the EC peering test framework and adds test cases
for erasure-coded pool recovery scenarios:
NOTE: Many of the test cases are disabled as they recreate certain
problems. Later commits will enable these tests and fix the production
issues, but under different PRs.
Test Infrastructure Improvements:
- Add MockStore wrapper with read error injection capabilities for testing
error handling in EC recovery
- Enhance ECPeeringTestFixture with recovery callback verification
- Add support for pg_upmap to better simulate OSD placement
- Implement write_attribute() for testing partial vs full stripe writes
- Add read_shard_object_info() to verify on-disk version consistency
- Improve logging with missing object stats (m=, u=, mbc=)
- Add support for doing object recovery in Fast EC.
- Add set_config() helper for runtime configuration changes
- Preserve xinfo features when marking OSDs up/down
- Fix pg_temp handling for EC pools with optimizations
Patrick Donnelly [Tue, 28 Apr 2026 22:25:44 +0000 (15:25 -0700)]
doc/start/os-recommendations: update for Umbrella and future releases
Overhaul the OS recommendations documentation to reflect deployment
practices and map out the support matrices for upcoming releases through
Ceph X (24.x).
Key changes include:
* Emphasized container-based deployments: Added a new section strongly
recommending containerized deployments via `cephadm` over legacy
package-based installations to simplify upgrades and avoid host-level
dependency conflicts.
* Expanded support tables: Updated the Platforms and Container Hosts
tables to include Umbrella (21.x), Vampire (22.x), W (23.x), and
X (24.x). Removed EOL releases like Reef.
* Added EOL visibility: Included End-of-Life dates for Linux
distributions and anticipated EOL dates for Ceph releases to help
administrators plan lifecycle events.
* Updated OS targets: Added support tracking for Ubuntu 24.04 (Noble),
Ubuntu 26.04, Ubuntu 28.04, Rocky Linux 10, and Rocky Linux 11.
* Addressed CentOS transition: Added a warning that CentOS 10+ will no
longer be built or tested by upstream. Documented that Rocky Linux 10
is the new default container base image for Umbrella, while clarifying
that the bare-metal host OS can remain any supported distribution.
* Added horizontal upgrade guidance: Introduced a new section outlining
safe "horizontal" bare-metal OS upgrade paths (e.g., CentOS 9 to
Rocky 10, Ubuntu 22.04 to 24.04) so users can safely migrate their
nodes outside of Ceph version upgrade windows.
AI-Assisted: Gemini Pro, through numerous prompts
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
rgw/multisite: log concurrency state transitions in adj_concurrency
Replace the timer-based "OSD cluster is overloaded" warning with
state-transition logging. Also, log when concurrency is halved and
eventually recovered.
Since the crushtool CLI commands were modified, its tests also need
updating for the new --show-retry-exhaustion flag
and the modified --show-choose-tries option.
Also add /src/script/run-cli-tests.sh to run the cram tests
easily without the configuration headache.
The redeploy handler had no boolean "force" parameter, so the CLI could
bind --force to the optional image argument. Pass force through to
daemon_action, validate container image ref in cephadm, and guard
against --force being captured as the image in the CLI.
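The CLI-side guard could be sketched like this. `normalize_redeploy_args()` is a hypothetical helper, not the handler's real signature, and reinterpreting a stray `--force` as the flag (rather than rejecting it) is an assumption made for the example.

```python
from typing import Optional, Tuple


def normalize_redeploy_args(
    image: Optional[str], force: bool = False
) -> Tuple[Optional[str], bool]:
    """Keep a stray flag from being treated as a container image."""
    if image == "--force":
        # The flag was bound to the optional image argument: undo that.
        return None, True
    if image is not None and image.startswith("-"):
        # Anything else that looks like an option is not a valid image ref.
        raise ValueError(f"invalid container image reference: {image!r}")
    return image, force
```

For example, `normalize_redeploy_args("--force")` yields `(None, True)`, while a normal image string passes through unchanged together with the explicit `force` value.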
rgw/multisite: fix uninitialized LatencyMonitor average and use exponentially weighted moving average
LatencyMonitor::total was declared without an initializer. Since
std::chrono::duration's default constructor leaves the value indeterminate,
the very first add_latency() call adds a real sample to garbage, producing a
huge average that immediately triggers the "OSD cluster is overloaded" warning
within seconds of RGW startup, before any actual slow ops occur.
Additionally, the old implementation uses a naive lifetime average
(total/count) that could slow the recovery from a transient slow-ops
episode. Once poisoned, the average stayed high for a long time,
keeping sync concurrency throttled to 1.
So, also replace the naive lifetime average in LatencyMonitor with an
exponentially weighted moving average (alpha=0.15). With the weighted average,
after a series of normal lock operations a past spike's influence decays faster,
allowing concurrency to recover without an RGW restart.
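The difference between the two averages can be seen in a short sketch. `LatencyEwma` is a hypothetical class written for this example, not the C++ LatencyMonitor, and seeding the average with the first sample is an assumption; only alpha=0.15 comes from the commit.

```python
from typing import List


class LatencyEwma:
    """Exponentially weighted moving average with explicit initialization."""

    def __init__(self, alpha: float = 0.15) -> None:
        self.alpha = alpha
        self.average = 0.0        # never left indeterminate
        self.has_sample = False

    def add(self, sample: float) -> float:
        if not self.has_sample:
            # Assumption: the first sample seeds the average directly.
            self.average = sample
            self.has_sample = True
        else:
            self.average = self.alpha * sample + (1 - self.alpha) * self.average
        return self.average


def lifetime_average(samples: List[float]) -> float:
    """The naive total/count average that was replaced."""
    return sum(samples) / len(samples)


# 50 slow lock operations (10 s each) followed by 50 normal ones (10 ms each).
samples = [10.0] * 50 + [0.01] * 50
ewma = LatencyEwma()
for s in samples:
    ewma.add(s)
```

After the 50 normal samples the spike's weight in the EWMA has decayed by a factor of 0.85^50, so `ewma.average` is back near 10 ms, while `lifetime_average(samples)` is still about 5 s.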
rgw/multisite: expose lock latency as perf counter for data sync
Add a "lock_latency" perf counter to the per-zone data sync counter.
This tracks the latency of RADOS lock/unlock operations in
RGWContinuousLeaseCR, giving operators visibility into the values
driving the LatencyConcurrencyControl.
The new perf counter can be queried via the admin socket:
ceph daemon <asok> perf dump data-sync-from-<zone>
and reset independently:
ceph daemon <asok> perf reset data-sync-from-<zone>
This would allow us to distinguish a poisoned average from ongoing
OSD latency issues without restarting the RGW process.
ceph-volume: make TPM2 PCR policy configurable (default to PCR 7)
TPM enrollment for dmcrypt OSDs is hardcoded to systemd-cryptenroll
--tpm2-pcrs 9+12, which ties the LUKS key to initrd and kernel
command line measurements. This is brittle on RHEL image mode
systems: after a bootc switch, the kernel, initrd, or cmdline often
change, the PCRs move, and the volume won't unlock until you re-enroll
or fall back to another key.
Typical error:
```
Apr 27 14:17:25 ceph-jx5fq20u bash[4289]: Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /usr/lib/systemd/systemd-cryptsetup attach M3zE7r-qsGZ-xs0T-610d-SJNZ-U89x-J0cJq8 /dev/ceph-cac05fb6-51d3-4a60-9fc1-4958c568b433/osd-block-b1a495a0-e1a4-4888-baf9-7990f45f1e56 - tpm2-device=auto,discard,headless=true,nofail
Apr 27 14:17:26 ceph-jx5fq20u ceph-e5520e2c-420d-11f1-a7b9-5254001191fb-osd-0-activate[4300]: stderr: Failed to unseal secret using TPM2: Operation not permitted
Apr 27 14:17:26 ceph-jx5fq20u bash[4289]: stderr: Failed to unseal secret using TPM2: Operation not permitted
```
The patch makes the PCR set configurable and defaults to PCR 7 so
bootc-style deployments behave correctly.
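A configurable PCR set could be turned into an enrollment command roughly as below. `tpm2_enroll_cmd()` is a hypothetical helper, not the ceph-volume function, and the 0-23 range check is an assumption added for the sketch; `--tpm2-device` and `--tpm2-pcrs` are existing systemd-cryptenroll options.

```python
from typing import List, Sequence


def tpm2_enroll_cmd(device: str, pcrs: Sequence[int] = (7,)) -> List[str]:
    """Build a systemd-cryptenroll argv for the given PCR indices."""
    if not pcrs:
        raise ValueError("at least one PCR index is required")
    for pcr in pcrs:
        if not 0 <= pcr <= 23:
            raise ValueError(f"PCR index out of range: {pcr}")
    # systemd-cryptenroll expects indices joined with '+', e.g. 9+12.
    pcr_arg = "+".join(str(p) for p in pcrs)
    return [
        "systemd-cryptenroll",
        "--tpm2-device=auto",
        f"--tpm2-pcrs={pcr_arg}",
        device,
    ]
```

Calling it with only a device path produces `--tpm2-pcrs=7`, and passing `(9, 12)` reproduces the previously hardcoded policy.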
mgr/dashboard: Update permissions for pool-manager role
Fixes https://tracker.ceph.com/issues/76307
- access was denied when clicking the create pool table action
- this happened because the monitor API call added for stretch cluster configuration was failing
- also updates overview nav permissions
ceph-volume: raw activate should ignore lvm backed OSD devices
The generic activate (`ceph-volume activate`) runs the
raw path before LVM. Raw.activate was walking lsblk / raw
list entries and could hit block devices that are actually
logical volumes from `ceph-volume lvm prepare` or `lvm batch`
(with ceph lvm tags on the LV).
That made raw activation poke at LVM-backed OSDs instead of
leaving them to `lvm activate`.
With this commit, ceph-volume now builds the set of LV paths
that carry those tags once (`lvs` via ceph_volume_lvm_prepare_lv_paths)
and skips any candidate path that matches, so only real raw
OSDs go through the raw activate path.
Also, we now pass `with_tpm` through luks_open() calls for db and
wal so encrypted metadata uses the same systemd-cryptsetup path
as the block LV when ceph.with_tpm is set.
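The filtering step amounts to building a set once and skipping matches. In this sketch `raw_candidates()` and the sample paths are hypothetical; the real code obtains the LV paths from `lvs` and the candidates from lsblk / raw list.

```python
from typing import Iterable, List


def raw_candidates(block_devices: Iterable[str], lvm_osd_paths: Iterable[str]) -> List[str]:
    """Drop devices that are LVs prepared by ceph-volume lvm."""
    skip = set(lvm_osd_paths)  # built once, then used for O(1) lookups
    return [dev for dev in block_devices if dev not in skip]


devices = ["/dev/sdb", "/dev/ceph-vg/osd-block-1", "/dev/sdc"]
lvm_paths = ["/dev/ceph-vg/osd-block-1"]
remaining = raw_candidates(devices, lvm_paths)
```

Here `remaining` keeps only `/dev/sdb` and `/dev/sdc`, leaving the tagged LV for `lvm activate`.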
Matthew N. Heler [Thu, 26 Feb 2026 01:03:56 +0000 (19:03 -0600)]
rgw: add RestoreStatus support to object listings
S3 clients can request restore status in listing responses through the
x-amz-optional-object-attributes header, but we had no support for it.
This stores the restore state in the bucket index so listings can
include <RestoreStatus> without having to read each object's attrs
individually.
Signed-off-by: Matthew N. Heler <matthew.heler@hotmail.com>
Add script to test for CRUSH retry exhaustion in stretch mode with
2 datacenters. Tests unbiased stretch rules by running multiple
iterations of PG mappings and checking for collisions that exceed
the 50-try limit.
Also add --show-retry-exhaustion flag to crushtool to detect and
report when CRUSH mapping hits the maximum retry limit.
mgr/cephadm: replace md5_hash with FIPS-safe config_hash
Replace md5_hash() usages in cephadm dependency hashing with an
algorithm-agnostic config_hash() helper. config_hash() is backed by
SHA-256, making dependency hash generation unconditionally FIPS-safe
while preserving change-detection behavior.
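A helper of this kind can be very small. The signature below is an assumption for illustration (the commit names `config_hash()` but does not show it); the SHA-256 call is Python's standard `hashlib`.

```python
import hashlib


def config_hash(data: bytes) -> str:
    """Return a hex digest used only to detect configuration changes."""
    # SHA-256 is permitted in FIPS mode, unlike MD5.
    return hashlib.sha256(data).hexdigest()


before = config_hash(b"key = value\n")
after = config_hash(b"key = other\n")
changed = before != after
```

Callers compare digests for equality only, so switching algorithms keeps change detection intact; stored MD5 digests would simply differ once from the new SHA-256 values.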
Ville Ojamo [Wed, 22 Apr 2026 06:51:34 +0000 (13:51 +0700)]
doc/rados: improve troubleshooting-mon.rst
Don't ceph tell mon_status and then claim it passes the help command.
Improve language and link to cephadm doc on asok usage. Add label and
note about accessing asok from the host in troubleshooting.rst.
Capitalize and use double backticks consistently.
Add some missing articles and other minor word changes.
Fix indentation.
Use ref and link definitions consistently, use automatic bold.
Use privileged prompts for CLI commands where necessary.
Remove spaces at end of lines and change tabs to four spaces.
Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>
Afreen Misbah [Fri, 27 Mar 2026 16:06:38 +0000 (21:36 +0530)]
mgr/dashboard: Add gray10 theme base color to all pages
- applies #f4f4f4 ($background) to all pages as the base page color
- previously the base page color was white
- also updates tabs/navs/tables CSS to adapt
- fixes some spacing in alerts tabs and nvmeof
Afreen Misbah [Thu, 26 Mar 2026 13:25:18 +0000 (18:55 +0530)]
mgr/dashboard: Remove tooltip and popover defaults
Fixes https://tracker.ceph.com/issues/75410
These defaults are not required: Carbon already applies a dark color to tooltips, and going forward we want to align with CDS.
If anything breaks, add or fix the style in the affected component.
The objectstore tool tests restart the OSDs without allowing enough
time for GC to run, which can lead to no-OOL-segments conditions on restart. This
adds a gc_before_restart option to the test config, which when set
to true will run crimson-objectstore-tool --op gc on each OSD
before restarting them.
crimson/tools/objectstore: add GC operation to crimson-objectstore-tool
This adds a GC operation to the crimson-objectstore-tool, allowing
us to trigger GC cycles on demand during testing. This will
help reduce segment pressure and avoid 'no-segments' conditions.