Edwin Rodriguez [Thu, 26 Feb 2026 21:35:37 +0000 (16:35 -0500)]
Introduce lockstat utility for mutex and lock performance profiling
Add a utility and framework for profiling mutexes and locks within Ceph. Includes classes for tracking lock contention, wait times, and histograms of lock durations. Adds tests to validate histogram functionality and lockstat recording.
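As a rough illustration of what recording contention and a wait-time histogram can look like, here is a minimal Python sketch. It is not the lockstat code from this commit: `ProfiledLock`, its fields, and the power-of-two bucketing are all assumptions made for the example.

```python
import threading
import time
from collections import Counter


class ProfiledLock:
    """Hypothetical lock wrapper that records contention and wait times."""

    def __init__(self):
        self._lock = threading.Lock()
        self.contended = 0               # acquisitions that had to wait
        self.wait_histogram = Counter()  # bucket index -> number of acquisitions

    @staticmethod
    def bucket(wait_ns):
        # Assumed bucketing: bucket 0 holds zero waits, bucket b holds
        # waits in the range [2**(b-1), 2**b) nanoseconds.
        return int(wait_ns).bit_length()

    def __enter__(self):
        # Fast path: the lock was free, so there was no contention.
        if self._lock.acquire(blocking=False):
            self.wait_histogram[0] += 1
            return self
        # Slow path: count the contention and time the blocking acquire.
        self.contended += 1
        start = time.monotonic_ns()
        self._lock.acquire()
        self.wait_histogram[self.bucket(time.monotonic_ns() - start)] += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


lock = ProfiledLock()
with lock:
    pass  # critical section
```

Power-of-two buckets keep the histogram small while still separating microsecond waits from millisecond waits; the real utility may bucket differently.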
Ramana Raja [Mon, 29 Dec 2025 22:17:28 +0000 (17:17 -0500)]
mgr/rbd_support: Stagger mirror snapshot and trash purge schedules
Previously, multiple images or namespaces scheduled with the same
interval ran mirror snapshots or trash purges at around the same time,
creating spikes in cluster activity.
This change staggers scheduled jobs by:
- Adding a deterministic phase offset per image or namespace when no
start-time is set.
- Picking a random element from the queue at each scheduled time, rather
than always the first.
Together, these changes spread snapshot and trash purge operations more
evenly over time and improve cluster stability.
Fixes: https://tracker.ceph.com/issues/74288
Signed-off-by: Ramana Raja <rraja@redhat.com>
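The two staggering techniques described above can be sketched as follows. This is a minimal illustration, not the mgr/rbd_support code: the function names, the SHA-256 hash, and the use of seconds are assumptions.

```python
import hashlib
import random


def phase_offset(key: str, interval_s: int) -> int:
    """Deterministic offset in [0, interval_s) derived from an image or namespace key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_s


def next_run(now_s: int, key: str, interval_s: int) -> int:
    """First scheduled time strictly after now_s for this key."""
    offset = phase_offset(key, interval_s)
    periods = (now_s - offset) // interval_s + 1
    return offset + periods * interval_s


def pop_random(queue: list, rng: random.Random):
    """Remove and return a random queued item rather than always the first."""
    return queue.pop(rng.randrange(len(queue)))


# Two images with the same 1-hour interval get their own stable offsets.
interval = 3600
offset_a = phase_offset("pool/ns/image-a", interval)
offset_b = phase_offset("pool/ns/image-b", interval)

# Items due at the same scheduled time are served in random order.
due = ["image-a", "image-b", "image-c"]
picked = pop_random(due, random.Random())
```

Because the offset is a pure function of the key, it survives restarts without being stored, which is one reason to prefer a hash over a random draw for the phase.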
Kotresh HR [Wed, 18 Feb 2026 10:48:51 +0000 (16:18 +0530)]
mgr/mirroring: json pretty formatting
The output of the 'daemon status' and 'peer_list' commands
doesn't support the json-pretty format and isn't reader
friendly. This patch adds support for 'json-pretty' output
when format='json-pretty' is passed.
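A minimal sketch of the idea, assuming a hypothetical `format_output()` helper (the actual mgr/mirroring code path, and the indentation it uses, may differ):

```python
import json


def format_output(data, fmt="json"):
    """Serialize command output; indent it when 'json-pretty' is requested."""
    if fmt == "json-pretty":
        return json.dumps(data, indent=4, sort_keys=True)
    return json.dumps(data)


# Hypothetical payload, for illustration only.
status = {"state": "ok", "peers": 1}
compact = format_output(status)
pretty = format_output(status, fmt="json-pretty")
```

Both forms parse back to the same object; only the whitespace differs.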
Ilya Dryomov [Sun, 1 Mar 2026 21:55:52 +0000 (22:55 +0100)]
qa/workunits/rbd: short-circuit status() if "ceph -s" fails
In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}
In this scenario all commands that are invoked from the loop body
are going to time out anyway.
Ilya Dryomov [Sun, 1 Mar 2026 16:45:51 +0000 (17:45 +0100)]
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected
In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:
In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).
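The worst-case figure follows directly from the numbers quoted above, as this short calculation shows:

```python
ITERATIONS = 720       # maximum loop iterations
TIMEOUT_S = 5 * 60     # each "rbd ls" call times out after 5 minutes
SLEEP_S = 10           # sleep between iterations

worst_case_s = ITERATIONS * (TIMEOUT_S + SLEEP_S)
worst_case_h = worst_case_s / 3600
print(f"{worst_case_s} s = {worst_case_h:.0f} h")  # prints "223200 s = 62 h"
```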
Victoria Mackie [Fri, 13 Feb 2026 21:40:01 +0000 (21:40 +0000)]
dashboard: add location field to NVMeoF namespace and gateway group APIs
Namespace location:
- Add location field to Namespace model in nvmeof.py
- Add location parameter to PATCH /api/nvmeof/subsystem/{nqn}/namespace/{nsid}
- Location can now be retrieved via GET and set via PATCH
Gateway group locations:
- Add locations array to gateway group endpoint response
- Extract locations from all gateways in a service group
- Add _get_gateway_locations() helper method using nvme-gw show command
- Locations appear in placement.locations for each service
Signed-off-by: Victoria Mackie <victoriam@uk.ibm.com>
```
283/322 Test #301: run-tox-qa ................................***Failed 92.31 sec
...
flake8: install_deps /ceph/qa> python -I -m pip install flake8
flake8: commands[0] /ceph/qa> flake8 --select=F,E9 --exclude=venv,.tox
./tasks/keycloak.py:51:5: F841 local variable 'os_version' is assigned to but never used
```
Remove the unused os_version assignment to fix flake8 F841 in run-tox-qa.
Ville Ojamo [Mon, 2 Mar 2026 08:26:47 +0000 (15:26 +0700)]
doc/start: Update and fix get-involved.rst
Remove the nonexistent Planet Ceph, Wiki, and Commit List rows.
Update Kernel Client, QA, Community mailing list links to working ones.
Use https instead of http.
Fix the Ceph calendar link and split the old contribute guide link into
a separate table row.
Remove the non-working lists.ceph.com external link definition now that
it is unused.
Sort external link definitions in order of use.
Fix invalid space after a hyphen by rewrapping text.
Update Slack invite link.
Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>
Shraddha Agrawal [Wed, 25 Feb 2026 11:31:26 +0000 (17:01 +0530)]
doc: reformat crimson docs
This commit rearranges the crimson docs so the deployment steps appear in
the order in which they are supposed to be executed. It also removes
`crimson-osd` references, as that is an internal detail that users don't
need to be aware of.
Seastar's escaped_string wrapper no longer implicitly converts to a string type.
Update dump_metric_value to call .value() to access the underlying string.
cmake: explicitly support and enable vptr sanitizer
As of Clang 21, -fsanitize=undefined no longer implies vptr.
This adds vptr as an explicit component and provides the necessary compile options to satisfy Seastar's build requirements.
See: https://github.com/scylladb/seastar/commit/c1060ea7d4676df23ce62af96ef2daa768f5de8a for more details
Ville Ojamo [Mon, 2 Mar 2026 07:00:45 +0000 (14:00 +0700)]
doc: Remove markup from CLI example commands
Remove all the remaining stars in the preformatted block CLI example
commands.
Several example CLI commands had stars around placeholder text in
addition to the standard angle brackets. The stars could make sense in
man pages or in text to make the placeholder italic, but they are just
rendered as-is in preformatted text blocks.
Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>
Ilya Dryomov [Fri, 27 Feb 2026 14:18:27 +0000 (15:18 +0100)]
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet
Commit 21b4b89e5280 ("qa/tasks: watchdog terminate thrasher") required
every thrasher to have a stop_and_join() method, but the preceding
commit a035b5a22fb8 ("thrashers: standardize stop and join method
names") failed to add it to rbd_mirror_thrash (whether as an ad-hoc
implementation or by way of inheriting from ThrasherGreenlet).
Later on, commit 783f0e3a9903 ("qa: Adding a new class for the
daemonwatchdog to monitor") worsened the issue by expanding the use
of stop_and_join() to all watchdog barks rather than just the case of
a thrasher throwing an exception, which is something that practically
never happens.
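To illustrate why inheriting from a common base class closes this gap, here is a minimal sketch. `BaseThrasher`, `MirrorThrasher`, and `bark()` are stand-ins invented for the example and use plain threads; they are not the real ThrasherGreenlet, rbd_mirror_thrash, or daemonwatchdog code.

```python
import threading


class BaseThrasher:
    """Stand-in base class that provides stop_and_join() to every subclass."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Keep thrashing until asked to stop, pausing 10 ms between rounds.
        while not self._stop_event.wait(0.01):
            self.thrash_once()

    def thrash_once(self):
        raise NotImplementedError

    def stop_and_join(self):
        self._stop_event.set()
        self._thread.join()


class MirrorThrasher(BaseThrasher):
    """Hypothetical thrasher; it inherits stop_and_join() instead of defining it."""

    def __init__(self):
        super().__init__()
        self.rounds = 0

    def thrash_once(self):
        self.rounds += 1


def bark(thrashers):
    """Watchdog-style shutdown: relies on every thrasher having stop_and_join()."""
    for thrasher in thrashers:
        thrasher.stop_and_join()


thrasher = MirrorThrasher()
thrasher.start()
bark([thrasher])
```

A thrasher written without the base class would raise AttributeError inside `bark()`, which is the kind of failure the commit message describes.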
Currently, we use parallel_for_each when splitting a PG into multiple children.
In BlueStore's path, the sequential processing means the last split_collection call
(highest bits child) sets the final parent bits.
In SeaStore with parallel_for_each, separate transactions race, and a lower-bits child can overwrite the correct value.
With incorrect split_bits, the parent PG's list_objects computes a wider hash range than it should,
causing do_delete_work to list and delete objects that actually belong to a child PG.
When that child later tries to access those objects, SeaStore hits an unexpected ENOENT and aborts.
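The failure mode can be modelled in a few lines. This is a simplified sketch, not SeaStore or OSD code: it assumes each child split overwrites the parent's split bits, and that a PG owns an object when the low `split_bits` bits of the object hash equal the PG seed.

```python
def final_parent_bits(child_bits_in_completion_order):
    """Each split overwrites the parent's bits, so the last writer wins."""
    parent_bits = None
    for bits in child_bits_in_completion_order:
        parent_bits = bits
    return parent_bits


def listed_by_parent(hashes, parent_seed, split_bits):
    """Objects the parent would list: low split_bits bits match its seed."""
    mask = (1 << split_bits) - 1
    return [h for h in hashes if (h & mask) == parent_seed]


# Sequential processing: the highest-bits child is applied last.
sequential = final_parent_bits([3, 4, 5])
# Racing transactions: a lower-bits child may complete last.
racing = final_parent_bits([3, 5, 4])

hashes = range(64)
correct = listed_by_parent(hashes, parent_seed=1, split_bits=sequential)
too_wide = listed_by_parent(hashes, parent_seed=1, split_bits=racing)
print(correct)   # [1, 33]
print(too_wide)  # [1, 17, 33, 49]
```

In this model the stale value makes the parent list hashes 17 and 49 as well, which would belong to a child PG after the split.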
crimson/osd: fix PG splitting logic during map gaps
When the OSD advances through a range of maps (map gap), it must
ensure that every epoch transition is evaluated for potential
PG splits.
This patch updates the split check to use the current OSDMap epoch
from the PG as the baseline for each step. This guarantees that
splits are checked consecutively for every map in the sequence,
preventing intermediate splits from being missed during a map gap.
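The stepping described above can be sketched as follows. This is a simplified model, not the crimson OSD code: `split_steps()` and the `pg_num_by_epoch` mapping are hypothetical, and a split is modelled simply as pg_num growing between two consecutive epochs.

```python
def split_steps(pg_num_by_epoch, pg_epoch, target_epoch):
    """Return every consecutive epoch transition in which pg_num grows.

    The baseline for each check is the epoch the PG is currently at,
    so no intermediate map is skipped while advancing through the gap.
    """
    steps = []
    current = pg_epoch
    while current < target_epoch:
        nxt = current + 1
        old_pg_num = pg_num_by_epoch[current]
        new_pg_num = pg_num_by_epoch[nxt]
        if new_pg_num > old_pg_num:
            steps.append((current, nxt, old_pg_num, new_pg_num))
        current = nxt  # the PG has now advanced to the next epoch
    return steps


# A gap from epoch 10 to 13 that contains two separate splits.
pg_nums = {10: 8, 11: 16, 12: 16, 13: 32}
steps = split_steps(pg_nums, pg_epoch=10, target_epoch=13)
print(steps)  # [(10, 11, 8, 16), (12, 13, 16, 32)]
```

Each split is reported against the epoch at which it happened, rather than being folded into a single comparison between the first and last map of the gap.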