ceph-ci.git/log
46 hours agotest: add tests for subvolume utilization metrics
Igor Golikov [Mon, 8 Dec 2025 11:40:05 +0000 (11:40 +0000)]
test: add tests for subvolume utilization metrics

Add comprehensive tests to validate correct quota and current size
metrics for subvolumes

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/74135
46 hours agotest: add test for new MDS perf metrics
Igor Golikov [Mon, 8 Dec 2025 11:38:54 +0000 (11:38 +0000)]
test: add test for new MDS perf metrics

test for CPU utilization and number of open requests

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/73700
46 hours agomgr: add subvolume quota metrics to the manager
Igor Golikov [Mon, 8 Dec 2025 11:37:41 +0000 (11:37 +0000)]
mgr: add subvolume quota metrics to the manager

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/74135
46 hours agomds: add new perf and subvolume utilization metrics
Igor Golikov [Mon, 8 Dec 2025 10:43:38 +0000 (10:43 +0000)]
mds: add new perf and subvolume utilization metrics

Perf metrics: CPU% and number of open requests
Subvolume utilization metrics: quota info and current size

Signed-off-by: Igor Golikov <igolikov@redhat.com>
Fixes: https://tracker.ceph.com/issues/74135
Fixes: https://tracker.ceph.com/issues/73700
24 hours agoMerge pull request #66572 from kotreshhr/mirror-multithreaded
Venky Shankar [Wed, 25 Feb 2026 12:24:30 +0000 (17:54 +0530)]
Merge pull request #66572 from kotreshhr/mirror-multithreaded

tools/cephfs_mirror: Multi-threaded Mirroring

Reviewed-by: Venky Shankar <vshankar@redhat.com>
25 hours agoMerge pull request #67509 from bill-scales/peeringstate2
Bill Scales [Wed, 25 Feb 2026 11:36:18 +0000 (11:36 +0000)]
Merge pull request #67509 from bill-scales/peeringstate2

test: Fix unittest_peeringstate as fix for 74218 has merged

Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
28 hours agoMerge pull request #67387 from yuvalif/wi-yuval-bucket-notifications
Yuval Lifshitz [Wed, 25 Feb 2026 08:51:53 +0000 (10:51 +0200)]
Merge pull request #67387 from yuvalif/wi-yuval-bucket-notifications

test/rgw/notification: remove deprecated dependencies

29 hours agotest: Fix unittest_peeringstate as 74218 has merged
Bill Scales [Wed, 25 Feb 2026 07:51:54 +0000 (07:51 +0000)]
test: Fix unittest_peeringstate as 74218 has merged

Fix merge conflict between unittest_peeringstate
(commit 87e3334de7bda6ac90f43ab1d9a7c6359dd10d35)
and fix for issue 74218 by fixing test case to
expect the new behavior.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
29 hours agoMerge pull request #66090 from rhcs-dashboard/73674-create-rule-certmgr
Aashish Sharma [Wed, 25 Feb 2026 07:31:58 +0000 (13:01 +0530)]
Merge pull request #66090 from rhcs-dashboard/73674-create-rule-certmgr

mgr/dashboard: Add certmgr alerts and warnings to Prometheus and dashboard

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
33 hours agoMerge pull request #67487 from rhcs-dashboard/cephfs-mirroring-entity-fix
Afreen Misbah [Wed, 25 Feb 2026 04:02:13 +0000 (09:32 +0530)]
Merge pull request #67487 from rhcs-dashboard/cephfs-mirroring-entity-fix

mgr/dashboard: Cephfs Mirroring Entity Deselect fix

Reviewed-by: pujaoshahu <pshahu@redhat.com>
35 hours agoMerge pull request #67328 from rhcs-dashboard/edit-namespace-task
Afreen Misbah [Wed, 25 Feb 2026 01:24:11 +0000 (06:54 +0530)]
Merge pull request #67328 from rhcs-dashboard/edit-namespace-task

mgr/dashboard: Nvmeof expand namespace size

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
36 hours agoMerge pull request #67347 from rhcs-dashboard/remove-initiators
Afreen Misbah [Wed, 25 Feb 2026 01:21:36 +0000 (06:51 +0530)]
Merge pull request #67347 from rhcs-dashboard/remove-initiators

mgr/dashboard: Add remove host in subsystem resource page

Reviewed-by: Afreen Misbah <afreen@ibm.com>
36 hours agoMerge pull request #67021 from rhcs-dashboard/74396-Generic-Performance-Chart
Afreen Misbah [Wed, 25 Feb 2026 01:19:57 +0000 (06:49 +0530)]
Merge pull request #67021 from rhcs-dashboard/74396-Generic-Performance-Chart

mgr/dashboard: Generic Performance Chart - Carbon

Reviewed-by: Afreen Misbah <afreen@ibm.com>
43 hours agoMerge pull request #67304 from bill-scales/unittest_peeringstate
Bill Scales [Tue, 24 Feb 2026 18:12:23 +0000 (18:12 +0000)]
Merge pull request #67304 from bill-scales/unittest_peeringstate

osd: Add unittests for PeeringState

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>
43 hours agoMerge pull request #66850 from tchaikov/wip-rgw-posix-driver-silence-asan-warning
Matt Benjamin [Tue, 24 Feb 2026 18:05:27 +0000 (13:05 -0500)]
Merge pull request #66850 from tchaikov/wip-rgw-posix-driver-silence-asan-warning

rgw/posix: add destructor to BucketCache to fix memory leaks

43 hours agomgr/dashboard: Nvmeof edit namespace size
Sagar Gopale [Thu, 12 Feb 2026 12:42:34 +0000 (18:12 +0530)]
mgr/dashboard: Nvmeof edit namespace size

Fixes: https://tracker.ceph.com/issues/74900
Signed-off-by: Sagar Gopale <sagar.gopale@ibm.com>
43 hours agomgr/dashboard: Fix remove host in subsystem resource page
pujaoshahu [Fri, 13 Feb 2026 10:04:01 +0000 (15:34 +0530)]
mgr/dashboard: Fix remove host in subsystem resource page

Fixes: https://tracker.ceph.com/issues/74931
Signed-off-by: pujaoshahu <pshahu@redhat.com>
44 hours agoMerge pull request #67242 from rhcs-dashboard/create-namespace-task
Afreen Misbah [Tue, 24 Feb 2026 16:45:12 +0000 (22:15 +0530)]
Merge pull request #67242 from rhcs-dashboard/create-namespace-task

mgr/dashboard: Create-namespace-form

Reviewed-by: Afreen Misbah <afreen@ibm.com>
44 hours agoMerge pull request #67340 from rhcs-dashboard/subsytem-edit-host-key
Afreen Misbah [Tue, 24 Feb 2026 16:29:26 +0000 (21:59 +0530)]
Merge pull request #67340 from rhcs-dashboard/subsytem-edit-host-key

mgr/dashboard: Add nvmeof edit host key in subsystem resources page

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
45 hours agomgr/dashboard: Cephfs Mirroring Entity Deselect fix
Dnyaneshwari Talwekar [Tue, 24 Feb 2026 15:51:56 +0000 (21:21 +0530)]
mgr/dashboard: Cephfs Mirroring Entity Deselect fix

Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
Fixes: https://tracker.ceph.com/issues/75140
46 hours agoMerge pull request #64818 from avanthakkar/smb-rate-limit
John Mulligan [Tue, 24 Feb 2026 15:12:01 +0000 (10:12 -0500)]
Merge pull request #64818 from avanthakkar/smb-rate-limit

mgr/smb: add rate limiting support

Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
47 hours agoMerge pull request #67473 from linuxbox2/wip-nfs-quota
Matt Benjamin [Tue, 24 Feb 2026 13:31:10 +0000 (08:31 -0500)]
Merge pull request #67473 from linuxbox2/wip-nfs-quota

rgw-nfs: run quota threads by default

2 days agoMerge pull request #66178 from linuxbox2/wip-restore-dir-attrs
anrao19 [Tue, 24 Feb 2026 12:11:29 +0000 (17:41 +0530)]
Merge pull request #66178 from linuxbox2/wip-restore-dir-attrs

rgw_file: restore ability of attributes to be...restored

2 days agoMerge pull request #66280 from suever/rgw-aws-eks-oidc
anrao19 [Tue, 24 Feb 2026 12:09:48 +0000 (17:39 +0530)]
Merge pull request #66280 from suever/rgw-aws-eks-oidc

rgw: Remove invalid Content-Type header from RGW OIDC discovery requests

2 days agoMerge pull request #65319 from beomseok-park/fix-efbig
anrao19 [Tue, 24 Feb 2026 12:05:29 +0000 (17:35 +0530)]
Merge pull request #65319 from beomseok-park/fix-efbig

rgw: treat -EFBIG as advance-and-retry in unordered listing

2 days agoMerge pull request #67331 from leonidc/ignore-gws-in-deleting-state
leonidc [Tue, 24 Feb 2026 11:40:55 +0000 (13:40 +0200)]
Merge pull request #67331 from leonidc/ignore-gws-in-deleting-state

nvmeofgw: ignore beacons and send empty maps to GWs in DELETING state

2 days agomgr/dashboard: create-namespace
Sagar Gopale [Fri, 6 Feb 2026 10:43:31 +0000 (16:13 +0530)]
mgr/dashboard: create-namespace

Fixes: https://tracker.ceph.com/issues/74826
Signed-off-by: Sagar Gopale <sagar.gopale@ibm.com>
2 days agomgr/dashboard: Generic Performance Chart - Carbon
Devika Babrekar [Tue, 20 Jan 2026 06:16:33 +0000 (11:46 +0530)]
mgr/dashboard: Generic Performance Chart - Carbon
Fixes: https://tracker.ceph.com/issues/74396
Signed-off-by: Devika Babrekar <devika.babrekar@ibm.com>
fix performance charts

mgr/dashboard: Generic Performance Chart - Area Chart Integration
Fixes: https://tracker.ceph.com/issues/74396
Signed-off-by: Devika Babrekar <devika.babrekar@ibm.com>
add storage type view

mgr/dashboard: Performance Charts - alignment adjustments
Fixes: https://tracker.ceph.com/issues/74396
Signed-off-by: Devika Babrekar <devika.babrekar@ibm.com>
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/components/components.module.ts

2 days agoMerge pull request #66637 from Matan-B/wip-matanb-coroutine-repeat
Matan Breizman [Tue, 24 Feb 2026 10:25:41 +0000 (12:25 +0200)]
Merge pull request #66637 from Matan-B/wip-matanb-coroutine-repeat

test/crimson/test_crimson_coroutine: introduce interruptible repeat example

Reviewed-by: Samuel Just <sjust@redhat.com>
2 days agomgr/dashboard: Fix nvmeof edit host key in subsystem resources page
pujaoshahu [Fri, 13 Feb 2026 09:15:19 +0000 (14:45 +0530)]
mgr/dashboard: Fix nvmeof edit host key in subsystem resources page

Fixes: https://tracker.ceph.com/issues/74881
Signed-off-by: pujaoshahu <pshahu@redhat.com>
2 days agoMerge pull request #67474 from afreen23/health-card-hardware-tab
Afreen Misbah [Tue, 24 Feb 2026 09:59:06 +0000 (15:29 +0530)]
Merge pull request #67474 from afreen23/health-card-hardware-tab

mgr/dashboard: Health card hardware tab

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: pujaoshahu <pshahu@redhat.com>
2 days agoMerge pull request #67159 from rhcs-dashboard/subsystem-host-page
Afreen Misbah [Tue, 24 Feb 2026 09:51:42 +0000 (15:21 +0530)]
Merge pull request #67159 from rhcs-dashboard/subsystem-host-page

mgr/dashboard: NVMe – Fix host, listeners and namespace list display on Subsystem resource page

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
2 days agoMerge pull request #67284 from knrt10/crimson-rgw-cls-get-config
Kautilya Tripathi [Tue, 24 Feb 2026 08:56:34 +0000 (14:26 +0530)]
Merge pull request #67284 from knrt10/crimson-rgw-cls-get-config

cls/rgw_gc: read config via cls_get_config

2 days agoMerge pull request #67467 from gbregman/main
Gil Bregman [Tue, 24 Feb 2026 06:40:07 +0000 (08:40 +0200)]
Merge pull request #67467 from gbregman/main

nvmeof: Change the NVMEOF image version to 1.7

2 days agoMerge pull request #66857 from rhcs-dashboard/cephfs-mirroring-entity
Dnyaneshwari Talwekar [Tue, 24 Feb 2026 06:05:56 +0000 (11:35 +0530)]
Merge pull request #66857 from rhcs-dashboard/cephfs-mirroring-entity

mgr/dashboard: Cephfs Mirroring - Entity

Reviewed-by: Dnyaneshwari talwekar <dtalweka@redhat.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
Reviewed-by: Ankush Behl <cloudbehl@gmail.com>
2 days agomgr/dashboard: Add hardware tab to health card
Afreen Misbah [Mon, 23 Feb 2026 23:51:58 +0000 (05:21 +0530)]
mgr/dashboard: Add hardware tab to health card

Fixes https://tracker.ceph.com/issues/75120

Signed-off-by: Afreen Misbah <afreen@ibm.com>
2 days agorgw-nfs: run quota threads by default
Matt Benjamin [Mon, 23 Feb 2026 21:06:48 +0000 (16:06 -0500)]
rgw-nfs: run quota threads by default

Permits quota updates even if the only running rgw instances
are nfs gateways.

Fixes: https://tracker.ceph.com/issues/75118
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
2 days agomgr/dashboard: Added variations of alerts card sub total layout
Afreen Misbah [Mon, 23 Feb 2026 20:13:43 +0000 (01:43 +0530)]
mgr/dashboard: Added variations of alerts card sub total layout

- when health card's tab closed the layout is compact
- when health card's tab open the layout take space

Signed-off-by: Afreen Misbah <afreen@ibm.com>
2 days agomgr/dashboard: Css fixes for health card and alerts card
Afreen Misbah [Mon, 23 Feb 2026 19:33:15 +0000 (01:03 +0530)]
mgr/dashboard: Css fixes for health card and alerts card

Signed-off-by: Afreen Misbah <afreen@ibm.com>
2 days agoMerge pull request #67453 from baum/crimson-ceph-context-leak
baum [Mon, 23 Feb 2026 19:24:02 +0000 (21:24 +0200)]
Merge pull request #67453 from baum/crimson-ceph-context-leak

common: fix uninitialized nref in crimson CephContext

2 days agoMerge pull request #67379 from zdover23/wip-doc-2026-02-18-rados-config-mon-lookup-dns
Ilya Dryomov [Mon, 23 Feb 2026 19:18:02 +0000 (20:18 +0100)]
Merge pull request #67379 from zdover23/wip-doc-2026-02-18-rados-config-mon-lookup-dns

doc: update broken reference

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 days agoMerge pull request #67295 from kamoltat/wip-ksirivad-fix-74524
Radoslaw Zarzynski [Mon, 23 Feb 2026 18:46:28 +0000 (19:46 +0100)]
Merge pull request #67295 from kamoltat/wip-ksirivad-fix-74524

qa/standalone: improve reliability of osd-backfill tests

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2 days agoMerge pull request #66466 from smanjara/wip-fix-datasync-init
Shilpa Jagannath [Mon, 23 Feb 2026 18:24:34 +0000 (10:24 -0800)]
Merge pull request #66466 from smanjara/wip-fix-datasync-init

rgw/multisite: fix segfault during multisite startup

2 days agotest/rgw/notifications: increase the max number of boto client connections
Yuval Lifshitz [Fri, 20 Feb 2026 15:47:11 +0000 (15:47 +0000)]
test/rgw/notifications: increase the max number of boto client connections

this is to avoid this warning:
WARNING  urllib3.connectionpool:connectionpool.py:305 Connection pool is full, discarding connection: localhost

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
2 days agotest/rgw/notification: do not use netstat in the code
Yuval Lifshitz [Fri, 20 Feb 2026 15:41:14 +0000 (15:41 +0000)]
test/rgw/notification: do not use netstat in the code

* net-tools is deprecated in Fedora and Ubuntu
* using netstat -p (used to verify that the HTTP server is listening on
  a port) requires root privileges, which may fail in some test environments

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
2 days agotest/rgw/notifications: migrate from nose to pytest
Yuval Lifshitz [Thu, 19 Feb 2026 16:55:24 +0000 (16:55 +0000)]
test/rgw/notifications: migrate from nose to pytest

Fixes: https://tracker.ceph.com/issues/74573
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
Co-Authored-By: Claude <noreply@anthropic.com>
2 days agotest/rgw/notifications: fixes needed to run the tests in a multisite environment
Yuval Lifshitz [Wed, 18 Feb 2026 13:50:52 +0000 (13:50 +0000)]
test/rgw/notifications: fixes needed to run the tests in a multisite environment

The main issue was that a system user would get a JSON reply when
creating a bucket, which causes the boto3 client to fail.
The solution is to use a non-system user in the tests.

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
2 days agotest/rgw: remove deprecated boto dependency
Yuval Lifshitz [Wed, 18 Feb 2026 09:34:41 +0000 (09:34 +0000)]
test/rgw: remove deprecated boto dependency

Fixes: https://tracker.ceph.com/issues/73663
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
Co-authored-by: Bob-Shell <bob-shell@ai-assistant>
2 days agotest/rgw/notifications: cleanup of tests
Yuval Lifshitz [Tue, 17 Feb 2026 14:30:43 +0000 (14:30 +0000)]
test/rgw/notifications: cleanup of tests

* remove dead code
* remove unnecessary text from test names used when we supported
  "pubsub" (pull mode)

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
2 days agofix for quorum in API
Afreen Misbah [Mon, 23 Feb 2026 10:23:13 +0000 (15:53 +0530)]
fix for quorum in API

Signed-off-by: Afreen Misbah <afreen@ibm.com>
2 days agomgr/dashboard: Add systems tab to health card
Afreen Misbah [Fri, 13 Feb 2026 23:14:46 +0000 (04:44 +0530)]
mgr/dashboard: Add systems tab to health card

Fixes https://tracker.ceph.com/issues/75065

Signed-off-by: Afreen Misbah <afreen@ibm.com>
2 days agoMerge pull request #67460 from afreen23/alerts-card
Afreen Misbah [Mon, 23 Feb 2026 15:52:41 +0000 (21:22 +0530)]
Merge pull request #67460 from afreen23/alerts-card

mgr/dashboard: Add alerts card

Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>
2 days agoMerge PR #67135 into main
Patrick Donnelly [Mon, 23 Feb 2026 15:29:13 +0000 (10:29 -0500)]
Merge PR #67135 into main

* refs/pull/67135/head:
pybind: remove compile_time_env parameter from setup.py files
pybind/rados,rgw: replace Tempita errno checks with C preprocessor
pybind/cephfs: replace deprecated IF with C preprocessor macro

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 days agoMerge pull request #65947 from kchheda3/wipi-fix-lc-dm-delete
Krunal Chheda [Mon, 23 Feb 2026 15:21:21 +0000 (20:51 +0530)]
Merge pull request #65947 from kchheda3/wipi-fix-lc-dm-delete

rgw/lc: Do not delete DM if it's at the end of the pagination list.

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2 days agoMerge pull request #65607 from kchheda3/wip-lc-skip-bucket
Krunal Chheda [Mon, 23 Feb 2026 15:20:33 +0000 (20:50 +0530)]
Merge pull request #65607 from kchheda3/wip-lc-skip-bucket

rgw/lc: Increase the timeout value while fetching the lc shard lock and update the logic on expired session

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2 days agomgr/dashboard: Add certmgr alerts and warnings to Prometheus and dashboard
Abhishek Desai [Thu, 30 Oct 2025 04:40:27 +0000 (10:10 +0530)]
mgr/dashboard: Add certmgr alerts and warnings to Prometheus and dashboard
Fixes: https://tracker.ceph.com/issues/73674
Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
New changes commit for certmgr alerts

2 days agomgr/dashboard: NVMe – Fix host, listeners and namespace list display on Subsystem resource...
pujaoshahu [Mon, 2 Feb 2026 08:46:20 +0000 (14:16 +0530)]
mgr/dashboard: NVMe – Fix host, listeners and namespace list display on Subsystem resource page

Fixes: https://tracker.ceph.com/issues/74697
Signed-off-by: pujaoshahu <pshahu@redhat.com>
 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/block.module.ts

Signed-off-by: pujaoshahu <pshahu@redhat.com>
2 days agoMerge pull request #67312 from ifed01/wip-ifed-fix-vselector-in-envmode_index_file
Igor Fedotov [Mon, 23 Feb 2026 14:40:20 +0000 (17:40 +0300)]
Merge pull request #67312 from ifed01/wip-ifed-fix-vselector-in-envmode_index_file

os/bluestore: fix vselector update after enveloped WAL recovery

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
2 days agoMerge pull request #67445 from cbodley/wip-mailmap-bluikko
Casey Bodley [Mon, 23 Feb 2026 14:05:55 +0000 (09:05 -0500)]
Merge pull request #67445 from cbodley/wip-mailmap-bluikko

mailmap: update email address for Ville Ojamo

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Ville Ojamo <git2233+ceph@ojamo.eu>
2 days agoMerge pull request #67332 from anthonyeleven/docfix
Anthony D'Atri [Mon, 23 Feb 2026 14:03:40 +0000 (09:03 -0500)]
Merge pull request #67332 from anthonyeleven/docfix

doc/rados/operations: Improve formatting in crush-map.rst

2 days agoMerge pull request #67432 from kshtsk/wip-test-lua-ignore-tz
kyr [Mon, 23 Feb 2026 13:23:46 +0000 (14:23 +0100)]
Merge pull request #67432 from kshtsk/wip-test-lua-ignore-tz

test/rgw/lua: ignore hours for zero mtime

3 days agoMerge pull request #66108 from sseshasa/wip-rfe-implement-ok-to-upgrade-command
Sridhar Seshasayee [Mon, 23 Feb 2026 12:51:00 +0000 (18:21 +0530)]
Merge pull request #66108 from sseshasa/wip-rfe-implement-ok-to-upgrade-command

mgr/DaemonServer: Implement ok-to-upgrade command

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
3 days agoMerge pull request #66316 from kchheda3/wip-fix-parse-url-crash
Yuval Lifshitz [Mon, 23 Feb 2026 12:37:07 +0000 (14:37 +0200)]
Merge pull request #66316 from kchheda3/wip-fix-parse-url-crash

rgw/notification: Fix the crash in parse_url while initializing the regex

3 days agoMerge pull request #66065 from mertsunacoglu/wip-lua-abort
Yuval Lifshitz [Mon, 23 Feb 2026 12:35:46 +0000 (14:35 +0200)]
Merge pull request #66065 from mertsunacoglu/wip-lua-abort

 rgw: Add Lua functionality for blocking requests

3 days agocls/rgw_gc/cls_rgw_gc: read config via cls_get_config
Kautilya Tripathi [Tue, 10 Feb 2026 05:31:26 +0000 (11:01 +0530)]
cls/rgw_gc/cls_rgw_gc: read config via cls_get_config

Commit https://github.com/ceph/ceph/commit/3877c1e37f2fa4e1574b57f05132288f210835a7
added a new way for CLS code to gain access to the global configuration (`g_ceph_context`).

The `cls_rgw_gc_queue_init` method does not use the new `cls_get_config` CLS call
and instead uses `g_ceph_context` directly.

The Crimson OSD implementation does **not** support `g_ceph_context`, which results in a
SIGSEGV crash due to a null access. Switching to `cls_get_config`, as `cls_rgw.cc` already
does, allows both OSD implementations to access the configuration safely.

This approach is well-defined thanks to the two orthogonal implementations of objclass.cc:
the classical OSD uses `src/osd/objclass.cc`, while the Crimson OSD uses
`src/crimson/osd/objclass.cc`.

Fixes: https://tracker.ceph.com/issues/74844
Signed-off-by: Kautilya Tripathi <kautilya.tripathi@ibm.com>
3 days agonvmeof: Change the NVMEOF image version to 1.7
Gil Bregman [Mon, 23 Feb 2026 10:56:54 +0000 (12:56 +0200)]
nvmeof: Change the NVMEOF image version to 1.7
Fixes: https://tracker.ceph.com/issues/75097
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
3 days agoMerge PR #65467 into main
Venky Shankar [Mon, 23 Feb 2026 10:24:10 +0000 (15:54 +0530)]
Merge PR #65467 into main

* refs/pull/65467/head:

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
3 days agoMerge PR #66475 into main
Venky Shankar [Mon, 23 Feb 2026 10:22:58 +0000 (15:52 +0530)]
Merge PR #66475 into main

* refs/pull/66475/head:

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
3 days agoMerge pull request #67442 from imran-imtiaz/wip-dashboard-schedule-level
Imran Imtiaz [Mon, 23 Feb 2026 10:22:31 +0000 (10:22 +0000)]
Merge pull request #67442 from imran-imtiaz/wip-dashboard-schedule-level

mgr/dashboard: add schedule_level to image API for pool/cluster snapshot schedule

3 days agomgr/dashboard: Add alerts card
Afreen Misbah [Sun, 22 Feb 2026 10:24:41 +0000 (15:54 +0530)]
mgr/dashboard: Add alerts card

Fixes https://tracker.ceph.com/issues/75066

Signed-off-by: Afreen Misbah <afreen@ibm.com>
3 days agomgr/dashboard: Cephfs mirroring - Entity
Dnyaneshwari Talwekar [Fri, 9 Jan 2026 09:59:50 +0000 (15:29 +0530)]
mgr/dashboard: Cephfs mirroring - Entity

Fixes: https://tracker.ceph.com/issues/74366
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
3 days agoMerge pull request #66981 from rhcs-dashboard/namespace-list-delete
Afreen Misbah [Mon, 23 Feb 2026 07:16:23 +0000 (12:46 +0530)]
Merge pull request #66981 from rhcs-dashboard/namespace-list-delete

mgr/dashboard: Add nvmeof namespace list and delete modal

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
3 days agoMerge pull request #67360 from rhcs-dashboard/revamp-onboarding-screen
Afreen Misbah [Mon, 23 Feb 2026 07:14:53 +0000 (12:44 +0530)]
Merge pull request #67360 from rhcs-dashboard/revamp-onboarding-screen

mgr/dashboard: revamp on-boarding screen

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
3 days agomgr/DaemonServer: Re-order OSDs in crush bucket to maximize OSDs for upgrade
Sridhar Seshasayee [Thu, 12 Feb 2026 20:03:25 +0000 (01:33 +0530)]
mgr/DaemonServer: Re-order OSDs in crush bucket to maximize OSDs for upgrade

DaemonServer::_maximize_ok_to_upgrade_set() attempts to find which OSDs
from the initial set found as part of _populate_crush_bucket_osds() can be
upgraded as part of the initial phase. If the initial set results in failure,
the convergence logic trims the 'to_upgrade' vector from the end until a safe
set is found.

Therefore, it is advantageous to sort the OSDs by the ascending number
of PGs they host. By placing OSDs with the fewest (or no) PGs at the
beginning of the vector, the trim logic along with _check_offlines_pgs() will
have the best chance of finding OSDs to upgrade as it approaches a grouping
of OSDs that host the fewest PGs, or none at all.

To achieve the above, a temporary vector of struct pgs_per_osd is created and
sorted for a given crush bucket. The sorted OSDs are pushed to the main
crush_bucket_osds that is eventually used to run the _check_offlines_pgs()
logic to find a safe set of OSDs to upgrade.

pgmap is passed to _populate_crush_bucket_osds() to utilize get_num_pg_by_osd()
for the above logic to work.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
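The ordering described above can be sketched as follows; `PgsPerOsd` and `order_osds_for_upgrade` are illustrative names standing in for the commit's temporary vector of struct pgs_per_osd, not the actual Ceph code:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the pgs_per_osd helper described in the commit:
// pair an OSD id with the number of PGs it hosts.
struct PgsPerOsd {
  int osd_id;
  int num_pgs;
};

// Order candidates ascending by PG count (fewest or no PGs first), so that
// trim-from-the-end convergence keeps the least-loaded OSDs the longest.
inline std::vector<int> order_osds_for_upgrade(std::vector<PgsPerOsd> osds) {
  std::sort(osds.begin(), osds.end(),
            [](const PgsPerOsd& a, const PgsPerOsd& b) {
              return a.num_pgs < b.num_pgs;
            });
  std::vector<int> ordered;
  ordered.reserve(osds.size());
  for (const auto& o : osds) {
    ordered.push_back(o.osd_id);
  }
  return ordered;
}
```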
3 days agomgr/DaemonServer: Implement ok-to-upgrade command
Sridhar Seshasayee [Mon, 27 Oct 2025 16:34:54 +0000 (22:04 +0530)]
mgr/DaemonServer: Implement ok-to-upgrade command

Implement a new Mgr command called 'ok-to-upgrade' that returns a set of OSDs
within the provided CRUSH bucket that are safe to upgrade without reducing
immediate data availability.

The command accepts the following as input:
 - CRUSH bucket name (required)
   - The CRUSH bucket type is limited to 'rack', 'chassis', 'host' and 'osd'.
     This is to prevent users from specifying a bucket type higher up the tree
     which could result in performance issues if the number of OSDs in the
     bucket is very high.
 - The new Ceph version to check against. The format accepted is the short
   form of the Ceph version, e.g. 20.3.0-3803-g63ca1ffb5a2. (required)
 - The maximum number of OSDs to consider if specified. (optional)

Implementation Details:

After sanity checks on the provided parameters, the following steps are
performed:

1. The set of OSDs within the CRUSH bucket is first determined.
2. From the main set of OSDs, a filtered set of OSDs not yet running the new
   Ceph version is created.
   - For this purpose, the OSD's 'ceph_version_short' string is read from
     the metadata via a new method, DaemonServer::get_osd_metadata(). The
     information is determined from the DaemonStatePtr maintained within
     the DaemonServer.
3. If all OSDs are already running the new Ceph version, a success report is
   generated and returned.
4. If OSDs are not running the new Ceph version, a new set (to_upgrade) is
   created.
5. If the current version cannot be determined, an error is logged and the
   output report with 'bad_no_version' field populated with the OSD in question
   is generated.
6. On the new set (to_upgrade), the existing logic in _check_offline_pgs() is
   executed to see if stopping any or all OSDs in the set as part of the upgrade
   can reduce immediate data availability.
   - If data availability is impacted, then the number of OSDs in the filtered
     set is reduced by a factor defined by a new config option called
     'mgr_osd_upgrade_check_convergence_factor' which is set to 0.8 by default.
   - The logic in _check_offline_pgs() is repeated for the new set.
   - The above is repeated until a safe subset of OSDs that can be stopped for
     upgrade is found. Each iteration reduces the number of OSDs to check by
     the convergence factor mentioned above.
7. It must be noted that the default value of
   'mgr_osd_upgrade_check_convergence_factor' is on the higher side in order to
   help determine an optimal set of OSDs to upgrade. In other words, a higher
   convergence factor would help maximize the number of OSDs to upgrade. In this
   case, the number of iterations and therefore the time taken to determine the
   OSDs to upgrade is proportional to the number of OSDs in the CRUSH bucket.
   The converse is true if a lower convergence factor is used.
8. If the number of OSDs determined is lower than the 'max' specified, then an
   additional loop is executed to determine if other children of the CRUSH
   bucket can be added to the existing set.
9. Once a viable set is determined, an output report similar to the one shown
   below is generated.

A standalone test is introduced that exercises the logic for both replicated
and erasure-coded pools by manipulating a pool's min_size and checking for
upgradability. The test also performs other basic sanity checks and exercises
error conditions.

The output shown below is for a cluster running on a single node with 10 OSDs
and with a replicated pool configuration:

$ ceph osd ok-to-upgrade incerta06 01.00.00-gversion-test --format=json
{"ok_to_upgrade":true,"all_osds_upgraded":false,\
 "osds_in_crush_bucket":[0,1,2,3,4,5,6,7,8,9],\
 "osds_ok_to_upgrade":[0],"osds_upgraded":[],"bad_no_version":[]}

The following report is shown if all OSDs are running the desired Ceph version:

$ ceph osd ok-to-upgrade --crush_bucket  localrack \
  --ceph_version 20.3.0-3803-g63ca1ffb5a2
{"ok_to_upgrade":false,"all_osds_upgraded":true,\
 "osds_in_crush_bucket":[0,1,2,3,4,5,6,7,8,9],"osds_ok_to_upgrade":[],\
"osds_upgraded":[0,1,2,3,4,5,6,7,8,9],"bad_no_version":[]}

Fixes: https://tracker.ceph.com/issues/73031
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
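A minimal sketch of the convergence loop from steps 6-7 above, under stated assumptions: the names are invented for illustration, `safe_to_stop` stands in for the real _check_offline_pgs() logic, and the factor defaults to the 0.8 described for mgr_osd_upgrade_check_convergence_factor:

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Start from the full candidate list; while stopping the current prefix
// would impact data availability, shrink the prefix by the convergence
// factor and re-check, until a safe subset (possibly empty) is found.
inline std::vector<int> find_safe_upgrade_set(
    const std::vector<int>& to_upgrade,
    const std::function<bool(const std::vector<int>&)>& safe_to_stop,
    double convergence_factor = 0.8) {
  std::size_t n = to_upgrade.size();
  while (n > 0) {
    std::vector<int> candidate(to_upgrade.begin(), to_upgrade.begin() + n);
    if (safe_to_stop(candidate)) {
      return candidate;  // found a safe subset to stop for upgrade
    }
    // Shrink; guarantee progress even when the factor rounds to the same size.
    std::size_t next =
        static_cast<std::size_t>(std::floor(n * convergence_factor));
    n = (next < n) ? next : n - 1;
  }
  return {};  // no OSD in the bucket can be stopped safely
}
```

A higher factor trims fewer OSDs per iteration, so it finds a larger safe set at the cost of more _check_offline_pgs()-style checks, matching the trade-off described in step 7.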
3 days agomgr/DaemonServer: Modify offline_pg_report to handle set or vector types
Sridhar Seshasayee [Mon, 2 Feb 2026 08:44:15 +0000 (14:14 +0530)]
mgr/DaemonServer: Modify offline_pg_report to handle set or vector types

The offline_pg_report structure to be used by both the 'ok-to-stop' and
'ok-to-upgrade' commands is modified to handle either std::set or std::vector
type containers. This is necessitated due to the differences in the way
both commands work. For the 'ok-to-upgrade' command logic to work optimally,
the items in the specified crush bucket including items found in the subtree
must be strictly ordered. The earlier std::set container re-orders the items
upon insertion by sorting the items which results in the offline pg check to
report sub-optimal results.

Therefore, the offline_pg_report struct is modified to use
std::variant<std::vector<int>, std::set<int>> as a ContainerType and handled
accordingly in dump() using std::visit(). This ensures backward compatibility
with the existing 'ok-to-stop' command while catering to the requirements of
the new 'ok-to-upgrade' command.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
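The variant-container pattern described above can be illustrated as follows; this is not the actual offline_pg_report code, just a minimal sketch of why std::variant with std::visit lets one dump() serve both an order-preserving std::vector and a sorted std::set:

```cpp
#include <set>
#include <string>
#include <variant>
#include <vector>

// Either container type, as in the commit's ContainerType alias: a vector
// preserves the strict insertion order needed by 'ok-to-upgrade', while a
// set keeps the sorted semantics of the existing 'ok-to-stop'.
using ContainerType = std::variant<std::vector<int>, std::set<int>>;

// Render the held container uniformly via std::visit, regardless of type.
inline std::string dump(const ContainerType& osds) {
  std::string out = "[";
  std::visit([&out](const auto& c) {
    bool first = true;
    for (int id : c) {
      if (!first) out += ",";
      out += std::to_string(id);
      first = false;
    }
  }, osds);
  return out + "]";
}
```

With the same ids inserted as 3, 1, 2, the vector alternative dumps them in insertion order while the set alternative dumps them sorted, which is exactly the behavioral difference the commit relies on.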
3 days agoMerge pull request #67386 from afreen23/health-checks
Afreen Misbah [Mon, 23 Feb 2026 07:09:40 +0000 (12:39 +0530)]
Merge pull request #67386 from afreen23/health-checks

mgr/dashboard: Add health check panel

Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>
3 days agoMerge pull request #67330 from gadididi/nvmeof/add_rados_ns
Gadi [Mon, 23 Feb 2026 06:54:28 +0000 (08:54 +0200)]
Merge pull request #67330 from gadididi/nvmeof/add_rados_ns

mgr/dashboard: Adding RADOS namespace option into add_ns_req

3 days agoqa/workunits/smb: add tests for rate limiting
Avan Thakkar [Mon, 2 Feb 2026 07:52:47 +0000 (13:22 +0530)]
qa/workunits/smb: add tests for rate limiting

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
3 days agodoc/mgr/smb: add doc for QoS support for CephFS-backed SMB shares
Avan Thakkar [Mon, 4 Aug 2025 14:44:53 +0000 (20:14 +0530)]
doc/mgr/smb: add doc for QoS support for CephFS-backed SMB shares

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
3 days agomgr/smb: add test coverage for rate-limiting
Avan Thakkar [Mon, 4 Aug 2025 17:41:36 +0000 (23:11 +0530)]
mgr/smb: add test coverage for rate-limiting

Add comprehensive QoS test coverage including:
  * Basic QoS configuration application
  * QoS updates
  * QoS removal
  * QoS delay_max

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
3 days agomgr/smb: add rate limiting support
Avan Thakkar [Thu, 31 Jul 2025 14:47:03 +0000 (20:17 +0530)]
mgr/smb: add rate limiting support

Introduce a new optional `qos` component under the `cephfs` block
of the Share resource to configure rate limiting options per SMB share.

The new structure supports:
- read_iops_limit
- write_iops_limit
- read_bw_limit
- write_bw_limit
- read_delay_max
- write_delay_max

A new CLI command is added:
  `ceph smb share update cephfs qos <cluster> <share> [options]`

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
3 days agomgr/dashboard: Fix nvmeof namespace list and delete modal
pujaoshahu [Tue, 20 Jan 2026 06:14:44 +0000 (11:44 +0530)]
mgr/dashboard: Fix nvmeof namespace list and delete modal

Fixes: https://tracker.ceph.com/issues/74451
Signed-off-by: pujaoshahu <pshahu@redhat.com>
 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/api/nvmeof.service.ts

Signed-off-by: pujaoshahu <pshahu@redhat.com>
3 days agoMerge pull request #66575 from Tom-Sollers/ceph-pg-repeer-test
SrinivasaBharathKanta [Mon, 23 Feb 2026 02:04:43 +0000 (07:34 +0530)]
Merge pull request #66575 from Tom-Sollers/ceph-pg-repeer-test

qa/standalone: Add a test for running repeer on simple ec and rep pools

3 days agoMerge pull request #53457 from NitzanMordhai/wip-nitzan-crush-rule-delete
SrinivasaBharathKanta [Mon, 23 Feb 2026 01:54:03 +0000 (07:24 +0530)]
Merge pull request #53457 from NitzanMordhai/wip-nitzan-crush-rule-delete

mon/OSDMonitor: remove unused crush rules after erasure code pools deleted

3 days agotools/cephfs_mirror: Fix lock order issue wip-khiremat-mulithread-mirror-66572-reviewed-5-fix-purge-snap
Kotresh HR [Sun, 15 Feb 2026 18:41:51 +0000 (00:11 +0530)]
tools/cephfs_mirror: Fix lock order issue

Lock order 1:
InstanceWatcher::m_lock ----> FSMirror::m_lock
Lock order 2:
FSMirror::m_lock -----> InstanceWatcher::m_lock

Lock order 1 is the one where the abort happens, during
blocklisting: InstanceWatcher::handle_rewatch_complete()
acquires InstanceWatcher::m_lock and calls
m_elistener.set_blocklisted_ts(), which tries to acquire
FSMirror::m_lock.

Lock order 2 exists in the mirror peer status command:
FSMirror::mirror_status(Formatter *f) takes FSMirror::m_lock
and calls is_blocklisted(), which takes InstanceWatcher::m_lock.

Fix:
FSMirror::m_blocklisted_ts and FSMirror::m_failed_ts are converted
to std::atomic, and the scope of m_lock is narrowed in
InstanceWatcher::handle_rewatch_complete() and
MirrorWatcher::handle_rewatch_complete()

Look at the tracker for traceback and further details.

Fixes: https://tracker.ceph.com/issues/74953
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agodoc: Document cephfs mirroring multi-thread
Kotresh HR [Wed, 4 Feb 2026 10:07:30 +0000 (15:37 +0530)]
doc: Document cephfs mirroring multi-thread

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agodoc/PendingReleaseNotes: cephfs mirroring multi-thread
Kotresh HR [Sat, 21 Feb 2026 20:14:37 +0000 (01:44 +0530)]
doc/PendingReleaseNotes: cephfs mirroring multi-thread

Also mentions that blockdiff is no longer used
for small files, and documents the new configuration
options introduced.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agodoc: Document configs introduced with multi-threaded cephfs-mirror
Kotresh HR [Wed, 4 Feb 2026 09:17:12 +0000 (14:47 +0530)]
doc: Document configs introduced with multi-threaded cephfs-mirror

Following configs are introduced:
  - cephfs_mirror_max_datasync_threads
  - cephfs_mirror_blockdiff_min_file_size

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Do remote fs sync once instead of fsync on each fd
Kotresh HR [Sat, 21 Feb 2026 15:55:30 +0000 (21:25 +0530)]
tools/cephfs_mirror: Do remote fs sync once instead of fsync on each fd

Do the remote fs sync once, just before taking the snapshot,
as it's faster than doing fsync on each fd after the
file copy.

Moreover, all the datasync threads use the same single libcephfs
connection, and doing ceph_fsync concurrently on different fds over
a single libcephfs connection could cause a hang, as observed in
testing (traceback below). This issue is tracked at
https://tracker.ceph.com/issues/75070

-----
Thread 2 (Thread 0xffff644cc400 (LWP 74020) "d_replayer-0"):
0  0x0000ffff8e82656c in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
1  0x0000ffff8e828ff0 [PAC] in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libc.so.6
2  0x0000ffff8fc90fd4 [PAC] in ceph::condition_variable_debug::wait ...
3  0x0000ffff9080fc9c in ceph::condition_variable_debug::wait<Client::wait_on_context_list ...
4  Client::wait_on_context_list ... at /lsandbox/upstream/ceph/src/client/Client.cc:4540
5  0x0000ffff9083fae8 in Client::_fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13299
6  0x0000ffff90840278 in Client::_fsync ...
7  0x0000ffff90840514 in Client::fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13042
8  0x0000ffff907f06e0 in ceph_fsync ... at /lsandbox/upstream/ceph/src/libcephfs.cc:316
9  0x0000aaaaad5b2f88 in cephfs::mirror::PeerReplayer::copy_to_remote ...
----

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Don't use blockdiff on smaller files
Kotresh HR [Sun, 15 Feb 2026 09:37:09 +0000 (15:07 +0530)]
tools/cephfs_mirror: Don't use blockdiff on smaller files

Introduce a new configuration option,
'cephfs_mirror_blockdiff_min_file_size', to control the minimum file
size above which block-level diff is used during CephFS mirroring.

Files smaller than the configured threshold are synchronized using
full file copy, while larger files attempt block-level delta sync.
This provides better flexibility across environments with varying
file size distributions and performance constraints.

The default value is set to 16_M (16 MiB). The value is read once
at the beginning of every snapshot sync.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Handle shutdown/blocklist/cancel at syncm dataq wait
Kotresh HR [Sat, 21 Feb 2026 15:40:08 +0000 (21:10 +0530)]
tools/cephfs_mirror: Handle shutdown/blocklist/cancel at syncm dataq wait

1. Add an is_stopping() predicate at the sdq_cv wait
2. Use the existing should_backoff() routine to validate
   shutdown/blocklist/cancel errors and set the corresponding errors.
3. Handle the notify logic at the end
4. In shutdown(), notify all syncm's sdq_cv waits

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Handle shutdown/blocklist at syncm_q wait
Kotresh HR [Sun, 22 Feb 2026 18:10:32 +0000 (23:40 +0530)]
tools/cephfs_mirror: Handle shutdown/blocklist at syncm_q wait

1. Convert smq_cv.wait to a timed wait, since blocklisting has no
   predicate to evaluate. Evaluate is_shutdown() as the predicate.
   When either of the two is true, set the corresponding error and
   backoff flag in all the syncm objects. The last data sync
   thread wakes up all the crawler threads. This is necessary to
   wake up the crawler threads whose data queue was not picked up
   by any datasync thread.
2. In shutdown(), change the join order: join the datasync threads
   first. The idea is to stop the datasync threads before the
   crawler threads, as the datasync threads are an extension of the
   crawler threads, and the reverse order might cause issues. Also
   wake up the smq_cv wait on shutdown.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Monitor num of active datasync threads
Kotresh HR [Sat, 21 Feb 2026 14:06:39 +0000 (19:36 +0530)]
tools/cephfs_mirror: Monitor num of active datasync threads

Introduce an atomic counter in PeerReplayer to track the number of
active SnapshotDataSyncThread instances.

The counter is incremented when a datasync thread enters its entry()
function and decremented automatically on exit via a small RAII guard
(DataSyncThreadGuard). This ensures accurate accounting even in the
presence of early returns or future refactoring.

This change helps in handling of shutdown and blocklist scenarios.
At the time of shutdown or blocklisting, datasync threads may still
be processing multiple jobs across different SyncMechanism instances.
It is therefore essential that only the final exiting datasync thread
performs the notifications for all relevant waiters, including the
syncm data queue, syncm queue, and m_cond.

This approach ensures orderly teardown by keeping crawler threads
active until all datasync threads have completed execution.
Terminating crawler threads prematurely—before datasync threads have
exited—can lead to inconsistencies, as crawler threads deregister the
mirroring directory while datasync threads may still be accessing it.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Store a reference of PeerReplayer object in SyncMechanism
Kotresh HR [Sat, 21 Feb 2026 14:03:39 +0000 (19:33 +0530)]
tools/cephfs_mirror: Store a reference of PeerReplayer object in SyncMechanism

Store a reference to the PeerReplayer object in SyncMechanism.
This allows the SyncMechanism object to call functions of PeerReplayer.
This is required in multiple places, such as handling
shutdown/blocklist/cancel, where should_backoff() needs to be
called by the syncm object while data sync threads pop the dataq.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Make PeerReplayer::m_stopping atomic
Kotresh HR [Sun, 15 Feb 2026 03:09:54 +0000 (08:39 +0530)]
tools/cephfs_mirror: Make PeerReplayer::m_stopping atomic

Make PeerReplayer::m_stopping a std::atomic and make it
independent of m_lock. This allows 'm_stopping' to be used
as the predicate in any conditional wait that doesn't use
m_lock.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Fix assert while opening handles
Kotresh HR [Sat, 21 Feb 2026 13:51:02 +0000 (19:21 +0530)]
tools/cephfs_mirror: Fix assert while opening handles

Issue:
When the crawler or a datasync thread encountered an error,
it's possible that the crawler gets notified by a datasync
thread and bails out, resulting in the unregistering of the
particular dir_root. The other datasync threads might
still hold the same syncm object and try to open the
handles, at which point the following assert is hit.

ceph_assert(it != m_registered.end());

Cause:
This happens because the in_flight counter in the syncm object
only tracked whether it was processing an actual job from the
data queue.

Fix:
Make the in_flight counter in the syncm object track the active
syncm object, i.e. increment as soon as a datasync thread
gets a reference to it and decrement when the reference is
dropped.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Fix dequeue of syncm on error
Kotresh HR [Sat, 21 Feb 2026 10:36:31 +0000 (16:06 +0530)]
tools/cephfs_mirror: Fix dequeue of syncm on error

When an error is encountered in a crawler thread or a datasync
thread while processing a syncm object, it's possible that
multiple datasync threads attempt to dequeue the syncm object.
Though this is safe, add a condition to avoid it.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Handle errors in crawler thread
Kotresh HR [Sat, 21 Feb 2026 10:27:42 +0000 (15:57 +0530)]
tools/cephfs_mirror: Handle errors in crawler thread

Any error encountered in a crawler thread should be
communicated to the data sync threads by marking the
crawl error in the corresponding syncm object. The
data sync threads then finish pending jobs, dequeue
the syncm object, and notify the crawler to bail out.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>
3 days agotools/cephfs_mirror: Handle error in datasync thread
Kotresh HR [Sat, 21 Feb 2026 10:18:56 +0000 (15:48 +0530)]
tools/cephfs_mirror: Handle error in datasync thread

On any error encountered in a datasync thread while syncing
a particular syncm dataq, mark the datasync error and
communicate it to the corresponding syncm's crawler,
which is waiting to take a snapshot. The crawler will log
the error and bail out.

Fixes: https://tracker.ceph.com/issues/73452
Signed-off-by: Kotresh HR <khiremat@redhat.com>