Kotresh HR [Sun, 15 Feb 2026 18:41:51 +0000 (00:11 +0530)]
tools/cephfs_mirror: Fix lock order issue
Lock order 1:
InstanceWatcher::m_lock ----> FSMirror::m_lock
Lock order 2:
FSMirror::m_lock -----> InstanceWatcher::m_lock
Lock order 1 is the one on which it aborted, and it happens
during blocklisting. InstanceWatcher::handle_rewatch_complete()
acquires InstanceWatcher::m_lock and calls
m_elistener.set_blocklisted_ts(), which tries to acquire
FSMirror::m_lock.
Lock order 2 exists in the mirror peer status command.
FSMirror::mirror_status(Formatter *f) takes FSMirror::m_lock
and calls is_blocklisted(), which takes InstanceWatcher::m_lock.
Fix:
FSMirror::m_blocklisted_ts and FSMirror::m_failed_ts are converted
to std::atomic, and the scope of m_lock is fixed in
InstanceWatcher::handle_rewatch_complete() and
MirrorWatcher::handle_rewatch_complete().
Look at the tracker for traceback and further details.
Kotresh HR [Sun, 15 Feb 2026 19:38:00 +0000 (01:08 +0530)]
tools/cephfs_mirror: Do remote fs sync once instead of fsync on each fd
Doing a remote fs sync once, just before taking the snapshot, is
faster than doing fsync on each fd. Moreover, all the datasync
threads use the same single libcephfs connection, and doing
ceph_fsync concurrently on different fds over a single libcephfs
connection could cause a hang, as observed in testing below.
-----
Thread 2 (Thread 0xffff644cc400 (LWP 74020) "d_replayer-0"):
0 0x0000ffff8e82656c in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
1 0x0000ffff8e828ff0 [PAC] in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libc.so.6
2 0x0000ffff8fc90fd4 [PAC] in ceph::condition_variable_debug::wait ...
3 0x0000ffff9080fc9c in ceph::condition_variable_debug::wait<Client::wait_on_context_list ...
4 Client::wait_on_context_list ... at /lsandbox/upstream/ceph/src/client/Client.cc:4540
5 0x0000ffff9083fae8 in Client::_fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13299
6 0x0000ffff90840278 in Client::_fsync ...
7 0x0000ffff90840514 in Client::fsync ... at /lsandbox/upstream/ceph/src/client/Client.cc:13042
8 0x0000ffff907f06e0 in ceph_fsync ... at /lsandbox/upstream/ceph/src/libcephfs.cc:316
9 0x0000aaaaad5b2f88 in cephfs::mirror::PeerReplayer::copy_to_remote ...
----
Kotresh HR [Sun, 15 Feb 2026 09:19:59 +0000 (14:49 +0530)]
tools/cephfs_mirror: Handle shutdown/blocklist/cancel at syncm dataq wait
1. Add is_stopping() predicate at sdq_cv wait
2. Use the existing should_backoff() routine to check for
shutdown/blocklist/cancel and set the corresponding errors.
3. Handle notify logic at the end
4. In shutdown(), notify all syncm's sdq_cv wait
Kotresh HR [Sun, 15 Feb 2026 08:30:07 +0000 (14:00 +0530)]
tools/cephfs_mirror: Handle shutdown/blocklist at syncm_q wait
1. Convert the smq_cv.wait to a timed wait, since blocklisting has
   no predicate to evaluate; is_shutdown() is still evaluated as a
   predicate. When either of the two is true, set the corresponding
   error and backoff flag in all the syncm objects. The last data
   sync thread wakes up all the crawler threads. This is necessary
   to wake up the crawler threads whose data queue was not picked
   up by any datasync thread.
2. In shutdown(), change the join order: join the datasync threads
   first. The idea is to kill the datasync threads before the
   crawler threads, since the datasync threads are an extension of
   the crawler threads and the reverse order might cause issues.
   Also wake up the smq_cv wait on shutdown.
Kotresh HR [Sun, 15 Feb 2026 06:39:45 +0000 (12:09 +0530)]
tools/cephfs_mirror: Monitor num of active datasync threads
Introduce an atomic counter in PeerReplayer to track the number of
active SnapshotDataSyncThread instances.
The counter is incremented when a datasync thread enters its entry()
function and decremented automatically on exit via a small RAII guard
(DataSyncThreadGuard). This ensures accurate accounting even in the
presence of early returns or future refactoring.
This change helps in handling of shutdown and blocklist scenarios.
At the time of shutdown or blocklisting, datasync threads may still
be processing multiple jobs across different SyncMechanism instances.
It is therefore essential that only the final exiting datasync thread
performs the notifications for all relevant waiters, including the
syncm data queue, syncm queue, and m_cond.
This approach ensures orderly teardown by keeping crawler threads
active until all datasync threads have completed execution.
Terminating crawler threads prematurely—before datasync threads have
exited—can lead to inconsistencies, as crawler threads deregister the
mirroring directory while datasync threads may still be accessing it.
Kotresh HR [Sun, 15 Feb 2026 04:00:25 +0000 (09:30 +0530)]
tools/cephfs_mirror: Store a reference of PeerReplayer object in SyncMechanism
Store a reference to the PeerReplayer object in SyncMechanism.
This allows the SyncMechanism object to call functions of
PeerReplayer. This is required in multiple places, e.g. handling
shutdown/blocklist/cancel, where should_backoff() needs to be
called by the syncm object while data sync threads pop the dataq.
Kotresh HR [Sun, 15 Feb 2026 03:09:54 +0000 (08:39 +0530)]
tools/cephfs_mirror: Make PeerReplayer::m_stopping atomic
Make PeerReplayer::m_stopping a std::atomic and make it
independent of m_lock. This allows 'm_stopping' to be used
as a predicate in any conditional wait that doesn't use
m_lock.
Kotresh HR [Wed, 14 Jan 2026 20:06:31 +0000 (01:36 +0530)]
tools/cephfs_mirror: Fix assert while opening handles
When the crawler or a datasync thread encounters an error,
it's possible that the crawler gets notified by a datasync
thread and bails out, resulting in the unregistration of the
particular dir_root. The other datasync threads might still
hold the same syncm object and try to open the handles,
during which the following assert is hit.
ceph_assert(it != m_registered.end());
The above assert is removed and the error is handled instead.
Kotresh HR [Wed, 14 Jan 2026 19:59:36 +0000 (01:29 +0530)]
tools/cephfs_mirror: Fix dequeue of syncm on error
When an error is encountered in a crawler thread or a datasync
thread while processing a syncm object, it's possible that
multiple datasync threads attempt to dequeue the syncm object.
Though this is safe, add a condition to avoid it.
Kotresh HR [Wed, 14 Jan 2026 19:53:34 +0000 (01:23 +0530)]
tools/cephfs_mirror: Handle errors in crawler thread
Any error encountered in a crawler thread should be
communicated to the data sync threads by marking the
crawl error in the corresponding syncm object. The
data sync threads then finish the pending jobs, dequeue
the syncm object, and notify the crawler to bail out.
Kotresh HR [Wed, 14 Jan 2026 19:35:29 +0000 (01:05 +0530)]
tools/cephfs_mirror: Handle error in datasync thread
On any error encountered in a datasync thread while syncing
a particular syncm dataq, mark the datasync error and
communicate it to the corresponding syncm's crawler, which
is waiting to take a snapshot. The crawler will log the
error and bail out.
There is a global queue of SyncMechanism objects (syncm). Each syncm
object represents a single snapshot being synced, and each syncm
object owns m_sync_dataq, the list of files in the snapshot to be
synced.
The data sync threads should consume the next syncm job if the
present syncm has no pending work. This can evidently happen when
the last file being synced from the present syncm job's syncm_dataq
is a large file. In that case, one data sync thread is busy syncing
the large file while the rest of the data sync threads just wait for
it to finish to avoid a busy loop. Instead, the idle data sync
threads could start consuming the next syncm job.
This brings in a change to the data structure:
- syncm_q has to be a std::deque instead of a std::queue, since a
  syncm in the middle can finish syncing first and needs to be
  removed before the front.
Kotresh HR [Wed, 14 Jan 2026 12:30:43 +0000 (18:00 +0530)]
tools/cephfs_mirror: Synchronize taking snapshot
The crawler/entry creation thread needs to wait until
all the data is synced by the datasync threads before
taking the snapshot. This patch adds the necessary
conditions for the same.
It is important that the conditional flag be part
of SyncMechanism and not of the PeerReplayer class.
The following bug would be hit if it were part of the
PeerReplayer class.
When multiple directories are configured for mirroring, as below:
/d0              /d1              /d2
Crawler1         Crawler2         Crawler3
DoneEntryOps     DoneEntryOps     DoneEntryOps
WaitForSafeSnap  WaitForSafeSnap  WaitForSafeSnap
When all crawler threads are waiting as above, the data sync threads
that are done processing /d1 would notify, waking up all the crawlers
and causing spurious/unwanted wakeups and half-baked snapshots.
Kotresh HR [Wed, 14 Jan 2026 12:05:33 +0000 (17:35 +0530)]
tools/cephfs_mirror: Fix data sync threads completion logic
We need to know exactly when all data sync threads complete
the processing of a syncm. If a few threads finish the
job, they need to wait for the in-progress threads
of that syncm to complete; otherwise the finished threads
would busy-loop until the in-progress threads finish.
Only after all threads finish processing can the crawler
thread be notified to take the snapshot.
Kotresh HR [Tue, 9 Dec 2025 10:05:08 +0000 (15:35 +0530)]
tools/cephfs_mirror: Mark crawl finished
After entry operations are synced and the stack is empty,
mark the crawl as finished so the data sync threads'
wait logic works correctly and doesn't wait indefinitely.
Kotresh HR [Wed, 14 Jan 2026 08:47:07 +0000 (14:17 +0530)]
tools/cephfs_mirror: Add SyncMechanism Queue
Add a queue of shared_ptr of type SyncMechanism.
Since it holds shared_ptr, the queue can hold both
RemoteSync and SnapDiffSync objects.
Each SyncMechanism holds the queue of SyncEntry
items to be synced by the data sync threads.
The queue needs to hold shared_ptr because all
the data sync threads need to access the
SyncMechanism object to process the SyncEntry queue.
This patch sets up the building blocks for the same.
Kotresh HR [Wed, 14 Jan 2026 08:27:34 +0000 (13:57 +0530)]
tools/cephfs_mirror: Use the existing m_lock and m_cond
The entire snapshot is synced outside the lock.
The m_lock and m_cond pair is used by the data sync
threads along with the crawler threads to work well
with all terminal conditions, like shutdown, and the
existing data structures.
Kotresh HR [Sat, 7 Feb 2026 14:26:36 +0000 (19:56 +0530)]
qa: Add retry logic to remove most sleeps in mirroring tests
The mirroring tests contain a lot of sleeps, adding up to ~1hr.
This patch adds retry logic and removes most of them. This
is cleaner and saves considerable test time for mirroring.
Patrick Donnelly [Wed, 11 Feb 2026 19:05:07 +0000 (14:05 -0500)]
Merge PR #67011 into main
* refs/pull/67011/head:
qa/multisite: use boto3's ClientError in place of assert_raises from tools.py.
qa/multisite: test fixes
qa/multisite: boto3 in tests.py
qa/multisite: zone files use boto3 resource api
qa/multisite: switch to boto3 in multisite test libraries
Xavi Hernandez [Wed, 5 Nov 2025 09:05:45 +0000 (10:05 +0100)]
libcephfs_proxy: add the number of supported operations to negotiation
The v0 negotiation structure has been modified to hold the total number
of operations and callbacks supported by the peer. The changes are done
in a way that is completely transparent and harmless to a peer expecting
the previous definition.
This will be useful to quickly check whether the daemon supports some
operation, or whether the client supports some callback, before sending
them.