From 7d63ab5f97192b2bde876f6d0acd818d4789e612 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Sun, 29 May 2022 18:20:34 +0200 Subject: [PATCH] librbd: unlink newest mirror snapshot when at capacity, bump capacity CreatePrimaryRequest::unlink_peer() invoked via "rbd mirror image snapshot" command or via rbd_support mgr module when creating a new scheduled mirror snapshot at rbd_mirroring_max_mirroring_snapshots capacity on the primary cluster can race with Replayer::unlink_peer() invoked by rbd-mirror when finishing syncing an older snapshot on the secondary cluster. Consider the following: [ primary: primary-snap1, primary-snap2, primary-snap3 secondary: non-primary-snap1 (complete), non-primary-snap2 (syncing) ] 0. rbd-mirror is syncing snap1..snap2 delta 1. rbd_support creates primary-snap4 2. due to rbd_mirroring_max_mirroring_snapshots == 3, rbd_support picks primary-snap3 for unlinking 3. rbd-mirror finishes syncing snap1..snap2 delta and marks non-primary-snap2 complete [ snap1 (the old base) is no longer needed on either cluster ] 4. rbd-mirror unlinks and removes primary-snap1 5. rbd-mirror removes non-primary-snap1 6. rbd-mirror picks snap2 as the new base 7. rbd-mirror creates non-primary-snap3 and starts syncing snap2..snap3 delta [ primary: primary-snap2, primary-snap3, primary-snap4 secondary: non-primary-snap2 (complete), non-primary-snap3 (syncing) ] 8. rbd_support unlinks and removes primary-snap3 which is in-use by rbd-mirror If snap trimming on the primary cluster kicks in soon enough, the secondary image becomes corrupted: rbd-mirror would eventually finish "syncing" non-primary-snap3 and mark it complete in spite of bogus data in the HEAD -- the primary cluster OSDs would start returning ENOENT for snap trimmed objects. Luckily, rbd-mirror's attempt to pick snap3 as the new base would wedge the replayer with "split-brain detected: failed to find matching non-primary snapshot in remote image" error. Before commit a888bff8d00e ("librbd/mirror: tweak which snapshot is unlinked when at capacity") this could happen pretty much all the time as it was the second oldest snapshot that was unlinked. This commit changed it to be the third oldest snapshot, turning this into a more narrow but still very much possible to hit race. Unfortunately this race condition appears to be inherent to the way snapshot-based mirroring is currently implemented: a. when mirror snapshots are created on the producer side of the snapshot queue, they are already linked b. mirror snapshots can be concurrently unlinked/removed on both sides of the snapshot queue by non-cooperating clients (local rbd_mirror_image_create_snapshot() vs remote rbd-mirror) c. with mirror peer links off the list due to (a), there is no existing way for rbd-mirror to persistently mark a snapshot as in-use As a workaround, bump rbd_mirroring_max_mirroring_snapshots to 5 and always unlink the newest snapshot (i.e. slot 4) instead of the third oldest snapshot (i.e. slot 2). Hopefully this gives enough leeway, as rbd-mirror would need to sync two snapshots (i.e. transition from syncing 0-1 to 1-2 and then to 2-3) before potentially colliding with rbd_mirror_image_create_snapshot() on slot 4. Fixes: https://tracker.ceph.com/issues/55803 Signed-off-by: Ilya Dryomov (cherry picked from commit ef83c0f347c38f3eee70d554d5433a1a17b209b6) Conflicts: src/common/options/rbd.yaml.in [ options are defined in src/common/options.cc in pacific ] --- doc/rbd/rbd-mirroring.rst | 2 +- src/common/options.cc | 2 +- .../mirror/snapshot/CreatePrimaryRequest.cc | 2 +- src/test/librbd/test_mirroring.cc | 18 ++++++++++-------- 4 files changed, 13 insertions(+), 11 deletions(-) diff --git a/doc/rbd/rbd-mirroring.rst b/doc/rbd/rbd-mirroring.rst index 300cb98d4c3db..d5d1191d63e4f 100644 --- a/doc/rbd/rbd-mirroring.rst +++ b/doc/rbd/rbd-mirroring.rst @@ -318,7 +318,7 @@ For example:: $ rbd --cluster site-a mirror image snapshot image-pool/image-1 -By default only ``3`` mirror-snapshots will be created per-image. The most +By default up to ``5`` mirror-snapshots will be created per-image. The most recent mirror-snapshot is automatically pruned if the limit is reached. The limit can be overridden via the ``rbd_mirroring_max_mirroring_snapshots`` configuration option if required. Additionally, mirror-snapshots are diff --git a/src/common/options.cc b/src/common/options.cc index 7fda073872332..5704bed71dab2 100644 --- a/src/common/options.cc +++ b/src/common/options.cc @@ -7829,7 +7829,7 @@ static std::vector