git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
2 months agorbd-mirror: implement group replayer Health State
Prasanna Kumar Kalever [Fri, 28 Mar 2025 16:24:44 +0000 (21:54 +0530)]
rbd-mirror: implement group replayer Health State

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: catch and bubble-up all the image level errors to group status
Prasanna Kumar Kalever [Fri, 28 Mar 2025 14:25:54 +0000 (19:55 +0530)]
rbd-mirror: catch and bubble-up all the image level errors to group status

Wait for the group to attain the right status before checking for
image-level errors.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd: get_group_snap_get_mirror_namespace() API + groups in "rbd mirror pool status"
VinayBhaskar-V [Thu, 27 Feb 2025 22:08:55 +0000 (03:38 +0530)]
librbd: get_group_snap_get_mirror_namespace() API + groups in "rbd mirror pool status"

Co-authored-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2 months agorbd_mirror: avoid passing empty remote_mirror_uuid to group_replayer
Prasanna Kumar Kalever [Thu, 27 Mar 2025 07:25:33 +0000 (12:55 +0530)]
rbd_mirror: avoid passing empty remote_mirror_uuid to group_replayer

group_replayer can fetch remote_mirror_uuid as remote_pool_meta.mirror_uuid

>>> gc = rbd.Group(ioctx, 'test_group')
>>> print(gc.group_snap_get_mirror_namespace('104b430672cf'))
{'state': 2, 'mirror_peer_uuids': [],
'primary_mirror_uuid': '6cd393ad-c21d-42e6-a404-0dabf596bfe7',
'primary_snap_id': '104b430672cf'}

Thanks to Ilya for highlighting the issue.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/api: don't mask images in group with read-only as part of image_demote()
Prasanna Kumar Kalever [Wed, 26 Mar 2025 13:40:33 +0000 (19:10 +0530)]
librbd/api: don't mask images in group with read-only as part of image_demote()

If the images are part of a group, wait until group_demote() is finally done
with GroupUnlinkPeerRequest() and only then mask the images in the group with
IMAGE_READ_ONLY_FLAG_NON_PRIMARY.

Thanks to Nithya for working along for a better fix here.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agocleanup: uncommented assert m_state_builder
Prasanna Kumar Kalever [Wed, 26 Mar 2025 07:35:39 +0000 (13:05 +0530)]
cleanup: uncommented assert m_state_builder

Thanks to Nithya for highlighting it.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/api: add defence in group enable to bail about different pool images
Prasanna Kumar Kalever [Tue, 25 Mar 2025 19:39:52 +0000 (01:09 +0530)]
librbd/api: add defence in group enable to bail about different pool images

If a group contains images from different pools do not allow enabling
mirroring on it.
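
The guard described above can be sketched as follows. This is a simplified Python model, not the librbd code: the `GroupImage` type and the `check_group_enable` name are hypothetical, and returning -EINVAL is an assumption about the error code.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupImage:
    """Hypothetical stand-in for a group member image."""
    pool_id: int
    name: str


def check_group_enable(images):
    """Return 0 if all images share one pool, else -EINVAL (assumed errno)."""
    pool_ids = {image.pool_id for image in images}
    if len(pool_ids) > 1:
        # Images span several pools: refuse to enable mirroring on the group.
        return -errno.EINVAL
    return 0
```

With this model, a group whose images all sit in pool 1 passes the check, while a group mixing pools 1 and 2 is rejected before any mirroring state is touched.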

Thanks to Ilya for all the suggestions and review.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd: fix text in mirror group help messages
John Agombar [Mon, 30 Sep 2024 12:28:24 +0000 (13:28 +0100)]
rbd: fix text in mirror group help messages

Signed-off-by: John Agombar <agombar@uk.ibm.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/api: finalize the API's about skip-quiesce and ignore-quiesce-error flags
Prasanna Kumar Kalever [Mon, 24 Mar 2025 10:03:31 +0000 (15:33 +0530)]
librbd/api: finalize the API's about skip-quiesce and ignore-quiesce-error flags

* leave --skip-quiesce and --ignore-quiesce-error options only on
  rbd mirror group snapshot command
* drop flags argument from mirror_group_enable(), mirror_group_promote() and
  mirror_group_demote() APIs, it will remain only on
  mirror_group_create_snapshot() and aio_mirror_group_create_snapshot() APIs
* mirror_group_promote() and mirror_group_demote() should behave as if
  RBD_SNAP_CREATE_SKIP_QUIESCE flag was passed
* mirror_group_enable() should use get_default_snap_create_flags() to get flags
  -- it will be governed by rbd_default_snapshot_quiesce_mode config option
* make each of the mentioned APIs explicitly do either
  a) snap_create_flags_api_to_internal(<flags passed by the user>, &snap_create_flags),
  b) snap_create_flags_api_to_internal(get_default_snap_create_flags(), &snap_create_flags)
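
As a rough illustration of the rules listed above, the flag selection per API could be modelled like this. It is a Python sketch under stated assumptions, not the librbd implementation: `effective_snap_create_flags` and `default_snap_create_flags` are hypothetical helpers, the two flag values are assumed to mirror the `RBD_SNAP_CREATE_*` bits in librbd.h, and the quiesce mode names ("required", "ignore-error", "skip") are assumed values of rbd_default_snapshot_quiesce_mode.

```python
# Assumed to mirror the RBD_SNAP_CREATE_* bit values in librbd.h.
RBD_SNAP_CREATE_SKIP_QUIESCE = 1 << 0
RBD_SNAP_CREATE_IGNORE_QUIESCE_ERROR = 1 << 1


def default_snap_create_flags(quiesce_mode):
    """Hypothetical model of get_default_snap_create_flags()."""
    return {
        "required": 0,
        "ignore-error": RBD_SNAP_CREATE_IGNORE_QUIESCE_ERROR,
        "skip": RBD_SNAP_CREATE_SKIP_QUIESCE,
    }[quiesce_mode]


def effective_snap_create_flags(api, user_flags=0, quiesce_mode="required"):
    """Pick the snapshot-create flags an API ends up using."""
    if api in ("mirror_group_create_snapshot",
               "aio_mirror_group_create_snapshot"):
        # Only these APIs still accept flags from the caller.
        return user_flags
    if api in ("mirror_group_promote", "mirror_group_demote"):
        # Behave as if RBD_SNAP_CREATE_SKIP_QUIESCE was passed.
        return RBD_SNAP_CREATE_SKIP_QUIESCE
    if api == "mirror_group_enable":
        # Governed by the configured default quiesce mode.
        return default_snap_create_flags(quiesce_mode)
    raise ValueError(f"unknown API: {api}")
```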

Credits to Ilya Dryomov <idryomov@gmail.com> for the above finalisation.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agocleanup: bootstrap no more need prepare_non_primary_mirror_snap_name
Prasanna Kumar Kalever [Mon, 24 Mar 2025 05:19:10 +0000 (10:49 +0530)]
cleanup: bootstrap no more need prepare_non_primary_mirror_snap_name

We removed the need to create non-primary group snapshots on the secondary
as part of bootstrap, so this function is no longer needed.

Thanks to Nithya for highlighting.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: don't call group_snap_set for every image snap for regular group snap
Prasanna Kumar Kalever [Fri, 21 Mar 2025 17:37:54 +0000 (23:07 +0530)]
rbd-mirror: don't call group_snap_set for every image snap for regular group snap

We already avoid calling group_snap_set() for each image snapshot update
of a mirror group snapshot, but it still happens for regular group
snapshots. This commit fixes that.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: call set_image_replayer_limits() only if group_snap_set() successful
Prasanna Kumar Kalever [Fri, 21 Mar 2025 18:35:16 +0000 (00:05 +0530)]
rbd-mirror: call set_image_replayer_limits() only if group_snap_set() successful

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: address group_snap_set() failures
Prasanna Kumar Kalever [Fri, 21 Mar 2025 14:00:09 +0000 (19:30 +0530)]
rbd-mirror: address group_snap_set() failures

There are 5 places where we call group_snap_set()

1. create_mirror_snapshot()
2. create_regular_snapshot()
3. mirror_snapshot_complete()
4. regular_snapshot_complete() and
5. remove_mirror_peer_uuid()

For cases 1 & 2, we can simply delete the group snapshot created so far
in the respective callback handler, since it is basically an empty INCOMPLETE
group snapshot, and let the state machine recreate it later.

For cases 3 & 4, we cannot delete the created snapshot, because the
image snapshots would have synced or be syncing by now, and deleting the group
snapshot would bring additional complications (if there is a failover at
the same time). Hence we set the m_retry_validate_snap flag in this case;
this allows the rescan even for regular group snapshots, and if the
snapshot is still INCOMPLETE on disk, validate_image_snaps_sync_complete()
will be called again.

For case 5, added logic to retry remove_mirror_peer_uuid() again.
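
The per-call-site handling above can be summarised as a small lookup table. This is only a Python sketch of the description in this message: the `Recovery` enum and `recovery_for` helper are hypothetical names, not part of rbd-mirror.

```python
from enum import Enum


class Recovery(Enum):
    """Hypothetical labels for the recovery actions described above."""
    DELETE_INCOMPLETE_SNAP = "delete the empty INCOMPLETE group snapshot"
    RETRY_VALIDATE = "set m_retry_validate_snap and rescan"
    RETRY_REMOVE_PEER_UUID = "retry remove_mirror_peer_uuid()"


# Call site of group_snap_set() -> what to do when it fails.
RECOVERY_BY_CALLER = {
    "create_mirror_snapshot": Recovery.DELETE_INCOMPLETE_SNAP,
    "create_regular_snapshot": Recovery.DELETE_INCOMPLETE_SNAP,
    "mirror_snapshot_complete": Recovery.RETRY_VALIDATE,
    "regular_snapshot_complete": Recovery.RETRY_VALIDATE,
    "remove_mirror_peer_uuid": Recovery.RETRY_REMOVE_PEER_UUID,
}


def recovery_for(caller):
    """Return the recovery action for a failed group_snap_set() call."""
    return RECOVERY_BY_CALLER[caller]
```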

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: fix m_stop_requested leading to a race
Prasanna Kumar Kalever [Fri, 21 Mar 2025 08:59:45 +0000 (14:29 +0530)]
rbd-mirror: fix m_stop_requested leading to a race

* if m_stop_requested is set, then is_replay_interrupted returns true.
* also, shut_down should set m_stop_requested to false; it is instead
  setting it to true, which leads to a race and a possible crash accessing
  the group replayer (GR) between shut_down() and notify_group_listener_stop()

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agocleanup: avoid passing last_local_snap_id to unlink_group_snapshots
Prasanna Kumar Kalever [Fri, 21 Mar 2025 08:45:05 +0000 (14:15 +0530)]
cleanup: avoid passing last_local_snap_id to unlink_group_snapshots

last_local_snap_id can be fetched from m_local_group_snaps.rbegin()

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agocleanup: avoid use of remote_group_snap id and name variable names for readability
Prasanna Kumar Kalever [Fri, 21 Mar 2025 08:27:00 +0000 (13:57 +0530)]
cleanup: avoid use of remote_group_snap id and name variable names for readability

remote_group_snap_id is used interchangeably with local_group_snap_id;
this patch cleans that up.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: fix and enable test_force_promote_before_initial_sync
Prasanna Kumar Kalever [Thu, 20 Mar 2025 20:54:49 +0000 (02:24 +0530)]
qa/workunits/rbd: fix and enable test_force_promote_before_initial_sync

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/api: fail group promote when there is no previous snapshot
Prasanna Kumar Kalever [Thu, 20 Mar 2025 17:36:29 +0000 (23:06 +0530)]
librbd/api: fail group promote when there is no previous snapshot

If the initial snapshot created at group enable time didn't sync to the
secondary and is in an incomplete state, and a force promote then happens on
the secondary, there is no previous snapshot for that force promote to roll
back to.

In this situation the force promote should fail.
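
A minimal sketch of that rule, assuming group snapshots are available as an oldest-first list of (snap_id, state) pairs. The helper names, the "complete"/"incomplete" state strings and the -EINVAL return value are all assumptions made for illustration, not the librbd API.

```python
import errno


def pick_rollback_snap(group_snaps):
    """Return the newest complete snapshot id, or None if there is none.

    group_snaps is an oldest-first list of (snap_id, state) tuples.
    """
    for snap_id, state in reversed(group_snaps):
        if state == "complete":
            return snap_id
    return None


def force_promote(group_snaps):
    """Fail the force promote when there is nothing to roll back to."""
    if pick_rollback_snap(group_snaps) is None:
        return -errno.EINVAL  # assumed error code
    return 0
```

A secondary that only holds the incomplete initial snapshot has no rollback target, so the promote is rejected instead of leaving the group inconsistent.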

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: update to mirror group snapshot tests
John Agombar [Thu, 20 Mar 2025 20:48:57 +0000 (20:48 +0000)]
qa/workunits/rbd: update to mirror group snapshot tests

Updated status() helper function to dump contents of stderr and stdout for last command
New helper functions to check for image snaps existence
Added new environment variable RBD_MIRROR_HIDE_BASH_DEBUGGING to turn off set -x output.
Previously RBD_MIRROR_SHOW_CLI_CMD was being used for this and controlling the display of cli output.

New tests:
• test_group_rename - test that a group rename is only mirrored to the remote after a mirror group
  snapshot command.  Also test that a group rename is not inadvertently mirrored or undone
  (test commented out as it is failing)
• test_enable_mirroring_when_duplicate_group_exists - various scenarios that check an empty group
  and approaches to fixing the duplicate names on either site.
  (test is commented out as it is not yet finished)
• test_enable_mirroring_when_duplicate_group_and_images_exists - builds on the previous test
  but has duplicate named images too (test is commented out as it is failing)
• test_image_snapshots_with_group - test regular image snapshots along with mirror group snapshots

Enabled tests:
- test_force_promote scenarios 1,2,3 and 5 pass

Signed-off-by: John Agombar <agombar@uk.ibm.com>
2 months agorbd-mirror: avoid bootstrap creating a local non_primary group snapshot
Prasanna Kumar Kalever [Thu, 20 Mar 2025 07:00:23 +0000 (12:30 +0530)]
rbd-mirror: avoid bootstrap creating a local non_primary group snapshot

Previously we were bound to this, but with the current design we no longer need it.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: look for mismatch in name only on secondary cluster
Prasanna Kumar Kalever [Wed, 19 Mar 2025 18:17:10 +0000 (23:47 +0530)]
rbd-mirror: look for mismatch in name only on secondary cluster

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: enable and fix test_resync_marker test
Prasanna Kumar Kalever [Thu, 20 Mar 2025 06:02:13 +0000 (11:32 +0530)]
qa/workunits/rbd: enable and fix test_resync_marker test

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: group-replayer check for remote demote state
Prasanna Kumar Kalever [Thu, 20 Mar 2025 05:44:48 +0000 (11:14 +0530)]
rbd-mirror: group-replayer check for remote demote state

I see 3 possible situations here for resync flagging and the rbd-mirror
daemon working on it:

1. No demotion on primary while/just before resync is played
    There is no demote snap alongside the resync, so we can cancel syncing
    other snaps and start the resync as soon as it is flagged, because there
    is no point syncing snaps when we are going to delete the whole group
    and resync afresh anyway.

2. First demote + immediately resync
    The demote came first, which means that before proceeding with the resync
    we should always check whether the last remote snap is PRIMARY (i.e.
    validate that the remote is still primary) and only then proceed.

3. First resync + immediately demote
    The resync came first, so we head straight to resync.
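
One way to read the three situations is as a single check made at the moment the resync flag is observed. The sketch below is a Python model of that reading, not the group replayer code: `should_start_resync` and the state strings are hypothetical.

```python
def should_start_resync(resync_flagged, last_remote_snap_state):
    """Decide whether the resync may start right now.

    last_remote_snap_state is the state of the newest remote group
    snapshot as seen when the resync flag is observed.
    """
    if not resync_flagged:
        return False
    # Situations 1 and 3: the remote is still primary when the flag is
    # seen, so the resync starts immediately.
    # Situation 2: a demote landed first, so wait until the remote is
    # primary again.
    return last_remote_snap_state == "primary"
```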

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/api: fix rollback failures to mirror group snapshots
Prasanna Kumar Kalever [Tue, 18 Mar 2025 16:50:17 +0000 (22:20 +0530)]
librbd/api: fix rollback failures to mirror group snapshots

group_snap_rollback_by_record() is made flexible enough to handle mirror
snapshots alongside the existing user snapshots, with minor changes.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: adjust grep invocation in is_leader()
Ilya Dryomov [Sun, 16 Mar 2025 16:33:58 +0000 (17:33 +0100)]
qa/workunits/rbd: adjust grep invocation in is_leader()

admin_daemon() no longer populates stdout and stderr, all output is
redirected inside of run_cmd_internal().

This unbreaks "TEST: release leader and wait it is reacquired".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2 months agoqa/workunits/rbd: fix looping in remove_image_retry()
Ilya Dryomov [Sun, 16 Mar 2025 16:23:22 +0000 (17:23 +0100)]
qa/workunits/rbd: fix looping in remove_image_retry()

remove_image_retry() is to be called on images that may still have
a watcher (i.e. considered to be open), in which case either of "rbd
snap purge" and "rbd rm" commands can fail.

This unbreaks "TEST: delete images during bootstrap".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2 months agoqa/workunits/rbd: fix positional argument expansion in create_image()
Ilya Dryomov [Sat, 15 Mar 2025 19:42:02 +0000 (20:42 +0100)]
qa/workunits/rbd: fix positional argument expansion in create_image()

Sticking $@ into a string that is supposed to form a command isn't
right because the string would be broken apart when $@ has more than
one argument:

  If the double-quoted expansion occurs within a word, the expansion of
  the first parameter is joined with the beginning part of the original
  word, and the expansion of the last parameter is joined with the last
  part of the original word.

Resort to $* despite its shortcomings -- given run_cmd() signature it's
the only practical fixup.  A wrapper such as run_cmd() should really be
variadic or take an array instead of insisting on a single string.

This unbreaks "TEST: data pool".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2 months agoqa/workunits/rbd: fix looping in wait_for_snapshot_sync_complete()
Ilya Dryomov [Sat, 15 Mar 2025 17:02:49 +0000 (18:02 +0100)]
qa/workunits/rbd: fix looping in wait_for_snapshot_sync_complete()

get_primary_snap_id_for_newest_mirror_snapshot_on_secondary() may be
called on a freshly created image where the only non-primary snapshot
is still incomplete.  In this scenario it fails because a snapshot ID
can't be produced, but wait_for_snapshot_sync_complete() should keep
retrying the same as in the regular case.
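
The retry behaviour described above can be modelled as below. The real helper is a bash function in the QA scripts; this Python sketch only mirrors the control flow, and its parameters (`get_snap_id`, `is_synced`, `retries`, `delay`) are hypothetical.

```python
import time


def wait_for_snapshot_sync_complete(get_snap_id, is_synced,
                                    retries=30, delay=1.0):
    """Poll until the newest mirror snapshot is synced.

    get_snap_id() returns None while no snapshot ID can be produced yet
    (for example, the only non-primary snapshot is still incomplete).
    That case is retried exactly like "not synced yet".
    """
    for _ in range(retries):
        snap_id = get_snap_id()
        if snap_id is not None and is_synced(snap_id):
            return snap_id
        time.sleep(delay)
    return None  # gave up after the configured number of retries
```

The key point is that a failure to obtain the snapshot ID no longer aborts the loop; it just consumes one retry.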

This fixes sporadic failures in "TEST: add image and test replay",
"TEST: stop mirror, add image, start mirror and test replay" and other
tests that use wait_for_snapshot_sync_complete().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2 months agoqa/workunits/rbd: make wait_for_omap_keys() work on a non-existent object
Ilya Dryomov [Sat, 15 Mar 2025 15:42:28 +0000 (16:42 +0100)]
qa/workunits/rbd: make wait_for_omap_keys() work on a non-existent object

wait_for_image_in_omap() may be called on a cluster with mirroring
configured but rbd-mirror daemon never started.  rbd_mirror_leader
object isn't expected to exist in that case.

This unbreaks "TEST: check if removed images' OMAP are removed (with
rbd-mirror on one cluster)".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2 months agolibrbd: tolerate image not existing in ImageRemoveRequest
Ilya Dryomov [Sat, 15 Mar 2025 12:56:45 +0000 (13:56 +0100)]
librbd: tolerate image not existing in ImageRemoveRequest

ImageRemoveRequest may be called when the image no longer exists (or
even never existed in case of a clone image whose creation is pending
on its parent showing up on the secondary) to clean up leftover mirror
metadata.  Upon failure to get the group spec on a non-existing image
header, the state machine should be advanced to remove_mirror_image().

This fixes sporadic failures in "TEST: cloned images" and "TEST: check
if removed images' OMAP are removed".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2 months agolibrbd/api: address C_SaferCond object leak
Prasanna Kumar Kalever [Fri, 14 Mar 2025 08:56:12 +0000 (14:26 +0530)]
librbd/api: address C_SaferCond object leak

Credits to Ilya Dryomov <idryomov@gmail.com> for the fix.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd: fix issues in create and unlink group snapshots
N Balachandran [Fri, 14 Mar 2025 07:16:39 +0000 (12:46 +0530)]
librbd: fix issues in create and unlink group snapshots

Fixed GroupCreatePrimaryRequest to use the snap_create_flags.
Fixed a leak in GroupUnlinkPeerRequest.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agopybind/mgr: fix flake8 errors in rbd_support module
Ramana Raja [Thu, 13 Mar 2025 22:18:43 +0000 (18:18 -0400)]
pybind/mgr: fix flake8 errors in rbd_support module

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agotest/cli-integration/rbd: include `mirror group scheduler`
Ramana Raja [Thu, 13 Mar 2025 20:34:51 +0000 (16:34 -0400)]
test/cli-integration/rbd: include `mirror group scheduler`

... commands in cram-based mon command API test.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agoqa/workunits/rbd: update to mirror group snapshot tests
John Agombar [Thu, 13 Mar 2025 14:37:57 +0000 (14:37 +0000)]
qa/workunits/rbd: update to mirror group snapshot tests

Update run_test_scenarios function to support a non-contiguous sequence of scenario numbers
Remove assert that checked empty omap keys between tests - now just logs to testlog

New tests:
- test_odf_failover_failback - new scenario with resync request on test_odf_failover_failback

Disabled tests:
- test_force_promote all scenarios fail since test is now checking group
  consistency during rollback

Signed-off-by: John Agombar <agombar@uk.ibm.com>
2 months agorbd-mirror: fix issues with rebase
Prasanna Kumar Kalever [Thu, 13 Mar 2025 17:21:22 +0000 (22:51 +0530)]
rbd-mirror: fix issues with rebase

* revert changes to remote_namespace_set
* revert change in PoolWatcher<I>::notify_listener()

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: address compiler warnings
Prasanna Kumar Kalever [Thu, 13 Mar 2025 13:46:31 +0000 (19:16 +0530)]
rbd-mirror: address compiler warnings

Issue I:
src/tools/rbd_mirror/group_replayer/Replayer.cc:604:82:
error: overlapping comparisons always evaluate to true [-Werror,-Wtautological-overlap-compare]
  604 | (prev_remote_snap_ns->state != cls::rbd::MIRROR_SNAPSHOT_STATE_PRIMARY ||
      |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  605 |  prev_remote_snap_ns->state != cls::rbd::MIRROR_SNAPSHOT_STATE_PRIMARY_DEMOTED)) {
      |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Issue II:
src/tools/rbd_mirror/GroupReplayer.h:178:10:
error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  178 |         [this](int r) {
      |          ^

Issue III:
src/test/rbd_mirror/test_mock_ImageSync.cc:258:16:
error: no matching constructor for initialization of 'MockImageSync'
(aka 'ImageSync<librbd::MockTestImageCtx>')
  258 |     return new MockImageSync(

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: improvements to group replayer
Prasanna Kumar Kalever [Wed, 12 Mar 2025 18:09:51 +0000 (23:39 +0530)]
rbd-mirror: improvements to group replayer

* group_snap_set() is currently called per image snapshot ack in a group
  snapshot; with this change, from now on it is called
  1. locally, on empty group snap creation with state INCOMPLETE
  2. locally, when the group snap moves to COMPLETE with all image snap details
  3. on the remote snapshot, when removing the peer uuid on a previous COMPLETE snap

* group_snap_set(): revert conditioning and return value around the check for
  "snap key already exists"

* fix user snapshot removal needing two succeeding snapshots

* cleanup: avoid unnecessary calls and unnecessary variables.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: update to mirror group snapshot tests
John Agombar [Tue, 4 Mar 2025 14:24:43 +0000 (14:24 +0000)]
qa/workunits/rbd: update to mirror group snapshot tests

New tests:
- force promote test with daemon running on both clusters
- test_enable_mirroring_when_duplicate_group_exists
- test_odf_failover_failback test
- test_resync_marker test
- test_force_promote_before_initial_sync test
- scenarios in test_create_group_with_images_then_mirror_with_regular_snapshots

Disabled tests:
- test_force_promote scenarios 2 & 3 which repeatedly fail

Renamed tests:
- test_multiple_user_snapshot_time to test_multiple_mirror_group_snapshot_unlink_time
- test_multiple_user_snapshot_whilst_stopped to test_multiple_mirror_group_snapshot_whilst_stopped

Signed-off-by: John Agombar <agombar@uk.ibm.com>
2 months agorbd-mirror: fix MirrorStatusWatcher
Prasanna Kumar Kalever [Tue, 11 Mar 2025 09:47:41 +0000 (15:17 +0530)]
rbd-mirror: fix MirrorStatusWatcher

* moved the internal functions scope to private
* use m_on_start_finish to save the init time Context and use later,
  instead of passing it in various functions

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: reuse the ImageReplayers in the GroupReplayer
N Balachandran [Tue, 11 Mar 2025 10:19:16 +0000 (15:49 +0530)]
rbd-mirror: reuse the ImageReplayers in the GroupReplayer

This fix will start image replayers even if the group replayer
is primary so as to have the correct mirror pool status.
The group replayer will also attempt to reuse the image replayers where
possible on restart.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agoqa/workunits/rbd: required changes to expect state to up+unknown
Prasanna Kumar Kalever [Tue, 11 Mar 2025 04:38:44 +0000 (10:08 +0530)]
qa/workunits/rbd: required changes to expect state to up+unknown

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: mark group state as up+unknown when group is demoted
Prasanna Kumar Kalever [Tue, 11 Mar 2025 04:37:18 +0000 (10:07 +0530)]
rbd-mirror: mark group state as up+unknown when group is demoted

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/io/AioCompletion: allow operations on group
Ramana Raja [Tue, 11 Feb 2025 14:52:52 +0000 (09:52 -0500)]
librbd/io/AioCompletion: allow operations on group

For asynchronous group librbd APIs, make use of the existing
librbd::io::AioCompletion class via the wrapper, RBD::AioCompletion.

Get rid of RBD::AioGroupCompletion, and RBD::C_AioGroupCompletion that
are no longer needed.

Also modify the python RBD library to not make use of
RBD::AioGroupCompletion.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agoqa/workunits/rbd: fix test_force_promote_delete_group
Prasanna Kumar Kalever [Mon, 10 Mar 2025 10:45:07 +0000 (16:15 +0530)]
qa/workunits/rbd: fix test_force_promote_delete_group

Both clusters see themselves in up+stopped and other clusters in up+error

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: The primary GR does not need to wait for the remote to be ready
Prasanna Kumar Kalever [Mon, 10 Mar 2025 10:42:33 +0000 (16:12 +0530)]
rbd-mirror: The primary GR does not need to wait for the remote to be ready

The following "remote is not ready yet" error is not required in case
of primary:

[root@server1 build]# rbd-a mirror group status data/grp1
grp1:
  global_id:   a04e99a8-fc4b-4345-adb8-b97a51dc4cac
  state:       up+error
  description: remote is not ready yet
  service:     admin on server1.lab.eng.blr.redhat.com
  last_update: 2025-03-05 12:02:41
  images:

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: group remove status as part of Group Replayer
Prasanna Kumar Kalever [Fri, 7 Mar 2025 09:51:59 +0000 (15:21 +0530)]
rbd-mirror: group remove status as part of Group Replayer

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: fix image map notifications for groups
N Balachandran [Fri, 7 Mar 2025 07:26:35 +0000 (12:56 +0530)]
rbd-mirror: fix image map notifications for groups

The Group replayer Bootstrap now sends mirroring notifications
when creating or deleting the local group. The ImageMap will only
send the acquire_group notifications once for each group.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agolibrbd/mirror: cleanup redundant parameters in CreatePrimaryRequest and
Ramana Raja [Wed, 5 Mar 2025 19:29:16 +0000 (14:29 -0500)]
librbd/mirror: cleanup redundant parameters in CreatePrimaryRequest and

... CreateNonPrimaryRequest constructors. The objects can figure out
the image's group ID and group pool ID from the group_spec stored in
their image_ctx data member. No need to pass in group ID and
group pool ID into the constructors.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/mirror: set valid group_spec in MirrorSnapshotNamespace
Ramana Raja [Tue, 4 Mar 2025 23:01:18 +0000 (18:01 -0500)]
librbd/mirror: set valid group_spec in MirrorSnapshotNamespace

... for a member image snap of a non-primary mirror group.

Previously, group_spec's group_id was always an empty string, and its
pool_id was set to be the image's metadata pool ID.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/mirror: change naming format of member image snap
Ramana Raja [Tue, 4 Mar 2025 17:43:19 +0000 (12:43 -0500)]
librbd/mirror: change naming format of member image snap

... of primary and non-primary mirror group snaps.

Set the naming format of member image snap of a mirror group snap to be,
mirror.primary.<global_image_id>.<global_group_id>.<group_pool_id>_<group_id>_<group_snap_id>,
or
mirror.non_primary.<global_image_id>.<global_group_id>.<group_pool_id>_<group_id>_<group_snap_id>
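
For illustration, the naming format above can be produced with a small helper. This is a Python sketch; `member_image_snap_name` and its parameter names are hypothetical and simply follow the placeholders in the format strings.

```python
def member_image_snap_name(primary, global_image_id, global_group_id,
                           group_pool_id, group_id, group_snap_id):
    """Build the member image snap name of a mirror group snap."""
    role = "primary" if primary else "non_primary"
    return (f"mirror.{role}.{global_image_id}.{global_group_id}."
            f"{group_pool_id}_{group_id}_{group_snap_id}")
```

The last dot-separated component packs the group pool ID, group ID and group snap ID together, joined by underscores.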

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/api: set `image_snap_name` as empty string for mirror gp snap
Ramana Raja [Fri, 28 Feb 2025 21:49:27 +0000 (16:49 -0500)]
librbd/api: set `image_snap_name` as empty string for mirror gp snap

The member image snapshots of a mirror group snap do not share a
common name unlike those of a user group snap. So set the
`image_snap_name` to an empty string.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/api: set group snap's namespace type to mirror
Ramana Raja [Fri, 28 Feb 2025 21:19:03 +0000 (16:19 -0500)]
librbd/api: set group snap's namespace type to mirror

... for a mirror group snap.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agorbd-mirror: reverting changes to ImageSync
Prasanna Kumar Kalever [Thu, 6 Mar 2025 07:53:19 +0000 (13:23 +0530)]
rbd-mirror: reverting changes to ImageSync

ImageSync should never be needed for snapshot-based mirroring.

Credits and thanks to Nithya and Ilya for highlighting it.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: avoid stuck group on secondary
Prasanna Kumar Kalever [Thu, 6 Mar 2025 07:19:15 +0000 (12:49 +0530)]
rbd-mirror: avoid stuck group on secondary

The following steps leave a stale group on the secondary undeleted:
1. Create and mirror enable a group with 2 images.
2. Let it sync to the secondary
3. Demote on the primary and promote on the secondary
4. Wait until it starts replaying on the original primary
5. Delete the group on the new primary

Credits to Nithya Balachandran for highlighting the issue with detailed steps.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: fix issues around resync
Prasanna Kumar Kalever [Wed, 5 Mar 2025 17:24:56 +0000 (22:54 +0530)]
rbd-mirror: fix issues around resync

The following issues are fixed:
* Allow flagging a resync, but if the remote is not primary, do not resync or
  even delete the local group.
   - Wait for the remote to turn primary; once it does, continue
     with the resync.
   - In case the same site is made primary right after issuing the
     resync, clear that flag immediately.
* Revert some old code in PoolWatcher (unintentional edits/changes).
* Do not send the MIRROR_GROUP_STATE_DISABLED notification from the group_resync
  API, as this leads to release_group(). Credits to Nithya Balachandran
  for pointing out the details of this notification.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd_mirror: avoid rescans in busy loop to detect new snapshots
Prasanna Kumar Kalever [Tue, 4 Mar 2025 14:26:33 +0000 (19:56 +0530)]
rbd_mirror: avoid rescans in busy loop to detect new snapshots

Instead, move the state to STATE_IDLE once the snapshot limits cannot be met,
and move back to STATE_REPLAYING on a call from group_replayer to
set_remote_snap_id_end_limit().
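
The idle/replaying transition can be pictured with a toy model. This Python sketch is an assumption-laden illustration, not the image replayer: the `ReplayerModel` class is hypothetical, and snapshot IDs are treated as plain integers compared against a single end limit.

```python
class ReplayerModel:
    """Toy model of a replayer bounded by a remote snap ID end limit."""

    def __init__(self, end_limit):
        self.state = "STATE_REPLAYING"
        self.end_limit = end_limit
        self.last_synced = 0

    def replay_next(self, next_snap_id):
        """Sync one snapshot, or go idle if it lies beyond the limit."""
        if self.state != "STATE_REPLAYING":
            return False
        if next_snap_id > self.end_limit:
            # Park instead of rescanning in a busy loop.
            self.state = "STATE_IDLE"
            return False
        self.last_synced = next_snap_id
        return True

    def set_remote_snap_id_end_limit(self, end_limit):
        """Called by the group replayer; wakes the replayer up again."""
        self.end_limit = end_limit
        self.state = "STATE_REPLAYING"
```

In this model the replayer does no work while idle and resumes only when the group replayer raises the limit.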

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: further improve smoke and enable more tests
Prasanna Kumar Kalever [Mon, 3 Mar 2025 09:24:54 +0000 (14:54 +0530)]
qa/workunits/rbd: further improve smoke and enable more tests

-> Enable test_force_promote_delete_group() test.

-> Move test_enable_disable_repeat to the end, as there seems to be some
residual groups on secondary left uncleaned which might trouble other tests.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: early detect if the mirroring instance match global group id
Prasanna Kumar Kalever [Mon, 3 Mar 2025 09:17:54 +0000 (14:47 +0530)]
rbd-mirror: early detect if the mirroring instance match global group id

In a case where previous primary cluster is still alive,

-> Force promote site-b while site-a is still alive
-> Disable group on site-b
-> Re-enable group on site-b, while site-a is still alive

the group gets removed. To avoid such cases, make sure the instance
matches the local group id before taking any action.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: improve smoke and enable more tests
Prasanna Kumar Kalever [Sun, 2 Mar 2025 12:48:15 +0000 (18:18 +0530)]
qa/workunits/rbd: improve smoke and enable more tests

* Make sure to start the right daemons
* Add tidy-up sequences to a couple of tests
* Enable below tests:
1. test_stopped_daemon
2. test_image_move_group
3. test_create_group_with_image_remove_then_repeat
4. test_multiple_user_snapshot_time
5. test_force_promote_scenarios -s 2
6. test_force_promote_scenarios -s 3
7. test_create_group_stop_daemon_then_recreate_scenarios -s 2
8. test_create_group_stop_daemon_then_recreate_scenarios -s 3

With the changes in the previous commits, along with this commit, the smoke
tests show stable behaviour.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd: defend for primary as part of disable request
Prasanna Kumar Kalever [Fri, 28 Feb 2025 19:44:17 +0000 (01:14 +0530)]
librbd: defend for primary as part of disable request

Otherwise we will proceed with disable in the case below and end up not
being able to promote the same later:

[root@dhcp53-181 build]#  ./bin/rbd --cluster site-a mirror group demote test_pool/test_group --debug-rbd=0
Group demoted to non-primary
[root@dhcp53-181 build]# ./bin/rbd --cluster site-a mirror group disable test_pool/test_group --debug-rbd=0
2025-03-01T00:46:55.214+0530 7f1cdd71a6c0 -1 librbd::mirror::DisableRequest: 0x56213d4b8f10 handle_get_mirror_info: mirrored image is not primary, add force option to disable mirroring
2025-03-01T00:46:55.215+0530 7f1ce4c39b80 -1 librbd::api::Mirror: image_disable: cannot disable mirroring: (22) Invalid argument
2025-03-01T00:46:55.218+0530 7f1ce4c39b80 -1 librbd::api::Mirror: group_disable: failed to disable mirroring on image: test_image1(22) Invalid argument
2025-03-01T00:46:55.238+0530 7f1cddf1b6c0 -1 librbd::mirror::DisableRequest: 0x56213d17d790 handle_get_mirror_info: mirrored image is not primary, add force option to disable mirroring
2025-03-01T00:46:55.238+0530 7f1ce4c39b80 -1 librbd::api::Mirror: image_disable: cannot disable mirroring: (22) Invalid argument
2025-03-01T00:46:55.241+0530 7f1ce4c39b80 -1 librbd::api::Mirror: group_disable: failed to disable mirroring on image: test_image2(22) Invalid argument
2025-03-01T00:46:55.265+0530 7f1cdd71a6c0 -1 librbd::mirror::DisableRequest: 0x56213d52f290 handle_get_mirror_info: mirrored image is not primary, add force option to disable mirroring
2025-03-01T00:46:55.265+0530 7f1ce4c39b80 -1 librbd::api::Mirror: image_disable: cannot disable mirroring: (22) Invalid argument
2025-03-01T00:46:55.269+0530 7f1ce4c39b80 -1 librbd::api::Mirror: group_disable: failed to disable mirroring on image: test_image3(22) Invalid argument
2025-03-01T00:46:55.293+0530 7f1cddf1b6c0 -1 librbd::mirror::DisableRequest: 0x56213d544620 handle_get_mirror_info: mirrored image is not primary, add force option to disable mirroring
2025-03-01T00:46:55.293+0530 7f1ce4c39b80 -1 librbd::api::Mirror: image_disable: cannot disable mirroring: (22) Invalid argument
2025-03-01T00:46:55.297+0530 7f1ce4c39b80 -1 librbd::api::Mirror: group_disable: failed to disable mirroring on image: test_image4(22) Invalid argument
2025-03-01T00:46:55.316+0530 7f1ce4c39b80 -1 librbd::api::Mirror: group_disable: failed to disable one or more images: (22) Invalid argument
[root@dhcp53-181 build]#  ./bin/rbd --cluster site-a mirror group promote test_pool/test_group --debug-rbd=0
rbd: mirroring not enabled on the group

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: fix bootstrapping
Prasanna Kumar Kalever [Fri, 28 Feb 2025 14:47:01 +0000 (20:17 +0530)]
rbd-mirror: fix bootstrapping

Issue I:
As part of remove_local_mirror_group, if the local mirror group's
global_group_id doesn't match the GroupReplayer instance's
(m_global_group_id), then call remove_local_group().

In a case where the daemon is down, the group is disabled, images are
removed/added, the group is re-enabled and then the daemon is brought back to
life, the GroupReplayer instances belonging to the same group name get mixed
up, leading to the create_local_mirror_group() path, which is wrong.

Issue II:
Also, in cases where the local mirror group's global_group_id doesn't match the
GroupReplayer instance's, if there are ENOENT errors during bootstrapping,
then retry the bootstrapping.

Without this fix, a valid group_replayer instance will be destroyed.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/api: disallow adding mirror enabled image to group
Ramana Raja [Thu, 20 Feb 2025 22:21:34 +0000 (17:21 -0500)]
librbd/api: disallow adding mirror enabled image to group

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/api: disallow mirror image operations on a group's member image
Ramana Raja [Thu, 20 Feb 2025 00:09:32 +0000 (19:09 -0500)]
librbd/api: disallow mirror image operations on a group's member image

Disallow the following mirror image APIs when called directly on an
image that is a member of a group:
- mirror image demote
- mirror image disable
- mirror image enable
- mirror image promote
- mirror image resync
- mirror image snapshot

Only allow mirror operations on a group's member image via the mirror
group APIs.
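
The rule above can be sketched as a small guard. This is only an illustrative model, not the librbd code: the function name check_mirror_image_op, the GROUP_ONLY_OPS set and the choice of -EINVAL as the error are all assumptions made for the example.

```python
import errno
from typing import Optional

# The six image-level operations named in the commit message.
GROUP_ONLY_OPS = frozenset(
    {"demote", "disable", "enable", "promote", "resync", "snapshot"}
)


def check_mirror_image_op(op: str, group_name: Optional[str]) -> int:
    """Return 0 if a direct image-level call may proceed, else a negative errno.

    group_name is None for a standalone image and the owning group's name
    for an image that is a member of a group.
    """
    if op in GROUP_ONLY_OPS and group_name is not None:
        # Assumption: -EINVAL is used here purely for illustration.
        return -errno.EINVAL
    return 0


# A standalone image may be promoted directly; a group member may not.
print(check_mirror_image_op("promote", None))
print(check_mirror_image_op("promote", "test_group"))
```

In this model the same operation on a member image is expected to go through the corresponding mirror group call instead.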

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agoqa/workunits/rbd: add new tests and improve existing
John Agombar [Thu, 13 Feb 2025 17:20:01 +0000 (17:20 +0000)]
qa/workunits/rbd: add new tests and improve existing

Change admin socket mirror group status checks to query status on normal CLI too
Disable test_stopped_daemon test which has intermittent failures.
Remove sleep 5 which is no longer needed in RBD mirror group tests
Fix test_image_replay_state() helper function to work without SHOW_CLI_CMD env variable set

Signed-off-by: John Agombar <agombar@uk.ibm.com>
2 months agorbd-mirror: align GroupReplayer register admin_socket_hook with ImageReplayer
Prasanna Kumar Kalever [Tue, 25 Feb 2025 11:46:15 +0000 (17:16 +0530)]
rbd-mirror: align GroupReplayer register admin_socket_hook with ImageReplayer

This will fix the following failing tests with asok issues:

rbd_mirror_group_simple.sh:
+ testlog 'TEST:test_multiple_user_snapshot_time scenario:1 parameters:'

rbd_mirror_group_group.sh:
+ testlog "TEST: add a large image to group and test replay"

This fix also puts back set_finished(), which was removed in the previous
commit; its removal causes a regression in the GroupReplayer destroy code path.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: fixes multiple issues in the group replayer
N Balachandran [Fri, 21 Feb 2025 06:17:19 +0000 (11:47 +0530)]
rbd-mirror: fixes multiple issues in the group replayer

The commit includes the following:
- Fixed crashes in the start/stop in GroupReplayer
- Fixed crashes in the shut_down sequence in group_replayer::Replayer
- ImageMap will now send release_group notifications for non-empty
  groups.
- InstanceReplayer no longer checks if the GroupReplayer needs to be
  restarted. The GroupReplayer will stop itself if it determines that it
  needs to be restarted.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agolibrbd: remove mirror APIs that change mirror group membership
Ramana Raja [Tue, 4 Feb 2025 00:55:08 +0000 (19:55 -0500)]
librbd: remove mirror APIs that change mirror group membership

Remove mirror APIs, group_image_add() and group_image_remove() that
are never called as we don't allow adding/removing images to/from a
mirrored group.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd: disallow add/remove image to/from mirror enabled group
Ramana Raja [Sun, 2 Feb 2025 17:28:52 +0000 (12:28 -0500)]
librbd: disallow add/remove image to/from mirror enabled group

Do not allow adding/removing images to/from a group that is configured
for mirroring.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agorbd-mirror: fix crashes in unittest_rbd_mirror
Vinay Bhaskar Varada [Wed, 19 Feb 2025 08:24:15 +0000 (13:54 +0530)]
rbd-mirror: fix crashes in unittest_rbd_mirror

Signed-off-by: Vinay Bhaskar Varada <vvarada@redhat.com>
2 months agorbd-mirror: do not move the images to trash while the disabling is in progress
Prasanna Kumar Kalever [Tue, 18 Feb 2025 07:42:58 +0000 (13:12 +0530)]
rbd-mirror: do not move the images to trash while the disabling is in progress

Images cannot be moved to trash if the state is disabling, because it's a
transient state where some of the images might have got the opportunity to
disable while others in the group might still be enabled, waiting for
their opportunity, while a group disable is in progress.

So we wait until the state DISABLING moves to the next state, and then see if
there are any stale images to move into the trash queue later.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: bootstrap wait for previous disabling group to cleanup
Prasanna Kumar Kalever [Mon, 17 Feb 2025 10:32:06 +0000 (16:02 +0530)]
rbd-mirror: bootstrap wait for previous disabling group to cleanup

We were seeing a case where the following operations are done:
1. The daemon is stopped on the secondary
2. Mirroring on the group is disabled
3. Image[s] are added to/removed from the group
4. The group is re-enabled for mirroring
5. The mirroring daemon is brought back to life

From the handling:
1. Two GroupReplayers are started by the InstanceReplayer, one for the old group
   and one for the new group (not surprisingly, both deal with the same pool images)
2. The GroupReplayer for the old group instance enters
   group_replayer::BootstrapRequest, notices remote_group_id is not found, and
   starts cleaning up the group, """tries to remove local group and all the
   images. Finally returns to GroupReplayer, stops the GroupReplayer setting
   the state as stopped with description group removed and finally unregisters
   admin socket hook."""
3. On the other hand, the GroupReplayer for the new group instance runs
   concurrently with the old one, figures out that a local group_id with that
   name exists and """tries to remove local group and all the images. Finally
   returns to GroupReplayer, stops the GroupReplayer setting the state as
   stopped with description group removed and finally unregisters admin socket
   hook."""

You can see that 2 and 3 end up in the same situation because of the
concurrent behaviour, i.e. one has to add the group with a name and create
images in the pool, whereas the other has to remove the group with the same
name from the same pool.

Thanks to Ilya for the suggestion here; following it, the fix is simple.
The way this is handled for standalone images is that the second replayer
(i.e. (3)) sees that the image is in MIRROR_IMAGE_STATE_DISABLING state and
backs off (i.e. the second group waits and retries later).

If the second replayer backs off with ERESTART, the first replayer should
eventually clean up the old group which would allow the second replayer to
proceed with creating a new group.
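
The back-off described above can be modelled roughly as follows. This is a simplified sketch and not the rbd-mirror implementation: the GroupState enum, bootstrap_new_group() and attempts_until_bootstrap() are hypothetical names, and the assumption is that the second replayer can observe a DISABLING state on the old local group.

```python
import errno
from enum import Enum
from typing import Iterable, Optional

# errno.ERESTART is not defined on every platform; 85 is the Linux value.
ERESTART = getattr(errno, "ERESTART", 85)


class GroupState(Enum):
    """Hypothetical stand-in for the local mirror group state."""
    ENABLED = "enabled"
    DISABLING = "disabling"
    ABSENT = "absent"


def bootstrap_new_group(local_state: GroupState) -> int:
    """Return 0 to proceed with bootstrap, or -ERESTART to back off."""
    if local_state is GroupState.DISABLING:
        # The old group is still being cleaned up by the first replayer.
        return -ERESTART
    return 0


def attempts_until_bootstrap(observed: Iterable[GroupState]) -> Optional[int]:
    """Retry over successive observed states; return the attempt that succeeded."""
    for attempt, state in enumerate(observed, start=1):
        if bootstrap_new_group(state) == 0:
            return attempt
    return None


# The first replayer finishes the cleanup between the second and third attempt.
states = [GroupState.DISABLING, GroupState.DISABLING, GroupState.ABSENT]
print(attempts_until_bootstrap(states))  # 3
```

The point of the sketch is only that the second replayer never removes anything while the old group is in the transient state; it retries until the first replayer is done.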

fixes: issue#27

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd_mirror: cleanup group status keys in the rbd_mirroring object
Prasanna Kumar Kalever [Fri, 14 Feb 2025 10:51:10 +0000 (16:21 +0530)]
rbd_mirror: cleanup group status keys in the rbd_mirroring object

Keys & Values for "gremote_status_global_*" and "gstatus_global_*" are
getting re-added to the rbd_mirroring object after they were removed at
group disable time, because group_status_set() doesn't defend against
disabled groups today.

Also, librbd::cls_client::mirror_group_status_remove_down() was added in
the code but not leveraged, hence code for unhappy path cleanup was added
as part of MirrorStatusWatcher::init(), just as it calls
librbd::cls_client::mirror_image_status_remove_down() today.

fixes: issue#16

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agoqa/workunits/rbd: enable more tests that pass with latest changes
John Agombar [Wed, 12 Feb 2025 15:41:30 +0000 (15:41 +0000)]
qa/workunits/rbd: enable more tests that pass with latest changes

Also,
- Support image multiplier to allow tests to run with more images
- Change cli to allow image multiplier to be specified

Signed-off-by: John Agombar <agombar@uk.ibm.com>
2 months agolibrbd: fixed a crash in GroupUnlinkPeerRequest
N Balachandran [Thu, 13 Feb 2025 04:58:18 +0000 (10:28 +0530)]
librbd: fixed a crash in GroupUnlinkPeerRequest

Fixed a crash caused by using the wrong ceph context when using the
python bindings to create mirror group snaps.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agorbd_mirror: fix resync failure
Prasanna Kumar Kalever [Wed, 12 Feb 2025 14:24:44 +0000 (19:54 +0530)]
rbd_mirror: fix resync failure

Steps to reproduce:
$ rbd --cluster site-b mirror group promote test_pool/test_group --force
$ rbd --cluster site-a mirror group demote test_pool/test_group
$ rbd --cluster site-a mirror group resync test_pool/test_group

$ rbd --cluster site-b mirror group status test_pool/test_group

The group snapshots are not re-syncing, and the group status always shows
the image snap as syncing.

fixes: issue#11

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd/deep_copy: rename .group snapshot according with local information
Prasanna Kumar Kalever [Wed, 12 Feb 2025 06:18:08 +0000 (11:48 +0530)]
librbd/deep_copy: rename .group snapshot according with local information

.group snapshots are the image snapshots that are part of user group snapshots.

Today these snapshots follow the naming:
".group." + primary_pool_id + "_" + primary_group_id + "_" + group_snap_id

[Note: group_snap_id is the same on primary and secondary]

With this change the above naming changes to:
".group." + local_pool_id + "_" + local_group_id + "_" + group_snap_id

i.e. pool_id and group_id differ on primary and secondary, and hence the names differ.
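
As a small illustration of the naming scheme, the sketch below builds both names from plain strings. The helper name group_image_snap_name and all identifier values are made up for the example, and the ids are assumed to be already formatted as strings.

```python
def group_image_snap_name(pool_id: str, group_id: str, group_snap_id: str) -> str:
    """Build a '.group.' image snapshot name from the given identifiers."""
    return ".group." + pool_id + "_" + group_id + "_" + group_snap_id


# Hypothetical identifiers: pool and group ids differ per cluster,
# while the group snapshot id is shared.
primary = group_image_snap_name("2", "10a4d9f3c1b2", "104b430672cf")
secondary = group_image_snap_name("5", "3f7e21aa9c04", "104b430672cf")

print(primary)    # .group.2_10a4d9f3c1b2_104b430672cf
print(secondary)  # .group.5_3f7e21aa9c04_104b430672cf
```

The two names share only the trailing group_snap_id, which is why a name built from primary details does not identify the snapshot on the secondary.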

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd_mirror: update local {group_pool, group_pool} in ImageSnapshotNamespaceGroup
Prasanna Kumar Kalever [Tue, 11 Feb 2025 17:55:54 +0000 (23:25 +0530)]
rbd_mirror: update local {group_pool, group_pool} in ImageSnapshotNamespaceGroup

* Fix the user group snapshot not moving to complete when the pool_id differs
  between remote and local.
* The image snapshot namespace ImageSnapshotNamespaceGroup is copied
  from the remote src directly to the local dst, and {group_pool, group_pool}
  still hold the remote details. This fix updates the namespace in the
  image snapshot.

fixes: issue#6

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd: fix group snapshot unlink
N Balachandran [Mon, 10 Feb 2025 15:10:07 +0000 (20:40 +0530)]
librbd: fix group snapshot unlink

Changes to the group mirror snapshot unlink:
- Fixes the group mirror snapshot unlink to behave like the
  image mirror unlink.
- Renames UnlinkGroupPeerRequest to GroupUnlinkPeerRequest
  and moves it into librbd/mirror/snapshot.
- Modifies prepare_group_images() to return the mirror_peer_uuids
  which are then passed as an argument to GroupUnlinkPeerRequest.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agoqa/workunits/rbd: improvements to smoke tests
John Agombar [Thu, 6 Feb 2025 12:03:36 +0000 (12:03 +0000)]
qa/workunits/rbd: improvements to smoke tests

- Remove dynamic group behaviour in rbd_mirror_group.sh tests
- Add test for group enable/disable after force promote
- Test new fields in group info cmd

Signed-off-by: John Agombar <agombar@uk.ibm.com>
2 months agorbd-mirror: add peer_uuids for non-primary demoted group snapshot
Prasanna Kumar Kalever [Mon, 10 Feb 2025 14:16:47 +0000 (19:46 +0530)]
rbd-mirror: add peer_uuids for non-primary demoted group snapshot

GroupReplayer should add the peer uuid for a group snapshot if it is a
non-primary demoted snapshot; otherwise this snapshot will be
unconditionally unlinked later, as it doesn't have a peer uuid, leading to
split-brain scenarios.

Credits to N Balachandran <nithya.balachandran@ibm.com> for the find.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agolibrbd: fix remove_interim_snapshots()
N Balachandran [Fri, 7 Feb 2025 04:41:21 +0000 (10:11 +0530)]
librbd: fix remove_interim_snapshots()

Fixed remove_interim_snapshots() to remove the image
snaps.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agorbd: rbd group info now displays mirroring info
N Balachandran [Thu, 6 Feb 2025 14:33:02 +0000 (20:03 +0530)]
rbd: rbd group info now displays mirroring info

The "rbd group info" command will now display the mirroring mode, global-id
and primary status for mirror enabled groups.

Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
2 months agoqa/workunits/rbd: updates to mirror group bash scripts
John Agombar [Thu, 30 Jan 2025 13:04:13 +0000 (13:04 +0000)]
qa/workunits/rbd: updates to mirror group bash scripts

- support cli parameters to specify the test to run
- support cli parameter to specify the number of times to repeat the test
- new tests
- added RBD_MIRROR_NEW_IMPLICIT_BEHAVIOUR env variable in preparation for
  changes to group snapshot behaviour

Signed-off-by: John Agombar <agombar@uk.ibm.com>
2 months agorbd-mirror: improvements to unlink group snapshot
Prasanna Kumar Kalever [Thu, 6 Feb 2025 06:01:48 +0000 (11:31 +0530)]
rbd-mirror: improvements to unlink group snapshot

1. If the group snapshot is syncing, do not remove the mirror peer uuid of the
   last complete snapshot [i.e. currently incomplete - 1] (on remote)
2. If the group snapshot is synced, do not remove the mirror peer uuid of its
   respective snapshot on remote yet.

Removing either will lead to split brain issues.
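
One way to read the two rules is as a function that picks the remote group snapshot whose mirror peer uuid has to be kept. This is a sketch of that reading only, not the rbd-mirror code; the function name, the list-of-ids model and the oldest-to-newest ordering are assumptions.

```python
from typing import List, Optional


def protected_remote_snapshot(
    snap_ids: List[str], current: int, synced: bool
) -> Optional[str]:
    """Return the remote group snapshot whose mirror peer uuid must be kept.

    snap_ids is ordered oldest to newest and current is the index of the
    snapshot the replayer is working on.
    """
    if synced:
        # Rule 2: the snapshot that was just synced stays linked for now.
        return snap_ids[current]
    if current > 0:
        # Rule 1: while syncing, keep the last complete snapshot (current - 1).
        return snap_ids[current - 1]
    # Nothing older exists to protect while the very first snapshot syncs.
    return None


snaps = ["snap1", "snap2", "snap3"]  # hypothetical ids
print(protected_remote_snapshot(snaps, 2, synced=False))  # snap2
print(protected_remote_snapshot(snaps, 2, synced=True))   # snap3
```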

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agorbd-mirror: synchronize image replayer and group replayer
Prasanna Kumar Kalever [Thu, 6 Feb 2025 08:10:01 +0000 (13:40 +0530)]
rbd-mirror: synchronize image replayer and group replayer

From here on, the image replayer waits for the group_replayer to set the remote
image snap end id before it proceeds with the image snapshot copy.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2 months agomgr/rbd_support: create mirror gp snap asynchronously
Ramana Raja [Fri, 24 Jan 2025 20:04:05 +0000 (15:04 -0500)]
mgr/rbd_support: create mirror gp snap asynchronously

... in the mirror group snapshot scheduler.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd: add C and Python bindings for async mirror group snapshot create
Ramana Raja [Wed, 22 Jan 2025 17:26:57 +0000 (12:26 -0500)]
librbd: add C and Python bindings for async mirror group snapshot create

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/group/UnlinkPeerGroupRequest: convert sync to async calls
Ramana Raja [Sun, 26 Jan 2025 06:30:50 +0000 (01:30 -0500)]
librbd/group/UnlinkPeerGroupRequest: convert sync to async calls

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/mirror/snapshot: add async create mirror group snapshot
Ramana Raja [Sun, 12 Jan 2025 21:01:23 +0000 (16:01 -0500)]
librbd/mirror/snapshot: add async create mirror group snapshot

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd/mirror/snapshot: add async group prepare images
Ramana Raja [Sun, 12 Jan 2025 20:56:20 +0000 (15:56 -0500)]
librbd/mirror/snapshot: add async group prepare images

... state machine class.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd: add C and python bindings for async mirror gp snap info
Ramana Raja [Fri, 24 Jan 2025 23:34:06 +0000 (18:34 -0500)]
librbd: add C and python bindings for async mirror gp snap info

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd: add async state machine for "mirror_group_get_info"
Ramana Raja [Fri, 24 Jan 2025 22:29:51 +0000 (17:29 -0500)]
librbd: add async state machine for "mirror_group_get_info"

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agomgr/rbd_support: disallow scheduling mirror snapshots
Ramana Raja [Thu, 12 Dec 2024 21:38:29 +0000 (16:38 -0500)]
mgr/rbd_support: disallow scheduling mirror snapshots

... of images that are part of a group.

Creating a global, pool or namespace level mirror snapshot
schedule shouldn't schedule mirror snapshots of images that
are part of a group and reside in the pool or namespace.

Also disallow directly scheduling mirror image snapshots on
images that are part of a group.
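
The filtering described above could look roughly like this. It is a simplified model and not the rbd_support module code: the Image dataclass and images_to_schedule() are hypothetical names used only to show which images a pool or namespace level schedule would pick up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    """Hypothetical view of an image as seen by the scheduler."""
    name: str
    group: Optional[str] = None  # owning group name, None if standalone


def images_to_schedule(images: List[Image]) -> List[str]:
    """Return names of images eligible for mirror image snapshot scheduling."""
    return [img.name for img in images if img.group is None]


pool_images = [
    Image("standalone1"),
    Image("test_image1", group="test_group"),
    Image("test_image2", group="test_group"),
    Image("standalone2"),
]
print(images_to_schedule(pool_images))  # ['standalone1', 'standalone2']
```

Group member images are left to the mirror group snapshot scheduler in this model.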

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agoqa/workunits/rbd: add basic tests for mirror group snapshot scheduler
Ramana Raja [Tue, 5 Nov 2024 16:12:54 +0000 (11:12 -0500)]
qa/workunits/rbd: add basic tests for mirror group snapshot scheduler

Add tests to check the basic functionality of the
mirror_group_snapshot_schedule module. Check that
- `rbd mirror group snapshot schedule add/rm/status/ls` commands work
- the module can recover from blocklisting of its client and continue
  to process requests

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agotools/rbd: add CLI to call into mirror group snapshot scheduler
Ramana Raja [Wed, 23 Oct 2024 17:20:37 +0000 (13:20 -0400)]
tools/rbd: add CLI to call into mirror group snapshot scheduler

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agopybind/mgr/rbd_support: add scheduler for mirror group snapshots
Ramana Raja [Sun, 20 Oct 2024 15:54:55 +0000 (11:54 -0400)]
pybind/mgr/rbd_support: add scheduler for mirror group snapshots

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd: add API to fetch group name using group ID
Ramana Raja [Fri, 1 Nov 2024 22:02:16 +0000 (18:02 -0400)]
librbd: add API to fetch group name using group ID

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agolibrbd: add python and C bindings for group_list2
Ramana Raja [Sun, 20 Oct 2024 23:50:04 +0000 (19:50 -0400)]
librbd: add python and C bindings for group_list2

... that lists IDs and names of groups in a pool.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2 months agopybind/rbd: add interface to create mirror group snapshot
Ramana Raja [Tue, 6 Aug 2024 22:34:59 +0000 (18:34 -0400)]
pybind/rbd: add interface to create mirror group snapshot

... synchronously.

Signed-off-by: Ramana Raja <rraja@redhat.com>