librbd: support group rollback to a snapshot with different membership

support rollback to a snapshot with a different group membership, which is
required for the semi-dynamic groups feature.

one scenario where this is needed is when a force-promote operation is issued
on a secondary node while it is synchronizing a snapshot that contains
add-image information.

this commit also adds the required test cases for the semi-dynamic groups
add-image capability, along with a dedicated test covering rollback after
an image has been added to the group.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

rbd-mirror: implement semi-dynamic groups with support for adding images

this allows images to be added to an already enabled mirror group without
disabling and re-enabling the group. As a result, it avoids the overhead of
re-mirroring the entire group over the network.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

rbd-mirror: allow resync while a group snapshot is still syncing

currently a resync is rejected while a group snapshot sync is still in
progress and is only allowed once the sync has completed. This means that if
snapshot synchronization becomes stuck for any reason, a resync cannot be
triggered, resulting in an undesirable operational limitation.

this change enables resync requests to be processed even when a group snapshot
is still syncing, allowing resync in the middle of syncing a group snapshot.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Merge pull request #67164 from imran-imtiaz/wip-pybind-mirror-group-control

pybind/rbd: add Python bindings for mirror group control

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

pybind/rbd: add Python bindings for mirror group control

Add methods to the rbd.Group class for mirror group operations:
- mirror_group_enable(mode) / mirror_group_disable(force)
- mirror_group_promote(force) / mirror_group_demote()
- mirror_group_resync()
- mirror_group_get_global_status()
- mirror_group_get_instance_id()

Also adds the required Cython declarations in c_rbd.pxd for
rbd_mirror_group_status_state_t, rbd_mirror_group_site_status_t,
and rbd_mirror_group_global_status_t structs.
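
As a rough illustration of how the new methods might be combined, the sketch
below promotes a group and then reads its global status. It assumes only the
method names listed above; FakeGroup is a hypothetical stand-in so the example
runs without a cluster, and the status return value shown is a placeholder,
not the real binding's shape.

```python
def promote_and_report(group, force=False):
    """Promote a mirror group, then return whatever the status call yields.

    `group` is assumed to expose the methods named in this commit
    (mirror_group_promote, mirror_group_get_global_status).
    """
    group.mirror_group_promote(force)
    return group.mirror_group_get_global_status()


class FakeGroup:
    """Hypothetical stand-in for rbd.Group; it only records calls."""

    def __init__(self):
        self.calls = []

    def mirror_group_promote(self, force):
        self.calls.append(("promote", force))

    def mirror_group_get_global_status(self):
        self.calls.append(("status",))
        # Placeholder value; the real status structure is not shown here.
        return {"placeholder": True}


fake = FakeGroup()
status = promote_and_report(fake, force=True)
```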

Fixes: https://tracker.ceph.com/issues/74707
Signed-off-by: Imran Imtiaz <imran.imtiaz@uk.ibm.com>

Merge pull request #66600 from pkalever/fix-resync-on-sync

rbd-mirror: allow resync while a group snapshot is still syncing

Reviewed-by: VinayBhaskar-V <vvarada@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>

rbd-mirror: allow resync while a group snapshot is still syncing

currently a resync is rejected while a group snapshot sync is still in
progress and is only allowed once the sync has completed. This means that if
snapshot synchronization becomes stuck for any reason, a resync cannot be
triggered, resulting in an undesirable operational limitation.

this change enables resync requests to be processed even when a group snapshot
is still syncing, allowing resync in the middle of syncing a group snapshot.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Merge pull request #66653 from pkalever/remove-creating-snaps-on-restart

rbd-mirror: fix stuck to sync mirror group snaps on restart of daemon

Reviewed-by: VinayBhaskar-V <vvarada@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>

cleanup: minor improvements throughout the replayer

Define the following routines, which make calls to image replayers:
prune_image_snapshot()
get_replayers_by_image_id()
set_image_replayer_end_limits()

this commit starts using them.

Also use get_replayers_by_image_id() in other places of the group replayer.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

rbd-mirror: prune group snapshots stuck in CREATING state on restart

after a daemon restart, prune the entire group snapshot if it remains in
GROUP_SNAPSHOT_STATE_CREATING. This aligns group snapshot handling with the
image replayer logic and ensures that all member image snapshots are cleanly
deleted and recreated.

This is required because some image snapshots in the group may not have started
object copying prior to the restart, which can otherwise lead to missing image
state, object-map, or related metadata.

Also, add the necessary tests to validate interrupted synchronization during
group snapshot syncing. Specifically, cover the following scenarios:
Scenario 1: The snapshot on the secondary is in the creating phase when the
daemon is restarted.
Scenario 2: The snapshot on the secondary is in the created phase when the
daemon is restarted.
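
The two scenarios can be summarized with a small decision sketch. It is not
the replayer code: the enum and the action strings are hypothetical names
chosen for illustration, and only the state names come from this log.

```python
from enum import Enum


class GroupSnapState(Enum):
    CREATING = "GROUP_SNAPSHOT_STATE_CREATING"
    CREATED = "GROUP_SNAPSHOT_STATE_CREATED"


def action_on_restart(state):
    """Pick what to do with a group snapshot found after a daemon restart."""
    if state is GroupSnapState.CREATING:
        # Scenario 1: some member image snapshots may not have started
        # object copying, so the whole group snapshot is pruned and recreated.
        return "prune_group_snapshot"
    # Scenario 2: all member image snapshots are already created, so they
    # are kept as they are.
    return "keep_image_snapshots"
```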

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

rbd-mirror: avoid deleting image snapshots that are part of a group snapshot

On daemon restart, the image replayer currently deletes and recreates image
snapshots if object copying has not yet started, in order to avoid missing
image state such as object-map or metadata.

This logic is unnecessary for image snapshots that are part of mirror group
snapshots. By the time a group snapshot reaches GROUP_SNAPSHOT_STATE_CREATED,
all member image snapshots are already guaranteed to be in the CREATED state.
Deleting such image snapshots provides no benefit and can cause group
snapshots to become stuck waiting for them, as currently happens.

Skip image snapshot deletion when the snapshot is part of a group snapshot.
A follow-up commit will address handling group snapshots that remain in
GROUP_SNAPSHOT_STATE_CREATING across a daemon restart by deleting the group
snapshot and allowing the sync to recreate it as a whole.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

cleanup: simplify group image snapshot validation

validating an image snapshot's association with a group snapshot requires
checking either a valid group_spec or the presence of a group_snap_id
in cls::rbd::MirrorSnapshotNamespace.

this commit removes the redundant validation that checks for both and relies
on the minimal condition of checking for a valid group_spec.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Merge pull request #66535 from pkalever/restore-readonly-check

librbd: restore readonly image check in snap_remove()

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

Merge pull request #66534 from pkalever/avoid-unnecessary-log

rbd-mirror: suppress unnecessary log message during snapshot unlink

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

librbd: restore readonly image check in snap_remove()

The check preventing snapshot removal on read-only images was previously
commented out. This commit restores the original behavior to ensure that
snap_remove() correctly rejects operations on images that are not writable.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

rbd-mirror: suppress unnecessary log message during snapshot unlink

remote snapshots without a mirror peer UUID are filtered out early. Once the
peer UUID is removed from a remote snapshot, it no longer appears in
m_remote_group_snaps locally. As a result, mirror_group_snapshot_unlink_peer()
will not find that snapshot in m_remote_group_snaps, and this condition is
now expected.

avoid printing the log message that warns about the snapshot not being present,
as the snapshot is not actually missing (it is only filtered out) and the
warning can be misleading.
Moreover, the message would otherwise be printed repeatedly until the local
snapshot is eventually removed, creating unnecessary noise in the logs.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Merge pull request #66424 from VinayBhaskar-V/wip-fix-test

qa/workunits/rbd: fix check_snapshot_info in rbd_groups.sh

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

qa/workunits/rbd: fix check_snapshot_info in rbd_groups.sh

Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>

Merge pull request #65164 from VinayBhaskar-V/wip-add-complete-field

rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field

Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

rbd-mirror: integration of the new GroupSnapshotNamespaceMirror::complete field

This commit introduces the new field **complete**, of type **MirrorGroupSnapshotCompleteState** enum,
to the GroupSnapshotNamespaceMirror structure. This change is necessary to align behavior of
mirror group snapshots with that of mirror image snapshots, allowing for a precise differentiation
between a group snapshot that has been created and one that has been fully synced.

**1. Handling Old-Style Snapshots**

Decoding Old Snapshots: The original GroupSnapshotNamespaceMirror structure lacked the complete field,
which implicitly defaulted to a bool value of false upon initialization.
When an old snapshot (lacking the complete field) is decoded by an upgraded client,
the implicit default value maps to MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED.

Completion Check: A snapshot is identified as old by checking its complete field, i.e.
complete == MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED. For such snapshots, sync completion
is determined by checking the state field,
i.e. state == GROUP_SNAPSHOT_STATE_CREATED.

During an upgrade where **OSDs have not yet been updated**, the new client will be forced to create
snapshots using the old style. These snapshots will be initialized with MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED
and will keep that value to prevent immediate, incorrect cleanup by the old OSDs. In this case the
state field is set to **GROUP_SNAPSHOT_STATE_CREATED** only after the snapshot has completed its sync.

**2. Handling New-Style Snapshots**

New snapshots are initialized with complete == **MIRROR_GROUP_SNAPSHOT_INCOMPLETE**,
state == GROUP_SNAPSHOT_STATE_CREATING. The group snapshot's state is marked as GROUP_SNAPSHOT_STATE_CREATED
as soon as its metadata is fully available and stored.

Completion Check: The snapshot's sync is confirmed only when both conditions hold:
complete == MIRROR_GROUP_SNAPSHOT_COMPLETE and state == GROUP_SNAPSHOT_STATE_CREATED.

This approach ensures seamless transition and compatibility, allowing the system to correctly interpret the
synchronization status of both old and newly created group snapshots.
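
The two completion checks described above can be written as a single predicate.
This is a minimal sketch in Python rather than the actual C++ change; the enum
classes are hypothetical mirrors of the constants named in this message.

```python
from enum import Enum


class Complete(Enum):
    IF_CREATED = "MIRROR_GROUP_SNAPSHOT_COMPLETE_IF_CREATED"
    INCOMPLETE = "MIRROR_GROUP_SNAPSHOT_INCOMPLETE"
    COMPLETE = "MIRROR_GROUP_SNAPSHOT_COMPLETE"


class State(Enum):
    CREATING = "GROUP_SNAPSHOT_STATE_CREATING"
    CREATED = "GROUP_SNAPSHOT_STATE_CREATED"


def is_sync_complete(complete, state):
    """Return True when a group snapshot counts as fully synced."""
    if complete is Complete.IF_CREATED:
        # Old-style snapshot: the state field alone signals completion.
        return state is State.CREATED
    # New-style snapshot: both the complete field and the state must agree.
    return complete is Complete.COMPLETE and state is State.CREATED
```

With this shape, a new-style snapshot in GROUP_SNAPSHOT_STATE_CREATED but still
MIRROR_GROUP_SNAPSHOT_INCOMPLETE is treated as created but not yet synced.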

Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>

Merge pull request #66315 from pkalever/clean-in-complete-snaps

librbd: fix incomplete group snapshot not being removed on creation failure

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

librbd: fix incomplete group snapshot not being removed on creation failure

Problem:
GroupCreatePrimaryRequest doesn't remove group snapshot when group
snapshot creation encounters an error in notify_quiesce(). As a result,
INCOMPLETE snapshots from previous failed attempts remain uncleaned.

Log snippet:
librbd::watcher::Notifier: 0x7fbdac0168b0 handle_notify: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_notify_quiesce: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  notify_unquiesce:
librbd::watcher::Notifier: 0x7fbda83c59a0 handle_notify: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_notify_unquiesce: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_notify_unquiesce: failed to notify the unquiesce requests: (110) Connection timed out
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  close_images:
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  handle_close_images: r=0
librbd::mirror::snapshot::GroupCreatePrimaryRequest:  finish: r=-110

When snapshot creation fails, the remove snap path that cleans the snapshot is
skipped, leaving behind INCOMPLETE snapshot entries.

Solution:
Ensure remove_snap_metadata() is executed when quiesce fails, as in the log
above, allowing INCOMPLETE snapshot to be consistently cleaned up.
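
The idea of the fix can be sketched as "clean up on any failed step". The
helper below is a simplified Python illustration, not the request state
machine itself; run_with_cleanup and the step callables are hypothetical, and
only remove_snap_metadata is a name taken from this message.

```python
import errno


def run_with_cleanup(steps, remove_snap_metadata):
    """Run steps in order; on the first negative result, clean up and stop."""
    for step in steps:
        r = step()
        if r < 0:
            # Failure path: remove the INCOMPLETE snapshot metadata
            # instead of leaving it behind.
            remove_snap_metadata()
            return r
    return 0


removed = []
result = run_with_cleanup(
    # The second step stands in for a quiesce notification that times out.
    [lambda: 0, lambda: -errno.ETIMEDOUT],
    lambda: removed.append("snap-metadata"),
)

untouched = []
ok = run_with_cleanup([lambda: 0], lambda: untouched.append("snap-metadata"))
```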

Note:
Another issue was identified and fixed around GroupUnlinkPeerRequest::remove_peer_uuid():
for an INCOMPLETE snapshot, group_snap_set() is expected to return an EEXIST
error, and that is now handled.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Merge pull request #65958 from VinayBhaskar-V/wip-sync-demote

rbd-mirror: allow incomplete group demote snapshot to sync after rbd-mirror daemon restart

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>