Problem:
GroupCreatePrimaryRequest doesn't remove the group snapshot when group
snapshot creation fails in notify_quiesce(). As a result, INCOMPLETE
snapshots from previous failed attempts are never cleaned up.
Log snippet:
librbd::watcher::Notifier: 0x7fbdac0168b0 handle_notify: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest: handle_notify_quiesce: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest: notify_unquiesce:
librbd::watcher::Notifier: 0x7fbda83c59a0 handle_notify: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest: handle_notify_unquiesce: r=-110
librbd::mirror::snapshot::GroupCreatePrimaryRequest: handle_notify_unquiesce: failed to notify the unquiesce requests: (110) Connection timed out
librbd::mirror::snapshot::GroupCreatePrimaryRequest: close_images:
librbd::mirror::snapshot::GroupCreatePrimaryRequest: handle_close_images: r=0
librbd::mirror::snapshot::GroupCreatePrimaryRequest: finish: r=-110
When notify_quiesce() fails, the request goes straight to notify_unquiesce(),
close_images() and finish(). This skips the remove-snap path that cleans up
the snapshot and leaves behind an INCOMPLETE snapshot entry.
Solution:
Ensure remove_snap_metadata() is executed when notify_quiesce() fails, as in
the scenario above, so that the INCOMPLETE snapshot is consistently cleaned up.
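A minimal C++ sketch of the branch taken in handle_notify_quiesce() before and
after this change. The NextStep enum, the after_notify_quiesce() helper and the
flag value are hypothetical stand-ins, not librbd API; only the condition
itself is taken from the diff below.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for SNAP_CREATE_FLAG_IGNORE_NOTIFY_QUIESCE_ERROR;
// the real constant's value is not shown in this commit.
constexpr uint64_t kIgnoreNotifyQuiesceError = 1;

// Next step taken by the (simplified) request once notify_quiesce() completes.
enum class NextStep {
  Continue,            // quiesce succeeded, or its error is ignored by flag
  NotifyUnquiesce,     // pre-fix error path: snapshot metadata is left behind
  RemoveSnapMetadata,  // post-fix error path: INCOMPLETE snapshot is cleaned up
};

// Mirrors the error check in handle_notify_quiesce(); `fixed` selects the
// behaviour before (false) or after (true) this change.
NextStep after_notify_quiesce(int r, uint64_t snap_create_flags, bool fixed) {
  if (r < 0 && (snap_create_flags & kIgnoreNotifyQuiesceError) == 0) {
    return fixed ? NextStep::RemoveSnapMetadata : NextStep::NotifyUnquiesce;
  }
  return NextStep::Continue;
}
```

With r=-110 (ETIMEDOUT, as in the log above) and no ignore flag, the sketch
routes the pre-fix request to NotifyUnquiesce and the fixed one to
RemoveSnapMetadata; with the ignore flag set, both continue as before.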
Note:
Another issue was identified and fixed in GroupUnlinkPeerRequest around
remove_peer_uuid(): for an INCOMPLETE snapshot, group_snap_set() is expected
to return EEXIST, so such snapshots are now removed with
remove_group_snapshot() instead of going through remove_peer_uuid().
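A simplified C++ sketch of the unlink decision made in the second hunk. The
GroupSnapSketch struct, the state enum and choose_unlink_action() are
hypothetical stand-ins for the cls::rbd types and the request's control flow,
not the real definitions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for cls::rbd group snapshot state and the mirror
// namespace's peer list.
enum class GroupSnapState { Incomplete, Complete };

struct GroupSnapSketch {
  GroupSnapState state;
  std::vector<std::string> mirror_peer_uuids;
};

enum class UnlinkAction { RemoveGroupSnapshot, RemovePeerUuid };

// Mirrors the branch in the diff: a snapshot with no remaining peers, or one
// that never completed, is removed outright. Only non-INCOMPLETE snapshots
// that are still linked to peers go through remove_peer_uuid(), because
// group_snap_set() is expected to return EEXIST for an INCOMPLETE snapshot.
UnlinkAction choose_unlink_action(const GroupSnapSketch& snap) {
  if (snap.mirror_peer_uuids.empty() ||
      snap.state == GroupSnapState::Incomplete) {
    return UnlinkAction::RemoveGroupSnapshot;
  }
  return UnlinkAction::RemovePeerUuid;
}
```

For example, an INCOMPLETE snapshot that still lists a peer UUID now maps to
RemoveGroupSnapshot rather than RemovePeerUuid.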
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Resolves: rhbz#2415401
if (r < 0 &&
(m_snap_create_flags & SNAP_CREATE_FLAG_IGNORE_NOTIFY_QUIESCE_ERROR) == 0) {
+ lderr(m_cct) << "failed to notify the quiesce requests: "
+ << cpp_strerror(r) << dendl;
m_ret_code = r;
- notify_unquiesce();
+ remove_snap_metadata();
return;
}
const auto& ns = std::get<cls::rbd::GroupSnapshotNamespaceMirror>(
group_snap.snapshot_namespace);
- if (ns.mirror_peer_uuids.empty()) {
+ if (ns.mirror_peer_uuids.empty() ||
+ group_snap.state == cls::rbd::GROUP_SNAPSHOT_STATE_INCOMPLETE) {
remove_group_snapshot(group_snap);
} else {
+ // Note: avoid calling remove_peer_uuid() for INCOMPLETE snapshots, as
+ // group_snap_set() returns an EEXIST error for them
remove_peer_uuid(group_snap, mirror_peer_uuid);
}
}