From: Ilya Dryomov <idryomov@gmail.com>
Date: Sun, 1 Mar 2026 21:55:52 +0000 (+0100)
Subject: qa/workunits/rbd: short-circuit status() if "ceph -s" fails
X-Git-Tag: v21.0.0~147^2~1
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=82717e43a08a1262987f5e271fd72d4433c4fb3b;p=ceph.git

qa/workunits/rbd: short-circuit status() if "ceph -s" fails

In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario, every command invoked from the loop body for the
stopped cluster is going to time out anyway, so skip them.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
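A minimal, self-contained sketch of the "|| continue" short-circuit
introduced by the hunk below. fake_ceph and status_sketch are
hypothetical stand-ins invented for illustration (the real helper
calls the ceph CLI with CEPH_ARGS=''), and the cluster names and the
assumption that cluster2 is the one that is down are made up.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real "ceph" CLI: pretends that
# cluster2 is down, so every command against it fails.
fake_ceph() {
    local cluster=$2    # invoked as: fake_ceph --cluster <name> <command...>
    shift 2
    if [ "${cluster}" = "cluster2" ]; then
        echo "${cluster}: $* failed" >&2
        return 1
    fi
    echo "${cluster}: $* ok"
}

# Sketch of the loop shape: if the first command fails, skip the
# remaining commands for that cluster and move on to the next one.
status_sketch() {
    local cluster
    for cluster in cluster1 cluster2
    do
        echo "${cluster} status"
        fake_ceph --cluster "${cluster}" -s || continue

        echo "${cluster} service status"
        fake_ceph --cluster "${cluster}" service dump
        fake_ceph --cluster "${cluster}" service status
        echo
    done
}
```

Calling status_sketch prints the full details for cluster1 but only
the "cluster2 status" header for cluster2, because the failing "-s"
call triggers "continue" before the service commands are reached.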

diff --git a/qa/workunits/rbd/rbd_mirror_helpers.sh b/qa/workunits/rbd/rbd_mirror_helpers.sh
index a069853fb71..f5d7fe92624 100755
--- a/qa/workunits/rbd/rbd_mirror_helpers.sh
+++ b/qa/workunits/rbd/rbd_mirror_helpers.sh
@@ -514,7 +514,11 @@ status()
     for cluster in ${CLUSTER1} ${CLUSTER2}
     do
         echo "${cluster} status"
-        CEPH_ARGS='' ceph --cluster ${cluster} -s
+        # if "ceph -s" fails, assume that the cluster is broken or
+        # unavailable and skip gathering details for it
+        CEPH_ARGS='' ceph --cluster ${cluster} -s || continue
+
+        echo "${cluster} service status"
         CEPH_ARGS='' ceph --cluster ${cluster} service dump
         CEPH_ARGS='' ceph --cluster ${cluster} service status
         echo