git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commitdiff
qa/workunits/rbd: short-circuit status() if "ceph -s" fails
author Ilya Dryomov <idryomov@gmail.com>
Sun, 1 Mar 2026 21:55:52 +0000 (22:55 +0100)
committer Ilya Dryomov <idryomov@gmail.com>
Tue, 3 Mar 2026 10:39:59 +0000 (11:39 +0100)
In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario all commands that are invoked from the loop body
are going to time out anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
qa/workunits/rbd/rbd_mirror_helpers.sh

index a069853fb71a2aa88f9c915b5c35661b81aea238..f5d7fe9262495a2b9f5f2600e8975b8e21f375d7 100755 (executable)
@@ -514,7 +514,11 @@ status()
     for cluster in ${CLUSTER1} ${CLUSTER2}
     do
         echo "${cluster} status"
-        CEPH_ARGS='' ceph --cluster ${cluster} -s
+        # if "ceph -s" fails, assume that the cluster is broken or
+        # unavailable and skip gathering details for it
+        CEPH_ARGS='' ceph --cluster ${cluster} -s || continue
+
+        echo "${cluster} service status"
         CEPH_ARGS='' ceph --cluster ${cluster} service dump
         CEPH_ARGS='' ceph --cluster ${cluster} service status
         echo
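
The short-circuit in the hunk above can be tried in isolation with a minimal sketch. Everything here is a hypothetical stand-in, not part of rbd_mirror_helpers.sh: `cluster_summary` plays the role of "ceph -s" and succeeds only for names listed in the illustrative `HEALTHY_CLUSTERS` variable, and `cluster_details` stands for the follow-up commands in the loop body.

```shell
#!/usr/bin/env bash

# Assumption for illustration: clusters listed here count as reachable.
HEALTHY_CLUSTERS="cluster1"

# Hypothetical stand-in for "ceph --cluster <name> -s": prints a summary
# and returns 0 for a reachable cluster, returns 1 otherwise.
cluster_summary() {
    local cluster=$1
    case " ${HEALTHY_CLUSTERS} " in
        *" ${cluster} "*) echo "${cluster}: HEALTH_OK" ;;
        *) return 1 ;;
    esac
}

# Hypothetical stand-in for the commands that follow "ceph -s".
cluster_details() {
    echo "${1}: details"
}

status_sketch() {
    local cluster
    for cluster in "$@"; do
        echo "${cluster} status"
        # If the summary fails, skip the rest of the loop body for this
        # cluster and move on to the next one.
        cluster_summary "${cluster}" || continue
        cluster_details "${cluster}"
    done
}

status_sketch cluster1 cluster2
```

With `cluster2` treated as unreachable, only its "status" header is printed and the detail step is skipped. Because the failing command sits on the left side of `||`, it also would not trip `set -e` in a script that enables it.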