In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario, all commands invoked from the loop body are going
to time out anyway, so skip them for any cluster where "ceph -s" fails.
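
As a minimal sketch of the pattern (not the actual test helper), the
loop below bails out of an iteration as soon as the first probe fails.
probe_cluster() is a hypothetical stand-in for "ceph -s" that fails for
cluster2 to imitate a cluster killed by the watchdog, and
gather_status() is an illustrative name as well:

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for `ceph --cluster <name> -s`: returns non-zero
# for "cluster2" to imitate a cluster that the watchdog has killed.
probe_cluster() {
    [ "$1" != "cluster2" ]
}

# Illustrative loop: print a header for every cluster, but gather the
# remaining details only if the initial probe succeeds.
gather_status() {
    local cluster
    for cluster in "$@"; do
        echo "${cluster} status"
        # If the probe fails, assume the cluster is unavailable and
        # move on to the next one instead of waiting for timeouts.
        probe_cluster "${cluster}" || continue
        echo "${cluster} service status"
    done
}

gather_status cluster1 cluster2
```

With this stand-in, the details are printed only for cluster1, while
cluster2 gets just its header line.
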
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
for cluster in ${CLUSTER1} ${CLUSTER2}
do
echo "${cluster} status"
- CEPH_ARGS='' ceph --cluster ${cluster} -s
+ # if "ceph -s" fails, assume that the cluster is broken or
+ # unavailable and skip gathering details for it
+ CEPH_ARGS='' ceph --cluster ${cluster} -s || continue
+
+ echo "${cluster} service status"
CEPH_ARGS='' ceph --cluster ${cluster} service dump
CEPH_ARGS='' ceph --cluster ${cluster} service status
echo