qa: rbd_mirror_fsx_compare.sh doesn't error out as expected
author Ilya Dryomov <idryomov@gmail.com>
Sun, 1 Mar 2026 16:45:51 +0000 (17:45 +0100)
committer Ilya Dryomov <idryomov@gmail.com>
Tue, 3 Mar 2026 10:39:59 +0000 (11:39 +0100)
In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:

2026-03-01T12:55:35.059 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ retrying_seconds=1040
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 1040 -le 7200 ']'
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ rbd --cluster cluster2 --pool mirror ls
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ wc -l
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 290 -ge 292 ']'
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ sleep 10
...
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

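In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).  The error goes unnoticed
because the pipeline's exit status is that of "wc -l" and because a
failing command substitution inside "[ ... ]" doesn't trip "set -e".

Enable pipefail and assign the count to a variable first so that a
failed "rbd ls" aborts the script.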
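
The worst-case figure quoted above can be reproduced with a few lines of
shell arithmetic. This is only a sketch: the variable names are
illustrative, the 5-minute timeout is taken from the text above rather
than from the script, and the iteration count uses the same round number
as the commit message.

```shell
#!/usr/bin/env bash
# Values from the loop in rbd_mirror_fsx_compare.sh
max_retry_seconds=7200
sleep_seconds=10
# Assumption: each "rbd ls" against the stopped cluster blocks for 5 minutes
timeout_seconds=300

# Each iteration only adds sleep_seconds to the retry counter,
# so the time spent inside "rbd ls" is never accounted for.
iterations=$(( max_retry_seconds / sleep_seconds ))
worst_case_seconds=$(( iterations * (timeout_seconds + sleep_seconds) ))
worst_case_hours=$(( worst_case_seconds / 3600 ))

echo "${iterations} iterations, ${worst_case_seconds} s, about ${worst_case_hours} h"
```

This comes out at 720 iterations and 223200 seconds, i.e. roughly 62
hours, which matches the "~60-hour" estimate.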

Fixes: https://tracker.ceph.com/issues/75239
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
qa/workunits/rbd/rbd_mirror_fsx_compare.sh

index 79c36546d4fb0438246f94f465991d09a839fab1..856e168e1d5c0abf9a80070ba095bbc99bd97241 100755 (executable)
@@ -6,6 +6,7 @@
 #
 
 set -ex
+set -o pipefail
 
 . $(dirname $0)/rbd_mirror_helpers.sh
 
@@ -14,11 +15,12 @@ trap 'cleanup $?' INT TERM EXIT
 setup_tempdir
 
 testlog "TEST: wait for all images"
-image_count=$(rbd --cluster ${CLUSTER1} --pool ${POOL} ls | wc -l)
+expected_image_count=$(rbd --cluster ${CLUSTER1} --pool ${POOL} ls | wc -l)
 retrying_seconds=0
 sleep_seconds=10
 while [ ${retrying_seconds} -le 7200 ]; do
-    [ $(rbd --cluster ${CLUSTER2} --pool ${POOL} ls | wc -l) -ge ${image_count} ] && break
+    actual_image_count=$(rbd --cluster ${CLUSTER2} --pool ${POOL} ls | wc -l)
+    [ ${actual_image_count} -ge ${expected_image_count} ] && break
     sleep ${sleep_seconds}
     retrying_seconds=$(($retrying_seconds+${sleep_seconds}))
 done
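
The difference between the old and the new form of the check can be seen
in isolation with the sketch below. It is not part of the commit:
failing_ls is a hypothetical stand-in for an "rbd ls" that times out, and
each variant runs in its own bash process so that its exit status can be
inspected.

```shell
#!/usr/bin/env bash
# Old pattern: the pipeline status is that of "wc -l", and a command
# substitution used as an argument to "[" never trips "set -e".
old_pattern='
set -e
failing_ls() { return 1; }   # hypothetical stand-in for a timed-out "rbd ls"
[ $(failing_ls | wc -l) -ge 2 ] && echo "enough images"
echo "old pattern: still running"
'

# New pattern: pipefail propagates the failure through the pipeline, and
# a plain assignment takes the exit status of its command substitution,
# so "set -e" aborts the script.
new_pattern='
set -e
set -o pipefail
failing_ls() { return 1; }   # hypothetical stand-in for a timed-out "rbd ls"
actual=$(failing_ls | wc -l)
echo "new pattern: still running"
'

old_status=0
old_output=$(bash -c "$old_pattern") || old_status=$?
new_status=0
new_output=$(bash -c "$new_pattern") || new_status=$?

echo "old: status=${old_status} output='${old_output}'"
echo "new: status=${new_status} output='${new_output}'"
```

The old variant keeps running and exits with status 0, while the new one
stops at the assignment with a non-zero status. Both changes in the diff
are needed for this: pipefail alone would not help if the pipeline stayed
inside the "[ ... ]" test.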