From: Ilya Dryomov
Date: Sun, 1 Mar 2026 16:45:51 +0000 (+0100)
Subject: qa: rbd_mirror_fsx_compare.sh doesn't error out as expected
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=81a5906f0d1cc844bb4ef16aae9ace3e7d371ac2;p=ceph.git

qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

In mirror-thrash tests, one of the clusters can be effectively stopped
by a watchdog bark while rbd_mirror_fsx_compare.sh is running and is in
the middle of the "wait for all images" loop:

  2026-03-01T12:55:35.059 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ retrying_seconds=1040
  2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 1040 -le 7200 ']'
  2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ rbd --cluster cluster2 --pool mirror ls
  2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ wc -l
  2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 290 -ge 292 ']'
  2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ sleep 10
  ...
  2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
  2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

In this scenario, "rbd ls" times out repeatedly, turning the loop into
a sleep of up to ~60 hours (~720 iterations, each consisting of a
5-minute timeout plus a 10-second sleep).

The failure goes unnoticed despite "set -e". Without pipefail, the exit
status of "rbd ls | wc -l" is that of wc, and the command substitution
is only used as an argument to "[", whose failure in an "&&" list does
not trigger errexit either. Enable pipefail and capture the count in a
plain assignment, whose exit status is that of the command substitution,
so that a failing "rbd ls" aborts the script.

Fixes: https://tracker.ceph.com/issues/75239
Signed-off-by: Ilya Dryomov
---

diff --git a/qa/workunits/rbd/rbd_mirror_fsx_compare.sh b/qa/workunits/rbd/rbd_mirror_fsx_compare.sh
index 79c36546d4f..856e168e1d5 100755
--- a/qa/workunits/rbd/rbd_mirror_fsx_compare.sh
+++ b/qa/workunits/rbd/rbd_mirror_fsx_compare.sh
@@ -6,6 +6,7 @@
 #
 
 set -ex
+set -o pipefail
 
 . $(dirname $0)/rbd_mirror_helpers.sh
 
@@ -14,11 +15,12 @@ trap 'cleanup $?' INT TERM EXIT
 setup_tempdir
 
 testlog "TEST: wait for all images"
-image_count=$(rbd --cluster ${CLUSTER1} --pool ${POOL} ls | wc -l)
+expected_image_count=$(rbd --cluster ${CLUSTER1} --pool ${POOL} ls | wc -l)
 retrying_seconds=0
 sleep_seconds=10
 while [ ${retrying_seconds} -le 7200 ]; do
-    [ $(rbd --cluster ${CLUSTER2} --pool ${POOL} ls | wc -l) -ge ${image_count} ] && break
+    actual_image_count=$(rbd --cluster ${CLUSTER2} --pool ${POOL} ls | wc -l)
+    [ ${actual_image_count} -ge ${expected_image_count} ] && break
     sleep ${sleep_seconds}
     retrying_seconds=$(($retrying_seconds+${sleep_seconds}))
 done
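The difference between the old and new loop bodies can be reproduced outside of teuthology with the standalone bash sketch below. It is illustrative only. failing_ls, old_pattern and new_pattern are hypothetical names, failing_ls merely stands in for an "rbd ls" that times out, and the exit status 110 is arbitrary. Each pattern runs in its own subshell with "set -e", so the script keeps going and can report both exit statuses.

```bash
#!/usr/bin/env bash
# Sketch: why "[ $(cmd | wc -l) -ge N ] && break" hides a failing cmd under
# "set -e", and why "set -o pipefail" plus a plain assignment surfaces it.

# Hypothetical stand-in for an "rbd ls" that times out: no stdout, non-zero exit.
failing_ls() {
    echo "timed out" >&2
    return 110
}

# Old pattern: the pipeline status is wc's (0), and the failing "[" sits in an
# "&&" list, so errexit never fires and execution continues.
old_pattern() (
    set -e
    [ $(failing_ls 2>/dev/null | wc -l) -ge 1 ] && echo "found"
    echo "old: still running"
)

# New pattern: with pipefail the pipeline returns failing_ls's status, the
# assignment inherits it, and errexit aborts the subshell.
new_pattern() (
    set -e
    set -o pipefail
    count=$(failing_ls 2>/dev/null | wc -l)
    [ ${count} -ge 1 ] && echo "found"
    echo "new: still running"
)

old_pattern; echo "old exit status: $?"
new_pattern; echo "new exit status: $?"
```

Running it prints "old: still running" and "old exit status: 0", followed by "new exit status: 110". "new: still running" never appears, because the assignment itself fails. This is the behavior the patched workunit relies on to stop instead of sleeping for hours.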