From: Nitzan Mordechai
Date: Mon, 1 Dec 2025 09:24:42 +0000 (+0000)
Subject: test/ceph-helpers: Pass timeout and add timeout for commands in test_pg_scrub
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=81e8df0c14d2daaef3074890273a4772551fdd56;p=ceph-ci.git

test/ceph-helpers: Pass timeout and add timeout for commands in test_pg_scrub

In test_pg_scrub, after killing an OSD, subsequent pg_scrub checks and
calls to flush_pg_stats can hang or time out under the default timeout
because the OSD is no longer running. This was causing test failures.

This fix addresses two issues:
1. test_pg_scrub: Explicitly pass the WAIT_FOR_CLEAN_TIMEOUT and TIMEOUT
   variables (both set to 2) to the pg_scrub call after the OSD is killed.
   This prevents a hang in the wait_for_clean check within pg_scrub.
2. flush_pg_stats: Add an explicit timeout to the
   ceph tell osd.$osd flush_pg_stats command, allowing it to fail quickly
   when an OSD is unresponsive.

Fixes: https://tracker.ceph.com/issues/74004
Signed-off-by: Nitzan Mordechai
---

diff --git a/qa/standalone/ceph-helpers.sh b/qa/standalone/ceph-helpers.sh
index ccf5f63ee87..31501af88b6 100755
--- a/qa/standalone/ceph-helpers.sh
+++ b/qa/standalone/ceph-helpers.sh
@@ -1948,7 +1948,7 @@ function test_pg_scrub() {
     wait_for_clean || return 1
     pg_scrub 1.0 || return 1
     kill_daemons $dir KILL osd || return 1
-    ! TIMEOUT=2 pg_scrub 1.0 || return 1
+    ! WAIT_FOR_CLEAN_TIMEOUT=2 TIMEOUT=2 pg_scrub 1.0 || return 1
     teardown $dir || return 1
 }
 
@@ -2255,10 +2255,9 @@ function flush_pg_stats()
     ids=`ceph osd ls`
     seqs=''
     for osd in $ids; do
-        seq=`ceph tell osd.$osd flush_pg_stats`
-        if test -z "$seq"
-        then
-            continue
+        seq=$(timeout $timeout ceph tell osd.$osd flush_pg_stats) || return 1
+        if test -z "$seq"; then
+            continue
         fi
         seqs="$seqs $osd-$seq"
     done
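
The fail-fast pattern used in the flush_pg_stats hunk can be tried without a Ceph cluster. The following is a minimal sketch, not code from the patch: probe_with_timeout is a hypothetical helper, `echo 42` stands in for a responsive `ceph tell osd.$osd flush_pg_stats`, and `sleep 5` stands in for a call to an OSD that never answers. It assumes timeout(1) from GNU coreutils is available.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of ceph-helpers.sh): run a command under
# timeout(1) and capture its output, mirroring the patched line
#   seq=$(timeout $timeout ceph tell osd.$osd flush_pg_stats) || return 1
function probe_with_timeout() {
    local timeout=$1; shift
    local seq
    # Declaring `seq` separately keeps the exit status of the command
    # substitution; timeout(1) exits non-zero (124) when it has to kill
    # the command, so `|| return 1` turns a hang into a prompt failure.
    seq=$(timeout "$timeout" "$@") || return 1
    if test -z "$seq"; then
        return 0
    fi
    echo "seq=$seq"
}

# Responsive stand-in: completes well within the 2 second limit.
probe_with_timeout 2 echo 42

# Unresponsive stand-in: killed after 1 second instead of blocking for 5.
probe_with_timeout 1 sleep 5 || echo "probe failed fast"
```

Without the timeout wrapper, the second call would block for as long as the stand-in command runs, which is the hang the commit message describes for a killed OSD.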