From: Kefu Chai Date: Sat, 18 Apr 2026 08:19:31 +0000 (+0800) Subject: test/mds: fix flaky RepeatedQuiesceAwait X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=7a826727ebd004b1c22086925bbc8aad13274025;p=ceph.git test/mds: fix flaky RepeatedQuiesceAwait The test has been flaky on loaded CI, failing at different assertions depending on where the expiration budget runs out: [ RUN ] QuiesceDbTest.RepeatedQuiesceAwait src/test/mds/TestQuiesceDb.cc:1112: Failure Expected equality of these values: ERR(115) Which is: Operation now in progress(115) run_request(...) Which is: Connection timed out(110) [ FAILED ] QuiesceDbTest.RepeatedQuiesceAwait (627 ms) The test proves its central invariant -- "await resets the expiration timer" -- indirectly, by having the set survive a series of sleep_for(expiration/2) + await pairs. With expiration=0.1s that leaves only ~20ms above the 80ms consumed by the two 40ms release-awaits, and the margin is regularly eaten by scheduler jitter. A similar overrun inside the quiesce loop produces the other variant, where sleep_for(expiration/2) overshoots and the gap between two awaits exceeds the expiration. The per-iteration budget needs to cover ~3 CFS schedule latencies (test wakes from sleep_for, manager wakes on notify, test wakes on completion) plus mutex/queue processing, which is roughly 20ms nominal on Linux with sched_latency_ns=6ms and up to 30-40ms under heavy contention. Kernels built with a lower CONFIG_HZ coarsen ceph::coarse_real_clock's CLOCK_REALTIME_COARSE reads, but that affects only read precision, not the dominant scheduling-latency term. Fix by: 1. Raising expiration from 0.1s to 0.2s, so the sleep=expiration/2 margin is ~3x the worst-case per-iteration overhead. 2. Reducing the quiesce loop from 10 to 3 iterations, since three iterations already span 1.5x expiration cumulatively -- enough to prove the resets are extending the deadline. 3. Replacing the 2-iteration release-EINPROGRESS loop with a single release+await(sec(0)) and an at_age equality assertion, so "release does not reset the timer" is checked directly rather than inferred from multiple awaits racing the expiration. 4. Using await = 2*expiration for the final ETIMEDOUT so the expiration is guaranteed to fire inside the wait window rather than on its boundary. 5. Tracking at_age across the quiesce loop and asserting it advances. The loop's ASSERT_EQ(OK(),...) already fails if resets stop working, but the at_age check also catches regressions where at_age is updated to a stuck or stale value. The test now runs in ~500ms, comparable to the original. Fixes: https://tracker.ceph.com/issues/68543 Signed-off-by: Kefu Chai --- diff --git a/src/test/mds/TestQuiesceDb.cc b/src/test/mds/TestQuiesceDb.cc index d589da5a5415..f3ef9660eeac 100644 --- a/src/test/mds/TestQuiesceDb.cc +++ b/src/test/mds/TestQuiesceDb.cc @@ -1075,8 +1075,12 @@ TEST_F(QuiesceDbTest, RepeatedQuiesceAwait) { // let us reach quiescing managers.at(mds_gid_t(1))->reset_agent_callback(QUIESCING_AGENT_CB); - // pick an expiration timeout - auto expiration = sec(0.1); + // pick an expiration timeout. each loop iteration pays ~3 CFS sched + // latencies (sleep_for wake, manager wake on notify, completion + // wake) plus some processing, so the per-iteration overhead is + // 20-40ms on loaded CI. at sec(0.2) sleep_for(expiration/2) leaves + // a 100ms margin, roughly 3x the worst case. + auto expiration = sec(0.2); // create a set and let it quiesce ASSERT_EQ(OK(), run_request([=](auto& r) { @@ -1089,12 +1093,18 @@ TEST_F(QuiesceDbTest, RepeatedQuiesceAwait) { EXPECT_EQ(QS_QUIESCED, last_request->response.sets.at("set1").rstate.state); - // sleep for half the expiration interval multiple times - // each time sending another await request - // the expectation is that every time we call await - // the expiration timer is reset, hence we should be able to - // sustain the loop for arbitrarily long - for (int i = 0; i < 10; i++) { + // sleep for half the expiration multiple times, each time sending + // another await. after 3 iterations the cumulative sleep is 1.5x + // the expiration, so without timer resets the set would have + // expired. surviving the loop proves the resets are extending the + // deadline. + // + // EXPECT_GE inside the loop catches a backward at_age; EXPECT_GT + // after the loop catches at_age never being updated (which + // EXPECT_GE(x, x) can't, since it passes trivially). + auto initial_at_age = last_request->response.sets.at("set1").rstate.at_age; + auto prev_at_age = initial_at_age; + for (int i = 0; i < 3; i++) { std::this_thread::sleep_for(expiration/2); ASSERT_EQ(OK(), run_request([i](auto& r) { r.set_id = "set1"; @@ -1104,33 +1114,39 @@ TEST_F(QuiesceDbTest, RepeatedQuiesceAwait) { } r.await = sec(0); })); + auto at_age = last_request->response.sets.at("set1").rstate.at_age; + EXPECT_GE(at_age, prev_at_age); + prev_at_age = at_age; } + EXPECT_GT(prev_at_age, initial_at_age); // Prevent the set from reaching the RELEASED state managers.at(mds_gid_t(1))->reset_agent_callback(SILENT_AGENT_CB); - // start releasing and observe that the timer isn't reset in this case, - // so after a few EINPROGRESS we eventually reach timeout due to expiration - for (int i = 0; i < 2; i++) { - ASSERT_EQ(ERR(EINPROGRESS), run_request([=](auto& r) { - r.set_id = "set1"; - r.release(); - r.await = (expiration*2)/5; - })); - } + // release must not reset the timer: at_age should still match the + // last successful await. + auto last_await_at_age = prev_at_age; + ASSERT_EQ(ERR(EINPROGRESS), run_request([=](auto& r) { + r.set_id = "set1"; + r.release(); + r.await = sec(0); + })); + EXPECT_EQ(last_await_at_age, last_request->response.sets.at("set1").rstate.at_age); + // await = 2*expiration so the expiration fires inside the window, + // not on its boundary. // NB: the ETIMEDOUT is the await result, while the set itself should be EXPIRED EXPECT_EQ(ERR(ETIMEDOUT), run_request([=](auto& r) { r.set_id = "set1"; r.release(); - r.await = expiration; + r.await = expiration * 2; })); EXPECT_EQ(QS_EXPIRED, last_request->response.sets.at("set1").rstate.state); EXPECT_EQ(ERR(EPERM), run_request([](auto& r) { r.set_id = "set1"; - r.await = sec(0.1); + r.await = sec(0); })); EXPECT_EQ(ERR(EPERM), run_request([](auto& r) {