From: Kefu Chai
Date: Tue, 3 Mar 2026 03:30:58 +0000 (+0800)
Subject: qa/cephadm: ignore transient CEPHADM_FAILED_DAEMON in smoke-singlehost
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=6e25724364c52d80c32a9c2af3ff4c50a233d780;p=ceph.git

qa/cephadm: ignore transient CEPHADM_FAILED_DAEMON in smoke-singlehost

Add CEPHADM_FAILED_DAEMON to the log-ignorelist for smoke-singlehost
tests to prevent false failures from transient daemon states during
deployment.

Background:
-----------
During daemon deployment (especially OSDs), there's a brief window
(typically 2-4 seconds) where the daemon status is reported as
'unknown' before the daemon fully starts and registers with the
cluster. This triggers the CEPHADM_FAILED_DAEMON health warning which
clears itself automatically once the daemon completes startup.

This is expected and documented behavior during daemon deployment.
Other cephadm test suites already ignore this warning (see commit
53b462764c6 "qa: fix log errors for cephadm tests" which added
CEPHADM_FAILED_DAEMON to the ignorelists for smoke-small,
smoke-roleless, osds, upgrade tests, and many workunits).

The smoke-singlehost test was inadvertently missed in that commit,
causing intermittent false failures when the test's health check
happens to run during the brief transient state.

Failure Example:
----------------
Job 50357 from test run dgalloway-2026-02-13_23:06:25 failed with:

  2026-02-17T00:13:31.081 cluster [WRN] Health check failed:
  1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)

Timeline:
  00:13:28  - Deploying daemon osd.1 on trial167
  00:13:30  - Reconfiguring daemon osd.1 on trial167
  00:13:31  - Health check: daemon osd.1 in unknown state
              (CEPHADM_FAILED_DAEMON)
  00:13:34  - Health check cleared: CEPHADM_FAILED_DAEMON
              (daemon started successfully)
  00:13:35+ - osd.1 running normally

The test framework flagged this as a failure because it detected the
warning in the cluster log, even though the daemon successfully
started and the warning cleared within 3 seconds.

This brings smoke-singlehost in line with other cephadm test suites
that already handle this expected transient state.

References:
-----------
Similar fixes:
- commit 53b462764c6: Added CEPHADM_FAILED_DAEMON to multiple test
  suites
- commit 69076ae1022: Added CEPHADM_FAILED_DAEMON to nvmeof tests

Fixes: https://tracker.ceph.com/issues/75277
Signed-off-by: Kefu Chai
---

diff --git a/qa/suites/orch/cephadm/smoke-singlehost/1-start.yaml b/qa/suites/orch/cephadm/smoke-singlehost/1-start.yaml
index f350954d13a7..fd952f9644ac 100644
--- a/qa/suites/orch/cephadm/smoke-singlehost/1-start.yaml
+++ b/qa/suites/orch/cephadm/smoke-singlehost/1-start.yaml
@@ -24,6 +24,7 @@ overrides:
   ceph:
     log-ignorelist:
     - OSD_DOWN
+    - CEPHADM_FAILED_DAEMON
     conf:
       osd:
         osd shutdown pgref assert: true
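
Note on the ignorelist mechanism: the commit message says the test framework fails a job when it finds a warning in the cluster log, unless that warning matches a log-ignorelist entry. The sketch below is a simplified model of that behavior, not the framework's actual code. The helper name `first_unignored`, the severity regex, and the choice to treat ignorelist entries as regular expressions are all assumptions made for illustration. The only log line used is the one quoted in the failure example.

```python
import re

# Assumption: only lines tagged [ERR], [WRN], or [SEC] can fail a job.
SEVERITY = re.compile(r"\[(ERR|WRN|SEC)\]")


def first_unignored(log_lines, ignorelist):
    """Hypothetical helper: return the first ERR/WRN/SEC line that
    matches no ignorelist pattern, or None if every such line is ignored."""
    ignore = [re.compile(pattern) for pattern in ignorelist]
    for line in log_lines:
        if not SEVERITY.search(line):
            continue  # informational line, never a failure
        if any(p.search(line) for p in ignore):
            continue  # explicitly ignored by the suite
        return line
    return None


# The warning quoted in the failure example above.
cluster_log = [
    "2026-02-17T00:13:31.081 cluster [WRN] Health check failed: "
    "1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)",
]

# Before this change: only OSD_DOWN is ignored, so the warning is flagged.
before = first_unignored(cluster_log, ["OSD_DOWN"])

# After this change: the transient warning is ignored as well.
after = first_unignored(cluster_log, ["OSD_DOWN", "CEPHADM_FAILED_DAEMON"])
```

Under these assumptions, `before` holds the flagged warning line and `after` is `None`, which mirrors how the one-line YAML change turns the intermittent failure into a pass.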