From 2c17b92430b92b393a5991f3d4cf04d4ad0f99c2 Mon Sep 17 00:00:00 2001
From: Milind Changire
Date: Fri, 12 Jul 2024 08:56:41 +0530
Subject: [PATCH] qa: failfast mount for better performance

During teuthology tests, tearing down the cluster between two tests
causes the config to be reset and a config_notify to be generated. This
leads to a race in which a new mount is attempted using the old fscid.
By the time the mount is attempted, however, the new fs has already been
created with a new fscid.

In this situation the client mount waits for a connection completion
notification from the MDS for 5 minutes (the default timeout) and
eventually gives up. The default teuthology command timeout, however, is
only 2 minutes, so teuthology fails the command and declares the job
failed well before the mount can time out.

The fix is to lower the client mount timeout to 30 seconds, so that the
mount attempted while handling the config_notify fails fast, paving the
way for subsequent commands to run against the new fs.

An unhandled cluster warning about an unresponsive client is also
emitted later, during qa job termination, which leads teuthology to
declare the job failed. For now this warning appears harmless, since it
is emitted during the cluster cleanup phase, so it is added to the
log-ignorelist section in the snap-schedule YAML.
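The escaped brackets in the new log-ignorelist entry matter if, as assumed here, teuthology treats each entry as a regular expression searched within each cluster log line. The following Python sketch checks the pattern against a few hypothetical log lines (invented for illustration, not copied from a real run; real formatting may differ). It also shows that an unescaped "[WRN]" would be read as a character class and would miss the warning. The helper is_ignored is hypothetical and is not part of teuthology.

```python
import re

# Entry added to log-ignorelist in snap-schedule.yaml, verbatim from the patch.
IGNORE_PATTERN = r"cluster \[WRN\] evicting unresponsive client"

# Hypothetical cluster log lines, for illustration only.
SAMPLE_LINES = [
    "cluster [WRN] evicting unresponsive client host1 (1234), after 300 seconds",
    "cluster [WRN] Health check failed: 1 pool(s) full (POOL_FULL)",
    "cluster [ERR] some unrelated error",
]


def is_ignored(line: str, patterns: list[str]) -> bool:
    """Hypothetical helper: True if any ignorelist regex matches anywhere in the line."""
    return any(re.search(p, line) for p in patterns)


ignorelist = [IGNORE_PATTERN, "POOL_FULL"]
for line in SAMPLE_LINES:
    print(is_ignored(line, ignorelist), line)

# Without the backslashes, "[WRN]" is a character class matching a single
# 'W', 'R' or 'N', so the unescaped pattern cannot match the literal "[WRN]".
unescaped = r"cluster [WRN] evicting unresponsive client"
print(re.search(unescaped, SAMPLE_LINES[0]) is None)
```

Matching with re.search rather than re.match reflects the assumption that an entry only needs to occur somewhere in the line, not at its start.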
Fixes: https://tracker.ceph.com/issues/66009
Signed-off-by: Milind Changire
(cherry picked from commit daf4798086b009fb6af6d93198e20ded5e0b5dc0)
---
 qa/cephfs/conf/mgr.yaml                          | 1 +
 qa/suites/fs/functional/tasks/snap-schedule.yaml | 1 +
 2 files changed, 2 insertions(+)

diff --git a/qa/cephfs/conf/mgr.yaml b/qa/cephfs/conf/mgr.yaml
index fb6e9b09fa1..d7e95b9feb9 100644
--- a/qa/cephfs/conf/mgr.yaml
+++ b/qa/cephfs/conf/mgr.yaml
@@ -2,6 +2,7 @@ overrides:
   ceph:
     conf:
       mgr:
+        client mount timeout: 30
         debug client: 20
         debug mgr: 20
         debug ms: 1
diff --git a/qa/suites/fs/functional/tasks/snap-schedule.yaml b/qa/suites/fs/functional/tasks/snap-schedule.yaml
index 26922abeda4..7d7f62f16a8 100644
--- a/qa/suites/fs/functional/tasks/snap-schedule.yaml
+++ b/qa/suites/fs/functional/tasks/snap-schedule.yaml
@@ -15,6 +15,7 @@ overrides:
       - is full \(reached quota
       - POOL_FULL
       - POOL_BACKFILLFULL
+      - cluster \[WRN\] evicting unresponsive client
 
 tasks:
 - cephfs_test_runner:
-- 
2.39.5