git.apps.os.sepia.ceph.com Git - ceph-ci.git/commit

author	Sridhar Seshasayee <sseshasa@redhat.com>
	Wed, 19 May 2021 15:22:15 +0000 (20:52 +0530)
committer	Sridhar Seshasayee <sseshasa@redhat.com>
	Wed, 2 Jun 2021 08:49:48 +0000 (14:19 +0530)
commit	328271d587d099e78dcd020c17e7465043c1bb6b
tree	4843fea5d795891e733af24afe28c92526d35db7
parent	9a95492b66341f7351e80f0386b4439f713debc6

qa/tasks: Enhance wait_until_true() to check & retry recovery progress

With the mclock scheduler enabled, recovery throughput is throttled based
on factors such as the active mclock profile and the OSD capacity, among
others. As a result, recovery times may vary, and the existing timeout of
120 secs may not be sufficient.

To address the above, a new method called _is_inprogress_or_complete() is
introduced in the TestProgress class. It checks whether the event with the
specified 'id' is in progress by checking the 'progress' key of the
progress command response. This method also handles the corner case where
the event completes just before it's called.
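
A minimal sketch of the check described above, not the code from this
commit. It assumes the progress command response has already been parsed
into a dict with an "events" list (events still running) and a "completed"
list, and that each event carries 'id' and 'progress' keys. The function
name, its signature, and the dict layout are assumptions for illustration.

```python
def is_inprogress_or_complete(progress, ev_id):
    """Return True if the event has made progress or has already finished.

    `progress` is assumed to look like:
        {"events": [{"id": ..., "progress": 0.4}, ...],
         "completed": [{"id": ...}, ...]}
    """
    # Event is still running: report whether it has advanced at all.
    for ev in progress.get("events", []):
        if ev["id"] == ev_id:
            return ev["progress"] > 0

    # Corner case: the event finished just before this call, so it is no
    # longer listed as running and has to be looked up among completed ones.
    return any(ev["id"] == ev_id for ev in progress.get("completed", []))
```

Treating a completed event as "making progress" keeps a caller from giving
up on an event that finished between two polls.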

The existing wait_until_true() method in the CephTestCase class is
modified to accept another function argument called "check_fn". The
"test_turn_off_module" test, which has been observed to fail for the
reasons described above, sets this argument to the
_is_inprogress_or_complete() function described earlier. A retry mechanism
of a maximum of 5 attempts is introduced after the first timeout is hit.
This means that the wait can extend up to a maximum of 600 secs
(120 secs * 5) as long as there is recovery progress reported by the
'ceph progress' command result.
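
The retry behaviour can be sketched as a standalone function. This is an
illustration under stated assumptions, not the method from this commit: it
reads "a maximum of 5 attempts" as five timeout windows in total, which
matches the 600 secs (120 secs * 5) figure above. The max_attempts and
sleep parameters, the return value, and the exception class are
hypothetical additions that make the sketch easy to exercise.

```python
import time


class TestTimeoutError(RuntimeError):
    """Raised when the condition never became true (illustrative name)."""


def wait_until_true(condition, timeout, check_fn=None, period=5,
                    max_attempts=5, sleep=time.sleep):
    """Poll condition() every `period` seconds.

    When a timeout window expires, start another window only if check_fn
    reports progress and fewer than max_attempts windows have been used.
    Returns the number of the window in which the condition became true.
    """
    elapsed = 0
    attempt = 1
    while True:
        if condition():
            return attempt
        if elapsed >= timeout:
            if check_fn is not None and attempt < max_attempts and check_fn():
                # Progress is still being made: open a fresh window.
                attempt += 1
                elapsed = 0
            else:
                raise TestTimeoutError(
                    "Timed out after {0}s in attempt {1}".format(elapsed, attempt))
        sleep(period)
        elapsed += period
```

Without check_fn the function behaves like a plain timeout, so callers that
do not pass it keep the single 120 secs window.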

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

qa/tasks/ceph_test_case.py
qa/tasks/mgr/test_progress.py