qa: whitelist slow requests progress.yaml

author Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>

Fri, 6 Mar 2026 17:20:18 +0000 (17:20 +0000)

committer Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>

Fri, 6 Mar 2026 17:58:09 +0000 (17:58 +0000)
diff --git a/qa/suites/rados/mgr/tasks/4-units/progress.yaml b/qa/suites/rados/mgr/tasks/4-units/progress.yaml

index 6ed4f442955f54bcd06c642447c18697affcd72d..e09b6cc63c5479ab0149e47e071be1637447fc0d 100644 (file)
--- a/qa/suites/rados/mgr/tasks/4-units/progress.yaml
+++ b/qa/suites/rados/mgr/tasks/4-units/progress.yaml
@@ -12,6 +12,8 @@ overrides:
        - \(FS_WITH_FAILED_MDS\)
        - \(FS_DEGRADED\)
        - \(OSDMAP_FLAGS\)
+      - \(slow requests\)
+
  tasks:
    - cephfs_test_runner:
        modules:
diff --git a/qa/tasks/mgr/test_progress.py b/qa/tasks/mgr/test_progress.py

index 3a13055c09228d4e26ea5aa2b35d9bdf387b8c5c..fd9f5a737b648fc73019a56c9607167f12bd445d 100644 (file)
--- a/qa/tasks/mgr/test_progress.py
+++ b/qa/tasks/mgr/test_progress.py
@@ -9,6 +9,33 @@ log = logging.getLogger(__name__)
  
  
  class TestProgress(MgrTestCase):
+    """
+    Test suite for the progress module.
+
+    IMPORTANT: Slow Requests / IO Stalls
+    =====================================
+    These tests intentionally trigger slow requests and IO stalls as a side effect
+    of the testing methodology. This is expected behavior and should be ignored
+    when evaluating test results.
+
+    Why this occurs (an example):
+    - Tests run 16 concurrent 4 MB writes (via rados bench -t 16) to populate the cluster
+    - While recovery/backfill is disabled (nobackfill, norecover flags set), OSDs are
+      marked out and then back in, causing PG remapping
+    - Because recovery/backfill is disabled, PGs cannot restore their replicas after
+      remapping, leaving them in degraded/remapped states
+    - This causes writes to get stuck in the replicated write path, leading to IO
+      stalls and slow ops being reported
+
+    Why we do this:
+    - The purpose is to test the progress module's event tracking, NOT the OSD write paths
+    - We intentionally disable recovery/backfill to prolong recovery events, giving us
+      time to observe and validate that progress events are correctly generated, tracked,
+      and completed by the progress module
+    - Without disabling recovery, events would complete too quickly to test properly
+
+    Test configurations should include slow request warnings in their ignorelist.
+    """
      POOL = "progress_data"
  
      # How long we expect to wait at most between taking an OSD out
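
Note on the new ignorelist entry: the parentheses in `\(slow requests\)` are escaped, so if the entries are treated as regular expressions searched against each log line (an assumption here; the matching code is not part of this diff), the entry only matches text that contains the literal string "(slow requests)", parentheses included. A minimal sketch of that behavior follows. The helper `is_ignored` and the sample lines are hypothetical and are not real Ceph log output.

```python
import re

# Patterns copied from the ignorelist entries visible in the progress.yaml hunk.
IGNORELIST = [
    r"\(FS_WITH_FAILED_MDS\)",
    r"\(FS_DEGRADED\)",
    r"\(OSDMAP_FLAGS\)",
    r"\(slow requests\)",
]


def is_ignored(line, patterns=IGNORELIST):
    """Return True if any pattern is found anywhere in the line.

    Hypothetical helper: assumes each entry is a regex applied with re.search.
    """
    return any(re.search(pattern, line) for pattern in patterns)


# Made-up lines for illustration only; not real Ceph log output.
samples = [
    "placeholder warning text (slow requests)",
    "placeholder warning text slow requests",
    "placeholder warning text (OSDMAP_FLAGS)",
]

for line in samples:
    print(is_ignored(line), line)
```

Under that assumption, a warning worded without the surrounding parentheses would not be covered by this entry, so it may be worth checking the pattern against the exact text of the warning the tests produce.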