----------------- Enhancement to the Original Fix -----------------
During a mgr failover, the active mgr is marked available if:
1. The mon has chosen a standby to be active
2. The chosen active mgr has all of its modules initialized
Now that we've improved the criteria for sending the "active" beacon
by requiring the mgr to retry initializing its modules, we need to account
for extreme cases in which the modules are stuck loading for a very long
time, or even indefinitely. In these cases, we don't want to delay sending
the "active" beacon for too long, since doing so blocks other important mgr
functionality, such as reporting PG availability in the health status. We
want to avoid sending warnings about PGs being unknown in the health status
when that's not ultimately the problem.
To account for an exceptionally long module loading time, I added a new
configurable `mgr_module_load_expiration`. If the chosen active mgr exceeds
this maximum amount of time (in ms) allotted for loading the mgr modules
before declaring availability, it will proceed to mark itself "available",
send the "active" beacon to the mon, and unblock other critical mgr
functionality. If this happens, a health error will be issued indicating
which mgr modules got stuck initializing (see src/mgr/PyModuleRegistry.cc).
The idea is to unblock the rest of the mgr's critical functionality while
making it clear to Ceph operators that some modules are unusable.
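For example, since `mgr_module_load_expiration` is a runtime option with a
default of 20000 ms, an operator could widen the window like so (an
illustrative value, not a recommendation):

  ceph config set mgr mgr_module_load_expiration 30000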
--------------------- Integration Testing --------------------
The workunit was rewritten so that it tests these scenarios:
1. Normal module loading behavior (no health error should be issued)
2. Acceptable delay in module loading behavior (no health error should be
issued)
3. Unacceptable delay in module loading behavior (a health error should be
issued)
4. Disabling the problematic module after an unacceptable delay (the health
error should clear)
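The workunit can also be run locally against a vstart cluster:

  cd ceph/build
  ../qa/workunits/mgr/test_mgr_module_loading_time.sh --vstart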
--------------------- Documentation --------------------
This documentation explains the "Module failed to initialize"
cluster error.
Users are advised to try failing over
the mgr to restart the module initialization process
and, if the error persists, to file a bug report. I decided
to write it this way instead of providing more complex
debugging tips, such as advising users to disable some mgr modules,
since every case will be different depending on which modules
failed to initialize.
In the bug report, developers can ask for the health detail
output to narrow down which module is causing a bottleneck,
and then ask the user to try disabling certain modules until
the mgr is able to fully initialize.
Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit bf25a08cc58c8e872806e92dd36d9ff91e5523d9)
mgr active: $name
-Interpreting Ceph-Mgr Statuses
-==============================
+Interpreting Manager Daemon Status
+==================================
A cluster's health status will show each ``ceph-mgr`` daemon in one of three states:
1. **active**
- This mgr daemon has been fully initialized, which means it is ready to receive
- and execute commands. Only one mgr will be in this state at a time.
+ This Manager daemon has been fully initialized, which means it is ready to receive
+ and execute commands. Only one Manager will be in this state at a time.
2. **active (starting)**
- This mgr daemon has been chosen to be ``active``, but it is not done initializing.
+ This Manager daemon has been chosen to be ``active``, but it is not done initializing.
Although it is not yet ready to execute commands, an operator may still issue commands,
- which will be held and executed once the manager becomes ``active``. Only one mgr will
- be in this state at a time.
+ which will be held and executed once the Manager becomes ``active``. Only one Manager
+ will be in this state at a time.
3. **standby**
- This mgr daemon is not currently receiving or executing commands, but it is there to
- take over if the current active mgr becomes unavailable. An operator may also manually
- promote standby manager to active via ``ceph mgr fail`` if desired. All other mgr daemons
- which are not ``active`` or ``active (starting)`` will be in this state.
+ This Manager daemon is not currently receiving or executing commands, but it is ready to
+ take over if the current active Manager becomes unavailable. An administrator may
+ manually promote a standby to become active via ``ceph mgr fail`` if desired. All other
+ Manager daemons which are not ``active`` or ``active (starting)`` will be in this state.
-Each of these states are visible in the output of the ``ceph -s``. For example:
+Each of these states is visible in the output of the ``ceph status`` command. For example:
.. code-block:: console
- $ ceph -s
+ $ ceph status
cluster:
id: b150f540-745a-460c-a566-376b28b95ac3
health: HEALTH_OK
daemon(s) or use ``ceph mgr fail`` on the active daemon in order to force
failover to another daemon.
+**Module failed to initialize**
+
+If the output of ``ceph health detail`` looks something like this, it means that
+some modules took too long to initialize after a Manager failover and are unable
+to process commands:
+
+.. code-block:: console
+
+ HEALTH_ERR 4 mgr modules have failed
+   [ERR] MGR_MODULE_ERROR: 4 mgr modules have failed
+ Module 'rbd_support' has failed: Module failed to initialize.
+ Module 'status' has failed: Module failed to initialize.
+ Module 'telemetry' has failed: Module failed to initialize.
+ Module 'volumes' has failed: Module failed to initialize.
+
+You can also see these modules listed under ``pending_modules``
+in the output of the following command:
+
+.. prompt:: bash $
+
+ ceph tell mgr mgr_status
+
+To troubleshoot, you may run ``ceph mgr fail`` to restart
+module initialization:
+
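+.. prompt:: bash $
+
+   ceph mgr fail
+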
+Note that the health error may clear on its own since modules
+will continue to initialize in the background.
+
+If the modules are still failing to initialize, please file a bug
+report under the `"mgr" project <https://tracker.ceph.com/projects/mgr>`_
+for further assistance.
+
OSDs
----
#!/bin/bash
-setup_cephadm() {
- # This will create CEPHADM_STRAY_HOST warnings, but we just want to be able to run an orch command.
- echo "Enabling cephadm module..."
- ceph mgr module enable cephadm
- ceph orch set backend cephadm
-}
-
-check_cluster_status() {
- echo "Checking cluster status..."
- ceph -s
-}
-
-set_balancer_delay() {
- echo "Setting balancer module load delay..."
- ceph config set mgr mgr_module_load_delay_name balancer
- ceph config set mgr mgr_module_load_delay 10000
-}
-
-test_loading_time() {
- echo "Testing with module load delay of 10000 ms..."
- ceph mgr fail
-
- local orch_status_output
- if ! orch_status_output=$(ceph orch status 2>&1); then
- echo "FAIL: 'ceph orch status' failed to run:"
- echo "$orch_status_output"
- exit 1
- fi
+# This script tests how the mgr handles different module loading times post failover.
+# The motivation is this tracker ticket: https://tracker.ceph.com/issues/71631
+# To run this script on a vstart cluster, use the following command:
+# cd ceph/build
+# ../qa/workunits/mgr/test_mgr_module_loading_time.sh --vstart
+
+vstart=0
+if [ "$1" = "--vstart" ]; then
+ vstart=1
+fi
+
+ceph="ceph"
+if [ $vstart -eq 1 ]; then
+ ceph="./bin/ceph"
+fi
+
+# This will create CEPHADM_STRAY_HOST warnings, but we just want to be able to run an orch command.
+echo "Enabling cephadm module..."
+"$ceph" mgr module enable cephadm
+"$ceph" orch set backend cephadm
+
+echo "Checking cluster status..."
+"$ceph" -s
+
+# ------ Test 1 ------
+echo "Test 1: Test normal module loading behavior without any injected delays"
+
+echo "Ensure that no module is set for a load delay..."
+"$ceph" config set mgr mgr_module_load_delay_name ""
+
+echo "Test 1: Ensure that there is no injected load delay..."
+"$ceph" config set mgr mgr_module_load_delay 0
+
+"$ceph" mgr fail
+orch_status_output=$("$ceph" orch status 2>&1)
+
+echo "$orch_status_output"
+if [[ "$orch_status_output" == *"Backend: cephadm"* ]]; then
+ echo "PASS: orch command succeeded during normal behavior."
+elif [[ "$orch_status_output" == *"Error ENOTSUP: Module 'orchestrator' is not enabled/loaded"* ]]; then
+ echo "FAIL: orch command failed during normal behavior."
+ exit 1
+else
+ echo "FAIL: Unexpected error in orch command during normal behavior."
+ echo "$orch_status_output"
+ exit 1
+fi
+
+echo "Ensure health detail DOES NOT warn about any modules that failed initialization..."
+health=$("$ceph" health detail 2>&1)
+if [[ "$health" == *"Module failed to initialize"* ]]; then
+ echo "FAIL: One or more modules failed to initialize during small delay."
+ echo "$health"
+ exit 1
+fi
+
+echo "Verify that mgr is active..."
+stat=$("$ceph" -s 2>&1)
+if [[ "$stat" != *"active, since"* ]]; then
+ echo "FAIL: Mgr should be in 'active' state."
+ echo "$stat"
+ exit 1
+fi
+
+# ------ Test 2 ------
+echo "Select balancer module to receive loading delays..."
+"$ceph" config set mgr mgr_module_load_delay_name balancer
+
+echo "Test 2: Inject small delay (10000 ms) that should not exceed max loading retries"
+"$ceph" config set mgr mgr_module_load_delay 10000
+
+"$ceph" mgr fail
+orch_status_output=$("$ceph" orch status 2>&1)
+
+echo "$orch_status_output"
+if [[ "$orch_status_output" == *"Backend: cephadm"* ]]; then
+ echo "PASS: orch command succeeded during small delay."
+elif [[ "$orch_status_output" == *"Error ENOTSUP: Module 'orchestrator' is not enabled/loaded"* ]]; then
+ echo "FAIL: orch command failed during small delay."
+ exit 1
+else
+ echo "FAIL: Unexpected error in orch command during small delay."
+ echo "$orch_status_output"
+ exit 1
+fi
+
+echo "Ensure health detail DOES NOT warn about any modules that failed initialization..."
+health=$("$ceph" health detail 2>&1)
+if [[ "$health" == *"Module failed to initialize"* ]]; then
+ echo "FAIL: One or more modules failed to initialize during small delay."
+ echo "$health"
+ exit 1
+fi
+
+echo "Verify that mgr is active..."
+stat=$("$ceph" -s 2>&1)
+if [[ "$stat" != *"active, since"* ]]; then
+ echo "FAIL: Mgr should be in 'active' state."
+ echo "$stat"
+ exit 1
+fi
+
+# ------ Test 3 ------
+echo "Test 3: Inject large delay (10000000000 ms) that exceeds max loading retries and emits cluster error"
+"$ceph" config set mgr mgr_module_load_delay 10000000000
+
+"$ceph" mgr fail
+orch_status_output=$("$ceph" orch status 2>&1)
+
+echo "$orch_status_output"
+if [[ "$orch_status_output" == *"Error ENOTSUP: Module 'orchestrator' is not enabled/loaded"* ]]; then
+ echo "PASS: orch command failed during large delay as expected."
+else
+ echo "FAIL: Unexpected error in orch command during large delay."
echo "$orch_status_output"
+ exit 1
+fi
+
+echo "Ensure health detail DOES warn about any modules that failed initialization..."
+health=$("$ceph" health detail 2>&1)
+if [[ "$health" == *"Module failed to initialize"* ]]; then
+ echo "PASS: Cluster properly issued error about modules that failed to initialize."
+ echo "$health"
+else
+ echo "FAIL: Cluster did not properly issue error about modules that failed to initialize."
+ echo "$health"
+ exit 1
+fi
+
+echo "Verify that mgr is active..."
+stat=$("$ceph" -s 2>&1)
+if [[ "$stat" != *"active, since"* ]]; then
+ echo "FAIL: Mgr should be in 'active' state."
+ echo "$stat"
+ exit 1
+fi
+
+# ------ Test 4 ------
+echo "Test 4: Disable the problematic module and confirm that the health error goes away"
+
+echo "Disabling the balancer module..."
+"$ceph" mgr module force disable balancer --yes-i-really-mean-it
+
+echo "Sleeping for 10 seconds to allow the health error to clear up..."
+sleep 10
+
+echo "Ensure health detail no longer warns about any modules that failed initialization..."
+health=$("$ceph" health detail 2>&1)
+if [[ "$health" == *"Module failed to initialize"* ]]; then
+ echo "FAIL: One or more modules failed to initialize despite problem module being disabled."
+ echo "$health"
+ exit 1
+fi
+
+echo "Verify that mgr is active..."
+stat=$("$ceph" -s 2>&1)
+if [[ "$stat" != *"active, since"* ]]; then
+ echo "FAIL: Mgr should be in 'active' state."
+ echo "$stat"
+ exit 1
+fi
- if [[ "$orch_status_output" == *"Backend: cephadm"* ]]; then
- echo "PASS: Excess loading time was properly supported."
- elif [[ "$orch_status_output" == *"Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date"* ]]; then
- echo "FAIL: Excess loading time was not properly supported."
- exit 1
- else
- echo "FAIL: Unexpected error in 'ceph orch status':"
- echo "$orch_status_output"
- exit 1
- fi
-}
-
-main() {
- setup_cephadm || return 1
- check_cluster_status || return 1
- set_balancer_delay || return 1
- test_loading_time || return 1
-}
-
-main "$@"
+echo "All tests passed."
- runtime
services:
- mgr
+- name: mgr_module_load_expiration
+ type: millisecs
+ level: dev
+ default: 20000
+  desc: Maximum number of milliseconds the active mgr is allowed to spend loading mgr modules before declaring availability.
+  long_desc: Maximum number of milliseconds the active mgr is allowed to spend loading mgr modules. If any modules are still
+    uninitialized after this expiration is exceeded, the mgr proceeds to declare availability, but a health error will be
+    issued indicating which modules did not load in time.
+ flags:
+ - runtime
+ services:
+ - mgr
- name: cephadm_path
type: str
level: advanced
clog(clog_),
audit_clog(audit_clog_),
initialized(false),
- initializing(false)
+ initializing(false),
+ initialization_start_time(ceph::coarse_mono_clock::zero())
{
cluster_state.set_objecter(objecter);
}
ceph_assert(!initializing);
ceph_assert(!initialized);
initializing = true;
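+  // Record when module loading began; exceeded_initialization_expiration()
+  // measures elapsed time against this timestamp.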
+ initialization_start_time = ceph::coarse_mono_clock::now();
finisher.start();
return false;
}
+bool Mgr::exceeded_initialization_expiration()
+{
+  // initialization_start_time=0 when initialization hasn't started yet,
+  // so we know we can't have exceeded the time expiration.
+ if (ceph::coarse_mono_clock::is_zero(initialization_start_time)) {
+ return false;
+ }
+
+ // Save the amount of time elapsed
+ auto time_elapsed = ceph::coarse_mono_clock::now() - initialization_start_time;
+ dout(20) << "time elapsed since mgr initialization: " << time_elapsed << dendl;
+
+  // Reset the start time if the expiration time has been exceeded.
+  // Signal initialized=true so the mgr forcibly sends an "active" beacon.
+ auto expiration = g_conf().get_val<std::chrono::milliseconds>("mgr_module_load_expiration");
+ bool exceeded_expiration = time_elapsed > expiration;
+ if (exceeded_expiration) {
+ std::lock_guard l(lock);
+ initialization_start_time = ceph::coarse_mono_clock::zero();
+ initializing = false;
+ initialized = true;
+ }
+
+ return exceeded_expiration;
+}
+
void Mgr::handle_mgr_digest(ref_t<MMgrDigest> m)
{
dout(10) << m->mon_status_json.length() << dendl;
bool initialized;
bool initializing;
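+  // Time at which module initialization began; zero until init() runs.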
+ ceph::coarse_mono_time initialization_start_time;
public:
Mgr(MonClient *monc_, const MgrMap& mgrmap,
~Mgr();
bool is_initialized() const {return initialized;}
+ bool exceeded_initialization_expiration();
entity_addrvec_t get_server_addrs() const {
return server.get_myaddrs();
}
}
// Whether I think I am available (request MgrMonitor to set me
- // as available in the map)
- bool available = active_mgr != nullptr && active_mgr->is_initialized();
+ // as available in the map).
+ //
+ // The active mgr is marked available if:
+ // 1. The mon has chosen a standby to be active
+ // 2. The chosen active mgr has all of its modules initialized
+ //
+ // In extreme cases, if modules take very long to initialize (a buffer of extra time
+ // is allowed; see "mgr_module_load_expiration"), we will proceed to mark the chosen
+ // active mgr "available" to unblock other mgr functionality such as reporting PG
+ // availability. If this happens, a health error will be issued indicating which
+  // mgr modules got stuck initializing (see src/mgr/PyModuleRegistry.cc). This unblocks
+  // the rest of the mgr's functionality while making it clear that some modules
+  // are unusable.
+ bool available = false;
+ if (active_mgr != nullptr) {
+ available = active_mgr->is_initialized() || active_mgr->exceeded_initialization_expiration();
+ }
auto addrs = available ? active_mgr->get_server_addrs() : entity_addrvec_t();
dout(10) << "sending beacon as gid " << monc.get_global_id() << dendl;
// checks (to avoid outputting two health messages about a
// module that said can_run=false but we tried running it anyway)
failed_modules[module->get_name()] = module->get_error_string();
+  } else if (active_modules->is_pending(module->get_name())) {
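+    // The module is still pending after the active mgr declared availability
+    // (see mgr_module_load_expiration), so report it as failed to initialize.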
+ failed_modules[module->get_name()] = "Module failed to initialize.";
}
}