From: Kamoltat Sirivadhna
Date: Wed, 30 Jul 2025 13:57:47 +0000 (+0000)
Subject: doc/health-checks: update MON_NETSPLIT documentation
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=5ee0bbf36899e180055d3c0c71266158630a71d4;p=ceph.git

doc/health-checks: update MON_NETSPLIT documentation

Update the MON_NETSPLIT health check documentation to reflect
the introduction of the configurable mon_netsplit_grace_period option.

Fixes: https://tracker.ceph.com/issues/71344
Signed-off-by: Kamoltat Sirivadhna
---

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index 30a9bd64405f..91e4f07da204 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -167,6 +167,12 @@ which are frequently updated. This warning only appears when
 the cluster is provisioned with at least three Ceph Monitors and are using the
 ``connectivity`` election strategy.
 
+To reduce false alarms from transient network issues, detected netsplits are
+not immediately reported as health warnings. Instead, they must persist for at
+least ``mon_netsplit_grace_period`` seconds (default: 9 seconds) before being
+reported. If the network partition resolves within this grace period, no health
+warning is emitted.
+
 Network partitions are reported in two ways:
 
 - As location-level netsplits (e.g., "Netsplit detected between dc1 and dc2") when
@@ -177,6 +183,18 @@ Network partitions are reported in two ways:
 The system prioritizes reporting at the highest topology level (``datacenter``, ``rack``, etc.)
 when possible, to better help operators identify infrastructure-level network issues.
 
+To adjust the grace period threshold, run the following command:
+
+.. prompt:: bash $
+
+   ceph config set mon mon_netsplit_grace_period
+
+To disable the grace period entirely (immediate reporting), set the value to 0:
+
+.. prompt:: bash $
+
+   ceph config set mon mon_netsplit_grace_period 0
+
 AUTH_INSECURE_GLOBAL_ID_RECLAIM
 _______________________________
 
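
The grace-period behaviour described in the added documentation can be illustrated with a small model. This is only a sketch of the documented semantics, not Ceph's Monitor implementation: the function name `should_report_netsplit`, its parameters, and the use of plain timestamps in seconds are all assumptions made for illustration. Only the option name `mon_netsplit_grace_period`, its default of 9 seconds, and the meaning of 0 come from the patch.

```python
from typing import Optional

# Default taken from the documentation change: 9 seconds.
DEFAULT_GRACE_PERIOD = 9.0


def should_report_netsplit(
    first_seen: Optional[float],
    now: float,
    grace_period: float = DEFAULT_GRACE_PERIOD,
) -> bool:
    """Hypothetical model of the documented rule.

    first_seen   -- time (seconds) at which the netsplit was first detected,
                    or None if no netsplit is currently detected
    now          -- current time (seconds)
    grace_period -- stand-in for mon_netsplit_grace_period
    """
    if first_seen is None:
        # The partition resolved (or never existed): nothing to report.
        return False
    # Report only once the netsplit has persisted for the full grace period.
    # With grace_period == 0 this is true immediately.
    return (now - first_seen) >= grace_period


if __name__ == "__main__":
    print(should_report_netsplit(100.0, 105.0))       # 5 s elapsed, default grace
    print(should_report_netsplit(100.0, 109.0))       # 9 s elapsed, default grace
    print(should_report_netsplit(100.0, 100.0, 0.0))  # grace period disabled
```

Under this model, a netsplit that heals before the grace period elapses is simply never reported, which matches the "no health warning is emitted" statement in the patch.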