From: Paul Cuzner Date: Tue, 2 Mar 2021 02:40:46 +0000 (+1300) Subject: mgr/cephadm:Document the cephadm config-check feature X-Git-Tag: v16.2.0~106^2~72 X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=eb51cbf26d0a3576fc9a4cd03bfa903b86f85862;p=ceph.git mgr/cephadm:Document the cephadm config-check feature This patch updates the docs to describe the config-check feature, describing how these checks can be enabled and managed. Signed-off-by: Paul Cuzner (cherry picked from commit e407c6343d419b7003d143c213fdf3f69462cd83) --- diff --git a/doc/cephadm/operations.rst b/doc/cephadm/operations.rst index ea629f3432fa..da8e14999d70 100644 --- a/doc/cephadm/operations.rst +++ b/doc/cephadm/operations.rst @@ -106,9 +106,18 @@ that it does not fill up the root file system. Health checks ============= +The cephadm module provides additional healthchecks to supplement the default healthchecks +provided by the Cluster. These additional healthchecks fall into two categories; + +- **cephadm operations**: Healthchecks in this category are always executed when the cephadm module is active. +- **cluster configuration**: These healthchecks are *optional*, and focus on the configuration of the hosts in + the cluster + +CEPHADM Operations +------------------ CEPHADM_PAUSED --------------- +^^^^^^^^^^^^^^ Cephadm background work has been paused with ``ceph orch pause``. Cephadm continues to perform passive monitoring activities (like checking @@ -122,7 +131,7 @@ Resume cephadm work with:: .. _cephadm-stray-host: CEPHADM_STRAY_HOST ------------------- +^^^^^^^^^^^^^^^^^^ One or more hosts have running Ceph daemons but are not registered as hosts managed by *cephadm*. This means that those services cannot @@ -148,7 +157,7 @@ See :ref:`cephadm-fqdn` for more information about host names and domain names. CEPHADM_STRAY_DAEMON --------------------- +^^^^^^^^^^^^^^^^^^^^ One or more Ceph daemons are running but not are not managed by *cephadm*. This may be because they were deployed using a different @@ -166,7 +175,7 @@ This warning can be disabled entirely with:: ceph config set mgr mgr/cephadm/warn_on_stray_daemons false CEPHADM_HOST_CHECK_FAILED -------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^ One or more hosts have failed the basic cephadm host check, which verifies that (1) the host is reachable and cephadm can be executed there, and (2) @@ -186,10 +195,114 @@ You can disable this health warning with:: ceph config set mgr mgr/cephadm/warn_on_failed_host_check false +Cluster Configuration Checks +---------------------------- +Cephadm periodically scans each of the hosts in the cluster, to understand the state +of the OS, disks, NICs etc. These facts can then be analysed for consistency across the hosts +in the cluster to identify any configuration anomalies. + +The configuration checks are an **optional** feature, enabled by the following command +:: + + ceph config set mgr mgr/cephadm/config_checks_enabled true + +The configuration checks are triggered after each host scan (1m). The cephadm log entries will +show the current state and outcome of the configuration checks as follows; + +Disabled state (config_checks_enabled false) +:: + + ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable + +Enabled state (config_checks_enabled true) +:: + + CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected + +The configuration checks themselves are managed through several cephadm sub-commands. + +To determine whether the configuration checks are enabled, you can use the following command +:: + + ceph cephadm config-check status + +This command will return the status of the configuration checker as either "Enabled" or "Disabled". + + +Listing all the configuration checks and their current state +:: + + ceph cephadm config-check ls + + e.g. + NAME HEALTHCHECK STATUS DESCRIPTION + kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles are consistent across cluster hosts + os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are consistent for all cluster hosts + public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on the Ceph public_netork + osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting + osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common linkspeed + network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public networks defined exist on the Ceph hosts + ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active) + kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the kernel on Ceph hosts is consistent + +The name of each configuration check, can then be used to enable or disable a specific check. +:: + + ceph cephadm config-check disable + + eg. + ceph cephadm config-check disable kernel_security + +CEPHADM_CHECK_KERNEL_LSM +^^^^^^^^^^^^^^^^^^^^^^^^ +Each host within the cluster is expected to operate within the same Linux Security Module (LSM) state. For example, +if the majority of the hosts are running with SELINUX in enforcing mode, any host not running in this mode +would be flagged as an anomaly and a healtcheck (WARNING) state raised. + +CEPHADM_CHECK_SUBSCRIPTION +^^^^^^^^^^^^^^^^^^^^^^^^^^ +This check relates to the status of vendor subscription. This check is only performed for hosts using RHEL, but helps +to confirm that all your hosts are covered by an active subscription so patches and updates +are available. + +CEPHADM_CHECK_PUBLIC_MEMBERSHIP +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +All members of the cluster should have NICs configured on at least one of the public network subnets. Hosts +that are not on the public network will rely on routing which may affect performance + +CEPHADM_CHECK_MTU +^^^^^^^^^^^^^^^^^ +The MTU of the NICs on OSDs can be a key factor in consistent performance. This check examines hosts +that are running OSD services to ensure that the MTU is configured consistently within the cluster. This is +determined by establishing the MTU setting that the majority of hosts are using, with any anomalies being +resulting in a Ceph healthcheck. + +CEPHADM_CHECK_LINKSPEED +^^^^^^^^^^^^^^^^^^^^^^^ +Similar to the MTU check, linkspeed consistency is also a factor in consistent cluster performance. +This check determines the linkspeed shared by the majority of "OSD hosts", resulting in a healthcheck for +any hosts that are set at a lower linkspeed rate. + +CEPHADM_CHECK_NETWORK_MISSING +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The public_network and cluster_network settings support subnet definitions for IPv4 and IPv6. If these +settings are not found on any host in the cluster a healthcheck is raised. + +CEPHADM_CHECK_CEPH_RELEASE +^^^^^^^^^^^^^^^^^^^^^^^^^^ +Under normal operations, the ceph cluster should be running daemons under the same ceph release (i.e. all +pacific). This check looks at the active release for each daemon, and reports any anomalies as a +healthcheck. *This check is bypassed if an upgrade process is active within the cluster.* + +CEPHADM_CHECK_KERNEL_VERSION +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The OS kernel version (maj.min) is checked for consistency across the hosts. Once again, the +majority of the hosts is used as the basis of identifying anomalies. + /etc/ceph/ceph.conf =================== -Cephadm distributes a minimized ``ceph.conf`` that only contains +Cephadm distributes a minimized ``ceph.conf`` that only contains a minimal set of information to connect to the Ceph cluster. To update the configuration settings, instead of manually editing @@ -197,7 +310,7 @@ the ``ceph.conf`` file, use the config database instead:: ceph config set ... -See :ref:`ceph-conf-database` for details. +See :ref:`ceph-conf-database` for details. By default, cephadm does not deploy that minimized ``ceph.conf`` across the cluster. To enable the management of ``/etc/ceph/ceph.conf`` files on all @@ -205,7 +318,7 @@ hosts, please enable this by running:: ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true -To set up an initial configuration before bootstrapping +To set up an initial configuration before bootstrapping the cluster, create an initial ``ceph.conf`` file. For example:: cat < /etc/ceph/ceph.conf