From c12af828caf3c5d529e85a3205bdc865d8266fcf Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Thu, 8 Jul 2021 23:18:54 +1000
Subject: [PATCH] doc/cephadm: operations: Data location & ...

This (very long) PR does a few things:

- Rewrites the "Data Location" section of the Operations docs
- Rewrites the "Health Checks" section of the Operations docs
- Adds prompts to commands
- Adds console-output formatting to the places where it is appropriate
- Adds several section headers where appropriate, to signpost to the reader
  what is currently under discussion

Signed-off-by: Zac Dover
---
 doc/cephadm/operations.rst | 308 ++++++++++++++++++++++++-------------
 1 file changed, 200 insertions(+), 108 deletions(-)

diff --git a/doc/cephadm/operations.rst b/doc/cephadm/operations.rst
index c49d2d0cff5..1972fd92570 100644
--- a/doc/cephadm/operations.rst
+++ b/doc/cephadm/operations.rst
@@ -77,11 +77,12 @@ Logging to files
----------------

You can also configure Ceph daemons to log to files instead of to
-stderr if you prefer logs to appear in files (as they did in earlier
-versions of Ceph). When Ceph logs to files, the logs appear in
-``/var/log/ceph/<cluster-fsid>``. If you choose to configure Ceph to
-log to files instead of to stderr, remember to configure Ceph so that
-it will not log to stderr (the commands for this are covered below).
+stderr if you prefer logs to appear in files (as they did in earlier,
+pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
+the logs appear in ``/var/log/ceph/<cluster-fsid>``. If you choose to
+configure Ceph to log to files instead of to stderr, remember to
+configure Ceph so that it will not log to stderr (the commands for
+this are covered below).

Enabling logging to files
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -116,12 +117,13 @@ files. You can configure the logging retention schedule by modifying
Data location
=============

-Cephadm daemon data and logs in slightly different locations than older
-versions of ceph:
+Cephadm stores daemon data and logs in different locations than did
+older, pre-cephadm (pre-Octopus) versions of Ceph:

-* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. Note
-  that by default cephadm logs via stderr and the container runtime,
-  so these logs are normally not present.
+* ``/var/log/ceph/<cluster-fsid>`` contains all cluster logs. By
+  default, cephadm logs via stderr and the container runtime. These
+  logs will not exist unless you have enabled logging to files as
+  described in `cephadm-logs`_.
* ``/var/lib/ceph/<cluster-fsid>`` contains all cluster daemon data (besides logs).
-* ``/var/lib/ceph/<cluster-fsid>/<daemon-name>`` contains all data for
@@ -135,20 +137,22 @@ versions of ceph:
Disk usage
----------

-Because a few Ceph daemons may store a significant amount of data in
-``/var/lib/ceph`` (notably, the monitors and prometheus), we recommend
-moving this directory to its own disk, partition, or logical volume so
-that it does not fill up the root file system.
+Because a few Ceph daemons (notably, the monitors and prometheus) store a
+large amount of data in ``/var/lib/ceph``, we recommend moving this
+directory to its own disk, partition, or logical volume so that it does not
+fill up the root file system.

Health checks
=============

-The cephadm module provides additional healthchecks to supplement the default healthchecks
-provided by the Cluster. These additional healthchecks fall into two categories;
+The cephadm module provides additional health checks to supplement the
+default health checks provided by the cluster. These additional health
+checks fall into two categories:

-- **cephadm operations**: Healthchecks in this category are always executed when the cephadm module is active.
-- **cluster configuration**: These healthchecks are *optional*, and focus on the configuration of the hosts in
-  the cluster
+- **cephadm operations**: Health checks in this category are always
+  executed when the cephadm module is active.
+- **cluster configuration**: These health checks are *optional*, and
+  focus on the configuration of the hosts in the cluster.

CEPHADM Operations
------------------
@@ -156,12 +160,14 @@ CEPHADM Operations
CEPHADM_PAUSED
~~~~~~~~~~~~~~

-Cephadm background work has been paused with ``ceph orch pause``. Cephadm
-continues to perform passive monitoring activities (like checking
-host and daemon status), but it will not make any changes (like deploying
-or removing daemons).
+This indicates that cephadm background work has been paused with
+``ceph orch pause``. Cephadm continues to perform passive monitoring
+activities (like checking host and daemon status), but it will not
+make any changes (like deploying or removing daemons).

-Resume cephadm work with::
+Resume cephadm work by running the following command:
+
+.. prompt:: bash #

  ceph orch resume

@@ -170,23 +176,30 @@ Resume cephadm work with::
CEPHADM_STRAY_HOST
~~~~~~~~~~~~~~~~~~

-One or more hosts have running Ceph daemons but are not registered as
-hosts managed by *cephadm*. This means that those services cannot
-currently be managed by cephadm (e.g., restarted, upgraded, included
-in `ceph orch ps`).
+This indicates that one or more hosts have Ceph daemons that are
+running, but are not registered as hosts managed by *cephadm*. This
+means that those services cannot currently be managed by cephadm
+(e.g., restarted, upgraded, included in `ceph orch ps`).

-You can manage the host(s) with::
+You can manage the host(s) by running the following command:
+
+.. prompt:: bash #

  ceph orch host add *<hostname>*

-Note that you may need to configure SSH access to the remote host
-before this will work.
+.. note::
+
+   You might need to configure SSH access to the remote host
+   before this will work.

Alternatively, you can manually connect to the host and ensure that
services on that host are removed or migrated to a host that is
managed by *cephadm*.

-You can also disable this warning entirely with::
+This warning can be disabled entirely by running the following
+command:
+
+.. prompt:: bash #

  ceph config set mgr mgr/cephadm/warn_on_stray_hosts false

@@ -207,7 +220,9 @@ by cephadm; see :ref:`cephadm-adoption`.
For stateless daemons, it is usually easiest to provision a new
daemon with the ``ceph orch apply`` command and then stop the
unmanaged daemon.

-This warning can be disabled entirely with::
+This warning can be disabled entirely by running the following command:
+
+.. prompt:: bash #

  ceph config set mgr mgr/cephadm/warn_on_stray_daemons false

@@ -220,58 +235,80 @@ that the host satisfies basic prerequisites, like a working container
runtime (podman or docker) and working time synchronization.
If this test fails, cephadm will no be able to manage services on that host.

-You can manually run this check with::
+You can manually run this check by running the following command:
+
+.. prompt:: bash #

  ceph cephadm check-host *<hostname>*

-You can remove a broken host from management with::
+You can remove a broken host from management by running the following command:
+
+.. prompt:: bash #

  ceph orch host rm *<hostname>*

-You can disable this health warning with::
+You can disable this health warning by running the following command:
+
+.. prompt:: bash #

  ceph config set mgr mgr/cephadm/warn_on_failed_host_check false

Cluster Configuration Checks
----------------------------

-Cephadm periodically scans each of the hosts in the cluster, to understand the state
-of the OS, disks, NICs etc. These facts can then be analysed for consistency across the hosts
-in the cluster to identify any configuration anomalies.
+Cephadm periodically scans each of the hosts in the cluster in order
+to understand the state of the OS, disks, NICs etc. These facts can
+then be analysed for consistency across the hosts in the cluster to
+identify any configuration anomalies.
+
+Enabling Cluster Configuration Checks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The configuration checks are an **optional** feature, enabled by the following command
-::
+The configuration checks are an **optional** feature, and are enabled
+by running the following command:
+
+.. prompt:: bash #

  ceph config set mgr mgr/cephadm/config_checks_enabled true

-The configuration checks are triggered after each host scan (1m). The cephadm log entries will
-show the current state and outcome of the configuration checks as follows;
+States Returned by Cluster Configuration Checks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The configuration checks are triggered after each host scan (1m). The
+cephadm log entries will show the current state and outcome of the
+configuration checks as follows:

-Disabled state (config_checks_enabled false)
-::
+Disabled state (config_checks_enabled false):
+
+.. code-block:: bash

  ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable

-Enabled state (config_checks_enabled true)
-::
+Enabled state (config_checks_enabled true):
+
+.. code-block:: bash

  CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected

-The configuration checks themselves are managed through several cephadm sub-commands.
+Managing Configuration Checks (subcommands)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The configuration checks themselves are managed through several cephadm subcommands.

-To determine whether the configuration checks are enabled, you can use the following command
-::
+To determine whether the configuration checks are enabled, run the following command:
+
+.. prompt:: bash #

  ceph cephadm config-check status

-This command will return the status of the configuration checker as either "Enabled" or "Disabled".
+This command returns the status of the configuration checker as either "Enabled" or "Disabled".
+
+To list all the configuration checks and their current states, run the following command:

-Listing all the configuration checks and their current state
-::
+.. code-block:: console

-  ceph cephadm config-check ls
+  # ceph cephadm config-check ls

-  e.g.
  NAME             HEALTHCHECK                      STATUS   DESCRIPTION
  kernel_security  CEPHADM_CHECK_KERNEL_LSM         enabled  checks SELINUX/Apparmor profiles are consistent across cluster hosts
  os_subscription  CEPHADM_CHECK_SUBSCRIPTION       enabled  checks subscription states are consistent for all cluster hosts
@@ -282,128 +319,183 @@ Listing all the configuration checks and their current state
  ceph_release     CEPHADM_CHECK_CEPH_RELEASE       enabled  check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
  kernel_version   CEPHADM_CHECK_KERNEL_VERSION     enabled  checks that the MAJ.MIN of the kernel on Ceph hosts is consistent

-The name of each configuration check, can then be used to enable or disable a specific check.
-::
+The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:
+
+.. prompt:: bash #

  ceph cephadm config-check disable <name>

-  eg.
+For example:
+
+.. prompt:: bash #
+
+   ceph cephadm config-check disable kernel_security

CEPHADM_CHECK_KERNEL_LSM
~~~~~~~~~~~~~~~~~~~~~~~~

-Each host within the cluster is expected to operate within the same Linux Security Module (LSM) state. For example,
-if the majority of the hosts are running with SELINUX in enforcing mode, any host not running in this mode
-would be flagged as an anomaly and a healtcheck (WARNING) state raised.
+Each host within the cluster is expected to operate within the same Linux
+Security Module (LSM) state. For example, if the majority of the hosts are
+running with SELINUX in enforcing mode, any host not running in this mode is
+flagged as an anomaly and a health check (WARNING) state is raised.

CEPHADM_CHECK_SUBSCRIPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~

-This check relates to the status of vendor subscription. This check is only performed for hosts using RHEL, but helps
-to confirm that all your hosts are covered by an active subscription so patches and updates
-are available.
+This check relates to the status of vendor subscription. This check is
+performed only for hosts using RHEL, but helps to confirm that all hosts are
+covered by an active subscription, which ensures that patches and updates are
+available.

CEPHADM_CHECK_PUBLIC_MEMBERSHIP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-All members of the cluster should have NICs configured on at least one of the public network subnets. Hosts
-that are not on the public network will rely on routing which may affect performance
+All members of the cluster should have NICs configured on at least one of the
+public network subnets. Hosts that are not on the public network will rely on
+routing, which may affect performance.

CEPHADM_CHECK_MTU
~~~~~~~~~~~~~~~~~

-The MTU of the NICs on OSDs can be a key factor in consistent performance. This check examines hosts
-that are running OSD services to ensure that the MTU is configured consistently within the cluster. This is
-determined by establishing the MTU setting that the majority of hosts are using, with any anomalies being
-resulting in a Ceph healthcheck.
+The MTU of the NICs on OSDs can be a key factor in consistent performance. This
+check examines hosts that are running OSD services to ensure that the MTU is
+configured consistently within the cluster. This is determined by establishing
+the MTU setting that the majority of hosts is using. Any anomalies result in a
+Ceph health check being raised.

CEPHADM_CHECK_LINKSPEED
~~~~~~~~~~~~~~~~~~~~~~~

-Similar to the MTU check, linkspeed consistency is also a factor in consistent cluster performance.
-This check determines the linkspeed shared by the majority of "OSD hosts", resulting in a healthcheck for
-any hosts that are set at a lower linkspeed rate.
+This check is similar to the MTU check. Linkspeed consistency is a factor in
+consistent cluster performance, just as the MTU of the NICs on the OSDs is.
+This check determines the linkspeed shared by the majority of OSD hosts, and a
+health check is raised for any hosts that are set at a lower linkspeed rate.

CEPHADM_CHECK_NETWORK_MISSING
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The public_network and cluster_network settings support subnet definitions for IPv4 and IPv6. If these
-settings are not found on any host in the cluster a healthcheck is raised.
+The ``public_network`` and ``cluster_network`` settings support subnet definitions
+for IPv4 and IPv6. If these settings are not found on any host in the cluster,
+a health check is raised.

CEPHADM_CHECK_CEPH_RELEASE
~~~~~~~~~~~~~~~~~~~~~~~~~~

-Under normal operations, the ceph cluster should be running daemons under the same ceph release (i.e. all
-pacific). This check looks at the active release for each daemon, and reports any anomalies as a
-healthcheck. *This check is bypassed if an upgrade process is active within the cluster.*
+Under normal operations, the Ceph cluster runs all daemons under the same Ceph
+release (for example, all daemons run Octopus). This check determines the
+active release for each daemon, and reports any anomalies as a health check.
+*This check is bypassed if an upgrade process is active within the cluster.*

CEPHADM_CHECK_KERNEL_VERSION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The OS kernel version (maj.min) is checked for consistency across the hosts. Once again, the
-majority of the hosts is used as the basis of identifying anomalies.
+The OS kernel version (maj.min) is checked for consistency across the hosts.
+The kernel version of the majority of the hosts is used as the basis for
+identifying anomalies.

Client keyrings and configs
===========================

-Cephadm can distribute copies of the ``ceph.conf`` and client keyring
-files to hosts. For example, it is usually a good idea to store a
-copy of the config and ``client.admin`` keyring on any hosts that will
-be used to administer the cluster via the CLI. By default, cephadm will do
-this for any nodes with the ``_admin`` label (which normally includes the bootstrap
-host).
+Cephadm can distribute copies of the ``ceph.conf`` and client keyring files to
+hosts. It is usually a good idea to store a copy of the config and
+``client.admin`` keyring on any host that is used to administer the cluster
+via the CLI. By default, cephadm does this for any nodes that have the
+``_admin`` label (which normally includes the bootstrap host).

When a client keyring is placed under management, cephadm will:

-  - build a list of target hosts based on the specified placement spec (see :ref:`orchestrator-cli-placement-spec`)
+  - build a list of target hosts based on the specified placement spec (see
+    :ref:`orchestrator-cli-placement-spec`)
  - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
  - store a copy of the keyring file on the specified host(s)
  - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
-  - update the keyring file if the entity's key is changed (e.g., via ``ceph auth ...`` commands)
-  - ensure the keyring file has the specified ownership and mode
+  - update the keyring file if the entity's key is changed (e.g., via ``ceph
+    auth ...`` commands)
+  - ensure that the keyring file has the specified ownership and specified mode
  - remove the keyring file when client keyring management is disabled
-  - remove the keyring file from old hosts if the keyring placement spec is updated (as needed)
+  - remove the keyring file from old hosts if the keyring placement spec is
+    updated (as needed)
+
+Listing Client Keyrings
+-----------------------

-To view which client keyrings are currently under management::
+To see the list of client keyrings that are currently under management, run the following command:
+
+.. prompt:: bash #

  ceph orch client-keyring ls

-To place a keyring under management::
+Putting a Keyring Under Management
+----------------------------------
+
+To put a keyring under management, run a command of the following form:
+
+.. prompt:: bash #

  ceph orch client-keyring set <entity> <placement> [--mode=<mode>] [--owner=<uid>.<gid>] [--path=<path>]

-- By default, the *path* will be ``/etc/ceph/client.{entity}.keyring``, which is where
-  Ceph looks by default. Be careful specifying alternate locations as existing files
-  may be overwritten.
+- By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
+  where Ceph looks by default. Be careful when specifying alternate locations,
+  as existing files may be overwritten.
- A placement of ``*`` (all hosts) is common.
- The mode defaults to ``0600`` and ownership to ``0:0`` (user root, group root).

-For example, to create and deploy a ``client.rbd`` key to hosts with the ``rbd-client`` label and group readable by uid/gid 107 (qemu),::
+For example, to create a ``client.rbd`` key and deploy it to hosts with the
+``rbd-client`` label and make it group readable by uid/gid 107 (qemu), run the
+following commands:
+
+.. prompt:: bash #

  ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'
  ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640

-The resulting keyring file is::
+The resulting keyring file is:
+
+.. code-block:: console

  -rw-r-----. 1 qemu qemu 156 Apr 21 08:47 /etc/ceph/client.client.rbd.keyring

-To disable management of a keyring file::
+Disabling Management of a Keyring File
+--------------------------------------
+
+To disable management of a keyring file, run a command of the following form:
+
+.. prompt:: bash #

  ceph orch client-keyring rm <entity>

-Note that this will delete any keyring files for this entity that were previously written
-to cluster nodes.
+.. note::
+
+   This deletes any keyring files for this entity that were previously written
+   to cluster nodes.

/etc/ceph/ceph.conf
===================

-It may also be useful to distribute ``ceph.conf`` files to hosts without an associated
-client keyring file. By default, cephadm only deploys a ``ceph.conf`` file to hosts where a client keyring
-is also distributed (see above). To write config files to hosts without client keyrings::
+Distributing ceph.conf to hosts that have no keyrings
+-----------------------------------------------------
+
+It might be useful to distribute ``ceph.conf`` files to hosts without an
+associated client keyring file. By default, cephadm deploys a ``ceph.conf``
+file only to hosts where a client keyring is also distributed (see above). To
+write config files to hosts without client keyrings, run the following
+command:
+
+.. prompt:: bash #

  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true

-By default, the configs are written to all hosts (i.e., those listed
-by ``ceph orch host ls``). To specify which hosts get a ``ceph.conf``::
+Using Placement Specs to specify which hosts get config files
+--------------------------------------------------------------
+
+By default, the configs are written to all hosts (i.e., those listed by ``ceph
+orch host ls``). To specify which hosts get a ``ceph.conf``, run a command of
+the following form:
+
+.. prompt:: bash #
+
+   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>

-  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts <placement spec>

+For example, to distribute configs to hosts with the ``bare_config`` label, run
+the following command:

-For example, to distribute configs to hosts with the ``bare_config`` label,::
+.. prompt:: bash #

-  ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config
+   ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config

(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)
-- 
2.39.5
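
For readers who want to try the client-keyring workflow that this patch documents, here is a minimal end-to-end sketch built only from the commands shown in the rewritten sections above. It assumes a cephadm-managed cluster and reuses the example names from the patch (the ``client.rbd`` entity, the ``rbd-client`` host label, and the ``my_rbd_pool`` pool); substitute the names used in your own environment.

.. code-block:: bash

   # Create the client key with the RBD caps used in the documented example.
   ceph auth get-or-create-key client.rbd mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd pool=my_rbd_pool'

   # Ask cephadm to maintain /etc/ceph/client.client.rbd.keyring on every host
   # carrying the rbd-client label, owned by uid/gid 107 (qemu) with mode 640.
   ceph orch client-keyring set client.rbd label:rbd-client --owner 107:107 --mode 640

   # Confirm that the keyring is now under cephadm management.
   ceph orch client-keyring ls

   # Later, stop managing the keyring; cephadm then removes the keyring files
   # it previously wrote to cluster nodes.
   ceph orch client-keyring rm client.rbd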