From cc2a296fd3aa4b0ac9395429a28655d35be6cd23 Mon Sep 17 00:00:00 2001
From: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Date: Thu, 8 Jan 2026 16:44:26 +0700
Subject: [PATCH] doc/cephadm: small improvements to operations.rst

Remove "*" around CLI command placeholders that was rendered in literal
strings. Leave only the usual "" string.
Use title case consistently in section titles.
Add links to more information.
Move a label about logs to the right section.
Use double backticks consistently for literal strings.
Use confval role instead of literal inline for config keys in text.
Fix word capitalization and minor changes to words.
Use Monitor, Manager, etc. instead of MON, MGR, etc.
Improve markup.
Improve prompt usage.
Fix duplicate text and remove an unnecessary section.
Delete spaces at end of lines.
Add full stops or colons.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
---
 doc/cephadm/operations.rst | 365 +++++++++++++++++++++----------------
 1 file changed, 209 insertions(+), 156 deletions(-)

diff --git a/doc/cephadm/operations.rst b/doc/cephadm/operations.rst
index c6aeedd9d01a..ad7cf880550a 100644
--- a/doc/cephadm/operations.rst
+++ b/doc/cephadm/operations.rst
@@ -4,7 +4,7 @@ Cephadm Operations
 
 .. _watching_cephadm_logs:
 
-Watching cephadm log messages
+Watching Cephadm Log Messages
 =============================
 
 The cephadm orchestrator module writes logs to the ``cephadm`` cluster log
@@ -35,16 +35,13 @@ You can see recent events by running the following command:
    ceph log last cephadm
 
 These events are also logged to the ``ceph.cephadm.log`` file on
-monitor hosts as well as to the monitor daemons' stderr.
+monitor hosts as well as to the Monitor daemons' stderr.
 
 
-.. _cephadm-logs:
-
-
-Ceph daemon control
+Ceph Daemon Control
 ===================
 
-Starting and stopping daemons
+Starting and Stopping Daemons
 -----------------------------
 
 You can stop, start, or restart a daemon with:
@@ -55,7 +52,7 @@ You can stop, start, or restart a daemon with:
    ceph orch daemon start
    ceph orch daemon restart
 
-You can also do the same for all daemons for a service with:
+You can also do the same for all daemons for a service with:
 
 .. prompt:: bash #
 
@@ -70,7 +67,7 @@ You can also do the same for all daemons for a service with:
    cases even data loss.
 
 
-Redeploying or reconfiguring a daemon
+Redeploying or Reconfiguring a Daemon
 -------------------------------------
 
 The container for a daemon can be stopped, recreated, and restarted with
@@ -84,7 +81,7 @@ A container image name can optionally be provided to force a
 particular image to be used (instead of the image specified by the
 ``container_image`` config value).
 
-If only the ceph configuration needs to be regenerated, you can also
+If only the Ceph configuration needs to be regenerated, you can also
 issue a ``reconfig`` command, which will rewrite the ``ceph.conf``
 file but will not trigger a restart of the daemon.
 
@@ -93,7 +90,7 @@ file but will not trigger a restart of the daemon.
    ceph orch daemon reconfig
 
 
-Rotating a daemon's authenticate key
+Rotating a Daemon's Authenticate Key
 ------------------------------------
 
 All Ceph and gateway daemons in the cluster have a secret key that is used to connect
@@ -104,23 +101,26 @@ new key) with the following command:
 
    ceph orch daemon rotate-key
 
-For MDS, OSD, and MGR daemons, this does not require a daemon restart. For other
+For MDS, OSD, and Manager daemons, this does not require a daemon restart. For other
 daemons, however (e.g., RGW), the daemon may be restarted to switch to the new key.
 
 
-Ceph daemon logs
+.. _cephadm-logs:
+
+Ceph Daemon Logs
 ================
 
-Logging to journald
+Logging to Journald
 -------------------
 
 Ceph daemons traditionally write logs to ``/var/log/ceph``. Ceph daemons log to
 journald by default and Ceph logs are captured by the container runtime
 environment. They are accessible via ``journalctl``.
 
-.. note:: Prior to Quincy, ceph daemons logged to stderr.
+.. note:: Prior to Quincy, Ceph daemons logged to stderr.
+
 
-Example of logging to journald
+Example of Logging to Journald
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 For example, to view the logs for the daemon ``mon.foo`` for a cluster
@@ -133,18 +133,20 @@ something like:
 
 This works well for normal operations when logging levels are low.
 
-Logging to files
+
+Logging to Files
 ----------------
 
-You can also configure Ceph daemons to log to files instead of to
+You can also configure Ceph daemons to log to files instead of
 journald if you prefer logs to appear in files (as they did in earlier,
-pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
+pre-cephadm, pre-Octopus versions of Ceph). When Ceph logs to files,
 the logs appear in ``/var/log/ceph/``. If you choose to
-configure Ceph to log to files instead of to journald, remember to
+configure Ceph to log to files instead of journald, remember to
 configure Ceph so that it will not log to journald (the commands for this are
 covered below).
 
-Enabling logging to files
+
+Enabling Logging to Files
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To enable logging to files, run the following commands:
@@ -154,7 +156,8 @@ To enable logging to files, run the following commands:
    ceph config set global log_to_file true
    ceph config set global mon_cluster_log_to_file true
 
-Disabling logging to journald
+
+Disabling Logging to Journald
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 If you choose to log to files, we recommend disabling logging to journald or else
@@ -168,10 +171,11 @@ to stderr:
    ceph config set global log_to_journald false
    ceph config set global mon_cluster_log_to_journald false
 
-.. note:: You can change the default by passing --log-to-file during
+.. note:: You can change the default by passing ``--log-to-file`` during
    bootstrapping a new cluster.
 
-Modifying the log retention schedule
+
+Modifying the Log Retention Schedule
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 By default, cephadm sets up log rotation on each host to rotate these
@@ -179,7 +183,7 @@ files. You can configure the logging retention schedule by modifying
 ``/etc/logrotate.d/ceph.``.
 
 
-Per-node cephadm logs
+Per-node Cephadm Logs
 =====================
 
 The cephadm executable, either run directly by a user or by the cephadm
@@ -190,7 +194,8 @@ logs to the file ``/var/log/ceph/cephadm.log``.
 This logging destination is configurable and you may choose to log to the
 file, to the syslog/journal, or to both.
 
-Setting a cephadm log destination during bootstrap
+
+Setting a Cephadm Log Destination During Bootstrap
 --------------------------------------------------
 
 The ``cephadm`` command may be executed with the option ``--log-dest=file``
@@ -208,25 +213,26 @@ For example:
 
 If you want to manually specify exactly what log destination to use
 during bootstrap, independent from the ``--log-dest`` options, you may add
-a configuration key ``mgr/cephadm/cephadm_log_destination`` to the
+a configuration key :confval:`mgr/cephadm/cephadm_log_destination` to the
 initial configuration file, under the ``[mgr]`` section. Valid values for
 the key are: ``file``, ``syslog``, and ``file,syslog``.
 
 For example:
 
-.. prompt:: bash #
+.. prompt:: bash # auto
 
-   cat >/tmp/bootstrap.conf </tmp/bootstrap.conf <`` contains all cluster logs. By
   default, cephadm logs via stderr and the container runtime. These
   logs will not exist unless you have enabled logging to files as
-  described in `cephadm-logs`_.
+  described in :ref:`cephadm-logs`.
 * ``/var/lib/ceph/`` contains all cluster daemon data
   (besides logs).
 * ``/var/lib/ceph//`` contains all data for
@@ -265,22 +271,23 @@ older, pre-cephadm (pre Octopus) versions of ceph:
 * ``/var/lib/ceph//crash`` contains crash reports for
   the cluster.
 * ``/var/lib/ceph//removed`` contains old daemon
-  data directories for stateful daemons (e.g., monitor, prometheus)
+  data directories for stateful daemons (e.g., Monitor, Prometheus)
   that have been removed by cephadm.
 
-Disk usage
+
+Disk Usage
 ----------
 
-Because a few Ceph daemons (notably, the monitors and prometheus) store a
+Because a few Ceph daemons (notably, the Monitors and Prometheus) store a
 large amount of data in ``/var/lib/ceph`` , we recommend moving this
 directory to its own disk, partition, or logical volume so that it does
 not fill up the root file system.
 
-Health checks
+Health Checks
 =============
 
 The cephadm module provides additional health checks to supplement the
-default health checks provided by the Cluster. These additional health
+default health checks provided by the cluster. These additional health
 checks fall into two categories:
 
 - **cephadm operations**: Health checks in this category are always
@@ -288,11 +295,12 @@ checks fall into two categories:
 - **cluster configuration**: These health checks are *optional*, and
   focus on the configuration of the hosts in the cluster.
 
-CEPHADM Operations
+
+Cephadm Operations
 ------------------
 
-CEPHADM_PAUSED
-~~~~~~~~~~~~~~
+``CEPHADM_PAUSED``
+~~~~~~~~~~~~~~~~~~
 
 This indicates that cephadm background work has been paused with
 ``ceph orch pause``. Cephadm continues to perform passive monitoring
@@ -305,27 +313,30 @@ Resume cephadm work by running the following command:
 
    ceph orch resume
 
+
 .. _cephadm-stray-host:
 
-CEPHADM_STRAY_HOST
-~~~~~~~~~~~~~~~~~~
+``CEPHADM_STRAY_HOST``
+~~~~~~~~~~~~~~~~~~~~~~
 
 This indicates that one or more hosts have Ceph daemons that are
 running, but are not registered as hosts managed by *cephadm*. This
 means that those services cannot currently be managed by cephadm
-(e.g., restarted, upgraded, included in `ceph orch ps`).
+(e.g., restarted, upgraded, included in ``ceph orch ps``).
 
 * You can manage the host(s) by running the following command:
 
   .. prompt:: bash #
 
-    ceph orch host add **
+    ceph orch host add
 
   .. note::
 
     You might need to configure SSH access to the remote host
     before this will work.
 
+  See :ref:`cephadm-adding-hosts` for more information.
+
 * See :ref:`cephadm-fqdn` for more information about host names and
   domain names.
 
@@ -340,31 +351,35 @@ means that those services cannot currently be managed by cephadm
 
     ceph config set mgr mgr/cephadm/warn_on_stray_hosts false
 
-CEPHADM_STRAY_DAEMON
-~~~~~~~~~~~~~~~~~~~~
+
+``CEPHADM_STRAY_DAEMON``
+~~~~~~~~~~~~~~~~~~~~~~~~
 
 One or more Ceph daemons are running but not are not managed by
 *cephadm*. This may be because they were deployed using a different
 tool, or because they were started manually. Those
 services cannot currently be managed by cephadm (e.g., restarted,
-upgraded, or included in `ceph orch ps`).
+upgraded, or included in ``ceph orch ps``).
 
-* If the daemon is a stateful one (monitor or OSD), it should be adopted
+* If the daemon is a stateful one (Monitor or OSD), it should be adopted
   by cephadm; see :ref:`cephadm-adoption`. For stateless daemons, it is
   usually easiest to provision a new daemon with the ``ceph orch apply``
   command and then stop the unmanaged daemon.
 
-* If the stray daemon(s) are running on hosts not managed by cephadm, you can manage the host(s) by running the following command:
+* If the stray daemon(s) are running on hosts not managed by cephadm, you can
+  manage the host(s) by running the following command:
 
   .. prompt:: bash #
 
-    ceph orch host add **
+    ceph orch host add
 
   .. note::
 
     You might need to configure SSH access to the remote host
     before this will work.
 
+  See :ref:`cephadm-adding-hosts` for more information.
+
 * See :ref:`cephadm-fqdn` for more information about host names and
   domain names.
 
@@ -374,26 +389,29 @@ upgraded, or included in `ceph orch ps`).
 
     ceph config set mgr mgr/cephadm/warn_on_stray_daemons false
 
-CEPHADM_HOST_CHECK_FAILED
-~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CEPHADM_HOST_CHECK_FAILED``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 One or more hosts have failed the basic cephadm host check, which verifies
 that (1) the host is reachable and cephadm can be executed there, and (2)
 that the host satisfies basic prerequisites, like a working container
-runtime (podman or docker) and working time synchronization.
+runtime (Podman or Docker) and working time synchronization.
 If this test fails, cephadm will not be able to manage services on that host.
 
 You can manually run this check by running the following command:
 
 .. prompt:: bash #
 
-   ceph cephadm check-host **
+   ceph cephadm check-host
 
 You can remove a broken host from management by running the following command:
 
 .. prompt:: bash #
 
-   ceph orch host rm **
+   ceph orch host rm
+
+See :ref:`cephadm-removing-hosts` for more information.
 
 You can disable this health warning by running the following command:
 
@@ -401,13 +419,16 @@ You can disable this health warning by running the following command:
 
    ceph config set mgr mgr/cephadm/warn_on_failed_host_check false
 
+
 Cluster Configuration Checks
 ----------------------------
+
 Cephadm periodically scans each host in the cluster in order
 to understand the state of the OS, disks, network interfacess etc. This
 information can then be analyzed for consistency across the hosts in the
 cluster to identify any configuration anomalies.
 
+
 Enabling Cluster Configuration Checks
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -418,6 +439,7 @@ by running the following command:
 
    ceph config set mgr mgr/cephadm/config_checks_enabled true
 
+
 States Returned by Cluster Configuration Checks
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -425,19 +447,20 @@ Configuration checks are triggered after each host scan. The
 cephadm log entries will show the current state and outcome of the
 configuration checks as follows:
 
-Disabled state (config_checks_enabled false):
+Disabled state (``config_checks_enabled`` set to false):
 
-.. code-block:: bash
+.. code-block:: console
 
    ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable
 
-Enabled state (config_checks_enabled true):
+Enabled state (``config_checks_enabled`` set to true):
 
-.. code-block:: bash
+.. code-block:: console
 
    CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected
 
-Managing Configuration Checks (subcommands)
+
+Managing Configuration Checks (Subcommands)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The configuration checks themselves are managed through several cephadm subcommands.
@@ -453,11 +476,13 @@ This command returns the status of the configuration checker as either "Enabled"
 
 To list all the configuration checks and their current states, run the following command:
 
-.. code-block:: console
+.. prompt:: bash #
+
+   ceph cephadm config-check ls
 
-   # ceph cephadm config-check ls
+.. code-block:: console
 
-   NAME             HEALTHCHECK                      STATUS   DESCRIPTION
+   NAME             HEALTHCHECK                      STATUS   DESCRIPTION
    kernel_security  CEPHADM_CHECK_KERNEL_LSM         enabled  check that SELINUX/Apparmor profiles are consistent across cluster hosts
    os_subscription  CEPHADM_CHECK_SUBSCRIPTION       enabled  check that subscription states are consistent for all cluster hosts
    public_network   CEPHADM_CHECK_PUBLIC_MEMBERSHIP  enabled  check that all hosts have a network interface on the Ceph public_network
@@ -468,11 +493,10 @@ To list all the configuration checks and their current states, run the following
    kernel_version   CEPHADM_CHECK_KERNEL_VERSION     enabled  checks that the maj.min version of the kernel is consistent across Ceph hosts
 
 The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:
-:
 
 .. prompt:: bash #
 
-   ceph cephadm config-check disable
+   ceph cephadm config-check [enable|disable]
 
 For example:
 
@@ -480,73 +504,91 @@ For example:
 
    ceph cephadm config-check disable kernel_security
 
-CEPHADM_CHECK_KERNEL_LSM
-~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CEPHADM_CHECK_KERNEL_LSM``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Each host within the cluster is expected to operate within the same Linux
 Security Module (LSM) state. For example, if the majority of the hosts are
-running with SELINUX in enforcing mode, any host not running in this mode is
-flagged as an anomaly and a healthcheck (WARNING) state raised.
+running with SELinux in enforcing mode, any host not running in this mode is
+flagged as an anomaly and a health check warning is raised.
+
+
+``CEPHADM_CHECK_SUBSCRIPTION``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-CEPHADM_CHECK_SUBSCRIPTION
-~~~~~~~~~~~~~~~~~~~~~~~~~~
 This check relates to the status of OS vendor subscription. This check is
 performed only for hosts using RHEL and helps to confirm that all hosts are
 covered by an active subscription, which ensures that patches and updates are
 available.
 
-CEPHADM_CHECK_PUBLIC_MEMBERSHIP
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CEPHADM_CHECK_PUBLIC_MEMBERSHIP``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 All members of the cluster should have a network interface configured on at least one of the
 public network subnets. Hosts that are not on the public network will rely on
 routing, which may affect performance.
 
-CEPHADM_CHECK_MTU
-~~~~~~~~~~~~~~~~~
+
+``CEPHADM_CHECK_MTU``
+~~~~~~~~~~~~~~~~~~~~~
+
 The MTU of the network interfaces on OSD hosts can be a key factor in consistent performance. This
 check examines hosts that are running OSD services to ensure that the MTU is
 configured consistently within the cluster. This is determined by determining
 the MTU setting that the majority of hosts is using. Any anomalies result in a
-health check.
+health check warning.
+
+
+``CEPHADM_CHECK_LINKSPEED``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-CEPHADM_CHECK_LINKSPEED
-~~~~~~~~~~~~~~~~~~~~~~~
 This check is similar to the MTU check. Link speed consistency is a factor in
 consistent cluster performance, as is the MTU of the OSD node network interfaces.
 This check determines the link speed shared by the majority of OSD hosts, and a
-health check is run for any hosts that are set at a lower link speed rate.
+health check warning is raised for any hosts that are set at a lower link speed rate.
 
-CEPHADM_CHECK_NETWORK_MISSING
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The `public_network` and `cluster_network` settings support subnet definitions
+
+``CEPHADM_CHECK_NETWORK_MISSING``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :confval:`public_network` and :confval:`cluster_network` settings support subnet definitions
 for IPv4 and IPv6. If these settings are not found on any host in the cluster,
-a health check is raised.
+a health check warning is raised.
+
+
+``CEPHADM_CHECK_CEPH_RELEASE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-CEPHADM_CHECK_CEPH_RELEASE
-~~~~~~~~~~~~~~~~~~~~~~~~~~
 Under normal operations, the Ceph cluster runs daemons that are of the same Ceph
 release (for example, Reef). This check determines the active release for each daemon, and
-reports any anomalies as a healthcheck. *This check is bypassed if an upgrade
+reports any anomalies as a health check warning. *This check is bypassed if an upgrade
 is in process.*
 
-CEPHADM_CHECK_KERNEL_VERSION
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``CEPHADM_CHECK_KERNEL_VERSION``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 The OS kernel version (maj.min) is checked for consistency across hosts.
 The kernel version of the majority of the hosts is used as the basis for identifying anomalies.
 
+
 .. _client_keyrings_and_configs:
 
-Client keyrings and configs
-===========================
+Management of Client Keyrings and Configs
+=========================================
+
 Cephadm can distribute copies of the ``ceph.conf`` file and client keyring
 files to hosts. Starting from versions 16.2.10 (Pacific) and 17.2.1 (Quincy),
-in addition to the default location ``/etc/ceph/`` cephadm also stores config
+in addition to the default location ``/etc/ceph`` cephadm also stores config
 and keyring files in the ``/var/lib/ceph//config`` directory. It is usually
 a good idea to store a copy of the config and ``client.admin`` keyring on any host
 used to administer the cluster via the CLI.
 By default, cephadm does this for any nodes that have the ``_admin`` label
 (which normally includes the bootstrap host).
 
-.. note:: Ceph daemons will still use files on ``/etc/ceph/``. The new configuration
+.. note:: Ceph daemons will still use files on ``/etc/ceph``. The new configuration
    location ``/var/lib/ceph//config`` is used by cephadm only. Having this config
    directory under the fsid helps cephadm to load the configuration associated with
    the cluster.
@@ -557,10 +599,12 @@ When a client keyring is placed under management, cephadm will:
   - build a list of target hosts based on the specified placement spec (see
     :ref:`orchestrator-cli-placement-spec`)
   - store a copy of the ``/etc/ceph/ceph.conf`` file on the specified host(s)
-  - store a copy of the ``ceph.conf`` file at ``/var/lib/ceph//config/ceph.conf`` on the specified host(s)
-  - store a copy of the ``ceph.client.admin.keyring`` file at ``/var/lib/ceph//config/ceph.client.admin.keyring`` on the specified host(s)
+  - store a copy of the ``ceph.conf`` file at ``/var/lib/ceph//config/ceph.conf``
+    on the specified host(s)
+  - store a copy of the ``ceph.client.admin.keyring`` file at ``/var/lib/ceph//config/ceph.client.admin.keyring``
+    on the specified host(s)
   - store a copy of the keyring file on the specified host(s)
-  - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster monitors)
+  - update the ``ceph.conf`` file as needed (e.g., due to a change in the cluster Monitors)
   - update the keyring file if the entity's key is changed (e.g., via ``ceph
     auth ...`` commands)
   - ensure that the keyring file has the specified ownership and specified mode
@@ -568,6 +612,7 @@ When a client keyring is placed under management, cephadm will:
   - remove the keyring file from old hosts if the keyring placement spec is
     updated (as needed)
 
+
 Listing Client Keyrings
 -----------------------
 
@@ -577,16 +622,17 @@ To see the list of client keyrings are currently under management, run the follo
 
    ceph orch client-keyring ls
 
-Putting a Keyring Under Management
-----------------------------------
-To put a keyring under management, run a command of the following form:
+
+Enabling Management of a Keyring File
+-------------------------------------
+
+To enable management of a keyring file, run a command of the following form:
 
 .. prompt:: bash #
 
    ceph orch client-keyring set [--mode=] [--owner=.] [--path=]
 
-- By default, the *path* is ``/etc/ceph/client.{entity}.keyring``, which is
+- By default, the ``path`` is ``/etc/ceph/client.{entity}.keyring``, which is
   where Ceph looks by default. Be careful when specifying alternate locations,
   as existing files may be overwritten.
 - A placement of ``*`` (all hosts) is common.
@@ -614,6 +660,7 @@ This feature can be suppressed by passing ``--no-ceph-conf`` when setting the ke
 
    ceph orch client-keyring set client.foo label:foo 0:0 --no-ceph-conf
 
+
 Disabling Management of a Keyring File
 --------------------------------------
 
@@ -628,25 +675,27 @@ To disable management of a keyring file, run a command of the following form:
 This deletes any keyring files for this entity that were previously written
 to cluster nodes.
 
+
 .. _etc_ceph_conf_distribution:
 
-/etc/ceph/ceph.conf
-===================
+Management of ``/etc/ceph/ceph.conf``
+=====================================
 
-Distributing ceph.conf to hosts that have no keyrings
------------------------------------------------------
+Distributing ``ceph.conf`` to Hosts that Have no Keyrings
+---------------------------------------------------------
 
 It might be useful to distribute ``ceph.conf`` files to hosts without an
 associated client keyring file. By default, cephadm deploys only a
 ``ceph.conf`` file to hosts where a client keyring is also distributed (see
-above). To write config files to hosts without client keyrings, run the
-following command:
+:ref:`client_keyrings_and_configs` above). To write config files to hosts
+without client keyrings, run the following command:
 
 .. prompt:: bash #
 
    ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true
 
-Using Placement Specs to specify which hosts get keyrings
+
+Using Placement Specs to Specify which Hosts Get Keyrings
 ---------------------------------------------------------
 
 By default, the configs are written to all hosts (i.e., those listed by ``ceph
@@ -660,32 +709,27 @@ the following form:
 For example, to distribute configs to hosts with the ``bare_config`` label, run
 the following command:
 
-Distributing ceph.conf to hosts tagged with bare_config
--------------------------------------------------------
-
-For example, to distribute configs to hosts with the ``bare_config`` label, run the following command:
-
 .. prompt:: bash #
 
    ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf_hosts label:bare_config
 
-(See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.)
+See :ref:`orchestrator-cli-placement-spec` for more information about placement specs.
 
 
-Limiting Password-less sudo Access
-==================================
+Limiting Passwordless Sudo Access
+=================================
 
-By default, the cephadm install guide recommends enabling password-less
+By default, the cephadm install guide recommends enabling passwordless
 ``sudo`` for the cephadm user. This option is the most flexible and
 future-proof but may not be preferred in all environments. An administrator
 can restrict ``sudo`` to only running an exact list of commands without
 password access. Note that this list may change between Ceph versions and
 administrators choosing this option should read the release notes and review
 this list in the destination version of the Ceph documentation. If the list
-differs one must extend the list of password-less ``sudo`` commands prior to
+differs one must extend the list of passwordless ``sudo`` commands prior to
 upgrade.
 
-Commands requiring password-less sudo support:
+Commands requiring passwordless sudo support:
 
 - ``chmod``
 - ``chown``
@@ -699,7 +743,7 @@ Commands requiring password-less sudo support:
 - ``which`` (see note)
 - ``/usr/bin/cephadm`` or python executable (see note)
 
-.. note:: Typically cephadm will execute ``which`` to determine what python3
+.. note:: Typically cephadm will execute ``which`` to determine which ``python3``
    command is available and then use the command returned by ``which`` in
    subsequent commands.
    Before configuring ``sudo`` run ``which python3`` to determine what
@@ -710,7 +754,7 @@ Commands requiring password-less sudo support:
 Configuring the ``sudoers`` file can be performed using a tool like ``visudo``
 and adding or replacing a user configuration line such as the following:
 
-.. code-block::
+.. code-block:: console
 
    # assuming the cephadm user is named "ceph"
    ceph ALL=(ALL) NOPASSWD:/usr/bin/chmod,/usr/bin/chown,/usr/bin/ls,/usr/bin/mkdir,/usr/bin/mv,/usr/bin/rm,/usr/sbin/sysctl,/usr/bin/touch,/usr/bin/true,/usr/bin/which,/usr/bin/cephadm,/usr/bin/python3
@@ -719,10 +763,10 @@ and adding or replacing a user configuration line such as the following:
 Purging a cluster
 =================
 
-.. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER
+.. danger:: THIS OPERATION WILL DESTROY ALL DATA STORED IN THIS CLUSTER!
 
 In order to destroy a cluster and delete all data stored in this cluster, disable
-cephadm to stop all orchestration operations (so we avoid deploying new daemons).
+cephadm to stop all orchestration operations (so we avoid deploying new daemons):
 
 .. prompt:: bash #
 
@@ -734,7 +778,7 @@ Then verify the FSID of the cluster:
 
    ceph fsid
 
-Purge ceph daemons from all hosts in the cluster
+Purge Ceph daemons from all hosts in the cluster:
 
 .. prompt:: bash #
 
@@ -742,7 +786,7 @@ Purge ceph daemons from all hosts in the cluster
    cephadm rm-cluster --force --zap-osds --fsid
 
 
-Replacing a device
+Replacing a Device
 ==================
 
 The ``ceph orch device replace`` command automates the process of replacing the underlying device of an OSD.
@@ -750,49 +794,51 @@ Previously, this process required manual intervention at various stages.
 With this new command, all necessary operations are performed automatically,
 streamlining the replacement process and improving the overall user experience.
 
-.. note:: This only supports LVM-based deployed OSD(s)
+.. note:: This supports only LVM-based deployed OSD(s).
 
 .. prompt:: bash #
 
    ceph orch device replace
 
-In the case the device being replaced is shared by multiple OSDs (eg: DB/WAL device shared by multiple OSDs), the orchestrator will warn you.
+If the device being replaced is shared by multiple OSDs (e.g., DB/WAL
+device shared by multiple OSDs), the orchestrator will warn you:
 
-.. prompt:: bash #
-
-   [ceph: root@ceph /]# ceph orch device replace osd-1 /dev/vdd
+.. prompt:: bash # auto
 
+   # ceph orch device replace osd-1 /dev/vdd
    Error EINVAL: /dev/vdd is a shared device.
    Replacing /dev/vdd implies destroying OSD(s): ['0', '1'].
    Please, *be very careful*, this can be a very dangerous operation.
    If you know what you are doing, pass --yes-i-really-mean-it
 
-If you know what you are doing, you can go ahead and pass ``--yes-i-really-mean-it``.
+If you know what you are doing, you can go ahead and pass ``--yes-i-really-mean-it``:
 
-.. prompt:: bash #
+.. prompt:: bash # auto
 
-   [ceph: root@ceph /]# ceph orch device replace osd-1 /dev/vdd --yes-i-really-mean-it
-   Scheduled to destroy osds: ['6', '7', '8'] and mark /dev/vdd as being replaced.
+   # ceph orch device replace osd-1 /dev/vdd --yes-i-really-mean-it
+   Scheduled to destroy osds: ['6', '7', '8'] and mark /dev/vdd as being replaced.
 
-``cephadm`` will make ``ceph-volume`` zap and destroy all related devices and mark the corresponding OSD as ``destroyed`` so the
-different OSD(s) ID(s) will be preserved:
+``cephadm`` will execute ``ceph-volume`` to zap and destroy all related devices
+and mark the corresponding OSD as ``destroyed`` so the different OSD(s) ID(s)
+will be preserved:
 
-.. prompt:: bash #
+.. prompt:: bash # auto
 
-   [ceph: root@ceph-1 /]# ceph osd tree
-   ID  CLASS  WEIGHT   TYPE NAME         STATUS     REWEIGHT  PRI-AFF
-   -1         0.97659  root default
-   -3         0.97659      host devel-1
-    0    hdd  0.29300          osd.0     destroyed   1.00000  1.00000
-    1    hdd  0.29300          osd.1     destroyed   1.00000  1.00000
-    2    hdd  0.19530          osd.2            up   1.00000  1.00000
-    3    hdd  0.19530          osd.3            up   1.00000  1.00000
+   # ceph osd tree
+   ID  CLASS  WEIGHT   TYPE NAME         STATUS     REWEIGHT  PRI-AFF
+   -1         0.97659  root default
+   -3         0.97659      host devel-1
+    0    hdd  0.29300          osd.0     destroyed   1.00000  1.00000
+    1    hdd  0.29300          osd.1     destroyed   1.00000  1.00000
+    2    hdd  0.19530          osd.2            up   1.00000  1.00000
+    3    hdd  0.19530          osd.3            up   1.00000  1.00000
 
-The device being replaced is finally seen as ``being replaced`` preventing ``cephadm`` from redeploying the OSDs too fast:
+The device being replaced is finally seen as ``being replaced``
+preventing ``cephadm`` from redeploying the OSDs too fast:
 
-.. prompt:: bash #
+.. prompt:: bash # auto
 
-   [ceph: root@ceph-1 /]# ceph orch device ls
+   # ceph orch device ls
    HOST   PATH      TYPE  DEVICE ID  SIZE  AVAILABLE  REFRESHED  REJECT REASONS
    osd-1  /dev/vdb  hdd              200G  Yes        13s ago
    osd-1  /dev/vdc  hdd              200G  Yes        13s ago
@@ -800,12 +846,19 @@ The device being replaced is finally seen as ``being replaced`` preventing ``cep
    osd-1  /dev/vde  hdd              200G  No         13s ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
    osd-1  /dev/vdf  hdd              200G  No         13s ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
 
-If for any reason you need to clear the 'device replace header' on a device, then you can use ``ceph orch device replace --clear``:
+If for any reason you need to clear the 'device replace header' on a device,
+run a command of the following form:
 
-.. prompt:: bash #
+.. prompt:: bash # auto
+
+   # ceph orch device replace --clear
+
+For example:
+
+.. prompt:: bash # auto
 
-   [ceph: root@devel-1 /]# ceph orch device replace devel-1 /dev/vdk --clear
+   # ceph orch device replace devel-1 /dev/vdk --clear
    Replacement header cleared on /dev/vdk
-   [ceph: root@devel-1 /]#
 
-After that, ``cephadm`` will redeploy the OSD service spec within a few minutes (unless the service is set to ``unmanaged``).
+Cephadm will then redeploy the OSD service spec within a few minutes
+(unless the service is set to ``unmanaged``).
-- 
2.47.3