From: Sebastian Wagner Date: Mon, 13 Sep 2021 15:15:33 +0000 (+0200) Subject: doc/cephadm: move services into services/ X-Git-Tag: v16.2.7~67^2~50 X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=abc1aab98094d993f0c1bc93faae3c500ca1899c;p=ceph.git doc/cephadm: move services into services/ This is going to clean up the toctree a bit. Signed-off-by: Sebastian Wagner (cherry picked from commit 8c70398949f773d8b992ad8ae6c71383460d2932) --- diff --git a/doc/cephadm/custom-container.rst b/doc/cephadm/custom-container.rst deleted file mode 100644 index 542fcf16261d8..0000000000000 --- a/doc/cephadm/custom-container.rst +++ /dev/null @@ -1,78 +0,0 @@ -======================== -Custom Container Service -======================== - -The orchestrator enables custom containers to be deployed using a YAML file. -A corresponding :ref:`orchestrator-cli-service-spec` must look like: - -.. code-block:: yaml - - service_type: container - service_id: foo - placement: - ... - image: docker.io/library/foo:latest - entrypoint: /usr/bin/foo - uid: 1000 - gid: 1000 - args: - - "--net=host" - - "--cpus=2" - ports: - - 8080 - - 8443 - envs: - - SECRET=mypassword - - PORT=8080 - - PUID=1000 - - PGID=1000 - volume_mounts: - CONFIG_DIR: /etc/foo - bind_mounts: - - ['type=bind', 'source=lib/modules', 'destination=/lib/modules', 'ro=true'] - dirs: - - CONFIG_DIR - files: - CONFIG_DIR/foo.conf: - - refresh=true - - username=xyz - - "port: 1234" - -where the properties of a service specification are: - -* ``service_id`` - A unique name of the service. -* ``image`` - The name of the Docker image. -* ``uid`` - The UID to use when creating directories and files in the host system. -* ``gid`` - The GID to use when creating directories and files in the host system. -* ``entrypoint`` - Overwrite the default ENTRYPOINT of the image. -* ``args`` - A list of additional Podman/Docker command line arguments. -* ``ports`` - A list of TCP ports to open in the host firewall. -* ``envs`` - A list of environment variables. -* ``bind_mounts`` - When you use a bind mount, a file or directory on the host machine - is mounted into the container. Relative `source=...` paths will be - located below `/var/lib/ceph//`. -* ``volume_mounts`` - When you use a volume mount, a new directory is created within - Docker’s storage directory on the host machine, and Docker manages - that directory’s contents. Relative source paths will be located below - `/var/lib/ceph//`. -* ``dirs`` - A list of directories that are created below - `/var/lib/ceph//`. -* ``files`` - A dictionary, where the key is the relative path of the file and the - value the file content. The content must be double quoted when using - a string. Use '\\n' for line breaks in that case. Otherwise define - multi-line content as list of strings. The given files will be created - below the directory `/var/lib/ceph//`. - The absolute path of the directory where the file will be created must - exist. Use the `dirs` property to create them if necessary. diff --git a/doc/cephadm/index.rst b/doc/cephadm/index.rst index 373262d800606..d57fe288a01d6 100644 --- a/doc/cephadm/index.rst +++ b/doc/cephadm/index.rst @@ -21,21 +21,12 @@ either via the Ceph command-line interface (CLI) or via the dashboard (GUI). versions of Ceph. .. 
toctree:: - :maxdepth: 1 + :maxdepth: 2 compatibility install adoption host-management - mon - mgr - osd - rgw - mds - nfs - iscsi - custom-container - monitoring Service Management upgrade Cephadm operations diff --git a/doc/cephadm/iscsi.rst b/doc/cephadm/iscsi.rst deleted file mode 100644 index 581d2c9d40612..0000000000000 --- a/doc/cephadm/iscsi.rst +++ /dev/null @@ -1,74 +0,0 @@ -============= -iSCSI Service -============= - -.. _cephadm-iscsi: - -Deploying iSCSI -=============== - -To deploy an iSCSI gateway, create a yaml file containing a -service specification for iscsi: - -.. code-block:: yaml - - service_type: iscsi - service_id: iscsi - placement: - hosts: - - host1 - - host2 - spec: - pool: mypool # RADOS pool where ceph-iscsi config data is stored. - trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2" - api_port: ... # optional - api_user: ... # optional - api_password: ... # optional - api_secure: true/false # optional - ssl_cert: | # optional - ... - ssl_key: | # optional - ... - -For example: - -.. code-block:: yaml - - service_type: iscsi - service_id: iscsi - placement: - hosts: - - [...] - spec: - pool: iscsi_pool - trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2,IP_ADDRESS_3,..." - api_user: API_USERNAME - api_password: API_PASSWORD - api_secure: true - ssl_cert: | - -----BEGIN CERTIFICATE----- - MIIDtTCCAp2gAwIBAgIYMC4xNzc1NDQxNjEzMzc2MjMyXzxvQ7EcMA0GCSqGSIb3 - DQEBCwUAMG0xCzAJBgNVBAYTAlVTMQ0wCwYDVQQIDARVdGFoMRcwFQYDVQQHDA5T - [...] - -----END CERTIFICATE----- - ssl_key: | - -----BEGIN PRIVATE KEY----- - MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC5jdYbjtNTAKW4 - /CwQr/7wOiLGzVxChn3mmCIF3DwbL/qvTFTX2d8bDf6LjGwLYloXHscRfxszX/4h - [...] - -----END PRIVATE KEY----- - - -The specification can then be applied using: - -.. prompt:: bash # - - ceph orch apply -i iscsi.yaml - - -See :ref:`orchestrator-cli-placement-spec` for details of the placement specification. - -Further Reading -=============== - -* RBD: :ref:`ceph-iscsi` diff --git a/doc/cephadm/mds.rst b/doc/cephadm/mds.rst deleted file mode 100644 index 949a0fa5d8e7a..0000000000000 --- a/doc/cephadm/mds.rst +++ /dev/null @@ -1,49 +0,0 @@ -=========== -MDS Service -=========== - - -.. _orchestrator-cli-cephfs: - -Deploy CephFS -============= - -One or more MDS daemons is required to use the :term:`CephFS` file system. -These are created automatically if the newer ``ceph fs volume`` -interface is used to create a new file system. For more information, -see :ref:`fs-volumes-and-subvolumes`. - -For example: - -.. prompt:: bash # - - ceph fs volume create --placement="" - -where ``fs_name`` is the name of the CephFS and ``placement`` is a -:ref:`orchestrator-cli-placement-spec`. - -For manually deploying MDS daemons, use this specification: - -.. code-block:: yaml - - service_type: mds - service_id: fs_name - placement: - count: 3 - - -The specification can then be applied using: - -.. prompt:: bash # - - ceph orch apply -i mds.yaml - -See :ref:`orchestrator-cli-stateless-services` for manually deploying -MDS daemons on the CLI. - -Further Reading -=============== - -* :ref:`ceph-file-system` - - diff --git a/doc/cephadm/mgr.rst b/doc/cephadm/mgr.rst deleted file mode 100644 index 98a54398b18c0..0000000000000 --- a/doc/cephadm/mgr.rst +++ /dev/null @@ -1,37 +0,0 @@ -.. _mgr-cephadm-mgr: - -=========== -MGR Service -=========== - -The cephadm MGR service is hosting different modules, like the :ref:`mgr-dashboard` -and the cephadm manager module. - -.. 
_cephadm-mgr-networks: - -Specifying Networks -------------------- - -The MGR service supports binding only to a specific IP within a network. - -example spec file (leveraging a default placement): - -.. code-block:: yaml - - service_type: mgr - networks: - - 192.169.142.0/24 - -Allow co-location of MGR daemons -================================ - -In deployment scenarios with just a single host, cephadm still needs -to deploy at least two MGR daemons. See ``mgr_standby_modules`` in -the :ref:`mgr-administrator-guide` for further details. - -Further Reading -=============== - -* :ref:`ceph-manager-daemon` -* :ref:`cephadm-manually-deploy-mgr` - diff --git a/doc/cephadm/mon.rst b/doc/cephadm/mon.rst deleted file mode 100644 index 6326b73f46d39..0000000000000 --- a/doc/cephadm/mon.rst +++ /dev/null @@ -1,179 +0,0 @@ -=========== -MON Service -=========== - -.. _deploy_additional_monitors: - -Deploying additional monitors -============================= - -A typical Ceph cluster has three or five monitor daemons that are spread -across different hosts. We recommend deploying five monitors if there are -five or more nodes in your cluster. - -.. _CIDR: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation - -Ceph deploys monitor daemons automatically as the cluster grows and Ceph -scales back monitor daemons automatically as the cluster shrinks. The -smooth execution of this automatic growing and shrinking depends upon -proper subnet configuration. - -The cephadm bootstrap procedure assigns the first monitor daemon in the -cluster to a particular subnet. ``cephadm`` designates that subnet as the -default subnet of the cluster. New monitor daemons will be assigned by -default to that subnet unless cephadm is instructed to do otherwise. - -If all of the ceph monitor daemons in your cluster are in the same subnet, -manual administration of the ceph monitor daemons is not necessary. -``cephadm`` will automatically add up to five monitors to the subnet, as -needed, as new hosts are added to the cluster. - -By default, cephadm will deploy 5 daemons on arbitrary hosts. See -:ref:`orchestrator-cli-placement-spec` for details of specifying -the placement of daemons. - -Designating a Particular Subnet for Monitors --------------------------------------------- - -To designate a particular IP subnet for use by ceph monitor daemons, use a -command of the following form, including the subnet's address in `CIDR`_ -format (e.g., ``10.1.2.0/24``): - - .. prompt:: bash # - - ceph config set mon public_network ** - - For example: - - .. prompt:: bash # - - ceph config set mon public_network 10.1.2.0/24 - -Cephadm deploys new monitor daemons only on hosts that have IP addresses in -the designated subnet. - -You can also specify two public networks by using a list of networks: - - .. prompt:: bash # - - ceph config set mon public_network *,* - - For example: - - .. prompt:: bash # - - ceph config set mon public_network 10.1.2.0/24,192.168.0.1/24 - - -Deploying Monitors on a Particular Network ------------------------------------------- - -You can explicitly specify the IP address or CIDR network for each monitor and -control where each monitor is placed. To disable automated monitor deployment, -run this command: - - .. prompt:: bash # - - ceph orch apply mon --unmanaged - - To deploy each additional monitor: - - .. 
prompt:: bash # - - ceph orch daemon add mon * - - For example, to deploy a second monitor on ``newhost1`` using an IP - address ``10.1.2.123`` and a third monitor on ``newhost2`` in - network ``10.1.2.0/24``, run the following commands: - - .. prompt:: bash # - - ceph orch apply mon --unmanaged - ceph orch daemon add mon newhost1:10.1.2.123 - ceph orch daemon add mon newhost2:10.1.2.0/24 - - Now, enable automatic placement of Daemons - - .. prompt:: bash # - - ceph orch apply mon --placement="newhost1,newhost2,newhost3" --dry-run - - See :ref:`orchestrator-cli-placement-spec` for details of specifying - the placement of daemons. - - Finally apply this new placement by dropping ``--dry-run`` - - .. prompt:: bash # - - ceph orch apply mon --placement="newhost1,newhost2,newhost3" - - -Moving Monitors to a Different Network --------------------------------------- - -To move Monitors to a new network, deploy new monitors on the new network and -subsequently remove monitors from the old network. It is not advised to -modify and inject the ``monmap`` manually. - -First, disable the automated placement of daemons: - - .. prompt:: bash # - - ceph orch apply mon --unmanaged - -To deploy each additional monitor: - - .. prompt:: bash # - - ceph orch daemon add mon ** - -For example, to deploy a second monitor on ``newhost1`` using an IP -address ``10.1.2.123`` and a third monitor on ``newhost2`` in -network ``10.1.2.0/24``, run the following commands: - - .. prompt:: bash # - - ceph orch apply mon --unmanaged - ceph orch daemon add mon newhost1:10.1.2.123 - ceph orch daemon add mon newhost2:10.1.2.0/24 - - Subsequently remove monitors from the old network: - - .. prompt:: bash # - - ceph orch daemon rm *mon.* - - Update the ``public_network``: - - .. prompt:: bash # - - ceph config set mon public_network ** - - For example: - - .. prompt:: bash # - - ceph config set mon public_network 10.1.2.0/24 - - Now, enable automatic placement of Daemons - - .. prompt:: bash # - - ceph orch apply mon --placement="newhost1,newhost2,newhost3" --dry-run - - See :ref:`orchestrator-cli-placement-spec` for details of specifying - the placement of daemons. - - Finally apply this new placement by dropping ``--dry-run`` - - .. prompt:: bash # - - ceph orch apply mon --placement="newhost1,newhost2,newhost3" - -Futher Reading -============== - -* :ref:`rados-operations` -* :ref:`rados-troubleshooting-mon` -* :ref:`cephadm-restore-quorum` - diff --git a/doc/cephadm/monitoring.rst b/doc/cephadm/monitoring.rst deleted file mode 100644 index 91b8742f3cfb9..0000000000000 --- a/doc/cephadm/monitoring.rst +++ /dev/null @@ -1,371 +0,0 @@ -.. _mgr-cephadm-monitoring: - -Monitoring Services -=================== - -Ceph Dashboard uses `Prometheus `_, `Grafana -`_, and related tools to store and visualize detailed -metrics on cluster utilization and performance. Ceph users have three options: - -#. Have cephadm deploy and configure these services. This is the default - when bootstrapping a new cluster unless the ``--skip-monitoring-stack`` - option is used. -#. Deploy and configure these services manually. This is recommended for users - with existing prometheus services in their environment (and in cases where - Ceph is running in Kubernetes with Rook). -#. Skip the monitoring stack completely. Some Ceph dashboard graphs will - not be available. - -The monitoring stack consists of `Prometheus `_, -Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter -`_), `Prometheus Alert -Manager `_ and `Grafana -`_. - -.. 
note:: - - Prometheus' security model presumes that untrusted users have access to the - Prometheus HTTP endpoint and logs. Untrusted users have access to all the - (meta)data Prometheus collects that is contained in the database, plus a - variety of operational and debugging information. - - However, Prometheus' HTTP API is limited to read-only operations. - Configurations can *not* be changed using the API and secrets are not - exposed. Moreover, Prometheus has some built-in measures to mitigate the - impact of denial of service attacks. - - Please see `Prometheus' Security model - ` for more detailed - information. - -Deploying monitoring with cephadm ---------------------------------- - -The default behavior of ``cephadm`` is to deploy a basic monitoring stack. It -is however possible that you have a Ceph cluster without a monitoring stack, -and you would like to add a monitoring stack to it. (Here are some ways that -you might have come to have a Ceph cluster without a monitoring stack: You -might have passed the ``--skip-monitoring stack`` option to ``cephadm`` during -the installation of the cluster, or you might have converted an existing -cluster (which had no monitoring stack) to cephadm management.) - -To set up monitoring on a Ceph cluster that has no monitoring, follow the -steps below: - -#. Deploy a node-exporter service on every node of the cluster. The node-exporter provides host-level metrics like CPU and memory utilization: - - .. prompt:: bash # - - ceph orch apply node-exporter - -#. Deploy alertmanager: - - .. prompt:: bash # - - ceph orch apply alertmanager - -#. Deploy Prometheus. A single Prometheus instance is sufficient, but - for high availablility (HA) you might want to deploy two: - - .. prompt:: bash # - - ceph orch apply prometheus - - or - - .. prompt:: bash # - - ceph orch apply prometheus --placement 'count:2' - -#. Deploy grafana: - - .. prompt:: bash # - - ceph orch apply grafana - -.. _cephadm-monitoring-networks-ports: - -Networks and Ports -~~~~~~~~~~~~~~~~~~ - -All monitoring services can have the network and port they bind to configured with a yaml service specification - -example spec file: - -.. code-block:: yaml - - service_type: grafana - service_name: grafana - placement: - count: 1 - networks: - - 192.169.142.0/24 - spec: - port: 4200 - -Using custom images -~~~~~~~~~~~~~~~~~~~ - -It is possible to install or upgrade monitoring components based on other -images. To do so, the name of the image to be used needs to be stored in the -configuration first. The following configuration options are available. - -- ``container_image_prometheus`` -- ``container_image_grafana`` -- ``container_image_alertmanager`` -- ``container_image_node_exporter`` - -Custom images can be set with the ``ceph config`` command - -.. code-block:: bash - - ceph config set mgr mgr/cephadm/ - -For example - -.. code-block:: bash - - ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1 - -If there were already running monitoring stack daemon(s) of the type whose -image you've changed, you must redeploy the daemon(s) in order to have them -actually use the new image. - -For example, if you had changed the prometheus image - -.. prompt:: bash # - - ceph orch redeploy prometheus - - -.. note:: - - By setting a custom image, the default value will be overridden (but not - overwritten). The default value changes when updates become available. 
- By setting a custom image, you will not be able to update the component - you have set the custom image for automatically. You will need to - manually update the configuration (image name and tag) to be able to - install updates. - - If you choose to go with the recommendations instead, you can reset the - custom image you have set before. After that, the default value will be - used again. Use ``ceph config rm`` to reset the configuration option - - .. code-block:: bash - - ceph config rm mgr mgr/cephadm/ - - For example - - .. code-block:: bash - - ceph config rm mgr mgr/cephadm/container_image_prometheus - -Using custom configuration files -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -By overriding cephadm templates, it is possible to completely customize the -configuration files for monitoring services. - -Internally, cephadm already uses `Jinja2 -`_ templates to generate the -configuration files for all monitoring components. To be able to customize the -configuration of Prometheus, Grafana or the Alertmanager it is possible to store -a Jinja2 template for each service that will be used for configuration -generation instead. This template will be evaluated every time a service of that -kind is deployed or reconfigured. That way, the custom configuration is -preserved and automatically applied on future deployments of these services. - -.. note:: - - The configuration of the custom template is also preserved when the default - configuration of cephadm changes. If the updated configuration is to be used, - the custom template needs to be migrated *manually*. - -Option names -"""""""""""" - -The following templates for files that will be generated by cephadm can be -overridden. These are the names to be used when storing with ``ceph config-key -set``: - -- ``services/alertmanager/alertmanager.yml`` -- ``services/grafana/ceph-dashboard.yml`` -- ``services/grafana/grafana.ini`` -- ``services/prometheus/prometheus.yml`` - -You can look up the file templates that are currently used by cephadm in -``src/pybind/mgr/cephadm/templates``: - -- ``services/alertmanager/alertmanager.yml.j2`` -- ``services/grafana/ceph-dashboard.yml.j2`` -- ``services/grafana/grafana.ini.j2`` -- ``services/prometheus/prometheus.yml.j2`` - -Usage -""""" - -The following command applies a single line value: - -.. code-block:: bash - - ceph config-key set mgr/cephadm/ - -To set contents of files as template use the ``-i`` argument: - -.. code-block:: bash - - ceph config-key set mgr/cephadm/ -i $PWD/ - -.. note:: - - When using files as input to ``config-key`` an absolute path to the file must - be used. - - -Then the configuration file for the service needs to be recreated. -This is done using `reconfig`. For more details see the following example. - -Example -""""""" - -.. code-block:: bash - - # set the contents of ./prometheus.yml.j2 as template - ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \ - -i $PWD/prometheus.yml.j2 - - # reconfig the prometheus service - ceph orch reconfig prometheus - -Deploying monitoring without cephadm ------------------------------------- - -If you have an existing prometheus monitoring infrastructure, or would like -to manage it yourself, you need to configure it to integrate with your Ceph -cluster. - -* Enable the prometheus module in the ceph-mgr daemon - - .. code-block:: bash - - ceph mgr module enable prometheus - - By default, ceph-mgr presents prometheus metrics on port 9283 on each host - running a ceph-mgr daemon. Configure prometheus to scrape these. 
- -* To enable the dashboard's prometheus-based alerting, see :ref:`dashboard-alerting`. - -* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`. - -Disabling monitoring --------------------- - -To disable monitoring and remove the software that supports it, run the following commands: - -.. code-block:: console - - $ ceph orch rm grafana - $ ceph orch rm prometheus --force # this will delete metrics data collected so far - $ ceph orch rm node-exporter - $ ceph orch rm alertmanager - $ ceph mgr module disable prometheus - -See also :ref:`orch-rm`. - -Setting up RBD-Image monitoring -------------------------------- - -Due to performance reasons, monitoring of RBD images is disabled by default. For more information please see -:ref:`prometheus-rbd-io-statistics`. If disabled, the overview and details dashboards will stay empty in Grafana -and the metrics will not be visible in Prometheus. - -Setting up Grafana ------------------- - -Manually setting the Grafana URL -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Cephadm automatically configures Prometheus, Grafana, and Alertmanager in -all cases except one. - -In a some setups, the Dashboard user's browser might not be able to access the -Grafana URL that is configured in Ceph Dashboard. This can happen when the -cluster and the accessing user are in different DNS zones. - -If this is the case, you can use a configuration option for Ceph Dashboard -to set the URL that the user's browser will use to access Grafana. This -value will never be altered by cephadm. To set this configuration option, -issue the following command: - - .. prompt:: bash $ - - ceph dashboard set-grafana-frontend-api-url - -It might take a minute or two for services to be deployed. After the -services have been deployed, you should see something like this when you issue the command ``ceph orch ls``: - -.. code-block:: console - - $ ceph orch ls - NAME RUNNING REFRESHED IMAGE NAME IMAGE ID SPEC - alertmanager 1/1 6s ago docker.io/prom/alertmanager:latest 0881eb8f169f present - crash 2/2 6s ago docker.io/ceph/daemon-base:latest-master-devel mix present - grafana 1/1 0s ago docker.io/pcuzner/ceph-grafana-el8:latest f77afcf0bcf6 absent - node-exporter 2/2 6s ago docker.io/prom/node-exporter:latest e5a616e4b9cf present - prometheus 1/1 6s ago docker.io/prom/prometheus:latest e935122ab143 present - -Configuring SSL/TLS for Grafana -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -``cephadm`` deploys Grafana using the certificate defined in the ceph -key/value store. If no certificate is specified, ``cephadm`` generates a -self-signed certificate during the deployment of the Grafana service. - -A custom certificate can be configured using the following commands: - -.. prompt:: bash # - - ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem - ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem - -If you have already deployed Grafana, run ``reconfig`` on the service to -update its configuration: - -.. prompt:: bash # - - ceph orch reconfig grafana - -The ``reconfig`` command also sets the proper URL for Ceph Dashboard. - -Setting up Alertmanager ------------------------ - -Adding Alertmanager webhooks -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To add new webhooks to the Alertmanager configuration, add additional -webhook urls like so: - -.. 
code-block:: yaml - - service_type: alertmanager - spec: - user_data: - default_webhook_urls: - - "https://foo" - - "https://bar" - -Where ``default_webhook_urls`` is a list of additional URLs that are -added to the default receivers' ```` configuration. - -Run ``reconfig`` on the service to update its configuration: - -.. prompt:: bash # - - ceph orch reconfig alertmanager - -Further Reading ---------------- - -* :ref:`mgr-prometheus` diff --git a/doc/cephadm/nfs.rst b/doc/cephadm/nfs.rst deleted file mode 100644 index c48d0f7658f71..0000000000000 --- a/doc/cephadm/nfs.rst +++ /dev/null @@ -1,120 +0,0 @@ -.. _deploy-cephadm-nfs-ganesha: - -=========== -NFS Service -=========== - -.. note:: Only the NFSv4 protocol is supported. - -The simplest way to manage NFS is via the ``ceph nfs cluster ...`` -commands; see :ref:`mgr-nfs`. This document covers how to manage the -cephadm services directly, which should only be necessary for unusual NFS -configurations. - -Deploying NFS ganesha -===================== - -Cephadm deploys NFS Ganesha daemon (or set of daemons). The configuration for -NFS is stored in the ``nfs-ganesha`` pool and exports are managed via the -``ceph nfs export ...`` commands and via the dashboard. - -To deploy a NFS Ganesha gateway, run the following command: - -.. prompt:: bash # - - ceph orch apply nfs ** [--port **] [--placement ...] - -For example, to deploy NFS with a service id of *foo* on the default -port 2049 with the default placement of a single daemon: - -.. prompt:: bash # - - ceph orch apply nfs foo - -See :ref:`orchestrator-cli-placement-spec` for the details of the placement -specification. - -Service Specification -===================== - -Alternatively, an NFS service can be applied using a YAML specification. - -.. code-block:: yaml - - service_type: nfs - service_id: mynfs - placement: - hosts: - - host1 - - host2 - spec: - port: 12345 - -In this example, we run the server on the non-default ``port`` of -12345 (instead of the default 2049) on ``host1`` and ``host2``. - -The specification can then be applied by running the following command: - -.. prompt:: bash # - - ceph orch apply -i nfs.yaml - -.. _cephadm-ha-nfs: - -High-availability NFS -===================== - -Deploying an *ingress* service for an existing *nfs* service will provide: - -* a stable, virtual IP that can be used to access the NFS server -* fail-over between hosts if there is a host failure -* load distribution across multiple NFS gateways (although this is rarely necessary) - -Ingress for NFS can be deployed for an existing NFS service -(``nfs.mynfs`` in this example) with the following specification: - -.. code-block:: yaml - - service_type: ingress - service_id: nfs.mynfs - placement: - count: 2 - spec: - backend_service: nfs.mynfs - frontend_port: 2049 - monitor_port: 9000 - virtual_ip: 10.0.0.123/24 - -A few notes: - - * The *virtual_ip* must include a CIDR prefix length, as in the - example above. The virtual IP will normally be configured on the - first identified network interface that has an existing IP in the - same subnet. You can also specify a *virtual_interface_networks* - property to match against IPs in other networks; see - :ref:`ingress-virtual-ip` for more information. - * The *monitor_port* is used to access the haproxy load status - page. The user is ``admin`` by default, but can be modified by - via an *admin* property in the spec. If a password is not - specified via a *password* property in the spec, the auto-generated password - can be found with: - - .. 
prompt:: bash # - - ceph config-key get mgr/cephadm/ingress.*{svc_id}*/monitor_password - - For example: - - .. prompt:: bash # - - ceph config-key get mgr/cephadm/ingress.nfs.myfoo/monitor_password - - * The backend service (``nfs.mynfs`` in this example) should include - a *port* property that is not 2049 to avoid conflicting with the - ingress service, which could be placed on the same host(s). - -Further Reading -=============== - -* CephFS: :ref:`cephfs-nfs` -* MGR: :ref:`mgr-nfs` diff --git a/doc/cephadm/osd.rst b/doc/cephadm/osd.rst deleted file mode 100644 index 4347d05aff5b7..0000000000000 --- a/doc/cephadm/osd.rst +++ /dev/null @@ -1,890 +0,0 @@ -*********** -OSD Service -*********** -.. _device management: ../rados/operations/devices -.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt - -List Devices -============ - -``ceph-volume`` scans each host in the cluster from time to time in order -to determine which devices are present and whether they are eligible to be -used as OSDs. - -To print a list of devices discovered by ``cephadm``, run this command: - -.. prompt:: bash # - - ceph orch device ls [--hostname=...] [--wide] [--refresh] - -Example -:: - - Hostname Path Type Serial Size Health Ident Fault Available - srv-01 /dev/sdb hdd 15P0A0YFFRD6 300G Unknown N/A N/A No - srv-01 /dev/sdc hdd 15R0A08WFRD6 300G Unknown N/A N/A No - srv-01 /dev/sdd hdd 15R0A07DFRD6 300G Unknown N/A N/A No - srv-01 /dev/sde hdd 15P0A0QDFRD6 300G Unknown N/A N/A No - srv-02 /dev/sdb hdd 15R0A033FRD6 300G Unknown N/A N/A No - srv-02 /dev/sdc hdd 15R0A05XFRD6 300G Unknown N/A N/A No - srv-02 /dev/sde hdd 15R0A0ANFRD6 300G Unknown N/A N/A No - srv-02 /dev/sdf hdd 15R0A06EFRD6 300G Unknown N/A N/A No - srv-03 /dev/sdb hdd 15R0A0OGFRD6 300G Unknown N/A N/A No - srv-03 /dev/sdc hdd 15R0A0P7FRD6 300G Unknown N/A N/A No - srv-03 /dev/sdd hdd 15R0A0O7FRD6 300G Unknown N/A N/A No - -Using the ``--wide`` option provides all details relating to the device, -including any reasons that the device might not be eligible for use as an OSD. - -In the above example you can see fields named "Health", "Ident", and "Fault". -This information is provided by integration with `libstoragemgmt`_. By default, -this integration is disabled (because `libstoragemgmt`_ may not be 100% -compatible with your hardware). To make ``cephadm`` include these fields, -enable cephadm's "enhanced device scan" option as follows; - -.. prompt:: bash # - - ceph config set mgr mgr/cephadm/device_enhanced_scan true - -.. warning:: - Although the libstoragemgmt library performs standard SCSI inquiry calls, - there is no guarantee that your firmware fully implements these standards. - This can lead to erratic behaviour and even bus resets on some older - hardware. It is therefore recommended that, before enabling this feature, - you test your hardware's compatibility with libstoragemgmt first to avoid - unplanned interruptions to services. - - There are a number of ways to test compatibility, but the simplest may be - to use the cephadm shell to call libstoragemgmt directly - ``cephadm shell - lsmcli ldl``. 
If your hardware is supported you should see something like - this: - - :: - - Path | SCSI VPD 0x83 | Link Type | Serial Number | Health Status - ---------------------------------------------------------------------------- - /dev/sda | 50000396082ba631 | SAS | 15P0A0R0FRD6 | Good - /dev/sdb | 50000396082bbbf9 | SAS | 15P0A0YFFRD6 | Good - - -After you have enabled libstoragemgmt support, the output will look something -like this: - -:: - - # ceph orch device ls - Hostname Path Type Serial Size Health Ident Fault Available - srv-01 /dev/sdb hdd 15P0A0YFFRD6 300G Good Off Off No - srv-01 /dev/sdc hdd 15R0A08WFRD6 300G Good Off Off No - : - -In this example, libstoragemgmt has confirmed the health of the drives and the ability to -interact with the Identification and Fault LEDs on the drive enclosures. For further -information about interacting with these LEDs, refer to `device management`_. - -.. note:: - The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and SATA based - local disks only. There is no official support for NVMe devices (PCIe) - -.. _cephadm-deploy-osds: - -Deploy OSDs -=========== - -Listing Storage Devices ------------------------ - -In order to deploy an OSD, there must be a storage device that is *available* on -which the OSD will be deployed. - -Run this command to display an inventory of storage devices on all cluster hosts: - -.. prompt:: bash # - - ceph orch device ls - -A storage device is considered *available* if all of the following -conditions are met: - -* The device must have no partitions. -* The device must not have any LVM state. -* The device must not be mounted. -* The device must not contain a file system. -* The device must not contain a Ceph BlueStore OSD. -* The device must be larger than 5 GB. - -Ceph will not provision an OSD on a device that is not available. - -Creating New OSDs ------------------ - -There are a few ways to create new OSDs: - -* Tell Ceph to consume any available and unused storage device: - - .. prompt:: bash # - - ceph orch apply osd --all-available-devices - -* Create an OSD from a specific device on a specific host: - - .. prompt:: bash # - - ceph orch daemon add osd **:** - - For example: - - .. prompt:: bash # - - ceph orch daemon add osd host1:/dev/sdb - -* You can use :ref:`drivegroups` to categorize device(s) based on their - properties. This might be useful in forming a clearer picture of which - devices are available to consume. Properties include device type (SSD or - HDD), device model names, size, and the hosts on which the devices exist: - - .. prompt:: bash # - - ceph orch apply -i spec.yml - -Dry Run -------- - -The ``--dry-run`` flag causes the orchestrator to present a preview of what -will happen without actually creating the OSDs. - -For example: - - .. prompt:: bash # - - ceph orch apply osd --all-available-devices --dry-run - - :: - - NAME HOST DATA DB WAL - all-available-devices node1 /dev/vdb - - - all-available-devices node2 /dev/vdc - - - all-available-devices node3 /dev/vdd - - - -.. _cephadm-osd-declarative: - -Declarative State ------------------ - -The effect of ``ceph orch apply`` is persistent. This means that drives that -are added to the system after the ``ceph orch apply`` command completes will be -automatically found and added to the cluster. It also means that drives that -become available (by zapping, for example) after the ``ceph orch apply`` -command completes will be automatically found and added to the cluster. 
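To inspect the OSD specification that cephadm has stored and keeps reconciling, the spec can be
exported back as YAML. This is a minimal sketch; it assumes the ``--export`` flag and the
positional ``osd`` service-type filter are available in your version of the orchestrator CLI:

.. prompt:: bash #

   ceph orch ls osd --export

Running ``ceph orch ls --export`` without a filter prints the stored specifications of all
managed services.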
- -We will examine the effects of the following command: - - .. prompt:: bash # - - ceph orch apply osd --all-available-devices - -After running the above command: - -* If you add new disks to the cluster, they will automatically be used to - create new OSDs. -* If you remove an OSD and clean the LVM physical volume, a new OSD will be - created automatically. - -To disable the automatic creation of OSD on available devices, use the -``unmanaged`` parameter: - -If you want to avoid this behavior (disable automatic creation of OSD on available devices), use the ``unmanaged`` parameter: - -.. prompt:: bash # - - ceph orch apply osd --all-available-devices --unmanaged=true - -.. note:: - - Keep these three facts in mind: - - - The default behavior of ``ceph orch apply`` causes cephadm constantly to reconcile. This means that cephadm creates OSDs as soon as new drives are detected. - - - Setting ``unmanaged: True`` disables the creation of OSDs. If ``unmanaged: True`` is set, nothing will happen even if you apply a new OSD service. - - - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service. - -* For cephadm, see also :ref:`cephadm-spec-unmanaged`. - -.. _cephadm-osd-removal: - -Remove an OSD -============= - -Removing an OSD from a cluster involves two steps: - -#. evacuating all placement groups (PGs) from the cluster -#. removing the PG-free OSD from the cluster - -The following command performs these two steps: - -.. prompt:: bash # - - ceph orch osd rm [--replace] [--force] - -Example: - -.. prompt:: bash # - - ceph orch osd rm 0 - -Expected output:: - - Scheduled OSD(s) for removal - -OSDs that are not safe to destroy will be rejected. - -Monitoring OSD State --------------------- - -You can query the state of OSD operation with the following command: - -.. prompt:: bash # - - ceph orch osd rm status - -Expected output:: - - OSD_ID HOST STATE PG_COUNT REPLACE FORCE STARTED_AT - 2 cephadm-dev done, waiting for purge 0 True False 2020-07-17 13:01:43.147684 - 3 cephadm-dev draining 17 False True 2020-07-17 13:01:45.162158 - 4 cephadm-dev started 42 False True 2020-07-17 13:01:45.162158 - - -When no PGs are left on the OSD, it will be decommissioned and removed from the cluster. - -.. note:: - After removing an OSD, if you wipe the LVM physical volume in the device used by the removed OSD, a new OSD will be created. - For more information on this, read about the ``unmanaged`` parameter in :ref:`cephadm-osd-declarative`. - -Stopping OSD Removal --------------------- - -It is possible to stop queued OSD removals by using the following command: - -.. prompt:: bash # - - ceph orch osd rm stop - -Example: - -.. prompt:: bash # - - ceph orch osd rm stop 4 - -Expected output:: - - Stopped OSD(s) removal - -This resets the initial state of the OSD and takes it off the removal queue. - - -Replacing an OSD ----------------- - -.. prompt:: bash # - - orch osd rm --replace [--force] - -Example: - -.. prompt:: bash # - - ceph orch osd rm 4 --replace - -Expected output:: - - Scheduled OSD(s) for replacement - -This follows the same procedure as the procedure in the "Remove OSD" section, with -one exception: the OSD is not permanently removed from the CRUSH hierarchy, but is -instead assigned a 'destroyed' flag. - -**Preserving the OSD ID** - -The 'destroyed' flag is used to determine which OSD ids will be reused in the -next OSD deployment. - -If you use OSDSpecs for OSD deployment, your newly added disks will be assigned -the OSD ids of their replaced counterparts. 
This assumes that the new disks -still match the OSDSpecs. - -Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd`` -command does what you want it to. The ``--dry-run`` flag shows you what the -outcome of the command will be without making the changes you specify. When -you are satisfied that the command will do what you want, run the command -without the ``--dry-run`` flag. - -.. tip:: - - The name of your OSDSpec can be retrieved with the command ``ceph orch ls`` - -Alternatively, you can use your OSDSpec file: - -.. prompt:: bash # - - ceph orch apply osd -i --dry-run - -Expected output:: - - NAME HOST DATA DB WAL - node1 /dev/vdb - - - - -When this output reflects your intention, omit the ``--dry-run`` flag to -execute the deployment. - - -Erasing Devices (Zapping Devices) ---------------------------------- - -Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume -zap`` on the remote host. - -.. prompt:: bash # - - ceph orch device zap - -Example command: - -.. prompt:: bash # - - ceph orch device zap my_hostname /dev/sdx - -.. note:: - If the unmanaged flag is unset, cephadm automatically deploys drives that - match the DriveGroup in your OSDSpec. For example, if you use the - ``all-available-devices`` option when creating OSDs, when you ``zap`` a - device the cephadm orchestrator automatically creates a new OSD in the - device. To disable this behavior, see :ref:`cephadm-osd-declarative`. - - -.. _osd_autotune: - -Automatically tuning OSD memory -=============================== - -OSD daemons will adjust their memory consumption based on the -``osd_memory_target`` config option (several gigabytes, by -default). If Ceph is deployed on dedicated nodes that are not sharing -memory with other services, cephadm can automatically adjust the per-OSD -memory consumption based on the total amount of RAM and the number of deployed -OSDs. - -This option is enabled globally with:: - - ceph config set osd osd_memory_target_autotune true - -Cephadm will start with a fraction -(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to -``.7``) of the total RAM in the system, subtract off any memory -consumed by non-autotuned daemons (non-OSDs, for OSDs for which -``osd_memory_target_autotune`` is false), and then divide by the -remaining OSDs. - -The final targets are reflected in the config database with options like:: - - WHO MASK LEVEL OPTION VALUE - osd host:foo basic osd_memory_target 126092301926 - osd host:bar basic osd_memory_target 6442450944 - -Both the limits and the current memory consumed by each daemon are visible from -the ``ceph orch ps`` output in the ``MEM LIMIT`` column:: - - NAME HOST PORTS STATUS REFRESHED AGE MEM USED MEM LIMIT VERSION IMAGE ID CONTAINER ID - osd.1 dael running (3h) 10s ago 3h 72857k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 9e183363d39c - osd.2 dael running (81m) 10s ago 81m 63989k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 1f0cc479b051 - osd.3 dael running (62m) 10s ago 62m 64071k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 ac5537492f27 - -To exclude an OSD from memory autotuning, disable the autotune option -for that OSD and also set a specific memory target. For example, - - .. prompt:: bash # - - ceph config set osd.123 osd_memory_target_autotune false - ceph config set osd.123 osd_memory_target 16G - - -.. 
_drivegroups: - -Advanced OSD Service Specifications -=================================== - -:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a -cluster layout, using the properties of disks. Service specifications give the -user an abstract way to tell Ceph which disks should turn into OSDs with which -configurations, without knowing the specifics of device names and paths. - -Service specifications make it possible to define a yaml or json file that can -be used to reduce the amount of manual work involved in creating OSDs. - -For example, instead of running the following command: - -.. prompt:: bash [monitor.1]# - - ceph orch daemon add osd **:** - -for each device and each host, we can define a yaml or json file that allows us -to describe the layout. Here's the most basic example. - -Create a file called (for example) ``osd_spec.yml``: - -.. code-block:: yaml - - service_type: osd - service_id: default_drive_group <- name of the drive_group (name can be custom) - placement: - host_pattern: '*' <- which hosts to target, currently only supports globs - data_devices: <- the type of devices you are applying specs to - all: true <- a filter, check below for a full list - -This means : - -#. Turn any available device (ceph-volume decides what 'available' is) into an - OSD on all hosts that match the glob pattern '*'. (The glob pattern matches - against the registered hosts from `host ls`) A more detailed section on - host_pattern is available below. - -#. Then pass it to `osd create` like this: - - .. prompt:: bash [monitor.1]# - - ceph orch apply osd -i /path/to/osd_spec.yml - - This instruction will be issued to all the matching hosts, and will deploy - these OSDs. - - Setups more complex than the one specified by the ``all`` filter are - possible. See :ref:`osd_filters` for details. - - A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a - synopsis of the proposed layout. - -Example - -.. prompt:: bash [monitor.1]# - - ceph orch apply osd -i /path/to/osd_spec.yml --dry-run - - - -.. _osd_filters: - -Filters -------- - -.. note:: - Filters are applied using an `AND` gate by default. This means that a drive - must fulfill all filter criteria in order to get selected. This behavior can - be adjusted by setting ``filter_logic: OR`` in the OSD specification. - -Filters are used to assign disks to groups, using their attributes to group -them. - -The attributes are based off of ceph-volume's disk query. You can retrieve -information about the attributes with this command: - -.. code-block:: bash - - ceph-volume inventory - -Vendor or Model -^^^^^^^^^^^^^^^ - -Specific disks can be targeted by vendor or model: - -.. code-block:: yaml - - model: disk_model_name - -or - -.. code-block:: yaml - - vendor: disk_vendor_name - - -Size -^^^^ - -Specific disks can be targeted by `Size`: - -.. code-block:: yaml - - size: size_spec - -Size specs -__________ - -Size specifications can be of the following forms: - -* LOW:HIGH -* :HIGH -* LOW: -* EXACT - -Concrete examples: - -To include disks of an exact size - -.. code-block:: yaml - - size: '10G' - -To include disks within a given range of size: - -.. code-block:: yaml - - size: '10G:40G' - -To include disks that are less than or equal to 10G in size: - -.. code-block:: yaml - - size: ':10G' - -To include disks equal to or greater than 40G in size: - -.. code-block:: yaml - - size: '40G:' - -Sizes don't have to be specified exclusively in Gigabytes(G). 
- -Other units of size are supported: Megabyte(M), Gigabyte(G) and Terrabyte(T). -Appending the (B) for byte is also supported: ``MB``, ``GB``, ``TB``. - - -Rotational -^^^^^^^^^^ - -This operates on the 'rotational' attribute of the disk. - -.. code-block:: yaml - - rotational: 0 | 1 - -`1` to match all disks that are rotational - -`0` to match all disks that are non-rotational (SSD, NVME etc) - - -All -^^^ - -This will take all disks that are 'available' - -Note: This is exclusive for the data_devices section. - -.. code-block:: yaml - - all: true - - -Limiter -^^^^^^^ - -If you have specified some valid filters but want to limit the number of disks that they match, use the ``limit`` directive: - -.. code-block:: yaml - - limit: 2 - -For example, if you used `vendor` to match all disks that are from `VendorA` -but want to use only the first two, you could use `limit`: - -.. code-block:: yaml - - data_devices: - vendor: VendorA - limit: 2 - -Note: `limit` is a last resort and shouldn't be used if it can be avoided. - - -Additional Options ------------------- - -There are multiple optional settings you can use to change the way OSDs are deployed. -You can add these options to the base level of a DriveGroup for it to take effect. - -This example would deploy all OSDs with encryption enabled. - -.. code-block:: yaml - - service_type: osd - service_id: example_osd_spec - placement: - host_pattern: '*' - data_devices: - all: true - encrypted: true - -See a full list in the DriveGroupSpecs - -.. py:currentmodule:: ceph.deployment.drive_group - -.. autoclass:: DriveGroupSpec - :members: - :exclude-members: from_json - -Examples --------- - -The simple case -^^^^^^^^^^^^^^^ - -All nodes with the same setup - -.. code-block:: none - - 20 HDDs - Vendor: VendorA - Model: HDD-123-foo - Size: 4TB - - 2 SSDs - Vendor: VendorB - Model: MC-55-44-ZX - Size: 512GB - -This is a common setup and can be described quite easily: - -.. code-block:: yaml - - service_type: osd - service_id: osd_spec_default - placement: - host_pattern: '*' - data_devices: - model: HDD-123-foo <- note that HDD-123 would also be valid - db_devices: - model: MC-55-44-XZ <- same here, MC-55-44 is valid - -However, we can improve it by reducing the filters on core properties of the drives: - -.. code-block:: yaml - - service_type: osd - service_id: osd_spec_default - placement: - host_pattern: '*' - data_devices: - rotational: 1 - db_devices: - rotational: 0 - -Now, we enforce all rotating devices to be declared as 'data devices' and all non-rotating devices will be used as shared_devices (wal, db) - -If you know that drives with more than 2TB will always be the slower data devices, you can also filter by size: - -.. code-block:: yaml - - service_type: osd - service_id: osd_spec_default - placement: - host_pattern: '*' - data_devices: - size: '2TB:' - db_devices: - size: ':2TB' - -Note: All of the above DriveGroups are equally valid. Which of those you want to use depends on taste and on how much you expect your node layout to change. - - -The advanced case -^^^^^^^^^^^^^^^^^ - -Here we have two distinct setups - -.. code-block:: none - - 20 HDDs - Vendor: VendorA - Model: HDD-123-foo - Size: 4TB - - 12 SSDs - Vendor: VendorB - Model: MC-55-44-ZX - Size: 512GB - - 2 NVMEs - Vendor: VendorC - Model: NVME-QQQQ-987 - Size: 256GB - - -* 20 HDDs should share 2 SSDs -* 10 SSDs should share 2 NVMes - -This can be described with two layouts. - -.. 
code-block:: yaml - - service_type: osd - service_id: osd_spec_hdd - placement: - host_pattern: '*' - data_devices: - rotational: 0 - db_devices: - model: MC-55-44-XZ - limit: 2 (db_slots is actually to be favoured here, but it's not implemented yet) - --- - service_type: osd - service_id: osd_spec_ssd - placement: - host_pattern: '*' - data_devices: - model: MC-55-44-XZ - db_devices: - vendor: VendorC - -This would create the desired layout by using all HDDs as data_devices with two SSD assigned as dedicated db/wal devices. -The remaining SSDs(8) will be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal devices. - -The advanced case (with non-uniform nodes) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The examples above assumed that all nodes have the same drives. That's however not always the case. - -Node1-5 - -.. code-block:: none - - 20 HDDs - Vendor: Intel - Model: SSD-123-foo - Size: 4TB - 2 SSDs - Vendor: VendorA - Model: MC-55-44-ZX - Size: 512GB - -Node6-10 - -.. code-block:: none - - 5 NVMEs - Vendor: Intel - Model: SSD-123-foo - Size: 4TB - 20 SSDs - Vendor: VendorA - Model: MC-55-44-ZX - Size: 512GB - -You can use the 'host_pattern' key in the layout to target certain nodes. Salt target notation helps to keep things easy. - - -.. code-block:: yaml - - service_type: osd - service_id: osd_spec_node_one_to_five - placement: - host_pattern: 'node[1-5]' - data_devices: - rotational: 1 - db_devices: - rotational: 0 - --- - service_type: osd - service_id: osd_spec_six_to_ten - placement: - host_pattern: 'node[6-10]' - data_devices: - model: MC-55-44-XZ - db_devices: - model: SSD-123-foo - -This applies different OSD specs to different hosts depending on the `host_pattern` key. - -Dedicated wal + db -^^^^^^^^^^^^^^^^^^ - -All previous cases co-located the WALs with the DBs. -It's however possible to deploy the WAL on a dedicated device as well, if it makes sense. - -.. code-block:: none - - 20 HDDs - Vendor: VendorA - Model: SSD-123-foo - Size: 4TB - - 2 SSDs - Vendor: VendorB - Model: MC-55-44-ZX - Size: 512GB - - 2 NVMEs - Vendor: VendorC - Model: NVME-QQQQ-987 - Size: 256GB - - -The OSD spec for this case would look like the following (using the `model` filter): - -.. code-block:: yaml - - service_type: osd - service_id: osd_spec_default - placement: - host_pattern: '*' - data_devices: - model: MC-55-44-XZ - db_devices: - model: SSD-123-foo - wal_devices: - model: NVME-QQQQ-987 - - -It is also possible to specify directly device paths in specific hosts like the following: - -.. code-block:: yaml - - service_type: osd - service_id: osd_using_paths - placement: - hosts: - - Node01 - - Node02 - data_devices: - paths: - - /dev/sdb - db_devices: - paths: - - /dev/sdc - wal_devices: - paths: - - /dev/sdd - - -This can easily be done with other filters, like `size` or `vendor` as well. - -.. _cephadm-osd-activate: - -Activate existing OSDs -====================== - -In case the OS of a host was reinstalled, existing OSDs need to be activated -again. For this use case, cephadm provides a wrapper for :ref:`ceph-volume-lvm-activate` that -activates all existing OSDs on a host. - -.. prompt:: bash # - - ceph cephadm osd activate ... - -This will scan all existing disks for OSDs and deploy corresponding daemons. 
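For example, to reactivate the existing OSDs on two freshly reinstalled hosts (the hostnames
``host1`` and ``host2`` are placeholders for your own host names):

.. prompt:: bash #

   ceph cephadm osd activate host1 host2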
- -Futher Reading -============== - -* :ref:`ceph-volume` -* :ref:`rados-index` diff --git a/doc/cephadm/rgw.rst b/doc/cephadm/rgw.rst deleted file mode 100644 index f7bc72e1312a7..0000000000000 --- a/doc/cephadm/rgw.rst +++ /dev/null @@ -1,281 +0,0 @@ -=========== -RGW Service -=========== - -.. _cephadm-deploy-rgw: - -Deploy RGWs -=========== - -Cephadm deploys radosgw as a collection of daemons that manage a -single-cluster deployment or a particular *realm* and *zone* in a -multisite deployment. (For more information about realms and zones, -see :ref:`multisite`.) - -Note that with cephadm, radosgw daemons are configured via the monitor -configuration database instead of via a `ceph.conf` or the command line. If -that configuration isn't already in place (usually in the -``client.rgw.`` section), then the radosgw -daemons will start up with default settings (e.g., binding to port -80). - -To deploy a set of radosgw daemons, with an arbitrary service name -*name*, run the following command: - -.. prompt:: bash # - - ceph orch apply rgw ** [--realm=**] [--zone=**] --placement="** [** ...]" - -Trivial setup -------------- - -For example, to deploy 2 RGW daemons (the default) for a single-cluster RGW deployment -under the arbitrary service id *foo*: - -.. prompt:: bash # - - ceph orch apply rgw foo - -Designated gateways -------------------- - -A common scenario is to have a labeled set of hosts that will act -as gateways, with multiple instances of radosgw running on consecutive -ports 8000 and 8001: - -.. prompt:: bash # - - ceph orch host label add gwhost1 rgw # the 'rgw' label can be anything - ceph orch host label add gwhost2 rgw - ceph orch apply rgw foo '--placement=label:rgw count-per-host:2' --port=8000 - -.. _cephadm-rgw-networks: - -Specifying Networks -------------------- - -The RGW service can have the network they bind to configured with a yaml service specification. - -example spec file: - -.. code-block:: yaml - - service_type: rgw - service_name: foo - placement: - label: rgw - count-per-host: 2 - networks: - - 192.169.142.0/24 - spec: - port: 8000 - - -Multisite zones ---------------- - -To deploy RGWs serving the multisite *myorg* realm and the *us-east-1* zone on -*myhost1* and *myhost2*: - -.. prompt:: bash # - - ceph orch apply rgw east --realm=myorg --zone=us-east-1 --placement="2 myhost1 myhost2" - -Note that in a multisite situation, cephadm only deploys the daemons. It does not create -or update the realm or zone configurations. To create a new realm and zone, you need to do -something like: - -.. prompt:: bash # - - radosgw-admin realm create --rgw-realm= --default - -.. prompt:: bash # - - radosgw-admin zonegroup create --rgw-zonegroup= --master --default - -.. prompt:: bash # - - radosgw-admin zone create --rgw-zonegroup= --rgw-zone= --master --default - -.. prompt:: bash # - - radosgw-admin period update --rgw-realm= --commit - -See :ref:`orchestrator-cli-placement-spec` for details of the placement -specification. See :ref:`multisite` for more information of setting up multisite RGW. - -See also :ref:`multisite`. - -Setting up HTTPS ----------------- - -In order to enable HTTPS for RGW services, apply a spec file following this scheme: - -.. 
code-block:: yaml - - service_type: rgw - service_id: myrgw - spec: - rgw_frontend_ssl_certificate: | - -----BEGIN PRIVATE KEY----- - V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt - ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15 - IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu - YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg - ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8= - -----END PRIVATE KEY----- - -----BEGIN CERTIFICATE----- - V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt - ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15 - IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu - YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg - ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8= - -----END CERTIFICATE----- - ssl: true - -Then apply this yaml document: - -.. prompt:: bash # - - ceph orch apply -i myrgw.yaml - -Note the value of ``rgw_frontend_ssl_certificate`` is a literal string as -indicated by a ``|`` character preserving newline characters. - -.. _orchestrator-haproxy-service-spec: - -High availability service for RGW -================================= - -The *ingress* service allows you to create a high availability endpoint -for RGW with a minumum set of configuration options. The orchestrator will -deploy and manage a combination of haproxy and keepalived to provide load -balancing on a floating virtual IP. - -If SSL is used, then SSL must be configured and terminated by the ingress service -and not RGW itself. - -.. image:: ../images/HAProxy_for_RGW.svg - -There are N hosts where the ingress service is deployed. Each host -has a haproxy daemon and a keepalived daemon. A virtual IP is -automatically configured on only one of these hosts at a time. - -Each keepalived daemon checks every few seconds whether the haproxy -daemon on the same host is responding. Keepalived will also check -that the master keepalived daemon is running without problems. If the -"master" keepalived daemon or the active haproxy is not responding, -one of the remaining keepalived daemons running in backup mode will be -elected as master, and the virtual IP will be moved to that node. - -The active haproxy acts like a load balancer, distributing all RGW requests -between all the RGW daemons available. - -Prerequisites -------------- - -* An existing RGW service, without SSL. (If you want SSL service, the certificate - should be configured on the ingress service, not the RGW service.) - -Deploying ---------- - -Use the command:: - - ceph orch apply -i - -Service specification ---------------------- - -It is a yaml format file with the following properties: - -.. code-block:: yaml - - service_type: ingress - service_id: rgw.something # adjust to match your existing RGW service - placement: - hosts: - - host1 - - host2 - - host3 - spec: - backend_service: rgw.something # adjust to match your existing RGW service - virtual_ip: / # ex: 192.168.20.1/24 - frontend_port: # ex: 8080 - monitor_port: # ex: 1967, used by haproxy for load balancer status - virtual_interface_networks: [ ... ] # optional: list of CIDR networks - ssl_cert: | # optional: SSL certificate and key - -----BEGIN CERTIFICATE----- - ... - -----END CERTIFICATE----- - -----BEGIN PRIVATE KEY----- - ... - -----END PRIVATE KEY----- - -where the properties of this service specification are: - -* ``service_type`` - Mandatory and set to "ingress" -* ``service_id`` - The name of the service. 
We suggest naming this after the service you are - controlling ingress for (e.g., ``rgw.foo``). -* ``placement hosts`` - The hosts where it is desired to run the HA daemons. An haproxy and a - keepalived container will be deployed on these hosts. These hosts do not need - to match the nodes where RGW is deployed. -* ``virtual_ip`` - The virtual IP (and network) in CIDR format where the ingress service will be available. -* ``virtual_interface_networks`` - A list of networks to identify which ethernet interface to use for the virtual IP. -* ``frontend_port`` - The port used to access the ingress service. -* ``ssl_cert``: - SSL certificate, if SSL is to be enabled. This must contain the both the certificate and - private key blocks in .pem format. - -.. _ingress-virtual-ip: - -Selecting ethernet interfaces for the virtual IP ------------------------------------------------- - -You cannot simply provide the name of the network interface on which -to configure the virtual IP because interface names tend to vary -across hosts (and/or reboots). Instead, cephadm will select -interfaces based on other existing IP addresses that are already -configured. - -Normally, the virtual IP will be configured on the first network -interface that has an existing IP in the same subnet. For example, if -the virtual IP is 192.168.0.80/24 and eth2 has the static IP -192.168.0.40/24, cephadm will use eth2. - -In some cases, the virtual IP may not belong to the same subnet as an existing static -IP. In such cases, you can provide a list of subnets to match against existing IPs, -and cephadm will put the virtual IP on the first network interface to match. For example, -if the virtual IP is 192.168.0.80/24 and we want it on the same interface as the machine's -static IP in 10.10.0.0/16, you can use a spec like:: - - service_type: ingress - service_id: rgw.something - spec: - virtual_ip: 192.168.0.80/24 - virtual_interface_networks: - - 10.10.0.0/16 - ... - -A consequence of this strategy is that you cannot currently configure the virtual IP -on an interface that has no existing IP address. In this situation, we suggest -configuring a "dummy" IP address is an unroutable network on the correct interface -and reference that dummy network in the networks list (see above). - - -Useful hints for ingress ------------------------- - -* It is good to have at least 3 RGW daemons. -* We recommend at least 3 hosts for the ingress service. - -Further Reading -=============== - -* :ref:`object-gateway` diff --git a/doc/cephadm/services/custom-container.rst b/doc/cephadm/services/custom-container.rst new file mode 100644 index 0000000000000..542fcf16261d8 --- /dev/null +++ b/doc/cephadm/services/custom-container.rst @@ -0,0 +1,78 @@ +======================== +Custom Container Service +======================== + +The orchestrator enables custom containers to be deployed using a YAML file. +A corresponding :ref:`orchestrator-cli-service-spec` must look like: + +.. code-block:: yaml + + service_type: container + service_id: foo + placement: + ... 
+ image: docker.io/library/foo:latest + entrypoint: /usr/bin/foo + uid: 1000 + gid: 1000 + args: + - "--net=host" + - "--cpus=2" + ports: + - 8080 + - 8443 + envs: + - SECRET=mypassword + - PORT=8080 + - PUID=1000 + - PGID=1000 + volume_mounts: + CONFIG_DIR: /etc/foo + bind_mounts: + - ['type=bind', 'source=lib/modules', 'destination=/lib/modules', 'ro=true'] + dirs: + - CONFIG_DIR + files: + CONFIG_DIR/foo.conf: + - refresh=true + - username=xyz + - "port: 1234" + +where the properties of a service specification are: + +* ``service_id`` + A unique name of the service. +* ``image`` + The name of the Docker image. +* ``uid`` + The UID to use when creating directories and files in the host system. +* ``gid`` + The GID to use when creating directories and files in the host system. +* ``entrypoint`` + Overwrite the default ENTRYPOINT of the image. +* ``args`` + A list of additional Podman/Docker command line arguments. +* ``ports`` + A list of TCP ports to open in the host firewall. +* ``envs`` + A list of environment variables. +* ``bind_mounts`` + When you use a bind mount, a file or directory on the host machine + is mounted into the container. Relative `source=...` paths will be + located below `/var/lib/ceph//`. +* ``volume_mounts`` + When you use a volume mount, a new directory is created within + Docker’s storage directory on the host machine, and Docker manages + that directory’s contents. Relative source paths will be located below + `/var/lib/ceph//`. +* ``dirs`` + A list of directories that are created below + `/var/lib/ceph//`. +* ``files`` + A dictionary, where the key is the relative path of the file and the + value the file content. The content must be double quoted when using + a string. Use '\\n' for line breaks in that case. Otherwise define + multi-line content as list of strings. The given files will be created + below the directory `/var/lib/ceph//`. + The absolute path of the directory where the file will be created must + exist. Use the `dirs` property to create them if necessary. diff --git a/doc/cephadm/services/index.rst b/doc/cephadm/services/index.rst index 6d382b6ec7b08..3fb7bddf5115d 100644 --- a/doc/cephadm/services/index.rst +++ b/doc/cephadm/services/index.rst @@ -2,10 +2,27 @@ Service Management ================== +A service is a group of daemons configured together. See these chapters +for details on individual services: + +.. toctree:: + :maxdepth: 1 + + mon + mgr + osd + rgw + mds + nfs + iscsi + custom-container + monitoring + Service Status ============== -A service is a group of daemons configured together. To see the status of one + +To see the status of one of the services running in the Ceph cluster, do the following: #. Use the command line to print a list of services. diff --git a/doc/cephadm/services/iscsi.rst b/doc/cephadm/services/iscsi.rst new file mode 100644 index 0000000000000..581d2c9d40612 --- /dev/null +++ b/doc/cephadm/services/iscsi.rst @@ -0,0 +1,74 @@ +============= +iSCSI Service +============= + +.. _cephadm-iscsi: + +Deploying iSCSI +=============== + +To deploy an iSCSI gateway, create a yaml file containing a +service specification for iscsi: + +.. code-block:: yaml + + service_type: iscsi + service_id: iscsi + placement: + hosts: + - host1 + - host2 + spec: + pool: mypool # RADOS pool where ceph-iscsi config data is stored. + trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2" + api_port: ... # optional + api_user: ... # optional + api_password: ... # optional + api_secure: true/false # optional + ssl_cert: | # optional + ... 
+ ssl_key: | # optional + ... + +For example: + +.. code-block:: yaml + + service_type: iscsi + service_id: iscsi + placement: + hosts: + - [...] + spec: + pool: iscsi_pool + trusted_ip_list: "IP_ADDRESS_1,IP_ADDRESS_2,IP_ADDRESS_3,..." + api_user: API_USERNAME + api_password: API_PASSWORD + api_secure: true + ssl_cert: | + -----BEGIN CERTIFICATE----- + MIIDtTCCAp2gAwIBAgIYMC4xNzc1NDQxNjEzMzc2MjMyXzxvQ7EcMA0GCSqGSIb3 + DQEBCwUAMG0xCzAJBgNVBAYTAlVTMQ0wCwYDVQQIDARVdGFoMRcwFQYDVQQHDA5T + [...] + -----END CERTIFICATE----- + ssl_key: | + -----BEGIN PRIVATE KEY----- + MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC5jdYbjtNTAKW4 + /CwQr/7wOiLGzVxChn3mmCIF3DwbL/qvTFTX2d8bDf6LjGwLYloXHscRfxszX/4h + [...] + -----END PRIVATE KEY----- + + +The specification can then be applied using: + +.. prompt:: bash # + + ceph orch apply -i iscsi.yaml + + +See :ref:`orchestrator-cli-placement-spec` for details of the placement specification. + +Further Reading +=============== + +* RBD: :ref:`ceph-iscsi` diff --git a/doc/cephadm/services/mds.rst b/doc/cephadm/services/mds.rst new file mode 100644 index 0000000000000..949a0fa5d8e7a --- /dev/null +++ b/doc/cephadm/services/mds.rst @@ -0,0 +1,49 @@ +=========== +MDS Service +=========== + + +.. _orchestrator-cli-cephfs: + +Deploy CephFS +============= + +One or more MDS daemons is required to use the :term:`CephFS` file system. +These are created automatically if the newer ``ceph fs volume`` +interface is used to create a new file system. For more information, +see :ref:`fs-volumes-and-subvolumes`. + +For example: + +.. prompt:: bash # + + ceph fs volume create --placement="" + +where ``fs_name`` is the name of the CephFS and ``placement`` is a +:ref:`orchestrator-cli-placement-spec`. + +For manually deploying MDS daemons, use this specification: + +.. code-block:: yaml + + service_type: mds + service_id: fs_name + placement: + count: 3 + + +The specification can then be applied using: + +.. prompt:: bash # + + ceph orch apply -i mds.yaml + +See :ref:`orchestrator-cli-stateless-services` for manually deploying +MDS daemons on the CLI. + +Further Reading +=============== + +* :ref:`ceph-file-system` + + diff --git a/doc/cephadm/services/mgr.rst b/doc/cephadm/services/mgr.rst new file mode 100644 index 0000000000000..98a54398b18c0 --- /dev/null +++ b/doc/cephadm/services/mgr.rst @@ -0,0 +1,37 @@ +.. _mgr-cephadm-mgr: + +=========== +MGR Service +=========== + +The cephadm MGR service is hosting different modules, like the :ref:`mgr-dashboard` +and the cephadm manager module. + +.. _cephadm-mgr-networks: + +Specifying Networks +------------------- + +The MGR service supports binding only to a specific IP within a network. + +example spec file (leveraging a default placement): + +.. code-block:: yaml + + service_type: mgr + networks: + - 192.169.142.0/24 + +Allow co-location of MGR daemons +================================ + +In deployment scenarios with just a single host, cephadm still needs +to deploy at least two MGR daemons. See ``mgr_standby_modules`` in +the :ref:`mgr-administrator-guide` for further details. + +Further Reading +=============== + +* :ref:`ceph-manager-daemon` +* :ref:`cephadm-manually-deploy-mgr` + diff --git a/doc/cephadm/services/mon.rst b/doc/cephadm/services/mon.rst new file mode 100644 index 0000000000000..6326b73f46d39 --- /dev/null +++ b/doc/cephadm/services/mon.rst @@ -0,0 +1,179 @@ +=========== +MON Service +=========== + +.. 
_deploy_additional_monitors: + +Deploying additional monitors +============================= + +A typical Ceph cluster has three or five monitor daemons that are spread +across different hosts. We recommend deploying five monitors if there are +five or more nodes in your cluster. + +.. _CIDR: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation + +Ceph deploys monitor daemons automatically as the cluster grows and Ceph +scales back monitor daemons automatically as the cluster shrinks. The +smooth execution of this automatic growing and shrinking depends upon +proper subnet configuration. + +The cephadm bootstrap procedure assigns the first monitor daemon in the +cluster to a particular subnet. ``cephadm`` designates that subnet as the +default subnet of the cluster. New monitor daemons will be assigned by +default to that subnet unless cephadm is instructed to do otherwise. + +If all of the ceph monitor daemons in your cluster are in the same subnet, +manual administration of the ceph monitor daemons is not necessary. +``cephadm`` will automatically add up to five monitors to the subnet, as +needed, as new hosts are added to the cluster. + +By default, cephadm will deploy 5 daemons on arbitrary hosts. See +:ref:`orchestrator-cli-placement-spec` for details of specifying +the placement of daemons. + +Designating a Particular Subnet for Monitors +-------------------------------------------- + +To designate a particular IP subnet for use by ceph monitor daemons, use a +command of the following form, including the subnet's address in `CIDR`_ +format (e.g., ``10.1.2.0/24``): + + .. prompt:: bash # + + ceph config set mon public_network ** + + For example: + + .. prompt:: bash # + + ceph config set mon public_network 10.1.2.0/24 + +Cephadm deploys new monitor daemons only on hosts that have IP addresses in +the designated subnet. + +You can also specify two public networks by using a list of networks: + + .. prompt:: bash # + + ceph config set mon public_network *,* + + For example: + + .. prompt:: bash # + + ceph config set mon public_network 10.1.2.0/24,192.168.0.1/24 + + +Deploying Monitors on a Particular Network +------------------------------------------ + +You can explicitly specify the IP address or CIDR network for each monitor and +control where each monitor is placed. To disable automated monitor deployment, +run this command: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + + To deploy each additional monitor: + + .. prompt:: bash # + + ceph orch daemon add mon * + + For example, to deploy a second monitor on ``newhost1`` using an IP + address ``10.1.2.123`` and a third monitor on ``newhost2`` in + network ``10.1.2.0/24``, run the following commands: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + ceph orch daemon add mon newhost1:10.1.2.123 + ceph orch daemon add mon newhost2:10.1.2.0/24 + + Now, enable automatic placement of Daemons + + .. prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" --dry-run + + See :ref:`orchestrator-cli-placement-spec` for details of specifying + the placement of daemons. + + Finally apply this new placement by dropping ``--dry-run`` + + .. prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" + + +Moving Monitors to a Different Network +-------------------------------------- + +To move Monitors to a new network, deploy new monitors on the new network and +subsequently remove monitors from the old network. 
It is not advised to +modify and inject the ``monmap`` manually. + +First, disable the automated placement of daemons: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + +To deploy each additional monitor: + + .. prompt:: bash # + + ceph orch daemon add mon ** + +For example, to deploy a second monitor on ``newhost1`` using an IP +address ``10.1.2.123`` and a third monitor on ``newhost2`` in +network ``10.1.2.0/24``, run the following commands: + + .. prompt:: bash # + + ceph orch apply mon --unmanaged + ceph orch daemon add mon newhost1:10.1.2.123 + ceph orch daemon add mon newhost2:10.1.2.0/24 + + Subsequently remove monitors from the old network: + + .. prompt:: bash # + + ceph orch daemon rm *mon.* + + Update the ``public_network``: + + .. prompt:: bash # + + ceph config set mon public_network ** + + For example: + + .. prompt:: bash # + + ceph config set mon public_network 10.1.2.0/24 + + Now, enable automatic placement of Daemons + + .. prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" --dry-run + + See :ref:`orchestrator-cli-placement-spec` for details of specifying + the placement of daemons. + + Finally apply this new placement by dropping ``--dry-run`` + + .. prompt:: bash # + + ceph orch apply mon --placement="newhost1,newhost2,newhost3" + +Futher Reading +============== + +* :ref:`rados-operations` +* :ref:`rados-troubleshooting-mon` +* :ref:`cephadm-restore-quorum` + diff --git a/doc/cephadm/services/monitoring.rst b/doc/cephadm/services/monitoring.rst new file mode 100644 index 0000000000000..91b8742f3cfb9 --- /dev/null +++ b/doc/cephadm/services/monitoring.rst @@ -0,0 +1,371 @@ +.. _mgr-cephadm-monitoring: + +Monitoring Services +=================== + +Ceph Dashboard uses `Prometheus `_, `Grafana +`_, and related tools to store and visualize detailed +metrics on cluster utilization and performance. Ceph users have three options: + +#. Have cephadm deploy and configure these services. This is the default + when bootstrapping a new cluster unless the ``--skip-monitoring-stack`` + option is used. +#. Deploy and configure these services manually. This is recommended for users + with existing prometheus services in their environment (and in cases where + Ceph is running in Kubernetes with Rook). +#. Skip the monitoring stack completely. Some Ceph dashboard graphs will + not be available. + +The monitoring stack consists of `Prometheus `_, +Prometheus exporters (:ref:`mgr-prometheus`, `Node exporter +`_), `Prometheus Alert +Manager `_ and `Grafana +`_. + +.. note:: + + Prometheus' security model presumes that untrusted users have access to the + Prometheus HTTP endpoint and logs. Untrusted users have access to all the + (meta)data Prometheus collects that is contained in the database, plus a + variety of operational and debugging information. + + However, Prometheus' HTTP API is limited to read-only operations. + Configurations can *not* be changed using the API and secrets are not + exposed. Moreover, Prometheus has some built-in measures to mitigate the + impact of denial of service attacks. + + Please see `Prometheus' Security model + ` for more detailed + information. + +Deploying monitoring with cephadm +--------------------------------- + +The default behavior of ``cephadm`` is to deploy a basic monitoring stack. It +is however possible that you have a Ceph cluster without a monitoring stack, +and you would like to add a monitoring stack to it. 
(Here are some ways that +you might have come to have a Ceph cluster without a monitoring stack: You +might have passed the ``--skip-monitoring stack`` option to ``cephadm`` during +the installation of the cluster, or you might have converted an existing +cluster (which had no monitoring stack) to cephadm management.) + +To set up monitoring on a Ceph cluster that has no monitoring, follow the +steps below: + +#. Deploy a node-exporter service on every node of the cluster. The node-exporter provides host-level metrics like CPU and memory utilization: + + .. prompt:: bash # + + ceph orch apply node-exporter + +#. Deploy alertmanager: + + .. prompt:: bash # + + ceph orch apply alertmanager + +#. Deploy Prometheus. A single Prometheus instance is sufficient, but + for high availablility (HA) you might want to deploy two: + + .. prompt:: bash # + + ceph orch apply prometheus + + or + + .. prompt:: bash # + + ceph orch apply prometheus --placement 'count:2' + +#. Deploy grafana: + + .. prompt:: bash # + + ceph orch apply grafana + +.. _cephadm-monitoring-networks-ports: + +Networks and Ports +~~~~~~~~~~~~~~~~~~ + +All monitoring services can have the network and port they bind to configured with a yaml service specification + +example spec file: + +.. code-block:: yaml + + service_type: grafana + service_name: grafana + placement: + count: 1 + networks: + - 192.169.142.0/24 + spec: + port: 4200 + +Using custom images +~~~~~~~~~~~~~~~~~~~ + +It is possible to install or upgrade monitoring components based on other +images. To do so, the name of the image to be used needs to be stored in the +configuration first. The following configuration options are available. + +- ``container_image_prometheus`` +- ``container_image_grafana`` +- ``container_image_alertmanager`` +- ``container_image_node_exporter`` + +Custom images can be set with the ``ceph config`` command + +.. code-block:: bash + + ceph config set mgr mgr/cephadm/ + +For example + +.. code-block:: bash + + ceph config set mgr mgr/cephadm/container_image_prometheus prom/prometheus:v1.4.1 + +If there were already running monitoring stack daemon(s) of the type whose +image you've changed, you must redeploy the daemon(s) in order to have them +actually use the new image. + +For example, if you had changed the prometheus image + +.. prompt:: bash # + + ceph orch redeploy prometheus + + +.. note:: + + By setting a custom image, the default value will be overridden (but not + overwritten). The default value changes when updates become available. + By setting a custom image, you will not be able to update the component + you have set the custom image for automatically. You will need to + manually update the configuration (image name and tag) to be able to + install updates. + + If you choose to go with the recommendations instead, you can reset the + custom image you have set before. After that, the default value will be + used again. Use ``ceph config rm`` to reset the configuration option + + .. code-block:: bash + + ceph config rm mgr mgr/cephadm/ + + For example + + .. code-block:: bash + + ceph config rm mgr mgr/cephadm/container_image_prometheus + +Using custom configuration files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +By overriding cephadm templates, it is possible to completely customize the +configuration files for monitoring services. + +Internally, cephadm already uses `Jinja2 +`_ templates to generate the +configuration files for all monitoring components. 
To be able to customize the +configuration of Prometheus, Grafana or the Alertmanager it is possible to store +a Jinja2 template for each service that will be used for configuration +generation instead. This template will be evaluated every time a service of that +kind is deployed or reconfigured. That way, the custom configuration is +preserved and automatically applied on future deployments of these services. + +.. note:: + + The configuration of the custom template is also preserved when the default + configuration of cephadm changes. If the updated configuration is to be used, + the custom template needs to be migrated *manually*. + +Option names +"""""""""""" + +The following templates for files that will be generated by cephadm can be +overridden. These are the names to be used when storing with ``ceph config-key +set``: + +- ``services/alertmanager/alertmanager.yml`` +- ``services/grafana/ceph-dashboard.yml`` +- ``services/grafana/grafana.ini`` +- ``services/prometheus/prometheus.yml`` + +You can look up the file templates that are currently used by cephadm in +``src/pybind/mgr/cephadm/templates``: + +- ``services/alertmanager/alertmanager.yml.j2`` +- ``services/grafana/ceph-dashboard.yml.j2`` +- ``services/grafana/grafana.ini.j2`` +- ``services/prometheus/prometheus.yml.j2`` + +Usage +""""" + +The following command applies a single line value: + +.. code-block:: bash + + ceph config-key set mgr/cephadm/ + +To set contents of files as template use the ``-i`` argument: + +.. code-block:: bash + + ceph config-key set mgr/cephadm/ -i $PWD/ + +.. note:: + + When using files as input to ``config-key`` an absolute path to the file must + be used. + + +Then the configuration file for the service needs to be recreated. +This is done using `reconfig`. For more details see the following example. + +Example +""""""" + +.. code-block:: bash + + # set the contents of ./prometheus.yml.j2 as template + ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \ + -i $PWD/prometheus.yml.j2 + + # reconfig the prometheus service + ceph orch reconfig prometheus + +Deploying monitoring without cephadm +------------------------------------ + +If you have an existing prometheus monitoring infrastructure, or would like +to manage it yourself, you need to configure it to integrate with your Ceph +cluster. + +* Enable the prometheus module in the ceph-mgr daemon + + .. code-block:: bash + + ceph mgr module enable prometheus + + By default, ceph-mgr presents prometheus metrics on port 9283 on each host + running a ceph-mgr daemon. Configure prometheus to scrape these. + +* To enable the dashboard's prometheus-based alerting, see :ref:`dashboard-alerting`. + +* To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`. + +Disabling monitoring +-------------------- + +To disable monitoring and remove the software that supports it, run the following commands: + +.. code-block:: console + + $ ceph orch rm grafana + $ ceph orch rm prometheus --force # this will delete metrics data collected so far + $ ceph orch rm node-exporter + $ ceph orch rm alertmanager + $ ceph mgr module disable prometheus + +See also :ref:`orch-rm`. + +Setting up RBD-Image monitoring +------------------------------- + +Due to performance reasons, monitoring of RBD images is disabled by default. For more information please see +:ref:`prometheus-rbd-io-statistics`. If disabled, the overview and details dashboards will stay empty in Grafana +and the metrics will not be visible in Prometheus. 
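For reference, enabling these statistics usually comes down to telling the
prometheus manager module which pools to scan. The following is a minimal
sketch that assumes the module's ``rbd_stats_pools`` option and uses
placeholder pool names; see :ref:`prometheus-rbd-io-statistics` for the
authoritative option names:

.. prompt:: bash #

   ceph config set mgr mgr/prometheus/rbd_stats_pools "rbd_pool_a,rbd_pool_b"  # placeholder pool names

Once metrics for these pools have been scraped, the RBD overview and
per-image dashboards in Grafana should begin to populate.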
+ +Setting up Grafana +------------------ + +Manually setting the Grafana URL +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Cephadm automatically configures Prometheus, Grafana, and Alertmanager in +all cases except one. + +In a some setups, the Dashboard user's browser might not be able to access the +Grafana URL that is configured in Ceph Dashboard. This can happen when the +cluster and the accessing user are in different DNS zones. + +If this is the case, you can use a configuration option for Ceph Dashboard +to set the URL that the user's browser will use to access Grafana. This +value will never be altered by cephadm. To set this configuration option, +issue the following command: + + .. prompt:: bash $ + + ceph dashboard set-grafana-frontend-api-url + +It might take a minute or two for services to be deployed. After the +services have been deployed, you should see something like this when you issue the command ``ceph orch ls``: + +.. code-block:: console + + $ ceph orch ls + NAME RUNNING REFRESHED IMAGE NAME IMAGE ID SPEC + alertmanager 1/1 6s ago docker.io/prom/alertmanager:latest 0881eb8f169f present + crash 2/2 6s ago docker.io/ceph/daemon-base:latest-master-devel mix present + grafana 1/1 0s ago docker.io/pcuzner/ceph-grafana-el8:latest f77afcf0bcf6 absent + node-exporter 2/2 6s ago docker.io/prom/node-exporter:latest e5a616e4b9cf present + prometheus 1/1 6s ago docker.io/prom/prometheus:latest e935122ab143 present + +Configuring SSL/TLS for Grafana +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +``cephadm`` deploys Grafana using the certificate defined in the ceph +key/value store. If no certificate is specified, ``cephadm`` generates a +self-signed certificate during the deployment of the Grafana service. + +A custom certificate can be configured using the following commands: + +.. prompt:: bash # + + ceph config-key set mgr/cephadm/grafana_key -i $PWD/key.pem + ceph config-key set mgr/cephadm/grafana_crt -i $PWD/certificate.pem + +If you have already deployed Grafana, run ``reconfig`` on the service to +update its configuration: + +.. prompt:: bash # + + ceph orch reconfig grafana + +The ``reconfig`` command also sets the proper URL for Ceph Dashboard. + +Setting up Alertmanager +----------------------- + +Adding Alertmanager webhooks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To add new webhooks to the Alertmanager configuration, add additional +webhook urls like so: + +.. code-block:: yaml + + service_type: alertmanager + spec: + user_data: + default_webhook_urls: + - "https://foo" + - "https://bar" + +Where ``default_webhook_urls`` is a list of additional URLs that are +added to the default receivers' ```` configuration. + +Run ``reconfig`` on the service to update its configuration: + +.. prompt:: bash # + + ceph orch reconfig alertmanager + +Further Reading +--------------- + +* :ref:`mgr-prometheus` diff --git a/doc/cephadm/services/nfs.rst b/doc/cephadm/services/nfs.rst new file mode 100644 index 0000000000000..c48d0f7658f71 --- /dev/null +++ b/doc/cephadm/services/nfs.rst @@ -0,0 +1,120 @@ +.. _deploy-cephadm-nfs-ganesha: + +=========== +NFS Service +=========== + +.. note:: Only the NFSv4 protocol is supported. + +The simplest way to manage NFS is via the ``ceph nfs cluster ...`` +commands; see :ref:`mgr-nfs`. This document covers how to manage the +cephadm services directly, which should only be necessary for unusual NFS +configurations. + +Deploying NFS ganesha +===================== + +Cephadm deploys NFS Ganesha daemon (or set of daemons). 
The configuration for +NFS is stored in the ``nfs-ganesha`` pool and exports are managed via the +``ceph nfs export ...`` commands and via the dashboard. + +To deploy a NFS Ganesha gateway, run the following command: + +.. prompt:: bash # + + ceph orch apply nfs ** [--port **] [--placement ...] + +For example, to deploy NFS with a service id of *foo* on the default +port 2049 with the default placement of a single daemon: + +.. prompt:: bash # + + ceph orch apply nfs foo + +See :ref:`orchestrator-cli-placement-spec` for the details of the placement +specification. + +Service Specification +===================== + +Alternatively, an NFS service can be applied using a YAML specification. + +.. code-block:: yaml + + service_type: nfs + service_id: mynfs + placement: + hosts: + - host1 + - host2 + spec: + port: 12345 + +In this example, we run the server on the non-default ``port`` of +12345 (instead of the default 2049) on ``host1`` and ``host2``. + +The specification can then be applied by running the following command: + +.. prompt:: bash # + + ceph orch apply -i nfs.yaml + +.. _cephadm-ha-nfs: + +High-availability NFS +===================== + +Deploying an *ingress* service for an existing *nfs* service will provide: + +* a stable, virtual IP that can be used to access the NFS server +* fail-over between hosts if there is a host failure +* load distribution across multiple NFS gateways (although this is rarely necessary) + +Ingress for NFS can be deployed for an existing NFS service +(``nfs.mynfs`` in this example) with the following specification: + +.. code-block:: yaml + + service_type: ingress + service_id: nfs.mynfs + placement: + count: 2 + spec: + backend_service: nfs.mynfs + frontend_port: 2049 + monitor_port: 9000 + virtual_ip: 10.0.0.123/24 + +A few notes: + + * The *virtual_ip* must include a CIDR prefix length, as in the + example above. The virtual IP will normally be configured on the + first identified network interface that has an existing IP in the + same subnet. You can also specify a *virtual_interface_networks* + property to match against IPs in other networks; see + :ref:`ingress-virtual-ip` for more information. + * The *monitor_port* is used to access the haproxy load status + page. The user is ``admin`` by default, but can be modified by + via an *admin* property in the spec. If a password is not + specified via a *password* property in the spec, the auto-generated password + can be found with: + + .. prompt:: bash # + + ceph config-key get mgr/cephadm/ingress.*{svc_id}*/monitor_password + + For example: + + .. prompt:: bash # + + ceph config-key get mgr/cephadm/ingress.nfs.myfoo/monitor_password + + * The backend service (``nfs.mynfs`` in this example) should include + a *port* property that is not 2049 to avoid conflicting with the + ingress service, which could be placed on the same host(s). + +Further Reading +=============== + +* CephFS: :ref:`cephfs-nfs` +* MGR: :ref:`mgr-nfs` diff --git a/doc/cephadm/services/osd.rst b/doc/cephadm/services/osd.rst new file mode 100644 index 0000000000000..4347d05aff5b7 --- /dev/null +++ b/doc/cephadm/services/osd.rst @@ -0,0 +1,890 @@ +*********** +OSD Service +*********** +.. _device management: ../rados/operations/devices +.. _libstoragemgmt: https://github.com/libstorage/libstoragemgmt + +List Devices +============ + +``ceph-volume`` scans each host in the cluster from time to time in order +to determine which devices are present and whether they are eligible to be +used as OSDs. 
+ +To print a list of devices discovered by ``cephadm``, run this command: + +.. prompt:: bash # + + ceph orch device ls [--hostname=...] [--wide] [--refresh] + +Example +:: + + Hostname Path Type Serial Size Health Ident Fault Available + srv-01 /dev/sdb hdd 15P0A0YFFRD6 300G Unknown N/A N/A No + srv-01 /dev/sdc hdd 15R0A08WFRD6 300G Unknown N/A N/A No + srv-01 /dev/sdd hdd 15R0A07DFRD6 300G Unknown N/A N/A No + srv-01 /dev/sde hdd 15P0A0QDFRD6 300G Unknown N/A N/A No + srv-02 /dev/sdb hdd 15R0A033FRD6 300G Unknown N/A N/A No + srv-02 /dev/sdc hdd 15R0A05XFRD6 300G Unknown N/A N/A No + srv-02 /dev/sde hdd 15R0A0ANFRD6 300G Unknown N/A N/A No + srv-02 /dev/sdf hdd 15R0A06EFRD6 300G Unknown N/A N/A No + srv-03 /dev/sdb hdd 15R0A0OGFRD6 300G Unknown N/A N/A No + srv-03 /dev/sdc hdd 15R0A0P7FRD6 300G Unknown N/A N/A No + srv-03 /dev/sdd hdd 15R0A0O7FRD6 300G Unknown N/A N/A No + +Using the ``--wide`` option provides all details relating to the device, +including any reasons that the device might not be eligible for use as an OSD. + +In the above example you can see fields named "Health", "Ident", and "Fault". +This information is provided by integration with `libstoragemgmt`_. By default, +this integration is disabled (because `libstoragemgmt`_ may not be 100% +compatible with your hardware). To make ``cephadm`` include these fields, +enable cephadm's "enhanced device scan" option as follows; + +.. prompt:: bash # + + ceph config set mgr mgr/cephadm/device_enhanced_scan true + +.. warning:: + Although the libstoragemgmt library performs standard SCSI inquiry calls, + there is no guarantee that your firmware fully implements these standards. + This can lead to erratic behaviour and even bus resets on some older + hardware. It is therefore recommended that, before enabling this feature, + you test your hardware's compatibility with libstoragemgmt first to avoid + unplanned interruptions to services. + + There are a number of ways to test compatibility, but the simplest may be + to use the cephadm shell to call libstoragemgmt directly - ``cephadm shell + lsmcli ldl``. If your hardware is supported you should see something like + this: + + :: + + Path | SCSI VPD 0x83 | Link Type | Serial Number | Health Status + ---------------------------------------------------------------------------- + /dev/sda | 50000396082ba631 | SAS | 15P0A0R0FRD6 | Good + /dev/sdb | 50000396082bbbf9 | SAS | 15P0A0YFFRD6 | Good + + +After you have enabled libstoragemgmt support, the output will look something +like this: + +:: + + # ceph orch device ls + Hostname Path Type Serial Size Health Ident Fault Available + srv-01 /dev/sdb hdd 15P0A0YFFRD6 300G Good Off Off No + srv-01 /dev/sdc hdd 15R0A08WFRD6 300G Good Off Off No + : + +In this example, libstoragemgmt has confirmed the health of the drives and the ability to +interact with the Identification and Fault LEDs on the drive enclosures. For further +information about interacting with these LEDs, refer to `device management`_. + +.. note:: + The current release of `libstoragemgmt`_ (1.8.8) supports SCSI, SAS, and SATA based + local disks only. There is no official support for NVMe devices (PCIe) + +.. _cephadm-deploy-osds: + +Deploy OSDs +=========== + +Listing Storage Devices +----------------------- + +In order to deploy an OSD, there must be a storage device that is *available* on +which the OSD will be deployed. + +Run this command to display an inventory of storage devices on all cluster hosts: + +.. 
prompt:: bash # + + ceph orch device ls + +A storage device is considered *available* if all of the following +conditions are met: + +* The device must have no partitions. +* The device must not have any LVM state. +* The device must not be mounted. +* The device must not contain a file system. +* The device must not contain a Ceph BlueStore OSD. +* The device must be larger than 5 GB. + +Ceph will not provision an OSD on a device that is not available. + +Creating New OSDs +----------------- + +There are a few ways to create new OSDs: + +* Tell Ceph to consume any available and unused storage device: + + .. prompt:: bash # + + ceph orch apply osd --all-available-devices + +* Create an OSD from a specific device on a specific host: + + .. prompt:: bash # + + ceph orch daemon add osd **:** + + For example: + + .. prompt:: bash # + + ceph orch daemon add osd host1:/dev/sdb + +* You can use :ref:`drivegroups` to categorize device(s) based on their + properties. This might be useful in forming a clearer picture of which + devices are available to consume. Properties include device type (SSD or + HDD), device model names, size, and the hosts on which the devices exist: + + .. prompt:: bash # + + ceph orch apply -i spec.yml + +Dry Run +------- + +The ``--dry-run`` flag causes the orchestrator to present a preview of what +will happen without actually creating the OSDs. + +For example: + + .. prompt:: bash # + + ceph orch apply osd --all-available-devices --dry-run + + :: + + NAME HOST DATA DB WAL + all-available-devices node1 /dev/vdb - - + all-available-devices node2 /dev/vdc - - + all-available-devices node3 /dev/vdd - - + +.. _cephadm-osd-declarative: + +Declarative State +----------------- + +The effect of ``ceph orch apply`` is persistent. This means that drives that +are added to the system after the ``ceph orch apply`` command completes will be +automatically found and added to the cluster. It also means that drives that +become available (by zapping, for example) after the ``ceph orch apply`` +command completes will be automatically found and added to the cluster. + +We will examine the effects of the following command: + + .. prompt:: bash # + + ceph orch apply osd --all-available-devices + +After running the above command: + +* If you add new disks to the cluster, they will automatically be used to + create new OSDs. +* If you remove an OSD and clean the LVM physical volume, a new OSD will be + created automatically. + +To disable the automatic creation of OSD on available devices, use the +``unmanaged`` parameter: + +If you want to avoid this behavior (disable automatic creation of OSD on available devices), use the ``unmanaged`` parameter: + +.. prompt:: bash # + + ceph orch apply osd --all-available-devices --unmanaged=true + +.. note:: + + Keep these three facts in mind: + + - The default behavior of ``ceph orch apply`` causes cephadm constantly to reconcile. This means that cephadm creates OSDs as soon as new drives are detected. + + - Setting ``unmanaged: True`` disables the creation of OSDs. If ``unmanaged: True`` is set, nothing will happen even if you apply a new OSD service. + + - ``ceph orch daemon add`` creates OSDs, but does not add an OSD service. + +* For cephadm, see also :ref:`cephadm-spec-unmanaged`. + +.. _cephadm-osd-removal: + +Remove an OSD +============= + +Removing an OSD from a cluster involves two steps: + +#. evacuating all placement groups (PGs) from the cluster +#. 
removing the PG-free OSD from the cluster + +The following command performs these two steps: + +.. prompt:: bash # + + ceph orch osd rm [--replace] [--force] + +Example: + +.. prompt:: bash # + + ceph orch osd rm 0 + +Expected output:: + + Scheduled OSD(s) for removal + +OSDs that are not safe to destroy will be rejected. + +Monitoring OSD State +-------------------- + +You can query the state of OSD operation with the following command: + +.. prompt:: bash # + + ceph orch osd rm status + +Expected output:: + + OSD_ID HOST STATE PG_COUNT REPLACE FORCE STARTED_AT + 2 cephadm-dev done, waiting for purge 0 True False 2020-07-17 13:01:43.147684 + 3 cephadm-dev draining 17 False True 2020-07-17 13:01:45.162158 + 4 cephadm-dev started 42 False True 2020-07-17 13:01:45.162158 + + +When no PGs are left on the OSD, it will be decommissioned and removed from the cluster. + +.. note:: + After removing an OSD, if you wipe the LVM physical volume in the device used by the removed OSD, a new OSD will be created. + For more information on this, read about the ``unmanaged`` parameter in :ref:`cephadm-osd-declarative`. + +Stopping OSD Removal +-------------------- + +It is possible to stop queued OSD removals by using the following command: + +.. prompt:: bash # + + ceph orch osd rm stop + +Example: + +.. prompt:: bash # + + ceph orch osd rm stop 4 + +Expected output:: + + Stopped OSD(s) removal + +This resets the initial state of the OSD and takes it off the removal queue. + + +Replacing an OSD +---------------- + +.. prompt:: bash # + + orch osd rm --replace [--force] + +Example: + +.. prompt:: bash # + + ceph orch osd rm 4 --replace + +Expected output:: + + Scheduled OSD(s) for replacement + +This follows the same procedure as the procedure in the "Remove OSD" section, with +one exception: the OSD is not permanently removed from the CRUSH hierarchy, but is +instead assigned a 'destroyed' flag. + +**Preserving the OSD ID** + +The 'destroyed' flag is used to determine which OSD ids will be reused in the +next OSD deployment. + +If you use OSDSpecs for OSD deployment, your newly added disks will be assigned +the OSD ids of their replaced counterparts. This assumes that the new disks +still match the OSDSpecs. + +Use the ``--dry-run`` flag to make certain that the ``ceph orch apply osd`` +command does what you want it to. The ``--dry-run`` flag shows you what the +outcome of the command will be without making the changes you specify. When +you are satisfied that the command will do what you want, run the command +without the ``--dry-run`` flag. + +.. tip:: + + The name of your OSDSpec can be retrieved with the command ``ceph orch ls`` + +Alternatively, you can use your OSDSpec file: + +.. prompt:: bash # + + ceph orch apply osd -i --dry-run + +Expected output:: + + NAME HOST DATA DB WAL + node1 /dev/vdb - - + + +When this output reflects your intention, omit the ``--dry-run`` flag to +execute the deployment. + + +Erasing Devices (Zapping Devices) +--------------------------------- + +Erase (zap) a device so that it can be reused. ``zap`` calls ``ceph-volume +zap`` on the remote host. + +.. prompt:: bash # + + ceph orch device zap + +Example command: + +.. prompt:: bash # + + ceph orch device zap my_hostname /dev/sdx + +.. note:: + If the unmanaged flag is unset, cephadm automatically deploys drives that + match the DriveGroup in your OSDSpec. 
For example, if you use the + ``all-available-devices`` option when creating OSDs, when you ``zap`` a + device the cephadm orchestrator automatically creates a new OSD in the + device. To disable this behavior, see :ref:`cephadm-osd-declarative`. + + +.. _osd_autotune: + +Automatically tuning OSD memory +=============================== + +OSD daemons will adjust their memory consumption based on the +``osd_memory_target`` config option (several gigabytes, by +default). If Ceph is deployed on dedicated nodes that are not sharing +memory with other services, cephadm can automatically adjust the per-OSD +memory consumption based on the total amount of RAM and the number of deployed +OSDs. + +This option is enabled globally with:: + + ceph config set osd osd_memory_target_autotune true + +Cephadm will start with a fraction +(``mgr/cephadm/autotune_memory_target_ratio``, which defaults to +``.7``) of the total RAM in the system, subtract off any memory +consumed by non-autotuned daemons (non-OSDs, for OSDs for which +``osd_memory_target_autotune`` is false), and then divide by the +remaining OSDs. + +The final targets are reflected in the config database with options like:: + + WHO MASK LEVEL OPTION VALUE + osd host:foo basic osd_memory_target 126092301926 + osd host:bar basic osd_memory_target 6442450944 + +Both the limits and the current memory consumed by each daemon are visible from +the ``ceph orch ps`` output in the ``MEM LIMIT`` column:: + + NAME HOST PORTS STATUS REFRESHED AGE MEM USED MEM LIMIT VERSION IMAGE ID CONTAINER ID + osd.1 dael running (3h) 10s ago 3h 72857k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 9e183363d39c + osd.2 dael running (81m) 10s ago 81m 63989k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 1f0cc479b051 + osd.3 dael running (62m) 10s ago 62m 64071k 117.4G 17.0.0-3781-gafaed750 7015fda3cd67 ac5537492f27 + +To exclude an OSD from memory autotuning, disable the autotune option +for that OSD and also set a specific memory target. For example, + + .. prompt:: bash # + + ceph config set osd.123 osd_memory_target_autotune false + ceph config set osd.123 osd_memory_target 16G + + +.. _drivegroups: + +Advanced OSD Service Specifications +=================================== + +:ref:`orchestrator-cli-service-spec`\s of type ``osd`` are a way to describe a +cluster layout, using the properties of disks. Service specifications give the +user an abstract way to tell Ceph which disks should turn into OSDs with which +configurations, without knowing the specifics of device names and paths. + +Service specifications make it possible to define a yaml or json file that can +be used to reduce the amount of manual work involved in creating OSDs. + +For example, instead of running the following command: + +.. prompt:: bash [monitor.1]# + + ceph orch daemon add osd **:** + +for each device and each host, we can define a yaml or json file that allows us +to describe the layout. Here's the most basic example. + +Create a file called (for example) ``osd_spec.yml``: + +.. code-block:: yaml + + service_type: osd + service_id: default_drive_group <- name of the drive_group (name can be custom) + placement: + host_pattern: '*' <- which hosts to target, currently only supports globs + data_devices: <- the type of devices you are applying specs to + all: true <- a filter, check below for a full list + +This means : + +#. Turn any available device (ceph-volume decides what 'available' is) into an + OSD on all hosts that match the glob pattern '*'. 
(The glob pattern matches + against the registered hosts from `host ls`) A more detailed section on + host_pattern is available below. + +#. Then pass it to `osd create` like this: + + .. prompt:: bash [monitor.1]# + + ceph orch apply osd -i /path/to/osd_spec.yml + + This instruction will be issued to all the matching hosts, and will deploy + these OSDs. + + Setups more complex than the one specified by the ``all`` filter are + possible. See :ref:`osd_filters` for details. + + A ``--dry-run`` flag can be passed to the ``apply osd`` command to display a + synopsis of the proposed layout. + +Example + +.. prompt:: bash [monitor.1]# + + ceph orch apply osd -i /path/to/osd_spec.yml --dry-run + + + +.. _osd_filters: + +Filters +------- + +.. note:: + Filters are applied using an `AND` gate by default. This means that a drive + must fulfill all filter criteria in order to get selected. This behavior can + be adjusted by setting ``filter_logic: OR`` in the OSD specification. + +Filters are used to assign disks to groups, using their attributes to group +them. + +The attributes are based off of ceph-volume's disk query. You can retrieve +information about the attributes with this command: + +.. code-block:: bash + + ceph-volume inventory + +Vendor or Model +^^^^^^^^^^^^^^^ + +Specific disks can be targeted by vendor or model: + +.. code-block:: yaml + + model: disk_model_name + +or + +.. code-block:: yaml + + vendor: disk_vendor_name + + +Size +^^^^ + +Specific disks can be targeted by `Size`: + +.. code-block:: yaml + + size: size_spec + +Size specs +__________ + +Size specifications can be of the following forms: + +* LOW:HIGH +* :HIGH +* LOW: +* EXACT + +Concrete examples: + +To include disks of an exact size + +.. code-block:: yaml + + size: '10G' + +To include disks within a given range of size: + +.. code-block:: yaml + + size: '10G:40G' + +To include disks that are less than or equal to 10G in size: + +.. code-block:: yaml + + size: ':10G' + +To include disks equal to or greater than 40G in size: + +.. code-block:: yaml + + size: '40G:' + +Sizes don't have to be specified exclusively in Gigabytes(G). + +Other units of size are supported: Megabyte(M), Gigabyte(G) and Terrabyte(T). +Appending the (B) for byte is also supported: ``MB``, ``GB``, ``TB``. + + +Rotational +^^^^^^^^^^ + +This operates on the 'rotational' attribute of the disk. + +.. code-block:: yaml + + rotational: 0 | 1 + +`1` to match all disks that are rotational + +`0` to match all disks that are non-rotational (SSD, NVME etc) + + +All +^^^ + +This will take all disks that are 'available' + +Note: This is exclusive for the data_devices section. + +.. code-block:: yaml + + all: true + + +Limiter +^^^^^^^ + +If you have specified some valid filters but want to limit the number of disks that they match, use the ``limit`` directive: + +.. code-block:: yaml + + limit: 2 + +For example, if you used `vendor` to match all disks that are from `VendorA` +but want to use only the first two, you could use `limit`: + +.. code-block:: yaml + + data_devices: + vendor: VendorA + limit: 2 + +Note: `limit` is a last resort and shouldn't be used if it can be avoided. + + +Additional Options +------------------ + +There are multiple optional settings you can use to change the way OSDs are deployed. +You can add these options to the base level of a DriveGroup for it to take effect. + +This example would deploy all OSDs with encryption enabled. + +.. 
code-block:: yaml + + service_type: osd + service_id: example_osd_spec + placement: + host_pattern: '*' + data_devices: + all: true + encrypted: true + +See a full list in the DriveGroupSpecs + +.. py:currentmodule:: ceph.deployment.drive_group + +.. autoclass:: DriveGroupSpec + :members: + :exclude-members: from_json + +Examples +-------- + +The simple case +^^^^^^^^^^^^^^^ + +All nodes with the same setup + +.. code-block:: none + + 20 HDDs + Vendor: VendorA + Model: HDD-123-foo + Size: 4TB + + 2 SSDs + Vendor: VendorB + Model: MC-55-44-ZX + Size: 512GB + +This is a common setup and can be described quite easily: + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + data_devices: + model: HDD-123-foo <- note that HDD-123 would also be valid + db_devices: + model: MC-55-44-XZ <- same here, MC-55-44 is valid + +However, we can improve it by reducing the filters on core properties of the drives: + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + data_devices: + rotational: 1 + db_devices: + rotational: 0 + +Now, we enforce all rotating devices to be declared as 'data devices' and all non-rotating devices will be used as shared_devices (wal, db) + +If you know that drives with more than 2TB will always be the slower data devices, you can also filter by size: + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + data_devices: + size: '2TB:' + db_devices: + size: ':2TB' + +Note: All of the above DriveGroups are equally valid. Which of those you want to use depends on taste and on how much you expect your node layout to change. + + +The advanced case +^^^^^^^^^^^^^^^^^ + +Here we have two distinct setups + +.. code-block:: none + + 20 HDDs + Vendor: VendorA + Model: HDD-123-foo + Size: 4TB + + 12 SSDs + Vendor: VendorB + Model: MC-55-44-ZX + Size: 512GB + + 2 NVMEs + Vendor: VendorC + Model: NVME-QQQQ-987 + Size: 256GB + + +* 20 HDDs should share 2 SSDs +* 10 SSDs should share 2 NVMes + +This can be described with two layouts. + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_hdd + placement: + host_pattern: '*' + data_devices: + rotational: 0 + db_devices: + model: MC-55-44-XZ + limit: 2 (db_slots is actually to be favoured here, but it's not implemented yet) + --- + service_type: osd + service_id: osd_spec_ssd + placement: + host_pattern: '*' + data_devices: + model: MC-55-44-XZ + db_devices: + vendor: VendorC + +This would create the desired layout by using all HDDs as data_devices with two SSD assigned as dedicated db/wal devices. +The remaining SSDs(8) will be data_devices that have the 'VendorC' NVMEs assigned as dedicated db/wal devices. + +The advanced case (with non-uniform nodes) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The examples above assumed that all nodes have the same drives. That's however not always the case. + +Node1-5 + +.. code-block:: none + + 20 HDDs + Vendor: Intel + Model: SSD-123-foo + Size: 4TB + 2 SSDs + Vendor: VendorA + Model: MC-55-44-ZX + Size: 512GB + +Node6-10 + +.. code-block:: none + + 5 NVMEs + Vendor: Intel + Model: SSD-123-foo + Size: 4TB + 20 SSDs + Vendor: VendorA + Model: MC-55-44-ZX + Size: 512GB + +You can use the 'host_pattern' key in the layout to target certain nodes. Salt target notation helps to keep things easy. + + +.. 
code-block:: yaml + + service_type: osd + service_id: osd_spec_node_one_to_five + placement: + host_pattern: 'node[1-5]' + data_devices: + rotational: 1 + db_devices: + rotational: 0 + --- + service_type: osd + service_id: osd_spec_six_to_ten + placement: + host_pattern: 'node[6-10]' + data_devices: + model: MC-55-44-XZ + db_devices: + model: SSD-123-foo + +This applies different OSD specs to different hosts depending on the `host_pattern` key. + +Dedicated wal + db +^^^^^^^^^^^^^^^^^^ + +All previous cases co-located the WALs with the DBs. +It's however possible to deploy the WAL on a dedicated device as well, if it makes sense. + +.. code-block:: none + + 20 HDDs + Vendor: VendorA + Model: SSD-123-foo + Size: 4TB + + 2 SSDs + Vendor: VendorB + Model: MC-55-44-ZX + Size: 512GB + + 2 NVMEs + Vendor: VendorC + Model: NVME-QQQQ-987 + Size: 256GB + + +The OSD spec for this case would look like the following (using the `model` filter): + +.. code-block:: yaml + + service_type: osd + service_id: osd_spec_default + placement: + host_pattern: '*' + data_devices: + model: MC-55-44-XZ + db_devices: + model: SSD-123-foo + wal_devices: + model: NVME-QQQQ-987 + + +It is also possible to specify directly device paths in specific hosts like the following: + +.. code-block:: yaml + + service_type: osd + service_id: osd_using_paths + placement: + hosts: + - Node01 + - Node02 + data_devices: + paths: + - /dev/sdb + db_devices: + paths: + - /dev/sdc + wal_devices: + paths: + - /dev/sdd + + +This can easily be done with other filters, like `size` or `vendor` as well. + +.. _cephadm-osd-activate: + +Activate existing OSDs +====================== + +In case the OS of a host was reinstalled, existing OSDs need to be activated +again. For this use case, cephadm provides a wrapper for :ref:`ceph-volume-lvm-activate` that +activates all existing OSDs on a host. + +.. prompt:: bash # + + ceph cephadm osd activate ... + +This will scan all existing disks for OSDs and deploy corresponding daemons. + +Futher Reading +============== + +* :ref:`ceph-volume` +* :ref:`rados-index` diff --git a/doc/cephadm/services/rgw.rst b/doc/cephadm/services/rgw.rst new file mode 100644 index 0000000000000..2914042cc375b --- /dev/null +++ b/doc/cephadm/services/rgw.rst @@ -0,0 +1,281 @@ +=========== +RGW Service +=========== + +.. _cephadm-deploy-rgw: + +Deploy RGWs +=========== + +Cephadm deploys radosgw as a collection of daemons that manage a +single-cluster deployment or a particular *realm* and *zone* in a +multisite deployment. (For more information about realms and zones, +see :ref:`multisite`.) + +Note that with cephadm, radosgw daemons are configured via the monitor +configuration database instead of via a `ceph.conf` or the command line. If +that configuration isn't already in place (usually in the +``client.rgw.`` section), then the radosgw +daemons will start up with default settings (e.g., binding to port +80). + +To deploy a set of radosgw daemons, with an arbitrary service name +*name*, run the following command: + +.. prompt:: bash # + + ceph orch apply rgw ** [--realm=**] [--zone=**] --placement="** [** ...]" + +Trivial setup +------------- + +For example, to deploy 2 RGW daemons (the default) for a single-cluster RGW deployment +under the arbitrary service id *foo*: + +.. 
prompt:: bash # + + ceph orch apply rgw foo + +Designated gateways +------------------- + +A common scenario is to have a labeled set of hosts that will act +as gateways, with multiple instances of radosgw running on consecutive +ports 8000 and 8001: + +.. prompt:: bash # + + ceph orch host label add gwhost1 rgw # the 'rgw' label can be anything + ceph orch host label add gwhost2 rgw + ceph orch apply rgw foo '--placement=label:rgw count-per-host:2' --port=8000 + +.. _cephadm-rgw-networks: + +Specifying Networks +------------------- + +The RGW service can have the network they bind to configured with a yaml service specification. + +example spec file: + +.. code-block:: yaml + + service_type: rgw + service_name: foo + placement: + label: rgw + count-per-host: 2 + networks: + - 192.169.142.0/24 + spec: + port: 8000 + + +Multisite zones +--------------- + +To deploy RGWs serving the multisite *myorg* realm and the *us-east-1* zone on +*myhost1* and *myhost2*: + +.. prompt:: bash # + + ceph orch apply rgw east --realm=myorg --zone=us-east-1 --placement="2 myhost1 myhost2" + +Note that in a multisite situation, cephadm only deploys the daemons. It does not create +or update the realm or zone configurations. To create a new realm and zone, you need to do +something like: + +.. prompt:: bash # + + radosgw-admin realm create --rgw-realm= --default + +.. prompt:: bash # + + radosgw-admin zonegroup create --rgw-zonegroup= --master --default + +.. prompt:: bash # + + radosgw-admin zone create --rgw-zonegroup= --rgw-zone= --master --default + +.. prompt:: bash # + + radosgw-admin period update --rgw-realm= --commit + +See :ref:`orchestrator-cli-placement-spec` for details of the placement +specification. See :ref:`multisite` for more information of setting up multisite RGW. + +See also :ref:`multisite`. + +Setting up HTTPS +---------------- + +In order to enable HTTPS for RGW services, apply a spec file following this scheme: + +.. code-block:: yaml + + service_type: rgw + service_id: myrgw + spec: + rgw_frontend_ssl_certificate: | + -----BEGIN PRIVATE KEY----- + V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt + ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15 + IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu + YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg + ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8= + -----END PRIVATE KEY----- + -----BEGIN CERTIFICATE----- + V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt + ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15 + IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu + YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg + ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8= + -----END CERTIFICATE----- + ssl: true + +Then apply this yaml document: + +.. prompt:: bash # + + ceph orch apply -i myrgw.yaml + +Note the value of ``rgw_frontend_ssl_certificate`` is a literal string as +indicated by a ``|`` character preserving newline characters. + +.. _orchestrator-haproxy-service-spec: + +High availability service for RGW +================================= + +The *ingress* service allows you to create a high availability endpoint +for RGW with a minumum set of configuration options. The orchestrator will +deploy and manage a combination of haproxy and keepalived to provide load +balancing on a floating virtual IP. + +If SSL is used, then SSL must be configured and terminated by the ingress service +and not RGW itself. + +.. 
+
+Setting up HTTPS
+----------------
+
+In order to enable HTTPS for RGW services, apply a spec file following this scheme:
+
+.. code-block:: yaml
+
+    service_type: rgw
+    service_id: myrgw
+    spec:
+      rgw_frontend_ssl_certificate: |
+        -----BEGIN PRIVATE KEY-----
+        V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt
+        ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15
+        IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu
+        YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg
+        ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8=
+        -----END PRIVATE KEY-----
+        -----BEGIN CERTIFICATE-----
+        V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4gTG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFt
+        ZXQsIGNvbnNldGV0dXIgc2FkaXBzY2luZyBlbGl0ciwgc2VkIGRpYW0gbm9udW15
+        IGVpcm1vZCB0ZW1wb3IgaW52aWR1bnQgdXQgbGFib3JlIGV0IGRvbG9yZSBtYWdu
+        YSBhbGlxdXlhbSBlcmF0LCBzZWQgZGlhbSB2b2x1cHR1YS4gQXQgdmVybyBlb3Mg
+        ZXQgYWNjdXNhbSBldCBqdXN0byBkdW8=
+        -----END CERTIFICATE-----
+      ssl: true
+
+Then apply this yaml document:
+
+.. prompt:: bash #
+
+    ceph orch apply -i myrgw.yaml
+
+Note that the value of ``rgw_frontend_ssl_certificate`` is a literal string, as
+indicated by the ``|`` character, which preserves newline characters.
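+
+If you only need a certificate for testing, one way to produce a self-signed
+certificate and key is with ``openssl`` (the subject CN and file names below
+are only illustrative; production deployments should use a CA-issued
+certificate instead):
+
+.. prompt:: bash #
+
+    openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes -keyout rgw.key -out rgw.crt -subj "/CN=rgw.example.com"
+
+The contents of ``rgw.key`` and ``rgw.crt`` can then be pasted together into the
+``rgw_frontend_ssl_certificate`` field shown above.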
+
+.. _orchestrator-haproxy-service-spec:
+
+High availability service for RGW
+=================================
+
+The *ingress* service allows you to create a high availability endpoint
+for RGW with a minimum set of configuration options. The orchestrator will
+deploy and manage a combination of haproxy and keepalived to provide load
+balancing on a floating virtual IP.
+
+If SSL is used, then SSL must be configured and terminated by the ingress service
+and not by RGW itself.
+
+.. image:: ../../images/HAProxy_for_RGW.svg
+
+There are N hosts where the ingress service is deployed. Each host
+has a haproxy daemon and a keepalived daemon. A virtual IP is
+automatically configured on only one of these hosts at a time.
+
+Each keepalived daemon checks every few seconds whether the haproxy
+daemon on the same host is responding. Keepalived will also check
+that the master keepalived daemon is running without problems. If the
+"master" keepalived daemon or the active haproxy is not responding,
+one of the remaining keepalived daemons running in backup mode will be
+elected as master, and the virtual IP will be moved to that node.
+
+The active haproxy acts like a load balancer, distributing all RGW requests
+among all of the available RGW daemons.
+
+Prerequisites
+-------------
+
+* An existing RGW service, without SSL. (If you want SSL service, the certificate
+  should be configured on the ingress service, not the RGW service.)
+
+Deploying
+---------
+
+Use the command::
+
+    ceph orch apply -i <ingress_spec_file>
+
+Service specification
+---------------------
+
+The service specification is a YAML file with the following properties:
+
+.. code-block:: yaml
+
+    service_type: ingress
+    service_id: rgw.something    # adjust to match your existing RGW service
+    placement:
+      hosts:
+        - host1
+        - host2
+        - host3
+    spec:
+      backend_service: rgw.something      # adjust to match your existing RGW service
+      virtual_ip: <string>/<string>       # ex: 192.168.20.1/24
+      frontend_port: <integer>            # ex: 8080
+      monitor_port: <integer>             # ex: 1967, used by haproxy for load balancer status
+      virtual_interface_networks: [ ... ] # optional: list of CIDR networks
+      ssl_cert: |                         # optional: SSL certificate and key
+        -----BEGIN CERTIFICATE-----
+        ...
+        -----END CERTIFICATE-----
+        -----BEGIN PRIVATE KEY-----
+        ...
+        -----END PRIVATE KEY-----
+
+where the properties of this service specification are:
+
+* ``service_type``
+    Mandatory and set to "ingress".
+* ``service_id``
+    The name of the service. We suggest naming this after the service you are
+    controlling ingress for (e.g., ``rgw.foo``).
+* ``placement hosts``
+    The hosts where it is desired to run the HA daemons. A haproxy and a
+    keepalived container will be deployed on these hosts. These hosts do not need
+    to match the hosts where RGW is deployed.
+* ``virtual_ip``
+    The virtual IP (and network) in CIDR format where the ingress service will be available.
+* ``virtual_interface_networks``
+    A list of networks used to identify which ethernet interface to use for the virtual IP.
+* ``frontend_port``
+    The port used to access the ingress service.
+* ``ssl_cert``
+    SSL certificate, if SSL is to be enabled. This must contain both the certificate
+    and the private key blocks in .pem format.
+
+.. _ingress-virtual-ip:
+
+Selecting ethernet interfaces for the virtual IP
+------------------------------------------------
+
+You cannot simply provide the name of the network interface on which
+to configure the virtual IP because interface names tend to vary
+across hosts (and/or reboots). Instead, cephadm will select
+interfaces based on other existing IP addresses that are already
+configured.
+
+Normally, the virtual IP will be configured on the first network
+interface that has an existing IP in the same subnet. For example, if
+the virtual IP is 192.168.0.80/24 and eth2 has the static IP
+192.168.0.40/24, cephadm will use eth2.
+
+In some cases, the virtual IP may not belong to the same subnet as an existing static
+IP. In such cases, you can provide a list of subnets to match against existing IPs,
+and cephadm will put the virtual IP on the first network interface to match. For example,
+if the virtual IP is 192.168.0.80/24 and we want it on the same interface as the machine's
+static IP in 10.10.0.0/16, you can use a spec like::
+
+    service_type: ingress
+    service_id: rgw.something
+    spec:
+      virtual_ip: 192.168.0.80/24
+      virtual_interface_networks:
+        - 10.10.0.0/16
+      ...
+
+A consequence of this strategy is that you cannot currently configure the virtual IP
+on an interface that has no existing IP address. In this situation, we suggest
+configuring a "dummy" IP address on an unroutable network on the correct interface
+and referencing that dummy network in the networks list (see above).
+
+
+Useful hints for ingress
+------------------------
+
+* It is good to have at least 3 RGW daemons.
+* We recommend at least 3 hosts for the ingress service.
+
+Further Reading
+===============
+
+* :ref:`object-gateway`