From: Sage Weil
Date: Sun, 2 Dec 2018 22:43:43 +0000 (-0600)
Subject: doc/rados/operations: document autoscaler and its health warnings
X-Git-Tag: v14.1.0~582^2~7
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=f490fd0130e170e7f26ef24118bd1569e8404377;p=ceph.git

doc/rados/operations: document autoscaler and its health warnings

Signed-off-by: Sage Weil
---

diff --git a/doc/rados/operations/health-checks.rst b/doc/rados/operations/health-checks.rst
index 1fdd72badd34..d43941df8de5 100644
--- a/doc/rados/operations/health-checks.rst
+++ b/doc/rados/operations/health-checks.rst
@@ -420,7 +420,7 @@ ___________

The number of PGs in use in the cluster is below the configurable
threshold of ``mon_pg_warn_min_per_osd`` PGs per OSD.  This can lead
to suboptimal distribution and balance of data across the OSDs in
-the cluster, and similar reduce overall performance.
+the cluster, and similarly reduce overall performance.

This may be an expected condition if data pools have not yet been
created.

@@ -429,6 +429,33 @@

The PG count for existing pools can be increased or new pools can be
created.  Please refer to :ref:`choosing-number-of-placement-groups`
for more information.

POOL_TOO_FEW_PGS
________________

One or more pools should probably have more PGs, based on the amount
of data that is currently stored in the pool.  This can lead to
suboptimal distribution and balance of data across the OSDs in the
cluster, and similarly reduce overall performance.  This warning is
generated if the ``pg_autoscale_mode`` property on the pool is set to
``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs used by
the pool::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.

TOO_MANY_PGS
____________

@@ -451,6 +478,63 @@ so marking "out" OSDs "in" (if there are any) can also help::

Please refer to :ref:`choosing-number-of-placement-groups` for more
information.

POOL_TOO_MANY_PGS
_________________

One or more pools should probably have fewer PGs, based on the amount
of data that is currently stored in the pool.  This can lead to higher
memory utilization for OSD daemons, slower peering after cluster state
changes (like OSD restarts, additions, or removals), and higher load
on the Manager and Monitor daemons.  This warning is generated if the
``pg_autoscale_mode`` property on the pool is set to ``warn``.

To disable the warning, you can disable auto-scaling of PGs for the
pool entirely with::

    ceph osd pool set <pool-name> pg_autoscale_mode off

To allow the cluster to automatically adjust the number of PGs used by
the pool::

    ceph osd pool set <pool-name> pg_autoscale_mode on

You can also manually set the number of PGs for the pool to the
recommended amount with::

    ceph osd pool set <pool-name> pg_num <new-pg-num>

Please refer to :ref:`choosing-number-of-placement-groups` and
:ref:`pg-autoscaler` for more information.
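For example, one possible sequence for investigating and addressing
either of these PG-count warnings is sketched below (the pool name
``foo`` and the PG count ``64`` are placeholders, not recommendations)::

    # See which pools are flagged and what the autoscaler suggests
    ceph health detail
    ceph osd pool autoscale-status

    # Either let the autoscaler apply its recommendation ...
    ceph osd pool set foo pg_autoscale_mode on

    # ... or silence the warning and manage pg_num manually
    ceph osd pool set foo pg_autoscale_mode off
    ceph osd pool set foo pg_num 64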
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_ratio`` property set to
estimate the expected size of the pool as a fraction of total storage,
but the value(s) exceed the total available storage (either by
themselves or in combination with other pools' actual usage).

This is usually an indication that the ``target_size_ratio`` value for
the pool is too large and should be reduced or set to zero with::

    ceph osd pool set <pool-name> target_size_ratio 0

For more information, see :ref:`specifying_pool_target_size`.

POOL_TARGET_SIZE_BYTES_OVERCOMMITTED
____________________________________

One or more pools have a ``target_size_bytes`` property set to
estimate the expected size of the pool, but the value(s) exceed the
total available storage (either by themselves or in combination with
other pools' actual usage).

This is usually an indication that the ``target_size_bytes`` value for
the pool is too large and should be reduced or set to zero with::

    ceph osd pool set <pool-name> target_size_bytes 0

For more information, see :ref:`specifying_pool_target_size`.

SMALLER_PGP_NUM
_______________

@@ -483,6 +567,7 @@ not contain as much data have too many PGs.  See the discussion of

The threshold can be raised to silence the health warning by adjusting
the ``mon_pg_warn_max_object_skew`` config option on the monitors.

POOL_APP_NOT_ENABLED
____________________

diff --git a/doc/rados/operations/placement-groups.rst b/doc/rados/operations/placement-groups.rst
index de82a9f65f34..f1f3c9838d3f 100644
--- a/doc/rados/operations/placement-groups.rst
+++ b/doc/rados/operations/placement-groups.rst
@@ -2,6 +2,164 @@
Placement Groups
==================

.. _pg-autoscaler:

Autoscaling placement groups
============================

Placement groups (PGs) are an internal implementation detail of how
Ceph distributes data.  You can allow the cluster to either make
recommendations or automatically tune PGs based on how the cluster is
used by enabling *pg-autoscaling*.

Each pool in the system has a ``pg_autoscale_mode`` property that can
be set to ``off``, ``on``, or ``warn``:

* ``off``: Disable autoscaling for this pool.  It is up to the
  administrator to choose an appropriate PG number for each pool.
  Please refer to :ref:`choosing-number-of-placement-groups` for more
  information.
* ``on``: Enable automated adjustments of the PG count for the given
  pool.
* ``warn``: Raise health alerts when the PG count should be adjusted.

To set the autoscaling mode for an existing pool::

    ceph osd pool set <pool-name> pg_autoscale_mode <mode>

For example, to enable autoscaling on pool ``foo``::

    ceph osd pool set foo pg_autoscale_mode on

You can also configure the default ``pg_autoscale_mode`` that is
applied to any pools that are created in the future with::

    ceph config set global osd_pool_default_autoscale_mode <mode>
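As an illustrative sketch only (it assumes you want autoscaling active
for every existing pool as well as for pools created later), the two
settings above can be combined from a shell::

    # Turn on autoscaling for every pool that already exists
    for pool in $(ceph osd pool ls); do
        ceph osd pool set "$pool" pg_autoscale_mode on
    done

    # Make autoscaling the default for pools created in the future
    ceph config set global osd_pool_default_autoscale_mode on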
Viewing PG scaling recommendations
----------------------------------

You can view each pool, its relative utilization, and any suggested
changes to the PG count with this command::

    ceph osd pool autoscale-status

Output will be something like::

    POOL  SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
    a     12900M               3.0   82431M        0.4695                8       128         warn
    c     0                    3.0   82431M        0.0000  0.2000        1       64          warn
    b     0       953.6M       3.0   82431M        0.0347                8                   warn

**SIZE** is the amount of data stored in the pool.  **TARGET SIZE**,
if present, is the amount of data the administrator has specified that
they expect to eventually be stored in this pool.  The system uses the
larger of the two values for its calculation.

**RATE** is the multiplier for the pool that determines how much raw
storage capacity is consumed.  For example, a 3-replica pool will have
a ratio of 3.0, while a k=4,m=2 erasure coded pool will have a ratio
of 1.5.

**RAW CAPACITY** is the total amount of raw storage capacity on the
OSDs that are responsible for storing this pool's (and perhaps other
pools') data.  **RATIO** is the ratio of that total capacity that this
pool is consuming (i.e., ratio = size * rate / raw capacity).  For
example, pool ``a`` above has ratio = 12900M * 3.0 / 82431M ≈ 0.4695.

**TARGET RATIO**, if present, is the ratio of storage that the
administrator has specified that they expect this pool to consume.
The system uses the larger of the actual ratio and the target ratio
for its calculation.  If both ``target_size_bytes`` and
``target_size_ratio`` are specified, the ratio takes precedence.

**PG_NUM** is the current number of PGs for the pool (or the current
number of PGs that the pool is working towards, if a ``pg_num``
change is in progress).  **NEW PG_NUM**, if present, is what the
system believes the pool's ``pg_num`` should be changed to.  It is
always a power of 2, and will only be present if the "ideal" value
varies from the current value by more than a factor of 3.

The final column, **AUTOSCALE**, is the pool's ``pg_autoscale_mode``,
and will be either ``on``, ``off``, or ``warn``.


Automated scaling
-----------------

Allowing the cluster to automatically scale PGs based on usage is the
simplest approach.  Ceph will look at the total available storage and
target number of PGs for the whole system, look at how much data is
stored in each pool, and try to apportion the PGs accordingly.  The
system is relatively conservative with its approach, only making
changes to a pool when the current number of PGs (``pg_num``) is more
than 3 times off from what it thinks it should be.

The target number of PGs per OSD is based on the
``mon_target_pg_per_osd`` configurable (default: 100), which can be
adjusted with::

    ceph config set global mon_target_pg_per_osd 100

The autoscaler analyzes pools and adjusts on a per-subtree basis.
Because each pool may map to a different CRUSH rule, and each rule may
distribute data across different devices, Ceph will consider the
utilization of each subtree of the hierarchy independently.  For
example, a pool that maps to OSDs of class ``ssd`` and a pool that
maps to OSDs of class ``hdd`` will each have optimal PG counts that
depend on the number of those respective device types.


.. _specifying_pool_target_size:

Specifying expected pool size
-----------------------------

When a cluster or pool is first created, it will consume a small
fraction of the total cluster capacity and will appear to the system
as if it should only need a small number of placement groups.
However, in most cases cluster administrators have a good idea which
pools are expected to consume most of the system capacity over time.
By providing this information to Ceph, a more appropriate number of
PGs can be used from the beginning, preventing subsequent changes in
``pg_num`` and the overhead associated with moving data around when
those adjustments are made.

The *target size* of a pool can be specified in two ways: either in
terms of the absolute size of the pool (i.e., bytes), or as a ratio of
the total cluster capacity.

For example::

    ceph osd pool set mypool target_size_bytes 100T

will tell the system that ``mypool`` is expected to consume 100 TiB of
space.  Alternatively::

    ceph osd pool set mypool target_size_ratio .9

will tell the system that ``mypool`` is expected to consume 90% of the
total cluster capacity.

You can also set the target size of a pool at creation time with the
optional ``--target-size-bytes <bytes>`` or ``--target-size-ratio
<ratio>`` arguments to the ``ceph osd pool create`` command.
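For instance, a minimal sketch of declaring the expected size at
creation time (the pool name, PG count, and target below are
placeholders only, not recommendations)::

    # Create a pool and tell the autoscaler that it is expected to
    # eventually hold roughly 100 TiB of data
    ceph osd pool create mypool 8 --target-size-bytes 100T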
Note that if impossible target size values are specified (for example,
a capacity larger than the total cluster capacity, or ratio(s) that
sum to more than 1.0), then a health warning
(``POOL_TARGET_SIZE_RATIO_OVERCOMMITTED`` or
``POOL_TARGET_SIZE_BYTES_OVERCOMMITTED``) will be raised.

Specifying bounds on a pool's PGs
---------------------------------

It is also possible to specify a minimum number of PGs for a pool.
This is useful for establishing a lower bound on the amount of
parallelism clients will see when doing IO, even when a pool is mostly
empty.  Setting the lower bound prevents Ceph from reducing (or
recommending you reduce) the PG number below the configured number.

You can set the minimum number of PGs for a pool with::

    ceph osd pool set <pool-name> pg_num_min <num>

You can also specify the minimum PG count at pool creation time with
the optional ``--pg-num-min <num>`` argument to the ``ceph osd pool
create`` command.

.. _preselection:

A preselection of pg_num
@@ -255,6 +413,8 @@ resources.

Choosing the number of Placement Groups
=======================================

.. note:: It is rarely necessary to do this math by hand.  Instead,
   use the ``ceph osd pool autoscale-status`` command in combination
   with the ``target_size_bytes`` or ``target_size_ratio`` pool
   properties.  See :ref:`pg-autoscaler` for more information.

If you have more than 50 OSDs, we recommend approximately 50-100
placement groups per OSD to balance out resource usage, data
durability and distribution.  If you have fewer than 50 OSDs, choosing