From: Laura Flores
Date: Thu, 16 Oct 2025 18:32:10 +0000 (-0500)
Subject: doc/releases: add more to mgr section and fix rados formatting
X-Git-Tag: v21.0.0~27^2~2^2~4^2~20
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=6cb65f929678f499cc0a0223ed6425cb590df0a8;p=ceph.git

doc/releases: add more to mgr section and fix rados formatting

Signed-off-by: Laura Flores
---

diff --git a/doc/releases/tentacle.rst b/doc/releases/tentacle.rst
index 934b26ce5bf1..a06029ccc69b 100644
--- a/doc/releases/tentacle.rst
+++ b/doc/releases/tentacle.rst
@@ -20,8 +20,8 @@ RADOS
   faster WAL (write-ahead-log).
 * Data Availability Score: Users can now track a data availability score
   for each pool in their cluster.
-* All components have been switched to the faster OMAP iteration interface,
-  which improves RGW bucket listing and scrub operations.
+* OMAP: All components have been switched to the faster OMAP iteration
+  interface, which improves RGW bucket listing and scrub operations.


 Dashboard
@@ -109,13 +109,17 @@ Dashboard

 MGR
 ---
-* Users now have the ability to force-disable always-on modules and the
-  removal of the restful and zabbix modules (both deprecated since 2020).
-  Note that the dashboard module's richer and better-maintained RESTful
-  API can be used as an alternative to the restful module, and the
-  prometheus module can be used as an alternative monitoring solution for
-  zabbix.
+* The Ceph Manager's always-on modulues/plugins can now be force-disabled.
+  This can be necessary in cases where we wish to prevent the manager from being
+  flooded by module commands when Ceph services are down or degraded.

+* ``mgr/restful``, ``mgr/zabbix``: both modules, already deprecated since 2020, have been
+  finally removed. They have not been actively maintenance in the last years,
+  and started suffering from vulnerabilities in their dependency chain (e.g.:
+  CVE-2023-46136). An alternative for the ``restful`` module is the ``dashboard`` module,
+  which provides a richer and better maintained RESTful API. Regarding the ``zabbix`` module,
+  there are alternative monitoring solutions, like ``prometheus``, which is the most
+  widely adopted among the Ceph user community.

 RADOS
 -----
@@ -126,7 +130,7 @@ RADOS
 * A new implementation of the Erasure Coding I/O code provides substantial
   performance improvements and some capacity improvements. The new code is
   designed to optimize performance when using Erasure Coding with block storage
-  (RBD) and file storage (CephFS), but will have some benefits for object (RGW)
+  (RBD) and file storage (CephFS) but will have some benefits for object (RGW)
   storage, in particular when using smaller sized objects. A new flag
   ``allow_ec_optimizations`` needs to be set on each pool to switch to using the
   new code. Existing pools can be upgraded once the OSD and MON daemons have been
@@ -136,11 +140,11 @@ RADOS
   ISA-L. Clusters created on Tentacle or later releases will use ISA-L as the
   default plugin when creating a new pool. Clusters that upgrade to the T release
   will continue to use their existing default values. The default values can be
-  overridden by creating a new erasure code profile and selecting it when creating
-  a new pool. ISA-L is recommended for new pools because the Jerasure library is
-  no longer maintained.
+  overridden by creating a new erasure code profile and selecting it when
+  creating a new pool. ISA-L is recommended for new pools because the Jerasure
+  library is no longer maintained.

-* BlueStore now has better compression and a new, faster WAL (write-ahead log).
+* BlueStore now has better compression and a new, faster WAL (write-ahead-log).

 * All components have been switched to the faster OMAP iteration interface,
   which improves RGW bucket listing and scrub operations.
@@ -150,14 +154,14 @@ RADOS

 * Testing improvements for dencoding verification were added.

-* A new command, ``ceph osd pool availability-status``, has been added that allows
-  users to view the availability score for each pool in a cluster. A pool is
-  considered unavailable if any PG in the pool is not in active state or if there
-  are unfound objects. Otherwise the pool is considered available. The score is
-  updated every one second by default. This interval can be changed using the new
-  config option ``pool_availability_update_interval``. The feature is off by
-  default. A new config option ``enable_availability_tracking`` can be used to
-  turn on the feature if required. Another command is added to clear the
+* A new command, ``ceph osd pool availability-status``, has been added that
+  allows users to view the availability score for each pool in a cluster. A pool
+  is considered unavailable if any PG in the pool is not in active state or if
+  there are unfound objects. Otherwise the pool is considered available. The
+  score is updated every one second by default. This interval can be changed
+  using the new config option ``pool_availability_update_interval``. The feature
+  is off by default. A new config option ``enable_availability_tracking`` can be
+  used to turn on the feature if required. Another command is added to clear the
   availability status for a specific pool:

   ::
@@ -177,14 +181,13 @@ RADOS
   Related tracker: https://tracker.ceph.com/issues/70406

 * The ``ceph df`` command reports incorrect ``MAX AVAIL`` for stretch mode pools
-  when CRUSH rules use multiple ``take`` steps for datacenters.
-  ``PGMap::get_rule_avail`` incorrectly calculates available space from only one
-  datacenter. As a workaround, define CRUSH rules with ``take default`` and
-  ``choose firstn 0 type datacenter``. See
-  https://tracker.ceph.com/issues/56650#note-6 for details.
-
-  Upgrading a cluster configured with a CRUSH rule with multiple ``take`` steps
-  can lead to data shuffling, as the new CRUSH changes may necessitate data
+  when CRUSH rules use multiple take steps for datacenters. ``PGMap::get_rule_avail``
+  incorrectly calculates available space from only one datacenter. As a workaround,
+  define CRUSH rules with ``take default`` and ``choose firstn 0 type datacenter``.
+  See https://tracker.ceph.com/issues/56650#note-6 for details.
+
+  Upgrading a cluster configured with a CRUSH rule with multiple take steps can
+  lead to data shuffling, as the new CRUSH changes may necessitate data
   redistribution. In contrast, a stretch rule with a single-take configuration
   will not cause any data movement during the upgrade process.

@@ -192,7 +195,7 @@ RADOS
   behavior as ``librados::IoCtx::aio_cancel()``.

 * A new command, ``ceph osd rm-pg-upmap-primary-all``, has been added that allows
-  users to clear all ``pg-upmap-primary`` mappings in the OSD map when desired.
+  users to clear all ``pg-upmap-primary`` mappings in the osdmap when desired.

   Related trackers:

@@ -202,24 +205,24 @@ RADOS
 * The configuration parameter ``osd_repair_during_recovery`` has been removed.
   That configuration flag used to control whether an operator-initiated "repair
   scrub" would be allowed to start on an OSD that is performing a recovery. In
-  this Ceph version, operator-initiated scrubs and repair scrubs are never
-  blocked by a repair being performed.
+  this Ceph version, operator-initiated scrubs and repair scrubs are never blocked
+  by a repair being performed.

 * Fixed issue of recovery/backfill hang due to improper handling of items in the
-  dmclock background clean-up thread.
+  dmclock's background clean-up thread.

   Related tracker: https://tracker.ceph.com/issues/61594

-* The OSD’s IOPS capacity used by the mClock scheduler is now also checked to
-  determine whether it is below a configured threshold value defined by:
+* The OSD's IOPS capacity used by the mClock scheduler is now also checked to
+  determine if it's below a configured threshold value defined by:

-  ``osd_mclock_iops_capacity_low_threshold_hdd`` — set to 50 IOPS
-  ``osd_mclock_iops_capacity_low_threshold_ssd`` — set to 1000 IOPS
+  - ``osd_mclock_iops_capacity_low_threshold_hdd`` – set to 50 IOPS
+  - ``osd_mclock_iops_capacity_low_threshold_ssd`` – set to 1000 IOPS

   The check is intended to handle cases where the measured IOPS is unrealistically
   low. If such a case is detected, the IOPS capacity is either set to the last
   valid value or the configured default to avoid affecting cluster performance
-  (slow or stalled operations).
+  (slow or stalled ops).

 * Documentation has been updated with steps to override OSD IOPS capacity
   configuration.
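
Reviewer note (not part of the commit): the mClock low-threshold check described in the last hunk can be sketched in a few lines of Python. This is only an illustration of the rule as the release note words it, not the OSD's actual code. The helper name `effective_iops_capacity`, its parameters, and the choice to prefer the last valid value over the configured default are assumptions made for the example; the two thresholds (50 and 1000 IOPS) are the values quoted in the note.

```python
from typing import Optional

# Low thresholds quoted in the release note, in IOPS, keyed by device class.
LOW_THRESHOLD_IOPS = {"hdd": 50.0, "ssd": 1000.0}


def effective_iops_capacity(
    measured: float,
    device_class: str,
    last_valid: Optional[float],
    configured_default: float,
) -> float:
    """Pick the IOPS capacity to use (hypothetical helper, not Ceph code).

    A measurement below the device-class threshold is treated as
    unrealistically low and is ignored. Assumption: the last valid value,
    when one exists, is preferred over the configured default.
    """
    threshold = LOW_THRESHOLD_IOPS[device_class]
    if measured >= threshold:
        # The measurement looks realistic, so use it as-is.
        return measured
    if last_valid is not None:
        # Fall back to the most recent value that passed the check.
        return last_valid
    # No earlier valid measurement: fall back to the configured default.
    return configured_default


if __name__ == "__main__":
    # Illustrative numbers only; 315.0 stands in for a configured default.
    print(effective_iops_capacity(20.0, "hdd", last_valid=300.0, configured_default=315.0))
    print(effective_iops_capacity(20.0, "hdd", last_valid=None, configured_default=315.0))
    print(effective_iops_capacity(400.0, "hdd", last_valid=300.0, configured_default=315.0))
```

Treating a measurement exactly at the threshold as valid follows the note's wording ("below a configured threshold"); the real scheduler may handle the boundary differently.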