From 4ed7255cfdbb6aa5ffd6a0018ba65ca3465e978b Mon Sep 17 00:00:00 2001
From: Bill Scales
Date: Fri, 1 Aug 2025 16:17:58 +0100
Subject: [PATCH] doc: erasure coding enhancements for tentacle

* Document new pool flag allow_ec_optimizations
* Reference new conf setting osd_pool_default_flag_ec_optimizations
* Add section describing Erasure Code Optimizations

Signed-off-by: Bill Scales
(cherry picked from commit 185987afff561001423196e9bc1366e4b7079c20)
---
 doc/cephfs/createfs.rst                    |  6 ++
 .../configuration/pool-pg-config-ref.rst   |  1 +
 doc/rados/operations/erasure-code.rst      | 56 +++++++++++++++++++
 doc/rados/operations/pools.rst             |  3 +
 4 files changed, 66 insertions(+)

diff --git a/doc/cephfs/createfs.rst b/doc/cephfs/createfs.rst
index ce91660c2ef..1054d3f1c27 100644
--- a/doc/cephfs/createfs.rst
+++ b/doc/cephfs/createfs.rst
@@ -137,5 +137,11 @@ You may use Erasure Coded pools as CephFS data pools as long as they have overwr
 Note that EC overwrites are only supported when using OSDs with the BlueStore backend.
 
+If you are storing lots of small files or are frequently modifying files, you can improve performance by enabling EC optimizations, which is done as follows:
+
+.. code:: bash
+
+   ceph osd pool set my_ec_pool allow_ec_optimizations true
+
 You may not use Erasure Coded pools as CephFS metadata pools, because CephFS
 metadata is stored using RADOS *OMAP* data structures, which EC pools cannot
 store.
diff --git a/doc/rados/configuration/pool-pg-config-ref.rst b/doc/rados/configuration/pool-pg-config-ref.rst
index c3a25a3e74f..422ebec5af4 100644
--- a/doc/rados/configuration/pool-pg-config-ref.rst
+++ b/doc/rados/configuration/pool-pg-config-ref.rst
@@ -69,6 +69,7 @@ See :ref:`pg-autoscaler`.
 .. confval:: osd_max_pg_log_entries
 .. confval:: osd_default_data_pool_replay_window
 .. confval:: osd_max_pg_per_osd_hard_ratio
+.. confval:: osd_pool_default_flag_ec_optimizations
 
 .. _pool: ../../operations/pools
 .. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg#peering
diff --git a/doc/rados/operations/erasure-code.rst b/doc/rados/operations/erasure-code.rst
index 42b1032ac50..333f49abc83 100644
--- a/doc/rados/operations/erasure-code.rst
+++ b/doc/rados/operations/erasure-code.rst
@@ -252,6 +252,62 @@ of ``k`` more viable. Increasing ``m`` still impacts write performance,
 especially for small writes, so for block and file workloads a value of ``m``
 no larger than 3 is recommended.
 
+Erasure Coding Optimizations
+----------------------------
+
+Since Tentacle, an erasure-coded pool may have optimizations enabled
+with a per-pool setting. This improves performance for smaller I/Os and
+eliminates padding, which can save capacity:
+
+.. prompt:: bash $
+
+   ceph osd pool set ec_pool allow_ec_optimizations true
+
+The optimizations make an erasure-coded pool more suitable for use
+with RBD or CephFS. RGW workloads with large objects that are read and
+written sequentially will see little benefit from these optimizations, but
+RGW workloads with lots of very small objects or small random-access reads
+will see performance and capacity benefits.
+
+This flag may be enabled for existing pools, and can be made the
+default for new pools using the central configuration option
+:confval:`osd_pool_default_flag_ec_optimizations`. Once the flag has been
+enabled for a pool, it cannot be disabled, because it changes how new data
+is stored.
+
+The flag cannot be set until all of the Monitors and OSDs have been
+upgraded to Tentacle or later. Optimizations can be enabled and used without
+upgrading gateways and clients.
+
+Optimizations are currently only supported with the Jerasure and ISA-L plugins
+when using the ``reed_sol_van`` technique (these are the previous and current
+defaults, and are the most widely used plugins and technique). Attempting to
+set the flag for a pool using an unsupported combination of plugin and
+technique is blocked with an error message.
+
+The default stripe unit is 4K, which works well for standard EC pools.
+For the majority of I/O workloads it is recommended to increase the stripe
+unit to at least 16K when using optimizations. Performance testing
+shows that 16K is the best choice for general-purpose I/O workloads. Increasing
+this value will significantly improve small read performance but will slightly
+reduce the performance of small sequential writes. For I/O workloads that are
+predominantly reads, larger values up to 256K will further improve read
+performance but will further reduce the performance of small sequential writes.
+Values larger than 256K are unlikely to have any performance benefit. The
+stripe unit is a pool create-time option that can be set in the erasure code
+profile or via the central configuration option
+:confval:`osd_pool_erasure_code_stripe_unit`. The stripe unit cannot be changed
+after the pool has been created, so if you enable optimizations on an
+existing pool you may not get the full benefit of the optimizations.
+
+Without optimizations enabled, the choice of ``k+m`` in the erasure code profile
+affects performance: the higher the values of ``k`` and ``m``, the lower the
+performance will be. With optimizations enabled there is only a very slight
+reduction in performance as ``k`` increases, so this makes using a higher
+value of ``k`` more viable. Increasing ``m`` still impacts write performance,
+especially for small writes, so for block and file workloads a value of ``m``
+no larger than 3 is recommended.
+
 Erasure-coded pool overhead
 ---------------------------
 
diff --git a/doc/rados/operations/pools.rst b/doc/rados/operations/pools.rst
index 444fd248704..69866ade3ce 100644
--- a/doc/rados/operations/pools.rst
+++ b/doc/rados/operations/pools.rst
@@ -449,6 +449,8 @@ You may set values for the following keys:
 :Type: Boolean
 
 .. versionadded:: 12.2.0
+
+.. describe:: allow_ec_optimizations
 :Description: Enables performance and capacity optimizations for an erasure-coded pool. These optimizations were designed for CephFS and RBD workloads; RGW workloads with significant numbers of small objects or with small random-access reads of objects will also benefit. RGW workloads with large sequential reads and writes will see little benefit. For more details, see :ref:`rados_ops_erasure_coding_optimizations`.
 :Type: Boolean
 
@@ -905,6 +907,7 @@ Here are the break downs of the argument:
 .. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
 .. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
 .. _Erasure Coding with Overwrites: ../erasure-code#erasure-coding-with-overwrites
+.. _Erasure Coding Optimizations: ../erasure-code#erasure-coding-optimizations
 .. _Block Device Commands: ../../../rbd/rados-rbd-cmds/#create-a-block-device-pool
 .. _pgcalc: ../pgcalc
 
-- 
2.39.5
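---

Note for reviewers: the end-to-end workflow this patch documents (pick a larger stripe unit at create time, then enable the flag) can be sketched as below. The profile and pool names (`myprofile`, `my_ec_pool`) are illustrative, and the commands assume a cluster whose Monitors and OSDs are all running Tentacle; this is a sketch, not part of the patch.

```shell
# Create an EC profile using a supported plugin/technique combination and
# the 16K stripe unit the new docs recommend; the stripe unit cannot be
# changed after the pool is created.
ceph osd erasure-code-profile set myprofile \
    plugin=isa technique=reed_sol_van k=4 m=2 stripe_unit=16K

# Create the pool from the profile and (for RBD/CephFS use) allow overwrites.
ceph osd pool create my_ec_pool erasure myprofile
ceph osd pool set my_ec_pool allow_ec_overwrites true

# Enable the optimizations flag added by this patch. Note that once set,
# the flag cannot be cleared, because it changes how new data is stored.
ceph osd pool set my_ec_pool allow_ec_optimizations true
```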