From: Jaya Prakash
Date: Mon, 1 Jun 2026 15:39:24 +0000 (+0000)
Subject: doc/rados/bluestore: Add documentation for the BlueFS spillover cleaner
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=7d7c18a9156826f280009beadef26ccdaa832bda;p=ceph.git

doc/rados/bluestore: Add documentation for the BlueFS spillover cleaner

Fixes: https://tracker.ceph.com/issues/74319
Signed-off-by: Jaya Prakash
---

diff --git a/doc/rados/bluestore/bluefs-spillover-cleaner.rst b/doc/rados/bluestore/bluefs-spillover-cleaner.rst
new file mode 100644
index 00000000000..7d25dbf6de3
--- /dev/null
+++ b/doc/rados/bluestore/bluefs-spillover-cleaner.rst
@@ -0,0 +1,111 @@
+=========================
+BlueFS Spillover Cleaner
+=========================
+
+Overview
+========
+
+BlueFS may place files on the slow device when the DB device
+runs out of free space. This condition is known as spillover.
+
+The BlueFS Spillover Cleaner is a background component that periodically
+scans for spillover files and attempts to migrate them back to
+the DB device when sufficient space becomes available.
+
+Spillover is most commonly observed when the DB device becomes full,
+for example during large RocksDB compactions after OSD crashes or after
+upgrades that change RocksDB behavior.
+
+The cleaner is disabled by default.
+
+Operation
+=========
+
+The cleaner operates in two phases:
+
+Active Phase:
+
+- Scan for files located on the slow device.
+- Migrate those files back to the DB device.
+- While migration is in progress, the cleaner throttles itself according to :confval:`bluefs_spillover_cleaner_work_ratio` to reduce interference with foreground IO.
+
+Idle Phase:
+
+- Entered when no spillover files are found.
+- The cleaner then sleeps for a longer period, controlled by :confval:`bluefs_spillover_idle_time`, before performing the next scan.
+
+Configuration
+=============
+
+Enable cleaner
+--------------
+
+.. confval:: bluefs_spillover_cleaner
+
+Idle time
+---------
+
+.. confval:: bluefs_spillover_idle_time
+
+Work ratio
+----------
+
+.. confval:: bluefs_spillover_cleaner_work_ratio
+
+Higher values make migration more aggressive, while lower values reduce
+resource consumption.
+
+Migration behavior
+==================
+
+Files are migrated incrementally rather than moving an entire file at
+once. This helps limit memory consumption and reduce latency spikes.
+
+Migration is attempted only when sufficient free space exists on the
+DB device.
+
+Admin Commands
+==============
+
+Display spillover cleaner status:
+
+.. prompt:: bash #
+
+   ceph tell osd.N bluefs spillover cleaner stats
+
+Example output:
+
+.. code-block::
+
+   {
+       "Files Migrated": [
+           ...
+           "db.wal/000052.log size=0x24cd21 migrated=0x1200000 from dev=2->1 ts=2026-06-02T12:44:25.488250+0000",
+           "db.wal/000053.log size=0x3696ad migrated=0x1200000 from dev=2->1 ts=2026-06-02T12:44:25.833014+0000",
+           "db.wal/000054.log size=0x1571f61 migrated=0x2400000 from dev=2->1 ts=2026-06-02T12:44:26.134245+0000",
+           "db.wal/000055.log size=0x1b2b508 migrated=0x2400000 from dev=2->1 ts=2026-06-02T12:44:26.550633+0000"
+       ],
+       "pending_files": [
+           "db.wal/000056.log",
+           "db.wal/000057.log",
+           "db.wal/000058.log",
+           "db.wal/000059.log",
+           "db.wal/000060.log",
+           ...
+       ]
+   }
+
+Testing
+=======
+
+For testing purposes, allocations can be forced onto the slow device.
+
+.. confval:: bluefs_debug_force_slow
+
+Example:
+
+.. prompt:: bash #
+
+   ceph config set osd bluefs_debug_force_slow true
+
+This option is intended only for development and testing.
\ No newline at end of file
diff --git a/doc/rados/configuration/index.rst b/doc/rados/configuration/index.rst
index 6c73c67ed9a..01b83f95472 100644
--- a/doc/rados/configuration/index.rst
+++ b/doc/rados/configuration/index.rst
@@ -21,6 +21,7 @@ For general object store configuration, refer to the following:
     Storage devices
     BlueStore
     RocksDB cache <../bluestore/rocksdb-config>
+    BlueFS Spillover Cleaner <../bluestore/bluefs-spillover-cleaner>
     ceph-conf