git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commitdiff
common/options: Add TTL rocksdb option for time based compaction
author Mark Nelson <mnelson@redhat.com>
Thu, 21 Jul 2022 19:49:23 +0000 (19:49 +0000)
committer Mark Nelson <mnelson@redhat.com>
Fri, 22 Jul 2022 01:21:34 +0000 (01:21 +0000)
In many different contexts we are seeing issues with RocksDB tombstones causing extremely slow iteration performance.  In the past we've tried to solve this using RangeDelete with unfortunate consequences.  There are, however, a couple of things we can do to mitigate some of the impact.

One option is to set a compaction TTL.  This is documented in the RocksDB wiki here:

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#periodic-and-ttl-compaction

The idea here is that no data is allowed to sit in RocksDB for a given length of time without being compacted.  Several users including Alexandre Marangone and Josh Baergen from DigitalOcean have documented significantly better performance under deletion workloads while utilizing this option:

https://tracker.ceph.com/issues/53926
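
As a rough illustration of the idea above, here is a toy model of TTL-triggered compaction. It is a sketch and not RocksDB's actual implementation: the SstFile class, the files_due_for_ttl_compaction helper, the file names, and the one-hour TTL are all hypothetical, and a file's age is approximated by a single creation timestamp.

```python
# Toy model of TTL-triggered compaction (NOT RocksDB's real logic).
# SstFile and files_due_for_ttl_compaction are hypothetical names.
from dataclasses import dataclass


@dataclass
class SstFile:
    name: str          # hypothetical SST file name
    created_at: float  # creation time in seconds (arbitrary epoch)


def files_due_for_ttl_compaction(files, now, ttl_seconds):
    """Return the files whose age exceeds the TTL."""
    return [f for f in files if now - f.created_at > ttl_seconds]


files = [
    SstFile("000010.sst", created_at=0),      # age 5000 s at now=5000
    SstFile("000011.sst", created_at=4000),   # age 1000 s at now=5000
]

# Illustrative TTL of one hour; only the older file exceeds it.
due = files_due_for_ttl_compaction(files, now=5000, ttl_seconds=3600)
print([f.name for f in due])
```

In this model any tombstones sitting in the older file would be rewritten (and possibly dropped) once the file ages past the TTL, even if no other compaction trigger fires.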

We are setting this to a fairly conservative 6 hours by default (the same value Josh Baergen reported using at DigitalOcean).  This should limit the write-amplification impact that could potentially occur with a much more aggressive compaction TTL.
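
To sanity-check the 6 hour figure, the sketch below splits the new default option string on commas and converts the ttl value from seconds to hours. The parse_rocksdb_options helper is hypothetical and deliberately simplistic; it is not how Ceph itself parses this setting.

```python
# Sketch: parse a "key=value,key=value" option string and report the
# compaction TTL in hours. parse_rocksdb_options is a hypothetical
# helper, not part of Ceph or RocksDB.

def parse_rocksdb_options(option_string):
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


# New default for bluestore_rocksdb_options from this commit.
DEFAULT_OPTIONS = (
    "compression=kNoCompression,max_write_buffer_number=128,"
    "min_write_buffer_number_to_merge=16,"
    "compaction_style=kCompactionStyleLevel,write_buffer_size=8388608,"
    "max_background_jobs=4,level0_file_num_compaction_trigger=8,"
    "max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,"
    "compaction_readahead_size=2MB,max_total_wal_size=1073741824,"
    "writable_file_max_buffer_size=0,ttl=21600"
)

options = parse_rocksdb_options(DEFAULT_OPTIONS)
ttl_seconds = int(options["ttl"])   # the ttl option is given in seconds
ttl_hours = ttl_seconds / 3600
print(f"ttl = {ttl_seconds} s = {ttl_hours:g} h")  # prints "ttl = 21600 s = 6 h"
```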

Caveats to this approach:

1) It only works with tombstones accumulating in SST files.
2) It will only help with a gradual accumulation of tombstones over long periods of time.
3) It does nothing to help with accumulation of tombstones in memtables.

Additional mitigation methods (especially compaction triggered by delete) will be necessary, though this still can serve as a useful "last line of defense" if tombstones are accumulating in SST files.

Signed-off-by: Mark Nelson <mnelson@redhat.com>
src/common/options/global.yaml.in

index ab2e730d05621b920377338924b3c3ab705b5a97..d5d898ea3b70d6418721aa631f3756400fc1355c 100644 (file)
@@ -4911,7 +4911,7 @@ options:
   type: str
   level: advanced
   desc: Full set of rocksdb settings to override
-  default: compression=kNoCompression,max_write_buffer_number=128,min_write_buffer_number_to_merge=16,compaction_style=kCompactionStyleLevel,write_buffer_size=8388608,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0
+  default: compression=kNoCompression,max_write_buffer_number=128,min_write_buffer_number_to_merge=16,compaction_style=kCompactionStyleLevel,write_buffer_size=8388608,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0,ttl=21600
   with_legacy: true
 - name: bluestore_rocksdb_options_annex
   type: str