From: Jianpeng Ma
Date: Mon, 10 Aug 2020 07:56:13 +0000 (+0800)
Subject: os/bluestore/BlueRocksEnv: Avoid flushing too much data at once.
X-Git-Tag: v16.1.0~1377^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=3a1d4ff6dcdcd897d31a85ef82c726599624b66d;p=ceph.git

os/bluestore/BlueRocksEnv: Avoid flushing too much data at once.

The _flush function already checks the dirty length: if it is less than
bluefs_min_flush_size, the flush is skipped. In practice, however, we
found that RocksDB can call Append() many times before calling Flush(),
so the amount of data flushed at once grows much larger than
bluefs_min_flush_size. With this patch, my tests show the p99.99 latency
for 4k randwrite with bluefs_buffered_io=true drops from 145.753 ms to
20.474 ms.

Because BlueFS::flush acquires the lock, add a new API, try_flush, which
only calls flush once the buffered data reaches the threshold, to avoid
lock contention.

Signed-off-by: Jianpeng Ma
---

diff --git a/src/os/bluestore/BlueFS.h b/src/os/bluestore/BlueFS.h
index 1a4997663455..c8e2063ce79a 100644
--- a/src/os/bluestore/BlueFS.h
+++ b/src/os/bluestore/BlueFS.h
@@ -541,6 +541,12 @@ public:
     int r = _flush(h, force, l);
     ceph_assert(r == 0);
   }
+  void try_flush(FileWriter *h) {
+    h->buffer_appender.flush();
+    if (h->buffer.length() >= cct->_conf->bluefs_min_flush_size) {
+      flush(h, true);
+    }
+  }
   void flush_range(FileWriter *h, uint64_t offset, uint64_t length) {
     std::lock_guard l(lock);
     _flush_range(h, offset, length);
diff --git a/src/os/bluestore/BlueRocksEnv.cc b/src/os/bluestore/BlueRocksEnv.cc
index 8b154335ca6a..f4d68ea10259 100644
--- a/src/os/bluestore/BlueRocksEnv.cc
+++ b/src/os/bluestore/BlueRocksEnv.cc
@@ -178,6 +178,9 @@ class BlueRocksWritableFile : public rocksdb::WritableFile {
   rocksdb::Status Append(const rocksdb::Slice& data) override {
     h->append(data.data(), data.size());
+    // Avoid many Append() calls followed by one large Flush().
+    // Especially with buffered I/O, flushing a large amount of data
+    // at once causes latency jitter.
+    fs->try_flush(h);
     return rocksdb::Status::OK();
   }
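The threshold logic the patch adds can be illustrated in isolation. Below is a minimal, self-contained C++ sketch of the flush-on-threshold pattern; the `Writer` struct, the `min_flush_size` value, and the `append`/`try_flush` helpers are hypothetical stand-ins for BlueFS's `FileWriter`, `bluefs_min_flush_size` (512 KiB by default in Ceph, shrunk here for illustration), and the real member functions:

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for BlueFS's writer state:
// data buffered in memory, plus a count of bytes already flushed.
struct Writer {
  std::string buffer;
  uint64_t flushed = 0;
};

// Stand-in for bluefs_min_flush_size (illustrative value, not Ceph's default).
constexpr uint64_t min_flush_size = 4096;

// try_flush: flush only once the dirty buffer reaches the threshold,
// so each flush moves a bounded, predictable amount of data. In the real
// patch this is also the point where the BlueFS lock would be taken.
void try_flush(Writer &w) {
  if (w.buffer.size() >= min_flush_size) {
    w.flushed += w.buffer.size();
    w.buffer.clear();
  }
}

// append: buffer the data, then flush opportunistically -- mirroring the
// fs->try_flush(h) call the patch adds to BlueRocksWritableFile::Append().
void append(Writer &w, const std::string &data) {
  w.buffer += data;
  try_flush(w);
}
```

With ten 1 KiB appends, the writer flushes twice (once each time the buffer reaches 4 KiB) and keeps the remaining 2 KiB buffered, instead of accumulating all 10 KiB for a single large flush at the end. Checking the length before taking any lock is what keeps the fast path (buffer below threshold) contention-free.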