From: Jianpeng Ma
Date: Mon, 10 Aug 2020 07:56:13 +0000 (+0800)
Subject: os/bluestore/BlueRocksEnv: Avoid flushing too much data at once.
X-Git-Tag: v14.2.17~20^2~3
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=0de64149a08bf23a20688dc093bc56c8b17b4887;p=ceph.git

os/bluestore/BlueRocksEnv: Avoid flushing too much data at once.

_flush() already checks the length: if the dirty length is less than
bluefs_min_flush_size, the flush is skipped. In practice, however, we found
that RocksDB can call Append() many times before it calls Flush(), so the
data flushed at once can be much larger than bluefs_min_flush_size.

In my test of 4k randwrite with bluefs_buffered_io=true, this patch reduces
the 99.99th-percentile latency from 145.753ms to 20.474ms.

Because BlueFS::flush() acquires the BlueFS lock, calling it on every
Append() would add lock contention. We therefore add a new API, try_flush(),
which checks the buffered length first and calls flush() (and so takes the
lock) only once at least bluefs_min_flush_size bytes are buffered.

Signed-off-by: Jianpeng Ma
(cherry picked from commit 3a1d4ff6dcdcd897d31a85ef82c726599624b66d)
---
diff --git a/src/os/bluestore/BlueFS.h b/src/os/bluestore/BlueFS.h
index 86748e4afb09..2b7ccfafce20 100644
--- a/src/os/bluestore/BlueFS.h
+++ b/src/os/bluestore/BlueFS.h
@@ -557,6 +557,12 @@ public:
     int r = _flush(h, force, l);
     ceph_assert(r == 0);
   }
+  void try_flush(FileWriter *h) {
+    h->buffer_appender.flush();
+    if (h->buffer.length() >= cct->_conf->bluefs_min_flush_size) {
+      flush(h, true);
+    }
+  }
   void flush_range(FileWriter *h, uint64_t offset, uint64_t length) {
     std::lock_guard l(lock);
     _flush_range(h, offset, length);
diff --git a/src/os/bluestore/BlueRocksEnv.cc b/src/os/bluestore/BlueRocksEnv.cc
index 51614c09d2cd..b4703ae2c2a9 100644
--- a/src/os/bluestore/BlueRocksEnv.cc
+++ b/src/os/bluestore/BlueRocksEnv.cc
@@ -172,6 +172,9 @@ class BlueRocksWritableFile : public rocksdb::WritableFile {
   rocksdb::Status Append(const rocksdb::Slice& data) override {
     h->append(data.data(), data.size());
+    // Avoid calling many time Append() and then calling Flush().
+    // Especially for buffer mode, flush much data will cause jitter.
+    fs->try_flush(h);
     return rocksdb::Status::OK();
   }
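The effect of calling try_flush() from Append() can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Ceph code: ModelWriter, simulate() and the 16 KiB threshold used below are assumptions chosen for illustration, and the real BlueFS::flush() writes to a block device instead of recording sizes. It compares the old behavior (many Append() calls followed by a single Flush()) with checking the threshold after every append and taking the lock only when a flush is due.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical model of the Append()/try_flush() pattern in this patch.
// ModelWriter, min_flush_size and flushed_sizes() are illustrative names,
// not BlueFS APIs. Assumes one thread appends per writer (as RocksDB does
// for a WritableFile); the mutex stands in for the shared BlueFS lock.
class ModelWriter {
 public:
  ModelWriter(std::size_t min_flush_size, bool use_try_flush)
      : min_flush_size_(min_flush_size), use_try_flush_(use_try_flush) {
    assert(min_flush_size_ > 0);
  }

  // Like BlueRocksWritableFile::Append(): buffer the data and, when the
  // patch's behavior is enabled, attempt an early flush.
  void append(const std::string& data) {
    buffer_ += data;
    if (use_try_flush_) try_flush();
  }

  // Like BlueFS::try_flush(): compare the buffered length with the
  // threshold without taking the lock; lock only when a flush is due.
  void try_flush() {
    if (buffer_.size() >= min_flush_size_) flush();
  }

  // Like Flush()/BlueFS::flush(): take the lock and write out everything
  // buffered. Recording the size stands in for the actual device write.
  void flush() {
    std::lock_guard<std::mutex> l(lock_);
    if (buffer_.empty()) return;
    flushed_sizes_.push_back(buffer_.size());
    buffer_.clear();
  }

  const std::vector<std::size_t>& flushed_sizes() const {
    return flushed_sizes_;
  }

 private:
  std::size_t min_flush_size_;
  bool use_try_flush_;
  std::string buffer_;
  std::mutex lock_;
  std::vector<std::size_t> flushed_sizes_;
};

// Simulates `appends` Append() calls of `chunk` bytes each, followed by a
// single Flush(); returns the size of every flush that was performed.
std::vector<std::size_t> simulate(bool use_try_flush, std::size_t min_flush,
                                  std::size_t chunk, int appends) {
  ModelWriter w(min_flush, use_try_flush);
  for (int i = 0; i < appends; ++i) w.append(std::string(chunk, 'x'));
  w.flush();
  return w.flushed_sizes();
}
```

With a 16 KiB threshold and ten 4 KiB appends, simulate(false, 16384, 4096, 10) performs one 40 KiB flush, while simulate(true, 16384, 4096, 10) performs flushes of 16 KiB, 16 KiB and a final 8 KiB. Each locked flush is therefore bounded by roughly the threshold plus one append, instead of by however much RocksDB chose to append before calling Flush().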