Although _flush() already checks the length of the dirty data and skips
the flush when it is less than bluefs_min_flush_size, in practice rocksdb
can call Append() many times before calling Flush(), so the data flushed
in one go ends up much larger than bluefs_min_flush_size.
In my test, compared to the code without this patch, the 99.99th-percentile
latency for 4k randwrite with bluefs_buffered_io=true drops from 145.753ms
to 20.474ms.
Because BlueFS::flush acquires the lock, we add a new API, try_flush, to
avoid lock contention; a standalone sketch of the idea follows below.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 3a1d4ff6dcdcd897d31a85ef82c726599624b66d)
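The threshold-flush idea is easy to show in isolation. Below is a
minimal, self-contained C++ sketch; ToyWriter, min_flush_size, and the
512 KiB threshold are illustrative stand-ins for the BlueFS FileWriter,
bluefs_min_flush_size, and its configured value, not Ceph code.

#include <cstddef>
#include <iostream>
#include <string>

struct ToyWriter {
  std::string buffer;                       // pending dirty data
  std::size_t min_flush_size = 512 * 1024;  // stands in for bluefs_min_flush_size

  void append(const char *data, std::size_t len) {
    buffer.append(data, len);               // cheap accumulation, no lock taken
  }

  // The try_flush() idea: only pay the (locked, potentially slow) flush
  // cost once enough data has piled up; otherwise return immediately.
  void try_flush() {
    if (buffer.size() >= min_flush_size) {
      flush();
    }
  }

  void flush() {
    std::cout << "flushing " << buffer.size() << " bytes\n";
    buffer.clear();
  }
};

int main() {
  ToyWriter w;
  char chunk[4096] = {};
  for (int i = 0; i < 1024; ++i) {
    w.append(chunk, sizeof(chunk));  // like rocksdb calling Append()
    w.try_flush();                   // bounds the size of any single flush
  }
  w.flush();                         // like the final rocksdb Flush()
}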
int r = _flush(h, force, l);
ceph_assert(r == 0);
}
+ // Flush only once the buffered data reaches bluefs_min_flush_size;
+ // smaller appends return without taking the BlueFS lock.
+ void try_flush(FileWriter *h) {
+ h->buffer_appender.flush();
+ if (h->buffer.length() >= cct->_conf->bluefs_min_flush_size) {
+ flush(h, true);
+ }
+ }
void flush_range(FileWriter *h, uint64_t offset, uint64_t length) {
std::lock_guard l(lock);
_flush_range(h, offset, length);
rocksdb::Status Append(const rocksdb::Slice& data) override {
h->append(data.data(), data.size());
+ // Avoid calling Append() many times and then flushing all of the
+ // accumulated data in a single Flush(). Especially in buffered mode,
+ // flushing that much data at once causes latency jitter.
+ fs->try_flush(h);
return rocksdb::Status::OK();
}
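To make the jitter argument concrete, here is a hypothetical
back-of-the-envelope comparison (plain C++, not Ceph code). With 4 KiB
appends and a 512 KiB threshold (assumed for this toy, in place of the
configured bluefs_min_flush_size), the largest single flush shrinks from
the whole accumulated batch down to roughly the threshold size.

#include <algorithm>
#include <cstddef>
#include <iostream>

int main() {
  const std::size_t chunk = 4 * 1024;        // bytes per Append()
  const std::size_t appends = 1024;          // Append() calls before Flush()
  const std::size_t min_flush = 512 * 1024;  // assumed threshold

  // Without try_flush: everything accumulated since the last flush is
  // written out at once when rocksdb finally calls Flush().
  std::size_t worst_without = chunk * appends;

  // With try_flush: a flush fires whenever pending data reaches the
  // threshold, so no single flush grows far beyond it.
  std::size_t buf = 0, worst_with = 0;
  for (std::size_t i = 0; i < appends; ++i) {
    buf += chunk;
    if (buf >= min_flush) {                  // try_flush() fires here
      worst_with = std::max(worst_with, buf);
      buf = 0;
    }
  }
  worst_with = std::max(worst_with, buf);    // final Flush()

  std::cout << "largest flush without try_flush: " << worst_without << " bytes\n"
            << "largest flush with try_flush:    " << worst_with << " bytes\n";
}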