When rocksdb has log recycling enabled (this is required!), it
computes robust checksums on log records and changes its playback
behavior to tolerate trailing garbage in the log file. Normally this
is what lets it overwrite its previous log files, but it also lets us
write the log over arbitrary garbage on the device.
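
To illustrate (a minimal sketch, not rocksdb's actual log framing or
API): with a CRC on every record, replay simply stops at the first
record that fails its checksum, so anything past the last valid
record is ignored.

    // Minimal sketch of per-record CRC framing: replay stops at the first
    // record whose checksum fails, treating everything after it as the
    // trailing garbage left behind by a recycled (or preextended) file.
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>
    #include <zlib.h>   // crc32()

    struct RecordHeader {
      uint32_t crc;   // crc32 over the payload
      uint32_t len;   // payload length in bytes
    };

    static void append_record(std::vector<uint8_t>& log,
                              const std::string& payload) {
      RecordHeader h;
      h.len = payload.size();
      h.crc = crc32(0L, reinterpret_cast<const Bytef*>(payload.data()),
                    payload.size());
      const uint8_t *p = reinterpret_cast<const uint8_t*>(&h);
      log.insert(log.end(), p, p + sizeof(h));
      log.insert(log.end(), payload.begin(), payload.end());
    }

    static std::vector<std::string> replay(const std::vector<uint8_t>& log) {
      std::vector<std::string> out;
      size_t pos = 0;
      while (pos + sizeof(RecordHeader) <= log.size()) {
        RecordHeader h;
        memcpy(&h, log.data() + pos, sizeof(h));
        if (pos + sizeof(h) + h.len > log.size())
          break;  // truncated tail: end of valid log
        const Bytef *payload = log.data() + pos + sizeof(h);
        if (crc32(0L, payload, h.len) != h.crc)
          break;  // checksum mismatch: trailing garbage, stop here
        out.emplace_back(reinterpret_cast<const char*>(payload), h.len);
        pos += sizeof(h) + h.len;
      }
      return out;
    }

    int main() {
      std::vector<uint8_t> log;
      append_record(log, "put a=1");
      append_record(log, "put b=2");
      log.insert(log.end(), {0xde, 0xad, 0xbe, 0xef});  // stale device bytes
      return replay(log).size() == 2 ? 0 : 1;  // garbage past records ignored
    }
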
When we allocate new space for a .log file (already indicated by the
WRITER_WAL hint), extend the file size immediately to cover the whole
allocation, so that subsequent appends don't have to (unless/until we
do another allocation).
This is safe as long as rocksdb recycling is enabled (which it is
by default).
This is faster because we no longer have to flush the bluefs log on
every WAL append during the window after startup, before rocksdb
starts recycling log files.
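
If rocksdb log recycling ever has to be turned off, this optimization
must be disabled along with it. For example (a sketch; the new option
is added below, and the [osd] section is one common placement in
ceph.conf):

    [osd]
    bluefs_preextend_wal_files = false
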
Signed-off-by: Sage Weil <sage@redhat.com>
OPTION(bluefs_compact_log_sync, OPT_BOOL, false) // sync or async log compaction?
OPTION(bluefs_buffered_io, OPT_BOOL, false)
OPTION(bluefs_allocator, OPT_STR, "bitmap") // stupid | bitmap
+OPTION(bluefs_preextend_wal_files, OPT_BOOL, true) // this *requires* that rocksdb has recycling enabled
OPTION(bluestore_bluefs, OPT_BOOL, true)
OPTION(bluestore_bluefs_env_mirror, OPT_BOOL, false) // mirror to normal Env for debug
<< dendl;
return r;
}
+ if (g_conf->bluefs_preextend_wal_files &&
+ h->writer_type == WRITER_WAL) {
+ // NOTE: this *requires* that rocksdb also has log recycling
+ // enabled and is therefore doing robust CRCs on the log
+ // records. otherwise, we will fail to replay the rocksdb log
+ // properly due to garbage on the device.
+ h->file->fnode.size = h->file->fnode.get_allocated();
+ dout(10) << __func__ << " extending WAL size to 0x" << std::hex
+ << h->file->fnode.size << std::dec << " to include allocated"
+ << dendl;
+ }
must_dirty = true;
}
if (h->file->fnode.size < offset + length) {