From: Andrew Kryczka Date: Fri, 16 Mar 2018 21:08:01 +0000 (-0700) Subject: Fix WAL corruption from checkpoint/backup race condition X-Git-Tag: v5.12.2~4 X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=34ccab0210d709f39496a52f02c6b7c7b829d241;p=rocksdb.git Fix WAL corruption from checkpoint/backup race condition Summary: `Writer::WriteBuffer` was always called at the beginning of checkpoint/backup. But that log writer has no internal synchronization, which meant the same buffer could be flushed twice in a race condition case, causing a WAL entry to be duplicated. Then subsequent WAL entries would be at unexpected offsets, causing the 32KB block boundaries to be overlapped and manifesting as a corruption. This PR fixes the behavior to only use `WriteBuffer` (via `FlushWAL`) in checkpoint/backup when manual WAL flush is enabled. In that case, users are responsible for providing synchronization between WAL flushes. We can also consider removing the call entirely. Closes https://github.com/facebook/rocksdb/pull/3603 Differential Revision: D7277447 Pulled By: ajkr fbshipit-source-id: 1b15bd7fd930511222b075418c10de0aaa70a35a --- diff --git a/HISTORY.md b/HISTORY.md index 7c2d70d96..f87601590 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -6,6 +6,9 @@ ### New Features * Avoid unnecessarily flushing in `CompactRange()` when the range specified by the user does not overlap unflushed memtables. +### Bug Fixes +* Fix WAL corruption caused by race condition between user write thread and backup/checkpoint thread. + ## 5.12.0 (2/14/2018) ### Public API Change * Iterator::SeekForPrev is now a pure virtual method. This is to prevent user who implement the Iterator interface fail to implement SeekForPrev by mistake. diff --git a/utilities/checkpoint/checkpoint_impl.cc b/utilities/checkpoint/checkpoint_impl.cc index d93c7095f..d3c0ae1e4 100644 --- a/utilities/checkpoint/checkpoint_impl.cc +++ b/utilities/checkpoint/checkpoint_impl.cc @@ -222,7 +222,9 @@ Status CheckpointImpl::CreateCustomCheckpoint( TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles1"); TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles2"); - db_->FlushWAL(false /* sync */); + if (db_options.manual_wal_flush) { + db_->FlushWAL(false /* sync */); + } } // if we have more than one column family, we need to also get WAL files if (s.ok()) {