With async log compaction enabled, I hit this assertion failure:
2017-06-28 11:51:42.747315 7f193dd70bc0 -1 /root/ceph/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool)' thread 7f193dd70bc0 time 2017-06-28 11:51:42.741868
/root/ceph/src/os/bluestore/BlueFS.cc: 714: FAILED assert(r == q->second->file_map.end())
ceph version 12.0.3-2327-gc74625e (c74625ebf57d603043f414a83b7a6525264fb6ae) luminous (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x5628ee1f8a0e]
2: (BlueFS::_replay(bool)+0x3bc3) [0x5628ee18cb13]
3: (BlueFS::mount()+0x1cf) [0x5628ee18cf0f]
4: (BlueStore::_open_db(bool)+0xd99) [0x5628ee0af7f9]
5: (BlueStore::_mount(bool)+0x3da) [0x5628ee0e056a]
6: (OSD::init()+0x28f) [0x5628edce10bf]
7: (main()+0x29ca) [0x5628edbf116a]
8: (__libc_start_main()+0xf5) [0x7f193b2c1f45]
9: (()+0x493306) [0x5628edc8b306]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Assume the following interleaving:

Thread1                             Thread2
_compact_log_async
  _flush_and_sync_log
  lock.unlock()
                                    open_for_write(A)
                                      op_file_update
                                      op_dir_link
  lock.lock()
  _compact_log_dump_metadata
    (contains file A)
  flush
  lock.unlock()
                                    op_file_update (alloc new extent)
                                    _flush_and_sync_log
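As a result, two log entries carry the same information (op_dir_link
for file A): one from the compacted metadata dump and one from the log
entries that were still pending. When _replay processes the second one,
the assert above fires.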
To avoid this, clear the current log entries before dumping everything
into the compacted log. The compacted dump already contains all of that
information, so nothing is lost.
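
The interleaving and the fix can be sketched with a toy model. This is
only an illustration under stated assumptions: Entry, Log, replay,
dump_metadata and run are hypothetical names and are not the BlueFS
classes or functions; a single directory is assumed, and replay only
mimics the duplicate-link check that the failed assert enforces.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not the real BlueFS structures.
enum class Op { FileUpdate, DirLink };

struct Entry {
    Op op;
    std::string name;  // file name within a single directory
};

using Log = std::vector<Entry>;

// Replay a log. Returns false if a DirLink names a file that is already
// linked, which models the condition rejected in BlueFS::_replay.
bool replay(const Log& log) {
    std::map<std::string, bool> file_map;
    for (const Entry& e : log) {
        if (e.op == Op::DirLink) {
            if (file_map.count(e.name)) return false;  // duplicate link
            file_map[e.name] = true;
        }
    }
    return true;
}

// Dump the full in-memory state as a compacted log.
Log dump_metadata(const std::vector<std::string>& files) {
    Log out;
    for (const std::string& f : files) {
        out.push_back({Op::FileUpdate, f});
        out.push_back({Op::DirLink, f});
    }
    return out;
}

// Build the resulting on-disk log for the interleaving described above.
// clear_pending selects whether pending entries are dropped before the dump.
Log run(bool clear_pending) {
    std::vector<std::string> files;  // in-memory state
    Log pending;                     // log entries not yet flushed

    // Thread2, while Thread1 has released the lock: open_for_write(A).
    files.push_back("A");
    pending.push_back({Op::FileUpdate, "A"});
    pending.push_back({Op::DirLink, "A"});

    // Thread1, lock re-acquired: optionally clear pending, then dump all.
    if (clear_pending) pending.clear();
    Log disk = dump_metadata(files);

    // Thread2 later: allocate a new extent, then flush pending entries.
    pending.push_back({Op::FileUpdate, "A"});
    disk.insert(disk.end(), pending.begin(), pending.end());
    return disk;
}
```

In this model, replay(run(false)) returns false because the link for A
appears twice, while replay(run(true)) returns true because the dump is
the only place where the link for A is recorded.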
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>