os/bluestore: fix deep-scrub operation againest disk silent errors
Say a object who has data caches, but in a while later, caches' underlying
physical device has silent disk erros accidentally, then caches and physical
data are not same. In such case, deep-scrub operation still tries to read
caches firstly and won't do crc checksum, then deep-scrub won't find such
data corruptions timely.
Here introduce a new flag 'CEPH_OSD_OP_FLAG_BYPASS_CLEAN_CACHE' which tells
deep-scrub to bypass object caches. Note that we only bypass cache who is in
STATE_CLEAN state. For STATE_WRITING caches, currently they are not written
to physical device, so deep-scrub operation can not read physical device and
can read these dirty caches safely. Once they are in STATE_CLEAN state(or not
added to bluestore cache), next round deep-scurb can check them correctly.
As to above discussions, I refactor BlueStore::BufferSpace::read sightly,
adding a new 'flags' argument, whose value will be 0 or:
enum {
BYPASS_CLEAN_CACHE = 0x1, // bypass clean cache
};
flags 0: normal read, do not bypass clean or dirty cache
flags BYPASS_CLEAN_CACHE: bypass clean cache, currently only for deep-scrube
operation
Test:
I deliberately corrupt a object with cache, with this patch, deep-scrub
can find data error very timely.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@easystack.cn>
(cherry picked from commit
a7f1af25dd2ba88a322ed21828f073a277b09d02)
Conflicts:
src/include/rados.h
src/os/bluestore/BlueStore.cc: trivial resolution