We keep the superblock bitstream in two places:
- the "osd_superblock" object's data stream
- the "osd_superblock" object's omap key named "osd_superblock"
The recovery procedure works fine.
But if the data stream is corrupted, an ObjectStore::write() request
might require BlueStore to read the old data from the object's
allocation unit.
This causes an assert like:
"BlueStore.cc: 15865: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)"
The solution is to remove the object before writing to it, eliminating
any reason for the read-on-write.
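A minimal sketch of that idea (assumed shape, not the exact diff): remove
the object in the same transaction before rewriting it, so the write lands
on a fresh object and never touches old allocation units:

  ObjectStore::Transaction t;
  t.remove(coll_t::meta(), OSD_SUPERBLOCK_GOBJECT); // drop old, possibly corrupted, data
  bufferlist bl;
  encode(superblock, bl);
  t.write(coll_t::meta(), OSD_SUPERBLOCK_GOBJECT, 0, bl.length(), bl);
  map<string, bufferlist> om = {{"osd_superblock", bl}};
  t.omap_setkeys(coll_t::meta(), OSD_SUPERBLOCK_GOBJECT, om);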
Reproduction, using a vstart cluster:
> OSD=1 MON=1 FS=0 RGW=0 ../src/vstart.sh -l -n
> ../src/stop.sh
> OBJECT=$(./bin/ceph-objectstore-tool --no-mon-config --data-path dev/osd0/ --op meta-list |grep osd_superblock)
> ./bin/ceph-objectstore-tool --no-superblock --no-mon-config --data-path dev/osd0/ --pool meta $OBJECT get-bytes osd-superblock.data
Using no superblock
> head -c 500 /dev/random >> osd-superblock.data
> ./bin/ceph-objectstore-tool --no-superblock --no-mon-config --data-path dev/osd0/ --pool meta $OBJECT set-bytes osd-superblock.data
Using no superblock
> ./bin/ceph-objectstore-tool --no-superblock --no-mon-config --data-path dev/osd0/ --pool meta $OBJECT dump | grep '"offset"'
Error getting attr on : meta,#-1:7b3f43c4:::osd_superblock:0#, (61) No data available
"offset": 8192, <- use this offset in dd
> dd if=/dev/random of=dev/osd0/block bs=1 count=4000 seek=8192 conv=notrunc <- seek from offset above
4000+0 records in
4000+0 records out
4000 bytes (4.0 kB, 3.9 KiB) copied, 0.0041956 s, 953 kB/s
> ../src/vstart.sh
> ./bin/ceph osd pool create xxxx
Without the fix, the OSD hits the assert at this point.
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>