ceph: fix ceph_fallocate() ignoring of FALLOC_FL_ALLOCATE_RANGE mode
The fio test reveals the issue for the case of file size
is not aligned on 4K (for example, 4122, 8600, 10K etc).
The reproducing path:
target_dir=/mnt/cephfs
report_dir=/report
size=100ki
nrfiles=10
pattern=0x74657374
fio --runtime=5M --rw=write --bs=4k --size=$size \
--nrfiles=$nrfiles --numjobs=16 --buffer_pattern=0x74657374 \
--iodepth=1 --direct=0 --ioengine=libaio --group_reporting \
--name=fiotest --directory=$target_dir \
--output $report_dir/sequential_write.log
fio --runtime=5M --verify_only --verify=pattern \
--verify_pattern=0x74657374 --size=$size --nrfiles=$nrfiles \
--numjobs=16 --bs=4k --iodepth=1 --direct=0 --name=fiotest \
--ioengine=libaio --group_reporting --verify_fatal=1 \
--verify_state_save=0 --directory=$target_dir \
--output $report_dir/verify_sequential_write.log
The essence of the issue that the write phase calls
the fallocate() to pre-allocate 10K of file size and, then,
it writes only 8KB of data. However, CephFS code
in ceph_fallocate() ignores the FALLOC_FL_ALLOCATE_RANGE
mode and, finally, file is 8K in size only. As a result,
verification phase initiates wierd behaviour of CephFS code.
CephFS code calls ceph_fallocate() again and completely
re-write the file content by some garbage. Finally,
verification phase fails because file contains unexpected
data pattern.
fio: got pattern 'd0', wanted '74'. Bad bits 3
fio: bad pattern block offset 0
pattern: verify failed at file /mnt/cephfs/fiotest.3.0 offset 0, length
2631490270 (requested block: offset=0, length=4096, flags=8)
fio: verify type mismatch (36969 media, 18 given)
fio: got pattern '25', wanted '74'. Bad bits 3
fio: bad pattern block offset 0
pattern: verify failed at file /mnt/cephfs/fiotest.4.0 offset 0, length
1694436820 (requested block: offset=0, length=4096, flags=8)
fio: verify type mismatch (6714 media, 18 given)
Expected state ot the file:
hexdump -C ./fiotest.0.0
00000000 74 65 73 74 74 65 73 74 74 65 73 74 74 65 73 74 |testtesttesttest| *
00002000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| *
00002190 00 00 00 00 00 00 00 00 |........|
00002198
Real state of the file:
head -n 2 ./fiotest.0.0
00000000 35 e0 28 cc 38 a0 99 16 06 9c 6a a9 f2 cd e9 0a |5.(.8.....j.....|
00000010 80 53 2a 07 09 e5 0d 15 70 4a 25 f7 0b 39 9d 18 |.S*.....pJ%..9..|
The patch reworks ceph_fallocate() method by means of adding
support of FALLOC_FL_ALLOCATE_RANGE mode. Also, it adds the checking
that new size can be allocated by means of checking inode_newsize_ok(),
fsc->max_file_size, and ceph_quota_is_max_bytes_exceeded().
Invalidation and making dirty logic is moved into dedicated
methods.
There is one peculiarity for the case of generic/103 test.
CephFS logic receives max_file_size from MDS server and it's 1TB
by default. As a result, generic/103 can fail if max_file_size
is smaller than volume size:
generic/103 6s ... - output mismatch (see /home/slavad/XFSTESTS/xfstests-dev/results//generic/103.out.bad)