Summary:
`ReadaheadRandomAccessFile` is used by iterators for file reads in several cases: in compaction when `compaction_readahead_size > 0` or `use_direct_io_for_flush_and_compaction == true`, and in user iterators when `ReadOptions::readahead_size > 0`. It maintains an internal buffer of readahead data and assumes that, if the buffer's length is less than `ReadaheadRandomAccessFile::readahead_size_` (which is fixed in the constructor), EOF has been reached, so it doesn't try reading further.
Recently, d938226af405681c592f25310f41c0c933bcdb19 started calling `RandomAccessFile::Prefetch` with various lengths: 8KB, 16KB, etc. When the `RandomAccessFile` is a `ReadaheadRandomAccessFile`, a prefetch shorter than `readahead_size_` triggers the above condition and EOF is incorrectly inferred. If a block lies only partially in the readahead buffer when this happens, the result is a truncated data block.
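The failure mode can be illustrated with a toy model. This is a minimal sketch, not the RocksDB class: `ToyReadaheadFile`, `FillBuffer`, the `guard_small_prefetch` flag, and `DemoTruncatedRead` are hypothetical names, the "file" is an in-memory string, and page alignment is ignored. It only assumes the behavior described above, namely that `Read()` treats a buffer shorter than `readahead_size_` as EOF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Toy model only (hypothetical names); alignment and error handling omitted.
class ToyReadaheadFile {
 public:
  ToyReadaheadFile(std::string data, size_t readahead_size,
                   bool guard_small_prefetch)
      : data_(std::move(data)),
        readahead_size_(readahead_size),
        guard_(guard_small_prefetch) {}

  // Loads up to n bytes starting at offset into the internal buffer.
  void Prefetch(size_t offset, size_t n) {
    if (guard_ && n < readahead_size_) {
      return;  // Mirrors the fix: ignore prefetches below readahead_size_.
    }
    FillBuffer(offset, n);
  }

  // Returns up to n bytes starting at offset.
  std::string Read(size_t offset, size_t n) {
    std::string result;
    if (offset >= buffer_offset_ && offset < buffer_offset_ + buffer_.size()) {
      // Serve whatever part of the request is already buffered.
      result = buffer_.substr(offset - buffer_offset_, n);
      if (result.size() == n) return result;
      // The heuristic under discussion: a short buffer is taken to mean EOF.
      if (buffer_.size() < readahead_size_) return result;
    }
    // Refill from the first missing byte and append the remainder.
    size_t remaining = n - result.size();
    FillBuffer(offset + result.size(), std::max(readahead_size_, remaining));
    result += buffer_.substr(0, remaining);
    return result;
  }

 private:
  void FillBuffer(size_t offset, size_t n) {
    offset = std::min(offset, data_.size());  // Clamp so substr cannot throw.
    buffer_offset_ = offset;
    buffer_ = data_.substr(offset, n);
  }

  std::string data_;
  size_t readahead_size_;
  bool guard_;
  size_t buffer_offset_ = 0;
  std::string buffer_;
};

// A 32-byte "file", readahead of 16 bytes, and an 8-byte prefetch.
void DemoTruncatedRead() {
  const std::string file(32, 'x');

  ToyReadaheadFile unguarded(file, 16, /*guard_small_prefetch=*/false);
  unguarded.Prefetch(0, 8);
  assert(unguarded.Read(4, 10).size() == 4);  // Truncated: EOF wrongly assumed.

  ToyReadaheadFile guarded(file, 16, /*guard_small_prefetch=*/true);
  guarded.Prefetch(0, 8);
  assert(guarded.Read(4, 10).size() == 10);  // Full read.
}
```

In the unguarded case the 8-byte prefetch leaves a buffer shorter than the 16-byte readahead size, so the read that straddles the end of the buffer returns only the 4 buffered bytes. With the guard, the small prefetch is a no-op and the read refills the buffer normally.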
The problem is reproducible:
```
TEST_TMPDIR=/data/compaction_bench ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -block_size=18384 -use_direct_io_for_flush_and_compaction=true
...
put error: Corruption: truncated block read from /data/compaction_bench/dbbench/000014.sst offset 20245, expected 10143 bytes, got 8427
```
Closes https://github.com/facebook/rocksdb/pull/3454
Differential Revision: D6869405
Pulled By: ajkr
fbshipit-source-id: 87001c299e7600a37c0dcccbd0368e0954c929cf
}
virtual Status Prefetch(uint64_t offset, size_t n) override {
+ if (n < readahead_size_) {
+ // Don't allow smaller prefetches than the configured `readahead_size_`.
+ // `Read()` assumes a smaller prefetch buffer indicates EOF was reached.
+ return Status::OK();
+ }
size_t prefetch_offset = TruncateToPageBoundary(alignment_, offset);
if (prefetch_offset == buffer_offset_) {
return Status::OK();