This commit fixes a critical cache key collision bug in the ISA erasure
code plugin that could lead to heap-buffer-overflow and silent data
corruption.
Problem:
--------
The decoding table cache was indexed only by matrix type and erasure
signature (available/missing chunk pattern), but did NOT include the
(k,m) erasure code configuration parameters. This caused different EC
configurations with similar erasure patterns to collide in the cache,
leading to incorrectly-sized cached buffers being reused.
AddressSanitizer Report:
------------------------
==4904==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x5160001397b8 at pc 0x5de8e415296b bp 0x7ffc82260310 sp 0x7ffc8225fad0
READ of size 576 at 0x5160001397b8 thread T0
#0 __asan_memcpy
#1 ErasureCodeIsaTableCache::getDecodingTableFromCache()
.../ErasureCodeIsaTableCache.cc:260:5
#2 ErasureCodeIsaDefault::isa_decode()
.../ErasureCodeIsa.cc:490:15
0x5160001397b8 is located 0 bytes after 568-byte region
[0x516000139580,0x5160001397b8) allocated by:
#0 posix_memalign
#1 ceph::buffer::raw_combined::alloc_data_n_controlblock()
#2 ErasureCodeIsaTableCache::putDecodingTableToCache()
.../ErasureCodeIsaTableCache.cc:319:18
Root Cause:
-----------
Scenario illustrating the bug:
1. First decode operation: k=2, m=1, erasure pattern "+0+2-1"
- Creates cache entry with key "+0+2-1"
- Allocates buffer: 2*(1+2)*32 = 192 bytes
2. Second decode operation: k=3, m=3, same erasure pattern "+0+2-1"
- Looks up cache with key "+0+2-1" → COLLISION
- Retrieves 192-byte buffer but needs 3*(3+3)*32 = 576 bytes
- Result: Heap-buffer-overflow (reads 384 bytes beyond allocation)
Worse scenario (silent corruption):
1. First decode: k=3, m=3 → caches 576-byte table
2. Second decode: k=2, m=1 → retrieves wrong table
- Uses incorrect decoding matrix
- Result: Silent data corruption during recovery
Solution:
---------
Include k and m parameters in cache signature
- Old format: "+0+2+3-1-4"
- New format: "k3m2a+0+2+3e-1-4"
Test Fix:
---------
Also fixes a buffer overflow in TestErasureCodePlugins.cc where
hashes_bl offset was calculated using chunk_size instead of sizeof(uint32_t),
causing reads beyond the CRC buffer.
Production Impact:
------------------
Backward Compatibility: FULLY COMPATIBLE
- Cache is ephemeral (in-memory only, not persisted)
- Cache cleared on process restart
- Rolling upgrades safe - each OSD restart gets fixed code
- Old cache entries automatically invalidated on upgrade
- No wire protocol or on-disk format changes
- No configuration changes required
- No breaking changes
Data Integrity:
- Eliminates silent data corruption risk
- Eliminates heap-buffer-overflow crashes
- Cache now correctly isolated by (k,m) configuration
- Correct decoding tables always used for recovery
- No risk of corrupting user data from the fix itself
Several factors likely prevented widespread reports:
1. Low probability conditions required:
- Need multiple EC pools with DIFFERENT (k,m) configurations
- Need similar erasure patterns across pools
- Need cache collision to occur during actual recovery operations
- Recovery operations are relatively rare in healthy clusters
2. Crash vs silent corruption detection:
- Buffer overflows (easier to detect) occur when k2,m2 > k1,m1
- Silent corruption (harder to detect) occurs when k2,m2 < k1,m1
- Crashes might be attributed to other causes
- Data corruption only detected during scrub or data verification
3. Common deployment patterns:
- Many deployments use single EC configuration cluster-wide
- Default EC configurations (k=2,m=1 or k=4,m=2) reduce collision space
- Erasure pattern variety may be insufficient for collisions
4. ISA plugin usage:
- Not universally deployed (requires Intel ISA-L library)
- Some sites use jerasure plugin instead
- Plugin selection depends on hardware and configuration
5. Detection difficulty:
- ASan not enabled in production builds
- Silent corruption only appears during:
* Degraded reads with recovery
* Scrub operations
* Deep-scrub verification
- Corrupted data might not be immediately accessed