osd: Change rmissing map key from version_t to eversion_t
Changed the rmissing reverse index map from std::map<version_t, hobject_t> to std::map<eversion_t, hobject_t> to properly track both epoch and version information for missing objects.
Motivation:
This fixes a critical bug where out-of-order divergent logs across epoch boundaries could corrupt the missing list and trigger an assertion failure in pg_missing_set::get_oldest_need().
The root cause was that rmissing used only version_t as the key, which could lead to version number collisions when the same version number appears in different epochs. When merging logs from epoch 1 and epoch 2, objects with the same version number (e.g., version 1 in epoch 1 vs version 1 in epoch 2) would collide in the map, causing the reverse index to become inconsistent with the forward missing map.
During out-of-order recovery, this inconsistency would trigger the assertion:
FAILED ceph_assert(it != missing.end())
in get_oldest_need() when trying to look up an object that should exist in missing but whose entry in rmissing was overwritten by a collision.
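The collision described above can be illustrated with a minimal standalone sketch. It uses simplified stand-in types, not the actual Ceph definitions: eversion_sketch, missing_sketch, rmissing_old and rmissing_new are hypothetical names chosen for illustration, and object names are plain strings rather than hobject_t.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Simplified stand-ins for illustration only; NOT the Ceph definitions.
using version_t = std::uint64_t;
using epoch_t = std::uint32_t;

// Hypothetical (epoch, version) pair, ordered by epoch first, then version.
struct eversion_sketch {
  epoch_t epoch;
  version_t version;
  bool operator<(const eversion_sketch& rhs) const {
    return std::tie(epoch, version) < std::tie(rhs.epoch, rhs.version);
  }
};

// Hypothetical miniature of a forward map plus two reverse indexes.
struct missing_sketch {
  std::map<std::string, eversion_sketch> missing;        // object -> needed version
  std::map<version_t, std::string> rmissing_old;         // keyed by version only
  std::map<eversion_sketch, std::string> rmissing_new;   // keyed by epoch + version

  void add(const std::string& oid, eversion_sketch need) {
    missing[oid] = need;
    // Version-only key: a second object with the same version number
    // silently overwrites the first entry.
    rmissing_old[need.version] = oid;
    // Full key: entries from different epochs stay distinct, and a
    // duplicate key is caught by the assertion.
    bool inserted = rmissing_new.emplace(need, oid).second;
    assert(inserted);
    (void)inserted;
  }
};
```

In this sketch, adding "obj_a" at epoch 1, version 1 and then "obj_b" at epoch 2, version 1 leaves two entries in missing and in rmissing_new, but only one entry in rmissing_old, so the version-only reverse index no longer matches the forward map.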
Fix:
By using the full eversion_t (epoch + version) as the key, the map now:
- Orders missing objects by epoch first, then by version
- Prevents key collisions when version numbers overlap across epochs
- Ensures correct recovery ordering during epoch transitions
- Maintains consistency between the missing and rmissing maps
Changes:
- Updated the rmissing type declaration and all accessor methods in pg_missing_set
- Updated all map operations to use eversion_t keys throughout the codebase
- Added safety assertions to detect duplicate keys and ensure consistency between the missing and rmissing maps
- Fixed iterator types and conversions in recovery code paths
Upgrade Compatibility:
This change requires no upgrade or versioning code because rmissing is never serialized to disk or transmitted over the network. The encode() method (line 5319) only encodes the missing map, not rmissing. During decode() (line 5325), rmissing is completely reconstructed from the decoded missing map (lines 5354-5363). This means:
- Old OSDs writing data will only serialize missing
- New OSDs reading data will reconstruct rmissing with the new key type
- No on-disk format changes are required
- No network protocol changes are required
- The change is transparent to upgrade/downgrade scenarios
Testing:
- All existing tests pass (unittest_pglog: 53 tests, unittest_osd_types: 70 tests)
- Safety assertions successfully detected and prevented a test bug where duplicate eversion_t keys were being used
- Renamed test cases for clarity:
  - merge_log_epoch_change_basic: Tests fundamental invariant
  - merge_log_epoch_change_out_of_order_recovery: Tests recovery ordering
Note: AI assistance was used to generate the unit test cases that validate the epoch change behavior and out-of-order recovery scenarios.
Fixes: https://tracker.ceph.com/issues/74306
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>