git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
osd/test: Add EC peering test infrastructure and recovery test cases 68697/head
author Alex Ainscow <aainscow@uk.ibm.com>
Mon, 27 Apr 2026 13:24:45 +0000 (14:24 +0100)
committer Alex Ainscow <aainscow@uk.ibm.com>
Thu, 30 Apr 2026 15:24:25 +0000 (16:24 +0100)
commit d6b2c448857f1e5d1bbbdaa09da5bf7f65878328
tree 91c769a7754fdec356821a3b4091f1a8c7989d38
parent cfec561b2da9632ff1f7ce54f505fe0d368c231f

This commit enhances the EC peering test framework and adds test cases
for erasure-coded pool recovery scenarios.

NOTE: Many of the test cases are disabled because they reproduce known
problems. Later commits, submitted under separate PRs, will fix the
production issues and enable these tests.

Test Infrastructure Improvements:
- Add MockStore wrapper with read error injection capabilities for testing
  error handling in EC recovery
- Enhance ECPeeringTestFixture with recovery callback verification
- Add support for pg_upmap to better simulate OSD placement
- Implement write_attribute() for testing partial vs full stripe writes
- Add read_shard_object_info() to verify on-disk version consistency
- Improve logging with missing object stats (m=, u=, mbc=)
- Add support for object recovery in Fast EC
- Add set_config() helper for runtime configuration changes
- Preserve xinfo features when marking OSDs up/down
- Fix pg_temp handling for EC pools with optimizations
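
As an illustration of the read error injection idea behind the MockStore
wrapper, here is a minimal, self-contained sketch. It is not the code in
src/test/osd/MockStore.h: the SimpleStore and ErrorInjectingStore classes,
their method names, and the choice of -EIO as the injected error are all
assumptions made for this example, and Ceph's real ObjectStore interface is
considerably larger.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for an object store. This is NOT Ceph's
// ObjectStore API; it only models whole-object reads and writes.
class SimpleStore {
public:
  void write(const std::string& oid, const std::string& data) {
    objects_[oid] = data;
  }

  // Returns 0 and fills *out on success, or -ENOENT if the object is absent.
  int read(const std::string& oid, std::string* out) const {
    auto it = objects_.find(oid);
    if (it == objects_.end()) {
      return -ENOENT;
    }
    *out = it->second;
    return 0;
  }

private:
  std::map<std::string, std::string> objects_;
};

// Hypothetical wrapper: forwards every call to the wrapped store unless a
// read error has been injected for the requested object.
class ErrorInjectingStore {
public:
  explicit ErrorInjectingStore(SimpleStore& inner) : inner_(inner) {}

  // Make subsequent reads of `oid` fail until the error is cleared.
  void inject_read_error(const std::string& oid) { failing_.insert(oid); }
  void clear_read_error(const std::string& oid) { failing_.erase(oid); }

  void write(const std::string& oid, const std::string& data) {
    inner_.write(oid, data);
  }

  // Assumption: injected failures surface as -EIO.
  int read(const std::string& oid, std::string* out) const {
    if (failing_.count(oid) > 0) {
      return -EIO;
    }
    return inner_.read(oid, out);
  }

private:
  SimpleStore& inner_;
  std::set<std::string> failing_;
};
```

A test would wrap the store, write an object, inject an error for it, and
then check that the code under test reacts to the failed read (for example
by reading from another shard). Keeping the injection state in the wrapper
means the wrapped store itself needs no test-only hooks.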

Mock Object Enhancements:
- Update MockPGBackendListener with recovery callback tracking
- Add on_local_recover, on_peer_recover, on_global_recover tracking
- Implement proper stats publishing (pg_stats_publish)
- Add is_missing_object() implementation
- Enhance should_send_op() with async_recovery_target logic
- Add apply_stats() to update PeeringState statistics
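
The recovery callback tracking can be pictured with the sketch below. Only
the callback names on_local_recover, on_peer_recover and on_global_recover
come from this commit message; the RecoveryTracker type, the simplified
signatures (string object ids, integer peer ids) and the fully_recovered()
helper are assumptions for illustration and do not mirror
MockPGBackendListener.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical tracker that records which recovery callbacks fired.
// The real listener callbacks take Ceph types (hobject_t, pg_shard_t,
// ...); plain strings and ints are used here to keep the sketch small.
struct RecoveryTracker {
  std::set<std::string> local_recovered;
  std::map<std::string, std::set<int>> peer_recovered;  // oid -> peer ids
  std::set<std::string> global_recovered;

  void on_local_recover(const std::string& oid) {
    local_recovered.insert(oid);
  }

  void on_peer_recover(int peer, const std::string& oid) {
    peer_recovered[oid].insert(peer);
  }

  void on_global_recover(const std::string& oid) {
    global_recovered.insert(oid);
  }

  // True if the global callback fired for `oid` and every peer in
  // `expected_peers` reported recovery of it.
  bool fully_recovered(const std::string& oid,
                       const std::set<int>& expected_peers) const {
    if (global_recovered.count(oid) == 0) {
      return false;
    }
    static const std::set<int> none;
    auto it = peer_recovered.find(oid);
    const std::set<int>& seen =
        (it == peer_recovered.end()) ? none : it->second;
    // std::includes requires sorted ranges; std::set iterates in order.
    return std::includes(seen.begin(), seen.end(),
                         expected_peers.begin(), expected_peers.end());
  }
};
```

Recording the callbacks rather than asserting inside them lets a test drive
recovery to completion first and then check the whole sequence in one place.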

Test Cases Added:
- ECRecoveryTest: Verifies recovery with missing objects after OSD failure
- ECSequentialOSDFailoverTest: Tests sequential OSD failure/recovery cycles
- MultiObjectRecoveryReadCrash: Reproduces bug #75432 (multi-object reads)
- RollbackVersionMismatch: Reproduces bug #76213 (version mismatch)
- RollbackAfterMixedBlockedWrites: Reproduces bug #75211 (rollback issues)

These tests validate EC recovery mechanisms including:
- Object version tracking across shards
- Recovery callback invocation (local, peer, global)
- Handling of read errors during recovery
- Rollback behavior after blocked writes
- Multi-object recovery with partial failures

Assisted-by: IBM Bob, using Claude Sonnet
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
src/test/osd/ECPeeringTestFixture.cc
src/test/osd/ECPeeringTestFixture.h
src/test/osd/MockPGBackendListener.h
src/test/osd/MockStore.h [new file with mode: 0644]
src/test/osd/OSDMapTestHelpers.h
src/test/osd/PGBackendTestFixture.cc
src/test/osd/PGBackendTestFixture.h
src/test/osd/TestECFailoverWithPeering.cc