osd: Do not remove objects with divergent logs if only partial writes.
Fixes https://tracker.ceph.com/issues/74221
Note: An AI was used to assist generating unit tests for this commit.
The production code was written by the author.
In the scenario fixed here, there is a divergent log entry which needs to be
rolled back. The non-primary did not participate in the transaction on the
object, but its log contains an entry describing that transaction. The primary
has a different transaction and has correctly detected the divergence.
The primary correctly concludes that no recovery is needed for the object, since
only partial writes exist on the non-primary.
The non-primary observes its divergent log and incorrectly concludes that
recovery IS needed for the divergent write and prepares by removing that
object.
The consequence of this depends on the next operation:
1. A read will fail with -EIO
2. A RMW involving a read from the removed object will detect the failure
and reconstruct the necessary data.
3. A RMW that does not involve such a read, or an append, will recreate the
object, but with zeros, causing data corruption.
It is unusual for such a log entry to exist on the non-primary because
normally those are omitted from the non-primary log. The scenario that causes
this is when a partial write triggers a clone due to copy on write. We now have
a clone operation which affects ALL shards and so the log entry is sent to
all shards.
This is unusual to see in the field. We must have all of the following:
1. A clone operation (these are infrequent)
2. A partial write.
3. A peering cycle must happen before this write is complete.
The combination of 1 and 3 makes this very unusual in teuthology, and it will
be even rarer in the field.
The fix ensures we skip divergent log entries for partial writes that the shard
did not participate in.
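
As an illustration only, the skip rule above can be modelled with a toy
predicate. This is a simplified sketch, not the code in this commit: the
LogEntry struct, its written_shards field and the function name are all
hypothetical stand-ins, and an empty shard set is assumed to mean "the
transaction touched every shard".

```cpp
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for a PG log entry (not Ceph's real type).
struct LogEntry {
  std::uint64_t version;         // log version of the transaction
  std::set<int> written_shards;  // assumption: empty means "all shards written"

  // A partial write is one that names an explicit subset of shards.
  bool is_partial_write() const { return !written_shards.empty(); }

  // True if the transaction wrote to the given shard.
  bool shard_participated(int shard) const {
    return written_shards.empty() || written_shards.count(shard) > 0;
  }
};

// Decide whether a divergent entry requires rollback/recovery on `shard`.
// Entries for partial writes that never touched this shard are skipped,
// so the shard keeps its (still valid) copy of the object.
bool divergent_entry_needs_recovery(const LogEntry& entry, int shard) {
  if (entry.is_partial_write() && !entry.shard_participated(shard)) {
    return false;  // nothing was written here; do not remove the object
  }
  return true;
}
```

With an entry that wrote only shards 0 and 2, the predicate returns false for
shard 1 and true for shard 0; an entry that wrote all shards returns true for
every shard.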
ceph osd erasure-code-profile set alex k=2 m=2
ceph osd pool create mypool --pg_num=1 --pool_type=erasure alex
ceph osd pool set mypool allow_ec_overwrites true
ceph osd pool set mypool allow_ec_optimizations true
ceph osd pool set mypool min_size 2