PGLog::reset_complete_to does not handle the scenario where, for every
missing object, the most recent update is a partial write that did not
update the shard being recovered. In this scenario the oldest need is
newer than the newest log entry. Setting last_complete to the head of the
log misleads other code into concluding that recovery has completed.
The fix is to hold last_complete one entry behind the head of the log
until all missing objects have been recovered.
PGLog::recover_got already does this when an object is recovered and the
remaining objects to recover match this scenario, so this fix just makes
reset_complete_to behave the same way as recover_got.
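
The intended behaviour can be illustrated with a minimal standalone sketch.
This is not the Ceph implementation: ToyLog, the integer Version type and the
free function below are hypothetical simplifications, with 0 standing in for
an empty version and partial_writes standing in for ec_optimizations_enabled.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical simplified model: versions are plain integers, 0 means "none".
using Version = unsigned;

struct ToyLog {
  std::list<Version> entries;                // log entries, oldest first
  std::list<Version>::iterator complete_to;  // first entry not known complete
};

// Walk complete_to forward to the first entry at or after oldest_need and
// return the resulting last_complete (the version just before complete_to).
Version reset_complete_to(ToyLog& log, Version oldest_need,
                          bool partial_writes) {
  assert(!log.entries.empty());
  log.complete_to = log.entries.begin();
  if (oldest_need != 0) {
    while (*log.complete_to < oldest_need) {
      ++log.complete_to;
      if (partial_writes && log.complete_to == log.entries.end()) {
        // The oldest need is newer than the newest log entry: hold
        // complete_to one entry behind the end of the log.
        --log.complete_to;
        break;
      }
      assert(log.complete_to != log.entries.end());
    }
  }
  if (log.complete_to == log.entries.begin()) {
    return 0;  // nothing in the log is known to be complete
  }
  return *std::prev(log.complete_to);
}
```

With log entries {10, 11, 12} and an oldest need of 15, the sketch returns a
last_complete of 11 and leaves complete_to on entry 12, so last_complete stays
one entry behind the head of the log while objects are still missing.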
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
// partial writes allow a shard which did not participate in a write to
// have a missing version that is newer than the most recent log entry
if (ec_optimizations_enabled && (log.complete_to == log.log.end())) {
+ // keep complete_to one entry behind the end of the log to stop
+ // code incorrectly using it to deduce that recovery has completed
+ --log.complete_to;
break;
}
ceph_assert(log.complete_to != log.log.end());