osd: Optimized EC incorrectly rolled back writes

Fix a bug in choose_acting that occurs in this scenario:
* The current primary shard has been absent, so it has missed the
  latest few writes
* All the recent writes are partial writes that have not updated shard X
* All the recent writes have completed

The authoritative shard is chosen from the set of primary-capable shards
that have the highest last epoch started; these all have log entries
for the recent writes.

The get_log_shard is chosen from the set of shards that have the highest
last epoch started; this picks shard X because it is the furthest behind.

The primary shard's last_update is not less than the get_log_shard's
last_update, so this if statement decides that the primary already has
a good enough log:

    if ((repeat_getlog != nullptr) &&
        get_log_shard != all_info.end() &&
        (info.last_update < get_log_shard->second.last_update) &&
        pool.info.is_nonprimary_shard(get_log_shard->first.shard)) {
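
A minimal C++ sketch of the quoted condition, under a simplified model,
makes the failure concrete. ShardState, needs_more_log_buggy and the
integer versions are hypothetical stand-ins (Ceph uses pg_info_t with an
eversion_t last_update); the repeat_getlog and end-iterator checks are
assumed to pass. This is an illustration, not Ceph code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one shard's peering info.
// A version is modelled as an increasing integer.
struct ShardState {
  uint64_t last_update;  // newest log entry this shard has
  bool nonprimary;       // true if the shard cannot act as primary
};

// Same shape as the quoted (buggy) test: returns true when the primary
// decides its own log is NOT good enough and another get-log pass is needed.
bool needs_more_log_buggy(const ShardState& primary,
                          const ShardState& get_log_shard) {
  return primary.last_update < get_log_shard.last_update &&
         get_log_shard.nonprimary;
}

// Scenario from this commit with made-up versions: the primary stopped
// at 10 and missed writes 11..13; shard X (non-primary) stopped at 8.
void demo_buggy() {
  ShardState primary{10, false};
  ShardState shard_x{8, true};
  // 10 < 8 is false, so the primary wrongly keeps its own log.
  assert(!needs_more_log_buggy(primary, shard_x));
}
```

Because shard X is behind the primary, comparing against it can never
reveal the writes that only the authoritative shards hold.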

We then proceed through peering using the primary's log and the
log from shard X. Neither has details about the recent writes,
which are then incorrectly rolled back.
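
The effect can be pictured with a toy model (an assumption, not Ceph's
actual merge logic): peering only knows about entries up to the newest
last_update among the logs it fetched, so any newer entry held by the
authoritative shard looks divergent. rolled_back_entries is a
hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (hypothetical): returns the versions on the authoritative
// shard that lie beyond everything peering actually saw, i.e. the
// entries that would be treated as divergent and rolled back.
std::vector<uint64_t> rolled_back_entries(
    const std::vector<uint64_t>& auth_log,         // versions on the authoritative shard
    const std::vector<uint64_t>& consulted_heads)  // last_update of each fetched log
{
  assert(!consulted_heads.empty());
  const uint64_t known =
      *std::max_element(consulted_heads.begin(), consulted_heads.end());
  std::vector<uint64_t> out;
  for (uint64_t v : auth_log) {
    if (v > known) out.push_back(v);  // never seen by peering
  }
  return out;
}
```

With the primary's log ending at 10 and shard X's at 8, the completed
writes 11..13 on the authoritative shard would all be rolled back.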

The if statement should compare against the last_update of the
authoritative shard rather than that of the get_log_shard. The code
would then realize that it needs to get the log from the
authoritative shard first, and then make a second pass
where it gets the log from the get_log_shard.
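
Under the same simplified model, the corrected test might look like the
sketch below. It assumes the non-primary check on the get_log_shard is
kept unchanged; ShardState and needs_more_log_fixed are hypothetical
names, not the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one shard's peering info.
struct ShardState {
  uint64_t last_update;  // newest log entry this shard has
  bool nonprimary;       // true if the shard cannot act as primary
};

// Corrected shape: compare the primary against the authoritative shard,
// so missing writes held only by primary-capable shards are detected.
bool needs_more_log_fixed(const ShardState& primary,
                          const ShardState& auth_shard,
                          const ShardState& get_log_shard) {
  return primary.last_update < auth_shard.last_update &&
         get_log_shard.nonprimary;
}

void demo_fixed() {
  ShardState primary{10, false};
  ShardState auth{13, false};
  ShardState shard_x{8, true};
  // 10 < 13: fetch the authoritative log first, then do a second
  // pass for shard X.
  assert(needs_more_log_fixed(primary, auth, shard_x));
}
```

Once the primary is as current as the authoritative shard, the test is
false again and no extra pass is taken.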

Peering would then have information about the partial writes
(obtained from the authoritative shard's log) and could correctly
roll these writes forward by deducing that the get_log_shard
did not have these log entries because they were partial writes.
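
The deduction can be sketched with a toy rule (an assumption for
illustration, not Ceph's actual algorithm): if an entry the
get_log_shard lacks never targeted that shard, its absence is expected
and the entry can be rolled forward. LogEntry and can_roll_forward are
hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical log entry recording which shards a write updated.
struct LogEntry {
  uint64_t version;
  std::set<int> written_shards;  // a partial write updates only some shards
};

// Toy rule: an entry missing from the get_log_shard is explained (and can
// be rolled forward) when the write never targeted that shard.
bool can_roll_forward(const LogEntry& e, int get_log_shard_id) {
  return e.written_shards.count(get_log_shard_id) == 0;
}

void demo_roll_forward() {
  const int shard_x = 2;
  LogEntry partial{11, {0, 1, 4}};  // did not touch shard X
  LogEntry full{12, {0, 1, 2, 4}};  // did touch shard X
  assert(can_roll_forward(partial, shard_x));
  assert(!can_roll_forward(full, shard_x));
}
```

In the scenario above every recent write skipped shard X, so all of them
would be rolled forward instead of back.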

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit ac4e0926bbac4ee4d8e33110b8a434495d730770)