From: xie xingguo
Date: Sat, 13 Jun 2020 07:28:31 +0000 (+0800)
Subject: osd/PeeringState: fix history.same_interval_since of merge target again
X-Git-Tag: v14.2.11~16^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=5d9231eada9fe72506a8adffdd3df44fa2953b99;p=ceph.git

osd/PeeringState: fix history.same_interval_since of merge target again

The symptom looks much like what we saw in
https://tracker.ceph.com/issues/37654.

The root cause is that both the merge source and the merge target could
be fabricated PGs (aka placeholders), hence the merge target's
same_interval_since could remain 0 after the merge.

Fix by adjusting history.same_interval_since to the last_epoch_clean
reported by these PGs when they were found to be ready for merge.
This peer is going to be ignored/purged by the primary anyway later,
when peering is done.

Fixes: https://tracker.ceph.com/issues/45991
Signed-off-by: xie xingguo
(cherry picked from commit be5ea3a01f31b4893a823e971f452f3ccf9de001)

Conflicts:
	src/osd/PeeringState.cc
	- file does not exist in nautilus; made the change manually in PG.cc instead
---

diff --git a/src/osd/PG.cc b/src/osd/PG.cc
index d421fc51230..fade231d1c8 100644
--- a/src/osd/PG.cc
+++ b/src/osd/PG.cc
@@ -2909,6 +2909,20 @@ void PG::merge_from(map& sources, RecoveryCtx *rctx,
 	     << sources.begin()->second->info.history
 	     << dendl;
 
+  // above we have pulled down source's history and we need to check
+  // history.epoch_created again to confirm that source is not a placeholder
+  // too. (peering requires a sane history.same_interval_since value for any
+  // non-newly created pg and below here we know we are basically iterating
+  // back a series of past maps to fake a merge process, hence we need to
+  // fix history.same_interval_since first so that start_peering_interval()
+  // will not complain)
+  if (info.history.epoch_created == 0) {
+    dout(10) << __func__ << " both merge target and source are placeholders,"
+	     << " set sis to lec " << info.history.last_epoch_clean
+	     << dendl;
+    info.history.same_interval_since = info.history.last_epoch_clean;
+  }
+
   // if the past_intervals start is later than last_epoch_clean, it
   // implies the source repeered again but the target didn't, or
   // that the source became clean in a later epoch than the target.
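
The effect of the added block can be shown in isolation. The following is a minimal sketch, not the Ceph implementation: `HistorySketch` and `fix_placeholder_interval` are hypothetical names introduced here for illustration, and the struct keeps only the three history fields the patch reads or writes. It assumes, as the patch comment states, that `epoch_created == 0` after pulling down the source's history means both target and source are placeholders.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for Ceph's epoch type (assumption for this sketch).
using epoch_t = std::uint32_t;

// Hypothetical, stripped-down history record. It carries only the fields
// touched by the patch; the real history structure has many more.
struct HistorySketch {
  epoch_t epoch_created = 0;        // 0 marks a fabricated (placeholder) PG
  epoch_t last_epoch_clean = 0;     // epoch reported when ready to merge
  epoch_t same_interval_since = 0;  // must be sane before peering starts
};

// Mirrors the condition added in PG::merge_from(): if the merged history
// still has epoch_created == 0, set same_interval_since to last_epoch_clean.
// Returns true when the adjustment was applied.
inline bool fix_placeholder_interval(HistorySketch& h) {
  if (h.epoch_created != 0) {
    return false;  // a real PG: leave its interval start untouched
  }
  h.same_interval_since = h.last_epoch_clean;
  return true;
}
```

With a placeholder history whose `last_epoch_clean` is 120, the helper sets `same_interval_since` to 120; a history with a nonzero `epoch_created` is left unchanged.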