From 155c44bfe068f749dfb186df47d954f402bf903e Mon Sep 17 00:00:00 2001
From: xie xingguo
Date: Thu, 12 Mar 2020 09:41:10 +0800
Subject: [PATCH] osd/PeeringState: fix pending want_acting vs osd offline race
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

In general there are two scenarios in which we might call choose_acting
to post a pending want_acting change:

1) We are in the middle of peering, and we decide to select some peers
   other than the current acting set in order to continue serving client
   reads and writes. In this case, when any OSD from the pending
   want_acting set goes down, the primary will restart the peering
   process and tidy want_acting up properly (see
   PeeringState::Primary::exit()).

2) The PG is active, and we want to transition all successfully
   backfilled (or async-recovered) peers back into the up set. In this
   case, every want_acting member is deemed to come from either the
   current up set or the acting set (since we pass
   restrict_to_up_acting == true when calling down into choose_acting).

From 1, we know we would never leak a want_acting set that might contain
stray peers into the Active state. From 2, we know the assert would
effectively catch any potential bad Active choose_acting caller that
fails to set restrict_to_up_acting properly.

However, in 023524a I introduced a third scenario that can violate
rule 2: we now call choose_acting with the restrict_to_up_acting option
off whenever a stray peer comes back to life while the PG is active. If
that peer then goes down again while the corresponding pg_temp change is
still in flight, we reliably hit the assert.

Fix by calling choose_acting again whenever Active sees a new map that
marks down a stray osd in want_acting, so we do not leave a dirty
want_acting (and pg_temp) behind.
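The check at the heart of the fix can be modeled as a small standalone
sketch. Note this is a simplified illustration, not Ceph's real API:
`OsdMap`, `PgState`, and the plain-`int` OSD ids below are hypothetical
stand-ins for the actual `OSDMap`, `PeeringState`, and `pg_shard_t`
types; only the predicate's shape mirrors the patched loop.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-ins for Ceph's osdmap / PG state (assumptions,
// not the real API).
struct OsdMap {
  std::set<int> up_osds;
  bool is_up(int osd) const { return up_osds.count(osd) != 0; }
};

struct PgState {
  std::vector<int> want_acting;  // pending acting-set change
  std::set<int> acting;          // current acting set
  std::set<int> up;              // current up set
};

// Mirrors the patched loop: instead of asserting that every down
// want_acting member is in the up or acting set, report whether a
// stray down member was found so the caller can re-run choose_acting.
bool need_acting_change(const PgState& pg, const OsdMap& map) {
  for (int osd : pg.want_acting) {
    if (!map.is_up(osd) && !pg.acting.count(osd) && !pg.up.count(osd))
      return true;  // a stray peer in want_acting went down
  }
  return false;
}
```

With want_acting containing a down OSD that is in neither the up nor
the acting set, the predicate reports that a new acting-set choice is
needed; once that OSD is up again (or is a legitimate up/acting
member), it reports nothing.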
Fixes: https://tracker.ceph.com/issues/44507
Signed-off-by: xie xingguo
---
 src/osd/PeeringState.cc | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/osd/PeeringState.cc b/src/osd/PeeringState.cc
index 685334cc09abe..4976d520a047a 100644
--- a/src/osd/PeeringState.cc
+++ b/src/osd/PeeringState.cc
@@ -5575,13 +5575,32 @@ boost::statechart::result PeeringState::Active::react(const AdvMap& advmap)
     ps->share_pg_info();
   }
 
+  bool need_acting_change = false;
   for (size_t i = 0; i < ps->want_acting.size(); i++) {
     int osd = ps->want_acting[i];
     if (!advmap.osdmap->is_up(osd)) {
       pg_shard_t osd_with_shard(osd, shard_id_t(i));
-      ceph_assert(ps->is_acting(osd_with_shard) || ps->is_up(osd_with_shard));
+      if (!ps->is_acting(osd_with_shard) && !ps->is_up(osd_with_shard)) {
+        psdout(10) << "Active stray osd." << osd << " in want_acting is down"
+                   << dendl;
+        need_acting_change = true;
+      }
     }
   }
+  if (need_acting_change) {
+    psdout(10) << "Active need acting change, call choose_acting again"
+               << dendl;
+    // possibly because we re-add some strays into the acting set and
+    // some of them then go down in a subsequent map before we could see
+    // the map changing the pg temp.
+    // call choose_acting again to clear them out.
+    // note that we leave restrict_to_up_acting to false in order to
+    // not overkill any chosen stray that is still alive.
+    pg_shard_t auth_log_shard;
+    bool history_les_bound = false;
+    ps->remove_down_peer_info(advmap.osdmap);
+    ps->choose_acting(auth_log_shard, false, &history_les_bound, true);
+  }
 
   /* Check for changes in pool size (if the acting set changed as a result,
    * this does not matter) */
-- 
2.39.5
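The recovery path the hunk adds (drop the down strays, then re-select)
can likewise be sketched in miniature. This is an illustrative
simplification under the same assumptions as above: plain-`int` OSD
ids stand in for `pg_shard_t`, and `rechoose_without_down` is a
hypothetical name, not a Ceph function; the real fix delegates the
re-selection to `choose_acting` with `restrict_to_up_acting` left off.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical miniature of the fix's recovery path: rather than
// asserting, filter the down peers out of the pending set so a clean
// want_acting (and pg_temp) can be re-posted.
std::vector<int> rechoose_without_down(const std::vector<int>& want_acting,
                                       const std::set<int>& up_osds) {
  std::vector<int> out;
  std::copy_if(want_acting.begin(), want_acting.end(),
               std::back_inserter(out),
               [&](int osd) { return up_osds.count(osd) != 0; });
  return out;
}
```

The design point the patch's comment makes carries over: the filter
removes only peers that the new map actually marks down, so a chosen
stray that is still alive survives the re-selection.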