From 155c44bfe068f749dfb186df47d954f402bf903e Mon Sep 17 00:00:00 2001
From: xie xingguo
Date: Thu, 12 Mar 2020 09:41:10 +0800
Subject: [PATCH] osd/PeeringState: fix pending want_acting vs osd offline race
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

In general there are two scenarios in which we might call choose_acting
to post a pending want_acting change:

1) We are in the middle of peering, and we decide to select some peers
   other than the current acting set in order to continue serving client
   reads and writes. In this case, when any OSD from the pending
   want_acting set goes down, the primary will restart the peering
   process and tidy want_acting up properly (see
   PeeringState::Primary::exit()).

2) The PG is active, and we want to transition all successfully
   backfilled (or async-recovered) peers back into the up set. In this
   case, every want_acting member is deemed to come from either the
   current up set or the acting set (since we pass
   restrict_to_up_acting == true when calling down into choose_acting).

From 1, we know we would never leak a want_acting set that might contain
stray peers into the Active state. From 2, we know the assert would
effectively catch any potential bad Active choose_acting caller that
fails to set restrict_to_up_acting properly.

However, in 023524a I introduced a third scenario that can violate
rule 2: we now call choose_acting with the restrict_to_up_acting option
off whenever a stray peer comes back to life while the PG is active. If
that peer then goes down again while the corresponding pg_temp change is
still in flight, we reliably hit the assert.

Fix by calling choose_acting again whenever Active sees a new map that
marks down a stray osd in want_acting, so we do not leave a dirty
want_acting (and pg_temp) behind.
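The check at the heart of the fix can be modeled as a small standalone
sketch. Note this is a simplified illustration, not Ceph's real API:
`OsdMap`, `PgState`, and the plain-`int` OSD ids below are hypothetical
stand-ins for the actual `OSDMap`, `PeeringState`, and `pg_shard_t`
types; only the predicate's shape mirrors the patched loop.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-ins for Ceph's osdmap / PG state (assumptions,
// not the real API).
struct OsdMap {
  std::set<int> up_osds;
  bool is_up(int osd) const { return up_osds.count(osd) != 0; }
};

struct PgState {
  std::vector<int> want_acting;  // pending acting-set change
  std::set<int> acting;          // current acting set
  std::set<int> up;              // current up set
};

// Mirrors the patched loop: instead of asserting that every down
// want_acting member is in the up or acting set, report whether a
// stray down member was found so the caller can re-run choose_acting.
bool need_acting_change(const PgState& pg, const OsdMap& map) {
  for (int osd : pg.want_acting) {
    if (!map.is_up(osd) && !pg.acting.count(osd) && !pg.up.count(osd))
      return true;  // a stray peer in want_acting went down
  }
  return false;
}
```

With want_acting containing a down OSD that is in neither the up nor
the acting set, the predicate reports that a new acting-set choice is
needed; once that OSD is up again (or is a legitimate up/acting
member), it reports nothing.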
Fixes: https://tracker.ceph.com/issues/44507
Signed-off-by: xie xingguo
---
 src/osd/PeeringState.cc | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/osd/PeeringState.cc b/src/osd/PeeringState.cc
index 685334cc09abe..4976d520a047a 100644
--- a/src/osd/PeeringState.cc
+++ b/src/osd/PeeringState.cc
@@ -5575,13 +5575,32 @@ boost::statechart::result PeeringState::Active::react(const AdvMap& advmap)
     ps->share_pg_info();
   }
 
+  bool need_acting_change = false;
   for (size_t i = 0; i < ps->want_acting.size(); i++) {
     int osd = ps->want_acting[i];
     if (!advmap.osdmap->is_up(osd)) {
       pg_shard_t osd_with_shard(osd, shard_id_t(i));
-      ceph_assert(ps->is_acting(osd_with_shard) || ps->is_up(osd_with_shard));
+      if (!ps->is_acting(osd_with_shard) && !ps->is_up(osd_with_shard)) {
+        psdout(10) << "Active stray osd." << osd << " in want_acting is down"
+                   << dendl;
+        need_acting_change = true;
+      }
     }
   }
+  if (need_acting_change) {
+    psdout(10) << "Active need acting change, call choose_acting again"
+               << dendl;
+    // possibly because we re-add some strays into the acting set and
+    // some of them then go down in a subsequent map before we could see
+    // the map changing the pg temp.
+    // call choose_acting again to clear them out.
+    // note that we leave restrict_to_up_acting to false in order to
+    // not overkill any chosen stray that is still alive.
+    pg_shard_t auth_log_shard;
+    bool history_les_bound = false;
+    ps->remove_down_peer_info(advmap.osdmap);
+    ps->choose_acting(auth_log_shard, false, &history_les_bound, true);
+  }
 
   /* Check for changes in pool size (if the acting set changed as a result,
    * this does not matter) */
-- 
2.39.5
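The recovery path the hunk adds (drop the down strays, then re-select)
can likewise be sketched in miniature. This is an illustrative
simplification under the same assumptions as above: plain-`int` OSD
ids stand in for `pg_shard_t`, and `rechoose_without_down` is a
hypothetical name, not a Ceph function; the real fix delegates the
re-selection to `choose_acting` with `restrict_to_up_acting` left off.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical miniature of the fix's recovery path: rather than
// asserting, filter the down peers out of the pending set so a clean
// want_acting (and pg_temp) can be re-posted.
std::vector<int> rechoose_without_down(const std::vector<int>& want_acting,
                                       const std::set<int>& up_osds) {
  std::vector<int> out;
  std::copy_if(want_acting.begin(), want_acting.end(),
               std::back_inserter(out),
               [&](int osd) { return up_osds.count(osd) != 0; });
  return out;
}
```

The design point the patch's comment makes carries over: the filter
removes only peers that the new map actually marks down, so a chosen
stray that is still alive survives the re-selection.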