osd/PeeringState: transit async_recovery_targets back into acting before backfilling 32202/head
author xie xingguo <xie.xingguo@zte.com.cn>
Thu, 12 Dec 2019 06:01:45 +0000 (14:01 +0800)
committer xie xingguo <xie.xingguo@zte.com.cn>
Sat, 14 Dec 2019 00:27:11 +0000 (08:27 +0800)
When an OSD that is part of the current up set gets chosen as an
async_recovery_target, it is removed from the acting set.
Since we don't allow any want that is larger than the pool size,
a PG must transition into UNDERSIZED when asynchronous recovery
eventually happens.
However, if that PG has one or more backfill targets, it might
stay UNDERSIZED for a long time, during which the mon will keep issuing
"PG_AVAILABILITY" warnings until all backfill targets finally complete.

Fix by calling choose_acting to get any async_recovery_targets back
into acting before we continue to backfill.

Fixes: https://tracker.ceph.com/issues/43311
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
src/osd/PeeringState.cc

index 99534bcc62018a42d468e4c5106b6f88b60fcd4a..10a8d72561b111d4245845d4566f6739e1a19d66 100644
@@ -5347,7 +5347,14 @@ PeeringState::Recovering::react(const RequestBackfill &evt)
   ps->state_clear(PG_STATE_FORCED_RECOVERY);
   pl->cancel_local_background_io_reservation();
   pl->publish_stats_to_osd();
-  // XXX: Is this needed?
+  // transit any async_recovery_targets back into acting
+  // so pg won't have to stay undersized for long
+  // as backfill might take a long time to complete..
+  if (!ps->async_recovery_targets.empty()) {
+    pg_shard_t auth_log_shard;
+    bool history_les_bound = false;
+    ps->choose_acting(auth_log_shard, true, &history_les_bound);
+  }
   return transit<WaitLocalBackfillReserved>();
 }
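
The effect of the change can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not the Ceph implementation: `ToyPG`, `readmit_async_targets` and `on_request_backfill` are hypothetical names, OSDs are plain integers, and the real `choose_acting` recomputes the whole want set rather than appending to acting. The sketch only shows why re-admitting async recovery targets before backfill clears the UNDERSIZED condition.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model of a PG (hypothetical, not the Ceph types).
struct ToyPG {
  std::size_t pool_size;                 // replica count of the pool
  std::vector<int> acting;               // OSDs currently serving the PG
  std::set<int> async_recovery_targets;  // OSDs taken out of acting for async recovery
  std::set<int> backfill_targets;        // OSDs that still need a full backfill

  // A PG is undersized when acting holds fewer OSDs than the pool size.
  bool undersized() const { return acting.size() < pool_size; }

  // Stand-in for re-running choose_acting after log-based recovery:
  // async recovery targets are put back into acting while there is room.
  void readmit_async_targets() {
    for (int osd : async_recovery_targets) {
      if (acting.size() < pool_size) {
        acting.push_back(osd);
      }
    }
    async_recovery_targets.clear();
    // Mirrors the rule that want never exceeds the pool size.
    assert(acting.size() <= pool_size);
  }
};

// Models the RequestBackfill handler: with the fix, async recovery targets
// return to acting before the (possibly long) backfill starts.
// Returns whether the PG stays undersized while backfill is running.
bool on_request_backfill(ToyPG pg, bool with_fix) {
  if (with_fix && !pg.async_recovery_targets.empty()) {
    pg.readmit_async_targets();
  }
  return pg.undersized();
}
```

With a pool size of 3, acting `{0, 3}`, OSD 1 as an async recovery target and OSD 4 as a backfill target, the model reports the PG as undersized during backfill without the fix and as fully sized with it.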