From: xie xingguo
Date: Thu, 12 Dec 2019 06:01:45 +0000 (+0800)
Subject: osd/PeeringState: transit async_recovery_targets back into acting before backfilling
X-Git-Tag: v14.2.10~162^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=32d157acbbd9e3113270468e3e4ead059b99b8a0;p=ceph.git

osd/PeeringState: transit async_recovery_targets back into acting before backfilling

When an osd that is part of current up set gets chosen as an
async_recovery_target, it gets removed from the acting set.
Since we don't allow any want that is larger than the pool size,
a pg must transit into UNDERSIZED when asynchronous recovery
eventually happens.

However, if that pg has one or more backfill targets, it might
spin UNDERSIZED for a long time during which mon will keep issuing
"PG_AVAILABILITY" warns until all backfill targets finally completes.

Fix by calling choose_acting to get any async_recovery_targets back
into acting before we continue to backfill.

Fixes: https://tracker.ceph.com/issues/43311
Signed-off-by: xie xingguo
(cherry picked from commit 48bc4786fd73b538b673e6a7eb7ced986d95005f)

Conflicts:
	src/osd/PeeringState.cc
- file does not exist in nautilus; backported the code change manually
  to src/osd/PG.cc
---

diff --git a/src/osd/PG.cc b/src/osd/PG.cc
index 780535243a1..d421fc51230 100644
--- a/src/osd/PG.cc
+++ b/src/osd/PG.cc
@@ -8354,8 +8354,15 @@ PG::RecoveryState::Recovering::react(const RequestBackfill &evt)
   pg->state_clear(PG_STATE_FORCED_RECOVERY);
   release_reservations();
   pg->osd->local_reserver.cancel_reservation(pg->info.pgid);
-  // XXX: Is this needed?
   pg->publish_stats_to_osd();
+  // transit any async_recovery_targets back into acting
+  // so pg won't have to stay undersized for long
+  // as backfill might take a long time to complete..
+  if (!pg->async_recovery_targets.empty()) {
+    pg_shard_t auth_log_shard;
+    bool history_les_bound = false;
+    pg->choose_acting(auth_log_shard, true, &history_les_bound);
+  }
   return transit();
 }