When an OSD that is part of the current up set is chosen as an
async_recovery_target, it is removed from the acting set.
Since we don't allow any want that is larger than the pool size,
the PG must transition to UNDERSIZED whenever asynchronous recovery
takes place.
However, if that PG also has one or more backfill targets, it might
stay UNDERSIZED for a long time, during which the mon keeps issuing
"PG_AVAILABILITY" warnings until all backfill targets finally complete.
Fix by calling choose_acting to get any async_recovery_targets back
into acting before we continue to backfill.
Fixes: https://tracker.ceph.com/issues/43311
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 48bc4786fd73b538b673e6a7eb7ced986d95005f)
Conflicts:
src/osd/PeeringState.cc
- file does not exist in nautilus; backported the code change manually
to src/osd/PG.cc
pg->state_clear(PG_STATE_FORCED_RECOVERY);
release_reservations();
pg->osd->local_reserver.cancel_reservation(pg->info.pgid);
- // XXX: Is this needed?
pg->publish_stats_to_osd();
+ // transit any async_recovery_targets back into acting
+ // so pg won't have to stay undersized for long
+ // as backfill might take a long time to complete..
+ if (!pg->async_recovery_targets.empty()) {
+ pg_shard_t auth_log_shard;
+ bool history_les_bound = false;
+ pg->choose_acting(auth_log_shard, true, &history_les_bound);
+ }
return transit<WaitLocalBackfillReserved>();
}
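
The reasoning in the commit message can be illustrated with a toy model. This is only a sketch under simplifying assumptions: ToyPG, its fields, and rechoose_acting_after_recovery() are hypothetical names invented for illustration and are not part of Ceph. The model treats acting as "candidate OSDs minus async recovery targets" and ignores pg_temp, peering, and backfill reservations entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Toy model only: hypothetical types, not Ceph's PG or PeeringState.
struct ToyPG {
  std::size_t pool_size;                 // desired number of replicas
  std::set<int> candidates;              // OSDs that could serve in acting
  std::set<int> async_recovery_targets;  // OSDs recovering asynchronously

  // Async recovery targets are excluded from the acting set.
  std::set<int> acting() const {
    std::set<int> result;
    for (int osd : candidates) {
      if (async_recovery_targets.count(osd) == 0) {
        result.insert(osd);
      }
    }
    return result;
  }

  // The PG is undersized while acting holds fewer OSDs than the pool size.
  bool undersized() const { return acting().size() < pool_size; }

  // Stand-in for re-running the acting-set choice once log recovery is
  // done: the former async targets are eligible for acting again.
  void rechoose_acting_after_recovery() { async_recovery_targets.clear(); }
};
```

In this model, a PG with pool size 3, candidates {0, 1, 2}, and OSD 2 as an async recovery target reports undersized() until rechoose_acting_after_recovery() is called. Without that step the flag would persist for the whole backfill, which is the situation the patch is meant to avoid.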