osd/PGLog: persist num_objects_missing for replicas when peering is done

guoracle reported that:

> In the asynchronous recovery feature, the asynchronous recovery
> target OSD is selected by last_update.version. After peering
> completes, the asynchronous recovery target OSDs update their
> last_update.version and then go down again. When those OSDs come
> back online and peer again, there is no pglog difference between the
> asynchronous recovery targets and the authoritative OSD, resulting
> in no asynchronous recovery.

https://github.com/ceph/ceph/pull/24004 aimed to solve the problem by
persisting the number of missing objects to disk when peering was
done, so that we could take both the newly approximated missing objects
(estimated according to last_update) and the historical num_objects_missing
into account when determining async_recovery_targets in any follow-up
peering cycle.
However, that approach works only if we keep an up-to-date
num_objects_missing field for each pg instance under all circumstances,
which is unfortunately not true for replicas that have completed peering
but never started recovery afterwards (7de35629f562436d2bdb85788bdf97b10db3f556
makes sure we update num_objects_missing for the primary when peering is done,
and keeps num_objects_missing up to date as each missing object
is recovered).
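
To illustrate why a stale num_objects_missing on a replica defeats the
selection described above, here is a minimal sketch. It is not the Ceph
implementation: PeerInfo, approx_missing, pick_async_targets and the
min_cost threshold are hypothetical names, and the cost formula (pglog gap
plus the persisted missing count) is an assumption based on the description
in this message.

```python
from dataclasses import dataclass


@dataclass
class PeerInfo:
    """Hypothetical, simplified view of what a peer reports during peering."""
    osd: int
    last_update_version: int
    num_objects_missing: int  # persisted count; stale if never written


def approx_missing(auth_version: int, peer: PeerInfo) -> int:
    # Assumed cost: pglog gap to the authoritative OSD plus the
    # historical (persisted) number of missing objects.
    log_gap = max(auth_version - peer.last_update_version, 0)
    return log_gap + peer.num_objects_missing


def pick_async_targets(auth_version, peers, min_cost):
    # A peer becomes an async recovery target only if its estimated
    # cost exceeds the (hypothetical) threshold.
    return [p.osd for p in peers if approx_missing(auth_version, p) > min_cost]


auth_version = 1000

# The replica caught up on last_update after peering, went down before
# recovering anything, and its missing count was never persisted.
stale = PeerInfo(osd=1, last_update_version=1000, num_objects_missing=0)

# Same replica, but with num_objects_missing persisted when peering was done.
fresh = PeerInfo(osd=1, last_update_version=1000, num_objects_missing=250)

print(pick_async_targets(auth_version, [stale], min_cost=100))  # []
print(pick_async_targets(auth_version, [fresh], min_cost=100))  # [1]
```

With the stale count the replica looks fully caught up and is skipped;
with the persisted count it is still selected for asynchronous recovery.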

Note that guoracle also suggested fixing the same problem by using
last_complete.version to calculate the pglog difference, and by updating the
last_complete of the asynchronous recovery target OSD in the copy of peer_info
to the latest once recovery is complete. That would not work well
because we might reset last_complete to 0'0 whenever we trim the pglog past the
minimal need-version of the missing set.

Fix by persisting num_objects_missing for replicas correctly when peering
is done.

Fixes: https://tracker.ceph.com/issues/41924
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

author	xie xingguo <xie.xingguo@zte.com.cn>
	Thu, 19 Sep 2019 01:48:56 +0000 (09:48 +0800)
committer	xie xingguo <xie.xingguo@zte.com.cn>
	Fri, 20 Sep 2019 00:09:28 +0000 (08:09 +0800)
commit	3b024c54e590e68e2cf35d20cbc5ee3e1122dedd
tree	f828ae55828f8070eeebd21008e371b86423f314
parent	7ea365920bc88ec53b32bf6d9319edf5dc1dac95