
author	Casey Bodley <cbodley@redhat.com>
	Tue, 13 Jul 2021 17:54:36 +0000 (13:54 -0400)
committer	Cory Snyder <csnyder@iland.com>
	Wed, 4 Aug 2021 16:38:25 +0000 (12:38 -0400)
commit	ed481ece74610a94a2cfc2280232e3daf82a0349
tree	75ac9e57a98dc916cd0efaf751b93827513cd899
parent	e3a73eb0dfafe15fd5f69c971b3a588008fec02e

rgw: metadata sync treats all errors as 'transient'

collect_children() had a special case for EAGAIN, which it treated as
a 'transient' error: it set can_adjust_marker = false to bail out
of RGWMetaSyncShardCR and retry from the previous marker

but the http client doesn't return EAGAIN - rgw_http_error_to_errno()
defaults to EIO - so this retry logic based on can_adjust_marker never
runs. on any other error, RGWMetaSyncSingleEntryCR would not call
marker_tracker->finish() to advance the sync status marker, and
RGWMetaSyncShardCR would continue on with full- or incremental sync
without ever attempting to retry the failed entries
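
to illustrate why the EAGAIN special case was dead code, here is a minimal
sketch. it is not the rgw source: toy_http_status_to_errno() is a
hypothetical stand-in whose only property taken from this message is that
unrecognized statuses fall through to EIO, and the two is_transient helpers
are invented names for the old and new checks

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for an HTTP-status-to-errno mapping. This is NOT
// the real rgw_http_error_to_errno(); the individual cases are assumptions.
// The one property taken from the commit message is the EIO default.
int toy_http_status_to_errno(int status) {
  assert(status >= 100 && status <= 599);  // precondition: a valid HTTP status
  if (status >= 200 && status <= 299) {
    return 0;
  }
  switch (status) {
    case 403: return -EACCES;  // assumed mapping, for illustration only
    case 404: return -ENOENT;  // assumed mapping, for illustration only
    default:  return -EIO;     // nothing here ever produces -EAGAIN
  }
}

// Old-style check: only -EAGAIN counts as transient.
bool old_is_transient(int ret) { return ret == -EAGAIN; }

// Check in the spirit of this commit: every error is retried.
bool new_is_transient(int ret) { return ret < 0; }
```

with a mapping like this, old_is_transient() can never be true for an error
that came from the http client, so the retry path is unreachable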

a detailed comment in collect_children() describes a different strategy
for handling 'permanent' errors, but that was never fully elaborated.
i also don't think there's a reasonable way to differentiate between
transient and permanent errors, so this treats all errors as transient
to be retried

if an error really is permanent for a given metadata key, metadata sync
will get stuck there and require manual intervention
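
the resulting behavior can be sketched with a toy model. all names below
(ToyShardResult, toy_sync_shard, sync_entry, max_attempts) are hypothetical
and only loosely mirror the roles of RGWMetaSyncShardCR and the marker
tracker; the real coroutines have no attempt cap, which is added here only
so the example terminates

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>

// Toy model of a shard sync loop; not the rgw implementation.
struct ToyShardResult {
  std::size_t marker;  // entries fully synced (stands in for the sync status marker)
  int attempts;        // total calls made to sync_entry
};

// sync_entry(i) returns 0 on success or a negative errno on failure.
// Every failure is treated as transient: the marker is not advanced and the
// loop retries from the previous marker, up to max_attempts calls in total.
ToyShardResult toy_sync_shard(std::size_t num_entries,
                              const std::function<int(std::size_t)>& sync_entry,
                              int max_attempts) {
  assert(max_attempts >= 0);
  ToyShardResult r{0, 0};
  while (r.marker < num_entries && r.attempts < max_attempts) {
    ++r.attempts;
    const int ret = sync_entry(r.marker);
    if (ret < 0) {
      continue;  // any error: retry the same entry, marker unchanged
    }
    ++r.marker;  // success: safe to advance the marker
  }
  return r;
}
```

an entry that fails a few times and then succeeds only delays the marker. an
entry that always fails pins the marker in front of it, which is the "stuck,
needs manual intervention" case described above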

Fixes: https://tracker.ceph.com/issues/39657
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 866d66b8749b28ec626a8d0adba3d14fdd8abead)