rgw: metadata sync treats all errors as 'transient'
collect_children() had a special case for EAGAIN that it treated as
a 'transient' error, which set the can_adjust_marker = false to bail out
of RGWMetaSyncShardCR and retry from the previous marker
but the http client doesn't return EAGAIN - rgw_http_error_to_errno()
defaults to EIO - so this retry logic based on can_adjust_marker never
runs. on any other error, RGWMetaSyncSingleEntryCR would not call
marker_tracker->finish() to advance the sync status marker, and
RGWMetaSyncShardCR would continue on with full- or incremental sync
without ever attempting to retry the failed entries
a detailed comment in collect_children() describes a different strategy
for handling 'permanent' errors, but that was never fully elaborated.
i also don't think there's a reasonable way to differentiate between
transient and permanent errors, so this treats all errors as transient
to be retried
if an error really is permanent for a given metadata key, metadata sync
will get stuck there and require manual intervention
Fixes: https://tracker.ceph.com/issues/39657
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit
866d66b8749b28ec626a8d0adba3d14fdd8abead)