rgw/rest: track connection failures per-IP instead of per-endpoint
author     Oguzhan Ozmen <oozmen@bloomberg.net>
           Tue, 3 Mar 2026 00:45:59 +0000 (00:45 +0000)
committer  Oguzhan Ozmen <oozmen@bloomberg.net>
           Tue, 2 Jun 2026 22:16:20 +0000 (22:16 +0000)
commit     2d7bc7818c5976702cbddcb228f6ac13dcee8697
tree       89e276aa81f99dcd0ba9bc48585577121e35914b
parent     afab77cae3a87bb7bfde1e08a3be4f562d189638

Previously, when a connection to a zone endpoint failed, the entire
endpoint was marked as unavailable for a timeout period. Since we now
resolve endpoints to all their IP addresses (via DNS A/AAAA records),
we can be more granular: track failures at the individual IP level.

Introduce a ResolvedIP struct that pairs each IP's connect_to string
with its own failure timestamp. When selecting an IP for a request,
the round-robin selection skips IPs that have failed recently, so
traffic keeps flowing to healthy nodes even when some are down.
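A minimal C++ sketch of the per-IP bookkeeping and round-robin skip described above. `ResolvedIP` and `connect_to` are the names used in the commit message; the `last_failure` field, the `IPSelector` class, its methods, and the timeout parameter are hypothetical names for illustration, not the actual code in `src/rgw/rgw_rest_conn.cc`. The sketch has no locking and assumes single-threaded use.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for ResolvedIP: one resolved address plus the time
// it last failed. A default-constructed time_point means "never failed".
struct ResolvedIP {
  std::string connect_to;  // e.g. "10.0.0.1:8080"
  Clock::time_point last_failure{};
};

// Hypothetical selector (not the rgw implementation). No locking.
class IPSelector {
 public:
  IPSelector(std::vector<ResolvedIP> ips, Clock::duration failure_timeout)
      : ips_(std::move(ips)), timeout_(failure_timeout) {
    assert(timeout_ > Clock::duration::zero());
  }

  // Round-robin: starting at the cursor, return the first IP that has not
  // failed within the timeout. Returns nullopt if every IP is in backoff.
  std::optional<std::string> next(Clock::time_point now) {
    const std::size_t n = ips_.size();
    for (std::size_t i = 0; i < n; ++i) {
      const ResolvedIP& ip = ips_[cursor_++ % n];
      if (!recently_failed(ip, now)) {
        return ip.connect_to;
      }
    }
    return std::nullopt;
  }

  // Record a failure against one IP only; its sibling IPs stay eligible.
  void mark_failed(const std::string& connect_to, Clock::time_point now) {
    for (ResolvedIP& ip : ips_) {
      if (ip.connect_to == connect_to) {
        ip.last_failure = now;
      }
    }
  }

 private:
  bool recently_failed(const ResolvedIP& ip, Clock::time_point now) const {
    return ip.last_failure != Clock::time_point{} &&
           now - ip.last_failure < timeout_;
  }

  std::vector<ResolvedIP> ips_;
  Clock::duration timeout_;
  std::size_t cursor_ = 0;
};
```

Marking one address as failed only removes that address from rotation until its timeout expires; the other addresses behind the same endpoint keep receiving requests.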

An endpoint-level last_failure_time is maintained as a fast-path
optimization to avoid scanning all IPs when none have failed recently.
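The endpoint-level fast path can be sketched in the same style. `last_failure_time` is the name from the commit message; `Endpoint`, `next()`, `slow_scans`, and the other members are hypothetical. The shortcut relies on one invariant: the endpoint stamp is at least as new as every per-IP stamp, so if it falls outside the timeout, no IP can be in backoff and the scan can be skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical per-IP record, as in the earlier sketch.
struct ResolvedIP {
  std::string connect_to;
  Clock::time_point last_failure{};  // default value means "never failed"
};

// Hypothetical endpoint holding all resolved IPs (not the rgw type).
struct Endpoint {
  std::vector<ResolvedIP> ips;
  Clock::time_point last_failure_time{};  // newest failure across all IPs
  std::size_t cursor = 0;
  std::size_t slow_scans = 0;  // counts slow-path scans, for illustration

  static bool recent(Clock::time_point t, Clock::time_point now,
                     Clock::duration timeout) {
    return t != Clock::time_point{} && now - t < timeout;
  }

  void mark_failed(std::size_t idx, Clock::time_point now) {
    assert(idx < ips.size());
    ips[idx].last_failure = now;
    // Keep the endpoint stamp >= every per-IP stamp.
    last_failure_time = std::max(last_failure_time, now);
  }

  // Index of the next usable IP, or -1 if all are in backoff.
  int next(Clock::time_point now, Clock::duration timeout) {
    const std::size_t n = ips.size();
    if (n == 0) return -1;
    // Fast path: no IP failed within the timeout, so skip the scan.
    if (!recent(last_failure_time, now, timeout)) {
      return static_cast<int>(cursor++ % n);
    }
    ++slow_scans;
    for (std::size_t i = 0; i < n; ++i) {
      const std::size_t idx = cursor++ % n;
      if (!recent(ips[idx].last_failure, now, timeout)) {
        return static_cast<int>(idx);
      }
    }
    return -1;
  }
};
```

Taking `std::max` in `mark_failed` keeps the invariant intact even if failures are reported with slightly out-of-order timestamps.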

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
src/rgw/rgw_rest_conn.cc
src/rgw/rgw_rest_conn.h