From e2f815cd17683a108cd093200a8ac52bf92f3ce5 Mon Sep 17 00:00:00 2001
From: Samuel Just
Date: Fri, 2 Apr 2021 22:30:54 +0000
Subject: [PATCH] osd/PeeringState: fix acting_set_writeable min_size check

acting.size() >= pool.info.min_size is meant to check min_size against
acting set participants, but acting is a vector with placeholders.
actingset is the representation with placeholders removed.

The upshot of this bug is that the activation process will basically
ignore min_size for an ec pool, allowing writes in cases where it
shouldn't. PastIntervals::check_new_interval, however, performs
the check correctly, and will therefore discount intervals in which we
really did serve writes as not writeable. This can trigger many different
problem conditions including but not limited to:
- Unfound objects due to accepting a last_update with insufficient osds
- Lost writes
- Crashes due to peering rules being violated

This bug was originally introduced with recovery below min_size in
e5a96fd, and then preserved through refactors in 749a13d and 95bec9.
7cb818a exposed it with expansion of recovery below min_size to
include ec pools (acting.size() is sufficient for replicated pools).

Fixes: https://tracker.ceph.com/issues/48613
Fixes: https://tracker.ceph.com/issues/48417
Signed-off-by: Samuel Just
(cherry picked from commit 642a1c165499bcbd4cfdf907af313ac7ffe44ff4)
---
 src/osd/PeeringState.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/osd/PeeringState.h b/src/osd/PeeringState.h
index 07b165ca6c567..2a000fb965a86 100644
--- a/src/osd/PeeringState.h
+++ b/src/osd/PeeringState.h
@@ -2313,7 +2313,7 @@ public:
    * applicable stretch cluster constraints.
    */
   bool acting_set_writeable() {
-    return (acting.size() >= pool.info.min_size) &&
+    return (actingset.size() >= pool.info.min_size) &&
       (pool.info.stretch_set_can_peer(acting, *get_osdmap(), NULL));
   }
 
-- 
2.39.5