From fdfc5c64e8ada580792e4c02cb3ba44e0b265c17 Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Fri, 6 Apr 2018 10:26:10 -0500
Subject: [PATCH] mon/OSDMonitor: start sending new-style pg_create2 messages

The new sharded wq implementation cannot handle a resent mon create
message and a split child already existing. This is a side effect of
the new pg create path instantiating the PG at the pool create epoch
osdmap and letting it roll forward through splits; the mon may be
resending a create for a pg that was already created elsewhere and
split elsewhere, such that one of those split children has peered back
onto this same OSD. When we roll forward our re-created empty parent it
may split and find the child already exists, crashing.

This is no longer a concern because the mgr-based controller for pg_num
will not split PGs until after the initial PGs are all created. (We
know this because the pool has the CREATED flag set.)

The old-style path had its own problem,
http://tracker.ceph.com/issues/22165. We would build the history and
instantiate the pg in the latest osdmap epoch, ignoring any split
children that should have been created between the pool create epoch
and the current epoch. Since we're now taking the new path, that is no
longer a problem.

Fixes: http://tracker.ceph.com/issues/22165
Signed-off-by: Sage Weil
---
 src/mon/OSDMonitor.cc | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
index 0ba4dc8e42e23..d30b5078987a3 100644
--- a/src/mon/OSDMonitor.cc
+++ b/src/mon/OSDMonitor.cc
@@ -4178,13 +4178,7 @@ epoch_t OSDMonitor::send_pg_creates(int osd, Connection *con, epoch_t next) cons
   MOSDPGCreate *oldm = nullptr; // for pre-mimic OSD compat
   MOSDPGCreate2 *m = nullptr;
 
-  // for now, keep sending legacy creates. Until we sort out how to address
-  // racing mon create resends and splits, we are better off with the less
-  // drastic impacts of http://tracker.ceph.com/issues/22165. The legacy
-  // create message handling path in the OSD still does the old thing where
-  // the pg history is pregenerated and it's instantiated at the latest osdmap
-  // epoch; child pgs are simply not created.
-  bool old = true; // !HAVE_FEATURE(con->get_features(), SERVER_NAUTILUS);
+  bool old = osdmap.require_osd_release < CEPH_RELEASE_NAUTILUS;
 
   epoch_t last = 0;
   for (auto epoch_pgs = creating_pgs_by_epoch->second.lower_bound(next);
-- 
2.39.5