The new sharded wq implementation cannot handle a resent mon create
message when a split child already exists. This is a side effect of the
new pg create path instantiating the PG at the pool create epoch osdmap
and letting it roll forward through splits; the mon may be resending a
create for a pg that was already created elsewhere and split elsewhere,
such that one of those split children has peered back onto this same OSD.
When we roll forward our re-created empty parent it may split and find the
child already exists, crashing.
This is no longer a concern because the mgr-based controller for pg_num
will not split PGs until after the initial PGs are all created. (We
know this because the pool has the CREATED flag set.)
The old-style path had its own problem,
http://tracker.ceph.com/issues/22165. We would build the history and
instantiate the pg in the latest osdmap epoch, ignoring any split children
that should have been created between the pool create epoch and the
current epoch. Since we're now taking the new path, that is no longer
a problem.
Fixes: http://tracker.ceph.com/issues/22165
Signed-off-by: Sage Weil <sage@redhat.com>
MOSDPGCreate *oldm = nullptr; // for pre-mimic OSD compat
MOSDPGCreate2 *m = nullptr;
- // for now, keep sending legacy creates. Until we sort out how to address
- // racing mon create resends and splits, we are better off with the less
- // drastic impacts of http://tracker.ceph.com/issues/22165. The legacy
- // create message handling path in the OSD still does the old thing where
- // the pg history is pregenerated and it's instantiated at the latest osdmap
- // epoch; child pgs are simply not created.
- bool old = true; // !HAVE_FEATURE(con->get_features(), SERVER_NAUTILUS);
+ bool old = osdmap.require_osd_release < CEPH_RELEASE_NAUTILUS;
epoch_t last = 0;
for (auto epoch_pgs = creating_pgs_by_epoch->second.lower_bound(next);