src/osd/scrubber: process MapsCompared event syncronously
Summary:
WaitReplica submits two asyncronous events through the OSD scheduler
concurrently: DigestUpdate and MapsCompared. This causes the error
described in https://tracker.ceph.com/issues/59049 because MapsCompared
must be processed first, but they may be scheduled with differing
priorities depending on whether there are writes blocked. This patch
avoids the problem by simply processing MapsCompared syncronously.
Details:
Within the WaitReplicas state handler for GotReplicas, we invoke
PGScrubber::maps_compare_n_cleanup:
sc::result WaitReplicas::react(const GotReplicas&)
{
...
// maps_compare_n_cleanup() will arrange for MapsCompared event to be
// sent:
scrbr->maps_compare_n_cleanup();
return discard_event();
void PgScrubber::maps_compare_n_cleanup()
{
...
auto required_fixes =
m_be->scrub_compare_maps(
m_end.is_max(),
m_listener->sl_get_snap_map_reader());
...
m_osds->queue_scrub_maps_compared(m_pg, Scrub::scrub_prio_t::low_priority);
PgScrubber::maps_compare_n_cleanup arranges for the MapsCompared event
to be submitted through the OSD schdeduler to be delivered to the scrub
state machine later. It also invokes scrub_compare_maps
objs_fix_list_t ScrubBackend::scrub_compare_maps(
bool max_reached,
SnapMapReaderI& snaps_getter)
{
...
scrub_snapshot_metadata(for_meta_scrub);
which in turn invokes scrub_snapshot_metadata
void ScrubBackend::scrub_snapshot_metadata(ScrubMap& map)
{
...
m_scrubber.submit_digest_fixes(this_chunk->missing_digest);
which in turn invokes submit_digest_fixes
void PrimaryLogScrub::submit_digest_fixes(const digests_fixes_t& fixes)
{
...
for (auto& [obj, dgs] : fixes) {
...
PrimaryLogPG::OpContextUPtr ctx = m_pl_pg->simple_opc_create(obc);
...
m_pl_pg->finish_ctx(ctx.get(), pg_log_entry_t::MODIFY);
...
ctx->register_on_success([this]() {
if ((num_digest_updates_pending >= 1) &&
(--num_digest_updates_pending == 0)) {
m_osds->queue_scrub_digest_update(m_pl_pg,
m_pl_pg->is_scrub_blocking_ops());
}
});
m_pl_pg->simple_opc_submit(std::move(ctx));
}
}
which submits a sequence of repops updating objects' digests and
schduling the DigestUpdate event once they have completed.
These two events, DigestUpdate and MapsCompared, are therefore in flight
concurrently. However, only one event ordering is actually tolerated --
we must process MapsCompared first in order to be in state
WaitDigestUpdate when the DigestUpdate event arrives. This would be
fine except that the priority at which we submit the DigestUpdate event
may end up with a higher priority if there is currently an op blocked.
There are two ways we could solve this. We could ensure that the two
are always queued at the same priority and therefore will be processed
in order. However, there's actually no reason to process the
MapsCompared event asyncronously -- it would be far less brittle to
simply process the MapsCompared syncronously and avoid this problem
entirely.
Fixes: https://tracker.ceph.com/issues/59049
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit
604860e54098b20ea65785dbcf7553dd50b22532)