osd/PeeringState: clear LAGGY and WAIT states on exiting Started (31864/head)
author Sage Weil <sage@redhat.com>
Mon, 25 Nov 2019 19:15:24 +0000 (13:15 -0600)
committer Sage Weil <sage@redhat.com>
Mon, 25 Nov 2019 19:15:24 +0000 (13:15 -0600)
These flags were not getting cleared except in recheck_readable(), which
meant that a flag from a prior interval could bleed into a new interval.
More dangerously, in a mixed-version cluster, one interval might include
all octopus+ OSDs while the next might include a pre-octopus OSD, bypassing
most of the laggy recheck code.  This could lead to a stalled request
and/or requeue ordering bug when release_object_locks() looked at
is_laggy() and put a lock waiter on the waiting_for_readable list.
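
The flag bleed described above can be illustrated with a minimal, standalone
sketch. This is not the Ceph implementation: ToyPeeringState, exit_started(),
laggy_survives_interval_change() and the bit values are hypothetical names
and numbers made up for the example. Only the identifiers state_clear,
is_laggy, PG_STATE_WAIT and PG_STATE_LAGGY come from the commit itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values, chosen only for this sketch; the real
// PG_STATE_* constants are defined in the Ceph source tree.
constexpr uint64_t PG_STATE_WAIT  = 1ull << 0;
constexpr uint64_t PG_STATE_LAGGY = 1ull << 1;

// Toy stand-in for the state bitmask kept by PeeringState.
struct ToyPeeringState {
  uint64_t state = 0;

  void state_set(uint64_t m)   { state |= m; }
  void state_clear(uint64_t m) { state &= ~m; }
  bool is_laggy() const { return (state & PG_STATE_LAGGY) != 0; }

  // Models leaving Started at an interval change.
  // clear_on_exit == false mimics the behavior before this patch,
  // clear_on_exit == true mimics the behavior after it.
  void exit_started(bool clear_on_exit) {
    if (clear_on_exit)
      state_clear(PG_STATE_WAIT | PG_STATE_LAGGY);
  }
};

// Returns whether LAGGY is still set once the old interval has ended.
inline bool laggy_survives_interval_change(bool clear_on_exit) {
  ToyPeeringState ps;
  ps.state_set(PG_STATE_LAGGY);    // PG became laggy in the old interval
  ps.exit_started(clear_on_exit);  // the interval ends
  return ps.is_laggy();            // what the next interval would see
}
```

In this toy model, laggy_survives_interval_change(false) reports a stale
LAGGY flag in the next interval, while laggy_survives_interval_change(true)
does not. Clearing in the exit handler means the result no longer depends on
whether a later recheck path happens to run.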

Fixes: https://tracker.ceph.com/issues/42978
Signed-off-by: Sage Weil <sage@redhat.com>
diff --git a/src/osd/PeeringState.cc b/src/osd/PeeringState.cc
index a9b5b4fde41d974254a3a60cb611c03c31ee543f..a326dc28a4e3471cf66083c80e3c704030961fb4 100644
--- a/src/osd/PeeringState.cc
+++ b/src/osd/PeeringState.cc
@@ -4301,6 +4301,7 @@ void PeeringState::Started::exit()
   DECLARE_LOCALS;
   utime_t dur = ceph_clock_now() - enter_time;
   pl->get_peering_perf().tinc(rs_started_latency, dur);
+  ps->state_clear(PG_STATE_WAIT | PG_STATE_LAGGY);
 }
 
 /*--------Reset---------*/