The PG_STATE_WAIT and PG_STATE_LAGGY flags were not getting cleared except
in recheck_readable(), which meant that a flag from a prior interval could
bleed into a new interval.
More dangerously, in a mixed-version cluster, one interval might include
all octopus+ OSDs while the next might include a pre-octopus OSD, bypassing
most of the laggy recheck code. This could lead to a stalled request
and/or a requeue ordering bug when release_object_locks() checked
is_laggy() and put a lock waiter on the waiting_for_readable list.

Fixes: https://tracker.ceph.com/issues/42978
Signed-off-by: Sage Weil <sage@redhat.com>
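
To illustrate the failure mode described above, here is a minimal standalone sketch, not the Ceph implementation. ToyPG, exit_started(), and laggy_leaks_into_next_interval() are hypothetical names, and the bit values are placeholders; only the state_set/state_clear/state_test naming and the two flag names mirror the change below. It shows that a flag set in one interval remains visible in the next unless it is cleared when the interval ends.

```cpp
#include <cstdint>

// Placeholder bit values for illustration; the real Ceph constants differ.
constexpr std::uint64_t PG_STATE_WAIT  = 1ull << 0;
constexpr std::uint64_t PG_STATE_LAGGY = 1ull << 1;

// Hypothetical, stripped-down stand-in for the PG state bitmask.
struct ToyPG {
  std::uint64_t state = 0;

  void state_set(std::uint64_t m)   { state |= m; }
  void state_clear(std::uint64_t m) { state &= ~m; }
  bool state_test(std::uint64_t m) const { return (state & m) != 0; }
  bool is_laggy() const { return state_test(PG_STATE_LAGGY); }
};

// Models leaving the Started state when an interval ends.
// clear_flags == false reproduces the old behavior (flags left alone);
// clear_flags == true reproduces the behavior after this change.
void exit_started(ToyPG& pg, bool clear_flags) {
  if (clear_flags) {
    pg.state_clear(PG_STATE_WAIT | PG_STATE_LAGGY);
  }
}

// Returns true if a LAGGY flag set in one interval is still set once
// the next interval begins.
bool laggy_leaks_into_next_interval(bool clear_flags) {
  ToyPG pg;
  pg.state_set(PG_STATE_LAGGY);   // interval N: PG becomes laggy
  exit_started(pg, clear_flags);  // interval changes
  return pg.is_laggy();           // interval N+1: what do callers see?
}
```

With clear_flags set to false the stale flag survives the interval change, which is the condition under which a caller checking is_laggy() could queue a waiter that nothing later wakes. With clear_flags set to true the new interval starts clean.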
DECLARE_LOCALS;
utime_t dur = ceph_clock_now() - enter_time;
pl->get_peering_perf().tinc(rs_started_latency, dur);
+ ps->state_clear(PG_STATE_WAIT | PG_STATE_LAGGY);
}
/*--------Reset---------*/