mds: ensure next replay is queued on req drop
Not all client replay requests are queued at once since [1]. We require
the next request be queued when the current one completes (unsafely) or
during cleanup.
Not all code paths seem to handle this [2] so move it to a generic
location, MDCache::request_cleanup. Even so, this doesn't handle all
errors (so we must still be careful) as sometimes we must queue the next
replay request before an MDRequest is constructed [3] during some error
conditions.
Additionally, preserve the behavior of Server::journal_and_reply
queueing the next replay op. Otherwise, we would needlessly wait for the
request to become durable before moving on to the next one.
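As a rough illustration of the intent only (a toy model, not the actual
MDS code: ToyMDS, ToyRequest, and maybe_queue_next_replay are hypothetical
names, and the real MDCache/Server logic may differ), queueing the next
replay from both the early-reply path and the generic cleanup path could
be guarded by a per-request flag so it happens exactly once:

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model; none of these types are the real Ceph classes.
struct ToyRequest {
  std::string reqid;
  bool is_replay = false;
  bool queued_next_replay = false;  // guards against queueing twice
};

struct ToyMDS {
  std::deque<std::string> waiting_replays;  // replay ops not yet started
  std::vector<std::string> dispatched;      // order in which ops were started

  // Start the next waiting replay op, if there is one.
  void queue_one_replay() {
    if (waiting_replays.empty()) return;
    dispatched.push_back(waiting_replays.front());
    waiting_replays.pop_front();
  }

  // Idempotent: safe to call from both the reply path and cleanup.
  void maybe_queue_next_replay(ToyRequest& r) {
    if (!r.is_replay || r.queued_next_replay) return;
    r.queued_next_replay = true;
    queue_one_replay();
  }

  // Stand-in for the journal-and-reply path: advance before durability.
  void journal_and_reply(ToyRequest& r) { maybe_queue_next_replay(r); }

  // Stand-in for generic cleanup: runs on every drop, including kills.
  void request_cleanup(ToyRequest& r) { maybe_queue_next_replay(r); }
};
```

In this sketch a request killed before it reaches the reply path still
advances the replay queue via cleanup, and a request that passes through
both paths advances it only once.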
For reproducing, two specific cases are highlighted (thanks to @Mer1997 on
GitHub for locating these):
- The request is killed by a session close / eviction while a replayed request
is queued and waiting for a journal flush (e.g. dirty inest locks).
- The request construction fails because the request is already in the
active_requests. This could happen theoretically if a client resends the same
request (same reqid) twice.
The first case is most probable but very difficult to reproduce for testing
purposes. The replayed op would need to wait on a journal flush (to be
restarted by C_MDS_RetryRequest). Then, the request would need to be
killed by a session close.
[1] ed6a18d90fdd1dc869369fb92c2aad43bc5c9a34
[2] https://github.com/ceph/ceph/blob/a6f1a1c6c09d74f5918c715b05789f34f2ea0e90/src/mds/Server.cc#L2253-L2262
[3] https://github.com/ceph/ceph/blob/a6f1a1c6c09d74f5918c715b05789f34f2ea0e90/src/mds/Server.cc#L2380
Fixes: https://tracker.ceph.com/issues/56577
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 078ecaa42b98f9858d2e3a045aedb51153b39e34)
Conflicts:
src/mds/Server.cc: minor code diff conflict