git.apps.os.sepia.ceph.com Git - ceph-ci.git/commit

mds: ensure next replay is queued on req drop

Not all client replay requests are queued at once since [1]. We require
the next request to be queued on (unsafe) completion or during cleanup.
Not all code paths seem to handle this [2] so move it to a generic
location, MDCache::request_cleanup. Even so, this doesn't handle all
errors (so we must still be careful) as sometimes we must queue the next
replay request before an MDRequest is constructed [3] during some error
conditions.

Additionally, preserve the behavior of Server::journal_and_reply
queueing the next replay op. Otherwise, we must wait for the request to
be durable before moving on to the next one, unnecessarily.
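
As an illustration only, the queueing rule described above can be sketched
with toy types. Everything here (ReplayQueue, Request, the queued_next flag,
early_reply, run_demo) is a hypothetical stand-in, not the actual MDS code;
only the name request_cleanup is borrowed from the message. The sketch
assumes a per-request flag so that the queue is advanced at most once,
either on the early (unsafe) reply or in the generic cleanup path.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the replay queue; not Ceph's real classes.
struct ReplayQueue {
  std::deque<uint64_t> pending;      // replayed request ids not yet dispatched
  std::vector<uint64_t> dispatched;  // ids in the order they were dispatched

  // Dispatch the next pending replay request, if any.
  void queue_next() {
    if (pending.empty()) return;
    dispatched.push_back(pending.front());
    pending.pop_front();
  }
};

struct Request {
  explicit Request(uint64_t i) : id(i) {}
  uint64_t id;
  bool is_replay = true;
  bool queued_next = false;  // assumed flag: queue already advanced for this request
};

// Advance the queue at most once per replayed request.
static void maybe_queue_next(ReplayQueue& q, Request& r) {
  if (r.is_replay && !r.queued_next) {
    r.queued_next = true;
    q.queue_next();
  }
}

// Models the early (unsafe) reply: advance without waiting for durability.
void early_reply(ReplayQueue& q, Request& r) { maybe_queue_next(q, r); }

// Models the generic cleanup that every constructed request passes through.
void request_cleanup(ReplayQueue& q, Request& r) { maybe_queue_next(q, r); }

// Request 1 replies early, request 2 is killed before replying,
// request 3 simply finishes. Returns the dispatch order.
std::vector<uint64_t> run_demo() {
  ReplayQueue q;
  q.pending = {1, 2, 3};
  q.queue_next();          // start replay: dispatches 1
  Request r1(1), r2(2), r3(3);
  early_reply(q, r1);      // dispatches 2 before r1 is durable
  request_cleanup(q, r1);  // no-op: r1 already advanced the queue
  request_cleanup(q, r2);  // r2 dropped (e.g. session close): dispatches 3
  request_cleanup(q, r3);  // queue is empty, nothing left to dispatch
  assert(q.pending.empty());
  return q.dispatched;
}
```

In this toy model, dropping request 2 without a reply still lets request 3
run, because the cleanup path advances the queue on its behalf.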

For reproducing, two specific cases are highlighted (thanks to @Mer1997 on
GitHub for locating these):

- The request is killed by a session close / eviction while a replayed request
  is queued and waiting for a journal flush (e.g. dirty inest locks).

- The request construction fails because the request is already in the
  active_requests. This could happen theoretically if a client resends the same
  request (same reqid) twice.

The first case is most probable but very difficult to reproduce for testing
purposes. The replayed op would need to wait on a journal flush (to be
restarted by C_MDS_RetryRequest).  Then, the request would need to be killed
by a session close.
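
The second case (duplicate reqid) can be sketched the same way. This is a
toy model under stated assumptions: MiniReplay, Req, request_start as
written here, handle_replayed and run_duplicate_demo are hypothetical names,
and the set of active ids only imitates active_requests. It shows why the
caller has to queue the next replay itself when construction fails: no
request object exists, so no cleanup will ever run for it.

```cpp
#include <cstdint>
#include <deque>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for the replay queue; not Ceph's real classes.
struct MiniReplay {
  std::deque<uint64_t> pending;
  std::vector<uint64_t> dispatched;
  void queue_next() {
    if (pending.empty()) return;
    dispatched.push_back(pending.front());
    pending.pop_front();
  }
};

struct Req {
  uint64_t reqid;
};

// Returns nullptr when the reqid is already active (construction fails).
std::unique_ptr<Req> request_start(std::set<uint64_t>& active, uint64_t reqid) {
  if (!active.insert(reqid).second) return nullptr;
  return std::unique_ptr<Req>(new Req{reqid});
}

// Handle one dispatched replay request. On a duplicate reqid there is no
// request object, so the queue must be advanced right here.
bool handle_replayed(MiniReplay& q, std::set<uint64_t>& active, uint64_t reqid) {
  std::unique_ptr<Req> r = request_start(active, reqid);
  if (!r) {
    q.queue_next();
    return false;
  }
  return true;  // a real request would advance the queue on reply or cleanup
}

// A client resends reqid 7 while the first copy is still active.
std::vector<uint64_t> run_duplicate_demo() {
  MiniReplay q;
  q.pending = {7, 7, 8};
  std::set<uint64_t> active;
  q.queue_next();                 // dispatches the first 7
  handle_replayed(q, active, 7);  // constructed, stays active
  q.queue_next();                 // models the early reply for the first 7
  handle_replayed(q, active, 7);  // duplicate: advances the queue to 8
  handle_replayed(q, active, 8);  // constructed normally
  return q.dispatched;
}
```

Without the queue_next() call in the failure branch, request 8 would never
be dispatched in this model.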

[1] ed6a18d90fdd1dc869369fb92c2aad43bc5c9a34
[2] https://github.com/ceph/ceph/blob/a6f1a1c6c09d74f5918c715b05789f34f2ea0e90/src/mds/Server.cc#L2253-L2262
[3] https://github.com/ceph/ceph/blob/a6f1a1c6c09d74f5918c715b05789f34f2ea0e90/src/mds/Server.cc#L2380

Fixes: https://tracker.ceph.com/issues/56577
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 078ecaa42b98f9858d2e3a045aedb51153b39e34)

Conflicts:
src/mds/Server.cc: minor code diff conflict

author	Patrick Donnelly <pdonnell@redhat.com>
	Fri, 15 Jul 2022 20:39:00 +0000 (16:39 -0400)
committer	Patrick Donnelly <pdonnell@redhat.com>
	Fri, 3 Nov 2023 00:16:36 +0000 (20:16 -0400)
commit	dbe3dd2dc78ef69e6fe4fed2382b60a96d3cf6b1
tree	ed2a6b8c0269af2337284a1378e27ed7c1190958
parent	58d2e6c20cdaf5a80524c8a9cb5390bde3c95a38

src/mds/MDCache.cc
src/mds/MDSRank.cc
src/mds/Mutation.h
src/mds/Server.cc