ceph: add client reset state machine and session teardown

Add the client-side reset state machine, request gating, and manual
session teardown implementation.

Manual reset is an operator-triggered escape hatch for client/MDS
stalemates in which caps, locks, or unsafe metadata state stop making
forward progress. The reset blocks new metadata work, attempts a
bounded best-effort drain of dirty client state while sessions are
still alive, and finally asks the MDS to close sessions before tearing
local session state down directly.

The reset state machine tracks four phases: IDLE -> QUIESCING ->
DRAINING -> TEARDOWN -> IDLE. QUIESCING is set synchronously by
schedule_reset() before the workqueue item is dispatched, so that new
metadata requests and file-lock acquisitions are gated immediately --
even before the work function begins running. All non-IDLE phases
block callers on blocked_wq, preventing races with session teardown.
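
The phase cycle and the gating rule described above can be sketched
in plain userspace C. This is only an illustration: the enum and
function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical phase names mirroring the commit text; the real
 * identifiers in the patch may differ. */
enum reset_phase {
	RESET_IDLE,
	RESET_QUIESCING,
	RESET_DRAINING,
	RESET_TEARDOWN,
};

/* New metadata requests and file-lock acquisitions must wait
 * whenever a reset is in any non-IDLE phase. */
static bool reset_blocks_callers(enum reset_phase phase)
{
	return phase != RESET_IDLE;
}

/* Return the phase that follows @phase in the cycle
 * IDLE -> QUIESCING -> DRAINING -> TEARDOWN -> IDLE. */
static enum reset_phase reset_next_phase(enum reset_phase phase)
{
	switch (phase) {
	case RESET_IDLE:
		return RESET_QUIESCING;
	case RESET_QUIESCING:
		return RESET_DRAINING;
	case RESET_DRAINING:
		return RESET_TEARDOWN;
	case RESET_TEARDOWN:
		return RESET_IDLE;
	}
	return RESET_IDLE; /* not reached for valid input */
}
```

Because QUIESCING is already a non-IDLE phase, a caller that checks
reset_blocks_callers() right after the reset is scheduled is gated
even if the work function has not started yet.
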
The drain phase flushes mdlog state, dirty caps, and pending cap
releases for a bounded interval. State that still cannot make progress
within that interval is discarded during teardown, which is the point
of the reset: break the stalemate and allow fresh sessions to rebuild
clean state.
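
A minimal userspace sketch of the bounded drain idea follows. It
assumes a retry budget counted in attempts as a stand-in for the time
interval used by the patch; the callback type and both function names
are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical callback: performs one flush attempt and returns true
 * once no dirty state remains. */
typedef bool (*drain_step_fn)(void *ctx);

/*
 * Call @step at most @max_attempts times. Returns true if the state
 * drained within the budget, false if the caller should move on to
 * teardown and discard whatever is left.
 */
static bool bounded_drain(drain_step_fn step, void *ctx,
			  unsigned int max_attempts)
{
	unsigned int i;

	for (i = 0; i < max_attempts; i++) {
		if (step(ctx))
			return true;
	}
	return false;
}

/* Example step for illustration: @ctx points to an int counting the
 * remaining dirty items; each attempt flushes one of them. */
static bool countdown_step(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}
```

A false return is not treated as an error in this sketch: it only
signals that the budget ran out and teardown should proceed anyway.
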
The session teardown follows the established check_new_map()
forced-close pattern: unregister sessions under mdsc->mutex, then clean
up caps and requests under s->s_mutex. Reconnect is not attempted
because the MDS only accepts reconnects during its own RECONNECT phase
after restart, not from an active client.

Blocked callers are released when reset completes and observe the final
result via -EAGAIN (reset failed) or 0 (success). Internal work-function
errors such as -ENOMEM are not propagated to unrelated callers like
open() or flock(); the detailed error remains in debugfs and
tracepoints.
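
The error collapsing described above amounts to a small mapping,
sketched here in userspace C. The function name is hypothetical; only
the 0 / -EAGAIN outcome comes from the description above.

```c
#include <errno.h>

/*
 * Collapse the work function's internal result into what blocked
 * callers observe: 0 on success, -EAGAIN on any failure. The detailed
 * error (for example -ENOMEM) is deliberately not passed through.
 */
static int reset_result_for_waiters(int internal_err)
{
	return internal_err ? -EAGAIN : 0;
}
```

With this mapping, a caller such as open() or flock() can only ever
see "retry" or "proceed", never an unrelated allocation failure.
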
The work function checks st->shutdown before each phase transition
(to DRAINING and to TEARDOWN) so that state set by a concurrent
ceph_mdsc_destroy() is not overwritten. If destroy has already taken
ownership, the work function releases session references and returns
without touching the state.
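
The check-before-transition rule can be sketched as below. This is a
simplified userspace model: the struct layout, the enum, and the
helper name are hypothetical, and the caller is assumed to hold
whatever lock protects the state (the text above does not name it).

```c
#include <stdbool.h>

/* Hypothetical names for the phases the work function moves through. */
enum work_phase {
	WORK_QUIESCING,
	WORK_DRAINING,
	WORK_TEARDOWN,
};

struct reset_state {
	enum work_phase phase;
	bool shutdown; /* set by the destroy path when it takes ownership */
};

/*
 * Move to @next only if destroy has not taken ownership. Returns
 * false, leaving @st untouched, when shutdown is already set.
 * Assumption: the caller holds the lock protecting @st.
 */
static bool reset_try_advance(struct reset_state *st, enum work_phase next)
{
	if (st->shutdown)
		return false;
	st->phase = next;
	return true;
}
```

On a false return, the work function in this model would drop its
session references and return without writing to the state again.
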
The timeout calculation for blocked-request waiters uses max_t() to
prevent jiffies underflow when the deadline has already passed.
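
The underflow being avoided can be illustrated in userspace C. The
sketch below uses a plain conditional in place of the kernel's
max_t() macro and assumes the remaining time is clamped at zero; the
function name is hypothetical.

```c
/*
 * Ticks remaining until @deadline, never negative. Tick counters are
 * unsigned and wrap, so the difference is computed in unsigned
 * arithmetic and then interpreted as signed (two's-complement
 * conversion, as gcc and clang implement it). A deadline that has
 * already passed gives a negative delta, which is clamped to 0
 * instead of being used as a huge unsigned timeout.
 */
static long ticks_remaining(unsigned long now, unsigned long deadline)
{
	long delta = (long)(deadline - now);

	return delta > 0 ? delta : 0;
}
```

Without the clamp, passing the raw unsigned difference to a wait
primitive after the deadline would turn a small negative value into a
timeout of nearly the full counter range.
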
The close-grace sleep before teardown is a best-effort nudge to let
queued REQUEST_CLOSE messages egress; it is not a correctness
requirement since the MDS still has session_autoclose as a fallback.

The destroy path marks reset as failed and wakes blocked waiters before
cancel_work_sync() so unmount does not stall.