From: Leonid Usov
Date: Mon, 8 Apr 2024 11:35:02 +0000 (+0300)
Subject: mds/quiesce: xlock the file to let clients keep their buffered writes
X-Git-Tag: testing/wip-pdonnell-testing-20240416.232211-debug~6^2~1
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=8ac98428769cf45a8d43431ad0fbefe8cb953f06;p=ceph-ci.git

mds/quiesce: xlock the file to let clients keep their buffered writes

When the quiesce protocol takes an `rdlock` on the file, it also revokes
the `Fb` capability, which clients can't release until they have finished
flushing. That can take arbitrarily long, evidently more than 10 minutes.

We went for the rdlock to avoid affecting read-only clients, but given
the evidence above we should not optimize for those.

Ideally, we’d like to have a QUIESCE file lock mode where both rd and
buffer are allowed, but as of now our best available option seems to be
to `xlock` the file, which lets the writing clients keep their buffers
for the duration of the quiesce.

We can only afford this change for a `splitauth` config, i.e. where we
drop the lock immediately after all `Fw`s are revoked.

Signed-off-by: Leonid Usov
---

diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 756bfd3c3eb..d0fc2aabc3e 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -13620,10 +13620,32 @@ void MDCache::dispatch_quiesce_inode(const MDRequestRef& mdr)
   if (!(mdr->locking_state & MutationImpl::ALL_LOCKED)) {
     MutationImpl::LockOpVec lov;
+
+    lov.add_xlock(&in->quiescelock); /* !! */
+
+    if (splitauth) {
+      // xlock the file to let the Fb clients stay with buffered writes.
+      // While this will unnecessarily revoke rd caps, it's not as
+      // big of an overhead compared to having the Fb clients flush
+      // their buffers, which evidently can lead to the quiesce timeout
+      // We'll drop the lock after all clients conform to this request
+      // so the file will be still readable during the quiesce after
+      // the interested clients receive their Fr back
+      //
+      // NB: this will also wrlock the versionlock
+      lov.add_xlock(&in->filelock);
+    } else {
+      // if splitauth == false then we won't drop the lock after acquisition (see below)
+      // we can't afford keeping it as xlock for a long time, so we'll have to deal
+      // with the potential quiesce timeout on high-load systems.
+      // The reason we're OK with this is that splitauth is enabled by default,
+      // and really should not be ever disabled outside of the test setups
+      // TODO: consider removing the `splitauth` config option completely.
+      lov.add_rdlock(&in->filelock);
+    }
+
+    // The rest of caps-related locks - rdlock to revoke write caps
     lov.add_rdlock(&in->authlock);
-    lov.add_rdlock(&in->filelock);
     lov.add_rdlock(&in->linklock);
-    lov.add_xlock(&in->quiescelock); /* !! */
     lov.add_rdlock(&in->xattrlock);
     if (!mds->locker->acquire_locks(mdr, lov, nullptr, {in}, false, true)) {
       return;
@@ -13649,6 +13671,10 @@ void MDCache::dispatch_quiesce_inode(const MDRequestRef& mdr)
      */
     mds->locker->drop_lock(mdr.get(), &in->authlock);
     mds->locker->drop_lock(mdr.get(), &in->filelock);
+    // versionlock will be taken automatically for the file xlock.
+    // We don't really need it, but it doesn't make sense to
+    // change the Locker logic just for this flow
+    mds->locker->drop_lock(mdr.get(), &in->versionlock);
     mds->locker->drop_lock(mdr.get(), &in->linklock);
     mds->locker->drop_lock(mdr.get(), &in->xattrlock);
   }
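
For review purposes, the lock selection introduced by this patch can be
summarized with a small standalone model. This is only a sketch: `LockMode`,
`LockOp`, `build_quiesce_lock_ops` and `locks_dropped_after_acquire` are
hypothetical names made up for illustration and are not part of the Ceph MDS
code. It assumes, based on the patch comments, that the post-acquisition drop
happens only when `splitauth` is enabled.

```cpp
// Toy model only: these types are hypothetical, NOT the Ceph MDS classes.
#include <cassert>
#include <string>
#include <vector>

enum class LockMode { RdLock, XLock };

struct LockOp {
  std::string lock;  // lock name as it appears in the patch, e.g. "filelock"
  LockMode mode;     // mode requested for that lock
};

// Mirrors the order in which the patched code fills `lov`:
// quiescelock first, then filelock (mode depends on splitauth),
// then the remaining caps-related locks as rdlocks.
std::vector<LockOp> build_quiesce_lock_ops(bool splitauth) {
  std::vector<LockOp> lov;
  lov.push_back({"quiescelock", LockMode::XLock});
  lov.push_back({"filelock", splitauth ? LockMode::XLock : LockMode::RdLock});
  lov.push_back({"authlock", LockMode::RdLock});
  lov.push_back({"linklock", LockMode::RdLock});
  lov.push_back({"xattrlock", LockMode::RdLock});
  return lov;
}

// Locks released right after acquisition, in the order of the second hunk.
// Assumption (from the patch comments): nothing is dropped when splitauth
// is disabled. The quiescelock is not in this list in either case.
std::vector<std::string> locks_dropped_after_acquire(bool splitauth) {
  if (!splitauth) {
    return {};
  }
  return {"authlock", "filelock", "versionlock", "linklock", "xattrlock"};
}
```

In this model, `build_quiesce_lock_ops(true)` requests the filelock as an
xlock and `build_quiesce_lock_ops(false)` requests it as an rdlock, while the
other entries are identical in both cases. The versionlock appears only in
the drop list because, per the patch comment, it is taken implicitly
alongside the file xlock.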