Lucian Petrut [Fri, 24 May 2024 10:03:11 +0000 (10:03 +0000)]
rbd-wnbd: wait for the disk cleanup to complete
The WNBD disk removal workflow is asynchronous, which is why we'll
need to wait for the cleanup to complete when stopping the service.
The "disconnect_all_mappings" function is moved to
RbdMappingDispatcher::stop, allowing us to access the mapping list
more easily and reject new mappings after a stop has been requested.
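The stop sequence described above (reject new mappings, start the asynchronous removals, then wait for the cleanup to finish) can be sketched as follows. This is only an illustration in Python, not the rbd-wnbd C++ code: the MappingDispatcher class, its method names and the timer-based "disk removal" are all assumptions made for the example.

```python
import threading


class MappingDispatcher:
    """Hypothetical stand-in for a mapping dispatcher; not the Ceph class."""

    def __init__(self):
        self._lock = threading.Condition()
        self._mappings = set()
        self._stop_requested = False

    def add_mapping(self, name):
        with self._lock:
            if self._stop_requested:
                return False  # reject new mappings once a stop was requested
            self._mappings.add(name)
            return True

    def on_cleanup_complete(self, name):
        # Called asynchronously once a disk has actually been removed.
        with self._lock:
            self._mappings.discard(name)
            self._lock.notify_all()

    def stop(self, disconnect, timeout=5.0):
        with self._lock:
            self._stop_requested = True
            pending = list(self._mappings)
        for name in pending:
            disconnect(name)  # only starts the asynchronous removal
        with self._lock:
            # Block until every cleanup callback has fired, or time out.
            return self._lock.wait_for(lambda: not self._mappings, timeout)


dispatcher = MappingDispatcher()
dispatcher.add_mapping("img1")
dispatcher.add_mapping("img2")


def async_disconnect(name):
    # Simulate an asynchronous disk removal that completes a bit later.
    threading.Timer(0.05, dispatcher.on_cleanup_complete, args=(name,)).start()


stopped_cleanly = dispatcher.stop(async_disconnect)
accepted_after_stop = dispatcher.add_mapping("img3")
```

Keeping the mapping list and the stop flag behind one lock is what makes it possible to both wait for pending removals and refuse late mappings.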
Patrick Donnelly [Thu, 30 May 2024 13:08:04 +0000 (09:08 -0400)]
Merge PR #57730 into squid
* refs/pull/57730/head:
squid: mds: remove unnecessary quiesce finisher variable
squid: mds: attach quiesce_path mdr to finisher at creation not dispatch
squid: mds/quiesce: disable quiesce root debug parameters by default
squid: mds/quiesce-agt: never send a synchronous ack
squid: mds/quiesce-agt: add test for a rapid async ack
squid: mds/quiesce: always abort fragmenting asynchronously to prevent reentrancy
squid: mds/quiesce: overdrive an export if it hasn't frozen the tree yet
squid: mds/quiesce: quiesce_inode should not hold on to remote auth pins
squid: qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
squid: mds: add `--lifetime` parameter to the `lock path` asok command
squid: mds/quiesce: accept a regular file as the quiesce root
squid: mds: command_quiesce_path: rename `--wait` to `--await` for consistency
squid: mds: command_quiesce_path: do not block the asok thread and return an adequate rc
squid: mds/quiesce: drop remote authpins before waiting for the quiesce lock
squid: qa/cephfs/test_quiesce: test proper handling of remote authpins
squid: mds: don't clear `AUTHPIN_FROZEN` until `FROZEN` in rename_prep
squid: mds: enhance the `lock path` asok command
squid: mds/quiesce: overdrive fragmenting that's still freezing
squid: revert: mds: provide a mechanism to authpin while freezing
squid: qa/cephfs/test_quiesce: enhance the fragmentation test
squid: mds/quiesce-db: collect acks while bootstrapping
squid: mds/quiesce-db: optimize peer updates
squid: mds/quiesce-db: track db epoch separately from the membership epoch
squid: mds/quiesce-db: test that a peer on a newer membership epoch can ack a root
squid: mds: don't stall the asok thread for flush commands
squid: qa/quiescer: relax some timing requirements in the quiescer
squid: qa/tasks/quiescer: dump ops in parallel
squid: qa/suites/fs: add quiescer to the fs suite
squid: qa/tasks: the quiescer task and a waiter task to test it
squid: qa/tasks/cephfs: don't create a new CephManager if there is one in the context
squid: qa/tasks: vstart_runner: introduce --config-mode
squid: qa/tasks: introduce ThrasherGreenlet
squid: qa: update quiesce tests to expect ipolicy lock
squid: mds: add missing policylock to test F_QUIESCE_BLOCK
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Tue, 28 May 2024 16:27:53 +0000 (02:27 +1000)]
doc/dev: add note about intro of perf counters
Add a note to the "perf counter" section of doc/dev/perf_counters.rst
that explains that this feature was introduced in the Reef release of
Ceph. This note will prevent us from accidentally backporting
perf-counter-related PRs to Quincy.
Leonid Usov [Mon, 20 May 2024 16:17:04 +0000 (19:17 +0300)]
squid: mds/quiesce: overdrive an export if it hasn't frozen the tree yet
Just like with the fragmenting, we should abort an ongoing export
if a quiesce is attempted for the directory.
To minimize the stress on the system, we only allow the abort
if the export hasn't yet managed to freeze the tree. Otherwise,
the quiesce will have to wait for the export to finish.
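The rule above boils down to a single decision on the export's progress. A minimal sketch, assuming a simplified two-state view of an export (the real MDS export goes through more stages, and the names below are made up for illustration):

```python
from enum import Enum


class ExportState(Enum):
    # Hypothetical, simplified states; not the MDS Migrator states.
    FREEZING = "freezing"
    FROZEN = "frozen"


def quiesce_action(export_state):
    """Return what a quiesce should do about an in-progress export."""
    if export_state is ExportState.FREEZING:
        return "abort_export"    # tree not frozen yet: aborting is cheap
    return "wait_for_export"     # tree already frozen: let the export finish


action_while_freezing = quiesce_action(ExportState.FREEZING)
action_when_frozen = quiesce_action(ExportState.FROZEN)
```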
Fixes: https://tracker.ceph.com/issues/66123
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit da5c263b8e7797eac6c9d13d5b6a6b292d9c5def)
Fixes: https://tracker.ceph.com/issues/66259
Leonid Usov [Mon, 20 May 2024 22:03:15 +0000 (01:03 +0300)]
squid: mds/quiesce: quiesce_inode should not hold on to remote auth pins
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks
We should not be forcing a mustpin when taking quiesce lock.
This creates unnecessary overhead due to the distributed nature
of the quiesce: all ranks will execute quiesce_inode, including
the auth rank, which will authpin the inode.
Auth pinning on the auth rank is important to synchronize quiesce
with operations that are managed by the auth, like fragmenting
and exporting.
If we let a remote quiesce process take a foreign authpin then
it may block freezing on the auth, which will stall quiesce locally.
This wouldn't be a problem if the quiesce that is blocked on the auth
and the quiesce that's holding a remote authpin from the replica side
were unrelated, but in our case it may be the same logical quiesce
that effectively steps on its own toes. This creates an opportunity
for a deadlock.
Fixes: https://tracker.ceph.com/issues/66152
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit b1cb6d985622c6164d99d3fd79b6eeaf6530894c)
Fixes: https://tracker.ceph.com/issues/66258
Leonid Usov [Sat, 11 May 2024 14:00:21 +0000 (17:00 +0300)]
squid: mds: enhance the `lock path` asok command
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return the RC that represents the operation result. 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 3552fc5a9ea17c173a18be41fa15fbbae8d77edf)
Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Thu, 9 May 2024 01:39:12 +0000 (04:39 +0300)]
squid: mds/quiesce: overdrive fragmenting that's still freezing
Quiesce requires revocation of capabilities,
which does not work for freezing/frozen nodes.
Since it is best effort, abort an ongoing fragmenting
for the sake of a faster quiesce.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Fixes: https://tracker.ceph.com/issues/65716
(cherry picked from commit 8b6440652d501644d641c1c8b3255c3720738ec6)
Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Sun, 12 May 2024 16:19:34 +0000 (19:19 +0300)]
squid: revert: mds: provide a mechanism to authpin while freezing
This is a functional revert of a9964a7ccc4394f923fb0f1c76eb8fa03fe8733d
git revert was giving too many conflicts, as the code has changed
too much since the original commit.
The bypass freezing mechanism led us into several deadlocks,
and when we found out that a freezing inode defers reclaiming
client caps, we realized that we needed to try a different approach.
This commit removes the bypass freezing related changes to clear the way
for a different approach to resolving the conflict between quiesce
and freezing.
Fixes: https://tracker.ceph.com/issues/65716
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit bf760602a4f02cc07072db2da5cb987e3072afce)
Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Mon, 13 May 2024 21:10:04 +0000 (00:10 +0300)]
squid: mds/quiesce-db: track db epoch separately from the membership epoch
Tracking the db epoch separately will make sure that replicas
only follow the leader's epoch choice, even if they are already on
the new membership epoch. This eliminates races due to the
random order of mdsmap updates.
Fixes: https://tracker.ceph.com/issues/65977
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 379ef7196b61142dc7753992f897ad91b37f048f)
Fixes: https://tracker.ceph.com/issues/66070
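The separation of the two epochs can be pictured with a toy model. This is only a sketch under assumed names (Replica, on_mdsmap_update, on_leader_update); it does not mirror the quiesce-db classes. A map update changes only the membership epoch, while the db epoch is taken solely from the leader.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical model of a quiesce-db replica; not the Ceph class.
    membership_epoch: int = 0
    db_epoch: int = 0

    def on_mdsmap_update(self, membership_epoch):
        # A new cluster map only changes membership; the db epoch is untouched.
        self.membership_epoch = max(self.membership_epoch, membership_epoch)

    def on_leader_update(self, leader_db_epoch):
        # The db epoch is adopted only from the leader's message.
        self.db_epoch = leader_db_epoch


replica = Replica()
replica.on_mdsmap_update(7)   # the replica may see the new map first
epochs_before = (replica.membership_epoch, replica.db_epoch)
replica.on_leader_update(3)   # the leader announces its db epoch choice
epochs_after = (replica.membership_epoch, replica.db_epoch)
```

Because the replica never derives the db epoch from the map, the order in which map updates arrive no longer affects which db epoch it follows.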
Since this --flags=locks takes the mds_lock and dumps thousands of ops, this
may take a long time to complete for each individual MDS. The entire quiesce
set may time out (and all q ops be killed) before we finish dumping ops.
Fixes: https://tracker.ceph.com/issues/65823
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 15f734ec6291bf918d704d7d3e6330b5606c47e3)
Fixes: https://tracker.ceph.com/issues/66103
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
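The motivation above is that slow per-MDS dumps, issued one after another, can outlast the quiesce timeout. A minimal sketch of issuing them concurrently, assuming a hypothetical dump_ops helper and a thread pool (the actual qa task may well use a different concurrency mechanism):

```python
from concurrent.futures import ThreadPoolExecutor


def dump_ops(rank):
    # Placeholder for a slow per-MDS "dump ops" call; hypothetical.
    return f"ops-of-mds.{rank}"


def dump_all(ranks):
    # Issue all dumps concurrently, so the total time is roughly that of
    # the slowest single dump rather than the sum of all of them.
    with ThreadPoolExecutor(max_workers=max(1, len(ranks))) as pool:
        return dict(zip(ranks, pool.map(dump_ops, ranks)))


dumps = dump_all([0, 1, 2])
```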
The new mode of the vstart_runner allows for passing
paths to yaml configs that will be merged and then
run just as teuthology would run them.
Building on the standard run method, we can even
pass "-" as the config name and provide one on stdin, like so:
python3 ../qa/tasks/vstart_runner.py --config-mode "-" << END
tasks:
- quiescer:
    quiesce_factor: 0.5
    min_quiesce: 10
    max_quiesce: 10
    initial_delay: 5
    cancelations_cap: 2
    paths:
      - a
      - b
      - c
- waiter:
    on_exit: 100
END
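As an illustration of what "merged" can mean for several configs, here is a simplified recursive merge over already-parsed dictionaries. It is an assumption for the sake of the example: the deep_merge helper is hypothetical, and teuthology's actual merge rules (for lists in particular) may differ.

```python
def deep_merge(base, override):
    """Recursively merge two config dicts; values in override win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Two hypothetical config fragments, as they might look after YAML parsing.
first = {"quiescer": {"quiesce_factor": 0.5, "min_quiesce": 10}}
second = {"quiescer": {"min_quiesce": 20}, "waiter": {"on_exit": 100}}
merged = deep_merge(first, second)
```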
This commit does the minimum to allow testing of the quiescer,
but it also lays the groundwork for running arbitrary configs.
The cornerstone of the approach is to inject our local implementations
of the main fs suite classes. To be able to do that, some minor
refactoring was required in the corresponding modules:
the standard classes were renamed to have a *Base suffix, and the
former class name without the suffix is made a module level variable
initialized with the *Base implementation. This refactoring
is meant to be backward compatible.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 274849e544dd1f77158a2c80a4c654cb0363f71d)
Fixes: https://tracker.ceph.com/issues/66103
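The *Base refactoring described in this commit can be sketched as follows. The class names here (FilesystemBase, LocalFilesystem) and the make_filesystem helper are illustrative assumptions, not the actual qa classes; the point is only the pattern of a module-level name that defaults to the Base implementation and can be rebound.

```python
class FilesystemBase:
    # Stand-in for a standard suite class; the name is hypothetical.
    def describe(self):
        return "teuthology-backed filesystem"


# Module-level name that existing callers keep using unchanged.
Filesystem = FilesystemBase


class LocalFilesystem(FilesystemBase):
    # Local implementation, such as a vstart-style runner might provide.
    def describe(self):
        return "local filesystem"


def make_filesystem():
    # The name is looked up at call time, so an injected class is picked up.
    return Filesystem()


default_kind = make_filesystem().describe()
Filesystem = LocalFilesystem  # inject the local implementation
injected_kind = make_filesystem().describe()
```

Code that never rebinds the module-level name sees the same class as before, which is why the refactoring can stay backward compatible.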
Patrick Donnelly [Fri, 19 Apr 2024 23:29:44 +0000 (19:29 -0400)]
squid: mds: add missing policylock to test F_QUIESCE_BLOCK
In order to check an inode's F_QUIESCE_BLOCK, the quiesce_inode op must acquire
the policylock. Furthermore, to ensure the F_QUIESCE_BLOCK is not changed
during quiesce, the lock must be held for the duration of the op's lifetime.
Fixes: https://tracker.ceph.com/issues/65595
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 50613b5562469ad24ed0fc547cafcfdeef5be604)
Fixes: https://tracker.ceph.com/issues/65740
Rishabh Dave [Mon, 27 May 2024 19:37:35 +0000 (01:07 +0530)]
doc/developer_guide: update doc about installing teuthology
There are two more ways to install teuthology. The approach using the
bootstrap script is easier and more convenient, while the other approach
is more elaborate and manual. Document both of them. Don't delete the
currently documented approach, because it lets users install teuthology
conveniently in a custom virtual environment. So, keep all three.
Zac Dover [Mon, 27 May 2024 11:09:40 +0000 (21:09 +1000)]
doc/cephfs: s/subvolumegroups/subvolume groups
Use the term "subvolume groups" instead of "subvolumegroups" where the
term appears in plain English. The string "subvolumegroups" is correct
in commands, and remains unchanged.
Also add formatting to command output, to make clearer that the output
is indeed output.
Ilya Dryomov [Thu, 23 May 2024 16:15:08 +0000 (18:15 +0200)]
.github: expand tests label to all files under qa
The test job definition under qa/suites is an integral part of almost
any test. Often, the test logic is split between the task or workunit
and respective snippet(s) under qa/suites.
Other files under qa are less used, but still related to nothing but
testing, so just add the label on all of it.
Zac Dover [Mon, 20 May 2024 06:29:44 +0000 (16:29 +1000)]
doc/cephfs: separate commands into sections
Separate commands so that each command has its own subsection in the
section "FS Subvolumes" in the file doc/cephfs/fs-volumes.rst.
Previously, the list of commands for manipulating subvolumes was one
long, unbroken list and the beginning of one section could easily be
mistaken for the end of the previous section.
Zac Dover [Mon, 20 May 2024 11:55:16 +0000 (21:55 +1000)]
doc/cephfs: edit "Cloning Snapshots" in fs-volumes.rst
Edit the "Cloning Snapshots" section in doc/cephfs/fs-volumes.rst. This
commit represents only a grammar pass. A future commit (and future PR)
will separate this section into subsections by command.
Lucian Petrut [Thu, 12 Jan 2023 10:55:06 +0000 (12:55 +0200)]
qa: add ceph-rbd windows service restart test
We're adding a test that:
* maps a configurable number of images
* runs a specified test - we're reusing the ones from stress_test,
making just a few minor changes to allow running the same test
multiple times
* restarts the ceph-rbd Windows service
* waits for the images to be reconnected and refreshes the mount
information
* reruns the test
* repeats the above workflow for a specified number of times,
reusing the same images
This test ensures that:
* mounted images are still available after a service restart
* drive letters are retained
* the image content is retained
* there are no race conditions when connecting or disconnecting
a large number of images in parallel
* the driver is capable of mapping a specified number of images
simultaneously
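The workflow listed above can be outlined as a small driver loop. This is a sketch of one reading of that workflow, not the test itself: every callable is a hypothetical hook, and it assumes that "repeats" means one restart, reconnect and rerun per iteration.

```python
def run_restart_test(map_images, run_test, restart_service,
                     wait_for_reconnect, iterations):
    """Drive the map / test / restart / retest cycle.

    All callables are hypothetical hooks supplied by the caller.
    """
    images = map_images()              # mapped once, reused in every iteration
    run_test(images)
    for _ in range(iterations):
        restart_service()
        wait_for_reconnect(images)     # would also refresh mount information
        run_test(images)
    return images


events = []
run_restart_test(
    map_images=lambda: ["img0", "img1"],
    run_test=lambda imgs: events.append("test"),
    restart_service=lambda: events.append("restart"),
    wait_for_reconnect=lambda imgs: events.append("reconnect"),
    iterations=2,
)
```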
Lucian Petrut [Tue, 10 Jan 2023 14:50:04 +0000 (16:50 +0200)]
qa: reorganize Windows python test
We're splitting the rbd-wnbd python test into separate files so
that the common code may easily be reused by other tests. This
also makes the code easier to read and maintain.
Matthew Vernon [Wed, 22 May 2024 15:31:33 +0000 (16:31 +0100)]
doc: clarify use of location: in host spec
It wasn't clear that you can specify more than one element of the CRUSH hierarchy in a spec file, nor that it might be useful to do so (e.g. to ensure the host ends up beneath the default root).
So update the text to make it clearer, and similarly the example.
Dan Mick [Wed, 22 May 2024 22:25:51 +0000 (15:25 -0700)]
doc/dev/release-process.rst: note new 'project' arguments
Support was added to the release scripts (from ceph-build.git) to
work for ceph-iscsi, so 'project' must be passed to these scripts,
and will appear in the prerelease pathnames. See also
https://github.com/ceph/ceph-build/pull/2243 and
https://github.com/ceph/ceph-container/pull/2210
Patrick Donnelly [Thu, 23 May 2024 00:58:23 +0000 (20:58 -0400)]
Merge PR #57342 into squid
* refs/pull/57342/head:
PendingReleaseNotes: add note on the client incompatibility health warning and feature bit
doc/cephfs: add client_mds_auth_caps client feature bit
doc/cephfs: add missing client feature bits
doc/cephfs: document MDS_CLIENTS_BROKEN_ROOTSQUASH health error
qa: add tests for MDS_CLIENTS_BROKEN_ROOTSQUASH
mds: raise health warning if client lacks feature for root_squash
mon/MDSMonitor: add note about missing metadata inclusion
mds: check relevant caps for fs include root_squash
mds: refactor out fs_name match in MDSAuthCaps
qa: test for root_squash with multiple caps
qa: pass kwargs to mount from remount
qa: simplify update_attrs and only update relevant keys
client: allow overriding client features
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Patrick Donnelly [Wed, 22 May 2024 18:20:46 +0000 (14:20 -0400)]
Merge PR #57176 into squid
* refs/pull/57176/head:
mds: move drop_locks to directly after rdonly check
qa: test quiesce.block is replicated
qa: test that ceph.dir.subvolume is replicated properly
mds: add debug "lock path" command
qa: move reqid_tostr helper
qa: return run_shell process for waiters