Rishabh Dave [Tue, 7 May 2024 14:50:55 +0000 (20:20 +0530)]
qa/cephfs: set joinable on FS before exiting tests in TestFSFail
After running TestFSFail, CephFSTestCase.tearDown() fails while
attempting to unmount CephFS. Set joinable on the FS and wait for the
MDS to be up before exiting the test. This ensures that unmounting
succeeds in teardown.
Fixes: https://tracker.ceph.com/issues/65841
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit faa30e03f31551a71ebb8330dbbe7005d9ddd559)
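For illustration, a minimal CLI sketch of the recovery steps described
above (the placeholder FS name and the up:active polling are
assumptions, not taken from the test code):

    ceph fs set <fs_name> joinable true    # allow standby MDS daemons to join the FS again
    # poll until an MDS reports up:active before teardown unmounts
    until ceph mds stat | grep -q 'up:active'; do sleep 1; done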
Rishabh Dave [Wed, 8 May 2024 13:59:11 +0000 (19:29 +0530)]
qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd
This issue was not caught in the original QA run because "ceph mds fail"
returns 0 even when the MDS name it receives as an argument is
non-existent. This is done for the sake of idempotency; however, it
caused this bug to go uncaught.
Fixes: https://tracker.ceph.com/issues/65864
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit ab643f7a501797634a366fd29bf4acef6a8f0cf2)
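A hedged sketch of the distinction (angle-bracket names are
placeholders):

    ceph mds fail <fs_name>     # wrong: an FS name is not an MDS name, yet this still returns 0
    ceph mds fail <mds_name>    # right: pass the name of the MDS daemon, e.g. from "ceph fs status"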
Rishabh Dave [Mon, 25 Mar 2024 12:05:38 +0000 (17:35 +0530)]
qa/cephfs: add tests failing MDS and FS when MDS is unhealthy
Add tests to verify that the confirmation flag is mandatory for running
the commands "ceph mds fail" and "ceph fs fail" when the MDS has one of
the two health warnings: MDS_CACHE_OVERSIZED or MDS_TRIM.
Also, add MDS_CACHE_OVERSIZED and MDS_TRIM to the ignorelist for
test_admin.py so that QA jobs know this is an expected failure.
Rishabh Dave [Mon, 25 Mar 2024 12:01:01 +0000 (17:31 +0530)]
qa/cephfs: pass confirmation flag to fs fail in tear down code
Since "ceph fs fail" command now requires the confirmation flag when
Ceph cluster has either health warning MDS_TRIM or MDS_CACHE_OVERSIZE,
update tear down in QA code. During the teardown, the CephFS should be
failed, regardless of whether or not Ceph cluster has health warnings,
since it is teardown.
Rishabh Dave [Wed, 13 Mar 2024 09:31:02 +0000 (15:01 +0530)]
cephfs,mon: require confirmation to fail unhealthy FS
The confirmation flag must be passed to the command "ceph fs fail" when
the MDS for this FS has either of the two health warnings: MDS_TRIM or
MDS_CACHE_OVERSIZED. Otherwise, the command will fail and print an
appropriate error message.
Restarting an MDS with these health warnings is not recommended, since
it will have a slow recovery during restart, which will create new
problems.
Fixes: https://tracker.ceph.com/issues/61866
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit b901616494a8359e59f7ec2cd661077c4aced01c)
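For illustration, a sketch of the new behavior, assuming the
confirmation flag is the "--yes-i-really-mean-it" flag quoted later in
this log:

    # with MDS_TRIM or MDS_CACHE_OVERSIZED raised on the FS's MDS:
    ceph fs fail <fs_name>                           # rejected with an error message
    ceph fs fail <fs_name> --yes-i-really-mean-it    # proceeds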
Since the command "ceph mds fail" may now require the confirmation flag
("--yes-i-really-mean-it"), update this method to allow/disallow adding
this flag to the command arguments.
Rishabh Dave [Fri, 19 Apr 2024 11:28:30 +0000 (16:58 +0530)]
doc/cephfs: mention need of confirmation for "ceph mds fail"
Update the docs, since the command "ceph mds fail" will now fail if the
MDS has either the MDS_TRIM or the MDS_CACHE_OVERSIZED health warning
and the confirmation flag is not passed.
Rishabh Dave [Fri, 8 Mar 2024 15:39:18 +0000 (21:09 +0530)]
cephfs,mon: require confirmation to fail unhealthy MDS
When running the command "ceph mds fail" for an MDS that is unhealthy
due to MDS_CACHE_OVERSIZED or MDS_TRIM, the user must pass the
confirmation flag. Otherwise, the command will fail and print an
appropriate error message.
Restarting an MDS with such health warnings is not recommended, since it
will have a slow recovery during restart, which will create new problems.
Fixes: https://tracker.ceph.com/issues/61866
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit eeda00eea5043d3ba806695a207b732cb53b35c4)
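The analogous sketch for failing a single MDS (placeholder daemon name;
same assumed flag as above):

    ceph mds fail <mds_name>                          # rejected while MDS_TRIM or MDS_CACHE_OVERSIZED is raised
    ceph mds fail <mds_name> --yes-i-really-mean-it   # proceeds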
Lucian Petrut [Fri, 24 May 2024 10:03:11 +0000 (10:03 +0000)]
rbd-wnbd: wait for the disk cleanup to complete
The WNBD disk removal workflow is asynchronous, which is why we'll
need to wait for the cleanup to complete when stopping the service.
The "disconnect_all_mappings" function is moved to
RbdMappingDispatcher::stop, allowing us to access the mapping list
more easily and reject new mappings after a stop has been requested.
Patrick Donnelly [Thu, 30 May 2024 13:08:04 +0000 (09:08 -0400)]
Merge PR #57730 into squid
* refs/pull/57730/head:
squid: mds: remove unnecessary quiesce finisher variable
squid: mds: attach quiesce_path mdr to finisher at creation not dispatch
squid: mds/quiesce: disable quiesce root debug parameters by default
squid: mds/quiesce-agt: never send a synchronous ack
squid: mds/quiesce-agt: add test for a rapid async ack
squid: mds/quiesce: always abort fragmenting asynchronously to prevent reentrancy
squid: mds/quiesce: overdrive an export if it hasn't frozen the tree yet
squid: mds/quiesce: quiesce_inode should not hold on to remote auth pins
squid: qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
squid: mds: add `--lifetime` parameter to the `lock path` asok command
squid: mds/quiesce: accept a regular file as the quiesce root
squid: mds: command_quiesce_path: rename `--wait` to `--await` for consistency
squid: mds: command_quiesce_path: do not block the asok thread and return an adequate rc
squid: mds/quiesce: drop remote authpins before waiting for the quiesce lock
squid: qa/cephfs/test_quiesce: test proper handling of remote authpins
squid: mds: don't clear `AUTHPIN_FROZEN` until `FROZEN` in rename_prep
squid: mds: enhance the `lock path` asok command
squid: mds/quiesce: overdrive fragmenting that's still freezing
squid: revert: mds: provide a mechanism to authpin while freezing
squid: qa/cephfs/test_quiesce: enhance the fragmentation test
squid: mds/quiesce-db: collect acks while bootstrapping
squid: mds/quiesce-db: optimize peer updates
squid: mds/quiesce-db: track db epoch separately from the membership epoch
squid: mds/quiesce-db: test that a peer on a newer membership epoch can ack a root
squid: mds: don't stall the asok thread for flush commands
squid: qa/quiescer: relax some timing requirements in the quiescer
squid: qa/tasks/quiescer: dump ops in parallel
squid: qa/suites/fs: add quiescer to the fs suite
squid: qa/tasks: the quiescer task and a waiter task to test it
squid: qa/tasks/cephfs: don't create a new CephManager if there is one in the context
squid: qa/tasks: vstart_runner: introduce --config-mode
squid: qa/tasks: introduce ThrasherGreenlet
squid: qa: update quiesce tests to expect ipolicy lock
squid: mds: add missing policylock to test F_QUIESCE_BLOCK
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Tue, 28 May 2024 16:27:53 +0000 (02:27 +1000)]
doc/dev: add note about intro of perf counters
Add a note to the "perf counter" section of doc/dev/perf_counters.rst
that explains that this feature was introduced in the Reef release of
Ceph. This note will prevent us from accidentally backporting
perf-counter-related PRs to Quincy.
Leonid Usov [Mon, 20 May 2024 16:17:04 +0000 (19:17 +0300)]
squid: mds/quiesce: overdrive an export if it hasn't frozen the tree yet
Just like with fragmenting, we should abort an ongoing export if a
quiesce is attempted for the directory.
To minimize stress on the system, we only allow the abort if the export
hasn't yet managed to freeze the tree. Otherwise, the quiesce will have
to wait for the export to finish.
Fixes: https://tracker.ceph.com/issues/66123
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit da5c263b8e7797eac6c9d13d5b6a6b292d9c5def)
Fixes: https://tracker.ceph.com/issues/66259
Leonid Usov [Mon, 20 May 2024 22:03:15 +0000 (01:03 +0300)]
squid: mds/quiesce: quiesce_inode should not hold on to remote auth pins
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks
We should not be forcing a mustpin when taking the quiesce lock.
This creates unnecessary overhead due to the distributed nature
of the quiesce: all ranks will execute quiesce_inode, including
the auth rank, which will authpin the inode.
Auth pinning on the auth rank is important to synchronize quiesce
with operations that are managed by the auth, like fragmenting
and exporting.
If we let a remote quiesce process take a foreign authpin, then
it may block freezing on the auth, which will stall quiesce locally.
This wouldn't be a problem if the quiesce that is blocked on the auth
and the quiesce that's holding a remote authpin from the replica side
were unrelated, but in our case it may be the same logical quiesce
that effectively steps on its own toes. This creates an opportunity
for a deadlock.
Fixes: https://tracker.ceph.com/issues/66152
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit b1cb6d985622c6164d99d3fd79b6eeaf6530894c)
Fixes: https://tracker.ceph.com/issues/66258
Leonid Usov [Sat, 11 May 2024 14:00:21 +0000 (17:00 +0300)]
squid: mds: enhance the `lock path` asok command
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return an RC that represents the operation result: 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 3552fc5a9ea17c173a18be41fa15fbbae8d77edf)
Fixes: https://tracker.ceph.com/issues/66154
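A hypothetical invocation combining the flags listed above; `lock path`,
--await, --lifetime, --ap-freeze and --ap-dont-block are all named in
this log, but the exact argument syntax is an assumption:

    ceph tell mds.<id> lock path <path> --await --lifetime 60 --ap-dont-block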
Leonid Usov [Thu, 9 May 2024 01:39:12 +0000 (04:39 +0300)]
squid: mds/quiesce: overdrive fragmenting that's still freezing
Quiesce requires revocation of capabilities,
which does not work for freezing/frozen nodes.
Since fragmenting is best-effort, abort an ongoing fragmenting
operation for the sake of a faster quiesce.
Fixes: https://tracker.ceph.com/issues/65716
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 8b6440652d501644d641c1c8b3255c3720738ec6)
Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Sun, 12 May 2024 16:19:34 +0000 (19:19 +0300)]
squid: revert: mds: provide a mechanism to authpin while freezing
This is a functional revert of a9964a7ccc4394f923fb0f1c76eb8fa03fe8733d
git revert was giving too many conflicts, as the code has changed
too much since the original commit.
The bypass freezing mechanism led us into several deadlocks,
and when we found out that a freezing inode defers reclaiming
client caps, we realized that we needed to try a different approach.
This commit removes the bypass-freezing-related changes to clear the
way for a different approach to resolving the conflict between quiesce
and freezing.
Fixes: https://tracker.ceph.com/issues/65716
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit bf760602a4f02cc07072db2da5cb987e3072afce)
Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Mon, 13 May 2024 21:10:04 +0000 (00:10 +0300)]
squid: mds/quiesce-db: track db epoch separately from the membership epoch
Tracking the db epoch separately will make sure that replicas
only follow the leader's epoch choice, even if they are already on
the new membership epoch. This eliminates races due to the
random order of mdsmap updates.
Fixes: https://tracker.ceph.com/issues/65977
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 379ef7196b61142dc7753992f897ad91b37f048f)
Fixes: https://tracker.ceph.com/issues/66070
Since --flags=locks takes the mds_lock and dumps thousands of ops, it
may take a long time to complete for each individual MDS. The entire
quiesce set may time out (and all quiesce ops be killed) before we
finish dumping ops.
Fixes: https://tracker.ceph.com/issues/65823
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 15f734ec6291bf918d704d7d3e6330b5606c47e3)
Fixes: https://tracker.ceph.com/issues/66103
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
The new mode of vstart_runner allows passing paths to yaml configs
that will be merged and then run just as teuthology would do it.
Building on the standard run method, we can even pass "-" as the
config name and provide one on stdin, like:
python3 ../qa/tasks/vstart_runner.py --config-mode "-" << END
tasks:
  - quiescer:
      quiesce_factor: 0.5
      min_quiesce: 10
      max_quiesce: 10
      initial_delay: 5
      cancelations_cap: 2
      paths:
        - a
        - b
        - c
  - waiter:
      on_exit: 100
END
This commit does the minimum to allow testing of the quiescer,
but it also lays the groundwork for running arbitrary configs.
The cornerstone of the approach is to inject our local implementations
of the main fs suite classes. To be able to do that, some minor
refactoring was required in the corresponding modules:
the standard classes were renamed to have a *Base suffix, and the
former class name without the suffix is made a module-level variable
initialized with the *Base implementation. This refactoring
is meant to be backward compatible.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 274849e544dd1f77158a2c80a4c654cb0363f71d)
Fixes: https://tracker.ceph.com/issues/66103
Patrick Donnelly [Fri, 19 Apr 2024 23:29:44 +0000 (19:29 -0400)]
squid: mds: add missing policylock to test F_QUIESCE_BLOCK
In order to check an inode's F_QUIESCE_BLOCK, the quiesce_inode op must acquire
the policylock. Furthermore, to ensure the F_QUIESCE_BLOCK is not changed
during quiesce, the lock must be held for the duration of the op's lifetime.
Fixes: https://tracker.ceph.com/issues/65595
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 50613b5562469ad24ed0fc547cafcfdeef5be604)
Fixes: https://tracker.ceph.com/issues/65740
Rishabh Dave [Mon, 27 May 2024 19:37:35 +0000 (01:07 +0530)]
doc/developer_guide: update doc about installing teuthology
There are two more ways to install teuthology. The approach with the
bootstrap script is easier and more convenient, while the other
approach is more elaborate and manual; document both of them. Don't
delete the currently documented approach, because it lets users install
teuthology conveniently in a custom virtual environment. So, keep all
three.
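A minimal sketch of the bootstrap-script approach, assuming the script
sits at the root of the teuthology repository:

    git clone https://github.com/ceph/teuthology
    cd teuthology
    ./bootstrap    # sets up a virtualenv and installs teuthology into it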
Zac Dover [Mon, 27 May 2024 11:09:40 +0000 (21:09 +1000)]
doc/cephfs: s/subvolumegroups/subvolume groups
Use the term "subvolume groups" instead of "subvolumegroups" where the
term appears in plain English. The string "subvolumegroups" is correct
in commands, and remains unchanged.
Also add formatting to command output, to make it clearer that the
output is indeed output.
Ilya Dryomov [Thu, 23 May 2024 16:15:08 +0000 (18:15 +0200)]
.github: expand tests label to all files under qa
The test job definition under qa/suites is an integral part of almost
any test. Often, the test logic is split between the task or workunit
and the respective snippet(s) under qa/suites.
Other files under qa are used less, but are still related to nothing
but testing, so just apply the label to all of them.
Zac Dover [Mon, 20 May 2024 06:29:44 +0000 (16:29 +1000)]
doc/cephfs: separate commands into sections
Separate commands so that each command has its own subsection in the
section "FS Subvolumes" in the file doc/cephfs/fs-volumes.rst.
Previously, the list of commands for manipulating subvolumes was one
long, unbroken list and the beginning of one section could easily be
mistaken for the end of the previous section.
Zac Dover [Mon, 20 May 2024 11:55:16 +0000 (21:55 +1000)]
doc/cephfs: edit "Cloning Snapshots" in fs-volumes.rst
Edit the "Cloning Snapshots" section in doc/cephfs/fs-volumes.rst. This
commit represents only a grammar pass. A future commit (and future PR)
will separate this section into subsections by command.