Add an explanation of leader-peon conditions that obtain when the
cluster is in the "HEALTH_OK" state. Previously, the text discussed
these two monitor states only in the context of a health detail entry.
This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/
I will list Joel Davidow here as the co-author for the sake of more
expediently getting this change into the documentation, but though he is
listed as the co-author, he is the true author.
Co-authored-by: Joel Davidow <jdavidow@nso.edu> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 6fb9a5ef817eda5184d51ebcb425a6091ca82299)
Zac Dover [Wed, 5 Jun 2024 16:43:15 +0000 (02:43 +1000)]
doc/start: s/intro.rst/index.rst/
Change the filename "doc/start/intro.rst" to "doc/start/index.rst" so
that Sphinx finds the root filename for the "/start" directory in the
default location.
Zac Dover [Tue, 4 Jun 2024 13:37:27 +0000 (23:37 +1000)]
doc/start: s/http/https/ in links
Replace "http" with "https" in doc/start/get-involved.rst.
This commit is, in a way, a repeat of
https://github.com/ceph/ceph/pull/57213/
(1c5383b91bd7dbfa9670c6485fcc5ff28b79f40d), which targeted the Reef
branch instead of the main branch. When this commit has been merged and
backported, I will close https://github.com/ceph/ceph/pull/57213/.
I am listing Casey Cain here as the co-author, but he is in fact the
true author of this change.
Lucian Petrut [Fri, 24 May 2024 10:03:11 +0000 (10:03 +0000)]
rbd-wnbd: wait for the disk cleanup to complete
The WNBD disk removal workflow is asynchronous, which is why we'll
need to wait for the cleanup to complete when stopping the service.
The "disconnect_all_mappings" function is moved to
RbdMappingDispatcher::stop, allowing us to access the mapping list
more easily and reject new mappings after a stop has been requested.
Patrick Donnelly [Thu, 30 May 2024 13:08:04 +0000 (09:08 -0400)]
Merge PR #57730 into squid
* refs/pull/57730/head:
squid: mds: remove unnecssary quiesce finisher variable
squid: mds: attach quiesce_path mdr to finisher at creation not dispatch
squid: mds/quiesce: disable quiesce root debug parameters by default
squid: mds/quiesce-agt: never send a synchronous ack
squid: mds/quiesce-agt: add test for a rapid async ack
squid: mds/quiesce: always abort fragmenting asynchronously to prevent reentrancy
squid: mds/quiesce: overdrive an export if it hasn't frozen the tree yet
squid: mds/quiesce: quiesce_inode should not hold on to remote auth pins
squid: qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
squid: mds: add `--lifetime` parameter to the `lock path` asok command
squid: mds/quiesce: accept a regular file as the quiesce root
squid: mds: command_quiesce_path: rename `--wait` to `--await` for consistency
squid: mds: command_quiesce_path: do not block the asok thread and return an adequate rc
squid: mds/quiesce: drop remote authpins before waiting for the quiesce lock
squid: qa/cephfs/test_quiesce: test proper handling of remote authpins
squid: mds: don't clear `AUTHPIN_FROZEN` until `FROZEN` in rename_prep
squid: mds: enhance the `lock path` asok command
squid: mds/quiesce: overdrive fragmenting that's still freezing
squid: revert: mds: provide a mechanism to authpin while freezing
squid: qa/cephfs/test_quiesce: enhance the fragmentation test
squid: mds/queisce-db: collect acks while bootstrapping
squid: mds/quiesce-db: optimize peer updates
squid: mds/quiesce-db: track db epoch separately from the membership epoch
squid: mds/quiesce-db: test that a peer on a newer membership epoch can ack a root
squid: mds: don't stall the asok thread for flush commands
squid: qa/quiescer: relax some timing requirements in the quiescer
squid: qa/tasks/quiescer: dump ops in parallel
squid: qa/suites/fs: add quiescer to the fs suite
squid: qa/tasks: the quiescer task and a waiter task to test it
squid: qa/tasks/cephfs: don't create a new CephManager if there is one in the context
squid: qa/tasks: vstart_runner: introduce --config-mode
squid: qa/tasks: introduce ThrasherGreenlet
squid: qa: update quiesce tests to expect ipolicy lock
squid: mds: add missing policylock to test F_QUIESCE_BLOCK
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Tue, 28 May 2024 16:27:53 +0000 (02:27 +1000)]
doc/dev: add note about intro of perf counters
Add a note to the "perf counter" section of doc/dev/perf_counters.rst
that explains that this feature was introduced in the Reef release of
Ceph. This note will prevent us from accidentally backporting
perf-counter-related PRs to Quincy.
Leonid Usov [Mon, 20 May 2024 16:17:04 +0000 (19:17 +0300)]
squid: mds/quiesce: overdrive an export if it hasn't frozen the tree yet
Just like with the fragmenting, we should abort an ongoing export
if a quiesce is attempted for the directory.
To minimize the stress for the system, we only allow the abort
if the export hasn't yet managed to freeze the tree. If that is the case,
then quiesce will have to wait for the export to finish.
Fixes: https://tracker.ceph.com/issues/66123 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit da5c263b8e7797eac6c9d13d5b6a6b292d9c5def) Fixes: https://tracker.ceph.com/issues/66259
Leonid Usov [Mon, 20 May 2024 22:03:15 +0000 (01:03 +0300)]
squid: mds/quiesce: quiesce_inode should not hold on to remote auth pins
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks
We should not be forcing a mustpin when taking quiesce lock.
This creates unnecessary overhead due to the distributed nature
of the quiesce: all ranks will execute quiesce_inode, including
the auth rank, which will authpin the inode.
Auth pinning on the auth rank is important to synchronize quiesce
with operations that are managed by the auth, like fragmenting
and exporting.
If we let a remote quiesce process take a foreign authpin then
it may block freezing on the auth, which will stall quiesce locally.
This wouldn't be a problem if the quiesce that is blocked on the auth
and the quiesce that's holding a remote authpin from the replica side
were unrelated, but in our case it may be the same logical quiesce
that effectively steps on its own toes. This creates an opportunity
for a deadlock.
Fixes: https://tracker.ceph.com/issues/66152 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit b1cb6d985622c6164d99d3fd79b6eeaf6530894c) Fixes: https://tracker.ceph.com/issues/66258
Leonid Usov [Sat, 11 May 2024 14:00:21 +0000 (17:00 +0300)]
squid: mds: enhance the `lock path` asok command
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return the RC that represents the operation result. 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 3552fc5a9ea17c173a18be41fa15fbbae8d77edf) Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Thu, 9 May 2024 01:39:12 +0000 (04:39 +0300)]
squid: mds/quiesce: overdrive fragmenting that's still freezing
Quiesce requires revocation of capabilities,
which is not working for a freezing/frozen nodes.
Since it is best effort, abort an ongoing fragmenting
for the sake of a faster quiesce.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com> Fixes: https://tracker.ceph.com/issues/65716
(cherry picked from commit 8b6440652d501644d641c1c8b3255c3720738ec6) Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Sun, 12 May 2024 16:19:34 +0000 (19:19 +0300)]
squid: revert: mds: provide a mechanism to authpin while freezing
This is a functional revert of a9964a7ccc4394f923fb0f1c76eb8fa03fe8733d
git revert was giving too many conflicts, as the code has changed
too much since the original commit.
The bypass freezing mechanism lead us into several deadlocks,
and when we found out that a freezing inode defers reclaiming
client caps, we realized that we needed to try a different approach.
This commit removes the bypass freezing related changes to clear way
for a different approach to resolving the conflict between quiesce
and freezing.
Fixes: https://tracker.ceph.com/issues/65716 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit bf760602a4f02cc07072db2da5cb987e3072afce) Fixes: https://tracker.ceph.com/issues/66154
Leonid Usov [Mon, 13 May 2024 21:10:04 +0000 (00:10 +0300)]
squid: mds/quiesce-db: track db epoch separately from the membership epoch
Tracking the db epoch separately will make sure that replicas
only follow leader's epoch choice, even if they are already on
the new membership epoch. This eliminates races due to the
random order of mdsmap updates.
Fixes: https://tracker.ceph.com/issues/65977 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 379ef7196b61142dc7753992f897ad91b37f048f) Fixes: https://tracker.ceph.com/issues/66070
Since this --flags=locks takes the mds_lock and dumps thousands of ops, this
may take a long time to complete for each individual MDS. The entire quiesce
set may timeout (and all q ops killed) before we finish dumping ops.
Fixes: https://tracker.ceph.com/issues/65823 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 15f734ec6291bf918d704d7d3e6330b5606c47e3) Fixes: https://tracker.ceph.com/issues/66103 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
The new mode of the vstart_runner allows for passing
paths to yaml configs that will be merged and then
run just as the teuthology would do it.
Building on the standard run method we can even
pass "-" as the config name and provide one on the stdin like
python3 ../qa/tasks/vstart_runner.py --config-mode "-" << END
tasks:
- quiescer:
quiesce_factor: 0.5
min_quiesce: 10
max_quiesce: 10
initial_delay: 5
cancelations_cap: 2
paths:
- a
- b
- c
- waiter:
on_exit: 100
END
This commit does the minimum to allow testing of the quiescer,
but it also lays the groundwork for running arbitrary configs.
The cornerstone of the approach is to inject our local implementations
of the main fs suite classes. To be able to do that, some minor
refactoring was required in the corresponding modules:
the standard classes were renamed to have a *Base suffix, and the
former class name without the suffix is made a module level variable
initialized with the *Base implementation. This refactoring
is meant to be backward compatible.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 274849e544dd1f77158a2c80a4c654cb0363f71d) Fixes: https://tracker.ceph.com/issues/66103
Patrick Donnelly [Fri, 19 Apr 2024 23:29:44 +0000 (19:29 -0400)]
squid: mds: add missing policylock to test F_QUIESCE_BLOCK
In order to check an inode's F_QUIESCE_BLOCK, the quiesce_inode op must acquire
the policylock. Furthermore, to ensure the F_QUIESCE_BLOCK is not changed
during quiesce, the lock must be held for the duration of the op's lifetime.
Fixes: https://tracker.ceph.com/issues/65595 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 50613b5562469ad24ed0fc547cafcfdeef5be604) Fixes: https://tracker.ceph.com/issues/65740
Conflicts:
mgr/dashboard/frontend/src/app/core/layouts/workbench-layout/workbench-layout.component.ts
mgr/dashboard/frontend/src/app/ceph/dashboard-v3/dashboard/dashboard-v3.component.ts
- Hardware status and multi-cluster features are not included in squid, therefore this only affects the telemetry status
Fixes: https://tracker.ceph.com/issues/65643 Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
(cherry picked from commit 7abae5c982ee1fd7de74653bdd5ad46dbc4ae388)
Rishabh Dave [Mon, 27 May 2024 19:37:35 +0000 (01:07 +0530)]
doc/developer_guide: update doc about installing teuthology
There are 2 more ways to install teuthology. Approach with boostrap
script is easier and more convenient while other approach is more
elaborate but manual, document both of them. Don't delete the currently
documented approach because it lets users install teuthology
conveniently in a custom virtual environment. So, keep all three.
Zac Dover [Mon, 27 May 2024 11:09:40 +0000 (21:09 +1000)]
doc/cephfs: s/subvolumegroups/subvolume groups
Use the term "subvolume groups" instead of "subvolumegroups" where the
term appears in plain English. The string "subvolumegroups" is correct
in commands, and remains unchanged.
Also add formatting to command output, to make clearer that the output
is indeed output.