Rishabh Dave [Mon, 25 Mar 2024 12:05:38 +0000 (17:35 +0530)]
qa/cephfs: add tests failing MDS and FS when MDS is unhealthy
Add tests to verify that the confirmation flag is mandatory for running
commands "ceph mds fail" and "ceph fs fail" when MDS has one of the two
health warnings: MDS_CACHE_OVERSIZED or MDS_TRIM.
Also, add MDS_CACHE_OVERSIZED and MDS_TRIM to the ignorelist for
test_admin.py so that QA jobs know this is an expected failure.
Rishabh Dave [Mon, 25 Mar 2024 12:01:01 +0000 (17:31 +0530)]
qa/cephfs: pass confirmation flag to fs fail in tear down code
Since the "ceph fs fail" command now requires the confirmation flag when
the Ceph cluster has either health warning MDS_TRIM or MDS_CACHE_OVERSIZED,
update the teardown in QA code. During teardown, the CephFS should be
failed regardless of whether the Ceph cluster has health warnings,
since it is being torn down anyway.
Rishabh Dave [Wed, 13 Mar 2024 09:31:02 +0000 (15:01 +0530)]
cephfs,mon: require confirmation to fail unhealthy FS
Confirmation flag must be passed when running the command "ceph fs fail"
when the MDS for this FS has either of the two health warnings: MDS_TRIM
or MDS_CACHE_OVERSIZED. Else, the command will fail and print an
appropriate error message.
Restarting an MDS with these health warnings is not recommended since it
will have a slow recovery during restart which will create new problems.
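
The check described above can be sketched roughly as follows. This is an illustrative Python model only, not the monitor's actual C++ code in src/mon/FSCommands.cc; the function name `may_fail` and the error text are hypothetical. Only the two health warning names and the `--yes-i-really-mean-it` flag come from these commit messages.

```python
from typing import Iterable, Tuple

# Health warnings that make "ceph fs fail" require explicit confirmation.
BLOCKING_WARNINGS = frozenset({"MDS_TRIM", "MDS_CACHE_OVERSIZED"})


def may_fail(active_warnings: Iterable[str], confirmed: bool) -> Tuple[bool, str]:
    """Return (allowed, error_message) for a request to fail an FS or MDS.

    Hypothetical helper: the request is refused only when a blocking
    warning is active and the confirmation flag was not passed.
    """
    blocking = sorted(BLOCKING_WARNINGS.intersection(active_warnings))
    if blocking and not confirmed:
        # The wording of this message is an assumption, not Ceph's output.
        msg = (
            "MDS has health warning(s) " + ", ".join(blocking)
            + "; pass --yes-i-really-mean-it to proceed"
        )
        return False, msg
    return True, ""
```

Unrelated warnings do not block the request, and passing the confirmation flag always lets it through.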
Fixes: https://tracker.ceph.com/issues/61866
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit b901616494a8359e59f7ec2cd661077c4aced01c)
Conflicts:
- src/mon/FSCommands.cc
- lines surrounding the patch are different in reef compared to main.
the reef code was still accessing "mds_map" directly instead of
accessing it using "get_mds_map()".
- return value of get_filesystem() is different in main.
Since the command "ceph mds fail" now may require confirmation flag
("--yes-i-really-mean-it"), update this method to allow/disallow adding
this flag to the command arguments.
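
A minimal sketch of what such an allow/disallow switch might look like, assuming a helper that assembles the command arguments. The name `build_mds_fail_cmd` and its parameters are hypothetical and do not reproduce the actual QA method; only the command and the flag come from the message above.

```python
from typing import List

CONFIRM_FLAG = "--yes-i-really-mean-it"


def build_mds_fail_cmd(mds_id: str, *, confirm: bool) -> List[str]:
    """Assemble the argument list for "ceph mds fail".

    Hypothetical helper: the caller decides explicitly whether the
    confirmation flag is appended to the arguments.
    """
    cmd = ["ceph", "mds", "fail", mds_id]
    if confirm:
        cmd.append(CONFIRM_FLAG)
    return cmd
```

Tests that expect the command to be rejected would call it with `confirm=False`, while teardown code would pass `confirm=True`.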
Rishabh Dave [Fri, 19 Apr 2024 11:28:30 +0000 (16:58 +0530)]
doc/cephfs: mention need of confirmation for "ceph mds fail"
Update docs since command "ceph mds fail" will now fail if MDS has either
health warning MDS_TRIM or MDS_CACHE_OVERSIZED and if confirmation flag
is not passed.
Rishabh Dave [Fri, 8 Mar 2024 15:39:18 +0000 (21:09 +0530)]
cephfs,mon: require confirmation to fail unhealthy MDS
When running the command "ceph mds fail" for an MDS that is unhealthy
due to MDS_CACHE_OVERSIZED or MDS_TRIM, the user must pass the confirmation
flag. Else, the command will fail and print an appropriate error
message.
Restarting an MDS with such health warnings is not recommended since it
will have a slow recovery during restart which will create new problems.
Fixes: https://tracker.ceph.com/issues/61866
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit eeda00eea5043d3ba806695a207b732cb53b35c4)
Zac Dover [Tue, 28 May 2024 16:27:53 +0000 (02:27 +1000)]
doc/dev: add note about intro of perf counters
Add a note to the "perf counter" section of doc/dev/perf_counters.rst
that explains that this feature was introduced in the Reef release of
Ceph. This note will prevent us from accidentally backporting
perf-counter-related PRs to Quincy.
Rishabh Dave [Mon, 27 May 2024 19:37:35 +0000 (01:07 +0530)]
doc/developer_guide: update doc about installing teuthology
There are two more ways to install teuthology. The approach using the
bootstrap script is easier and more convenient, while the other approach
is more elaborate and manual. Document both of them. Don't delete the currently
documented approach because it lets users install teuthology
conveniently in a custom virtual environment. So, keep all three.
Zac Dover [Mon, 27 May 2024 11:09:40 +0000 (21:09 +1000)]
doc/cephfs: s/subvolumegroups/subvolume groups
Use the term "subvolume groups" instead of "subvolumegroups" where the
term appears in plain English. The string "subvolumegroups" is correct
in commands, and remains unchanged.
Also add formatting to command output, to make clearer that the output
is indeed output.
Zac Dover [Tue, 28 May 2024 00:49:35 +0000 (10:49 +1000)]
doc/dev: add target links to perf_counters.rst
Add target links to perf_counters.rst, to remedy the failure to backport
the docs changes in https://github.com/ceph/ceph/pull/53003.
(https://github.com/ceph/ceph/pull/53003 mixed code and docs changes, so
it is understandable why the backport was not achieved back in October,
when the merge to main occurred.)
Ilya Dryomov [Thu, 23 May 2024 16:15:08 +0000 (18:15 +0200)]
.github: expand tests label to all files under qa
The test job definition under qa/suites is an integral part of almost
any test. Often, the test logic is split between the task or workunit
and respective snippet(s) under qa/suites.
Other files under qa are less used, but still related to nothing but
testing, so just add the label on all of it.
Zac Dover [Mon, 20 May 2024 11:55:16 +0000 (21:55 +1000)]
doc/cephfs: edit "Cloning Snapshots" in fs-volumes.rst
Edit the "Cloning Snapshots" section in doc/cephfs/fs-volumes.rst. This
commit represents only a grammar pass. A future commit (and future PR)
will separate this section into subsections by command.
Zac Dover [Mon, 20 May 2024 06:29:44 +0000 (16:29 +1000)]
doc/cephfs: separate commands into sections
Separate commands so that each command has its own subsection in the
section "FS Subvolumes" in the file doc/cephfs/fs-volumes.rst.
Previously, the list of commands for manipulating subvolumes was one
long, unbroken list and the beginning of one section could easily be
mistaken for the end of the previous section.
Matthew Vernon [Wed, 22 May 2024 15:31:33 +0000 (16:31 +0100)]
doc: clarify use of location: in host spec
It wasn't clear that you can specify more than one element of the CRUSH hierarchy in a spec file, nor that it might be useful to do so (e.g. to ensure the host ends up beneath the default root).
So update the text to make it clearer, and similarly the example.
Dan Mick [Wed, 22 May 2024 22:25:51 +0000 (15:25 -0700)]
doc/dev/release-process.rst: note new 'project' arguments
Support for ceph-iscsi was added to the release scripts (from
ceph-build.git), so 'project' must be passed to these scripts,
and will appear in the prerelease pathnames. See also
https://github.com/ceph/ceph-build/pull/2243 and
https://github.com/ceph/ceph-container/pull/2210
Ilya Dryomov [Sun, 12 May 2024 09:15:36 +0000 (11:15 +0200)]
qa/suites/krbd: drop pre-single-major test
Single-major mapping scheme was introduced in 2014 and became the
default in 2017. It's getting increasingly difficult to build and,
more importantly, to boot a 10 year old kernel with recent userspace
(systemd, etc). If someone is still running such a kernel, it's
really unlikely that they would have the most recent rbd CLI tool
installed.
Zac Dover [Sun, 12 May 2024 01:39:34 +0000 (11:39 +1000)]
doc/cephfs: edit fs-volumes.rst (1 of x) followup
Include the suggestions for improving doc/cephfs/fs-volumes.rst made by
Anthony D'Atri here
https://github.com/ceph/ceph/pull/57415#discussion_r1597362110
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit cb700d804b4390fd9f55444dcfc04dfebac3a1bf)
Patrick Donnelly [Sat, 11 May 2024 01:27:08 +0000 (21:27 -0400)]
Merge PR #57343 into reef
* refs/pull/57343/head:
reef: qa: do not use `fs authorize` for two fs
PendingReleaseNotes: add note on the client incompatibility health warning and feature bit
doc/cephfs: add client_mds_auth_caps client feature bit
doc/cephfs: add missing client feature bits
doc/cephfs: document MDS_CLIENTS_BROKEN_ROOTSQUASH health error
qa: add tests for MDS_CLIENTS_BROKEN_ROOTSQUASH
mds: raise health warning if client lacks feature for root_squash
mon/MDSMonitor: add note about missing metadata inclusion
mds: check relevant caps for fs include root_squash
mds: refactor out fs_name match in MDSAuthCaps
qa: test for root_squash with multiple caps
qa: pass kwargs to mount from remount
qa: simplify update_attrs and only update relevant keys
client: allow overriding client features
Tested-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
mds: raise health warning if client lacks feature for root_squash
Rather than evict all clients lacking this feature bit, raise a health error
that pushes the administrator to address it. This avoids the surprise of having
all affected clients suddenly evicted in the cluster.
Fixes: https://tracker.ceph.com/issues/65733
Fixes: 954ed30
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 66ff5c9fc8d4664f18b2fa462e96e5548c35951f)
Conflicts:
src/messages/MMDSBeacon.h: missing health beacon type
mon/MDSMonitor: add note about missing metadata inclusion
There is a "client_count" metadata on the health warning that apparently was
intended to be used for aggregating warnings but never was. Add a TODO item for
that.
mds: check relevant caps for fs include root_squash
When denying client reconnects because the MDS caps include root_squash and the
client features do not include CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK, ensure those
caps are only for the file system the MDS is joined to.
Fixes: https://tracker.ceph.com/issues/65733
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit f79ae86f2c23388f6ecc3177764735e071998e09)
John Mulligan [Fri, 29 Mar 2024 18:04:33 +0000 (14:04 -0400)]
ceph.spec.in: remove command-with-macro line
A comment clearly left as a breadcrumb for a node-proxy manpage is
causing (intermittent) build failures. Remove the line and hope
the manpage is added if/when appropriate.
Fixes: 0dd73643649ddc2366e60de4fe6c078b6e112091
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 5f25005dfbff51531989d121f26ecae308409356)
Zac Dover [Thu, 2 May 2024 12:54:25 +0000 (22:54 +1000)]
doc/cephadm: Reef default images procedure
Address Adam King's request for version-specific
cephadm-container-image-retrieval procedures, which he requested here:
https://github.com/ceph/ceph/pull/57208#discussion_r1586614140
Co-authored-by: Adam King <adking@redhat.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>