Xiubo Li [Wed, 26 Jul 2023 06:34:01 +0000 (14:34 +0800)]
mds: defer trim() until after the last cache_rejoin ack being received
Just before the last cache_rejoin ack being received the entire
subtree, together with the inode subtree root belongs to, were
trimmed the isolated_inodes list couldn't be correctly erased. We
should defer calling the trim() until the last cache_rejoin ack
being received.
Zac Dover [Sat, 15 Jun 2024 11:55:18 +0000 (21:55 +1000)]
doc/rados: explain replaceable parts of command
Add an explanation that directs the reader to replace the "X" part of
the command "ceph tell mon.X mon_status" with the value specific to the
reader's Ceph cluster (which is (probably) not "X").
In the future, such replaceable strings in commands may be bounded by
angle brackets ("<" and ">").
This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/
Kefu Chai [Thu, 23 May 2024 04:47:26 +0000 (12:47 +0800)]
ceph-volume: use importlib from stdlib on Python 3.8 and up
since packaging was apparently removed from pkg_resources, let's use
importlib.metadata when it is available and pkg_resources on older
Python versions.
Nizamudeen A [Fri, 3 May 2024 08:56:19 +0000 (14:26 +0530)]
mgr/k8sevents: update V1Events to CoreV1Events
centos9 only provides kubernetes 26.1.0 as base dep and hence the
k8sevents code needs to be updated accordingly. the api changes happened
in kuberenetes while 19.0.0 was released
Fixes: https://tracker.ceph.com/issues/65627 Fixes: https://tracker.ceph.com/issues/64981 Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 6af964719217d720e6c2fd1ba2a607f6255d2604)
Rishabh Dave [Tue, 7 May 2024 14:50:55 +0000 (20:20 +0530)]
qa/cephfs: set joinable on FS before exiting tests in TestFSFail
After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.
Fixes: https://tracker.ceph.com/issues/65841 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit faa30e03f31551a71ebb8330dbbe7005d9ddd559)
Rishabh Dave [Wed, 8 May 2024 13:59:11 +0000 (19:29 +0530)]
qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd
This issue was not caught in original QA run because "ceph mds fail"
returns 0 even though MDS name received by it in argument is
non-existent. This is done for the sake of idempotency, however it
caused this bug to go uncaught.
Fixea: https://tracker.ceph.com/issues/65864 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit ab643f7a501797634a366fd29bf4acef6a8f0cf2)
Rishabh Dave [Mon, 25 Mar 2024 12:05:38 +0000 (17:35 +0530)]
qa/cephfs: add tests failing MDS and FS when MDS is unhealthy
Add tests to verify that the confirmation flag is mandatory for running
commands "ceph mds fail" and "ceph fs fail" when MDS has one of the two
health warnings: MDS_CACHE_OVERSIZE or MDS_TRIM.
Also, add MDS_CACHE_OVERSIZE and MDS_TRIM to ignorelist for
test_admin.py so that QA jobs knows this an expected failure.
Rishabh Dave [Mon, 25 Mar 2024 12:01:01 +0000 (17:31 +0530)]
qa/cephfs: pass confirmation flag to fs fail in tear down code
Since "ceph fs fail" command now requires the confirmation flag when
Ceph cluster has either health warning MDS_TRIM or MDS_CACHE_OVERSIZE,
update tear down in QA code. During the teardown, the CephFS should be
failed, regardless of whether or not Ceph cluster has health warnings,
since it is teardown.
Rishabh Dave [Wed, 13 Mar 2024 09:31:02 +0000 (15:01 +0530)]
cephfs,mon: require confirmation to fail unhealthy FS
Confirmation flag must be passed when running the command "ceph fs fail"
when the MDS for this FS has either of the two health warnings: MDS_TRIM
or MDS_CACHE_OVERSIZED. Else, the command will fail and print an
appropriate error message.
Restarting an MDS with these health warnings is not recommened since it
will have a slow recovery during restart which will create new problems.
Fixes: https://tracker.ceph.com/issues/61866 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit b901616494a8359e59f7ec2cd661077c4aced01c)
Conflicts:
- src/mon/FSCommands.cc
- lines surrounding the patch are different in reef compared to main.
the reef code was still accessing "mds_map" directly instead of
accessing it using "get_mds_map()".
- return value of get_filesystem() is different in main.
Add an explanation of leader-peon conditions that obtain when the
cluster is in the "HEALTH_OK" state. Previously, the text discussed
these two monitor states only in the context of a health detail entry.
This improvement to the documentation was suggested on the [ceph-users]
email list by Joel Davidow. This email, an absolute model of user
engagement with an upstream project, can be reviewed here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KF67F5TXFSSTPXV7EKL6JKLA5KZQDLDQ/
I will list Joel Davidow here as the co-author for the sake of more
expediently getting this change into the documentation, but though he is
listed as the co-author, he is the true author.
Co-authored-by: Joel Davidow <jdavidow@nso.edu> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 6fb9a5ef817eda5184d51ebcb425a6091ca82299)
Zac Dover [Wed, 5 Jun 2024 16:43:15 +0000 (02:43 +1000)]
doc/start: s/intro.rst/index.rst/
Change the filename "doc/start/intro.rst" to "doc/start/index.rst" so
that Sphinx finds the root filename for the "/start" directory in the
default location.
Zac Dover [Tue, 4 Jun 2024 13:37:27 +0000 (23:37 +1000)]
doc/start: s/http/https/ in links
Replace "http" with "https" in doc/start/get-involved.rst.
This commit is, in a way, a repeat of
https://github.com/ceph/ceph/pull/57213/
(1c5383b91bd7dbfa9670c6485fcc5ff28b79f40d), which targeted the Reef
branch instead of the main branch. When this commit has been merged and
backported, I will close https://github.com/ceph/ceph/pull/57213/.
I am listing Casey Cain here as the co-author, but he is in fact the
true author of this change.
Since the command "ceph mds fail" now may require confirmation flag
("--yes-i-really-mean-it"), update this method to allow/disallow adding
this flag to the command arguments.
Rishabh Dave [Fri, 19 Apr 2024 11:28:30 +0000 (16:58 +0530)]
doc/cephfs: mention need of confirmation for "ceph mds fail"
Update docs since command "ceph mds fail" will now fail if MDS has either
health warning MDS_TRIM or MDS_CACHE_OVERSIZED and if confirmation flag
is not passed.
Rishabh Dave [Fri, 8 Mar 2024 15:39:18 +0000 (21:09 +0530)]
cephfs,mon: require confirmation to fail unhealthy MDS
When running the command "ceph mds fail" for an MDS that is unhealthy
due to, MDS_CACHE_OVERSIZED or MDS_TRIM, user must pass confirmation
flag. Else, the command will fail and print an appropriate error
message.
Restarting an MDS with such health warnings is not recommended since it
will have a slow reocvery during restart which will create new problems.
Fixes: https://tracker.ceph.com/issues/61866 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit eeda00eea5043d3ba806695a207b732cb53b35c4)
Rishabh Dave [Fri, 8 Mar 2024 15:31:51 +0000 (21:01 +0530)]
mds: add no counters in warning for standby-replay MDS
Don't include inode and stray counters in the health warnings printed
for standby-replay MDSs. Since these counters are present in the health
warnings only due to replay, it can confuse users, and therefore, do not
include them.
Fixes: https://tracker.ceph.com/issues/63514 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 03dcdc1329e471aa4aa403519ea5131db2f99b23)
Ilya Dryomov [Mon, 6 May 2024 06:16:01 +0000 (08:16 +0200)]
qa/workunits/rbd: wait for replaying status in bootstrap tests
wait_for_replay_complete() doesn't wait for image status to get
updated. This didn't matter previously because these tests are run on
two different pools and nothing else was following.