mon: validate everybody understands MSR on set-require-min-compat-client
Unit testing
------------
```
[rzarzynski@o06 build]$ bin/unittest_features
...
[ RUN ] features.release_features
1 argonaut features 0x40000 looks like argonaut
2 bobtail features 0x40000 looks like argonaut
3 cuttlefish features 0x40000 looks like argonaut
4 dumpling features 0x42040000 looks like dumpling
5 emperor features 0x42040000 looks like dumpling
6 firefly features 0x20842040000 looks like firefly
7 giant features 0x20842040000 looks like firefly
8 hammer features 0x1020842040000 looks like hammer
9 infernalis features 0x1020842040000 looks like hammer
10 jewel features 0x401020842040000 looks like jewel
11 kraken features 0xc01020842040000 looks like kraken
12 luminous features 0xe01020842240000 looks like luminous
13 mimic features 0xe01020842240000 looks like luminous
14 nautilus features 0xe01020842240000 looks like luminous
15 octopus features 0xe01020842240000 looks like luminous
16 pacific features 0xe01020842240000 looks like luminous
17 quincy features 0xe01020842240000 looks like luminous
18 reef features 0xe010208d2240000 looks like reef
19 squid features 0xe010248d2240000 looks like squid
[ OK ] features.release_features (0 ms)
```
Manual testing
--------------
### `reef` client present in `squid` cluster
```
[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client
client.?	4540701547738038271	3f03cffffffdffff
client.?	4540138322906710015	3f01cfbffffdffff
[rzarzynski@o06 build]$ bin/ceph osd get-require-min-compat-client
luminous
[rzarzynski@o06 build]$ bin/ceph osd set-require-min-compat-client squid
Error EPERM: cannot set require_min_compat_client to squid: 1 connected client(s) look like reef (missing 0x4000000000); add --yes-i-really-mean-it to do it anyway
```
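The check exercised above comes down to bitmask arithmetic. Below is a minimal Python sketch under a simplified model: each release's required features are the mask printed by `unittest_features`, and a client "looks like" the newest release whose mask is fully contained in its features. The helper names are hypothetical, and this is not the monitor's C++ code.

```python
from typing import List, Optional, Tuple

# Feature masks copied from the unittest_features output above (subset),
# ordered oldest to newest.
RELEASE_MASKS = [
    ("luminous", 0xE01020842240000),
    ("reef", 0xE010208D2240000),
    ("squid", 0xE010248D2240000),
]


def missing_features(required: int, features: int) -> int:
    """Bits the release requires that the connection does not advertise."""
    return required & ~features


def looks_like(features: int) -> Optional[str]:
    """Newest release whose whole mask is contained in `features` (hypothetical helper)."""
    match = None
    for release, mask in RELEASE_MASKS:
        if missing_features(mask, features) == 0:
            match = release
    return match


def blocking_clients(target: str, sessions: List[int]) -> List[Tuple[Optional[str], int]]:
    """(looks-like release, missing bits) for every client that lacks target's bits."""
    required = dict(RELEASE_MASKS)[target]
    return [
        (looks_like(f), missing_features(required, f))
        for f in sessions
        if missing_features(required, f)
    ]


# con_features_hex values from the `ceph daemon mon.a sessions` output above
clients = [0x3F03CFFFFFFDFFFF, 0x3F01CFBFFFFDFFFF]
for release, missing in blocking_clients("squid", clients):
    print(f"client looks like {release} (missing {missing:#x})")
```

Run as-is, it prints `client looks like reef (missing 0x4000000000)`. These are the same bits that the EPERM message reports.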
mon: validate SERVER_REEF on set-require-min-compat-client
Unit testing
-------------
```
[rzarzynski@o06 build]$ bin/unittest_features
...
[ RUN ] features.release_features
1 argonaut features 0x40000 looks like argonaut
2 bobtail features 0x40000 looks like argonaut
3 cuttlefish features 0x40000 looks like argonaut
4 dumpling features 0x42040000 looks like dumpling
5 emperor features 0x42040000 looks like dumpling
6 firefly features 0x20842040000 looks like firefly
7 giant features 0x20842040000 looks like firefly
8 hammer features 0x1020842040000 looks like hammer
9 infernalis features 0x1020842040000 looks like hammer
10 jewel features 0x401020842040000 looks like jewel
11 kraken features 0xc01020842040000 looks like kraken
12 luminous features 0xe01020842240000 looks like luminous
13 mimic features 0xe01020842240000 looks like luminous
14 nautilus features 0xe01020842240000 looks like luminous
15 octopus features 0xe01020842240000 looks like luminous
16 pacific features 0xe01020842240000 looks like luminous
17 quincy features 0xe01020842240000 looks like luminous
18 reef features 0xe010208d2240000 looks like reef
19 squid features 0xe010208d2240000 looks like reef
[ OK ] features.release_features (0 ms)
```
Manual testing
--------------
### `quincy` client connected to `main` cluster
A `ceph -w` from `quincy` was running in the background.
```
[rzarzynski@o06 build]$ bin/ceph osd set-require-min-compat-client reef
Error EPERM: cannot set require_min_compat_client to reef: 1 connected client(s) look like luminous (missing 0x80000000); add --yes-i-really-mean-it to do it anyway
```
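In this commit, the `squid` row still reports the same mask as `reef`, so the `SERVER_REEF`-era bits are the newest ones being validated. The small Python sketch below uses masks copied from the output above and computes which bits each release adds over its predecessor. The helper name is hypothetical.

```python
# Masks copied from the unittest_features output of this commit.
MASKS = {
    "quincy": 0xE01020842240000,
    "reef": 0xE010208D2240000,
    "squid": 0xE010208D2240000,
}


def added_bits(older: str, newer: str) -> int:
    """Feature bits required by `newer` that `older` does not require (hypothetical helper)."""
    return MASKS[newer] & ~MASKS[older]


print(f"{added_bits('quincy', 'reef'):#x}")  # bits reef adds over quincy
print(f"{added_bits('reef', 'squid'):#x}")   # squid adds nothing yet
```

The `0x80000000` reported for the `quincy` client above falls within the bits that `reef` adds.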
Rishabh Dave [Thu, 16 May 2024 07:00:49 +0000 (12:30 +0530)]
qa/cephfs: block buggy tests in test_admin.py
Temporarily block test_idem_unaffected_root_squash and
test_multifs_single_path_rootsquash.
These tests fail due to a known bug. Block them temporarily so that
test_admin.py can run fully and PRs under QA can be fully tested.
Otherwise, their failure halts test_admin.py, which leaves the PR
partially untested.
Such a failure is then dismissed as unrelated, which lets the buggy
code get merged. This has happened recently.
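One common way to block a test temporarily in Python's `unittest` is the `skip` decorator. It reports the test as skipped instead of failing the run. The class below is a hypothetical stand-in, not the real `test_admin.py`, and whether the commit uses this exact mechanism is an assumption.

```python
import unittest


class TestAdminSketch(unittest.TestCase):
    """Hypothetical stand-in for the test class in test_admin.py."""

    @unittest.skip("blocked temporarily: fails due to a known bug")
    def test_idem_unaffected_root_squash(self):
        raise AssertionError("known bug")  # never executed while skipped

    @unittest.skip("blocked temporarily: fails due to a known bug")
    def test_multifs_single_path_rootsquash(self):
        raise AssertionError("known bug")  # never executed while skipped

    def test_unrelated(self):
        # Remaining tests still run, so the rest of the PR stays covered.
        self.assertTrue(True)


def run_suite() -> unittest.TestResult:
    """Run the sketch suite and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdminSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result
```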
Rishabh Dave [Thu, 16 May 2024 16:30:01 +0000 (22:00 +0530)]
qa/cephfs: add MDS_CLIENTS_BROKEN_ROOTSQUASH to ignorelist
MDS_CLIENTS_BROKEN_ROOTSQUASH is generated and expected by
test_rootsquash_nofeature, but it hasn't been added to the ignorelist. As a
result, QA code marks the job as failed even though all tests
finished successfully.
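The following is a rough Python model of what the ignorelist does. It assumes that entries are regular expressions matched against cluster log lines and that any unmatched warning fails the job. The sample log line and the helper name are hypothetical.

```python
import re

# Hypothetical ignorelist: entries are regexes; the health code appears in parentheses.
IGNORELIST = [r"\(MDS_CLIENTS_BROKEN_ROOTSQUASH\)"]


def unexpected_warnings(log_lines, ignorelist):
    """Return the warning lines that no ignorelist entry matches (hypothetical helper)."""
    return [
        line
        for line in log_lines
        if not any(re.search(pattern, line) for pattern in ignorelist)
    ]


# Hypothetical cluster log line raised by test_rootsquash_nofeature.
log = ["cluster [WRN] Health check failed: example warning (MDS_CLIENTS_BROKEN_ROOTSQUASH)"]

print(unexpected_warnings(log, []))          # non-empty: the job would be marked failed
print(unexpected_warnings(log, IGNORELIST))  # empty: the warning is expected
```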
Introduced-by: bccc8ceb471c441ec04d7eb2c353630f8c5ce843
Fixes: https://tracker.ceph.com/issues/66075
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Tue, 7 May 2024 14:50:55 +0000 (20:20 +0530)]
qa/cephfs: set joinable on FS before exiting tests in TestFSFail
After running TestFSFail, CephFSTestCase.tearDown() fails when attempting
to unmount CephFS. Set joinable on the FS and wait for the MDS to be up
before exiting the test. This ensures that unmounting succeeds in
teardown.
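The fix restores the file system and waits until an MDS is up before teardown. Below is a generic Python sketch of such a bounded wait. `set_joinable` and `mds_is_up` are hypothetical callables standing in for whatever the test framework provides (for example, running `ceph fs set <fs_name> joinable true`).

```python
import time


def restore_and_wait(set_joinable, mds_is_up, timeout=60.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Mark the FS joinable again, then poll until an MDS is up or time runs out.

    Both callables are hypothetical placeholders for the test framework's helpers.
    """
    set_joinable()
    deadline = clock() + timeout
    while not mds_is_up():
        if clock() >= deadline:
            raise TimeoutError("MDS did not come up before teardown")
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the sketch testable without real delays.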
Fixes: https://tracker.ceph.com/issues/65841
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Since dumping ops with --flags=locks takes the mds_lock and dumps thousands
of ops, it may take a long time to complete on each individual MDS. The entire
quiesce set may time out (and all quiesce ops get killed) before we finish
dumping ops.
Fixes: https://tracker.ceph.com/issues/65823
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Thu, 16 May 2024 10:40:58 +0000 (12:40 +0200)]
common/options: link to mon_osd_blocklist_default_expire from RBD
"number of seconds to blocklist - set to 0 for OSD default" in the
description of rbd_blocklist_expire_seconds refers to the value that is
controlled by mon_osd_blocklist_default_expire.
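Below is a tiny Python sketch of the relationship the description points to. It assumes that `0` simply defers to the monitor-side default. The function is hypothetical, and the values in any call are placeholders, not the options' real defaults.

```python
def effective_blocklist_expire(rbd_blocklist_expire_seconds: int,
                               mon_osd_blocklist_default_expire: float) -> float:
    """Seconds an RBD client's blocklist entry lasts under this (hypothetical) model."""
    if rbd_blocklist_expire_seconds == 0:
        # "set to 0 for OSD default": fall back to the monitor option's value
        return mon_osd_blocklist_default_expire
    return rbd_blocklist_expire_seconds
```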
We currently run regular ceph-dencoder tests for backward compatibility,
but we do not test forward compatibility.
This suite introduces tests against the ceph-objects-corpus to address forward
compatibility issues that may arise.
The script installs the N-2 version and runs it against the latest-version
corpus objects that we have, then installs versions N-1 through N and checks
them as well.
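The test order described above can be expressed as a list of (installed version, corpus version) pairs. This is a hypothetical Python model of the plan only. It does not install packages or invoke `ceph-dencoder`, and the release names are placeholders.

```python
def forward_compat_plan(releases):
    """Pair each of the last three releases (N-2, N-1, N) with the newest corpus.

    Forward compatibility means older code must decode objects encoded by newer code.
    `releases` is a hypothetical oldest-to-newest list of release names.
    """
    if len(releases) < 3:
        raise ValueError("need at least N-2, N-1 and N")
    latest = releases[-1]
    return [(installed, latest) for installed in releases[-3:]]


print(forward_compat_plan(["pacific", "quincy", "reef", "squid"]))
```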
Patrick Donnelly [Thu, 16 May 2024 03:01:16 +0000 (23:01 -0400)]
Merge PR #57454 into main
* refs/pull/57454/head:
mds/quiesce-db: optimize peer updates
mds/quiesce-db: track db epoch separately from the membership epoch
mds/quiesce-db: test that a peer on a newer membership epoch can ack a root
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Lucian Petrut [Thu, 12 Jan 2023 10:55:06 +0000 (12:55 +0200)]
qa: add ceph-rbd windows service restart test
We're adding a test that:
* maps a configurable number of images
* runs a specified test - we're reusing the ones from stress_test,
making just a few minor changes to allow running the same test
multiple times
* restarts the ceph-rbd Windows service
* waits for the images to be reconnected and refreshes the mount
information
* reruns the test
* repeats the above workflow a specified number of times,
reusing the same images
This test ensures that:
* mounted images are still available after a service restart
* drive letters are retained
* the image content is retained
* there are no race conditions when connecting or disconnecting
a large number of images in parallel
* the driver is capable of mapping a specified number of images
simultaneously
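The workflow above can be outlined as a loop. This is a hypothetical Python outline. `map_images`, `run_test`, `restart_service`, and `wait_for_reconnect` are placeholder callables, not functions from the actual qa code.

```python
def service_restart_test(map_images, run_test, restart_service,
                         wait_for_reconnect, image_count, iterations):
    """Hypothetical outline of the restart workflow; all callables are placeholders."""
    # Map the images once; the same images are reused for every iteration.
    images = map_images(image_count)
    for _ in range(iterations):
        run_test(images)
        restart_service()
        # Wait for reconnection and refresh mount info (e.g. drive letters).
        refreshed = wait_for_reconnect(images)
        if refreshed != images:
            raise AssertionError("mapped images or drive letters changed after restart")
        run_test(images)
    return images
```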
Lucian Petrut [Tue, 10 Jan 2023 14:50:04 +0000 (16:50 +0200)]
qa: reorganize Windows python test
We're splitting the rbd-wnbd python test into separate files so
that the common code may easily be reused by other tests. This
also makes the code easier to read and maintain.
Nizamudeen A [Fri, 3 May 2024 08:56:19 +0000 (14:26 +0530)]
mgr/k8sevents: update V1Events to CoreV1Events
CentOS 9 only provides kubernetes 26.1.0 as a base dependency, so the
k8sevents code needs to be updated accordingly. The API changes landed
in the kubernetes 19.0.0 release.
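If code had to tolerate both old and new client packages, one option would be to look up the class by name. The Python sketch below assumes only that newer `kubernetes` Python clients expose `CoreV1Event` and older ones `V1Event`, as the commit describes. The helper is hypothetical; the commit itself simply switches to the new names.

```python
from types import SimpleNamespace


def resolve_event_class(client_module):
    """Return the core/v1 Event model class from a kubernetes client module (hypothetical helper)."""
    for name in ("CoreV1Event", "V1Event"):  # prefer the newer name
        cls = getattr(client_module, name, None)
        if cls is not None:
            return cls
    raise ImportError("kubernetes client exposes neither CoreV1Event nor V1Event")


# Stand-ins for new and old client modules, so the sketch runs without kubernetes.
new_client = SimpleNamespace(CoreV1Event=type("CoreV1Event", (), {}))
old_client = SimpleNamespace(V1Event=type("V1Event", (), {}))
```

With the real package installed, this would be called as `resolve_event_class(kubernetes.client)`.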
Fixes: https://tracker.ceph.com/issues/65627
Fixes: https://tracker.ceph.com/issues/64981
Signed-off-by: Nizamudeen A <nia@redhat.com>
Patrick Donnelly [Wed, 15 May 2024 00:19:30 +0000 (20:19 -0400)]
Merge PR #57453 into main
* refs/pull/57453/head:
doc: add status badge for backport creation
.github: use shorter name for backport tracker action
.github: document where runs/output can be examined
Leonid Usov [Mon, 13 May 2024 21:10:04 +0000 (00:10 +0300)]
mds/quiesce-db: track db epoch separately from the membership epoch
Tracking the db epoch separately will make sure that replicas
only follow the leader's epoch choice, even if they are already on
the new membership epoch. This eliminates races due to the
random order of mdsmap updates.
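Below is a toy Python model of the idea, not the quiesce-db C++ implementation. The class and method names are hypothetical. Membership updates never move the db epoch, which is taken only from the leader.

```python
from dataclasses import dataclass


@dataclass
class QuiesceReplica:
    """Hypothetical replica state: membership and db epochs are tracked separately."""
    membership_epoch: int = 0
    db_epoch: int = 0

    def on_mdsmap(self, epoch: int) -> None:
        # mdsmap updates may arrive in any order relative to leader messages;
        # they advance the membership epoch only and never choose a db epoch.
        self.membership_epoch = max(self.membership_epoch, epoch)

    def on_leader_update(self, db_epoch: int) -> bool:
        # The db epoch is only ever taken from the leader; older ones are ignored.
        if db_epoch < self.db_epoch:
            return False
        self.db_epoch = db_epoch
        return True
```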
Fixes: https://tracker.ceph.com/issues/65977
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Ilya Dryomov [Sun, 12 May 2024 09:15:36 +0000 (11:15 +0200)]
qa/suites/krbd: drop pre-single-major test
Single-major mapping scheme was introduced in 2014 and became the
default in 2017. It's getting increasingly difficult to build and,
more importantly, to boot a 10-year-old kernel with recent userspace
(systemd, etc.). If someone is still running such a kernel, it's
really unlikely that they would have the most recent rbd CLI tool
installed.
This commit changes the default images for both loki and promtail
containers.
To allow this update, we also need to change the Loki configuration
to add a new storage schema configuration:
* tests were passing only because they were not performing their asserts
* tests are now separated with their own attribute
* their topics are now marked "persistent" to work around the issue in:
https://tracker.ceph.com/issues/65645