Jos Collin [Mon, 6 May 2024 12:47:29 +0000 (18:17 +0530)]
pybind/mgr/mirroring: Fix KeyError: 'directory_count' in daemon status
The 'directory_count' key is intermittently missing from the JSON returned by self.mgr.get_daemon_status().
This happens when m_listener.handle_mirroring_enabled() is delayed in updating
directory_count, so ServiceDaemon::update_status() creates JSON without the 'directory_count' key/value.
However, mgr/mirroring -> daemon_status() always expects the 'directory_count' key to be present in the JSON returned by
self.mgr.get_daemon_status().
The issue occurs intermittently when mirroring is enabled/disabled and 'daemon status' is checked in between.
This patch fixes the issue by setting a default value of 0 for 'directory_count' in daemon_status().
Fixes: https://tracker.ceph.com/issues/65795
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit b78baa23e562742b8bdc5a75f82e3b6fbf55a8a5)
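The defaulting behaviour described in this commit can be illustrated with a minimal sketch. This is not the actual mgr/mirroring code: the function name read_directory_count and the shape of the status JSON are assumptions made for illustration only. It only shows how a missing 'directory_count' key can be read as 0 instead of raising KeyError.

```python
import json


def read_directory_count(status_json: str) -> int:
    """Return 'directory_count' from a status JSON string, or 0 if absent.

    Hypothetical helper: the real daemon_status() works on the output of
    self.mgr.get_daemon_status(), whose exact layout is not shown here.
    """
    status = json.loads(status_json)
    # dict.get() with a default avoids KeyError when the key is missing.
    return int(status.get("directory_count", 0))


if __name__ == "__main__":
    # Key present: the stored value is used.
    print(read_directory_count('{"directory_count": 3}'))
    # Key missing (the intermittent case): fall back to 0.
    print(read_directory_count('{"state": "active"}'))
```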
Rishabh Dave [Mon, 27 May 2024 19:37:35 +0000 (01:07 +0530)]
doc/developer_guide: update doc about installing teuthology
There are two more ways to install teuthology. The approach using the
bootstrap script is easier and more convenient, while the other approach
is more elaborate but manual. Document both of them. Don't delete the
currently documented approach, because it lets users install teuthology
conveniently in a custom virtual environment. So, keep all three.
Zac Dover [Mon, 27 May 2024 11:09:40 +0000 (21:09 +1000)]
doc/cephfs: s/subvolumegroups/subvolume groups
Use the term "subvolume groups" instead of "subvolumegroups" where the
term appears in plain English. The string "subvolumegroups" is correct
in commands, and remains unchanged.
Also add formatting to command output, to make it clearer that the
output is indeed output.
Ilya Dryomov [Thu, 23 May 2024 16:15:08 +0000 (18:15 +0200)]
.github: expand tests label to all files under qa
The test job definition under qa/suites is an integral part of almost
any test. Often, the test logic is split between the task or workunit
and respective snippet(s) under qa/suites.
Other files under qa are less used, but still related to nothing but
testing, so just add the label on all of it.
Zac Dover [Mon, 20 May 2024 06:29:44 +0000 (16:29 +1000)]
doc/cephfs: separate commands into sections
Separate commands so that each command has its own subsection in the
section "FS Subvolumes" in the file doc/cephfs/fs-volumes.rst.
Previously, the list of commands for manipulating subvolumes was one
long, unbroken list and the beginning of one section could easily be
mistaken for the end of the previous section.
Zac Dover [Mon, 20 May 2024 11:55:16 +0000 (21:55 +1000)]
doc/cephfs: edit "Cloning Snapshots" in fs-volumes.rst
Edit the "Cloning Snapshots" section in doc/cephfs/fs-volumes.rst. This
commit represents only a grammar pass. A future commit (and future PR)
will separate this section into subsections by command.
Lucian Petrut [Thu, 12 Jan 2023 10:55:06 +0000 (12:55 +0200)]
qa: add ceph-rbd windows service restart test
We're adding a test that:
* maps a configurable number of images
* runs a specified test - we're reusing the ones from stress_test,
making just a few minor changes to allow running the same test
multiple times
* restarts the ceph-rbd Windows service
* waits for the images to be reconnected and refreshes the mount
information
* reruns the test
* repeats the above workflow for a specified number of times,
reusing the same images
This test ensures that:
* mounted images are still available after a service restart
* drive letters are retained
* the image content is retained
* there are no race conditions when connecting or disconnecting
a large number of images in parallel
* the driver is capable of mapping a specified number of images
simultaneously
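The workflow listed above can be sketched roughly as follows. This is not the test added by the commit: the function run_restart_workflow and its callable parameters are hypothetical stand-ins for the real mapping, test, service restart and reconnect logic, which are injected so that the sketch stays self-contained.

```python
from typing import Callable, List


def run_restart_workflow(
    map_images: Callable[[], List[str]],
    run_test: Callable[[List[str]], None],
    restart_service: Callable[[], None],
    wait_for_reconnect: Callable[[List[str]], None],
    iterations: int,
) -> int:
    """Run test, restart the service, wait, rerun; repeat on the same images.

    Returns the number of test runs performed. All callables are
    hypothetical placeholders supplied by the caller.
    """
    images = map_images()  # images are mapped once and reused
    test_runs = 0
    for _ in range(iterations):
        run_test(images)
        test_runs += 1
        restart_service()
        # Wait until the images are reconnected before rerunning the test.
        wait_for_reconnect(images)
        run_test(images)
        test_runs += 1
    return test_runs


if __name__ == "__main__":
    log: List[str] = []
    runs = run_restart_workflow(
        map_images=lambda: ["img0", "img1"],
        run_test=lambda imgs: log.append("test"),
        restart_service=lambda: log.append("restart"),
        wait_for_reconnect=lambda imgs: log.append("wait"),
        iterations=2,
    )
    print(runs, log)
```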
Lucian Petrut [Tue, 10 Jan 2023 14:50:04 +0000 (16:50 +0200)]
qa: reorganize Windows python test
We're splitting the rbd-wnbd python test into separate files so
that the common code may easily be reused by other tests. This
also makes the code easier to read and maintain.
Matthew Vernon [Wed, 22 May 2024 15:31:33 +0000 (16:31 +0100)]
doc: clarify use of location: in host spec
It wasn't clear that you can specify more than one element of the CRUSH hierarchy in a spec file, nor that it might be useful to do so (e.g. to ensure the host ends up beneath the default root).
So update the text to make it clearer, and similarly the example.
Dan Mick [Wed, 22 May 2024 22:25:51 +0000 (15:25 -0700)]
doc/dev/release-process.rst: note new 'project' arguments
Support for ceph-iscsi was added to the release scripts (from
ceph-build.git), so 'project' must be passed to these scripts
and will appear in the prerelease pathnames. See also
https://github.com/ceph/ceph-build/pull/2243 and
https://github.com/ceph/ceph-container/pull/2210
Patrick Donnelly [Thu, 23 May 2024 00:58:23 +0000 (20:58 -0400)]
Merge PR #57342 into squid
* refs/pull/57342/head:
PendingReleaseNotes: add note on the client incompatibility health warning and feature bit
doc/cephfs: add client_mds_auth_caps client feature bit
doc/cephfs: add missing client feature bits
doc/cephfs: document MDS_CLIENTS_BROKEN_ROOTSQUASH health error
qa: add tests for MDS_CLIENTS_BROKEN_ROOTSQUASH
mds: raise health warning if client lacks feature for root_squash
mon/MDSMonitor: add note about missing metadata inclusion
mds: check relevant caps for fs include root_squash
mds: refactor out fs_name match in MDSAuthCaps
qa: test for root_squash with multiple caps
qa: pass kwargs to mount from remount
qa: simplify update_attrs and only update relevant keys
client: allow overriding client features
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Patrick Donnelly [Wed, 22 May 2024 18:20:46 +0000 (14:20 -0400)]
Merge PR #57176 into squid
* refs/pull/57176/head:
mds: move drop_locks to directly after rdonly check
qa: test quiesce.block is replicated
qa: test that ceph.dir.subvolume is replicated properly
mds: add debug "lock path" command
qa: move reqid_tostr helper
qa: return run_shell process for waiters
Patrick Donnelly [Wed, 22 May 2024 18:06:45 +0000 (14:06 -0400)]
Merge PR #57203 into squid
* refs/pull/57203/head:
mds: do not try fragmenting or exporting a quiesced directory
mds: set/test ALL_LOCKED on fragment_dir request
mds: pass bypassfreezing to parent auth pin req
qa: add quiesce tests during fragmentation
qa: translate empty output from rank_tell to empty dict
qa: move reqid_tostr helper
Patrick Donnelly [Wed, 22 May 2024 18:04:24 +0000 (14:04 -0400)]
Merge PR #57013 into squid
* refs/pull/57013/head:
mds/quiesce: don't take mirrored cap-related locks on the replica
mds/quiesce: xlock the file to let clients keep their buffered writes
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
mds: raise health warning if client lacks feature for root_squash
Rather than evict all clients lacking this feature bit, raise a health error
that pushes the administrator to address it. This avoids the surprise of having
all affected clients suddenly evicted in the cluster.
Fixes: https://tracker.ceph.com/issues/65733
Fixes: 954ed30
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 66ff5c9fc8d4664f18b2fa462e96e5548c35951f)
mon/MDSMonitor: add note about missing metadata inclusion
There is a "client_count" metadata on the health warning that apparently was
intended to be used for aggregating warnings but never was. Add a TODO item for
that.
mds: check relevant caps for fs include root_squash
When denying client reconnects because the MDS caps include root_squash and the
client features do not include CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK, ensure those
caps are only for the file system the MDS is joined to.
Fixes: https://tracker.ceph.com/issues/65733
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit f79ae86f2c23388f6ecc3177764735e071998e09)
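The per-file-system check described in "mds: check relevant caps for fs include root_squash" can be sketched as below. This is a simplified model, not the MDS implementation (which is C++): the Cap class, the "*" wildcard for "any file system", and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cap:
    """Hypothetical, simplified stand-in for one MDS auth cap."""
    fs_name: str        # "*" is assumed here to mean "any file system"
    root_squash: bool


def root_squash_applies(caps: Iterable[Cap], fs_name: str) -> bool:
    """True if any cap relevant to fs_name includes root_squash."""
    return any(c.root_squash and c.fs_name in ("*", fs_name) for c in caps)


def client_is_affected(caps: Iterable[Cap], fs_name: str,
                       has_auth_caps_check: bool) -> bool:
    """A client is affected only if a relevant cap has root_squash and the
    client lacks the CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature."""
    return root_squash_applies(caps, fs_name) and not has_auth_caps_check


if __name__ == "__main__":
    caps = [Cap("fs_a", root_squash=True), Cap("fs_b", root_squash=False)]
    # root_squash is set only for fs_a, so an MDS joined to fs_b ignores it.
    print(client_is_affected(caps, "fs_a", has_auth_caps_check=False))
    print(client_is_affected(caps, "fs_b", has_auth_caps_check=False))
```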
crimson/osd/pg: SnapTrimEvent to support interrupts
SnapTrimEvent operations are scheduled from `PG::on_active_actmap()`
using a `seastar::do_until` loop. This commit changes the loop to an
`interruptor::repeat`, and SnapTrimEvent operations are now scheduled by
`start_operation_may_interrupt`.
Previously, `SnapTrimEvent::start` handled interruptions by returning
a `crimson::ct_error::eagain::make();`. Now, the errorator is directly
returned via the `snap_trim_event_ret_t` and interrupts the loop
described above.
As a result, interruptions caused by interval changes are now
supported by SnapTrimEvent.
Xuehan Xu [Wed, 21 Feb 2024 06:53:33 +0000 (14:53 +0800)]
crimson/os/seastore/transaction_manager: add the max_data_allocation_size configuration
Limit the maximum size of extents in seastore. This avoids significant
read amplification when extents are remapped and the extent integrity
check is mandatory.