Patrick Donnelly [Tue, 14 May 2024 19:28:56 +0000 (15:28 -0400)]
mds: set dispatcher order
This tries to preserve existing order but uses priorities to make it explicit
and robust to future dispatchers being added. Except:
- The beacon and metrics dispatchers have the highest priorities. This is to
ensure we process these messages before trying to acquire any expensive locks
(like mds_lock).
- The monc dispatcher also has a relatively high priority for the same reasons.
This change affects other daemons that may have ordered a dispatcher ahead
of the monc, but I cannot think of a legitimate reason to do so, nor do I see
an instance of it.
Fixes: 7fc04be9332704946ba6f0e95cfcd1afc34fc0fe
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 3291f3976459fe6c05b5f54e200bd91cf3b78d8a)
Patrick Donnelly [Tue, 14 May 2024 17:53:09 +0000 (13:53 -0400)]
mds: use regular dispatch for processing beacons
Similar to the issue with MClientMetrics, beacons should also not be handled
via fast dispatch because it's necessary to acquire Beacon::mutex. This is a
big no-no as it may block one of the Messenger threads leading to improbable
deadlocks or DoS.
Instead, use the normal dispatch where acquiring locks is okay to do.
Patrick Donnelly [Tue, 14 May 2024 18:15:21 +0000 (14:15 -0400)]
msg: add priority to dispatcher invocation order
So we can ensure that e.g. MDSRank::ms_dispatch is lowest priority so that we
do not acquire the mds_lock when looking at beacons.
This change maintains the current behavior when the priority is unset: the use
of std::stable_sort will ensure that the add_dispatcher_head and
add_dispatcher_tail calls will preserve order when dispatcher priorities are
equal.
Fixes: 7fc04be9332704946ba6f0e95cfcd1afc34fc0fe
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b463d93b08f392ebd636c24bf5f0fa4249600256)
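The ordering rule described in this commit can be illustrated with a minimal, self-contained sketch. It is not the Messenger code: the Dispatcher struct, the DispatcherRegistry class, the PRIORITY_DEFAULT value, and the convention that a larger number is invoked earlier are all assumptions made for illustration. Only the use of std::stable_sort and the add_dispatcher_head / add_dispatcher_tail names come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a dispatcher; not the Ceph Dispatcher class.
struct Dispatcher {
  std::string name;
};

// Hypothetical registry modelling the ordering behaviour described above.
class DispatcherRegistry {
public:
  using priority_t = std::uint32_t;
  // Assumed convention for this sketch: a larger value is invoked earlier.
  static constexpr priority_t PRIORITY_DEFAULT = 100;

  void add_dispatcher_head(Dispatcher* d, priority_t p = PRIORITY_DEFAULT) {
    entries.push_front(Entry{d, p});
    sort_entries();
  }

  void add_dispatcher_tail(Dispatcher* d, priority_t p = PRIORITY_DEFAULT) {
    entries.push_back(Entry{d, p});
    sort_entries();
  }

  // Names of the dispatchers in the order they would be invoked.
  std::vector<std::string> invocation_order() const {
    std::vector<std::string> names;
    for (const auto& e : entries) {
      names.push_back(e.d->name);
    }
    return names;
  }

private:
  struct Entry {
    Dispatcher* d;
    priority_t priority;
  };

  // stable_sort keeps the relative head/tail order of equal priorities,
  // so callers that never set a priority see the old behaviour.
  void sort_entries() {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const Entry& a, const Entry& b) {
                       return a.priority > b.priority;
                     });
  }

  std::deque<Entry> entries;
};
```

In this sketch, a dispatcher added at the tail with a higher priority still ends up ahead of every default-priority dispatcher, while the default-priority ones keep the order produced by their head/tail insertions.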
Dan Mick [Wed, 22 May 2024 22:25:51 +0000 (15:25 -0700)]
doc/dev/release-process.rst: note new 'project' arguments
Support was added to the release scripts (from ceph-build.git) for
ceph-iscsi, so 'project' must now be passed to these scripts,
and it will appear in the prerelease pathnames. See also
https://github.com/ceph/ceph-build/pull/2243 and
https://github.com/ceph/ceph-container/pull/2210
Patrick Donnelly [Thu, 23 May 2024 00:58:23 +0000 (20:58 -0400)]
Merge PR #57342 into squid
* refs/pull/57342/head:
PendingReleaseNotes: add note on the client incompatibility health warning and feature bit
doc/cephfs: add client_mds_auth_caps client feature bit
doc/cephfs: add missing client feature bits
doc/cephfs: document MDS_CLIENTS_BROKEN_ROOTSQUASH health error
qa: add tests for MDS_CLIENTS_BROKEN_ROOTSQUASH
mds: raise health warning if client lacks feature for root_squash
mon/MDSMonitor: add note about missing metadata inclusion
mds: check relevant caps for fs include root_squash
mds: refactor out fs_name match in MDSAuthCaps
qa: test for root_squash with multiple caps
qa: pass kwargs to mount from remount
qa: simplify update_attrs and only update relevant keys
client: allow overriding client features
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Patrick Donnelly [Wed, 22 May 2024 18:20:46 +0000 (14:20 -0400)]
Merge PR #57176 into squid
* refs/pull/57176/head:
mds: move drop_locks to directly after rdonly check
qa: test quiesce.block is replicated
qa: test that ceph.dir.subvolume is replicated properly
mds: add debug "lock path" command
qa: move reqid_tostr helper
qa: return run_shell process for waiters
Patrick Donnelly [Wed, 22 May 2024 18:06:45 +0000 (14:06 -0400)]
Merge PR #57203 into squid
* refs/pull/57203/head:
mds: do not try fragmenting or exporting a quiesced directory
mds: set/test ALL_LOCKED on fragment_dir request
mds: pass bypassfreezing to parent auth pin req
qa: add quiesce tests during fragmentation
qa: translate empty output from rank_tell to empty dict
qa: move reqid_tostr helper
Patrick Donnelly [Wed, 22 May 2024 18:04:24 +0000 (14:04 -0400)]
Merge PR #57013 into squid
* refs/pull/57013/head:
mds/quiesce: don't take mirrored cap-related locks on the replica
mds/quiesce: xlock the file to let clients keep their buffered writes
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
mds: raise health warning if client lacks feature for root_squash
Rather than evict all clients lacking this feature bit, raise a health error
that pushes the administrator to address it. This avoids the surprise of having
all affected clients suddenly evicted in the cluster.
Fixes: https://tracker.ceph.com/issues/65733
Fixes: 954ed30
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 66ff5c9fc8d4664f18b2fa462e96e5548c35951f)
mon/MDSMonitor: add note about missing metadata inclusion
There is a "client_count" metadata on the health warning that apparently was
intended to be used for aggregating warnings but never was. Add a TODO item for
that.
mds: check relevant caps for fs include root_squash
When denying client reconnects because the MDS caps include root_squash and the
client features do not include CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK, ensure those
caps are only for the file system the MDS is joined to.
Fixes: https://tracker.ceph.com/issues/65733
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit f79ae86f2c23388f6ecc3177764735e071998e09)
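The rule in this commit (only consider caps for the file system the MDS is joined to) can be sketched as follows. This is a simplified model, not the MDSAuthCaps code: the CapGrant struct, the function names, and the assumption that an empty fs_name means "applies to every file system" are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified cap grant; not the real MDS cap grant type.
struct CapGrant {
  std::string fs_name;  // assumption: empty means "any file system"
  bool root_squash;
};

// True if the grant applies to the given file system.
bool matches_fs(const CapGrant& g, const std::string& fs) {
  return g.fs_name.empty() || g.fs_name == fs;
}

// True if any grant relevant to `fs` uses root_squash.
bool root_squash_in_caps(const std::vector<CapGrant>& grants,
                         const std::string& fs) {
  for (const auto& g : grants) {
    if (matches_fs(g, fs) && g.root_squash) {
      return true;
    }
  }
  return false;
}

// A client is flagged only when a relevant grant uses root_squash and the
// client lacks the feature bit needed to enforce it.
bool client_is_broken(const std::vector<CapGrant>& grants,
                      const std::string& fs,
                      bool has_auth_caps_check_feature) {
  return root_squash_in_caps(grants, fs) && !has_auth_caps_check_feature;
}
```

With grants for two file systems where only the second uses root_squash, a client without the feature bit would be flagged by an MDS joined to the second file system but not by one joined to the first.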
Xuehan Xu [Wed, 21 Feb 2024 06:53:33 +0000 (14:53 +0800)]
crimson/os/seastore/transaction_manager: add the max_data_allocation_size configuration
Limit the maximum size of extents in seastore. This avoids significant read
amplification when remapping extents while the extent integrity check
is mandatory.
Zac Dover [Sun, 19 May 2024 00:00:29 +0000 (10:00 +1000)]
doc/cephfs: Squid and later - subvolume quiesce
Add a note to the "Subvolume quiesce" section that says that the
information in the section applies only to the Squid and later releases
of Ceph. This is included here so that I don't overwrite the Reef and
Quincy documentation with irrelevant information, and so that I don't
overwrite the Squid information with blank space where the "Subvolume
quiesce" section should be.
Nizamudeen A [Fri, 3 May 2024 08:56:19 +0000 (14:26 +0530)]
mgr/k8sevents: update V1Events to CoreV1Events
CentOS 9 only provides kubernetes 26.1.0 as a base dependency, so the
k8sevents code needs to be updated accordingly. The API change happened
in kubernetes when 19.0.0 was released.
Fixes: https://tracker.ceph.com/issues/65627
Fixes: https://tracker.ceph.com/issues/64981
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 6af964719217d720e6c2fd1ba2a607f6255d2604)
Ken Dreyer [Tue, 14 May 2024 18:53:51 +0000 (14:53 -0400)]
cmake: disable WITH_QATLIB/ZIP on non-x86
This feature is only relevant to x86 hosts.
Fixes: https://tracker.ceph.com/issues/66016
Signed-off-by: Ken Dreyer <kdreyer@ibm.com>
(cherry picked from commit 487cd2fddbab784269af9f48206a130e63f1eca3)
crimson/osd/object_context_loader: SnapTrim to not resolve_oid
SnapTrimObjSubEvent::remove_or_update partially resolves the to be
trimmed clone taking into account in_removed_snaps_queue.
The general resolve_oid is not suitable for this scenario.
Specifically the following check:
```
if (std::find(
      citer->second.begin(),
      citer->second.end(),
      oid.snap) == citer->second.end()) {
  logger().debug("{} {} does not contain {} -- DNE",
                 __func__, ss.clone_snaps, oid.snap);
  return std::nullopt;
}
```
This check is unsuitable here because of the earlier snap_map_modify call.
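A minimal sketch of why such a check can reject a clone during trimming, assuming the earlier update has already dropped the snap being trimmed from the clone's snap list. The SnapSet struct, snapid_t alias, and resolve_clone function are hypothetical simplifications, not the crimson types; only the std::find pattern mirrors the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical simplified types; not the crimson SnapSet.
using snapid_t = std::uint64_t;

struct SnapSet {
  // clone id -> snaps that the clone still covers
  std::map<snapid_t, std::vector<snapid_t>> clone_snaps;
};

// Returns the clone id if the clone exists and still lists `snap`,
// otherwise std::nullopt (the "DNE" case in the snippet above).
std::optional<snapid_t> resolve_clone(const SnapSet& ss,
                                      snapid_t clone,
                                      snapid_t snap) {
  auto citer = ss.clone_snaps.find(clone);
  if (citer == ss.clone_snaps.end()) {
    return std::nullopt;
  }
  if (std::find(citer->second.begin(),
                citer->second.end(),
                snap) == citer->second.end()) {
    return std::nullopt;
  }
  return clone;
}
```

In this model, once the snap has been removed from the clone's list, resolve_clone reports the clone as nonexistent even though the clone object itself still needs to be loaded for trimming.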
ObjectContextLoader interface provides two variants:
* with_obc:
// Use this variant by default
// If oid is a clone object, the clone obc *and* its
// matching head obc will be locked and can be used in func.
* with_clone_obc_only:
// Use this variant in the case where the head object
// obc is already locked and only the clone obc is needed.
// Avoid nesting with_head_obc() calls by using with_clone_obc()
// with an already locked head.
The with_clone_obc_direct variant is equivalent to with_obc on a clone obc,
since both the head and the clone obcs will be locked and can be used.
crimson/osd/object_context_loader: fix with_clone_obc on resolve_oid case
Resolve_oid on a clone object may actually return the head:
```
// Because oid.snap > ss.seq, we are trying to read from a snapshot
// taken after the most recent write to this object. Read from head.
```
In this case, with_clone_obc should apply `func` the same way with_head_obc would have.
Note: previously, with_clone_obc_only was called on the resolved head object.
While it didn't cause any errors, using the head_obc as clone is wrong.
At present, if a transaction gets interrupted right after it enters
WritePipeline::ReserveProjectedUsage and before any later continuations
get executed, WritePipeline::ReserveProjectedUsage will be locked
forever.