Patrick Donnelly [Thu, 23 May 2024 01:16:32 +0000 (21:16 -0400)]
Merge PR #57469 into main
* refs/pull/57469/head:
mds: set dispatcher order
mds: use regular dispatch for processing beacons
msg: add priority to dispatcher invocation order
mds: note when dispatcher is called
Patrick Donnelly [Thu, 23 May 2024 01:15:07 +0000 (21:15 -0400)]
Merge PR #57215 into main
* refs/pull/57215/head:
doc: document new --output-file switch
test/cli: ignore tmp_file_template
qa/workunits: add --output-file test in cephtool workunit
common,ceph: add output file switch to dump json to
common/options: add configs for temporary files made by daemons
common/Formatter: write the pending string on flush
Reviewed-by: Leonid Usov <leonid.usov@ibm.com> Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Dan Mick [Wed, 22 May 2024 22:25:51 +0000 (15:25 -0700)]
doc/dev/release-process.rst: note new 'project' arguments
Support added to the release scripts (from ceph-build.git) to
work for ceph-iscsi, so 'project' must be passed to these scripts,
and will appear in the prerelease pathnames. See also
https://github.com/ceph/ceph-build/pull/2243 and
https://github.com/ceph/ceph-container/pull/2210
Patrick Donnelly [Tue, 21 May 2024 02:38:44 +0000 (22:38 -0400)]
Merge PR #57332 into main
* refs/pull/57332/head:
mds/quiesce: drop remote authpins before waiting for the quiesce lock
qa/cephfs/test_quiesce: test proper handling of remote authpins
mds: don't clear `AUTHPIN_FROZEN` until `FROZEN` in rename_prep
mds: enhance the `lock path` asok command
mds/quiesce: overdrive fragmenting that's still freezing
revert: mds: provide a mechanism to authpin while freezing
qa/cephfs/test_quiesce: enhance the fragmentation test
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Mon, 20 May 2024 23:47:22 +0000 (07:47 +0800)]
cmake: link rados_snap_set_diff_obj and krbd against legacy-option-headers
in c24a6ffe20, we tried to link all target dependent on legacy option
headers against legacy-option-headers, but we missed some of them.
in our CI, we spotted build failure like:
```
FAILED: src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o
/usr/bin/ccache /usr/bin/clang++-14 -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_HAS_IO_URING -DBOOST_ASIO_NO_TS_EXECUTORS -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/jenkins-build/build/workspace/ceph-api/build/src/include -I/home/jenkins-build/build/workspace/ceph-api/src -isystem /opt/ceph/include -isystem /home/jenkins-build/build/workspace/ceph-api/build/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/api/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/ext/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/sdk/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/xxHash -isystem /home/jenkins-build/build/workspace/ceph-api/src/fmt/include -g -Werror -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -DBOOST_PHOENIX_STL_TUPLE_H_ -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wno-inconsistent-missing-override -Wno-mismatched-tags -Wno-unused-private-field -Wno-address-of-packed-member -Wno-unused-function -Wno-unused-local-typedef -Wno-varargs -Wno-gnu-designator -Wno-missing-braces -Wno-parentheses -Wno-deprecated-register -DCEPH_DEBUG_MUTEX -D_GLIBCXX_ASSERTIONS -fdiagnostics-color=auto -std=c++20 -MD -MT src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o -MF src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o.d -o src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o -c /home/jenkins-build/build/workspace/ceph-api/src/librados/snap_set_diff.cc
In file included from /home/jenkins-build/build/workspace/ceph-api/src/librados/snap_set_diff.cc:7:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_proxy.h:6:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-api/src/common/options/legacy_config_opts.h:7:10: fatal error: 'osd_legacy_options.h' file not found
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[111/1748] Generating immutable-object-cache_options.cc, ../../../include/immutable-object-cache_legacy_options.h
[112/1748] Building CXX object src/CMakeFiles/krbd.dir/krbd.cc.o
FAILED: src/CMakeFiles/krbd.dir/krbd.cc.o
/usr/bin/ccache /usr/bin/clang++-14 -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_HAS_IO_URING -DBOOST_ASIO_NO_TS_EXECUTORS -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/jenkins-build/build/workspace/ceph-api/build/src/include -I/home/jenkins-build/build/workspace/ceph-api/src -isystem /opt/ceph/include -isystem /home/jenkins-build/build/workspace/ceph-api/build/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/api/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/ext/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/sdk/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/xxHash -isystem /home/jenkins-build/build/workspace/ceph-api/src/fmt/include -g -Werror -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -DBOOST_PHOENIX_STL_TUPLE_H_ -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wno-inconsistent-missing-override -Wno-mismatched-tags -Wno-unused-private-field -Wno-address-of-packed-member -Wno-unused-function -Wno-unused-local-typedef -Wno-varargs -Wno-gnu-designator -Wno-missing-braces -Wno-parentheses -Wno-deprecated-register -DCEPH_DEBUG_MUTEX -D_GLIBCXX_ASSERTIONS -fdiagnostics-color=auto -std=c++20 -MD -MT src/CMakeFiles/krbd.dir/krbd.cc.o -MF src/CMakeFiles/krbd.dir/krbd.cc.o.d -o src/CMakeFiles/krbd.dir/krbd.cc.o -c /home/jenkins-build/build/workspace/ceph-api/src/krbd.cc
In file included from /home/jenkins-build/build/workspace/ceph-api/src/krbd.cc:44:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/mon/MonMap.h:28:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/mon/mon_types.h:20:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/include/Context.h:19:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/dout.h:29:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_proxy.h:6:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-api/src/common/options/legacy_config_opts.h:11:10: fatal error: 'rgw_legacy_options.h' file not found
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
so in this change, we link the related targets to
`legacy-option-headers` as well to fulfill the build dependency.
Leonid Usov [Sat, 11 May 2024 14:00:21 +0000 (17:00 +0300)]
mds: enhance the `lock path` asok command
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return the RC that represents the operation result. 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks
Leonid Usov [Thu, 9 May 2024 01:39:12 +0000 (04:39 +0300)]
mds/quiesce: overdrive fragmenting that's still freezing
Quiesce requires revocation of capabilities,
which is not working for a freezing/frozen nodes.
Since it is best effort, abort an ongoing fragmenting
for the sake of a faster quiesce.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com> Fixes: https://tracker.ceph.com/issues/65716
Leonid Usov [Sun, 12 May 2024 16:19:34 +0000 (19:19 +0300)]
revert: mds: provide a mechanism to authpin while freezing
This is a functional revert of a9964a7ccc4394f923fb0f1c76eb8fa03fe8733d
git revert was giving too many conflicts, as the code has changed
too much since the original commit.
The bypass freezing mechanism lead us into several deadlocks,
and when we found out that a freezing inode defers reclaiming
client caps, we realized that we needed to try a different approach.
This commit removes the bypass freezing related changes to clear way
for a different approach to resolving the conflict between quiesce
and freezing.
Rishabh Dave [Thu, 16 May 2024 16:57:10 +0000 (22:27 +0530)]
src/ptl-tool: allow not pushing branch to ceph-ci
Sometimes we need a branch but don't want to launch builds on shaman for
it. For such cases, provide an option that allows not pushing the branch
to ceph-ci.
Creating a branch that'll only be passed to "teuthology-suite" option
"--suite-branch" is an example of such a case.
jiawd [Fri, 12 Nov 2021 03:48:56 +0000 (03:48 +0000)]
osd: full-object read crc is mismatch, because truncate modify oi.size and forget to clear data_digest
when write before truncate, need trim length, if truncate is to 0,
write is [0~128k], write change to [0~0], do nothing, oi.size is 0, x1 = set_data_digest(crc32(-1)).
write is [128k~128k], write change to [128k~0], truncate oi.size to offset 128k, x2 = set_data_digest(crc32(x1)).
write is [256k~128k], write change to [256k~0], truncate oi.size to offset 256k, x3 = set_data_digest(crc32(x2)).
...
write is [4063232~128k], write change to [4063232~0], truncate oi.size to offset 4063232, xn = set_data_digest(crs32(xn-1))
Now, we can see oi.size is 4063232, and data_digest is 0xffffffff, because thelength of in_data of crc is 0 every time.
when read verify crc will reply EIO. (EC pool).
so, when truncate in write, need clear data_digest and DIGEST flag,
when write before truncate, need to trim length, when offset over than oi.size, don't truncate oi.size to offset.
Zac Dover [Sun, 19 May 2024 00:00:29 +0000 (10:00 +1000)]
doc/cephfs: Squid and later - subvolume quiesce
Add a note to the "Subvolume quiesce" section that says that the
information in the section applies only to the Squid and later releases
of Ceph. This is included here so that I don't overwrite the Reef and
Quincy documentation with irrelevant information, and so that I don't
overwrite the Squid information with blank space where the "Subvolume
quiesce" section should be.
Rishabh Dave [Thu, 16 May 2024 07:00:49 +0000 (12:30 +0530)]
qa/cephfs: block buggy tests in test_admin.py
Block test_idem_unaffected_root_squash temporarily and
test_multifs_single_path_rootsquash.
This test fails due to a known bug. Block it temporarily so that
test_admin.py can run fully and PRs under QA can be tested fully.
Otherwise, this test fails and that halts test_admin.py, which leaves
the PR partially untested.
This failure is then seen as an unrelated failure which lets the buggy
code get merged. This has happened recently.
Rishabh Dave [Thu, 16 May 2024 16:30:01 +0000 (22:00 +0530)]
qa/cephfs: add MDS_CLIENTS_BROKEN_ROOTSQUASH to ignorelist
MDS_CLIENTS_BROKEN_ROOTSQUASH is generated and expected by
test_rootsquash_nofeature but it hasn't be added to ignorelist as a
result of which QA code marks the job as failed even though all tests
finished running successfully.
Introduced-by: bccc8ceb471c441ec04d7eb2c353630f8c5ce843 Fixes: https://tracker.ceph.com/issues/66075 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Tue, 7 May 2024 14:50:55 +0000 (20:20 +0530)]
qa/cephfs: set joinable on FS before exiting tests in TestFSFail
After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.
Fixes: https://tracker.ceph.com/issues/65841 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Since this --flags=locks takes the mds_lock and dumps thousands of ops, this
may take a long time to complete for each individual MDS. The entire quiesce
set may timeout (and all q ops killed) before we finish dumping ops.
Fixes: https://tracker.ceph.com/issues/65823 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Thu, 16 May 2024 10:40:58 +0000 (12:40 +0200)]
common/options: link to mon_osd_blocklist_default_expire from RBD
"number of seconds to blocklist - set to 0 for OSD default" in the
description of rbd_blocklist_expire_seconds refers to the value that is
controlled by mon_osd_blocklist_default_expire.
We are currently conducting regular ceph-dencoder tests for backward compatibility.
However, we are omitting tests for forward compatibility.
This suite will introduce tests against the ceph-objects-corpus to address forward
compatibility issues that may arise.
the script will install N-2 version and run against the latest version corpus objects
that we have, then install N-1 to N version and check them as well.
Patrick Donnelly [Thu, 16 May 2024 03:01:16 +0000 (23:01 -0400)]
Merge PR #57454 into main
* refs/pull/57454/head:
mds/quiesce-db: optimize peer updates
mds/quiesce-db: track db epoch separately from the membership epoch
mds/quiesce-db: test that a peer on a newer membership epoch can ack a root
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Tue, 14 May 2024 19:28:56 +0000 (15:28 -0400)]
mds: set dispatcher order
This tries to preserve existing order but uses priorities to make it explicit
and robust to future dispatchers being added. Except:
- The beacon and metrics dispatcher have the highest priorities. This is to
ensure we process these messages before trying to acquire any expensive locks
(like mds_lock).
- The monc dispatcher also has a relatively high priority for the same reasons.
This change affects other daemons which may have ordered a dispatcher ahead
of the monc but I cannot think of a legitimate reason to nor do I see an
instance of it.
Fixes: 7fc04be9332704946ba6f0e95cfcd1afc34fc0fe Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Tue, 14 May 2024 17:53:09 +0000 (13:53 -0400)]
mds: use regular dispatch for processing beacons
Similar to the issue with MClientMetrics, beacons should also not be handled
via fast dispatch because it's necessary to acquire Beacon::mutex. This is a
big no-no as it may block one of the Messenger threads leading to improbable
deadlocks or DoS.
Instead, use the normal dispatch where acquiring locks is okay to do.
Fixes: 7fc04be9332704946ba6f0e95cfcd1afc34fc0fe
See-also: linux.git/f7c2f4f6ce16fb58f7d024f3e1b40023c4b43ff9 Fixes: https://tracker.ceph.com/issues/65658 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>