Patrick Donnelly [Thu, 23 May 2024 01:16:32 +0000 (21:16 -0400)]
Merge PR #57469 into main
* refs/pull/57469/head:
mds: set dispatcher order
mds: use regular dispatch for processing beacons
msg: add priority to dispatcher invocation order
mds: note when dispatcher is called
Patrick Donnelly [Thu, 23 May 2024 01:15:07 +0000 (21:15 -0400)]
Merge PR #57215 into main
* refs/pull/57215/head:
doc: document new --output-file switch
test/cli: ignore tmp_file_template
qa/workunits: add --output-file test in cephtool workunit
common,ceph: add output file switch to dump json to
common/options: add configs for temporary files made by daemons
common/Formatter: write the pending string on flush
Reviewed-by: Leonid Usov <leonid.usov@ibm.com> Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Dan Mick [Wed, 22 May 2024 22:25:51 +0000 (15:25 -0700)]
doc/dev/release-process.rst: note new 'project' arguments
Support added to the release scripts (from ceph-build.git) to
work for ceph-iscsi, so 'project' must be passed to these scripts,
and will appear in the prerelease pathnames. See also
https://github.com/ceph/ceph-build/pull/2243 and
https://github.com/ceph/ceph-container/pull/2210
Matthew Vernon [Wed, 22 May 2024 15:31:33 +0000 (16:31 +0100)]
doc: clarify use of location: in host spec
It wasn't clear that you can specify more than one element of the CRUSH hierarchy in a spec file, nor that it might be useful to do so (e.g. to ensure the host ends up beneath the default root).
So update the text to make it clearer, and similarly the example.
Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>
Patrick Donnelly [Tue, 21 May 2024 02:38:44 +0000 (22:38 -0400)]
Merge PR #57332 into main
* refs/pull/57332/head:
mds/quiesce: drop remote authpins before waiting for the quiesce lock
qa/cephfs/test_quiesce: test proper handling of remote authpins
mds: don't clear `AUTHPIN_FROZEN` until `FROZEN` in rename_prep
mds: enhance the `lock path` asok command
mds/quiesce: overdrive fragmenting that's still freezing
revert: mds: provide a mechanism to authpin while freezing
qa/cephfs/test_quiesce: enhance the fragmentation test
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Mon, 20 May 2024 23:47:22 +0000 (07:47 +0800)]
cmake: link rados_snap_set_diff_obj and krbd against legacy-option-headers
in c24a6ffe20, we tried to link all target dependent on legacy option
headers against legacy-option-headers, but we missed some of them.
in our CI, we spotted build failure like:
```
FAILED: src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o
/usr/bin/ccache /usr/bin/clang++-14 -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_HAS_IO_URING -DBOOST_ASIO_NO_TS_EXECUTORS -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/jenkins-build/build/workspace/ceph-api/build/src/include -I/home/jenkins-build/build/workspace/ceph-api/src -isystem /opt/ceph/include -isystem /home/jenkins-build/build/workspace/ceph-api/build/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/api/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/ext/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/sdk/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/xxHash -isystem /home/jenkins-build/build/workspace/ceph-api/src/fmt/include -g -Werror -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -DBOOST_PHOENIX_STL_TUPLE_H_ -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wno-inconsistent-missing-override -Wno-mismatched-tags -Wno-unused-private-field -Wno-address-of-packed-member -Wno-unused-function -Wno-unused-local-typedef -Wno-varargs -Wno-gnu-designator -Wno-missing-braces -Wno-parentheses -Wno-deprecated-register -DCEPH_DEBUG_MUTEX -D_GLIBCXX_ASSERTIONS -fdiagnostics-color=auto -std=c++20 -MD -MT src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o -MF src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o.d -o src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o -c /home/jenkins-build/build/workspace/ceph-api/src/librados/snap_set_diff.cc
In file included from /home/jenkins-build/build/workspace/ceph-api/src/librados/snap_set_diff.cc:7:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_proxy.h:6:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-api/src/common/options/legacy_config_opts.h:7:10: fatal error: 'osd_legacy_options.h' file not found
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[111/1748] Generating immutable-object-cache_options.cc, ../../../include/immutable-object-cache_legacy_options.h
[112/1748] Building CXX object src/CMakeFiles/krbd.dir/krbd.cc.o
FAILED: src/CMakeFiles/krbd.dir/krbd.cc.o
/usr/bin/ccache /usr/bin/clang++-14 -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_HAS_IO_URING -DBOOST_ASIO_NO_TS_EXECUTORS -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/jenkins-build/build/workspace/ceph-api/build/src/include -I/home/jenkins-build/build/workspace/ceph-api/src -isystem /opt/ceph/include -isystem /home/jenkins-build/build/workspace/ceph-api/build/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/api/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/ext/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/sdk/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/xxHash -isystem /home/jenkins-build/build/workspace/ceph-api/src/fmt/include -g -Werror -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -DBOOST_PHOENIX_STL_TUPLE_H_ -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wno-inconsistent-missing-override -Wno-mismatched-tags -Wno-unused-private-field -Wno-address-of-packed-member -Wno-unused-function -Wno-unused-local-typedef -Wno-varargs -Wno-gnu-designator -Wno-missing-braces -Wno-parentheses -Wno-deprecated-register -DCEPH_DEBUG_MUTEX -D_GLIBCXX_ASSERTIONS -fdiagnostics-color=auto -std=c++20 -MD -MT src/CMakeFiles/krbd.dir/krbd.cc.o -MF src/CMakeFiles/krbd.dir/krbd.cc.o.d -o src/CMakeFiles/krbd.dir/krbd.cc.o -c /home/jenkins-build/build/workspace/ceph-api/src/krbd.cc
In file included from /home/jenkins-build/build/workspace/ceph-api/src/krbd.cc:44:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/mon/MonMap.h:28:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/mon/mon_types.h:20:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/include/Context.h:19:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/dout.h:29:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_proxy.h:6:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-api/src/common/options/legacy_config_opts.h:11:10: fatal error: 'rgw_legacy_options.h' file not found
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
so in this change, we link the related targets to
`legacy-option-headers` as well to fulfill the build dependency.
Leonid Usov [Sat, 11 May 2024 14:00:21 +0000 (17:00 +0300)]
mds: enhance the `lock path` asok command
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return the RC that represents the operation result. 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks
Leonid Usov [Thu, 9 May 2024 01:39:12 +0000 (04:39 +0300)]
mds/quiesce: overdrive fragmenting that's still freezing
Quiesce requires revocation of capabilities,
which is not working for a freezing/frozen nodes.
Since it is best effort, abort an ongoing fragmenting
for the sake of a faster quiesce.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com> Fixes: https://tracker.ceph.com/issues/65716
Leonid Usov [Sun, 12 May 2024 16:19:34 +0000 (19:19 +0300)]
revert: mds: provide a mechanism to authpin while freezing
This is a functional revert of a9964a7ccc4394f923fb0f1c76eb8fa03fe8733d
git revert was giving too many conflicts, as the code has changed
too much since the original commit.
The bypass freezing mechanism lead us into several deadlocks,
and when we found out that a freezing inode defers reclaiming
client caps, we realized that we needed to try a different approach.
This commit removes the bypass freezing related changes to clear way
for a different approach to resolving the conflict between quiesce
and freezing.
Rishabh Dave [Thu, 16 May 2024 16:57:10 +0000 (22:27 +0530)]
src/ptl-tool: allow not pushing branch to ceph-ci
Sometimes we need a branch but don't want to launch builds on shaman for
it. For such cases, provide an option that allows not pushing the branch
to ceph-ci.
Creating a branch that'll only be passed to "teuthology-suite" option
"--suite-branch" is an example of such a case.
jiawd [Fri, 12 Nov 2021 03:48:56 +0000 (03:48 +0000)]
osd: full-object read crc is mismatch, because truncate modify oi.size and forget to clear data_digest
when write before truncate, need trim length, if truncate is to 0,
write is [0~128k], write change to [0~0], do nothing, oi.size is 0, x1 = set_data_digest(crc32(-1)).
write is [128k~128k], write change to [128k~0], truncate oi.size to offset 128k, x2 = set_data_digest(crc32(x1)).
write is [256k~128k], write change to [256k~0], truncate oi.size to offset 256k, x3 = set_data_digest(crc32(x2)).
...
write is [4063232~128k], write change to [4063232~0], truncate oi.size to offset 4063232, xn = set_data_digest(crs32(xn-1))
Now, we can see oi.size is 4063232, and data_digest is 0xffffffff, because thelength of in_data of crc is 0 every time.
when read verify crc will reply EIO. (EC pool).
so, when truncate in write, need clear data_digest and DIGEST flag,
when write before truncate, need to trim length, when offset over than oi.size, don't truncate oi.size to offset.
Zac Dover [Sun, 19 May 2024 00:00:29 +0000 (10:00 +1000)]
doc/cephfs: Squid and later - subvolume quiesce
Add a note to the "Subvolume quiesce" section that says that the
information in the section applies only to the Squid and later releases
of Ceph. This is included here so that I don't overwrite the Reef and
Quincy documentation with irrelevant information, and so that I don't
overwrite the Squid information with blank space where the "Subvolume
quiesce" section should be.
Rishabh Dave [Thu, 16 May 2024 07:00:49 +0000 (12:30 +0530)]
qa/cephfs: block buggy tests in test_admin.py
Block test_idem_unaffected_root_squash temporarily and
test_multifs_single_path_rootsquash.
This test fails due to a known bug. Block it temporarily so that
test_admin.py can run fully and PRs under QA can be tested fully.
Otherwise, this test fails and that halts test_admin.py, which leaves
the PR partially untested.
This failure is then seen as an unrelated failure which lets the buggy
code get merged. This has happened recently.
Rishabh Dave [Thu, 16 May 2024 16:30:01 +0000 (22:00 +0530)]
qa/cephfs: add MDS_CLIENTS_BROKEN_ROOTSQUASH to ignorelist
MDS_CLIENTS_BROKEN_ROOTSQUASH is generated and expected by
test_rootsquash_nofeature but it hasn't be added to ignorelist as a
result of which QA code marks the job as failed even though all tests
finished running successfully.
Introduced-by: bccc8ceb471c441ec04d7eb2c353630f8c5ce843 Fixes: https://tracker.ceph.com/issues/66075 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Tue, 7 May 2024 14:50:55 +0000 (20:20 +0530)]
qa/cephfs: set joinable on FS before exiting tests in TestFSFail
After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.
Fixes: https://tracker.ceph.com/issues/65841 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Since this --flags=locks takes the mds_lock and dumps thousands of ops, this
may take a long time to complete for each individual MDS. The entire quiesce
set may timeout (and all q ops killed) before we finish dumping ops.
Fixes: https://tracker.ceph.com/issues/65823 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Thu, 16 May 2024 10:40:58 +0000 (12:40 +0200)]
common/options: link to mon_osd_blocklist_default_expire from RBD
"number of seconds to blocklist - set to 0 for OSD default" in the
description of rbd_blocklist_expire_seconds refers to the value that is
controlled by mon_osd_blocklist_default_expire.
Rishabh Dave [Wed, 8 May 2024 13:59:11 +0000 (19:29 +0530)]
qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd
This issue was not caught in original QA run because "ceph mds fail"
returns 0 even though MDS name received by it in argument is
non-existent. This is done for the sake of idempotency, however it
caused this bug to go uncaught.
Fixea: https://tracker.ceph.com/issues/65864 Signed-off-by: Rishabh Dave <ridave@redhat.com>
We are currently conducting regular ceph-dencoder tests for backward compatibility.
However, we are omitting tests for forward compatibility.
This suite will introduce tests against the ceph-objects-corpus to address forward
compatibility issues that may arise.
the script will install N-2 version and run against the latest version corpus objects
that we have, then install N-1 to N version and check them as well.