Patrick Donnelly [Thu, 23 May 2024 01:16:32 +0000 (21:16 -0400)]
Merge PR #57469 into main
* refs/pull/57469/head:
mds: set dispatcher order
mds: use regular dispatch for processing beacons
msg: add priority to dispatcher invocation order
mds: note when dispatcher is called
Patrick Donnelly [Thu, 23 May 2024 01:15:07 +0000 (21:15 -0400)]
Merge PR #57215 into main
* refs/pull/57215/head:
doc: document new --output-file switch
test/cli: ignore tmp_file_template
qa/workunits: add --output-file test in cephtool workunit
common,ceph: add output file switch to dump json to
common/options: add configs for temporary files made by daemons
common/Formatter: write the pending string on flush
Reviewed-by: Leonid Usov <leonid.usov@ibm.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Dan Mick [Wed, 22 May 2024 22:25:51 +0000 (15:25 -0700)]
doc/dev/release-process.rst: note new 'project' arguments
Support for ceph-iscsi was added to the release scripts (from
ceph-build.git), so 'project' must now be passed to these scripts,
and it will appear in the prerelease pathnames. See also
https://github.com/ceph/ceph-build/pull/2243 and
https://github.com/ceph/ceph-container/pull/2210
Casey Bodley [Wed, 8 May 2024 16:42:42 +0000 (12:42 -0400)]
common/async: add spawn_throttle for bounded concurrency with optional_yield
a primitive for structured concurrency with stackful coroutines from
boost::asio::spawn(). this relies on spawn()'s support for per-op
cancellation to guarantee that the lifetime of child coroutines won't
exceed the lifetime of their spawn_throttle, making it safe for children
to access memory from their parent's stack
by taking optional_yield in the constructor, spawn_throttle transparently
supports synchronous execution (where optional_yield is empty) and
asynchronous execution within a stackful coroutine (where optional_yield
contains the parent's yield_context)
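The bounded-concurrency idea described above can be sketched in Python's asyncio. This is only an analogy under stated assumptions, not the C++ spawn_throttle API: the class name `SpawnThrottle`, its `spawn` method, and the `demo` helper are hypothetical, and a semaphore plus an async context manager stand in for boost::asio::spawn() and per-op cancellation.

```python
import asyncio


class SpawnThrottle:
    """Hypothetical analog: at most `limit` children run at once, and every
    child has finished (or been cancelled) before the throttle's scope exits."""

    def __init__(self, limit):
        self._sem = asyncio.Semaphore(limit)
        self._tasks = set()

    async def __aenter__(self):
        return self

    async def spawn(self, coro_fn, *args):
        # Suspend the parent until a slot is free; this bounds concurrency.
        await self._sem.acquire()
        task = asyncio.ensure_future(self._run(coro_fn, *args))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _run(self, coro_fn, *args):
        try:
            await coro_fn(*args)
        finally:
            self._sem.release()

    async def __aexit__(self, exc_type, exc, tb):
        tasks = list(self._tasks)
        if exc_type is not None:
            # The parent is unwinding: cancel outstanding children.
            for task in tasks:
                task.cancel()
        # Children never outlive this scope. Child errors are swallowed
        # here for brevity; a fuller version would re-raise them.
        await asyncio.gather(*tasks, return_exceptions=True)
        return False


async def demo(limit=2, jobs=6):
    running = 0
    peak = 0
    results = []  # lives in the parent's frame; children append to it

    async def child(i):
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        results.append(i)
        running -= 1

    async with SpawnThrottle(limit) as throttle:
        for i in range(jobs):
            await throttle.spawn(child, i)
    return peak, sorted(results)


def run_demo():
    return asyncio.run(demo())
```

Because the scope exit waits for every child, it is safe in this sketch for children to write into `results`, which is owned by the parent; that mirrors the lifetime guarantee the commit message describes.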
Matthew Vernon [Wed, 22 May 2024 15:31:33 +0000 (16:31 +0100)]
doc: clarify use of location: in host spec
It wasn't clear that you can specify more than one element of the CRUSH hierarchy in a spec file, nor that it might be useful to do so (e.g. to ensure the host ends up beneath the default root).
So update the text to make it clearer, and similarly the example.
Signed-off-by: Matthew Vernon <mvernon@wikimedia.org>
Patrick Donnelly [Tue, 21 May 2024 02:38:44 +0000 (22:38 -0400)]
Merge PR #57332 into main
* refs/pull/57332/head:
mds/quiesce: drop remote authpins before waiting for the quiesce lock
qa/cephfs/test_quiesce: test proper handling of remote authpins
mds: don't clear `AUTHPIN_FROZEN` until `FROZEN` in rename_prep
mds: enhance the `lock path` asok command
mds/quiesce: overdrive fragmenting that's still freezing
revert: mds: provide a mechanism to authpin while freezing
qa/cephfs/test_quiesce: enhance the fragmentation test
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Mon, 20 May 2024 23:47:22 +0000 (07:47 +0800)]
cmake: link rados_snap_set_diff_obj and krbd against legacy-option-headers
in c24a6ffe20, we tried to link all targets that depend on legacy option
headers against legacy-option-headers, but we missed some of them.
in our CI, we spotted build failures like:
```
FAILED: src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o
/usr/bin/ccache /usr/bin/clang++-14 -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_HAS_IO_URING -DBOOST_ASIO_NO_TS_EXECUTORS -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/jenkins-build/build/workspace/ceph-api/build/src/include -I/home/jenkins-build/build/workspace/ceph-api/src -isystem /opt/ceph/include -isystem /home/jenkins-build/build/workspace/ceph-api/build/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/api/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/ext/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/sdk/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/xxHash -isystem /home/jenkins-build/build/workspace/ceph-api/src/fmt/include -g -Werror -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -DBOOST_PHOENIX_STL_TUPLE_H_ -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wno-inconsistent-missing-override -Wno-mismatched-tags -Wno-unused-private-field -Wno-address-of-packed-member -Wno-unused-function -Wno-unused-local-typedef -Wno-varargs -Wno-gnu-designator -Wno-missing-braces -Wno-parentheses -Wno-deprecated-register -DCEPH_DEBUG_MUTEX -D_GLIBCXX_ASSERTIONS -fdiagnostics-color=auto -std=c++20 -MD -MT src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o -MF src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o.d -o src/CMakeFiles/rados_snap_set_diff_obj.dir/librados/snap_set_diff.cc.o -c 
/home/jenkins-build/build/workspace/ceph-api/src/librados/snap_set_diff.cc
In file included from /home/jenkins-build/build/workspace/ceph-api/src/librados/snap_set_diff.cc:7:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_proxy.h:6:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-api/src/common/options/legacy_config_opts.h:7:10: fatal error: 'osd_legacy_options.h' file not found
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[111/1748] Generating immutable-object-cache_options.cc, ../../../include/immutable-object-cache_legacy_options.h
[112/1748] Building CXX object src/CMakeFiles/krbd.dir/krbd.cc.o
FAILED: src/CMakeFiles/krbd.dir/krbd.cc.o
/usr/bin/ccache /usr/bin/clang++-14 -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_ASIO_HAS_IO_URING -DBOOST_ASIO_NO_TS_EXECUTORS -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_REENTRANT -D_THREAD_SAFE -D__CEPH__ -D__STDC_FORMAT_MACROS -D__linux__ -I/home/jenkins-build/build/workspace/ceph-api/build/src/include -I/home/jenkins-build/build/workspace/ceph-api/src -isystem /opt/ceph/include -isystem /home/jenkins-build/build/workspace/ceph-api/build/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/api/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/exporters/jaeger/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/ext/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/jaegertracing/opentelemetry-cpp/sdk/include -isystem /home/jenkins-build/build/workspace/ceph-api/src/xxHash -isystem /home/jenkins-build/build/workspace/ceph-api/src/fmt/include -g -Werror -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -DBOOST_PHOENIX_STL_TUPLE_H_ -Wall -fno-strict-aliasing -fsigned-char -Wtype-limits -Wignored-qualifiers -Wpointer-arith -Werror=format-security -Winit-self -Wno-unknown-pragmas -Wnon-virtual-dtor -Wno-ignored-qualifiers -ftemplate-depth-1024 -Wpessimizing-move -Wredundant-move -Wno-inconsistent-missing-override -Wno-mismatched-tags -Wno-unused-private-field -Wno-address-of-packed-member -Wno-unused-function -Wno-unused-local-typedef -Wno-varargs -Wno-gnu-designator -Wno-missing-braces -Wno-parentheses -Wno-deprecated-register -DCEPH_DEBUG_MUTEX -D_GLIBCXX_ASSERTIONS -fdiagnostics-color=auto -std=c++20 -MD -MT src/CMakeFiles/krbd.dir/krbd.cc.o -MF src/CMakeFiles/krbd.dir/krbd.cc.o.d -o src/CMakeFiles/krbd.dir/krbd.cc.o -c /home/jenkins-build/build/workspace/ceph-api/src/krbd.cc
In file included from /home/jenkins-build/build/workspace/ceph-api/src/krbd.cc:44:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/mon/MonMap.h:28:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/mon/mon_types.h:20:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/include/Context.h:19:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/dout.h:29:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_proxy.h:6:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-api/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-api/src/common/options/legacy_config_opts.h:11:10: fatal error: 'rgw_legacy_options.h' file not found
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
so in this change, we link the related targets to
`legacy-option-headers` as well to fulfill the build dependency.
Leonid Usov [Sat, 11 May 2024 14:00:21 +0000 (17:00 +0300)]
mds: enhance the `lock path` asok command
* when the quiesce lock is taken by this op, don't consider the inode `quiesced`
* drop all locks taken during traversal
* drop all local authpins after the locks are taken
* add --await functionality that will block the command until locks are taken or an error is encountered
* return the RC that represents the operation result. 0 if the operation was scheduled and hasn't failed so far
* add authpin control flags
** --ap-freeze - to auth_pin_freeze the target inode
** --ap-dont-block - to pass auth_pin_nonblocking when acquiring the target inode locks
Leonid Usov [Thu, 9 May 2024 01:39:12 +0000 (04:39 +0300)]
mds/quiesce: overdrive fragmenting that's still freezing
Quiesce requires revocation of capabilities,
which does not work for freezing/frozen nodes.
Since fragmenting is best effort, abort an ongoing fragmenting
for the sake of a faster quiesce.
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Fixes: https://tracker.ceph.com/issues/65716
Leonid Usov [Sun, 12 May 2024 16:19:34 +0000 (19:19 +0300)]
revert: mds: provide a mechanism to authpin while freezing
This is a functional revert of a9964a7ccc4394f923fb0f1c76eb8fa03fe8733d.
git revert was giving too many conflicts, as the code has changed
too much since the original commit.
The bypass freezing mechanism led us into several deadlocks,
and when we found out that a freezing inode defers reclaiming
client caps, we realized that we needed to try a different approach.
This commit removes the bypass freezing related changes to clear the way
for a different approach to resolving the conflict between quiesce
and freezing.
Zac Dover [Mon, 20 May 2024 11:55:16 +0000 (21:55 +1000)]
doc/cephfs: edit "Cloning Snapshots" in fs-volumes.rst
Edit the "Cloning Snapshots" section in doc/cephfs/fs-volumes.rst. This
commit represents only a grammar pass. A future commit (and future PR)
will separate this section into subsections by command.
Zac Dover [Mon, 20 May 2024 06:29:44 +0000 (16:29 +1000)]
doc/cephfs: separate commands into sections
Separate commands so that each command has its own subsection in the
section "FS Subvolumes" in the file doc/cephfs/fs-volumes.rst.
Previously, the list of commands for manipulating subvolumes was one
long, unbroken list and the beginning of one section could easily be
mistaken for the end of the previous section.
Rishabh Dave [Thu, 16 May 2024 16:57:10 +0000 (22:27 +0530)]
src/ptl-tool: allow not pushing branch to ceph-ci
Sometimes we need a branch but don't want to launch builds on shaman for
it. For such cases, provide an option that allows not pushing the branch
to ceph-ci.
Creating a branch that'll only be passed to "teuthology-suite" option
"--suite-branch" is an example of such a case.
jiawd [Fri, 12 Nov 2021 03:48:56 +0000 (03:48 +0000)]
osd: full-object read crc mismatch, because truncate modifies oi.size and forgets to clear data_digest
When a write precedes a truncate, its length must be trimmed. If the truncate is to 0:
write [0~128k] is changed to [0~0] and does nothing; oi.size is 0, x1 = set_data_digest(crc32(-1)).
write [128k~128k] is changed to [128k~0] and truncates oi.size to offset 128k, x2 = set_data_digest(crc32(x1)).
write [256k~128k] is changed to [256k~0] and truncates oi.size to offset 256k, x3 = set_data_digest(crc32(x2)).
...
write [4063232~128k] is changed to [4063232~0] and truncates oi.size to offset 4063232, xn = set_data_digest(crc32(xn-1)).
Now oi.size is 4063232 and data_digest is 0xffffffff, because the length of the crc input data is 0 every time.
A read that verifies the crc then replies EIO (EC pool).
So, when a write performs a truncate, clear data_digest and the DIGEST flag;
when trimming the length of a write that precedes a truncate, don't truncate oi.size to the offset if the offset is beyond oi.size.
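The chained-digest behavior above can be reproduced with a small sketch. It assumes a CRC-32C in which the seed is used directly as the running state, with no initial or final inversion (an assumption about the crc used here, chosen because it matches the 0xffffffff result reported in the message); the function names are hypothetical.

```python
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(seed, data):
    """Bitwise CRC-32C; the seed is the running state (no inversions)."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc


def digest_after_trimmed_writes(num_writes):
    """Chain the digest over writes whose data was trimmed to zero length."""
    digest = 0xFFFFFFFF  # seed of -1, as in crc32(-1)
    for _ in range(num_writes):
        # Each trimmed write feeds an empty buffer, so the state never changes.
        digest = crc32c(digest, b"")
    return digest
```

With empty input the loop body never runs, so `digest_after_trimmed_writes(32)` stays at the seed value however many writes are chained, while oi.size keeps growing. The stored digest therefore cannot match a crc computed over the object's actual contents on a full-object read.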
Zac Dover [Sun, 19 May 2024 00:00:29 +0000 (10:00 +1000)]
doc/cephfs: Squid and later - subvolume quiesce
Add a note to the "Subvolume quiesce" section that says that the
information in the section applies only to the Squid and later releases
of Ceph. This is included here so that I don't overwrite the Reef and
Quincy documentation with irrelevant information, and so that I don't
overwrite the Squid information with blank space where the "Subvolume
quiesce" section should be.
Rishabh Dave [Thu, 16 May 2024 07:00:49 +0000 (12:30 +0530)]
qa/cephfs: block buggy tests in test_admin.py
Temporarily block test_idem_unaffected_root_squash and
test_multifs_single_path_rootsquash.
These tests fail due to a known bug. Block them temporarily so that
test_admin.py can run fully and PRs under QA can be tested fully.
Otherwise, this test fails and that halts test_admin.py, which leaves
the PR partially untested.
This failure is then seen as an unrelated failure which lets the buggy
code get merged. This has happened recently.
Rishabh Dave [Thu, 16 May 2024 16:30:01 +0000 (22:00 +0530)]
qa/cephfs: add MDS_CLIENTS_BROKEN_ROOTSQUASH to ignorelist
MDS_CLIENTS_BROKEN_ROOTSQUASH is generated and expected by
test_rootsquash_nofeature, but it hasn't been added to the ignorelist. As a
result, the QA code marks the job as failed even though all tests
finished running successfully.
Introduced-by: bccc8ceb471c441ec04d7eb2c353630f8c5ce843
Fixes: https://tracker.ceph.com/issues/66075
Signed-off-by: Rishabh Dave <ridave@redhat.com>
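The effect of a missing ignorelist entry can be illustrated with a minimal sketch. It assumes the QA harness fails a job when any logged warning matches no ignorelist pattern; the function names and the sample log line are hypothetical and are not taken from the QA code.

```python
import re


def unexpected_warnings(log_lines, ignorelist):
    """Return the warning lines that match no ignorelist pattern."""
    patterns = [re.compile(p) for p in ignorelist]
    return [line for line in log_lines
            if not any(p.search(line) for p in patterns)]


def job_passes(log_lines, ignorelist):
    """A job passes only if every logged warning is covered by the ignorelist."""
    return not unexpected_warnings(log_lines, ignorelist)


# Made-up log line for illustration only.
SAMPLE_LOG = ["health warning: MDS_CLIENTS_BROKEN_ROOTSQUASH"]
```

Under these assumptions, `job_passes(SAMPLE_LOG, [])` is false even though the warning is expected, and adding `"MDS_CLIENTS_BROKEN_ROOTSQUASH"` to the ignorelist makes the same log pass.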