* refs/pull/57192/head:
PendingReleaseNotes: add note on the client incompatibility health warning and feature bit
doc/cephfs: add client_mds_auth_caps client feature bit
doc/cephfs: add missing client feature bits
doc/cephfs: document MDS_CLIENTS_BROKEN_ROOTSQUASH health error
qa: add tests for MDS_CLIENTS_BROKEN_ROOTSQUASH
mds: raise health warning if client lacks feature for root_squash
mon/MDSMonitor: add note about missing metadata inclusion
mds: check relevant caps for fs include root_squash
mds: refactor out fs_name match in MDSAuthCaps
qa: test for root_squash with multiple caps
qa: pass kwargs to mount from remount
qa: simplify update_attrs and only update relevant keys
client: allow overriding client features
Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Rishabh Dave <ridave@redhat.com>
Ronen Friedman [Tue, 7 May 2024 13:07:21 +0000 (16:07 +0300)]
tests/rgw: fix ints returned where a string is expected
Modifying test_rgw_iam_policy.cc to avoid C++23
compilation errors. C++23 does not allow int-to-string
conversions, and '0' cannot be returned where a string
is expected.
mds: raise health warning if client lacks feature for root_squash
Rather than evict all clients lacking this feature bit, raise a health error
that pushes the administrator to address it. This avoids the surprise of having
all affected clients suddenly evicted in the cluster.
Fixes: https://tracker.ceph.com/issues/65733 Fixes: 954ed30 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mon/MDSMonitor: add note about missing metadata inclusion
There is a "client_count" metadata on the health warning that apparently was
intended to be used for aggregating warnings but never was. Add a TODO item for
that.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: check relevant caps for fs include root_squash
When denying client reconnects because the MDS caps include root_squash and the
client features do not include CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK, ensure those
caps are only for the file system the MDS is joined to.
Fixes: https://tracker.ceph.com/issues/65733 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Mon, 6 May 2024 06:16:01 +0000 (08:16 +0200)]
qa/workunits/rbd: wait for replaying status in bootstrap tests
wait_for_replay_complete() doesn't wait for image status to get
updated. This didn't matter previously because these tests are run on
two different pools and nothing else was following.
rbd-mirror: remove callout when destroying pool replayer
If a pool replayer is removed in an error state (e.g. after failing to
connect to the remote cluster), its callout should be removed as well.
Otherwise, the error would persist causing "daemon health: ERROR"
status to be reported even after a new pool replayer is created and
started successfully.
this change was created in the same spirit of b8c30a79.
in BlueFS.test_shared_alloc and BlueFS.test_shared_alloc_sparse, we
keep the return value of `fs.get_perf_counters()`, and dereference it
after umounting the fs, but the `PerfCounters*` pointer returned from
`fs.get_perf_counters()` is destroyed in `BlueFS::_shutdown_logger()`
which is in turn called by `BlueFS::umount()`. so ASan points this out:
```
==548153==ERROR: AddressSanitizer: heap-use-after-free on address 0x6110000336c0 at pc 0x7fc810326654 bp 0x7ffd869be8f0 sp 0x7ffd869be8e8
READ of size 8 at 0x6110000336c0 thread T0
#0 0x7fc810326653 in ceph::common::PerfCounters::get(int) const /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/perf_counters.cc:246:8
#1 0x564e7a5397a5 in BlueFS_test_shared_alloc_sparse_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1265:3
#2 0x564e7a644006 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#3 0x564e7a5fdbc2 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#4 0x564e7a5ae7ec in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2680:5
#5 0x564e7a5b0822 in testing::TestInfo::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2858:11
#6 0x564e7a5b1e5b in testing::TestSuite::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:3012:28
#7 0x564e7a5cf2e8 in testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5723:44
#8 0x564e7a64c8b6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#9 0x564e7a604662 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#10 0x564e7a5ce672 in testing::UnitTest::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5306:10
#11 0x564e7a55a410 in RUN_ALL_TESTS() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/include/gtest/gtest.h:2486:46
#12 0x564e7a551295 in main /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1609:10
#13 0x7fc80d775d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#14 0x7fc80d775e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#15 0x564e7a4296a4 in _start (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluefs+0x2856a4) (BuildId: fd4e4e0b1c2f9a3b0c1a7051d8ed68b3576e3277)
0x6110000336c0 is located 0 bytes inside of 208-byte region [0x6110000336c0,0x611000033790)
freed by thread T0 here:
#0 0x564e7a4e7b1d in operator delete(void*) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluefs+0x343b1d) (BuildId: fd4e4e0b1c2f9a3b0c1a7051d8ed68b3576e3277)
#1 0x564e7a686ce3 in BlueFS::_shutdown_logger() /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueFS.cc:462:3
#2 0x564e7a6a9b55 in BlueFS::umount(bool) /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueFS.cc:1076:3
#3 0x564e7a539767 in BlueFS_test_shared_alloc_sparse_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1262:6
#4 0x564e7a644006 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#5 0x564e7a5fdbc2 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#6 0x564e7a5ae7ec in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2680:5
#7 0x564e7a5b0822 in testing::TestInfo::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2858:11
#8 0x564e7a5b1e5b in testing::TestSuite::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:3012:28
#9 0x564e7a5cf2e8 in testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5723:44
#10 0x564e7a64c8b6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#11 0x564e7a604662 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#12 0x564e7a5ce672 in testing::UnitTest::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5306:10
#13 0x564e7a55a410 in RUN_ALL_TESTS() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/include/gtest/gtest.h:2486:46
#14 0x564e7a551295 in main /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1609:10
#15 0x7fc80d775d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
```
in this change, instead of keeping `logger` across the `umount()` and
`mount()` calls, we get another instance of `logger`, query it for
the perf counter that we are interested, and compare the value
to see if it is unchanged.
Casey Bodley [Fri, 3 May 2024 19:43:39 +0000 (15:43 -0400)]
rgw: move publish_complete() back to RGWCompleteMultipart::execute()
move publish_complete() and meta_obj->delete_object() back to execute()
so they only run on success. this allows several member variables to
move back to execute()'s stack as well
Casey Bodley [Fri, 3 May 2024 19:29:00 +0000 (15:29 -0400)]
rgw: CompleteMultipart uses s->object for Notification
get_notification() should be associated with the target object
s->object. the meta_obj has the wrong object name, so required passing
s->object->get_name() as an extra argument
importantly, Notification no longer depends on the lifetime of meta_obj
to avoid a dangling pointer, while the lifetime of s->object is guaranteed
test_multifs_single_path_rootsquash was never run with vstart_runner.py
or with teuthology and is therefore full of bugs. Fix it to make sure it
runs fine.
Introduced-by: 1fda8ed2d4a9 Fixes: https://tracker.ceph.com/issues/65246 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Thu, 11 Apr 2024 16:39:45 +0000 (22:09 +0530)]
qa/cephfs: run test_multifs_single_path_rootsquash for kclient too
Root squash is valid for kclient too, Kotresh ran test recently fo it
against main branch. Therefore it is safe to remove.
https://github.com/ceph/ceph/pull/56846#discussion_r1587507868
Some test jobs fail with `local_shared_foreign_ptr: Assertion `ptr && *ptr' failed`
It seems that we attempt to use a connection which is not yet ready to use
after setting up the daemons on boot.
Adjust the mgr_stats_period to allow more time for the daemons to set up.
See: https://tracker.ceph.com/issues/62162#note-10
Note: This is not a fix but more of a temporary solution to avoid noise
in the testing suite (Tracker stays open).
crimson/osd: make osd_op_params::at_version coherent with last log entry
Before this commit we were doing something like:
1. initialize `at_version` with PG::projected_last_update`
**incremented by one**.
2. produce a log entry at such version.
3. increment `at_version` for the sake of a further production
that may never come.
The problem is `osd_op_params::at_version` is higher by one
than the last log entry which hurts at later stages of
`osd_op_params` processing (I was hit in the shared EC code
by the assertion in `PG::op_applied`).
This patch changes the algorithm to:
A. initialize `at_version` with PG::projected_last_update`
**incremented by one**.
B. increment `at_version` for the sake of the very next production.
C. produce a log entry at this version.
* refs/pull/57059/head:
mds: do not try fragmenting or exporting a quiesced directory
mds: set/test ALL_LOCKED on fragment_dir request
mds: pass bypassfreezing to parent auth pin req
qa: add quiesce tests during fragmentation
qa: translate empty output from rank_tell to empty dict
qa: move reqid_tostr helper
Patrick Donnelly [Tue, 30 Apr 2024 20:46:06 +0000 (16:46 -0400)]
Merge PR #56997 into main
* refs/pull/56997/head:
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists