mon/MDSMonitor: add note about missing metadata inclusion
There is a "client_count" metadata on the health warning that apparently was
intended to be used for aggregating warnings but never was. Add a TODO item for
that.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
mds: check relevant caps for fs include root_squash
When denying client reconnects because the MDS caps include root_squash and the
client features do not include CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK, ensure those
caps are only for the file system the MDS is joined to.
Fixes: https://tracker.ceph.com/issues/65733 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Mon, 6 May 2024 06:16:01 +0000 (08:16 +0200)]
qa/workunits/rbd: wait for replaying status in bootstrap tests
wait_for_replay_complete() doesn't wait for image status to get
updated. This didn't matter previously because these tests are run on
two different pools and nothing else was following.
rbd-mirror: remove callout when destroying pool replayer
If a pool replayer is removed in an error state (e.g. after failing to
connect to the remote cluster), its callout should be removed as well.
Otherwise, the error would persist causing "daemon health: ERROR"
status to be reported even after a new pool replayer is created and
started successfully.
this change was created in the same spirit of b8c30a79.
in BlueFS.test_shared_alloc and BlueFS.test_shared_alloc_sparse, we
keep the return value of `fs.get_perf_counters()`, and dereference it
after umounting the fs, but the `PerfCounters*` pointer returned from
`fs.get_perf_counters()` is destroyed in `BlueFS::_shutdown_logger()`
which is in turn called by `BlueFS::umount()`. so ASan points this out:
```
==548153==ERROR: AddressSanitizer: heap-use-after-free on address 0x6110000336c0 at pc 0x7fc810326654 bp 0x7ffd869be8f0 sp 0x7ffd869be8e8
READ of size 8 at 0x6110000336c0 thread T0
#0 0x7fc810326653 in ceph::common::PerfCounters::get(int) const /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/perf_counters.cc:246:8
#1 0x564e7a5397a5 in BlueFS_test_shared_alloc_sparse_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1265:3
#2 0x564e7a644006 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#3 0x564e7a5fdbc2 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#4 0x564e7a5ae7ec in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2680:5
#5 0x564e7a5b0822 in testing::TestInfo::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2858:11
#6 0x564e7a5b1e5b in testing::TestSuite::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:3012:28
#7 0x564e7a5cf2e8 in testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5723:44
#8 0x564e7a64c8b6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#9 0x564e7a604662 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#10 0x564e7a5ce672 in testing::UnitTest::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5306:10
#11 0x564e7a55a410 in RUN_ALL_TESTS() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/include/gtest/gtest.h:2486:46
#12 0x564e7a551295 in main /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1609:10
#13 0x7fc80d775d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#14 0x7fc80d775e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#15 0x564e7a4296a4 in _start (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluefs+0x2856a4) (BuildId: fd4e4e0b1c2f9a3b0c1a7051d8ed68b3576e3277)
0x6110000336c0 is located 0 bytes inside of 208-byte region [0x6110000336c0,0x611000033790)
freed by thread T0 here:
#0 0x564e7a4e7b1d in operator delete(void*) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluefs+0x343b1d) (BuildId: fd4e4e0b1c2f9a3b0c1a7051d8ed68b3576e3277)
#1 0x564e7a686ce3 in BlueFS::_shutdown_logger() /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueFS.cc:462:3
#2 0x564e7a6a9b55 in BlueFS::umount(bool) /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueFS.cc:1076:3
#3 0x564e7a539767 in BlueFS_test_shared_alloc_sparse_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1262:6
#4 0x564e7a644006 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#5 0x564e7a5fdbc2 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#6 0x564e7a5ae7ec in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2680:5
#7 0x564e7a5b0822 in testing::TestInfo::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2858:11
#8 0x564e7a5b1e5b in testing::TestSuite::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:3012:28
#9 0x564e7a5cf2e8 in testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5723:44
#10 0x564e7a64c8b6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#11 0x564e7a604662 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#12 0x564e7a5ce672 in testing::UnitTest::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5306:10
#13 0x564e7a55a410 in RUN_ALL_TESTS() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/include/gtest/gtest.h:2486:46
#14 0x564e7a551295 in main /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1609:10
#15 0x7fc80d775d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
```
in this change, instead of keeping `logger` across the `umount()` and
`mount()` calls, we get another instance of `logger`, query it for
the perf counter that we are interested, and compare the value
to see if it is unchanged.
Casey Bodley [Fri, 3 May 2024 19:43:39 +0000 (15:43 -0400)]
rgw: move publish_complete() back to RGWCompleteMultipart::execute()
move publish_complete() and meta_obj->delete_object() back to execute()
so they only run on success. this allows several member variables to
move back to execute()'s stack as well
Casey Bodley [Fri, 3 May 2024 19:29:00 +0000 (15:29 -0400)]
rgw: CompleteMultipart uses s->object for Notification
get_notification() should be associated with the target object
s->object. the meta_obj has the wrong object name, so required passing
s->object->get_name() as an extra argument
importantly, Notification no longer depends on the lifetime of meta_obj
to avoid a dangling pointer, while the lifetime of s->object is guaranteed
test_multifs_single_path_rootsquash was never run with vstart_runner.py
or with teuthology and is therefore full of bugs. Fix it to make sure it
runs fine.
Introduced-by: 1fda8ed2d4a9 Fixes: https://tracker.ceph.com/issues/65246 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Thu, 11 Apr 2024 16:39:45 +0000 (22:09 +0530)]
qa/cephfs: run test_multifs_single_path_rootsquash for kclient too
Root squash is valid for kclient too, Kotresh ran test recently fo it
against main branch. Therefore it is safe to remove.
https://github.com/ceph/ceph/pull/56846#discussion_r1587507868
Some test jobs fail with `local_shared_foreign_ptr: Assertion `ptr && *ptr' failed`
It seems that we attempt to use a connection which is not yet ready to use
after setting up the daemons on boot.
Adjust the mgr_stats_period to allow more time for the daemons to set up.
See: https://tracker.ceph.com/issues/62162#note-10
Note: This is not a fix but more of a temporary solution to avoid noise
in the testing suite (Tracker stays open).
crimson/osd: make osd_op_params::at_version coherent with last log entry
Before this commit we were doing something like:
1. initialize `at_version` with PG::projected_last_update`
**incremented by one**.
2. produce a log entry at such version.
3. increment `at_version` for the sake of a further production
that may never come.
The problem is `osd_op_params::at_version` is higher by one
than the last log entry which hurts at later stages of
`osd_op_params` processing (I was hit in the shared EC code
by the assertion in `PG::op_applied`).
This patch changes the algorithm to:
A. initialize `at_version` with PG::projected_last_update`
**incremented by one**.
B. increment `at_version` for the sake of the very next production.
C. produce a log entry at this version.
* refs/pull/57059/head:
mds: do not try fragmenting or exporting a quiesced directory
mds: set/test ALL_LOCKED on fragment_dir request
mds: pass bypassfreezing to parent auth pin req
qa: add quiesce tests during fragmentation
qa: translate empty output from rank_tell to empty dict
qa: move reqid_tostr helper
Patrick Donnelly [Tue, 30 Apr 2024 20:46:06 +0000 (16:46 -0400)]
Merge PR #56997 into main
* refs/pull/56997/head:
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists
Patrick Donnelly [Tue, 30 Apr 2024 16:19:31 +0000 (12:19 -0400)]
Merge PR #56934 into main
* refs/pull/56934/head:
mds: move drop_locks to directly after rdonly check
qa: test quiesce.block is replicated
qa: test that ceph.dir.subvolume is replicated properly
mds: add debug "lock path" command
qa: move reqid_tostr helper
qa: return run_shell process for waiters
test/librbd: correct expected_overlap in SnapshotCopyup
Changing the end of second interval from 2096640 to copyup_end - 512
with copyup_end potentially set to 1 << order in commit 750e61ac91d7
("librbd: clone copy-on-write operations should preserve sparseness")
was incorrect because the test image size is just 2M. There are no
end-to-end tests for enable_sparse_copyup = false case, so this went
unnoticed.
A year later, commit 38622b5ca12d ("librbd: copyup state machine
should always issue a sparse-read") dropped the respective branch in
CopyupRequest, thus eliminating the reason for branching on
enable_sparse_copyup altogether.
Ken Dreyer [Mon, 29 Apr 2024 20:29:11 +0000 (16:29 -0400)]
ceph.spec.in: ceph-mgr-dashboard does not require werkzeug
Nothing in the dashboard codebase imports werkzeug. It appears this was
leftover from the time when the dashboard was packaged with the rest of
the mgr modules.
Fixes: https://tracker.ceph.com/issues/65693 Signed-off-by: Ken Dreyer <kdreyer@ibm.com>
Add a list of default monitor images to the documentation. This commit
is made in response to a request from Eugen Block, and is made using the
information developed by Mr Block here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGC66QIFBKRTPZAQMQEYFXOGZJ7RLWBN/.