this change was created in the same spirit of b8c30a79.
in BlueFS.test_shared_alloc and BlueFS.test_shared_alloc_sparse, we
keep the return value of `fs.get_perf_counters()`, and dereference it
after umounting the fs, but the `PerfCounters*` pointer returned from
`fs.get_perf_counters()` is destroyed in `BlueFS::_shutdown_logger()`
which is in turn called by `BlueFS::umount()`. so ASan points this out:
```
==548153==ERROR: AddressSanitizer: heap-use-after-free on address 0x6110000336c0 at pc 0x7fc810326654 bp 0x7ffd869be8f0 sp 0x7ffd869be8e8
READ of size 8 at 0x6110000336c0 thread T0
#0 0x7fc810326653 in ceph::common::PerfCounters::get(int) const /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/perf_counters.cc:246:8
#1 0x564e7a5397a5 in BlueFS_test_shared_alloc_sparse_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1265:3
#2 0x564e7a644006 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#3 0x564e7a5fdbc2 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#4 0x564e7a5ae7ec in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2680:5
#5 0x564e7a5b0822 in testing::TestInfo::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2858:11
#6 0x564e7a5b1e5b in testing::TestSuite::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:3012:28
#7 0x564e7a5cf2e8 in testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5723:44
#8 0x564e7a64c8b6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#9 0x564e7a604662 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#10 0x564e7a5ce672 in testing::UnitTest::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5306:10
#11 0x564e7a55a410 in RUN_ALL_TESTS() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/include/gtest/gtest.h:2486:46
#12 0x564e7a551295 in main /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1609:10
#13 0x7fc80d775d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#14 0x7fc80d775e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#15 0x564e7a4296a4 in _start (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluefs+0x2856a4) (BuildId: fd4e4e0b1c2f9a3b0c1a7051d8ed68b3576e3277)
0x6110000336c0 is located 0 bytes inside of 208-byte region [0x6110000336c0,0x611000033790)
freed by thread T0 here:
#0 0x564e7a4e7b1d in operator delete(void*) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluefs+0x343b1d) (BuildId: fd4e4e0b1c2f9a3b0c1a7051d8ed68b3576e3277)
#1 0x564e7a686ce3 in BlueFS::_shutdown_logger() /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueFS.cc:462:3
#2 0x564e7a6a9b55 in BlueFS::umount(bool) /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueFS.cc:1076:3
#3 0x564e7a539767 in BlueFS_test_shared_alloc_sparse_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1262:6
#4 0x564e7a644006 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#5 0x564e7a5fdbc2 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#6 0x564e7a5ae7ec in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2680:5
#7 0x564e7a5b0822 in testing::TestInfo::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2858:11
#8 0x564e7a5b1e5b in testing::TestSuite::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:3012:28
#9 0x564e7a5cf2e8 in testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5723:44
#10 0x564e7a64c8b6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2605:10
#11 0x564e7a604662 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2641:14
#12 0x564e7a5ce672 in testing::UnitTest::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:5306:10
#13 0x564e7a55a410 in RUN_ALL_TESTS() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/include/gtest/gtest.h:2486:46
#14 0x564e7a551295 in main /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluefs.cc:1609:10
#15 0x7fc80d775d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
```
in this change, instead of keeping `logger` across the `umount()` and
`mount()` calls, we get another instance of `logger`, query it for
the perf counter that we are interested, and compare the value
to see if it is unchanged.
test_multifs_single_path_rootsquash was never run with vstart_runner.py
or with teuthology and is therefore full of bugs. Fix it to make sure it
runs fine.
Introduced-by: 1fda8ed2d4a9 Fixes: https://tracker.ceph.com/issues/65246 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Thu, 11 Apr 2024 16:39:45 +0000 (22:09 +0530)]
qa/cephfs: run test_multifs_single_path_rootsquash for kclient too
Root squash is valid for kclient too, Kotresh ran test recently fo it
against main branch. Therefore it is safe to remove.
https://github.com/ceph/ceph/pull/56846#discussion_r1587507868
Some test jobs fail with `local_shared_foreign_ptr: Assertion `ptr && *ptr' failed`
It seems that we attempt to use a connection which is not yet ready to use
after setting up the daemons on boot.
Adjust the mgr_stats_period to allow more time for the daemons to set up.
See: https://tracker.ceph.com/issues/62162#note-10
Note: This is not a fix but more of a temporary solution to avoid noise
in the testing suite (Tracker stays open).
crimson/osd: make osd_op_params::at_version coherent with last log entry
Before this commit we were doing something like:
1. initialize `at_version` with PG::projected_last_update`
**incremented by one**.
2. produce a log entry at such version.
3. increment `at_version` for the sake of a further production
that may never come.
The problem is `osd_op_params::at_version` is higher by one
than the last log entry which hurts at later stages of
`osd_op_params` processing (I was hit in the shared EC code
by the assertion in `PG::op_applied`).
This patch changes the algorithm to:
A. initialize `at_version` with PG::projected_last_update`
**incremented by one**.
B. increment `at_version` for the sake of the very next production.
C. produce a log entry at this version.
* refs/pull/57059/head:
mds: do not try fragmenting or exporting a quiesced directory
mds: set/test ALL_LOCKED on fragment_dir request
mds: pass bypassfreezing to parent auth pin req
qa: add quiesce tests during fragmentation
qa: translate empty output from rank_tell to empty dict
qa: move reqid_tostr helper
Patrick Donnelly [Tue, 30 Apr 2024 20:46:06 +0000 (16:46 -0400)]
Merge PR #56997 into main
* refs/pull/56997/head:
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists
Patrick Donnelly [Tue, 30 Apr 2024 16:19:31 +0000 (12:19 -0400)]
Merge PR #56934 into main
* refs/pull/56934/head:
mds: move drop_locks to directly after rdonly check
qa: test quiesce.block is replicated
qa: test that ceph.dir.subvolume is replicated properly
mds: add debug "lock path" command
qa: move reqid_tostr helper
qa: return run_shell process for waiters
Ken Dreyer [Mon, 29 Apr 2024 20:29:11 +0000 (16:29 -0400)]
ceph.spec.in: ceph-mgr-dashboard does not require werkzeug
Nothing in the dashboard codebase imports werkzeug. It appears this was
leftover from the time when the dashboard was packaged with the rest of
the mgr modules.
Fixes: https://tracker.ceph.com/issues/65693 Signed-off-by: Ken Dreyer <kdreyer@ibm.com>
Add a list of default monitor images to the documentation. This commit
is made in response to a request from Eugen Block, and is made using the
information developed by Mr Block here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGC66QIFBKRTPZAQMQEYFXOGZJ7RLWBN/.
Adam King [Mon, 29 Apr 2024 17:54:37 +0000 (13:54 -0400)]
qa/cephadm: ignore stray daemon warning during rados_api_tests
The "stray daemon" that is getting logged about in this test is
from "stray daemon laundry.pid70383 on host smithi027 not managed by cephadm".
It seems the rados_api_tests is creating some additional "laundry" entity
during these tests that gets reported as an actual daemon in the mgr,
but cephadm is unaware of it, resulting in the warning. Originally
we thought to maybe add "laundry" itself to the ignorelist, but
without an additional patch that added extra logging for debug
purposes (which can't be merged) the log statement found in
the logs due to this problem will not say what daemon it found
to be stray. There will just be a generic warning about a stray
daemon. In a real cluster, a user would then check "ceph health detail"
to find out what daemon is stray, but the log scraper can't do this
and just fails the test due to the presence of the warning.