crimson/osd: make osd_op_params::at_version coherent with last log entry
Before this commit we were doing something like:
1. initialize `at_version` with `PG::projected_last_update`
**incremented by one**.
2. produce a log entry at such version.
3. increment `at_version` for the sake of a further production
that may never come.
The problem is that `osd_op_params::at_version` ends up higher by one
than the last log entry, which hurts at later stages of
`osd_op_params` processing (I hit the assertion in `PG::op_applied`
in the shared EC code).
This patch changes the algorithm to:
A. initialize `at_version` with `PG::projected_last_update`
**incremented by one**.
B. increment `at_version` for the sake of the very next production.
C. produce a log entry at this version.
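The difference between the two orderings can be sketched with a small model. This is only an illustration under stated assumptions: `OpParamsModel`, `produce_then_increment`, `increment_then_produce` and `coherent` are hypothetical names, versions are plain integers, and the initial value is passed in as a parameter rather than derived from a real PG.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for osd_op_params; not the real crimson type.
struct OpParamsModel {
  std::uint64_t at_version = 0;    // models osd_op_params::at_version
  std::vector<std::uint64_t> log;  // versions of the produced log entries
};

// Old ordering: produce an entry at at_version, then bump at_version
// for a further entry that may never be produced.
inline OpParamsModel produce_then_increment(std::uint64_t initial, int entries) {
  OpParamsModel p;
  p.at_version = initial;
  for (int i = 0; i < entries; ++i) {
    p.log.push_back(p.at_version);
    ++p.at_version;
  }
  return p;
}

// New ordering: bump at_version first, then produce the entry at it.
inline OpParamsModel increment_then_produce(std::uint64_t initial, int entries) {
  OpParamsModel p;
  p.at_version = initial;
  for (int i = 0; i < entries; ++i) {
    ++p.at_version;
    p.log.push_back(p.at_version);
  }
  return p;
}

// True when at_version equals the version of the last log entry.
inline bool coherent(const OpParamsModel& p) {
  return !p.log.empty() && p.at_version == p.log.back();
}
```

In this model the old ordering always leaves `at_version` one past the last entry, while the new ordering leaves the two equal, which is the property the later processing stages rely on.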
Patrick Donnelly [Tue, 30 Apr 2024 20:46:06 +0000 (16:46 -0400)]
Merge PR #56997 into main
* refs/pull/56997/head:
pybind/mgr: disable sqlite3/python autocommit
qa/tasks/mgr: add tests for sqlite autocommit
qa/tasks/vstart_runner: run daemons in foreground
qa/tasks/vstart_runner: add missing poll method
qa/suites/rados/mgr: add cli/devicehealth tasks
qa: reorganize mgr unit tests
qa: use position-independent link
qa: add missing terminating newline
pybind/mgr: add killpoint for sqlite3 database setup
mgr: allow specifying module option level
mon/MgrMonitor: promote standby when unsetting down flag
mon/MgrMonitor: only drop active if exists
Patrick Donnelly [Tue, 30 Apr 2024 16:19:31 +0000 (12:19 -0400)]
Merge PR #56934 into main
* refs/pull/56934/head:
mds: move drop_locks to directly after rdonly check
qa: test quiesce.block is replicated
qa: test that ceph.dir.subvolume is replicated properly
mds: add debug "lock path" command
qa: move reqid_tostr helper
qa: return run_shell process for waiters
Add a list of default monitor images to the documentation. This commit
is made in response to a request from Eugen Block, and is made using the
information developed by Mr Block here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGC66QIFBKRTPZAQMQEYFXOGZJ7RLWBN/.
Adam King [Mon, 29 Apr 2024 17:54:37 +0000 (13:54 -0400)]
qa/cephadm: ignore stray daemon warning during rados_api_tests
The "stray daemon" warning logged in this test comes from
"stray daemon laundry.pid70383 on host smithi027 not managed by cephadm".
It seems rados_api_tests creates an additional "laundry" entity
during these tests that gets reported as an actual daemon in the mgr,
but cephadm is unaware of it, resulting in the warning. Originally
we thought about adding "laundry" itself to the ignorelist, but
without an additional patch that adds extra logging for debug
purposes (which can't be merged), the log statement found in
the logs due to this problem does not say which daemon it found
to be stray. There is just a generic warning about a stray
daemon. In a real cluster, a user would then check "ceph health detail"
to find out which daemon is stray, but the log scraper can't do this
and just fails the test due to the presence of the warning.
At present, if a transaction gets interrupted right after it enters
WritePipeline::ReserveProjectedUsage and before any later continuations
get executed, WritePipeline::ReserveProjectedUsage will be locked
forever.
crimson/osd/osdop_params: Unify OpsExecuter::user_modify and osd_op_params_t::user_modify
Before this change, OpsExecuter::user_modify was maintained in OpsExecuter::do_write_op.
However, osd_op_params->user_modify was not updated when used in OpsExecuter::prepare_transaction.
crimson/osd/pg: SnapTrimEvent to support interrupts
SnapTrimEvent operations are scheduled from `PG::on_active_actmap()`
using a `seastar::do_until` loop. This commit changes the loop into
an `interruptor::repeat`, and SnapTrimEvents are now scheduled via
`start_operation_may_interrupt`.
Previously, `SnapTrimEvent::start` handled interruptions by returning
`crimson::ct_error::eagain::make()`. Now, the errorator is returned
directly via `snap_trim_event_ret_t` and interrupts the loop
described above.
As a result, interruptions caused by interval changes are now
supported by SnapTrimEvent.
test/ceph_crypto: define __has_feature if the compiler doesn't have it
Refer to https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005ffeature.html
and https://clang.llvm.org/docs/LanguageExtensions.html#has-feature-and-has-extension
for further information.
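A fallback definition of this kind is commonly written as shown below. This is a minimal sketch, not the exact code from the change: the `built_with_asan()` helper is hypothetical, and querying `address_sanitizer` is an assumption about why the test needs the macro.

```cpp
#include <cassert>

// Compilers that lack __has_feature get a fallback that reports
// every queried feature as absent.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// Hypothetical helper: true when the build appears to use AddressSanitizer.
// Clang reports it through __has_feature(address_sanitizer); GCC defines
// __SANITIZE_ADDRESS__ when -fsanitize=address is in effect.
constexpr bool built_with_asan() {
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
  return true;
#else
  return false;
#endif
}
```

With the fallback in place, `#if __has_feature(...)` can be used unconditionally, without wrapping every use in a compiler check.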
so, in this change, let's manage the lifecycle of the `CrushWrapper`
instance with a smart pointer, so that it is destroyed and free'd
properly, and this should silence the ASan warning.
erasure-code/shec: use free() to release malloc()'ed memory chunk
ASan warns
```
==445793==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete) on 0x602000039b10
#0 0x5604a544112d in operator delete(void*) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_erasure_code_shec_all+0x1e012d) (BuildId: 8cfc74d22471b6905f9b23304aed2af945265a13)
#1 0x7fc14752f588 in ErasureCodeShecTableCache::~ErasureCodeShecTableCache() /home/jenkins-build/build/workspace/ceph-pull-requests/src/erasure-code/shec/ErasureCodeShecTableCache.cc:61:19
#2 0x5604a544ccbe in ParameterTest_parameter_all_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/erasure-code/TestErasureCodeShec_all.cc:263:1
...
0x602000039b10 is located 0 bytes inside of 4-byte region [0x602000039b10,0x602000039b14)
allocated by thread T0 here:
#0 0x5604a5405afe in malloc (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_erasure_code_shec_all+0x1a4afe) (BuildId: 8cfc74d22471b6905f9b23304aed2af945265a13)
#1 0x7fc1474c9617 in reed_sol_vandermonde_coding_matrix /home/jenkins-build/build/workspace/ceph-pull-requests/src/erasure-code/jerasure/jerasure/src/reed_sol.c:86:10
#2 0x7fc147528634 in ErasureCodeShec::shec_reedsolomon_coding_matrix(int) /home/jenkins-build/build/workspace/ceph-pull-requests/src/erasure-code/shec/ErasureCodeShec.cc:514:12
#3 0x7fc147526cd8 in ErasureCodeShecReedSolomonVandermonde::prepare() /home/jenkins-build/build/workspace/ceph-pull-requests/src/erasure-code/shec/ErasureCodeShec.cc:390:14
#4 0x7fc1475187aa in ErasureCodeShec::init(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::ostream*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/erasure-code/shec/ErasureCodeShec.cc:57:3
```
where we use `delete` to free the encoder matrix allocated using
`malloc()`. as jerasure is a library implemented in C,
unless we want to reimplement it in C++, we should use `free()` to
free the memory chunk allocated by
`reed_sol_vandermonde_coding_matrix()`. also, please note that
jerasure does not provide a function to free the memory allocated
by this function, so we have to look at its implementation and call
`free()` directly. this should silence the ASan warning.
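One way to keep the allocator and deallocator paired is a `std::unique_ptr` with a deleter that calls `free()`. This is a different technique from the one described above, which calls `free()` directly in the destructor; it is shown only as a sketch. `FreeDeleter`, `malloc_int_array`, `c_style_matrix` and `owned_matrix` are hypothetical names, and `c_style_matrix` merely stands in for a C routine that returns a `malloc()`'ed buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter that releases memory with free(), matching malloc().
struct FreeDeleter {
  void operator()(void* p) const noexcept { std::free(p); }
};

// Owning handle for an int array that was allocated with malloc().
using malloc_int_array = std::unique_ptr<int[], FreeDeleter>;

// Hypothetical stand-in for a C library routine that hands back a
// malloc()'ed matrix and leaves freeing it to the caller.
inline int* c_style_matrix(int rows, int cols) {
  const std::size_t n =
      static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols);
  int* m = static_cast<int*>(std::malloc(sizeof(int) * n));
  if (m == nullptr) {
    return nullptr;
  }
  for (std::size_t i = 0; i < n; ++i) {
    m[i] = static_cast<int>(i);
  }
  return m;
}

// Wraps the raw buffer so that free(), not operator delete, releases it.
inline malloc_int_array owned_matrix(int rows, int cols) {
  return malloc_int_array(c_style_matrix(rows, cols));
}
```

Releasing the same buffer with `delete` or `delete[]` instead would reproduce the alloc-dealloc-mismatch that ASan reports.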
erasure-code/shec: replace 0 with nullptr when appropriate
0 fails to tell human readers that the variable is
a pointer, but nullptr does. to improve readability, let's
use nullptr when the variable in question is a pointer.
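Beyond readability, the two spellings also differ for the compiler, which a pair of overloads makes visible. This is a generic C++ illustration, not code from the shec plugin; the `which` function is hypothetical.

```cpp
#include <cassert>
#include <string>

// Two overloads that differ only in taking an integer or a pointer.
inline std::string which(int) { return "int"; }
inline std::string which(const char*) { return "pointer"; }
```

Calling `which(0)` selects the `int` overload, while `which(nullptr)` selects the pointer overload, so `nullptr` states the intent to both the reader and the compiler.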
Explain that an error message received in response to
"redirect_resolve_ip_addr True" might be caused by having an
insufficiently recent release of Ceph running in your cluster.
John Mulligan [Tue, 23 Apr 2024 12:16:19 +0000 (08:16 -0400)]
qa/tasks/cephadm: add a wait_for_service_not_present task func
Add a wait_for_service_not_present task function that will wait until a
given service name is not present in the list of running cephadm
services. This is intended for testing service cleanup operations.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 30 Mar 2024 20:50:29 +0000 (16:50 -0400)]
doc/mgr: add documentation for new smb mgr module
Add initial documentation for the new smb mgr module. It doesn't cover
every possible thing or expected future changes but it should cover
the basics of interacting with the module from the cli.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 20 Mar 2024 18:08:24 +0000 (14:08 -0400)]
ceph.spec.in: add smb module and python-dataclasses dependency
The only distro ceph squid+ is building for at the moment that does not
already have a python version that includes dataclasses is centos/rhel
8. Add a dependency for the backport package on rhel8.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 30 Jan 2024 21:49:25 +0000 (16:49 -0500)]
pybind/mgr: use black & isort on the smb module
Provide tox envs that check or reformat code with black and isort,
currently applied to only the new smb module.
This is similar to what we recently did for enabling tox in the
cephadmlib dir, as it only applies to new code. However, other modules
that want to opt in to the automated, python-community-typical,
stop-thinking-and-let-tools-do-it approach to code formatting can
be added to the new envs later on.
Signed-off-by: John Mulligan <jmulligan@redhat.com>