These are intended to replace do_osd_ops*. The implementation
is simpler and does not involve passing success and failure
callbacks. It also moves responsibility for dealing with
the MOSDOpReply and client-related error handling over to
ClientRequest.
do_osd_op* will be removed once users are switched over.
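As a rough standalone sketch of the interface shape being moved toward (stub types only, not the actual crimson API), the executor returns an outcome instead of taking success/failure callbacks, and the caller builds the reply or maps the error:

#include <functional>
#include <variant>

// Stubs standing in for MOSDOpReply and the executor result; the real names
// and signatures differ, this only illustrates the callback-free shape.
struct reply_stub { int result = 0; };
using exec_outcome = std::variant<reply_stub, int /* errno-style error */>;

// Old shape: success/failure callbacks threaded through the executor.
void execute_with_callbacks(std::function<void(reply_stub)> on_success,
                            std::function<void(int)> on_error);

// New shape: plain outcome returned; reply construction and client-facing
// error handling stay with the caller (ClientRequest in the real code).
exec_outcome execute_ops();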
Samuel Just [Fri, 20 Sep 2024 02:39:08 +0000 (19:39 -0700)]
crimson: PG::submit_error_log returns eversion_t rather than optional
It seems like the motivation here was to allow do_osd_ops_execute to
communicate that it didn't submit an error log by making
maybe_submit_error_log a std::optional<eversion_t>. However,
submit_error_log itself always returns a version. Fix submit_error_log
and compensate in do_osd_ops_execute.
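A minimal standalone sketch of the pattern, using plain std types rather than the crimson interfaces: the inner routine always yields a version, and only the caller reintroduces the optional where "no error log submitted" is meaningful.

#include <optional>

struct version_stub { unsigned epoch = 0; unsigned version = 0; };  // stands in for eversion_t

// submit_error_log-style routine: always returns a version.
version_stub submit_error_log_stub() {
  return version_stub{1, 42};
}

// do_osd_ops_execute-style caller: wraps the result in an optional only when
// it actually decides to submit an error log.
std::optional<version_stub> maybe_submit_error_log(bool need_error_log) {
  if (!need_error_log) {
    return std::nullopt;
  }
  return submit_error_log_stub();
}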
Samuel Just [Fri, 20 Sep 2024 02:23:47 +0000 (02:23 +0000)]
crimson: futures from flush_changes_n_do_ops_effects must not fail
The return signature previously suggested that the second future
returned could be an error. This seemed necessary due to how
effects are handled:
template <typename MutFunc>
OpsExecuter::rep_op_fut_t
OpsExecuter::flush_changes_n_do_ops_effects(
  const std::vector<OSDOp>& ops,
  SnapMapper& snap_mapper,
  OSDriver& osdriver,
  MutFunc mut_func) &&
{
  ...
  all_completed =
    std::move(all_completed).then_interruptible([this, pg=this->pg] {
      // let's do the cleaning of `op_effects` in destructor
      return interruptor::do_for_each(op_effects,
        [pg=std::move(pg)](auto& op_effect) {
          return op_effect->execute(pg);
      });
However, all of the actual execute implementations (created via
OpsExecuter::with_effect_on_obc) return a bare seastar::future and
cannot fail.
In a larger sense, it's actually critical that neither future returned
from flush_changes_n_do_ops_effects may fail -- they represent applying
the transaction locally and remotely. If either portion fails, there
would need to be an interval change to recover.
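Expressed with stub types (not the actual crimson rep_op_fut_t), the resulting shape is a pair of plain futures with no error variants:

#include <utility>

// Stub for a non-errorated future; the real code uses crimson interruptible
// futures, this only illustrates that neither member carries an error type.
template <typename T = void>
struct plain_future_stub {};

using rep_op_futs_stub =
  std::pair<plain_future_stub<>,    // "submitted": transaction handed off
            plain_future_stub<>>;   // "all completed": applied locally and remotely, effects run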
Samuel Just [Thu, 19 Sep 2024 00:59:21 +0000 (00:59 +0000)]
crimson: remove the eagain error from PG::do_osd_ops
The idea here is that PG::do_osd_ops propagates an eagain after starting
a repair upon encountering an eio, to indicate that the op should restart
from the top of ClientRequest::process_op.
However, InternalClientRequest's handler for this error simply ignores
it. ClientRequest's handling, while superficially reasonable, doesn't
actually work. Re-calling process_op would mean reentering previous
stages. This is problematic for at least a few reasons:
1. Reentering a prior stage with the same handler doesn't actually work
since the corresponding event entries will already be populated.
2. There might be other ops on the same object waiting on the process
stage. They'd need to be sent back as well in order to preserve
ordering.
Because this mechanism doesn't really seem to be fully baked, let's
remove it for now and try to reintroduce it later after
do_osd_ops[_execute] are a bit simpler.
Samuel Just [Mon, 16 Sep 2024 22:16:37 +0000 (22:16 +0000)]
crimson/osd: move pipelines to osd_operation.h
Each of the two existing pipelines is shared across multiple
ops. Rather than defining them in a specific op or in
osd_operations/common/pg_pipeline.h, just declare them in
osd_operation.h.
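Illustratively (names and stages are placeholders, not the actual declarations), the point is simply that the shared pipeline types live in osd_operation.h so any op can reference them:

// osd_operation.h (sketch): pipelines shared across multiple op types are
// declared here instead of in a single op's header or pg_pipeline.h.
struct ClientPipelineStub {
  // ordered stages shared by the client-facing ops
};
struct PeeringPipelineStub {
  // ordered stages shared by the peering-related ops
};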
Afreen Misbah [Fri, 11 Oct 2024 08:57:24 +0000 (14:27 +0530)]
mgr/dashboard: Fix listener deletion
Listener deletion is broken because the wrong gateway address is passed.
Include `traddr` in the listener DELETE API so the correct gateway address is used for deletion.
Ronen Friedman [Tue, 8 Oct 2024 13:25:56 +0000 (08:25 -0500)]
qa/standalone/scrub: remove TEST_recovery_scrub_2
That test no longer matches the actual requirements and
implementation of scrubbing.
It was already deactivated in
https://github.com/ceph/ceph/pull/59590. Here it is
fully removed, mainly for the sake of backporting.
Ronen Friedman [Sat, 5 Oct 2024 12:33:49 +0000 (07:33 -0500)]
osd/scrub: modify ScrubStore contents retrieval
A separate commit added a simple test to verify the new
store implementation (creating both shallow & deep errors),
scrubbing (step 1), deep scrubbing (step 2), then shallow
scrubbing again (step 3). The test verifies that
the results after step 2 include all shallow errors data (*),
and that the results after step 3 include all deep errors
data.
The test highlighted the need to correctly partition and
retrieve the "shards inconsistencies" and the "selected
shard" data, which was not fully implemented in the
previous commit. Thus, this commit adds the following:
- add_object_error() no longer filters out data saved during
deep scrubbing; it also filters less of the shallow scrubs'
"shards inconsistencies" data;
- merge_encoded_error_wrappers() now merges the "shards
inconsistencies" data correctly, handling the multiple
scenarios possible.
(*) Note the special case of not being able to read the
object's version during deep scrubbing (due to a read
error). In this case, the data collected during the
shallow scrub will not be reported.
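As a standalone illustration of the merge (std containers only, not the actual rados inconsistency structures), merging keeps the union of per-shard errors from the shallow and deep records:

#include <map>
#include <set>
#include <string>

using shard_errors_stub =
  std::map<std::string /* shard */, std::set<std::string /* error */>>;

// Merge "shards inconsistencies" from a shallow-scrub record and a deep-scrub
// record of the same object, keeping the union of each shard's errors.
shard_errors_stub merge_shard_errors(const shard_errors_stub& shallow,
                                     const shard_errors_stub& deep) {
  shard_errors_stub merged = shallow;
  for (const auto& [shard, errors] : deep) {
    merged[shard].insert(errors.begin(), errors.end());
  }
  return merged;
}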
common/scrub,osd/scrub: minor cleanups to ScrubStore
Including:
- introducing 'no out param' encode() for the inconsistent wrappers;
- renaming the ambiguous 'empty()' to 'is_empty()';
- removing unused code;
- a few other minor cleanups.
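The "no out param" encode() is just a convenience wrapper over the existing out-parameter overload; a sketch with std::string standing in for ceph::buffer::list:

#include <string>

struct inconsistent_wrapper_stub {
  // existing style: append the encoding to an out parameter
  void encode(std::string& out) const { out += "payload"; }

  // new convenience: return the encoding directly
  std::string encode() const {
    std::string out;
    encode(out);
    return out;
  }
};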
osd/scrub: add dout() capability to the ScrubStore
Now that the ScrubStore object is directly created by the
scrubber (and has a lifetime that does not extend beyond
the scrubber object), we can add the same dout()
mechanism used by the other scrubber sub-objects.
Note: that mechanism will be changed shortly, so that the
sub-objects will use one prefix() creator supplied by
the Scrubber object.
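A standalone sketch of the prefix idea mentioned in the note (plain streams, not the real dout machinery): the Scrubber supplies one prefix() creator that the ScrubStore and the other sub-objects reuse in their log lines.

#include <iostream>
#include <ostream>
#include <string>

struct scrubber_stub {
  std::string pg_id = "1.2a";
  std::ostream& prefix(std::ostream& os) const {
    return os << "osd pg[" << pg_id << "] scrubber ";
  }
};

struct scrub_store_stub {
  const scrubber_stub& owner;
  void log(const std::string& msg) const {
    owner.prefix(std::cout) << "store: " << msg << '\n';
  }
};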
osd/scrub: directly create or reinit the ScrubStore
The ScrubStore is now directly created or reinitialized by the
Scrubber. Note that the store object is not identical to the
errors DB: the errors DB is an entity in the OSD store (a
collection of OMap entries in uniquely-named object(s)),
while the ScrubStore object is a cache and accessor for
that entity, and can be recreated or disposed of at will.
We no longer recreate the ScrubStore object for every scrub.
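A hypothetical sketch of the create-or-reinit flow (stub names, not the actual Scrubber code): the accessor object is created once and reinitialized per scrub, while the OMap-backed errors DB persists independently of it.

#include <memory>

struct errors_db_handle_stub {};   // stands in for the OMap-backed errors DB

struct scrub_store_stub {
  explicit scrub_store_stub(errors_db_handle_stub db) : db_(db) {}
  void reinit_for_scrub(bool deep_scrub) { (void)deep_scrub; /* drop cached entries, keep db_ */ }
private:
  errors_db_handle_stub db_;
};

struct scrubber_stub {
  void on_scrub_start(bool deep_scrub) {
    if (!store_) {
      store_ = std::make_unique<scrub_store_stub>(errors_db_handle_stub{});
    }
    store_->reinit_for_scrub(deep_scrub);   // reuse the existing accessor object
  }
private:
  std::unique_ptr<scrub_store_stub> store_;
};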
Naman Munet [Mon, 7 Oct 2024 05:11:29 +0000 (10:41 +0530)]
mgr/dashboard: unable to edit pipe config for bucket level policy of a bucket
Fixes: https://tracker.ceph.com/issues/68387
Fixes Includes:
1) Pass additional parameters for 'user' and 'mode', as the user can be either system/dashboard or other values while creating a pipe.
2) Previously, after removing the src/dest bucket field, editing the pipe still showed the same old values; now the field becomes '*' if an empty value is passed from the frontend.
Signed-off-by: Naman Munet <namanmunet@li-ff83bccc-26af-11b2-a85c-a4b04bfb1003.ibm.com>
Lee Sanders [Fri, 4 Oct 2024 14:13:57 +0000 (15:13 +0100)]
qa/suites/tasks/cbt.py: Deprecating cosbench from Teuthology in preparation for deletion of cosbench support from CBT
The code being deleted is infrastructure code; no qa test suite uses this function, so it can be safely deleted.
Vallari Agrawal [Wed, 9 Oct 2024 07:27:32 +0000 (12:57 +0530)]
qa/workunits/nvmeof/setup_subsystem.sh: use --no-group-append
In newer versions of the nvmeof CLI, "subsystem add" needs
this flag to ensure the subsystem name is exactly the value of --subsystem.
Otherwise, in newer CLI versions, the gateway group is appended
to the end of the subsystem name.
This fixes the teuthology nvmeof suite (currently all jobs fail
because of this).
Avan Thakkar [Wed, 9 Oct 2024 13:01:11 +0000 (18:31 +0530)]
qa/cephfs: update earmark values to valid ones in test_volumes.py
smb.test is now an invalid earmark; it should be either smb or
smb.cluster.<cluster_id>. Update test_volumes.py to set
valid earmarks wherever they are used.
JonBailey1993 [Wed, 9 Oct 2024 10:28:42 +0000 (11:28 +0100)]
common/io_exerciser: Modify is_locked_by_me call in ceph_test_rados_io_sequence
is_locked_by_me() is a member function of ceph::mutex that is only available in debug builds. By using the ceph_mutex_is_locked_by_me macro, we can neatly make sure we only call this function in debug mode, so compilation is no longer affected when building in release mode.
Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
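A standalone illustration of the pattern (stub types, not the real ceph::mutex headers): the macro checks ownership in debug builds and collapses to true in release builds, so the debug-only member is never referenced directly.

#include <cassert>
#include <mutex>
#include <thread>

struct debug_mutex_stub {
  std::mutex m;
  std::thread::id owner{};
  void lock()   { m.lock(); owner = std::this_thread::get_id(); }
  void unlock() { owner = {}; m.unlock(); }
  bool is_locked_by_me() const { return owner == std::this_thread::get_id(); }
};

#ifndef NDEBUG
  #define STUB_MUTEX_IS_LOCKED_BY_ME(mtx) ((mtx).is_locked_by_me())
#else
  #define STUB_MUTEX_IS_LOCKED_BY_ME(mtx) true
#endif

void locked_section(debug_mutex_stub& mtx) {
  assert(STUB_MUTEX_IS_LOCKED_BY_ME(mtx));   // compiles away in release builds
}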
* refs/pull/60037/head:
test/common: add death test for double !recursive lock
common/test: do not test exception raised from recursive lock
test/common: fix invalid vim mode
common,osdc: remove obsolete ceph::mutex_debugging
common: assert debug mutex lock is not held if !recursive