test/ceph_test_rados: fix seeking over virtual, in-memory objects
`ceph_test_rados` maintains a database of in-memory pseudo-objects
that are used to dynamically 1) generate content for writing into
RADOS objects and 2) regenerate it later when comparing buffers of
completed RADOS reads.
Content of these objects is ephemeral; to preserve space we store
only starting parameters for a pseudo-random number generator.
However, this memory optimization requires strict management of
calls to the PRGN. Unfortunately, adding partial reads support has
unveiled a bug in this aspect: `ObjectDesc::iterator::seek()` was
missing a call to underlying `RandGenerator::iterator_impl::seek()`,
resulting in mismatches on read buffer validation.
osd: drop the is_systematic from ErasureCodeInterface
This method has introduced by the intial EC partial read
effort. However, at the moment there is no implementation
of the `ErasureCodeInterface` that returns `false`.
In other words: all out EC algorithms are systematic
(for explanation of this term please refer to Wikipedia:
https://en.wikipedia.org/wiki/Systematic_code).
osd: EC Partial Stripe Reads (Retry of #23138 and #52746)
This commit is a further ressurection of the EC partial reads
concept; this time of the Mark Nelson's work sent as PR #52746.
The modifications in this commit are mostly about settling
Mark's work on top of the recent rework of `ECBackend` which
had shared the EC codebase with the crimson-osd.
At the original description says, Mark's work is based on earlier
attempt from Xiaofei Cui.
Therefore credits go to:
* Mark Nelson (Clyso),
* Xiaofei Cui (cuixiaofei@sangfor.com.cn).
The original commit description is preserved below:
> This is a re-implementation of PR #23138 rebased on main with a couple of nitpicky changes to make the code a little more clear (to me at least). Credit goes to Xiaofei Cui [cuixiaofei@sangfor.com.cn](mailto:cuixiaofei@sangfor.com.cn) for the original implementation.
>
> Looking at the original PR's review, it does not appear that we can use the same technique as in 468ad4b. We don't have the ReadOp yet. I'm not sure if @gregsforytwo's idea to query the plugin works, but it's clear we are not doing the efficient thing from the get-go here.
>
> The performance and efficiency benefits for small random reads appears to be quite substantial, especially for large stripe widths.
Lucian Petrut [Fri, 24 May 2024 10:03:11 +0000 (10:03 +0000)]
rbd-wnbd: wait for the disk cleanup to complete
The WNBD disk removal workflow is asynchronous, which is why we'll
need to wait for the cleanup to complete when stopping the service.
The "disconnect_all_mappings" function is moved to
RbdMappingDispatcher::stop, allowing us to access the mapping list
more easily and reject new mappings after a stop has been requested.
Venky Shankar [Wed, 29 May 2024 09:34:58 +0000 (15:04 +0530)]
Merge PR #55758 into main
* refs/pull/55758/head:
doc: update 'journal reset' command with --yes-i-really-really-mean-it
qa: fix cephfs-journal-tool command options and make fs inactive
cephfs-journal-tool: Add warning messages during 'journal reset' and prevent execution on active fs
Patrick Donnelly [Tue, 28 May 2024 16:46:08 +0000 (12:46 -0400)]
Merge PR #57579 into main
* refs/pull/57579/head:
mds/quiesce: disable quiesce root debug parameters by default
mds/quiesce-agt: never send a synchronous ack
mds/quiesce-agt: add test for a rapid async ack
mds/quiesce: always abort fragmenting asynchronously to prevent reentrancy
mds/quiesce: overdrive an export if it hasn't frozen the tree yet
mds/quiesce: quiesce_inode should not hold on to remote auth pins
qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
mds: add `--lifetime` parameter to the `lock path` asok command
mds/quiesce: accept a regular file as the quiesce root
mds: command_quiesce_path: rename `--wait` to `--await` for consistency
mds: command_quiesce_path: do not block the asok thread and return an adequate rc
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Tue, 28 May 2024 16:27:53 +0000 (02:27 +1000)]
doc/dev: add note about intro of perf counters
Add a note to the "perf counter" section of doc/dev/perf_counters.rst
that explains that this feature was introduced in the Reef release of
Ceph. This note will prevent us from accidentally backporting
perf-counter-related PRs to Quincy.
RGW: Remove get_obj_state()/set_obj_state from SAL
RGWObjState is the state for the StoreObject class. It has historically
been accessible via get_obj_state()/set_obj_state(), but the double
pointer nature of this access has caused multiple bugs, and the
RGWObjState itself is an implementation detail that doesn't need to be
exposed.
Instead, add a load_obj_state() that loads the state from the store, and
use proper getters/setters for the data.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Rishabh Dave [Mon, 27 May 2024 19:37:35 +0000 (01:07 +0530)]
doc/developer_guide: update doc about installing teuthology
There are 2 more ways to install teuthology. Approach with boostrap
script is easier and more convenient while other approach is more
elaborate but manual, document both of them. Don't delete the currently
documented approach because it lets users install teuthology
conveniently in a custom virtual environment. So, keep all three.
Ilya Dryomov [Mon, 27 May 2024 13:56:26 +0000 (15:56 +0200)]
qa/suites/rbd: override extra_system_packages directly on install task
[1] and [2] added support for applying extra_system_packages overrides
directly on install task, but at the same time broke our long standing
workaround where we sneaked extra_system_packages directive in through
an override on ceph task. This is likely getting addressed in [3], but
it's better to not rely on this odd feature in the first place.
Zac Dover [Mon, 27 May 2024 11:09:40 +0000 (21:09 +1000)]
doc/cephfs: s/subvolumegroups/subvolume groups
Use the term "subvolume groups" instead of "subvolumegroups" where the
term appears in plain English. The string "subvolumegroups" is correct
in commands, and remains unchanged.
Also add formatting to command output, to make clearer that the output
is indeed output.
avanthakkar [Fri, 19 May 2023 11:37:28 +0000 (17:07 +0530)]
mgr/dashboard: add helpers for compression in pool form
Fixes: https://tracker.ceph.com/issues/61297 Signed-off-by: avanthakkar <avanjohn@gmail.com>
Adding helpers for compression mode, algorithm, min/max blob size and
compression ratio which is set to 0.875 as default.
The number of journal segments should not be based on the committed
journal_head. Otherwise, if a new journal segment is just opened and the
committed journal_head hasn't been updated, the result will be wrong.
This causes ceph_assert(get_segments_reclaimable() == 0) in
SegmentCleaner::get_next_reclaim_segment().
Leonid Usov [Mon, 20 May 2024 16:17:04 +0000 (19:17 +0300)]
mds/quiesce: overdrive an export if it hasn't frozen the tree yet
Just like with the fragmenting, we should abort an ongoing export
if a quiesce is attempted for the directory.
To minimize the stress for the system, we only allow the abort
if the export hasn't yet managed to freeze the tree. If that is the case,
then quiesce will have to wait for the export to finish.
Fixes: https://tracker.ceph.com/issues/66123 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Leonid Usov [Mon, 20 May 2024 22:03:15 +0000 (01:03 +0300)]
mds/quiesce: quiesce_inode should not hold on to remote auth pins
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks
We should not be forcing a mustpin when taking quiesce lock.
This creates unnecessary overhead due to the distributed nature
of the quiesce: all ranks will execute quiesce_inode, including
the auth rank, which will authpin the inode.
Auth pinning on the auth rank is important to synchronize quiesce
with operations that are managed by the auth, like fragmenting
and exporting.
If we let a remote quiesce process take a foreign authpin then
it may block freezing on the auth, which will stall quiesce locally.
This wouldn't be a problem if the quiesce that is blocked on the auth
and the quiesce that's holding a remote authpin from the replica side
were unrelated, but in our case it may be the same logical quiesce
that effectively steps on its own toes. This creates an opportunity
for a deadlock.
Fixes: https://tracker.ceph.com/issues/66152 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Kefu Chai [Thu, 23 May 2024 04:47:26 +0000 (12:47 +0800)]
ceph-volume: use importlib from stdlib on Python 3.8 and up
since packaging was apparently removed from pkg_resources, let's use
importlib.metadata when it is available and pkg_resources on older
Python versions.
Kefu Chai [Fri, 24 May 2024 09:51:55 +0000 (17:51 +0800)]
cmake: : link shec_utils against legacy-option-headers
in c24a6ffe20, we tried to link all target dependent on legacy option
headers against legacy-option-headers, but we missed some of them.
in our CI, we spotted build failure like:
```
In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/erasure-code/ErasureCode.cc:26:
In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/osd/osd_types.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/ceph_context.h:41:
In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/config_proxy.h:6:
In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/config.h:27:
In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/config_values.h:59:
/home/jenkins-build/build/workspace/ceph-pull-requests/src/common/options/legacy_config_opts.h:1:10: fatal error: 'global_legacy_options.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
so in this change, we link the shec_utils to `legacy-option-headers`
as well to fulfill the build dependency.
Ilya Dryomov [Thu, 23 May 2024 16:15:08 +0000 (18:15 +0200)]
.github: expand tests label to all files under qa
The test job definition under qa/suites is an integral part of almost
any test. Often, the test logic is split between the task or workunit
and respective snippet(s) under qa/suites.
Other files under qa are less used, but still related to nothing but
testing, so just add the label on all of it.