Patrick Donnelly [Mon, 11 Dec 2023 13:37:39 +0000 (08:37 -0500)]
Merge PR #54726 into main
* refs/pull/54726/head:
PendingReleaseNotes: announce cephfs-shell avail. on rhel9
qa: test fs:shell on all distros
qa: add cephfs-shell to installed rpm packages
ceph.spec.in: enable support for cephfs-shell by default via EPEL9
John Mulligan [Wed, 15 Nov 2023 21:39:07 +0000 (16:39 -0500)]
cephadm: move abstract script handling functions to runscripts.py
Add a new file runscripts.py for the lower-level management of scripts
and related files that are invoked by systemd units. This patch ended up
uglier than I desired because there was a bunch of daemon specific logic
that remains in cephadm.py and those functions all needed to be updated
to avoid calling functions that write to the scripts directly.
Now customizations are done by passing a list of commands: these
commands can be a string that will be literally added to the
script, a list that will be quoted and then added to the script, or
a ContainerCommand which is basically a wrapper around the arguments to
_write_container_cmd_to_bash.
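A minimal sketch of how such a mixed command list could be rendered into
script lines. The ContainerCommand fields and the render_script_lines
helper below are hypothetical stand-ins for illustration, not the actual
runscripts.py API:

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ContainerCommand:
    """Hypothetical stand-in: wraps the arguments of a container command."""
    args: List[str] = field(default_factory=list)
    comment: str = ""


Command = Union[str, List[str], ContainerCommand]


def render_script_lines(commands: List[Command]) -> List[str]:
    """Turn a mixed list of commands into lines of a shell script."""
    lines: List[str] = []
    for cmd in commands:
        if isinstance(cmd, str):
            # Plain strings are added to the script literally.
            lines.append(cmd)
        elif isinstance(cmd, ContainerCommand):
            # Wrapper objects expand to an optional comment plus a quoted command.
            if cmd.comment:
                lines.append(f"# {cmd.comment}")
            lines.append(" ".join(shlex.quote(a) for a in cmd.args))
        elif isinstance(cmd, list):
            # Lists are quoted argument by argument, then joined.
            lines.append(" ".join(shlex.quote(a) for a in cmd))
        else:
            raise TypeError(f"unsupported command type: {type(cmd).__name__}")
    return lines


example = render_script_lines([
    "set -e",
    ["echo", "hello world"],
    ContainerCommand(["podman", "run", "img"], comment="start"),
])
```

Here the literal string passes through untouched, while the list form
gets shell quoting applied to each argument.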
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 28 Sep 2023 21:18:18 +0000 (17:18 -0400)]
cephadm: add a higher-level function for managing systemd units
Add the function update_files to systemd_unit.py to encapsulate and
abstract the details regarding the generation of system unit files.
This will make it simpler in the future to add more advanced systemd
configurations, including managing customized unit files and systemd
unit drop-in files.
Some additional work was needed to update the recently added
command_unit_install function. Because the new systemd_unit.update_files
function requires a full daemon identity, the command_unit_install
function now requires a daemon name. In addition, while testing this
change it was found that the function could not have worked as it was
because it required the fsid but neither used the infer_fsid decorator
nor provided a `--fsid` argument. Both were added.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 28 Sep 2023 21:17:03 +0000 (17:17 -0400)]
cephadm: update unit file test imports
Update the systemd unit file tests to use the canonical module for
systemd unit functions rather than importing them indirectly from
cephadm.py.
This future-proofs the tests in case the imports in cephadm.py
change.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 28 Sep 2023 21:01:11 +0000 (17:01 -0400)]
cephadm: update container engine test imports
Update the container engine tests to use the canonical module for
container engines rather than importing them indirectly from cephadm.py.
This future-proofs the tests in case the imports in cephadm.py change.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Ilya Dryomov [Sat, 9 Dec 2023 20:00:42 +0000 (21:00 +0100)]
test/librbd: avoid config-related crashes in DiscardWithPruneWriteOverlap
For reasons that I think no longer apply today, set_val() and
set_val_or_die() refuse to set "type: str" config options that aren't
marked as "can be changed at runtime" -- set_val() returns an error and
set_val_or_die() terminates the process. What is and isn't marked as
"can be changed at runtime" seems to be pretty much random both within
and outside of RBD, so let's just refactor how config is set here.
While at it, I realized that reproducer config is underspecified:
- for rbd_cache_policy and rbd_cache_writethrough_until_flush settings
to matter, rbd_cache must be set to true and rbd_cache_max_dirty must
be set to a positive number
- order should be set explicitly, because rbd_default_order can be as
low as 12 (for 4096-byte objects), interfering with the logic of the
test
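As a rough illustration of why the order matters: the object size is
2**order bytes, so the same I/O range can touch a different number of
objects depending on the order. The helper names below are hypothetical
and the sketch only does the arithmetic; it is not taken from the test
itself:

```python
def object_size_bytes(order: int) -> int:
    """Object size implied by an RBD order (size = 2**order bytes)."""
    return 1 << order


def objects_spanned(offset: int, length: int, order: int) -> int:
    """Number of objects touched by the byte range [offset, offset + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset >> order
    last = (offset + length - 1) >> order
    return last - first + 1


small = object_size_bytes(12)   # 4096-byte objects
large = object_size_bytes(22)   # 4 MiB objects

# The same 8 KiB range crosses an object boundary only with the small order.
span_small = objects_spanned(0, 8192, 12)
span_large = objects_spanned(0, 8192, 22)
```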
Zac Dover [Sat, 9 Dec 2023 03:46:00 +0000 (04:46 +0100)]
doc/radosgw: format POST statements
Format the POST methods so that they appear in the rendered text as
examples of POST API calls and not as plain old unformatted text, which
is how they looked before this commit. The content of these API calls
remains to be tested and confirmed to work, but this is a first step.
Casey Bodley [Thu, 7 Dec 2023 14:11:12 +0000 (09:11 -0500)]
vstart: add --rgw_store option for rados|dbstore|posix
enables dbstore for rgw_backend_store and rgw_config_store, allowing
vstart to run without any mons or osds. database files are put under
the dev subdirectory
when rgw_store=posix, the posix filter is added on top of dbstore
Shachar Sharon [Thu, 30 Nov 2023 11:29:30 +0000 (13:29 +0200)]
client/fuse: handle case of renameat2 with non-zero flags
When a user issues renameat2(2) with non-zero flags (RENAME_EXCHANGE or
RENAME_NOREPLACE), the current code ignores those flags and treats the
call as an ordinary rename. This, in turn, may yield a successful rename
with semantics different from those expected by the caller.
Follow the same semantics as the kernel's cephfs client: return -EINVAL
when renameat2 is called with non-zero flags (see 'ceph_rename' in
fs/ceph/dir.c).
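The check described above can be sketched as follows. This is an
illustrative Python model, not the client/fuse code itself; the function
name check_rename_flags is hypothetical, and the flag values are the
Linux UAPI constants:

```python
import errno

# Flag values as defined by the Linux renameat2(2) interface.
RENAME_NOREPLACE = 1 << 0
RENAME_EXCHANGE = 1 << 1


def check_rename_flags(flags: int) -> int:
    """Return 0 if the rename may proceed, or a negative errno otherwise.

    Any non-zero flag is rejected rather than silently ignored, so the
    caller never gets a plain rename when it asked for different semantics.
    """
    if flags != 0:
        return -errno.EINVAL
    return 0
```

Rejecting the call lets the caller detect that the requested semantics
are unsupported, instead of receiving a success that did something else.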
Mark Kogan [Tue, 28 Nov 2023 12:34:31 +0000 (14:34 +0200)]
rgw: d3n: fix valgrind reported leak related to libaio worker threads
which sporadically reproduces on teuthology ubuntu instances
happens because of a race in which RGW shutdown occurs before
the libaio worker threads have terminated
to fix, reduced the libaio threads' inactivity shutdown time
ref:
man aio_init
...
aio_idle_time
This field specifies the amount of time in seconds that a worker thread
should wait for further requests before terminating, after having
completed a previous request. The default value is 1.
...
Fixes: https://tracker.ceph.com/issues/63445
Signed-off-by: Mark Kogan <mkogan@ibm.com>
Zac Dover [Sat, 2 Dec 2023 05:32:26 +0000 (06:32 +0100)]
doc/radosgw: add gateway starting command
Add a command that properly starts (or restarts) the RADOS gateway after
RGW settings have been changed. This commit has been added in response
to an issue reported anonymously on
https://pad.ceph.com/p/Report_Documentation_Bugs.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Joshua Baergen [Thu, 9 Nov 2023 16:43:22 +0000 (09:43 -0700)]
librbd: Append one journal event per image request
In the case where an image request is split across multiple object
extents and journaling is enabled, multiple journal events are appended.
Prior to this change, all object requests would wait for the last
journal event to complete, since journal events complete in order and
thus the last one completing implies that all prior journal events were
safe at that point.
The issue with this is that there's nothing stopping that last journal
event from being cleaned up before all object requests have stopped
referring to it. Thus, it's entirely possible for the following sequence
to occur:
1. An image request gets split into two image extents and two object
requests. Journal events are appended (one per image extent).
2. The first object request gets delayed due to an overlap, but the
second object request gets submitted and starts waiting on the last
journal event (which also causes a C_CommitIOEvent to be instantiated
against that journal event).
3. Journaling completes, and the C_CommitIOEvent fires. The
C_CommitIOEvent covers the entire range of data that was journaled in
this event, and so the event is cleaned up.
4. The first object request from above is allowed to make progress; it
tries to wait for the journal event that was just cleaned up which
causes the assert in wait_event() to fire.
As far as I can tell, this is only possible on the discard path today,
and only recently. Up until 21a26a752843295ff946d1543c2f5f9fac764593
(librbd: Fix local rbd mirror journals growing forever), m_image_extents
always contained a single extent for all I/O types; this commit changed
the discard path so that if discard granularity changed the discard
request, m_image_extents would be repopulated, and if the request
happened to cross objects then there would be multiple m_image_extents.
It appears that the intent here was that there should be one journal
event per image request and the pending_extents kept track of what had
completed thus far. This commit restores that 1:1 relationship.
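The sequence above can be replayed with a toy model. ToyJournal and
replay_sequence are hypothetical names for illustration only; the model
assumes that an event is cleaned up once all of its pending extents have
been committed, and it does not reproduce librbd's actual classes:

```python
class ToyJournal:
    """Toy journal: tracks, per event, the extents not yet committed."""

    def __init__(self):
        self._next_tid = 0
        self._pending = {}  # tid -> set of pending extents

    def append_io_event(self, extents):
        tid = self._next_tid
        self._next_tid += 1
        self._pending[tid] = set(extents)
        return tid

    def wait_event(self, tid):
        # Stands in for the assert in wait_event(): the event must still exist.
        if tid not in self._pending:
            raise AssertionError(f"journal event {tid} was already cleaned up")

    def commit_io_event_extent(self, tid, extent):
        pending = self._pending[tid]
        pending.discard(extent)
        if not pending:
            # Nothing left to commit: the event is cleaned up.
            del self._pending[tid]


def replay_sequence(one_event_per_request: bool) -> bool:
    """Replay steps 1-4; return True if no assert fires."""
    journal = ToyJournal()
    first, second = (0, 4096), (4096, 4096)

    # Step 1: append journal events for the split request.
    if one_event_per_request:
        wait_tid = journal.append_io_event([first, second])
    else:
        journal.append_io_event([first])
        wait_tid = journal.append_io_event([second])  # everyone waits on the last

    try:
        # Step 2: the second object request is submitted while the first is delayed.
        journal.wait_event(wait_tid)
        # Step 3: journaling completes and the commit for that range fires.
        journal.commit_io_event_extent(wait_tid, second)
        # Step 4: the delayed first object request is allowed to proceed.
        journal.wait_event(wait_tid)
        journal.commit_io_event_extent(wait_tid, first)
    except AssertionError:
        return False
    return True
```

With one event per image extent the replay trips the assert at step 4;
with a single event per image request the first extent is still pending
when the second one commits, so the event outlives both object requests.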
Casey Bodley [Tue, 5 Dec 2023 21:12:56 +0000 (16:12 -0500)]
osd/scrubber: fix signed comparison warning
[681/1140] Building CXX object src/osd/CMakeFiles/osd.dir/scrubber/scrub_resources.cc.o
src/osd/scrubber/scrub_resources.cc: In member function ‘bool Scrub::ScrubResources::inc_scrubs_remote(pg_t)’:
src/osd/scrubber/scrub_resources.cc:84:18: warning: comparison of integer expressions of different signedness: ‘long unsigned int’ and ‘const int64_t’ {aka ‘const long int’} [-Wsign-compare]
84 | if (pre_op_cnt < conf->osd_max_scrubs) {
| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
Zac Dover [Tue, 5 Dec 2023 19:46:26 +0000 (20:46 +0100)]
doc/radosgw: update link in rgw-cache.rst
Update link in doc/radosgw/rgw-cache.rst. The link updated here is a
link to all the Nginx configuration files. The old link was broken. This
update comes to us from an anonymous report on
https://pad.ceph.com/p/Report_Documentation_Bugs.
Casey Bodley [Tue, 5 Dec 2023 17:21:18 +0000 (12:21 -0500)]
common: use inline for monostate dencoders
fix a 'multiple definition' error when included by multiple sources:
src/common/versioned_variant.h:31: multiple definition of `ceph::encode(std::monostate const&, ceph::buffer::v15_2_0::list&)';
rgw_main.cc.o:src/common/versioned_variant.h:31: first defined here
When no `service_id` is provided in an OSD service spec, the resulting
OSDs are created with the "osdspec_affinity" attribute set to a string
containing "None".
The DriveSelection class relies on comparing the actual value of this
attribute with the value of the service_id, which in that case is the
Python value `None`.
If any existing deployments were created without the service_id
attribute, we now have to support this case and make sure the check
won't filter out devices unexpectedly.
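A minimal sketch of the mismatch and one way to tolerate it. The function
affinity_matches is a hypothetical helper for illustration, not the
actual DriveSelection code:

```python
from typing import Optional


def affinity_matches(osdspec_affinity: str, service_id: Optional[str]) -> bool:
    """Compare an OSD's stored affinity with a spec's service_id.

    OSDs created from a spec without a service_id carry the literal
    string "None", so that string is treated as "no service_id".
    """
    stored: Optional[str] = None if osdspec_affinity == "None" else osdspec_affinity
    return stored == service_id


# A naive comparison fails for such OSDs: the string "None" is not None.
naive = ("None" == None)  # noqa: E711
tolerant = affinity_matches("None", None)
```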