qa/tasks/ceph: provide configuration for setting configs via mon
These configs may be set using:
    ceph:
      cluster-config:
        entity:
          foo: bar
same as the current:
    ceph:
      config:
        entity:
          foo: bar
The configs will be set in parallel using the `ceph config set` command.
The main benefit here is to avoid using the ceph.conf to set configs, which
cannot be overridden by subsequent `ceph config` commands. The only way to
override them is to change the ceph.conf in the test (yuck) or to use the
admin socket (which gets reset when the daemon restarts).
Finally, we can now exploit the `ceph config reset` command, which will let us
trivially roll back config changes after a test completes. The epoch to roll
back to is exposed as the `ctx.config_epoch` variable.
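As an illustration only, the mapping from a `cluster-config` block to
`ceph config set` invocations could be sketched as follows. This is not the
code of the qa task itself: the helper name `config_set_commands` is
hypothetical, and the sketch only builds the argument lists rather than
running them (the task is described above as running them in parallel).

```python
def config_set_commands(cluster_config):
    """Return one `ceph config set` argv list per (entity, option) pair.

    `cluster_config` mirrors the YAML shape above: {entity: {option: value}}.
    The function name is hypothetical, chosen for this sketch.
    """
    commands = []
    for entity, options in cluster_config.items():
        for name, value in options.items():
            # `ceph config set <who> <option> <value>`
            commands.append(["ceph", "config", "set", entity, name, str(value)])
    return commands


task_config = {"cluster-config": {"entity": {"foo": "bar"}}}
for argv in config_set_commands(task_config["cluster-config"]):
    print(" ".join(argv))
```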
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This change is necessary because the new way of setting config is to use the
`ceph config` command or the asok interface, rather than the old way, which
involved editing the ceph.conf and restarting the daemons for the changes to
take effect. This updates the code to support runtime config changes.
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Wed, 5 Jun 2024 16:43:15 +0000 (02:43 +1000)]
doc/start: s/intro.rst/index.rst/
Change the filename "doc/start/intro.rst" to "doc/start/index.rst" so
that Sphinx finds the root filename for the "/start" directory in the
default location.
* refs/pull/57673/head:
qa: use tell interface for command that may fail
mds: dump AsyncHandler ss to stderr if present
mds: unconditionally dump message in formatter
mds: use appropriate abbrev. for variable name
mds: dump formatter even for errors
common/admin_socket: create type for finisher callback
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Leonid Usov <leonid.usov@ibm.com>
Patrick Donnelly [Thu, 23 May 2024 17:47:15 +0000 (13:47 -0400)]
qa: use tell interface for command that may fail
The asok interface will mangle stdout if the command actually fails.
`flush path` was done via the asok interface because the tell/asok
interfaces were unified only after these tests were written; at that time
`flush path` was available only via the asok interface.
Fixes: https://tracker.ceph.com/issues/66184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/57553/head:
mds: try to choose a new batch head in request_clientup()
Revert "mds: find a new head for the batch ops when the head is dead"
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Tue, 4 Jun 2024 13:37:27 +0000 (23:37 +1000)]
doc/start: s/http/https/ in links
Replace "http" with "https" in doc/start/get-involved.rst.
This commit is, in a way, a repeat of
https://github.com/ceph/ceph/pull/57213/
(1c5383b91bd7dbfa9670c6485fcc5ff28b79f40d), which targeted the Reef
branch instead of the main branch. When this commit has been merged and
backported, I will close https://github.com/ceph/ceph/pull/57213/.
I am listing Casey Cain here as the co-author, but he is in fact the
true author of this change.
Kefu Chai [Sat, 1 Jun 2024 00:53:28 +0000 (08:53 +0800)]
ceph.spec.in: remove setuptools dependency
in 844b66de, we stopped using pkg_resources for importing packaging,
and the exact reason why we introduced pkg_resources was to use
the packaging python module, see cf608920. pkg_resources is provided by
setuptools, so the setuptools dependency is no longer needed.
Kefu Chai [Sat, 1 Jun 2024 00:39:03 +0000 (08:39 +0800)]
debian: package mgr/rgw in ceph-mgr-modules-core
in 110db72e, we added the rgw mgr module to ceph-mgr-modules-core
rpm package. but we didn't add this module to the corresponding
debian package.
rgw mgr module provides a simple interface to deploy RGW multisite
setup. so it would be nice to have it in ceph's debian packages as
well.
although rgw is not part of the core features, this module is already
in the ceph-mgr-modules-core rpm package, and it is relatively small
and does not pull in extra dependencies, so let's add it to the
debian package with the same name. we can revisit this decision and
extract it in a follow-up change if that becomes necessary in the future.
J. Eric Ivancich [Wed, 29 May 2024 18:19:25 +0000 (14:19 -0400)]
rgw: track initiator of reshard queue entries
The logic for managing the reshard queue (log) can vary depending on
whether the entry was added by an admin or by dynamic resharding. For
example, if it's a reshard reduction, dynamic resharding won't
overwrite the queue entry so as not to disrupt the reduction wait
period. On the other hand, an admin should be able to overwrite the
entry at will.
So we now track the initiator of each entry on the queue. This adds
another field to the at-rest data structure, and it updates the logic
to make use of it.
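One reading of the overwrite rule above can be sketched in a few lines. The
names `Initiator` and `may_overwrite` are hypothetical, not taken from the rgw
code, and the case the message leaves open (a dynamic reshard that is not a
reduction) is assumed here to be allowed to overwrite.

```python
from enum import Enum


class Initiator(Enum):
    """Who added a reshard queue entry (hypothetical names for this sketch)."""
    ADMIN = "admin"
    DYNAMIC = "dynamic"


def may_overwrite(initiator, is_reduction):
    """Decide whether a new request may replace an existing queue entry."""
    if initiator is Initiator.ADMIN:
        # An admin can overwrite the entry at will.
        return True
    # Dynamic resharding must not disturb the reduction wait period.
    # Assumption: dynamic non-reduction requests may still overwrite.
    return not is_reduction
```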
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Tue, 21 May 2024 18:06:47 +0000 (14:06 -0400)]
rgw: provide testing support to dynamic resharding with reduction
Adds a config option rgw_reshard_debug_interval that will allow us to
make the resharding algorithms run on a faster schedule by allowing
one day to be simulated by a set number of seconds.
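The effect of such an option can be pictured with a small sketch. The helper
name `effective_delay_seconds` is hypothetical, and the convention that only
values of 1 or more enable the simulation is an assumption of this sketch,
not something stated in the message.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def effective_delay_seconds(delay_days, debug_interval=None):
    """Convert a delay expressed in days into seconds.

    When `debug_interval` is set, one simulated day lasts that many seconds.
    Assumption: values below 1 (or None) leave real days in effect.
    """
    if debug_interval is not None and debug_interval >= 1:
        return delay_days * debug_interval
    return delay_days * SECONDS_PER_DAY
```

With a 60-second simulated day, a 5-day wait shrinks to 300 seconds, which
is what makes the reduction path practical to exercise in a test run.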
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Tue, 14 May 2024 19:09:03 +0000 (15:09 -0400)]
rgw: add shard reduction ability to dynamic resharding
Previously, dynamic resharding could only *increase* the number of
bucket index shards for a given bucket. This adds the ability to also
*reduce* the number of shards.
So in addition to the existing trigger for an increase, 100,000 entries
per shard (the current default value), there's a new trigger of 10,000
entries per shard for a decrease.
However, for buckets with object-counts that go up and down regularly,
we don't want to keep resharding up and down to chase the number of
objects. So for shard reduction to take place there's also a time
delay (default 5 days). Once the entry on the reshard queue (log) is
added for reduction, processing will not result in a reshard reduction
within this delay period as the queue is processed. Only when the
reshard entry is processed after this delay can it perform the shard
reduction.
However, if at any point between the time the shard reduction entry is
added to the queue and the time it is processed after the delay, the entry
is processed and there are *not* few enough entries to trigger a shard
reduction, the reshard queue entry will be discarded.
So using the defaults, this effectively means the bucket must have few
enough objects for a shard reduction for 5 consecutive days before the
reshard will take place.
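The rules above can be condensed into a small decision function. This is a
sketch under stated assumptions, not the rgw implementation: the function and
constant names are hypothetical, the thresholds are the defaults quoted in
the message, and "few enough" is assumed to mean strictly fewer than 10,000
entries per shard.

```python
SHRINK_THRESHOLD = 10_000   # entries per shard, default quoted above
REDUCTION_DELAY_DAYS = 5    # default delay quoted above


def process_reduction_entry(entry_age_days, num_entries, num_shards):
    """Decide what to do with a queued shard-reduction entry.

    Returns "discard", "wait", or "reshard".
    """
    entries_per_shard = num_entries / num_shards
    if entries_per_shard >= SHRINK_THRESHOLD:
        # The bucket grew back: the entry is dropped, during or after the delay.
        return "discard"
    if entry_age_days < REDUCTION_DELAY_DAYS:
        # Still inside the delay period: leave the entry on the queue.
        return "wait"
    return "reshard"
```

Because a discarded entry has to be queued again from scratch, the bucket
only shrinks after staying under the threshold for the whole delay.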
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Lucian Petrut [Fri, 24 May 2024 10:03:11 +0000 (10:03 +0000)]
rbd-wnbd: wait for the disk cleanup to complete
The WNBD disk removal workflow is asynchronous, which is why we'll
need to wait for the cleanup to complete when stopping the service.
The "disconnect_all_mappings" function is moved to
RbdMappingDispatcher::stop, allowing us to access the mapping list
more easily and reject new mappings after a stop has been requested.
J. Eric Ivancich [Fri, 17 May 2024 23:23:48 +0000 (19:23 -0400)]
cls/rgw: adding an entry to reshard queue has O_CREAT option
Adds the ability to prevent overwriting a reshard queue (log) entry
for a given bucket with a newer entry. This adds a flag to the op, so
it will either CREATE or make no changes. If an entry already exists
when this flag is set, -EEXIST will be returned.
This is a preparatory step to adding shard reduction to dynamic
resharding.
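The create-only semantics can be modeled with a plain dictionary standing in
for the reshard queue. The function name `reshard_queue_add` and the
`create_only` parameter are hypothetical names for this sketch; only the
-EEXIST behaviour comes from the message.

```python
import errno


def reshard_queue_add(queue, bucket, entry, create_only=False):
    """Add `entry` for `bucket`; with `create_only`, never overwrite.

    Returns 0 on success, or -EEXIST if an entry already exists and
    `create_only` is set (the queue is then left unchanged).
    """
    if create_only and bucket in queue:
        return -errno.EEXIST
    queue[bucket] = entry
    return 0


queue = {}
print(reshard_queue_add(queue, "bucket1", {"new_shards": 7}, create_only=True))
print(reshard_queue_add(queue, "bucket1", {"new_shards": 3}, create_only=True))
```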
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Venky Shankar [Wed, 29 May 2024 09:34:58 +0000 (15:04 +0530)]
Merge PR #55758 into main
* refs/pull/55758/head:
doc: update 'journal reset' command with --yes-i-really-really-mean-it
qa: fix cephfs-journal-tool command options and make fs inactive
cephfs-journal-tool: Add warning messages during 'journal reset' and prevent execution on active fs
Patrick Donnelly [Tue, 28 May 2024 16:46:08 +0000 (12:46 -0400)]
Merge PR #57579 into main
* refs/pull/57579/head:
mds/quiesce: disable quiesce root debug parameters by default
mds/quiesce-agt: never send a synchronous ack
mds/quiesce-agt: add test for a rapid async ack
mds/quiesce: always abort fragmenting asynchronously to prevent reentrancy
mds/quiesce: overdrive an export if it hasn't frozen the tree yet
mds/quiesce: quiesce_inode should not hold on to remote auth pins
qa/cephfs: check that a completed quiesce doesn't hold remote auth pins
mds: add `--lifetime` parameter to the `lock path` asok command
mds/quiesce: accept a regular file as the quiesce root
mds: command_quiesce_path: rename `--wait` to `--await` for consistency
mds: command_quiesce_path: do not block the asok thread and return an adequate rc
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Tue, 28 May 2024 16:27:53 +0000 (02:27 +1000)]
doc/dev: add note about intro of perf counters
Add a note to the "perf counter" section of doc/dev/perf_counters.rst
that explains that this feature was introduced in the Reef release of
Ceph. This note will prevent us from accidentally backporting
perf-counter-related PRs to Quincy.
RGW: Remove get_obj_state()/set_obj_state() from SAL
RGWObjState is the state for the StoreObject class. It has historically
been accessible via get_obj_state()/set_obj_state(), but the double
pointer nature of this access has caused multiple bugs, and the
RGWObjState itself is an implementation detail that doesn't need to be
exposed.
Instead, add a load_obj_state() that loads the state from the store, and
use proper getters/setters for the data.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>