Leonid Usov [Thu, 30 Nov 2023 14:42:22 +0000 (16:42 +0200)]
mds/quiesce: MDSRankQuiesce - integration of the quiesce db manager
* create an instance of the QuiesceDbManager in the rank
* update membership with a new mdsmap
* add an admin socket command for sending requests to the manager
Leonid Usov [Thu, 4 Jan 2024 17:51:32 +0000 (19:51 +0200)]
mgr/volumes: use `volume_exception_to_retval` as a decorator
When used as a decorator, it saves one indented try-catch block inside the decorated method.
This can be applied to most of the methods in the file, subject to a separate refactoring commit
Leonid Usov [Sun, 26 Nov 2023 11:29:11 +0000 (13:29 +0200)]
mds/quiesce: QuiesceAgent implementation and unit tests
QuiesceAgent is the layer that converts updates from the QuiesceDb
into calls to the QuiesceProtocol APIs, and then sends async acks
back to the db manager following the quiesce protocol events.
Leonid Usov [Tue, 31 Oct 2023 12:46:55 +0000 (14:46 +0200)]
mds/quiesce: QuiesceDb.h and QuiesceDbManager with tests
Quiesce DB is one of the components of the "Consistent Snapshots" epic.
The solution is discussed in a slide deck available for viewing to @redhat users:
https://docs.google.com/presentation/d/1wE3-e9AAme7Q3qmeshUSthJoQGw7-fKTrtS9PsdAIVo/edit?usp=sharing
This commit is focusing on the replicated quiesce database maintained by the MDS rank cluster.
One of the major goals was to design the component in a way that can be easily tested
outside of the MDS infrastructure, which is why the communication layer
has been asbtracted out by introducing just two communication callbacks
that will need to be implemented by the infrastructure.
The most of the component code is delivered in a single coherent commit, along with the uint tests.
Other commits will be dedicated to integration with the MDS infrastructure and other changes
that can't be attributed to the core quiesce db code or its tests.
The quiesce db component is composed of the following major parts/actors:
* QuiesceDbManager is the main actor, implementing both the leader and the replica roles.
Normally, there will be an instance of the manager per MDS rank, although given the
decoupling of the infrastructure and the manager, one can run any number of instances
on a single node, which is how test are working.
* The manager interfaces to the infrastructure via two main APIs with the infrastructure
that provides communication and cluster configuration (actor 2) and the quiesce db
client that is responsible for the quiescing of the roots (actor 3)
** ClusterMembership is how manager is configured to be part of a (virtual) cluster.
This structure will deliver information about other peers, the leader and provide
two communication APIs: send_listing_to for db replication from the leader to replicas
and send_ack for reporting quiesce success from the agents.
** Client Interface consists of a QuisceMap notify callback and a dedicated manager
method to submit asynchronous acks following the agent (rank) quiesce progress.
The API of the quiesce db is described in the slide deck mentioned above. The full scope
of capabilities are encapsulated in a single QuiesceDbRequest structure. This should
simplify the implementation of other components that will have to propagate the functionality
to the administrator user of the volumes plugin.
Zac Dover [Sun, 3 Mar 2024 10:28:00 +0000 (20:28 +1000)]
doc/rados: remove PGcalc from docs
Remove mention of the "PG calc" tool from the documentation. I have
removed all mention of this in one fell swoop to help posterity restore
mention of this tool if we decide we need to do so.
Kefu Chai [Tue, 27 Feb 2024 14:25:32 +0000 (22:25 +0800)]
cmake: bump liburing from 0.7 to 2.5
this allows us to use newer liburing features. Seastar is using
some of them which are not provided by liburing 0.7.
in this change, `--use-libc` is passed to configure. otherwise
it does not link against libc, and the symbles like memset()
won't be available when compiling liburing.so with -fPIC using
clang, which does not pull libc in that case.
Dan Mick [Thu, 29 Feb 2024 19:36:51 +0000 (11:36 -0800)]
.github/workflows/create-backport-trackers.yml: update versions of actions
Getting warning about node16 being deprecated. The workflow doesn't use node
directly, but through the external actions. Moving to node20 requires
changing setup-python version; Bhacaz/checkout-files is deprecated and
recommends actions/checkout.
Zac Dover [Fri, 1 Mar 2024 12:11:14 +0000 (22:11 +1000)]
doc/install: add manual RADOSGW install procedure
Add a manual RADOSGW installation procedure to
doc/install/manual-deployment.rst. This procedure was developed by Janne
Johansson and reported to the ceph-users mailing list on 29 Jan 2024
here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LB3YRIKAPOHXYCW7MKLVUJPYWYRQVARU/
Co-authored-by: Janne Johansson <icepic.dz@gmail.com> Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
The rbd-wnbd daemon currently caches one rados context per cluster.
However, it's registering hooks against the global context
admin socket, which won't be available. For this reason,
the "rbd-wnbd stats" command no longer works.
To address this issue, we'll ensure that rbd-wnbd sets command hooks
against the right admin socket instance, leveraging the image
context.
The "rbd-wnbd unmap" command is currently telling the WNBD driver
to remove the mapping without contacting the rbd-wnbd daemon
and waiting for it to perform its cleanup.
For this reason, attempting to delete the image immediately after
unmapping it can fail due to existing watchers.
As a temporary solution, we'll retry the image remove operation.
At a later time, we'll update the "rbd-wnbd unmap" command to go
through the rbd-wnbd daemon, ensuring that all the necessary
cleanup is performed before returning.
While at it, we're dropping a redundant LOG.error call so that we
won't print expected exceptions.
This commit will store the mapping config in the Windows registry
only after initializing the mapping. This ensures that we aren't
replacing the registry settings for already mapped images.
We'll also check if the registry setting was added by us before
cleaning it up.
Lucian Petrut [Mon, 12 Jun 2023 13:16:39 +0000 (13:16 +0000)]
rbd-wnbd: use one daemon process per host
We're currently using one rbd-wnbd process per image mapping.
Since OSD connections aren't shared across those processes,
we end up with an excessive amount of TCP sessions, potentially
exceeding Windows limits:
https://ask.cloudbase.it/question/3598/ceph-for-windows-tcp-session-count/
In order to improve rbd-wnbd's scalability, we're going to use
a single process per host (unless "-f" is passed when mapping the
image, in which case the daemon will run as part of the same
process). This allows OSD sessions to be shared across image
mappings.
Another advantage is that the "ceph-rbd" service starts faster,
especially when having a large number of image mappings.
Zac Dover [Thu, 29 Feb 2024 08:08:10 +0000 (18:08 +1000)]
doc/glossary: improve "MDS" entry
Improve the entry for "MDS" in doc/glossary.rst by linking to the
"ceph-mds" man page and mentioning the relationship between clients and
MDS (or MDSes).
Ramana Raja [Thu, 29 Feb 2024 17:12:19 +0000 (12:12 -0500)]
qa/suites: add diff-continuous and compare-mirror-image tests
... to rbd and krbd suites respectively.
This allows the compare-mirror-image tests introduced in ea3a567
to be run against various kernel branches, e.g., testing branch.
And allows diff_continuous test in rbd_suite to run against distro
kernel.
Fixes: https://tracker.ceph.com/issues/64574 Signed-off-by: Ramana Raja <rraja@redhat.com>
daijufang [Mon, 4 Dec 2023 07:46:29 +0000 (07:46 +0000)]
src/rgw: fix for the multipart interface in the WORM function
1. Save the WORM configuration information in the initialization chunk information for use when merging chunks.
2. Support x-amz-bypass-governance-retention when merging chunks.
John Mulligan [Thu, 22 Feb 2024 18:49:10 +0000 (13:49 -0500)]
qa/tasks: add templating functions to cephadm module
Add functions to cephadm.py that will be later used to template
strings within the yaml files in the cephadm suites. This will be used
to replace the specific subst_vip call with generic calls that let
tests access "any" variables stored on the test ctx.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 20 Feb 2024 15:09:50 +0000 (10:09 -0500)]
qa/tasks: fix VIPs log line
While testing my previous patches were correct I noticed that the string
here was logged exactly as written, and was thus pretty useless. This
was probably meant to be an f-string. So make it one. Also get rid of
the unnecessary map call, the list and IP address type can repr
themselves just fine IMO.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 20 Feb 2024 00:14:52 +0000 (19:14 -0500)]
qa/tasks: change map_vips to raise exceptions instead of returning None
None of the callers of map_vips ever checks for a None return. So
instead of handling any error conditions it would always just blow
up with a semi-obscure TypeError. Convert the function to always
raise an exception (one that tries to breifly explain the condition)
when something goes wrong. I also take the opportunity to make
more clearer logging and reduce an indentation level.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Xuehan Xu [Wed, 28 Feb 2024 05:42:04 +0000 (13:42 +0800)]
crimson/os/seastore: copy attrs and omaps when cloning objects
At present, we just copy attrs and omaps one by one, which is not
efficient but very important in terms of functionality especially for
the teuthology tests
Venky Shankar [Wed, 28 Feb 2024 04:02:42 +0000 (09:32 +0530)]
Merge PR #52258 into main
* refs/pull/52258/head:
client: check mds down status bofore getting mds_gid_t from mdsmap
mgr/dashboard: allow sending back error status code fetching clients fails
Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Rishabh Dave <ridave@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Nizamudeen A <nia@redhat.com> Reviewed-by: Dhairya Parmar <dparmar@redhat.com>