Ramana Raja [Thu, 25 May 2023 16:48:12 +0000 (16:48 +0000)]
qa: Add tests to validate syncing of images using rbd-mirror
Introduce functional tests to validate that images under workload
are correctly mirrored between two clusters using snapshot-based
mirroring.
Run a workload on a primary image using a krbd or nbd client. Take
mirror snapshots of the image under workload. Unmount the mapped image
and calculate its MD5 checksum before demoting it. After demotion,
wait for the mirror status of the image to be 'up+unknown' in both
clusters. This ensures that the non-primary image in the
other cluster is ready to be promoted. Now promote the non-primary
image in the other cluster. Map the promoted image and calculate its
MD5 checksum. Verify that the checksums of the demoted and promoted
images in the two clusters are the same.
The above test is run as part of two different workunits:
- a workunit that validates the syncing of multiple mirrored images
with workloads running on them
- another workunit that validates the syncing of a single mirrored
image with a workload running on it, where the image is set as
primary alternately between the two clusters, as happens during
failover and failback scenarios.
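The checksum comparison at the heart of the test can be sketched as follows. This is a minimal Python illustration, not the workunit itself (which is a shell script). The helper names `md5_of_device`, `images_match` and `make_fake_image` are hypothetical, and temporary files stand in for the mapped block devices.

```python
import hashlib
import os
import tempfile


def md5_of_device(path, chunk_size=4 * 1024 * 1024):
    """Return the hex MD5 digest of a device or file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as dev:
        for chunk in iter(lambda: dev.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(demoted_path, promoted_path):
    """True if the demoted and promoted images have identical contents."""
    return md5_of_device(demoted_path) == md5_of_device(promoted_path)


def make_fake_image(data):
    """Hypothetical stand-in for a mapped image: a temp file holding data."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

In the real test the two paths would be the devices mapped in each cluster, and a mismatch would fail the workunit.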
Fixes: https://tracker.ceph.com/issues/61617
Signed-off-by: Ramana Raja <rraja@redhat.com>
Co-authored-by: Ilya Dryomov <idryomov@redhat.com>
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit b7aae5c3c5a1dd24c4cb7ceb499292af00bae680)
Cherry-pick notes:
- In qa/workunits/rbd/compare_mirror_images.sh, replace
`wait_for_replaying_status_in_pool_dir` with `wait_for_status_in_pool_dir`.
Commit 3fd8a03, which added `wait_for_replaying_status_in_pool_dir`,
was not backported.
Ramana Raja [Fri, 9 Feb 2024 00:32:37 +0000 (19:32 -0500)]
qa/workunits: make wait_for_status_in_pool_dir() reentrant
In rbd_mirror_helpers.sh, the `wait_for_status_in_pool_dir()` helper
stored `mirror image status` and `mirror pool status` command outputs
in files that could be shared over successive calls or calls from
multiple threads. Instead store the command outputs in local variables
to make `wait_for_status_in_pool_dir()` reentrant.
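The idea can be illustrated with a small Python sketch (the real helper is a shell function). The function name mirrors the helper, but the `run_status_cmd` callable, the retry parameters and the substring match are assumptions made for illustration only.

```python
import time


def wait_for_status_in_pool_dir(run_status_cmd, expected, retries=5, delay=0.0):
    """Poll a status command until its output contains `expected`.

    The command output is kept in a local variable, so concurrent or
    successive calls cannot overwrite each other's data, unlike a
    variant that writes the output to a fixed, shared file path.
    """
    for _ in range(retries):
        status = run_status_cmd()  # local to this call, never shared
        if expected in status:
            return True
        time.sleep(delay)
    return False
```

Because nothing is written to a shared location, two callers polling different images at the same time each see only their own command output.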
Zac Dover [Mon, 19 Feb 2024 08:41:45 +0000 (18:41 +1000)]
doc/cephfs: edit add-remove-mds
Disambiguate a note in doc/cephfs/add-remove-mds.rst to help readers
distinguish between cases in which they might want to use an automated
tool such as cephadm to deploy MDSes and cases in which they might want
to manually deploy MDSes.
See: https://github.com/ceph/ceph/pull/45639
Tracker: https://tracker.ceph.com/issues/54551
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 39ad6264aa1c97b04974e04046033887325ed2b2)
Zac Dover [Fri, 9 Feb 2024 15:24:17 +0000 (01:24 +1000)]
doc/mgr: remove ceph-exporter (Quincy)
Remove mention of ceph-exporter in the Quincy branch. ceph-exporter was
in one release of Quincy, but was later removed because it was broken.
This PR is made in response to Eugen Block's having brought this matter
to my attention.
Zac Dover [Wed, 7 Feb 2024 13:18:35 +0000 (23:18 +1000)]
doc/radosgw: add confval directives
Add confval directives to the documentation of "quota cache" options.
This addresses a request made by Anthony D'Atri in https://github.com/ceph/ceph/pull/55075/files#r1444006246.
Zac Dover [Sun, 4 Feb 2024 15:36:10 +0000 (01:36 +1000)]
doc/rados: update PG guidance
Update the "Creating a Pool" section of doc/rados/operations/pools.rst
so that the documentation no longer insists that the user change the
values of "osd_pool_default_pg_num" and "osd_pool_default_pgp_num".
See also: https://github.com/ceph/ceph/pull/55419
Tracker: https://tracker.ceph.com/issues/64259
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 5ad241442d2c141ba508faba61f39d70f3f09679)
Zac Dover [Fri, 2 Feb 2024 01:53:45 +0000 (11:53 +1000)]
doc/rados: update config for autoscaler
Update doc/rados/configuration/pool-pg-config-ref.rst to account for the
behavior of autoscaler.
This file was last meaningfully altered in 2013, before the
autoscaler existed. A recent case of confusion was brought to my
attention on the Ceph Slack: a user attempted to alter the
default values of a Quincy cluster, as suggested in this documentation.
That alteration caused Ceph to throw the error "Error ERANGE: 'pgp_num'
must be greater than 0 and lower or equal than 'pg_num', which in this
case is one" and a related "rgw_init_ioctx ERROR" reading in part
"Numerical result out of range". The user removed the
"osd_pool_default_pgp_num" configuration line from ceph.conf and the
cluster worked as expected. I presume that this is because the removal
of this configuration line allowed autoscaler to work as intended.
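The constraint behind that error message can be written as a short check. This is an illustrative Python sketch, not Ceph's code: the function name and the use of `ValueError` are assumptions, and the values in the usage note are hypothetical.

```python
def validate_pg_settings(pg_num, pgp_num):
    """Raise ValueError unless 0 < pgp_num <= pg_num."""
    if not 0 < pgp_num <= pg_num:
        raise ValueError(
            "'pgp_num' must be greater than 0 and lower or equal than "
            f"'pg_num', which in this case is {pg_num}"
        )
```

For example, `validate_pg_settings(1, 8)` fails because a configured "pgp_num" of 8 exceeds a "pg_num" of 1, whereas `validate_pg_settings(32, 32)` passes.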
Fixes: https://tracker.ceph.com/issues/64259
Co-authored-by: David Orman <ormandj@corenode.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 4dc12092be584da44baca14e31ca33231164235f)
Ramana Raja [Tue, 23 Jan 2024 21:07:04 +0000 (16:07 -0500)]
rbd-nbd: log errors during netlink_resize() using derr
When using the rbd CLI to map images to NBD devices via netlink,
any errors that arose during image resizing in netlink_resize()
were not logged. Switching the error logging from using cerr to
derr helps log the errors from netlink_resize().
Ramana Raja [Mon, 22 Jan 2024 22:06:58 +0000 (17:06 -0500)]
rbd_nbd: fix resize of images mapped using netlink
Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.
Ilya Dryomov [Sat, 6 Jan 2024 16:08:04 +0000 (17:08 +0100)]
librbd: try to preserve object map for diff-iterate in fast-diff mode
As an optimization, try to ensure that the object map for the end
version is preloaded through the acquisition of exclusive lock and
as a consequence remains around until exclusive lock is released.
If it's not around, DiffRequest would (re)load it on each call.
Ilya Dryomov [Sat, 6 Jan 2024 16:05:39 +0000 (17:05 +0100)]
librbd/object_map: potentially use in-memory object map in DiffRequest
If the object map for the end version is around (already loaded in
memory, either due to the end version being a snapshot or due to
exclusive lock being held), use it to run diff-iterate against the
beginning of time. Since it's the only object map needed in that
case, such calls would be satisfied locally.
Ilya Dryomov [Fri, 5 Jan 2024 12:15:54 +0000 (13:15 +0100)]
librbd/object_map: decouple object map processing in DiffRequest
In preparation for potentially using in-memory object map, decouple
object map processing from loading object maps and place the logic in
prepare_for_object_map() and process_object_map().
Ilya Dryomov [Fri, 5 Jan 2024 11:23:24 +0000 (12:23 +0100)]
common/bit_vector: fix iterator vs reference constness confusion
T (ConstIterator or Iterator) is confused with const T here:
IteratorImpl dereference operator is wrongly overloaded on const
and returns Reference instead of ConstReference for ConstIterator.
This then fails inside bufferlist bowels because Reference is
incompatible with bufferlist::const_iterator.
Ilya Dryomov [Thu, 4 Jan 2024 10:39:20 +0000 (11:39 +0100)]
librbd/object_map: don't resize object map in handle_load_object_map()
Currently it's done in two cases:
- if the loaded object map is larger than expected based on byte size,
it's truncated to expected number of objects
- in case of deep-copy, if the loaded object map is smaller than diff
state, it's expanded to get "track the largest of all versions in the
set" semantics
Both of these cases can be easily dealt with without modifying the
object map. Being able to process a const object map is needed for
working on in-memory object map which is external to DiffRequest.
It's totally broken: instead of returning the current position and
moving to the next position, it returns the next position and doesn't
move anywhere. Luckily it hasn't been used until now.
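The behavior described above matches a faulty post-increment operation. The following Python sketch contrasts the two semantics. The `Cursor` class and its method names are hypothetical and only model the described behavior, not the actual C++ code.

```python
class Cursor:
    """Hypothetical iterator-like cursor over integer positions."""

    def __init__(self, pos=0):
        self.pos = pos

    def post_increment_buggy(self):
        # Bug as described: returns the next position and does not advance.
        return Cursor(self.pos + 1)

    def post_increment(self):
        # Expected semantics: return the current position, then advance.
        current = Cursor(self.pos)
        self.pos += 1
        return current
```

With the buggy variant, a loop that relies on post-increment to advance would never make progress, since the cursor itself stays where it was.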
Ilya Dryomov [Thu, 28 Dec 2023 09:14:18 +0000 (10:14 +0100)]
librbd: propagate diff-iterate range to parent in fast-diff mode
When getting parent diff, pass the overlap-reduced image extent instead
of the entire 0..overlap range to avoid a similar quadratic slowdown on
cloned images.
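Reducing an extent to the parent overlap amounts to clipping it, as in this minimal Python sketch. The function name and return convention are hypothetical, not librbd's API.

```python
def reduce_to_overlap(offset, length, overlap):
    """Clip the image extent [offset, offset + length) to [0, overlap).

    Returns (offset, length) of the clipped extent, or None if the extent
    lies entirely beyond the parent overlap.
    """
    end = min(offset + length, overlap)
    if offset >= end:
        return None
    return offset, end - offset
```

Passing the clipped extent to the parent means the work per call is proportional to the requested range rather than to the whole overlap.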
Ilya Dryomov [Wed, 27 Dec 2023 17:07:05 +0000 (18:07 +0100)]
librbd/object_map: add support for ranged diff-iterate
Currently diff-iterate in fast-diff mode is performed on the entire
image no matter what image extent is passed to the API. Then, unused
diff just gets discarded as DiffIterate ends up querying only objects
that the passed image extent maps to. This hasn't been an issue for
internal consumers ("rbd du", "rbd diff", etc) because they work on the
entire image, but turns out to lead to quadratic slowdown in some QEMU
use cases.
0..UINT64_MAX range is carved out for deep-copy which is unranged by
definition. To get effectively unranged diff-iterate, 0..UINT64_MAX-1
range can be used.
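To see which objects a ranged call needs to look at, consider this minimal Python sketch of mapping an image extent to object numbers. The function name is hypothetical, the object size is a parameter, and striping is assumed to be the simple one-object-per-stripe layout.

```python
def object_range(offset, length, object_size):
    """Map the image extent [offset, offset + length) to object numbers.

    Returns (first_object, end_object), where end_object is exclusive.
    An empty extent maps to an empty range.
    """
    if length == 0:
        return 0, 0
    first = offset // object_size
    end = (offset + length - 1) // object_size + 1
    return first, end
```

A caller that walks an image in small extents touches only a few objects per call under this scheme, whereas computing the diff for the entire image on every call is what produces the quadratic behavior.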
Ilya Dryomov [Sat, 23 Dec 2023 14:19:09 +0000 (15:19 +0100)]
test/librbd: expand TestMockObjectMapDiffRequest edge case coverage
For each covered edge case or error, run through the following
scenarios:
- where the edge case concerns snap_id_start
- where the edge case concerns snap_id_end
- where the edge case concerns intermediate snapshot and
snap_id_start == 0 (diff against the beginning of time)
- where the edge case concerns intermediate snapshot and
snap_id_start != 0 (diff from snapshot)
Ilya Dryomov [Sat, 23 Dec 2023 13:47:54 +0000 (14:47 +0100)]
librbd/object_map: allow intermediate snaps to be skipped on diff-iterate
In case of diff-iterate against the beginning of time, the result
depends only on the end version. Loading and processing object maps
of intermediate snapshots is redundant and can be skipped.
This optimization is made possible by commit be507aaed15f ("librbd:
diff-iterate shouldn't ever report "new hole" against a hole") and, to
a lesser extent, the previous commit.
Getting FastDiffInvalid, LoadObjectMapError and ObjectMapTooSmall to
pass required tweaking not just expectations, but also start/end snap
ids and thus also the meaning of these tests. This is addressed in the
next commit.
Ilya Dryomov [Fri, 22 Dec 2023 17:50:20 +0000 (18:50 +0100)]
librbd/object_map: resurrect diff-iterate behavior when image is shrunk
The new "track the largest of all versions in the set, diff state is
only ever grown" semantics introduced in commit 330f2a7bb94f ("librbd:
helper state machine for computing diffs between object-maps") don't
make sense for diff-iterate. It's a waste because DiffIterate won't
query beyond the end version size -- this is baked into the API.
Limit this behavior to deep-copy and resurrect the original behavior
from 2015 for diff-iterate.