The test failed in the EC pool configuration because PGs were
not going into active+clean (our fault for over-thrashing and checking the wrong thing).
Also, PGs would not go active because we thrashed below min_size
in an EC pool config, leaving too few shards in the acting set.
As a result, the wait_for_recovery check failed.
Moreover, when we revived OSDs, we didn't add them back into the cluster,
which skewed the true count of live_osds in the test.
Solution:
Instead of randomly choosing OSDs to thrash,
we randomly select a PG from each pool and
thrash the OSDs in the PG's acting set until
we reach min_size, then we check to see if the
PG is still active. After that we revive all
the OSDs to see if the PG recovered cleanly.
We removed some unnecessary parts, such
as `min_dead`, `min_live`, `min_out`, etc.
Also, we refactored the part where we
assign k,m for the EC pools to improve
code readability.
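The approach described above can be sketched in plain Python. This is a simulation under stated assumptions, not the actual qa task code: `pick_victims`, `thrash_one_pg` and the `live_osds` set are hypothetical names, the acting set is a plain list of OSD ids, and "active" is approximated as "at least min_size members of the acting set are up".

```python
import random


def pick_victims(acting_set, min_size, rng):
    """Choose which OSDs of one PG to kill so that min_size of them stay up."""
    if not 0 < min_size <= len(acting_set):
        raise ValueError("min_size must be between 1 and the acting set size")
    return rng.sample(acting_set, len(acting_set) - min_size)


def thrash_one_pg(acting_set, min_size, live_osds, rng):
    """Kill down to min_size, check the PG can stay active, then revive."""
    victims = pick_victims(acting_set, min_size, rng)
    live_osds.difference_update(victims)        # kill the chosen OSDs
    up_shards = [osd for osd in acting_set if osd in live_osds]
    still_active = len(up_shards) >= min_size   # simplified "active" check
    live_osds.update(victims)                   # revive AND count them as live again
    return victims, still_active


rng = random.Random(0)                          # seeded only to make the demo repeatable
live_osds = set(range(8))
victims, still_active = thrash_one_pg([4, 1, 7, 2], 3, live_osds, rng)
print(victims, still_active, sorted(live_osds))
```

Never killing more than `len(acting_set) - min_size` OSDs is what keeps the PG from dropping below min_size, and putting the victims back into `live_osds` on revive keeps the live count accurate.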
Zac Dover [Mon, 12 Aug 2024 12:47:08 +0000 (22:47 +1000)]
doc/cephfs: improve cache-configuration.rst
Improve the text in the section about dealing with cache-pressure alerts
that was added in https://github.com/ceph/ceph/pull/59077. The changes
in this commit were suggested by Anthony D'Atri.
Co-authored-by: Patrick Donnelly <pdonnelly@redhat.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit aa3bdae2314fef2fca8fc12dca006af657235e17)
Ilya Dryomov [Fri, 2 Aug 2024 07:27:42 +0000 (09:27 +0200)]
librbd/migration: make ImageDispatch handle encryption for non-native formats
With NativeFormat now being handled via dispatch, handling encryption
for non-native formats (i.e. mapping to raw image extents and performing
decryption/mapping back on completion) in the migration layer is really
straightforward.
Note that alignment doesn't need to be performed in the migration layer
because it happens on the destination image -- the "align and resubmit"
logic in C_UnalignedObjectReadRequest should kick in before the call to
read_parent().
Fixes: https://tracker.ceph.com/issues/53674
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0000c3447407772039121bb4499f243df1c889da)
Conflicts:
src/librbd/migration/ImageDispatch.cc [ commit 20aee5bbbcb5
("neorados: Make IOContext getters/setters less weird") not
in reef ]
librbd/migration: close source image in OpenSourceImageRequest
Currently, on errors in FormatInterface::open(), RawFormat disposes
of src_image_ctx, but QCOWFormat doesn't, which is a leak. Rather than
having each format do it internally, do it in OpenSourceImageRequest.
librbd/migration: don't instantiate NativeFormat, handle it via dispatch
Trying to shoehorn NativeFormat under FormatInterface doesn't really
work. It fundamentally doesn't fit in:
- Unlike for RawFormat and QCOWFormat, src_image_ctx for NativeFormat
is not dummy -- it's an ImageCtx for a real RBD image. Pre-creating
it in OpenSourceImageRequest with the expectation that placeholder
values would be overridden later forces NativeFormat to reach into
ImageCtx guts, duplicating the logic in the constructor. This also
necessitates calling snap_set() in a separate step, since snap_id
isn't known at the time ImageCtx is created.
- Unlike for RawFormat and QCOWFormat, get_image_size() and
get_snapshots() implementations for NativeFormat are dummy.
- read() and list_snaps() implementations for NativeFormat are
inconsistent: read() passes through io::ImageDispatch layer, but
list_snaps() doesn't. Both could simply pass through, meaning that in
essence these are also dummy.
All of this is with today's code. Additional complications arise with
planned support for migrating from external clusters where src_image_ctx
would require more invasive patching to "move" to an IoCtx belonging to
an external cluster's CephContext and also with other work.
With the above in mind, NativeFormat actually consists of:
1. Code that parses the "type: native" source spec
2. Code that patches ImageCtx, working around the fact that it's
pre-created in OpenSourceImageRequest
3. A bunch of dummy implementations for FormatInterface
With this change, (1) is wrapped into a static method that also creates
ImageCtx after all required parameters are known and (2) and (3) go away
entirely. NativeFormat no longer implements FormatInterface and doesn't
get instantiated at all.
In preparation for not instantiating NativeFormat and losing a copy of
the source spec JSON object in m_json_object, refactor the parsing code
to use only const methods (which std::map's operator[] isn't) and local
variables where possible.
librbd/migration/NativeFormat: do pool lookup instead of creating io_ctx
A Rados instance is sufficient to map the pool name to the pool ID,
no need to involve an IoCtx instance as well. While at it, report
distinctive errors for a non-existing pool and an invalid JSON value
for pool_name key cases.
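A minimal sketch of the "distinct error per failure" idea, written in Python rather than the C++ of the actual change. `resolve_pool_id` is a hypothetical helper, and `pool_ids` is a plain dict standing in for the cluster's name-to-ID lookup. The exception types are illustrative and are not the error codes librbd returns.

```python
def resolve_pool_id(source_spec, pool_ids):
    """Map the spec's pool_name to a pool ID, with one error per failure mode."""
    pool_name = source_spec.get("pool_name")    # read-only access to the spec
    if not isinstance(pool_name, str):
        # The JSON value is missing or has the wrong type.
        raise ValueError(f"invalid pool_name value: {pool_name!r}")
    if pool_name not in pool_ids:
        # The value is well-formed, but no such pool exists.
        raise LookupError(f"pool {pool_name!r} does not exist")
    return pool_ids[pool_name]


print(resolve_pool_id({"type": "native", "pool_name": "rbd"}, {"rbd": 2}))
```

Keeping the two failures separate lets a caller tell a malformed source spec apart from a spec that names a pool the cluster does not have.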
librbd/migration: make SourceSpecBuilder::parse_source_spec() static
In preparation for divorcing NativeFormat from FormatInterface and
changing when/how src_image_ctx is created, make parse_source_spec()
independent of src_image_ctx. The "invalid source-spec JSON" error is
duplicated by the "failed to parse migration source-spec" error, so
just get rid of the former to spare having to pass CephContext to
parse_source_spec().
Add missing spaces, don't use the word stream when reporting errors
on POSIX file operations (open() and lseek64()) and fix a cut-and-paste
typo in RawSnapshot.
Zac Dover [Wed, 7 Aug 2024 13:11:11 +0000 (23:11 +1000)]
doc/cephfs: add cache pressure information
Add information to doc/cephfs/cache-configuration.rst about how to deal
with a message that reads "clients failing to respond to cache
pressure". This procedure explains how to slow the growth of the
recall_caps value so that it does not exceed the
mds_recall_warning_threshold.
The information in this commit was developed by Eugen Block. See
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/5ROH5CWKKOEIQMVXOVRT5OO7CWK2HPM3/#J65DFUPP4BY57MICPANXKI7KAXSZ5Z5P
and https://www.spinics.net/lists/ceph-users/msg73188.html.
Fixes: https://tracker.ceph.com/issues/57115
Co-authored-by: Eugen Block <eblock@nde.ag>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit bf26274ae4737417193f8c2b56bea20eb2a358aa)
Casey Bodley [Fri, 3 May 2024 19:43:39 +0000 (15:43 -0400)]
rgw: move publish_complete() back to RGWCompleteMultipart::execute()
move publish_complete() and meta_obj->delete_object() back to execute()
so they only run on success. this allows several member variables to
move back to execute()'s stack as well
Casey Bodley [Fri, 3 May 2024 19:29:00 +0000 (15:29 -0400)]
rgw: CompleteMultipart uses s->object for Notification
get_notification() should be associated with the target object
s->object. the meta_obj has the wrong object name, so required passing
s->object->get_name() as an extra argument
importantly, Notification no longer depends on the lifetime of meta_obj
to avoid a dangling pointer, while the lifetime of s->object is guaranteed
`set_dmcrypt_no_workqueue()` from `ceph_volume.util.encryption`
The function `set_dmcrypt_no_workqueue` in `encryption.py` now
dynamically retrieves the installed cryptsetup version using the
`cryptsetup --version` command. It then parses the version string using a
regular expression to accommodate varying digit counts. If the retrieved version
is greater than or equal to the specified target version,
`conf.dmcrypt_no_workqueue` is set to True, allowing for flexible version
handling.
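The version check described above might look roughly like the following. This is a sketch, not the code in `encryption.py`: `parse_cryptsetup_version` and `meets_target` are hypothetical helpers, the regular expression is one possible choice, and the target version passed in the demo is only an illustrative value. The input string is passed in directly so the example runs without cryptsetup installed.

```python
import re


def parse_cryptsetup_version(output):
    """Extract (major, minor, patch) from `cryptsetup --version` style output."""
    # \d+ accepts any number of digits per component, e.g. "2.10.12".
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"cannot parse cryptsetup version from {output!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_target(output, target):
    """True if the parsed version is >= target, compared as integer tuples."""
    return parse_cryptsetup_version(output) >= target


# (2, 3, 4) is an illustrative target, not a value taken from the commit message.
print(meets_target("cryptsetup 2.6.1", (2, 3, 4)))
```

Comparing integer tuples rather than strings is what makes multi-digit components order correctly: as strings, "2.10.0" would sort before "2.3.4".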
Igor Fedotov [Fri, 31 May 2024 14:05:29 +0000 (17:05 +0300)]
ceph-volume: do source devices zapping if they're detached.
One needs to zap source device(s) after DB/WAL migration.
The original implementation removes only the LVM tags, which leaves device(s) in a
state where "ceph-volume raw activate" still recognizes them as
attached to OSD due to information preserved in bdev label.
Hence the need to do more zapping.
Fixes: https://tracker.ceph.com/issues/66315
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit ae5ef432845dcf9b061258357ffd97f4eae59a63)
qa/workunits/rbd: avoid caching effects in luks-encryption.sh
Commit 40f6f5224bce ("qa/workunits/rbd: fix issues in
luks-encryption.sh") did the right thing for reads, which solved
most of the issue. However, it actually made a step in the opposite
direction for writes -- depending on the RBD cache settings, rbd-nbd
virtual devices can behave as physical devices with a volatile write
cache, so fsync is required.
While at it, involving O_DIRECT for reads isn't needed outside of
test_encryption_format().
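The write-side point (a device with a volatile write cache needs an explicit flush) can be illustrated in Python. The workunit itself is a shell script, so this is only a sketch of the principle: `write_and_sync` is a hypothetical helper, and a temporary file stands in for the rbd-nbd device.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write data and fsync before returning, so it is not left in a volatile cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # ask the OS to flush to the backing device
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "testfile")
    write_and_sync(target, b"\xab" * 4096)
    print(os.path.getsize(target))
```

Without the `os.fsync()` call, a later read through a different path could observe stale data if the write is still sitting in a cache.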