Zac Dover [Fri, 30 Aug 2024 11:16:57 +0000 (21:16 +1000)]
doc/ceph-volume: add spillover fix procedure
Add a procedure that explains how, after an upgrade, to move bytes that
have spilled over to a relatively slow device back to the faster device.
This procedure was developed by Chris Dunlop on the [ceph-users] mailing
list, here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/POPUFSZGXR3P2RPYPJ4WJ4HGHZ3QESF6/
Eugen Block requested the addition of this procedure to the
documentation on 30 Aug 2024.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 98618aaa1c8b786c7d240a210b62cc737fdb048d)
Adam King [Thu, 2 May 2024 17:35:41 +0000 (13:35 -0400)]
mgr/cephadm: make SMB and NVMEoF upgrade last in staggered upgrade
This needs to happen as some work on the NVMEoF side (still unmerged
as of writing this) will make the NVMEoF daemon dependent on the mon.
Prior to this patch, in a staggered upgrade, all daemons not using the
ceph image were upgraded after the mgr since we typically only care
about the default image changing or potential changes to how we handle
our systemd units which only needs the mgr to be upgraded to be applied.
This NVMEoF dependency on the mon changes this and we can no longer
upgrade it directly after the mgr. This patch changes it so the NVMEoF
daemon is instead upgraded after all ceph image daemons have been
upgraded in a staggered upgrade scenario. Non-staggered upgrades
are unaffected as the NVMEoF daemon was already upgraded near the
end in that scenario. The SMB dameon has no reason it needs to be
upgraded later, but it's in the (small) pool of daemons that don't
use the ceph image and aren't for monitoring, so it's been affected
by this as well.
NOTE: This is a bit of an ugly patch imo and shows that a refactoring
of the upgrade code is likely required. Hopefully this patch is more
of a stopgap until that larger effort can be made
Zac Dover [Sat, 17 Aug 2024 03:37:58 +0000 (13:37 +1000)]
doc/cephfs: s/mountpoint/mount point/
Change the string "mountpoint" to "mount point" in English-language
strings (as opposed to in commands, where the string "mountpoint"
sometimes appears and is correct).
cf. https://github.com/ceph/ceph/pull/58908#discussion_r1697715486
in which page 345 of The IBM Style Guide is referenced to back up this
change.
Zac Dover [Sat, 17 Aug 2024 03:44:30 +0000 (13:44 +1000)]
doc/cephfs: s/mountpoint/mount point/
Change the string "mountpoint" to "mount point" in English-language
strings (as opposed to in commands, where the string "mountpoint"
sometimes appears and is correct).
cf. https://github.com/ceph/ceph/pull/58908#discussion_r1697715486 in
which page 345 of The IBM Style Guide is referenced to back up this
change.
This commit alters only English-language text and example commands in
which the string "{mount point}" is meant to be replaced. No commands
meant for cutting-and-pasting have been altered in this commit.
Failed the test in EC Pool configuration because PGs are
not going into active+clean (our fault for over thrashing and checking the wrong thing).
Also, PG would not go into active because we thrash below min_size
in an EC pool config, not enough shards in the acting set.
Therefore, failed the wait_for_recovery check.
Moreover, When we revive osds, we didn't add the osd back in the cluster,
this messes up true count for live_osds in the test.
Solution:
Instead of randomly choosing OSDs to thrash,
we randomly select a PG from each pool and
thrash the OSDs in the PG's acting set until
we reach min_size, then we check to see if the
PG is still active. After that we revive all
the OSDs to see if the PG recovered cleanly.
We removed some of the unnecessary part such
as `min_dead`, `min_live`, `min_out` and etc.
Also, we refractored the part of where we are
assigning k,m for the EC pools so that we get
better code readablility.
Zac Dover [Mon, 12 Aug 2024 12:47:08 +0000 (22:47 +1000)]
doc/cephfs: improve cache-configuration.rst
Improve the text in the section about dealing with cache-pressure alerts
that was added in https://github.com/ceph/ceph/pull/59077. The changes
in this commit were suggested by Anthony D'Atri.
Co-authored-by: Patrick Donnelly <pdonnelly@redhat.com> Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit aa3bdae2314fef2fca8fc12dca006af657235e17)
Ilya Dryomov [Fri, 2 Aug 2024 07:27:42 +0000 (09:27 +0200)]
librbd/migration: make ImageDispatch handle encryption for non-native formats
With NativeFormat now being handled via dispatch, handling encryption
for non-native formats (i.e. mapping to raw image extents and performing
decryption/mapping back on completion) in the migration layer is really
straightforward.
Note that alignment doesn't need to be performed in the migration layer
because it happens on the destination image -- the "align and resubmit"
logic in C_UnalignedObjectReadRequest should kick in before the call to
read_parent().
Fixes: https://tracker.ceph.com/issues/53674 Co-authored-by: Or Ozeri <oro@il.ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0000c3447407772039121bb4499f243df1c889da)
Conflicts:
src/librbd/migration/ImageDispatch.cc [ commit 20aee5bbbcb5
("neorados: Make IOContext getters/setters less weird") not
in reef ]
librbd/migration: close source image in OpenSourceImageRequest
Currently, on errors in FormatInterface::open(), RawFormat disposes
of src_image_ctx, but QCOWFormat doesn't, which is a leak. Rather than
having each format do it internally, do it in OpenSourceImageRequest.
librbd/migration: don't instantiate NativeFormat, handle it via dispatch
Trying to shoehorn NativeFormat under FormatInterface doesn't really
work. It fundamentally doesn't fit in:
- Unlike for RawFormat and QCOWFormat, src_image_ctx for NativeFormat
is not dummy -- it's an ImageCtx for a real RBD image. Pre-creating
it in OpenSourceImageRequest with the expectation that placeholder
values would be overridden later forces NativeFormat to reach into
ImageCtx guts, duplicating the logic in the constructor. This also
necessitates calling snap_set() in a separate step, since snap_id
isn't known at the time ImageCtx is created.
- Unlike for RawFormat and QCOWFormat, get_image_size() and
get_snapshots() implementations for NativeFormat are dummy.
- read() and list_snaps() implementations for NativeFormat are
inconsistent: read() passes through io::ImageDispatch layer, but
list_snaps() doesn't. Both can be passing through, meaning that in
essence these are also dummy.
All of this is with today's code. Additional complications arise with
planned support for migrating from external clusters where src_image_ctx
would require more invasive patching to "move" to an IoCtx belonging to
an external cluster's CephContext and also with other work.
With the above in mind, NativeFormat actually consists of:
1. Code that parses the "type: native" source spec
2. Code that patches ImageCtx, working around the fact that it's
pre-created in OpenSourceImageRequest
3. A bunch of dummy implementations for FormatInterface
With this change, (1) is wrapped into a static method that also creates
ImageCtx after all required parameters are known and (2) and (3) go away
entirely. NativeFormat no longer implements FormatInterface and doesn't
get instantiated at all.
In preparation for not instantiating NativeFormat and losing a copy of
the source spec JSON object in m_json_object, refactor the parsing code
to use only const methods (which std::map's operator[] isn't) and local
variables where possible.
librbd/migration/NativeFormat: do pool lookup instead of creating io_ctx
A Rados instance is sufficient to map the pool name to the pool ID,
no need to involve an IoCtx instance as well. While at it, report
distinctive errors for a non-existing pool and an invalid JSON value
for pool_name key cases.
librbd/migration: make SourceSpecBuilder::parse_source_spec() static
In preparation for divorcing NativeFormat from FormatInterface and
changing when/how src_image_ctx is created, make parse_source_spec()
independent of src_image_ctx. The "invalid source-spec JSON" error is
duplicated by the "failed to parse migration source-spec" error, so
just get rid of the former to spare having to pass CephContext to
parse_source_spec().
Add missing spaces, don't use the word stream when reporting errors
on POSIX file operations (open() and lseek64()) and fix a cut-and-paste
typo in RawSnapshot.
Zac Dover [Wed, 7 Aug 2024 13:11:11 +0000 (23:11 +1000)]
doc/cephfs: add cache pressure information
Add information to doc/cephfs/cache-configuration.rst about how to deal
with a message that reads "clients failing to respond to cache
pressure". This procedure explains how to slow the growth of the
recall_caps value so that it does not exceed the
mds_recall_warning_threshold.
The information in this commit was developed by Eugen Block. See
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/5ROH5CWKKOEIQMVXOVRT5OO7CWK2HPM3/#J65DFUPP4BY57MICPANXKI7KAXSZ5Z5P
and https://www.spinics.net/lists/ceph-users/msg73188.html.
Fixes: https://tracker.ceph.com/issues/57115 Co-authored-by: Eugen Block <eblock@nde.ag> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit bf26274ae4737417193f8c2b56bea20eb2a358aa)
Casey Bodley [Fri, 3 May 2024 19:43:39 +0000 (15:43 -0400)]
rgw: move publish_complete() back to RGWCompleteMultipart::execute()
move publish_complete() and meta_obj->delete_object() back to execute()
so they only run on success. this allows several member variables to
move back to execute()'s stack as well
Casey Bodley [Fri, 3 May 2024 19:29:00 +0000 (15:29 -0400)]
rgw: CompleteMultipart uses s->object for Notification
get_notification() should be associated with the target object
s->object. the meta_obj has the wrong object name, so required passing
s->object->get_name() as an extra argument
importantly, Notification no longer depends on the lifetime of meta_obj
to avoid a dangling pointer, while the lifetime of s->object is guaranteed