Or Ozeri [Tue, 9 Mar 2021 20:14:49 +0000 (22:14 +0200)]
librbd: crypto format api semantics change
This commit alters the semantics of the encryption format api
to also load the encryption after format completes.
Additionally, several other small changes in librbd crypto are included,
in preparation of supporting clone formatting.
David Zafman [Sat, 13 Mar 2021 05:56:28 +0000 (05:56 +0000)]
test: osd-recovery-scrub.sh: Test fails if no scrubs happened for a recovering pg
Change TEST_recovery_scrub_2 to create more objects and use
osd_recovery_sleep to prevent recovery from finishing before
we start to scrub. Verify that at least 1 scrub was started
while the pg was recovering.
Fixes: https://tracker.ceph.com/issues/49779
Signed-off-by: David Zafman <dzafman@redhat.com>
Kefu Chai [Fri, 12 Mar 2021 06:51:30 +0000 (14:51 +0800)]
osd: do not handle pre-octopus messages
MOSDPGQuery and MOSDPGInfo messages are sent by
pre-octopus OSD, so in quincy and up clusters, we do not need
to handle them anymore, as we can only upgrade from octopus and
up to quincy.
we can drop MOSDPGNotify after Q + 2, though, after we stop sending
MOSDPGNotify in Q release.
Kefu Chai [Fri, 12 Mar 2021 07:09:12 +0000 (15:09 +0800)]
osd: send MOSDPGNotify2 instead of MOSDPGNotify
as we prefer sending MOSDPGNotify2 over MOSDPGNotify in PeeringState
post-octopus, to be more consistent and have one less thing to
worry about, let's just use MOSDPGNotify2 in OSD.cc as well.
Kefu Chai [Fri, 12 Mar 2021 06:40:47 +0000 (14:40 +0800)]
osd: drop OSD::create_context()
OSD::create_context() was used for creating PeeringCtx from OSD's
require_osd_release. but since the check against require_osd_release
is not required anymore, let's drop this helper.
Kefu Chai [Fri, 12 Mar 2021 06:23:50 +0000 (14:23 +0800)]
osd/PeeringState: do not check for require_osd_release
before this change, we always checked require_osd_release when
creating MOSDPGNotify2 or MOSDPGNotify: if require_osd_release is
greater than or equal to octopus, MOSDPGNotify2 is created.
since we are in a post-quincy era, and we only need to upgrade from
octopus and up to quincy, there is no need to be compatible with
OSDs whose version is lower than octopus.
in this change, the check in `BufferedRecoveryMessages::send_notify()`
is dropped.
The test needs to scrub while recovery is in progress, so catching
recovery from the logs after the fact isn't the proper setup.
We can use the osd_recovery_sleep config option instead.
Sage Weil [Sat, 13 Mar 2021 16:34:43 +0000 (11:34 -0500)]
osd: propagate base pool application_metadata to tiers
If there is application metadata on the base pool, it should be mirrored
to any other tiers in the set. This aligns with the fact that the
'ceph osd pool application ...' commands refuse to operate on a non-base
pool.
This fixes problems with accessing tiers (e.g., cache tiers) when the
cephx cap is written in terms of application metadata.
Fixes: https://tracker.ceph.com/issues/49788
Signed-off-by: Sage Weil <sage@newdream.net>
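The mirroring described in this commit can be pictured with a minimal Python sketch. It is not the OSDMap code; the `pools` layout and the `tiers` and `application_metadata` keys are hypothetical stand-ins chosen for illustration.

```python
def propagate_application_metadata(pools):
    """Copy each base pool's application metadata onto its tiers.

    `pools` maps pool id -> dict with two hypothetical keys:
      "tiers": list of tier pool ids (empty for non-base pools)
      "application_metadata": dict of app name -> key/value dict
    """
    for pool in pools.values():
        metadata = pool.get("application_metadata")
        if not metadata:
            # Nothing to mirror when the base pool has no metadata.
            continue
        for tier_id in pool.get("tiers", []):
            # Copy one level deep so the tier does not alias the base dicts.
            pools[tier_id]["application_metadata"] = {
                app: dict(kv) for app, kv in metadata.items()
            }
    return pools
```

With this shape, a cap check that looks up application metadata on a cache tier sees the same entries as on the base pool.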
Kefu Chai [Fri, 12 Mar 2021 11:39:28 +0000 (19:39 +0800)]
cmake: do not build lockdep for Release build
lockdep creates large data structures in .bss and on the heap for tracking
the locks and their dependencies. but we don't need to pay for this
if lockdep is not enabled.
lockdep helps us to track lock dependency issues in Debug
builds. in Release builds, this feature hurts performance and, more
importantly, lockdep is a feature that only kicks in when using
mutex_debug and friends. they are not used in Release builds at all.
so, after this change, lockdep is not built in Release builds, and
the static variables defined in lockdep.cc are not allocated anymore
in Release builds.
Kefu Chai [Fri, 12 Mar 2021 11:32:16 +0000 (19:32 +0800)]
cmake: do not build mutex_debug.cc if !WITH_CEPH_DEBUG_MUTEX
there is no need to build shared_mutex_debug.cc and
mutex_debug.cc, if they are not used at all. in Release build
we just use the mutex primitives offered by C++ standard library and
the POSIX API offered by libc.
Kefu Chai [Fri, 12 Mar 2021 04:02:22 +0000 (12:02 +0800)]
ceph.spec: build with system libpmem on fedora and el8
* build with WITH_SYSTEM_PMDK=ON on fedora, as f32 and f33 ship
libpmem1.8 and libpmem1.9 respectively. and we need libpmem v1.7
* build with WITH_SYSTEM_PMDK=ON on el8, as el8 and CentOS8 AppStream
ships libpmem v1.6,
quote from nvml.spec:
> By design, PMDK does not support any 32-bit architecture.
> Due to dependency on some inline assembly, PMDK can be compiled only
> on these architectures:
> - x86_64
> - ppc64le (experimental)
> - aarch64 (unmaintained, supporting hardware doesn't exist?)
so far, only x86_64 and ppc64le packages are built.
see also,
https://src.fedoraproject.org/rpms/nvml/blob/rawhide/f/nvml.spec
Kefu Chai [Fri, 12 Mar 2021 11:29:54 +0000 (19:29 +0800)]
cmake: make "WITH_CEPH_DEBUG_MUTEX" depend on CMAKE_BUILD_TYPE
this option is available only if CMAKE_BUILD_TYPE is Debug.
this change helps us to unify the checks for WITH_CEPH_DEBUG_MUTEX.
without this change, we always have to check both WITH_CEPH_DEBUG_MUTEX
*and* CMAKE_BUILD_TYPE.
after this change, we only need to check WITH_CEPH_DEBUG_MUTEX.
Sebastian Wagner [Fri, 12 Mar 2021 11:04:54 +0000 (12:04 +0100)]
Merge pull request #39857 from adk3798/dup-labels
mgr/cephadm: remove duplicate labels when adding a host
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
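As an illustration of the label deduplication named in the merge title, here is a minimal Python sketch. It assumes labels arrive as a plain list of strings and that first-seen order should be kept; `dedup_labels` is a hypothetical helper, not the cephadm implementation.

```python
def dedup_labels(labels):
    """Return labels with duplicates removed, keeping first-seen order."""
    # dict keys keep insertion order (Python 3.7+), so fromkeys() gives a
    # stable, order-preserving dedup without an explicit "seen" set.
    return list(dict.fromkeys(labels))
```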
Sage Weil [Thu, 11 Mar 2021 21:56:52 +0000 (16:56 -0500)]
cephadm: use image id, not name, when inspecting for RepoDigests
The name is ambiguous, but the image_id is not! This fixes problems
during upgrade where upgrade thinks the container is upgraded (due to
an incorrect digest) when in fact it is not.
Fixes: 0826c45e0cb5d60fcf8cd71cd14edd34a6997cd4
Signed-off-by: Sage Weil <sage@newdream.net>
Xuehan Xu [Wed, 10 Mar 2021 07:31:23 +0000 (15:31 +0800)]
crimson/osd: retrieve client_requests' prev_op_id right before "start_op"
ClientRequest::prev_op_id should record its immediate predecessor in the
pipeline. If we capture sequencer's last_issued when creating the client
request, it may not represent that predecessor.
Adam Kupczyk [Mon, 7 Dec 2020 13:57:04 +0000 (14:57 +0100)]
os/bluestore: Added asserts for allocator regions
The functions release/init_add_free/init_rm_free did not check their input against the device size.
This is incorrect and has been a problem when shrinking a device.
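The kind of bounds check this commit adds can be sketched in Python as follows. It is not the BlueStore allocator code (which is C++); `check_region` and its parameters are hypothetical names used only to show the idea of validating a region against the device size.

```python
def check_region(offset, length, device_size):
    """Assert that [offset, offset + length) lies within the device."""
    assert offset >= 0 and length >= 0, "negative offset or length"
    # A region that extends past the end of the device is invalid; this is
    # the case that shows up after a device has been shrunk.
    assert offset + length <= device_size, (
        f"region 0x{offset:x}~0x{length:x} exceeds device size 0x{device_size:x}"
    )
```

Calling `check_region(0, 4096, 8192)` passes silently, while a region ending beyond `device_size` raises `AssertionError`.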
Samuel Just [Tue, 9 Mar 2021 02:09:15 +0000 (18:09 -0800)]
crimson/os/seastore: add releasing state for segments pending close
This should fix a bug in which we might start scanning a segment a second
time as it is released and possibly even reused, resulting in nonsensical
behavior.
Adam King [Wed, 24 Feb 2021 21:13:01 +0000 (16:13 -0500)]
mgr/cephadm: update caps if necessary when getting keyring
If the caps change from the old version to the new one, it causes
issues in the upgrade. This allows the caps to be updated. Currently
only seeing this with iscsi, but changing it for others as a precaution.
Neha Ojha [Tue, 9 Mar 2021 00:48:58 +0000 (00:48 +0000)]
pybind/mgr/balancer/module.py: assign weight-sets to all buckets before balancing
Add an additional check to make sure that the choose_args section has the same
number of buckets as the crushmap. If not, ensure that
get_compat_weight_set_weights assigns weight-sets to all buckets.
Without this change, if we end up with an orig_ws that has fewer buckets
than the crushmap, the mgr will crash due to a KeyError in do_crush_compat().
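The failure mode and the fix can be pictured with a minimal Python sketch. It is not the balancer module's code: `fill_missing_weight_sets` is a hypothetical helper, the dict layouts are invented for illustration, and falling back to the bucket's CRUSH weights for a missing bucket is an assumption, not something the commit message states.

```python
def fill_missing_weight_sets(bucket_weights, weight_sets):
    """Return a weight-set mapping that covers every bucket.

    bucket_weights: bucket id -> {item id: crush weight}  (hypothetical layout)
    weight_sets:    bucket id -> {item id: weight}; may lack some buckets
    """
    filled = {bucket: dict(items) for bucket, items in weight_sets.items()}
    for bucket_id, items in bucket_weights.items():
        if bucket_id not in filled:
            # Assumption: default a missing bucket to its CRUSH weights so
            # later lookups by bucket id cannot raise KeyError.
            filled[bucket_id] = dict(items)
    return filled
```

After this step, code that indexes the weight-set by every bucket id in the crushmap finds an entry for each one.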