Rachanaben Patel [Tue, 16 Mar 2021 22:37:46 +0000 (15:37 -0700)]
doc/RBD: fixes for ceph-immutable-object-cache daemon enable command
The documentation for rbd-persistent-read-only-cache shows how to manage
the ceph-immutable-object-cache daemon using systemd.
The command example needs fixing. It should be
Sage Weil [Tue, 16 Mar 2021 17:04:55 +0000 (13:04 -0400)]
Merge PR #39931 into master
* refs/pull/39931/head:
mgr/cephadm: fall back to service spec port if none on DaemonDescription
mgr/cephadm: fix redeploy when daemons have ip:port
mgr/cephadm/schedule: add test case
qa/suites/rados/cephadm/smoke-roleless: add rgw test on many ports
doc/cephadm/rgw: update docs to show count-per-host
mgr/cephadm: add support for rgw_frontend_type (beast or civetweb)
mgr/cephadm: remove ssl_frontend_ssl_key from RGWSpec
mgr/cephadm: fix beast private key config option
mgr/cephadm: fix rgw ssl cert/key config-key path
mgr/cephadm/schedule: dynamically assign ports for rgw
mgr/cephadm/schedule: only 1 port in DaemonPlacement
mgr/cephadm: move rgw frontend logic into RgwService
mgr/cephadm/schedule: return DaemonPlacement instead of HostPlacementSpec
mgr/cephadm/schedule: remove unused methods
mgr/cephadm: propagate ip:port from CephadmDaemonDeploySpec to deployment
cephadm: populate ports if known and not included in unit.meta
mgr/cephadm: gather and report ports in 'orch ps' output
Zac Dover [Mon, 15 Mar 2021 15:03:06 +0000 (01:03 +1000)]
doc/cephadm: break mon section into sections
This PR breaks the "Deploy Additional Monitors" section
of the cephadm documentation into several subsections
whose titles spotlight the matter under discussion in
those respective subsections.
inb4: Another PR is on deck that rewrites the sentences
in this chapter of the cephadm documentation. I'd like
to get this chapter broken up into these subsections before
I rewrite those sentences. So I'm hoping for no grammatical
mission creep on this one. The grammar and clarity updates
are coming.
Sage Weil [Mon, 15 Mar 2021 22:58:15 +0000 (18:58 -0400)]
mgr/cephadm: fall back to service spec port if none on DaemonDescription
For an RGW instance that was deployed by an older version, we won't have
IP or port metadata in the DaemonDescription. In that case, fall back
to the port in the ServiceSpec. This should be safe since any such
instance will only have 1 daemon per host.
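For illustration, a minimal Python sketch of that fallback; the class shapes and names here are simplified stand-ins, not the actual mgr/cephadm types:

```python
from typing import List, Optional


class DaemonDescription:
    """Stand-in: ports may be missing for daemons deployed by older versions."""
    def __init__(self, ports: Optional[List[int]] = None) -> None:
        self.ports = ports


class ServiceSpec:
    """Stand-in for a service spec with a single configured port."""
    def __init__(self, port: Optional[int] = None) -> None:
        self.port = port


def resolve_port(dd: DaemonDescription, spec: ServiceSpec) -> Optional[int]:
    # Prefer the port recorded on the daemon itself; otherwise fall back to
    # the service spec. Safe because such old deployments only ever had one
    # daemon per host.
    if dd.ports:
        return dd.ports[0]
    return spec.port


# An old daemon with no port metadata falls back to the spec's port.
print(resolve_port(DaemonDescription(), ServiceSpec(port=8080)))  # -> 8080
```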
Sage Weil [Mon, 15 Mar 2021 19:34:13 +0000 (15:34 -0400)]
mgr/cephadm: fix redeploy when daemons have ip:port
The _daemon_action() method can be called directly by upgrade and by
the 'orch daemon <action> <name>' commands. When this happens, construct
a CephadmDaemonDeploySpec from the DaemonDescription that includes the
metadata we assigned when the service was created: the IP and port(s).
This fixes upgrade and the CLI.
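Roughly, the idea is the following sketch (field names are assumptions, not the real CephadmDaemonDeploySpec/DaemonDescription definitions):

```python
from types import SimpleNamespace


def deploy_spec_from_daemon(dd) -> dict:
    """Rebuild a deploy spec from the daemon's stored metadata so a redeploy
    (triggered by upgrade or 'orch daemon <action>') reuses the same ip:port
    binding instead of losing it."""
    return {
        "host": dd.hostname,
        "daemon_id": dd.daemon_id,
        "ip": dd.ip,
        "ports": list(dd.ports or []),
    }


# Example daemon record carrying the metadata assigned at service creation.
dd = SimpleNamespace(hostname="host1", daemon_id="rgw.foo.host1.abc",
                     ip="10.0.0.5", ports=[8080])
print(deploy_spec_from_daemon(dd))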
Sage Weil [Wed, 10 Mar 2021 19:58:09 +0000 (14:58 -0500)]
mgr/cephadm: remove ssl_frontend_ssl_key from RGWSpec
Since this didn't work anyway, stop collecting and passing through the
private key portion of the certificate. Instead, users should include
both the certificate and the private key in the single certificate option.
This is simpler, and provides consistency
across civetweb and beast rgw backends (for whatever that is worth).
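As a sketch of what "include both in one option" means in practice (paths are placeholders, and the spec field name is quoted from memory, so treat it as an assumption):

```python
def combined_pem(cert_path: str, key_path: str) -> str:
    """Concatenate a certificate and its private key into one PEM blob,
    supplied as the single certificate option."""
    with open(cert_path) as f:
        cert = f.read()
    with open(key_path) as f:
        key = f.read()
    return cert.rstrip("\n") + "\n" + key


# e.g. rgw_spec.rgw_frontend_ssl_certificate = combined_pem("rgw.crt", "rgw.key")
```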
Sage Weil [Wed, 10 Mar 2021 19:50:30 +0000 (14:50 -0500)]
mgr/cephadm: fix beast private key config option
This has always been broken. However, beast SSL will also accept a PEM
(cert + key) via the ssl_certificate option, so any existing non-broken
users (if they exist) must be using that.
Sage Weil [Wed, 10 Mar 2021 18:59:18 +0000 (13:59 -0500)]
mgr/cephadm/schedule: only 1 port in DaemonPlacement
It would be weird to dynamically number multiple ports (although doable).
But until we have plans to support something like that, no need to handle
it here.
Sage Weil [Wed, 10 Mar 2021 17:24:32 +0000 (12:24 -0500)]
mgr/cephadm/schedule: return DaemonPlacement instead of HostPlacementSpec
Create a new type for the result of scheduling/place(). Include new fields
like ip:port. Introduce a matches_daemon() to see whether an existing
daemon matches a scheduling slot.
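A simplified Python sketch of the new result type (the fields and matching rule are approximations of the mgr/cephadm code, not the exact implementation):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DaemonDescription:       # simplified stand-in for the real class
    hostname: str
    ip: Optional[str] = None
    ports: List[int] = field(default_factory=list)


@dataclass
class DaemonPlacement:         # result of place(), replacing HostPlacementSpec
    hostname: str
    ip: Optional[str] = None
    ports: List[int] = field(default_factory=list)

    def matches_daemon(self, dd: DaemonDescription) -> bool:
        # An existing daemon fills this scheduling slot only if host, IP
        # and ports all line up; otherwise it needs to be (re)deployed.
        return (self.hostname == dd.hostname
                and self.ip == dd.ip
                and self.ports == dd.ports)


slot = DaemonPlacement(hostname="host1", ip="10.0.0.5", ports=[8080])
print(slot.matches_daemon(
    DaemonDescription(hostname="host1", ip="10.0.0.5", ports=[8080])))  # True
```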
Sage Weil [Mon, 15 Mar 2021 18:49:24 +0000 (14:49 -0400)]
Merge PR #39979 into master
* refs/pull/39979/head:
python-common: fix PlacementSpec target size method
python-common: count-per-host must be combined with label or hosts or host_pattern
mgr/cephadm: handle bare 'count-per-host:NNN', fix comments
mgr/cephadm/schedule: remove Scheduler abstraction (for now at least)
mgr/cephadm/schedule: calculate additions/removals in place()
mgr/cephadm/schedule: allow colocation of certain daemon types
mgr/cephadm/schedule: shuffle candidates, not final placements
mgr/cephadm/schedule: pass per-type allow_colo to the scheduler
mgr/cephadm/services/cephadmservice: fix typo
mgr/cephadm/schedule: pass daemons, not get_daemons_func
mgr/cephadm: use local var
mgr/cephadm/schedule: move host filtering into get_candidates()
python-common/ceph/deployment/service_spec: disallow max-per-host + explicit placement
mgr/cephadm/schedule: respect count-per-host
mgr/cephadm: adjust deployment logic to allow multiple daemons per host
python-common: add count-per-host to PlacementSpec
mgr/cephadm: do not worry about even # of monitors
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Sage Weil [Mon, 15 Mar 2021 16:55:36 +0000 (11:55 -0500)]
mgr/cephadm: tolerate failure to update daemon caps
If we're upgrading from 15.2.0, we may fail to update caps. Instead of
failing the upgrade hard, warn to the log and continue. This is less
than ideal, but the caps will get corrected the next time the daemon is
redeployed on the next upgrade, and most likely the previous caps will
continue to work (given they were presumably working before the upgrade).
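The warn-and-continue pattern in question, as a small illustrative sketch (function and logger names are hypothetical, not the actual upgrade code):

```python
import logging

logger = logging.getLogger("cephadm.upgrade")


def try_update_caps(daemon_name: str, update_fn) -> None:
    # If updating the daemon's caps fails (e.g. when upgrading from 15.2.0),
    # warn and continue the upgrade rather than failing hard; the caps will
    # be corrected when the daemon is redeployed on the next upgrade.
    try:
        update_fn(daemon_name)
    except Exception as e:
        logger.warning("unable to update caps for %s: %s", daemon_name, e)
```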
Or Ozeri [Tue, 9 Mar 2021 20:14:49 +0000 (22:14 +0200)]
librbd: crypto format api semantics change
This commit alters the semantics of the encryption format API so that it
also loads the encryption after the format completes.
Additionally, several other small changes in librbd crypto are included,
in preparation for supporting clone formatting.
David Zafman [Sat, 13 Mar 2021 05:56:28 +0000 (05:56 +0000)]
test: osd-recovery-scrub.sh: Test fails if no scrubs happened for a recovering pg
Change TEST_recovery_scrub_2 to create more objects and use
osd_recovery_sleep to prevent recovery from finishing before
we start to scrub. Verify that at least 1 scrub was started
while the pg was recovering.
The test needs to scrub while recovery is in progress, so catching
recovery from the logs after the fact isn't the proper setup.
We can use the osd_recovery_sleep config.
Fixes: https://tracker.ceph.com/issues/49779
Signed-off-by: David Zafman <dzafman@redhat.com>
Kefu Chai [Fri, 12 Mar 2021 06:51:30 +0000 (14:51 +0800)]
osd: do not handle pre-octopus messages
MOSDPGQuery and MOSDPGInfo messages are only sent by
pre-octopus OSDs, so in quincy and later clusters we do not need
to handle them anymore, as we can only upgrade to quincy from
octopus and later.
We can drop MOSDPGNotify after Q + 2, though, once we stop sending
MOSDPGNotify in the Q release.
Kefu Chai [Fri, 12 Mar 2021 07:09:12 +0000 (15:09 +0800)]
osd: send MOSDPGNotify2 instead of MOSDPGNotify
Since we already prefer sending MOSDPGNotify2 over MOSDPGNotify in
PeeringState post-octopus, let's just use MOSDPGNotify2 in OSD.cc as
well, to be more consistent and to have one less thing to worry about.
Kefu Chai [Fri, 12 Mar 2021 06:40:47 +0000 (14:40 +0800)]
osd: drop OSD::create_context()
OSD::create_context() was used for creating a PeeringCtx from the OSD's
require_osd_release. But since the check against require_osd_release
is not required anymore, let's drop this helper.
Kefu Chai [Fri, 12 Mar 2021 06:23:50 +0000 (14:23 +0800)]
osd/PeeringState: do not check for require_osd_release
Before this change, we always checked require_osd_release when
creating MOSDPGNotify2 or MOSDPGNotify: if require_osd_release is
greater than or equal to octopus, MOSDPGNotify2 is created.
Since we are now in the quincy development cycle and only need to
support upgrades from octopus and later, there is no need to be
compatible with OSDs whose version is lower than octopus.
With this change, the check in `BufferedRecoveryMessages::send_notify()`
is dropped.
Sage Weil [Sat, 13 Mar 2021 16:34:43 +0000 (11:34 -0500)]
osd: propagate base pool application_metadata to tiers
If there is application metadata on the base pool, it should be mirrored
to any other tiers in the set. This aligns with the fact that the
'ceph osd pool application ...' commands refuse to operate on a non-base
pool.
This fixes problems with accessing tiers (e.g., cache tiers) when the
cephx cap is written in terms of application metadata.
Fixes: https://tracker.ceph.com/issues/49788
Signed-off-by: Sage Weil <sage@newdream.net>
Kefu Chai [Fri, 12 Mar 2021 11:39:28 +0000 (19:39 +0800)]
cmake: do not build lockdep for Release build
lockdep creates large data structures in .bss and on the heap for
tracking the locks and their dependencies, but we don't need to pay
for this if lockdep is not enabled.
lockdep helps us track lock-dependency issues in Debug builds. In
Release builds this feature hurts performance and, more importantly,
lockdep only kicks in when using mutex_debug and friends, which are
not used in Release builds at all.
So, after this change, lockdep is not built in Release builds, and
the static variables defined in lockdep.cc are no longer allocated
in Release builds.
Greg Farnum [Fri, 12 Mar 2021 22:41:03 +0000 (22:41 +0000)]
osd: PeeringState: implement an acting_set_writeable() function
Use it instead of direct checks against min_size and stretch_set_can_peer()
when deciding whether to go STATE_ACTIVE/STATE_PEERED or do updates
to things like last_epoch_started.
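A conceptual Python stand-in for what the helper centralizes (the real code is C++ in PeeringState; names and signatures here are illustrative only):

```python
def acting_set_writeable(acting_size: int, min_size: int,
                         stretch_set_can_peer: bool = True) -> bool:
    # The acting set is writeable only if it meets min_size and, in stretch
    # mode, also satisfies the stretch peering constraint.
    return acting_size >= min_size and stretch_set_can_peer


def choose_pg_state(acting_size: int, min_size: int,
                    stretch_set_can_peer: bool = True) -> str:
    # Use the single helper instead of repeating both checks at each call
    # site when deciding between ACTIVE and PEERED.
    if acting_set_writeable(acting_size, min_size, stretch_set_can_peer):
        return "STATE_ACTIVE"
    return "STATE_PEERED"


print(choose_pg_state(2, 2))                                # STATE_ACTIVE
print(choose_pg_state(2, 2, stretch_set_can_peer=False))    # STATE_PEERED
```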
Greg Farnum [Fri, 12 Mar 2021 20:13:38 +0000 (20:13 +0000)]
osd: PeeringState: fix stretch peering so PGs can go peered but not active
I misunderstood and there was a pretty serious error here: to prevent
accidents, choose_acting() was preventing PGs from *finishing* peering
if they didn't satisfy the stretch cluster rules. What we actually want
to do is to finish peering, but not go active.
Happily, this is easy to fix -- we just add a call to stretch_set_can_peer()
alongside existing min_size checks when we choose whether to go PG_STATE_ACTIVE
or PG_STATE_PEERED!
Greg Farnum [Thu, 11 Mar 2021 22:19:10 +0000 (22:19 +0000)]
osd: PeeringState: don't add acting-set OSDs to candidate set in stretch mode
We were adding them once from the acting set, and then once from the all_infos
set, and that hit an assert later on. (I think it was otherwise harmless, but
I don't want to weaken the assert!)
There was a major error here! get_ancestor() was type-deduced to return
a bucket_candidates_t -- a *copy* of what was in the map, not the reference
to it we wanted to actually amend!
Fix this by returning a pointer instead. There's a way to coerce things
to return a reference instead but the syntax seems clumsier to me
and I'm not familiar with it anyway -- this works just fine.
Greg Farnum [Thu, 11 Mar 2021 07:40:52 +0000 (07:40 +0000)]
osd: PeeringState: respect stretch peering constraints for async recovery
Happily this is pretty simple: we just need to check that the resulting
wanted set can peer, which we have a function for. Run it before actually
swapping the want and candidate_want sets.
If we're not in stretch mode, this is a cheap function call that
will always return true, so it's equivalent to what we already have for them.
Greg Farnum [Thu, 11 Mar 2021 07:19:26 +0000 (07:19 +0000)]
osd: PeeringState: add a comment about using size as a proxy for activateable
When reviewing, I mistakenly thought we needed to skip a size check in
choose_acting() in case of mismatches between size and bucket counts, but that
is not accurate!
Sage Weil [Fri, 12 Mar 2021 20:21:49 +0000 (15:21 -0500)]
mgr: wait for ~3 beacons on startup if mons are pre-pacific
If we are going active and the mons are pre-pacific, they may have the
bug https://tracker.ceph.com/issues/49778, which prevents our modules'
metadata (including options) from being updated (until the next beacon).
Wait a bit (6s by default, 3x the 2s mgr_tick_period) to let this
happen.
This allows us to upgrade from broken pre-pacific mons using cephadm,
which may (if the original cluster is <15.2.5) immediately do a cephadm
migration that relies on the mgr/cephadm/migration_current config
option being present in the mon's mgrmap.
Workaround for https://tracker.ceph.com/issues/49778