xie xingguo [Thu, 12 Mar 2020 01:41:10 +0000 (09:41 +0800)]
osd/PeeringState: fix pending want_acting vs osd offline race
In general, there are two scenarios in which we might call choose_acting to
post a pending want_acting change:
1) We are in the middle of peering, and we decide to select some
peers other than the current acting set in order to continue serving
client reads and writes.
In this case, when any OSD from the pending want_acting set goes down,
the primary will restart the peering process and tidy want_acting up
properly (see PeeringState::Primary::exit()).
2) The PG is active, and we want to transition all successfully backfilled
(or async-recovered) peers back into the up set.
In this case, every want_acting member is guaranteed to come from either
the current up set or the acting set (since we pass restrict_to_up_acting
== true when calling down into choose_acting).
From 1, we know we'd never leak a want_acting set that might contain
stray peers into the Active state. From 2, we know the assert would
effectively catch any bad Active choose_acting caller that fails to set
restrict_to_up_acting properly.
However, in 023524a I introduced a third scenario that can violate rule 2:
we now call choose_acting with the restrict_to_up_acting option off
whenever a stray peer comes back to life while the PG is active. If that
peer then goes down again while the corresponding pg_temp change is still
in flight, we will reliably hit the assert.
Fix by calling choose_acting again whenever Active sees a new map that
marks down a stray osd in want_acting, so we don't leave a dirty
want_acting (and pg_temp) behind.
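A minimal Python sketch of the shape of the fix (the real logic is C++ in
src/osd/PeeringState.cc; every name below is illustrative, not the actual API):

    def on_advance_map(pg, new_map):
        """Run by an Active primary each time it consumes a new OSDMap."""
        # If the new map marks down any member of the pending want_acting
        # set, the in-flight pg_temp request is now stale. Re-running
        # choose_acting tidies up want_acting (and pg_temp) rather than
        # leaving a dirty set behind to trip the assert.
        if pg.want_acting and any(not new_map.is_up(osd) for osd in pg.want_acting):
            pg.choose_acting()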
Sage Weil [Sun, 22 Mar 2020 23:32:11 +0000 (18:32 -0500)]
Merge PR #34042 into octopus
* refs/pull/34042/head:
mgr/rook: list rgw services
mgr/rook: tolerate timestamps that are None
mgr/orch: add 'subcluster' property to RGWSpec
mgr/rook: do not create radosgw pools
mgr/rook: refactor apply/add for rgw
mgr/cephadm: configure rgw_frontends for rgw service
mgr/orch: accept port and ssl flags to 'apply rgw'
python-common/ceph/deployment/service_spec: add ssl to RGWSpec
mgr/rook: fix 'orch ps' for osds
Sage Weil [Fri, 20 Mar 2020 18:56:47 +0000 (14:56 -0400)]
mgr/rook: do not create radosgw pools
We don't know how big the pools should be or what they should look like;
the caller should already know that, and/or radosgw can create the pools
itself.
This depends on https://github.com/rook/rook/pull/5058
Sage Weil [Wed, 18 Mar 2020 21:20:12 +0000 (17:20 -0400)]
mgr/rook: refactor apply/add for rgw
A few caveats here:
- we enforce that realm == zone, since that is all Rook supports at the moment.
- we force a (bad!) pool configuration, since Rook requires that these pools
be present (instead of allowing radosgw or the caller to create them)
Jason Dillaman [Fri, 20 Mar 2020 16:59:14 +0000 (12:59 -0400)]
rbd-mirror: leader watcher should not cancel get locker if locker is invalid
When a new leader acquires the lock, it sends out a lock-acquired
notification along with periodic heartbeats. The get-locker request will
attempt to run immediately, but if a heartbeat arrives before it executes,
the heartbeat will cancel the timer and reschedule it for the future. This
process repeats with each periodic heartbeat, so the locker is never
re-read from the OSD.
This is an issue only for namespace replayers, because they retrieve the
leader instance id in a delayed fashion.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
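A toy Python model of the scheduling bug and its fix (the real code is C++ in
rbd-mirror's LeaderWatcher; the class shape and the 30-second delay are
assumptions):

    import threading

    class LeaderWatcherModel:
        def __init__(self):
            self.locker = None            # None => locker never fetched ("invalid")
            self.get_locker_task = None

        def schedule_get_locker(self, delay=0.0):
            self.get_locker_task = threading.Timer(delay, self.get_locker)
            self.get_locker_task.start()

        def handle_heartbeat(self):
            # Bug: cancelling and rescheduling unconditionally means a steady
            # stream of heartbeats starves get_locker forever. Fix: leave the
            # pending task alone while the locker is still invalid.
            if self.locker is not None and self.get_locker_task is not None:
                self.get_locker_task.cancel()
                self.schedule_get_locker(delay=30.0)

        def get_locker(self):
            self.locker = "locker-read-from-osd"  # stands in for the rados op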
Jason Dillaman [Fri, 20 Mar 2020 14:54:43 +0000 (10:54 -0400)]
rbd-mirror: snapshot sync request needs to check for interruption
If the sync request was locally canceled, we need to resume the paused
shut down logic instead of just notifying the image replayer state
machine of the change, since the image replayer had already requested a
shut down and will not re-request it.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
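A hedged sketch of that control flow (the actual state machine is C++ inside
rbd-mirror; the function and attribute names are hypothetical):

    import errno

    def handle_sync_complete(replayer, result):
        if result == -errno.ECANCELED and replayer.shut_down_requested:
            # Shut down was already requested and is paused waiting on this
            # sync; resume it directly. Merely notifying the image replayer
            # would stall forever, since it will not re-request a shut down.
            replayer.resume_shut_down()
        else:
            replayer.notify_image_replayer(result)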
Jason Dillaman [Thu, 19 Mar 2020 14:57:03 +0000 (10:57 -0400)]
librbd: request exclusive lock when moving to trash
Even if the image is in use, moving it to the trash does not
remove any data. This also solves a race between snapshot-based
mirroring shutting down and being able to move a mirrored image
to the trash.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 18 Mar 2020 19:01:32 +0000 (15:01 -0400)]
rbd-mirror: basic integration with sync throttling
Snapshot-based mirroring did not have any throttling to prevent too many
concurrent syncs from running. Since each sync might need to iterate over
every object of an image, that could put an extreme burden on the remote
cluster.
A future PR will add a more intelligent throttle based on the actual
number of objects that need to be scanned.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
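The gist of such a throttle, sketched in Python with a semaphore capping
in-flight syncs (the real implementation is C++; the limit of 5 is only an
example):

    import threading

    class SyncThrottler:
        """Cap the number of image syncs running at once."""
        def __init__(self, max_concurrent_syncs=5):
            self._slots = threading.Semaphore(max_concurrent_syncs)

        def run_sync(self, image, do_sync):
            with self._slots:       # blocks until one of the slots frees up
                do_sync(image)      # e.g. walk the image's objects and copy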
Sage Weil [Thu, 19 Mar 2020 17:21:52 +0000 (12:21 -0500)]
Merge PR #34030 into octopus
* refs/pull/34030/head:
cephadm: env over last used
cephadm: fall back to default for infer_image
cephadm: remove outdated check
cephadm: consolidate default image logic
cephadm: only infer image for shell, run, inspect-image, pull, ceph-volume
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Sage Weil [Thu, 19 Mar 2020 13:22:40 +0000 (08:22 -0500)]
Merge PR #34027 into octopus
* refs/pull/34027/head:
qa/workunits/cephadm/test_cephadm: mark services unmanaged for test
mgr/cephadm: do not reconfig unmanaged services
qa/workunits/cephadm/test_cephadm: output file for pub key
Sage Weil [Thu, 19 Mar 2020 00:04:14 +0000 (19:04 -0500)]
mgr/progress: fix duration strings
- simplify the code to just calculate the durations when we need them
(I'm not sure why we had those temporary strings!)
- use a nicer time delta format
Fixes: https://tracker.ceph.com/issues/44672
Signed-off-by: Sage Weil <sage@redhat.com>
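Roughly the shape of the change (a hedged sketch; the real module tracks more
state than this):

    import time
    from datetime import timedelta

    class EventModel:
        def __init__(self):
            self.started_at = time.time()

        def duration_str(self):
            # Compute the duration on demand instead of caching a string,
            # and render it as e.g. "0:02:31" instead of raw seconds.
            return str(timedelta(seconds=int(time.time() - self.started_at)))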
Jason Dillaman [Wed, 18 Mar 2020 16:54:16 +0000 (12:54 -0400)]
qa/workunits/rbd: use context managers to control Rados lifespan
There is a potential race between the expected exceptions being thrown
and Python interpreter shutdown while librados background threads are
still running. Ensure that librados is properly shut down before the
interpreter exits.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
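The pattern the workunits move to; python-rados Rados and Ioctx objects both
implement the context-manager protocol (the conffile path and pool name are
just examples):

    import rados

    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        # connect() runs on entry; shutdown() runs on exit, even when an
        # expected exception is raised inside the block, so no librados
        # background threads outlive the test body.
        with cluster.open_ioctx('rbd') as ioctx:
            pass  # exercise the API calls that are expected to throw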
Sage Weil [Wed, 18 Mar 2020 14:45:16 +0000 (09:45 -0500)]
Merge PR #33981 into octopus
* refs/pull/33981/head:
doc/install: edits
doc/cephadm: more edits
doc/cephadm/install: edits
doc/cephadm/adoption: improvements
doc/cephadm/install: a few edits
doc/cephadm/install: do not install ceph-common on host (by default)
doc/cephadm: drop os recs link
doc/cephadm/upgrade: improvements
doc/cephadm/upgrade: document upgrade
doc/cephadm/install: revamp install docs
doc: reorganize cephadm docs
doc/cephadm/administration: update docs on customizing SSH config
doc/cephadm/administration: add a note about the 'removed' dir
Kiefer Chang [Wed, 18 Mar 2020 07:21:35 +0000 (15:21 +0800)]
mgr/dashboard: fix adding/removing host errors
Send a HostSpec instance to the Orchestrator when adding a host.
Also, to be consistent with other components:
- Reword 'Add/Remove hosts' to 'Create/Delete hosts'
- Display a modal when there is no Orchestrator backend enabled
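Roughly what the backend call becomes (a hedged sketch; the import path and
the dashboard's actual call site may differ in Octopus):

    from ceph.deployment.service_spec import HostSpec  # import path assumed

    def add_host(orchestrator, hostname, addr=None, labels=None):
        # The dashboard previously passed a bare hostname string here;
        # wrapping it in a HostSpec matches what the Orchestrator expects.
        spec = HostSpec(hostname=hostname, addr=addr, labels=labels or [])
        return orchestrator.add_host(spec)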
Venky Shankar [Tue, 14 Jan 2020 09:13:16 +0000 (04:13 -0500)]
mgr/volumes: introduce 'canceled' state in clone op state machine
When fetching the next execution state, an -EINTR return jumps to the
'canceled' state, signifying a canceled (interrupted) operation. Also
include a helper routine to check whether a given state machine is in
its initial state.
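A hedged sketch of the transition logic (the actual code lives under
src/pybind/mgr/volumes; the state names and table below are illustrative):

    import errno

    # Hypothetical success-path transition table: state -> next state.
    TRANSITIONS = {'pending': 'in-progress', 'in-progress': 'complete'}
    INITIAL_STATE = 'pending'

    def get_next_state(state, ret):
        if ret == -errno.EINTR:
            return 'canceled'   # an interrupted operation jumps straight here
        return TRANSITIONS[state]

    def is_initial_state(state):
        # Helper: the clone has made no progress yet.
        return state == INITIAL_STATE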