Jason Dillaman [Fri, 20 Mar 2020 16:59:14 +0000 (12:59 -0400)]
rbd-mirror: leader watcher should not cancel get locker if locker is invalid
When a new leader acquires the lock, it will send out a lock acquired
notification along with periodic heartbeats. The get locker will attempt to
run immediately, but if a heartbeat arrives before it executes, the heartbeat
will cancel the timer and reschedule it for the future. This process repeats
for each periodic heartbeat and the locker is never re-read from the OSD.
This is an issue only for namespace replayers due to the delayed fashion in
which the leader instance id is retrieved.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
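The starvation described above can be reproduced with a small discrete-time simulation. This is only an illustrative sketch, not the rbd-mirror code: the function name, the parameters, and the simplified timing model (heartbeats every fixed number of ticks, one pending timer) are all assumptions made for the example.

```python
# Hypothetical simulation; none of these names come from rbd-mirror.
def simulate(heartbeat_interval, get_locker_delay, duration, heartbeat_cancels):
    """Return the tick at which the get-locker timer fires, or None if it never does."""
    # The timer is armed at tick 0, when the lock-acquired notification arrives.
    scheduled_at = get_locker_delay
    for now in range(duration + 1):
        is_heartbeat = now > 0 and now % heartbeat_interval == 0
        if is_heartbeat and heartbeat_cancels:
            # The heartbeat cancels the pending timer and re-arms it in the future.
            scheduled_at = now + get_locker_delay
        if now >= scheduled_at:
            return now
    return None


if __name__ == "__main__":
    # Heartbeats arrive more often than the timer delay: the timer never fires.
    print(simulate(5, 10, 100, heartbeat_cancels=True))
    # Leaving the pending timer alone lets it fire.
    print(simulate(5, 10, 100, heartbeat_cancels=False))
```

In this model, whenever the heartbeat interval is shorter than the timer delay, a cancelling heartbeat pushes the deadline out indefinitely, which mirrors the "locker is never re-read" symptom.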
Jason Dillaman [Fri, 20 Mar 2020 14:54:43 +0000 (10:54 -0400)]
rbd-mirror: snapshot sync request needs to check for interruption
If the sync request was locally canceled, we need to resume the paused
shut down logic instead of just notifying the image replayer state
machine of the change -- since it had already requested a shut down and
will not re-request it.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 19 Mar 2020 14:57:03 +0000 (10:57 -0400)]
librbd: request exclusive lock when moving to trash
Even if the image is in use, moving it to the trash does not
remove any data. This also solves a race between snapshot-based
mirroring shutting down and being able to move a mirrored image
to the trash.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 18 Mar 2020 19:01:32 +0000 (15:01 -0400)]
rbd-mirror: basic integration with sync throttling
Snapshot-based mirroring did not have any throttling to prevent
too many concurrent syncs from running. Since each sync might need
to iterate over every object of an image, that could potentially
put an extreme burden on the remote cluster.
A future PR will add a more intelligent throttle based on the actual
number of objects that need to be scanned.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
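The general idea of capping concurrent syncs can be sketched with a counting semaphore. This is a minimal Python illustration under stated assumptions, not the rbd-mirror throttle (which is C++): the class name SyncThrottle, its attributes, and the helper run_syncs are hypothetical.

```python
import threading


class SyncThrottle:
    """Hypothetical throttle that allows at most max_concurrent syncs at once."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.running = 0  # syncs currently holding a slot
        self.peak = 0     # highest concurrency observed so far

    def run(self, sync_fn):
        # Block until a slot is free, then run the sync while holding it.
        with self._slots:
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return sync_fn()
            finally:
                with self._lock:
                    self.running -= 1


def run_syncs(throttle, count, sync_fn=lambda: None):
    """Start count sync threads through the throttle and wait for all of them."""
    threads = [
        threading.Thread(target=throttle.run, args=(sync_fn,))
        for _ in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A fixed slot count like this treats every sync as equally expensive, which is the limitation the commit message says a future object-count-based throttle is meant to address.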
Sage Weil [Thu, 19 Mar 2020 17:21:52 +0000 (12:21 -0500)]
Merge PR #34030 into octopus
* refs/pull/34030/head:
cephadm: env over last used
cephadm: fall back to default for infer_image
cephadm: remove outdated check
cephadm: consolidate default image logic
cephadm: only infer image for shell, run, inspect-image, pull, ceph-volume
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Sage Weil [Thu, 19 Mar 2020 13:22:40 +0000 (08:22 -0500)]
Merge PR #34027 into octopus
* refs/pull/34027/head:
qa/workunits/cephadm/test_cephadm: mark services unmanaged for test
mgr/cephadm: do not reconfig unmanaged services
qa/workunits/cephadm/test_cephadm: output file for pub key
Sage Weil [Thu, 19 Mar 2020 00:04:14 +0000 (19:04 -0500)]
mgr/progress: fix duration strings
- simplify the code to just calculate the durations when we need them
(I'm not sure why we had those temporary strings!)
- use a nicer time delta format
Fixes: https://tracker.ceph.com/issues/44672
Signed-off-by: Sage Weil <sage@redhat.com>
Jason Dillaman [Wed, 18 Mar 2020 16:54:16 +0000 (12:54 -0400)]
qa/workunits/rbd: use context managers to control Rados lifespan
When the expected exceptions are thrown, Python's shutdown can race
with librados background threads. Ensure that librados is properly
shut down prior to exiting Python.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
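The pattern can be sketched as follows. FakeCluster is a hypothetical stand-in used so the example runs without a cluster; it is not the librados binding, and the test helper is likewise invented for illustration. The point is only that a `with` block runs the shutdown step even when the body raises the expected exception.

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster handle with explicit shutdown."""

    def __init__(self):
        self.connected = False
        self.shut_down = False

    def __enter__(self):
        self.connected = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always runs when the with-block exits, including on exceptions.
        self.connected = False
        self.shut_down = True
        return False  # do not swallow the exception


def run_expecting_failure(cluster):
    """Run a body that raises; report whether shutdown happened before handling."""
    try:
        with cluster:
            raise RuntimeError("expected failure")
    except RuntimeError:
        # By the time the exception reaches us, __exit__ has already run.
        return cluster.shut_down
```

Because the shutdown happens as the `with` block unwinds, it completes before the interpreter begins exiting, rather than depending on object finalization order.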
Sage Weil [Wed, 18 Mar 2020 14:45:16 +0000 (09:45 -0500)]
Merge PR #33981 into octopus
* refs/pull/33981/head:
doc/install: edits
doc/cephadm: more edits
doc/cephadm/install: edits
doc/cephadm/adoption: improvements
doc/cephadm/install: a few edits
doc/cephadm/install: do not install ceph-common on host (by default)
doc/cephadm: drop os recs link
doc/cephadm/upgrade: improvements
doc/cephadm/upgrade: document upgrade
doc/cephadm/install: revamp install docs
doc: reorganize cephadm docs
doc/cephadm/administration: update docs on customizing SSH config
doc/cephadm/administration: add a note about the 'removed' dir
Kiefer Chang [Wed, 18 Mar 2020 07:21:35 +0000 (15:21 +0800)]
mgr/dashboard: fix adding/removing host errors
Send a HostSpec instance to the Orchestrator when adding a host.
Also, to be consistent with other components:
- Reword from Add/Remove hosts to Create/Delete hosts
- Display a modal when there is no Orchestrator backend enabled
Venky Shankar [Tue, 14 Jan 2020 09:13:16 +0000 (04:13 -0500)]
mgr/volumes: introduce 'canceled' state in clone op state machine
When fetching the next execution state, -EINTR jumps to the 'canceled'
state, signifying a canceled (interrupted) operation. Also include
a helper routine to check if a given state machine is in its initial
state.
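A table-driven state machine with such a transition might look like the sketch below. The state names other than 'canceled', the transition table, and the function names are assumptions for illustration and are not taken from the mgr/volumes code.

```python
import errno

# Hypothetical states and transitions; only 'canceled' is named in the commit.
INITIAL_STATE = "pending"

TRANSITIONS = {
    ("pending", 0): "in-progress",
    ("in-progress", 0): "complete",
    ("in-progress", -1): "failed",
    # An interrupted step moves the clone to 'canceled' from any active state.
    ("pending", -errno.EINTR): "canceled",
    ("in-progress", -errno.EINTR): "canceled",
}


def next_state(current, result):
    """Look up the state that follows `current` for the given result code."""
    try:
        return TRANSITIONS[(current, result)]
    except KeyError:
        raise ValueError(
            "no transition from {!r} on result {}".format(current, result)
        )


def is_initial(state):
    """Helper: True if the state machine has not advanced yet."""
    return state == INITIAL_STATE
```

Keying the table on (state, result) keeps the cancel path a data change rather than a new branch in the driver loop.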
Sage Weil [Sun, 15 Mar 2020 13:45:46 +0000 (08:45 -0500)]
doc: reorganize cephadm docs
- reorganized cephadm into a top-level item with a series of sub-items.
- condensed the 'install' page so that it doesn't create a zillion items
in the toctree on the left
- started updating the cephadm/install sequence (incomplete)
Sage Weil [Tue, 17 Mar 2020 20:03:32 +0000 (15:03 -0500)]
mgr/balancer: tolerate pgs outside of target weight map
We build a target weight map based on the primary crush weights, and
ignore weights that are 0. However, it's possible that existing PGs are
on other OSDs that have weight 0 because the weight-set weight is >0.
That leads to a KeyError exception when we
    pgs_by_osd[osd] += 1
and the key isn't present. Fix by simply populating those keys as we
encounter OSDs. Drop the old initialization loop. The net of this is
we may have OSDs outside of target_by_root (won't matter, as far as I can
tell) and we won't have keys for osds with weight 0 (also won't matter,
as far as I can tell).
Fixes: https://tracker.ceph.com/issues/42721
Signed-off-by: Sage Weil <sage@redhat.com>
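The failure mode and the fix can be sketched in a few lines. This is a simplified illustration, not the balancer module's code: the function names and the input shape (a mapping from PG id to a list of OSD ids) are assumptions.

```python
def count_pgs_prefilled(pg_to_osds, target_weights):
    """Old approach (sketch): pre-populate keys only for OSDs with a target weight."""
    pgs_by_osd = {osd: 0 for osd, weight in target_weights.items() if weight > 0}
    for osds in pg_to_osds.values():
        for osd in osds:
            pgs_by_osd[osd] += 1  # KeyError if osd was not pre-populated
    return pgs_by_osd


def count_pgs_lazy(pg_to_osds):
    """Fixed approach (sketch): create each key the first time the OSD is seen."""
    pgs_by_osd = {}
    for osds in pg_to_osds.values():
        for osd in osds:
            pgs_by_osd[osd] = pgs_by_osd.get(osd, 0) + 1
    return pgs_by_osd


if __name__ == "__main__":
    pgs = {"1.0": [0, 1], "1.1": [0, 2]}
    weights = {0: 1.0, 1: 1.0, 2: 0.0}  # osd.2 has primary weight 0 but holds a PG
    print(count_pgs_lazy(pgs))
    try:
        count_pgs_prefilled(pgs, weights)
    except KeyError as e:
        print("KeyError for osd", e)
```

As the commit message notes, the lazy variant has no entry for an OSD that holds no PGs, so any later lookup should tolerate a missing key.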
Sage Weil [Sat, 14 Mar 2020 21:35:07 +0000 (16:35 -0500)]
update default container images
- For tests, use bleeding-edge octopus branch
- For production defaults, use ceph/ceph:v15.2 tag
- For bootstrap, grab cephadm script from latest octopus branch
Sage Weil [Mon, 16 Mar 2020 22:36:43 +0000 (17:36 -0500)]
Merge PR #33952 into octopus
* refs/pull/33952/head:
qa/workunits/cephadm: --skip-mon-network when using 127.0.0.1
cephadm: add tests
qa/tasks/cephadm: pass -v to bootstrap
mgr/cephadm: only try to place mons on hosts matching public_network
mgr/cephadm: keep track of host networks, ips
cephadm: automatically infer mon public_network, if we can
cephadm: add list-networks command
Sage Weil [Mon, 16 Mar 2020 22:36:17 +0000 (17:36 -0500)]
Merge PR #33955 into octopus
* refs/pull/33955/head:
mgr/cephadm: respect 'unmanaged' flag in spec
mgr/orch: orch ls: show <no spec> or <unmanaged> as appropriate
mgr/orch: orch ls: rename SPEC -> PLACEMENT
mgr/orch: add 'unmanaged' property to ServiceSpec
mgr/orch: combine 'orch daemon add <type> ...' into one command
mgr/orch: combine 'orch apply <type> [<placement>]' into one command