git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
5 years agotest/rbd: fix test_rbd.test_features_to_string 33831/head
Jason Dillaman [Wed, 11 Mar 2020 12:00:16 +0000 (08:00 -0400)]
test/rbd: fix test_rbd.test_features_to_string

The new RBD_FEATURE_NON_PRIMARY overlaps with the legacy test
for an invalid feature bit.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: peer removal needs to open non-primary images in R/W mode
Jason Dillaman [Tue, 10 Mar 2020 23:18:04 +0000 (19:18 -0400)]
librbd: peer removal needs to open non-primary images in R/W mode

The non-primary image might have mirror snapshots that need to be
updated to remove the peer reference.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: race condition in image watcher notification callback
Jason Dillaman [Tue, 10 Mar 2020 17:31:34 +0000 (13:31 -0400)]
librbd: race condition in image watcher notification callback

If a refresh is in-progress when a header update notification is
received, the notification was previously incorrectly dropped.
This prevented rbd-mirror's snapshot-based mirroring replayer from
detecting updates in some cases.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: prevent 'non-primary' feature from being set via API
Jason Dillaman [Tue, 10 Mar 2020 03:41:26 +0000 (23:41 -0400)]
librbd: prevent 'non-primary' feature from being set via API

This feature is mutable only from within librbd as a mirrored image
is promoted/demoted.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agoqa/suites/rbd: exercise different snapshot-based mirroring image features
Jason Dillaman [Tue, 10 Mar 2020 03:01:47 +0000 (23:01 -0400)]
qa/suites/rbd: exercise different snapshot-based mirroring image features

Ensure that snapshot-based mirroring is tested in different RBD image
feature combinations.

Fixes: https://tracker.ceph.com/issues/44396
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agorbd-mirror: remove exclusive-lock requirement from snapshot purge
Jason Dillaman [Tue, 10 Mar 2020 02:47:49 +0000 (22:47 -0400)]
rbd-mirror: remove exclusive-lock requirement from snapshot purge

When using snapshot-based mirroring, there shouldn't be any need to
force the use of the exclusive-lock feature.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: enable mirroring for non-primary images w/o journaling
Jason Dillaman [Tue, 10 Mar 2020 02:26:13 +0000 (22:26 -0400)]
librbd: enable mirroring for non-primary images w/o journaling

The state machine was incorrectly skipping over the mirror enable
step for non-primary images when the journaling feature bit was not
enabled.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: properly copy primary mirror uuid to out param
Jason Dillaman [Tue, 10 Mar 2020 02:08:01 +0000 (22:08 -0400)]
librbd: properly copy primary mirror uuid to out param

This variable is not currently used in snapshot-based mirroring
mode but it should be populated for consistency.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agotest/rbd-mirror: snapshot-based replaying with different features
Jason Dillaman [Tue, 10 Mar 2020 01:38:58 +0000 (21:38 -0400)]
test/rbd-mirror: snapshot-based replaying with different features

The exclusive-lock and journaling features are not required for
snapshot-based mirroring.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: re-use mirror promote state machine when disabling
Jason Dillaman [Mon, 9 Mar 2020 22:32:03 +0000 (18:32 -0400)]
librbd: re-use mirror promote state machine when disabling

The promote state machine will handle removing the non-primary
feature bit and will ensure an interrupted disable operation
doesn't leave things in an inconsistent state.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: enable/disable implicit non-primary feature bit
Jason Dillaman [Mon, 9 Mar 2020 21:04:27 +0000 (17:04 -0400)]
librbd: enable/disable implicit non-primary feature bit

When promoted to primary, disable the non-primary feature bit and
when demoted (or created non-primary), enable the non-primary feature
bit. This will prevent all non rbd-mirror RBD clients from modifying
the RBD image.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agorbd-mirror: permit R/W operations against non-primary image
Jason Dillaman [Mon, 9 Mar 2020 20:49:07 +0000 (16:49 -0400)]
rbd-mirror: permit R/W operations against non-primary image

When the non-primary feature bit is enabled, mask-out the read-only
feature bit that will be set in the refresh image state machine if
the image has that feature bit set. This will ensure that only the
rbd-mirror daemon will be able to modify a non-primary image.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agolibrbd: track reason why ImageCtx is read-only
Jason Dillaman [Tue, 3 Mar 2020 20:17:52 +0000 (15:17 -0500)]
librbd: track reason why ImageCtx is read-only

This will be utilized by the RefreshRequest state machine to flag the image
as read-only if the new RBD_FEATURE_NON_PRIMARY feature is enabled. Also
allow that flag to be masked out by rbd-mirror daemon to permit IO and
operations against a non-primary image.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
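
The read-only tracking described above can be pictured as a set of reason flags plus a mask. The following is only a minimal Python sketch of that idea, not the librbd implementation (which is C++); the flag names, values, and the `effective_read_only` helper are all hypothetical.

```python
# Hypothetical reason bits; the real librbd names and values may differ.
READ_ONLY_FLAG_USER = 1 << 0         # image was explicitly opened read-only
READ_ONLY_FLAG_NON_PRIMARY = 1 << 1  # image carries the non-primary feature


def effective_read_only(flags, mask=0):
    """Return True if any read-only reason remains after applying the mask.

    `flags` holds every reason the image is read-only; `mask` holds the
    reasons a privileged client (such as rbd-mirror) chooses to ignore.
    """
    return (flags & ~mask) != 0


# A regular client sees a non-primary image as read-only.
print(effective_read_only(READ_ONLY_FLAG_NON_PRIMARY))
# A client that masks out the non-primary reason may write to it.
print(effective_read_only(READ_ONLY_FLAG_NON_PRIMARY,
                          mask=READ_ONLY_FLAG_NON_PRIMARY))
```

Keeping the reasons separate means that masking one of them does not override an image the user explicitly opened read-only.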
5 years agolibrbd: new RBD_FEATURE_NON_PRIMARY to prevent R/W IO
Jason Dillaman [Tue, 3 Mar 2020 20:01:35 +0000 (15:01 -0500)]
librbd: new RBD_FEATURE_NON_PRIMARY to prevent R/W IO

When a snapshot-based image is non-primary, we will need to use
this implicit feature to ensure that writes and maintenance
operations cannot be performed against the image.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
5 years agoMerge PR #33771 into octopus
Sage Weil [Tue, 10 Mar 2020 22:20:48 +0000 (17:20 -0500)]
Merge PR #33771 into octopus

* refs/pull/33771/head:
common/ceph_timer: Pass reference to waited time on stack
common/ceph_timer: Add test
common/ceph_timer: Use unique_function, allowing noncopyable events
common/ceph_timer: Couple cleanups
common/ceph_timer: Fix namespaces
common/ceph_timer: Add missing includes
common/ceph_timer.h: Don't indent contents of a namespace

Reviewed-by: Sage Weil <sage@redhat.com>
5 years agoMerge PR #33850 into octopus
Sage Weil [Tue, 10 Mar 2020 22:20:11 +0000 (17:20 -0500)]
Merge PR #33850 into octopus

* refs/pull/33850/head:
spec: Podman (temporarily) requires apparmor-abstractions on suse

Reviewed-by: Sascha Grunert <sgrunert@suse.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
5 years agoMerge PR #33853 into octopus
Sage Weil [Tue, 10 Mar 2020 22:19:20 +0000 (17:19 -0500)]
Merge PR #33853 into octopus

* refs/pull/33853/head:
mgr/cephadm: Make sure we don't co-locate the same daemon

Reviewed-by: Sage Weil <sage@redhat.com>
5 years agoMerge PR #33857 into octopus
Sage Weil [Tue, 10 Mar 2020 22:18:47 +0000 (17:18 -0500)]
Merge PR #33857 into octopus

* refs/pull/33857/head:
cephadm: bootstrap: wait for mgr to restart after enabling a module
mgr: add 'mgr_status' tell command

Reviewed-by: Michael Fritch <mfritch@suse.com>
5 years agoMerge PR #32990 into octopus
Sage Weil [Tue, 10 Mar 2020 21:28:02 +0000 (16:28 -0500)]
Merge PR #32990 into octopus

* refs/pull/32990/head:
cephadm: flag dashboard user to change password

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
5 years agoMerge PR #33713 into octopus
Sage Weil [Tue, 10 Mar 2020 21:27:48 +0000 (16:27 -0500)]
Merge PR #33713 into octopus

* refs/pull/33713/head:
mgr/cephadm: add _remove_osds_bg back to main loop
mgr/cephadm/osd: update removal report immediately

Reviewed-by: Joshua Schmid <jschmid@suse.de>
5 years agoMerge PR #33838 into octopus
Sage Weil [Tue, 10 Mar 2020 21:26:57 +0000 (16:26 -0500)]
Merge PR #33838 into octopus

* refs/pull/33838/head:
mgr/cephadm: fix service list filtering

Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
5 years agoMerge PR #33832 into octopus
Sage Weil [Tue, 10 Mar 2020 16:07:36 +0000 (11:07 -0500)]
Merge PR #33832 into octopus

* refs/pull/33832/head:
Revert "Merge pull request #33673 from cbodley/wip-denc-enum"

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
5 years agocephadm: bootstrap: wait for mgr to restart after enabling a module 33857/head
Sage Weil [Tue, 10 Mar 2020 14:28:57 +0000 (09:28 -0500)]
cephadm: bootstrap: wait for mgr to restart after enabling a module

It was possible to enable a module (mon updates mgrmap) and then
do a mgr command and have that command reach the mgr before it got the
latest mgrmap and restarted.

Fixes: https://tracker.ceph.com/issues/44531
Signed-off-by: Sage Weil <sage@redhat.com>
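
One way to avoid the race described above is to poll until the mgr reports a new enough mgrmap epoch before sending it commands. The sketch below assumes a caller-supplied `get_epoch` callable (hypothetical; it stands in for whatever query returns the mgr's current epoch) and is not the actual cephadm bootstrap code.

```python
import time


def wait_for_epoch(get_epoch, target, timeout=30.0, interval=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_epoch() until it reaches target or the timeout expires.

    Returns True if the target epoch was observed, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if get_epoch() >= target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would record the mgrmap epoch after enabling the module and pass it as `target`, continuing only once the function returns True.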
5 years agomgr: add 'mgr_status' tell command
Sage Weil [Tue, 10 Mar 2020 14:26:22 +0000 (09:26 -0500)]
mgr: add 'mgr_status' tell command

For now just dump the mgrmap_epoch

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoMerge pull request #33839 from rhcs-dashboard/44538-fix-rgw-grafana-get-put-latencies
Lenz Grimmer [Tue, 10 Mar 2020 14:29:58 +0000 (15:29 +0100)]
Merge pull request #33839 from rhcs-dashboard/44538-fix-rgw-grafana-get-put-latencies

monitoring: fix RGW grafana chart 'Average GET/PUT Latencies'

Reviewed-by: Jan Fajerski <jfajerski@suse.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Patrick Seidensal <pnawracay@suse.com>
5 years agoMerge pull request #33743 from votdev/issue_43869_fix_qa_test
Lenz Grimmer [Tue, 10 Mar 2020 14:25:50 +0000 (15:25 +0100)]
Merge pull request #33743 from votdev/issue_43869_fix_qa_test

mgr/dashboard: Refactor and cleanup tasks.mgr.dashboard.test_user

Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
5 years agoMerge pull request #33805 from tchaikov/wip-44500
Kefu Chai [Tue, 10 Mar 2020 13:26:29 +0000 (21:26 +0800)]
Merge pull request #33805 from tchaikov/wip-44500

qa/tasks/ceph_manager: capture stderr for COT

Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
5 years agospec: Podman (temporarily) requires apparmor-abstractions on suse 33850/head
Sebastian Wagner [Tue, 10 Mar 2020 11:51:25 +0000 (12:51 +0100)]
spec: Podman (temporarily) requires apparmor-abstractions on suse

`apparmor-abstractions` contains a profile that is required to run podman containers.

Fixes: https://tracker.ceph.com/issues/44272
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
5 years agomgr/cephadm: Make sure we don't co-locate the same daemon 33853/head
Sebastian Wagner [Tue, 10 Mar 2020 12:54:08 +0000 (13:54 +0100)]
mgr/cephadm: Make sure we don't co-locate the same daemon

Fixes: https://tracker.ceph.com/issues/44397
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
5 years agomonitoring: fix RGW grafana chart 'Average GET/PUT Latencies' 33839/head
Alfonso Martínez [Tue, 10 Mar 2020 11:05:26 +0000 (12:05 +0100)]
monitoring: fix RGW grafana chart 'Average GET/PUT Latencies'

Fixes: https://tracker.ceph.com/issues/44538
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
5 years agomgr/cephadm: fix service list filtering 33838/head
Kiefer Chang [Tue, 10 Mar 2020 07:22:40 +0000 (15:22 +0800)]
mgr/cephadm: fix service list filtering

We should apply filters to the ServiceSpecs in the store too; otherwise services are
returned even when filters are applied while collecting daemons.

Fixes: https://tracker.ceph.com/issues/44512
Signed-off-by: Kiefer Chang <kiefer.chang@suse.com>
5 years agoMerge PR #33825 into octopus
Sage Weil [Tue, 10 Mar 2020 03:42:05 +0000 (22:42 -0500)]
Merge PR #33825 into octopus

* refs/pull/33825/head:
cephadm: bootstrap: tolerate error return from -h
ceph.in: only shut down rados on clean exit

Reviewed-by: Kefu Chai <kchai@redhat.com>
5 years agoMerge PR #33811 into octopus
Sage Weil [Tue, 10 Mar 2020 03:26:41 +0000 (22:26 -0500)]
Merge PR #33811 into octopus

* refs/pull/33811/head:
mgr/cephadm: fix upgrade order

Reviewed-by: Michael Fritch <mfritch@suse.com>
5 years agoRevert "Merge pull request #33673 from cbodley/wip-denc-enum" 33832/head
Sage Weil [Tue, 10 Mar 2020 03:23:47 +0000 (22:23 -0500)]
Revert "Merge pull request #33673 from cbodley/wip-denc-enum"

This reverts commit 1041092696c2e3c9ee4e32b764ead06f5bc1f694, reversing
changes made to c2f584f32a9619b39d53178c2327dc26b3a2a27c.

This changed the encoding for certain types.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: fix upgrade order 33811/head
Sage Weil [Mon, 9 Mar 2020 01:38:59 +0000 (20:38 -0500)]
mgr/cephadm: fix upgrade order

Create two variables, CEPH_TYPES and CEPH_UPGRADE_ORDER.  In reality they
are both the same, but this way the meaning is clear, and the lists
won't get out of sync (they should always have the same elements).

Signed-off-by: Sage Weil <sage@redhat.com>
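
A minimal sketch of how two such variables can be kept from drifting apart: derive one from the other. The list contents and the `sort_for_upgrade` helper below are assumptions for illustration and may not match what mgr/cephadm defines.

```python
# Hypothetical daemon types, listed in the order they should be upgraded.
CEPH_UPGRADE_ORDER = ['mgr', 'mon', 'crash', 'osd', 'mds', 'rgw', 'rbd-mirror']

# Same elements, different meaning: "is this a Ceph daemon type at all?"
# Deriving it from the ordered list means the two cannot get out of sync.
CEPH_TYPES = set(CEPH_UPGRADE_ORDER)


def sort_for_upgrade(daemon_types):
    """Keep only Ceph daemon types and return them in upgrade order."""
    ceph_only = [t for t in daemon_types if t in CEPH_TYPES]
    return sorted(ceph_only, key=CEPH_UPGRADE_ORDER.index)


print(sort_for_upgrade(['osd', 'mon', 'grafana', 'mgr']))
```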
5 years agoMerge PR #33801 into octopus
Sage Weil [Mon, 9 Mar 2020 21:25:57 +0000 (16:25 -0500)]
Merge PR #33801 into octopus

* refs/pull/33801/head:
qa/suites/rados/ceph: drop opensuse for now

Reviewed-by: Nathan Cutler <ncutler@suse.com>
5 years agoMerge PR #33822 into octopus
Sage Weil [Mon, 9 Mar 2020 20:46:57 +0000 (15:46 -0500)]
Merge PR #33822 into octopus

* refs/pull/33822/head:
cephadm: use `sh` instead of `bash` during enter

Reviewed-by: Sage Weil <sage@redhat.com>
5 years agocephadm: bootstrap: tolerate error return from -h 33825/head
Sage Weil [Mon, 9 Mar 2020 20:45:36 +0000 (15:45 -0500)]
cephadm: bootstrap: tolerate error return from -h

Sometimes we time out connecting to the mon to get commands and return
an error code.

See https://tracker.ceph.com/issues/44526

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoMerge PR #33809 into octopus
Sage Weil [Mon, 9 Mar 2020 20:28:19 +0000 (15:28 -0500)]
Merge PR #33809 into octopus

* refs/pull/33809/head:
qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds
qa/standalone/scrub/osd-scrub-test: wait longer for update

Reviewed-by: David Zafman <dzafman@redhat.com>
5 years agoMerge PR #32678 into octopus
Sage Weil [Mon, 9 Mar 2020 19:09:00 +0000 (14:09 -0500)]
Merge PR #32678 into octopus

* refs/pull/32678/head:
mgr/dashboard: support multiple DriveGroups when creating OSDs

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
5 years agocephadm: use `sh` instead of `bash` during enter 33822/head
Michael Fritch [Mon, 9 Mar 2020 16:00:07 +0000 (10:00 -0600)]
cephadm: use `sh` instead of `bash` during enter

not all container images use bash (e.g. node-exporter etc)

Signed-off-by: Michael Fritch <mfritch@suse.com>
5 years agoceph.in: only shut down rados on clean exit
Sage Weil [Mon, 9 Mar 2020 17:26:06 +0000 (12:26 -0500)]
ceph.in: only shut down rados on clean exit

If we exit due to a timeout, then calling rados shutdown can lead to all
sorts of problems, because we may still have another thread that is
trying to call rados_connect and/or do some work, and rados_connect
and rados_shutdown don't (and can't!) really behave well when racing
against each other.

Note that shutdown here isn't that important--the process is about to
exit anyway.  It's only useful to exercise the shutdown code path more
often.

Fixes: https://tracker.ceph.com/issues/44526
Signed-off-by: Sage Weil <sage@redhat.com>
5 years agocommon/ceph_timer: Pass reference to waited time on stack 33771/head
Adam C. Emerson [Fri, 6 Mar 2020 03:14:47 +0000 (22:14 -0500)]
common/ceph_timer: Pass reference to waited time on stack

std::condition_variable::wait_until takes a const reference to a
time_point. It may access this reference after relinquishing the
mutex, creating a potential use-after-free error if the first event is
shut down.

So, just copy the time onto the stack, so we have a reference that
won't disappear.

https://tracker.ceph.com/issues/44373

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
5 years agocommon/ceph_timer: Add test
Adam C. Emerson [Fri, 6 Mar 2020 02:45:11 +0000 (21:45 -0500)]
common/ceph_timer: Add test

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
5 years agocommon/ceph_timer: Use unique_function, allowing noncopyable events
Adam C. Emerson [Fri, 6 Mar 2020 02:15:25 +0000 (21:15 -0500)]
common/ceph_timer: Use unique_function, allowing noncopyable events

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
5 years agocommon/ceph_timer: Couple cleanups
Adam C. Emerson [Thu, 5 Mar 2020 22:57:41 +0000 (17:57 -0500)]
common/ceph_timer: Couple cleanups

Take advantage of a couple things in Boost.Intrusive that make the
code less obfuscated.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
5 years agocommon/ceph_timer: Fix namespaces
Adam C. Emerson [Thu, 5 Mar 2020 21:16:15 +0000 (16:16 -0500)]
common/ceph_timer: Fix namespaces

Stuffing things into 'detail' and using them just makes backtraces and
valgrind illegible.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
5 years agocommon/ceph_timer: Add missing includes
Adam C. Emerson [Thu, 5 Mar 2020 21:08:12 +0000 (16:08 -0500)]
common/ceph_timer: Add missing includes

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
5 years agocommon/ceph_timer.h: Don't indent contents of a namespace
Adam C. Emerson [Thu, 5 Mar 2020 21:03:37 +0000 (16:03 -0500)]
common/ceph_timer.h: Don't indent contents of a namespace

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
5 years agoMerge PR #33793 into master
Sage Weil [Mon, 9 Mar 2020 13:28:57 +0000 (08:28 -0500)]
Merge PR #33793 into master

* refs/pull/33793/head:
qa/suites/rados/cephadm/upgrade: new start point
qa/tasks/cephadm: put bootstrap config etc directly in /etc/ceph
cephadm: shell: default to config and keyring in /etc/ceph, if present

Reviewed-by: Ricardo Marques <rimarques@suse.com>
5 years agoMerge PR #33808 into master
Sage Weil [Mon, 9 Mar 2020 13:28:37 +0000 (08:28 -0500)]
Merge PR #33808 into master

* refs/pull/33808/head:
mgr/cephadm: apply: fill in default placement if none is provided
mgr/cephadm: make placement truly optional (default to count=1)
mgr/cephadm: allow count == 0
mgr/cephadm: remove magic labels

Reviewed-by: Sebastian Wagner <swagner@suse.com>
5 years agoMerge pull request #33756 from tspmelo/wip-remove-ngx-store
Lenz Grimmer [Mon, 9 Mar 2020 11:30:50 +0000 (12:30 +0100)]
Merge pull request #33756 from tspmelo/wip-remove-ngx-store

mgr/dashboard: Remove ngx-store

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
5 years agoMerge pull request #33691 from rhcs-dashboard/vstart-enable-nfs-ganesa-mgmt-dashboard
Kefu Chai [Mon, 9 Mar 2020 11:25:05 +0000 (19:25 +0800)]
Merge pull request #33691 from rhcs-dashboard/vstart-enable-nfs-ganesa-mgmt-dashboard

vstart.sh: enable nfs-ganesha mgmt. in dashboard.

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Varsha Rao <varao@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
5 years agoMerge pull request #33797 from tchaikov/wip-crimson-cleanups
Kefu Chai [Mon, 9 Mar 2020 10:22:29 +0000 (18:22 +0800)]
Merge pull request #33797 from tchaikov/wip-crimson-cleanups

crimson: cleanups

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
5 years agomgr/cephadm: add _remove_osds_bg back to main loop 33713/head
Kiefer Chang [Mon, 9 Mar 2020 06:30:48 +0000 (14:30 +0800)]
mgr/cephadm: add _remove_osds_bg back to main loop

The call was accidentally removed in
https://github.com/ceph/ceph/pull/33602.

Signed-off-by: Kiefer Chang <kiefer.chang@suse.com>
5 years agomgr/cephadm/osd: update removal report immediately
Kiefer Chang [Tue, 3 Mar 2020 08:58:40 +0000 (16:58 +0800)]
mgr/cephadm/osd: update removal report immediately

Currently, the removal report is updated after entering the main serve()
loop. This tiny window makes clients (like Dashboard) fail to poll the
result. Refresh the report data immediately after scheduling OSD for
removal.

The report structure is changed from Dict to Set because:
- The key is an `OSDRemoval` instance, which makes it hard to use.
- More consistent with orchestrator interfaces: most calls return a list.

Signed-off-by: Kiefer Chang <kiefer.chang@suse.com>
5 years agoMerge pull request #32440 from rosinL/wip-spdk
Kefu Chai [Mon, 9 Mar 2020 03:54:58 +0000 (11:54 +0800)]
Merge pull request #32440 from rosinL/wip-spdk

os/bluestore/spdk: Fix the overflow error of parsing spdk coremask

Reviewed-by: Kefu Chai <kchai@redhat.com>
5 years agocrimson/mgr: close() in background 33797/head
Kefu Chai [Mon, 9 Mar 2020 03:48:07 +0000 (11:48 +0800)]
crimson/mgr: close() in background

as per Yingxin,

application code is not required to wait for the `close()` future, it
would be safe to ignore it, because:
- `close()` will shutdown its socket synchronously;
- `close()` will create an internal `ConnectionRef` when it's closing;
- `Messenger` will wait for all connections closed during `shutdown()`;

Signed-off-by: Kefu Chai <kchai@redhat.com>
5 years agocommon/buffer.cc: silence ASan warning
Kefu Chai [Sat, 7 Mar 2020 11:36:52 +0000 (19:36 +0800)]
common/buffer.cc: silence ASan warning

silences following warning
```
../src/common/buffer.cc:472:9: runtime error: member access within null pointer of type 'struct raw'
```

Signed-off-by: Kefu Chai <kchai@redhat.com>
5 years agocrimson/osd: reorder includes
Kefu Chai [Sat, 7 Mar 2020 10:55:22 +0000 (18:55 +0800)]
crimson/osd: reorder includes

to follow
https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes

Signed-off-by: Kefu Chai <kchai@redhat.com>
5 years agocrimson/mgr: use periodical timer for report
Kefu Chai [Sat, 7 Mar 2020 10:53:07 +0000 (18:53 +0800)]
crimson/mgr: use periodical timer for report

* always rearm the timer when handling MMgrConfigure
* remove `mgr::Client::tick_period`

Signed-off-by: Kefu Chai <kchai@redhat.com>
5 years agoMerge PR #33776 into master
Sage Weil [Mon, 9 Mar 2020 03:05:52 +0000 (22:05 -0500)]
Merge PR #33776 into master

* refs/pull/33776/head:
test: Add flush_pg_stats to avoid race with getting num_shards_repaired

Reviewed-by: Neha Ojha <nojha@redhat.com>
5 years agoqa/tasks/ceph_manager: use StringIO for capturing COT output 33805/head
Kefu Chai [Sun, 8 Mar 2020 06:00:53 +0000 (14:00 +0800)]
qa/tasks/ceph_manager: use StringIO for capturing COT output

there are a couple of factors we should consider when choosing between
BytesIO and StringIO:

- if the producer is producing binary
- if we are expecting binary
- if the layers in between them are doing the decoding/encoding
  automatically.

in our case, the producer is either the ChannelFile instances returned
by paramiko.SSHClient or subprocess.CompletedProcess instances returned
by subprocess.run(). the former are file-like objects opened in "r" mode,
but their contents are decoded with utf-8 when reading if
ChannelFile.FLAG_BINARY is not specified. that's why we always try to
add this flag in orchestra/run.py when collecting the stdout and stderr
from paramiko.SSHClient after executing a command.

back in python2, this worked just fine, as we didn't differentiate bytes
from str back then.

but in python3, we have to make a decision. in the case of
ceph-objectstore-tool (COT for short), it does not produce binary and
we don't check its output with binary, so, if neither Remote.run() nor
LocalRemote.run() decodes/encodes for us, it's fine.

so it boils down to `copy_to_log()`:

i think we should respect the consumer's expectation, and only decode
the output if a StringIO is passed in as stdout or stderr.

as we always log the output with logging we could either set
`ChannelFile.FLAG_BINARY` depending on the type of `capture` or not.
if it's not set, paramiko will return str (bytes) on python2, and str on
python3. if it's set, paramiko will return str (bytes) on python2,
and bytes on python3.

if there is non-ASCII in the output, logging will fail with
`UnicodeDecodeError` exception. and paramiko throws the same exception
when trying to decode for us if `ChannelFile.FLAG_BINARY` is not
specified.

so to ensure that we always have logging messages no matter if the
producer follows the rule of "use StringIO if you only emit text" or
not, we have to use `ChannelFile.FLAG_BINARY`, and force paramiko
to send us the bytes. but we still have the luxury to use StringIO
and do the decode when the caller asks for str explicitly. that'd save
the pain of using `str.decode()` or `six.ensure_str()` everywhere
even if we can assure that the program does not write binary.

Signed-off-by: Kefu Chai <kchai@redhat.com>
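
The policy argued for above (always read bytes from the producer, and decode only when the consumer passed a text sink) can be sketched as follows. This is an illustration, not the actual `copy_to_log()` from orchestra/run.py; the `copy_output` name and the choice of `errors='replace'` are assumptions.

```python
import codecs
import io


def copy_output(chunks, capture):
    """Copy raw byte chunks into capture, decoding only for text sinks.

    If `capture` is a StringIO the caller asked for str, so the bytes are
    decoded as UTF-8. Any other sink (e.g. BytesIO) receives the bytes
    untouched.
    """
    if isinstance(capture, io.StringIO):
        # An incremental decoder copes with multi-byte characters that
        # are split across chunk boundaries.
        decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')
        for chunk in chunks:
            capture.write(decoder.decode(chunk))
        capture.write(decoder.decode(b'', final=True))
    else:
        for chunk in chunks:
            capture.write(chunk)


text = io.StringIO()
copy_output([b'caf', b'\xc3', b'\xa9'], text)
raw = io.BytesIO()
copy_output([b'\xff\x00'], raw)
```

Because undecodable bytes are replaced rather than raising, a producer that unexpectedly emits binary does not abort the copy with `UnicodeDecodeError`.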
5 years agomgr/cephadm: apply: fill in default placement if none is provided 33808/head
Sage Weil [Mon, 9 Mar 2020 02:17:47 +0000 (21:17 -0500)]
mgr/cephadm: apply: fill in default placement if none is provided

Most stateless daemons get 2x (so there is a standby).  Monitoring items
get just 1x.

By default we do 5 monitors, which will gracefully degrade to one per host
if the cluster has <5 hosts.

Signed-off-by: Sage Weil <sage@redhat.com>
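
The defaulting rule above can be approximated with a small lookup plus a cap at the number of hosts. The table below is an assumption for illustration (which daemon types count as "stateless" or "monitoring" is guessed, and `effective_count` is a hypothetical helper); it also assumes at most one daemon of a given type per host.

```python
# Hypothetical defaults: 5 mons, 2 of each stateless daemon, 1 per monitoring item.
DEFAULT_COUNTS = {
    'mon': 5,
    'mgr': 2, 'mds': 2, 'rgw': 2,
    'prometheus': 1, 'grafana': 1, 'alertmanager': 1,
}


def effective_count(daemon_type, num_hosts):
    """Return how many daemons to place, never more than one per host."""
    wanted = DEFAULT_COUNTS.get(daemon_type, 1)
    return min(wanted, num_hosts)


print(effective_count('mon', 3))   # fewer than 5 hosts: one mon per host
print(effective_count('mon', 10))  # enough hosts: the default of 5
```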
5 years agoos/bluestore/spdk: Fix the overflow error of parsing spdk coremask 32440/head
Chunsong Feng [Thu, 19 Dec 2019 09:32:09 +0000 (17:32 +0800)]
os/bluestore/spdk: Fix the overflow error of parsing spdk coremask

coremask supports up to 256 bits in DPDK19.05, but the use of stoll in
NVMEManager::try_get limits the maximum use to 64 bits. Parse coremask by
hex character from low to high.

Fixes: https://tracker.ceph.com/issues/43044
Signed-off-by: Hu Ye <yehu5@huawei.com>
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
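
The digit-by-digit approach described above can be illustrated in Python. The real fix is in C++, where `stoll` overflows past 64 bits; Python integers do not overflow, so this sketch only shows the parsing order (lowest hex digit first), and `coremask_to_cores` is a hypothetical name.

```python
def coremask_to_cores(coremask):
    """Return the core IDs selected by a hex coremask such as '0x5'.

    The string is walked one hex character at a time, starting from the
    least significant digit, so the mask width is not limited to 64 bits.
    """
    digits = coremask.lower()
    if digits.startswith('0x'):
        digits = digits[2:]
    cores = []
    for pos, ch in enumerate(reversed(digits)):
        nibble = int(ch, 16)  # raises ValueError on a non-hex character
        for bit in range(4):
            if nibble & (1 << bit):
                cores.append(pos * 4 + bit)
    return cores


print(coremask_to_cores('0x5'))
# A mask with only bit 64 set, which would not fit in a 64-bit integer.
print(coremask_to_cores('0x1' + '0' * 16))
```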
5 years agoMerge PR #33804 into master
Sage Weil [Mon, 9 Mar 2020 00:57:06 +0000 (19:57 -0500)]
Merge PR #33804 into master

* refs/pull/33804/head:
cephadm: ls: warn if daemon type (version) is not supported
cephadm: report grafana version
cephadm: report prometheus, node-exporter, alertmanager versions
cephadm: use None (not '<no value>') for monitoring daemon version

Reviewed-by: Michael Fritch <mfritch@suse.com>
5 years agoMerge PR #33792 into master
Sage Weil [Sun, 8 Mar 2020 22:29:00 +0000 (17:29 -0500)]
Merge PR #33792 into master

* refs/pull/33792/head:
doc/cephadm: fix formatting for osd section
doc/cephadm: update 'adding mons' section to suggest/prefer 'apply'
doc/cephadm: fix formatting, typos
mgr/cephadm: implement apply_mon
mgr/cephadm: allow mon creation without explicit ip or addr
mgr/cephadm: allow _apply_service to delete mon daemon's data
mgr/cephadm: remove mon from monmap before removing mon
mgr/cephadm: do not remove mon if it breaks quorum

Reviewed-by: Michael Fritch <mfritch@suse.com>
5 years agocephadm: ls: warn if daemon type (version) is not supported 33804/head
Sage Weil [Sun, 8 Mar 2020 22:24:27 +0000 (17:24 -0500)]
cephadm: ls: warn if daemon type (version) is not supported

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agocephadm: report grafana version
Sage Weil [Sat, 7 Mar 2020 22:49:44 +0000 (16:49 -0600)]
cephadm: report grafana version

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoMerge PR #33802 into master
Sage Weil [Sun, 8 Mar 2020 21:49:38 +0000 (16:49 -0500)]
Merge PR #33802 into master

* refs/pull/33802/head:
mgr/cephadm: sanity check upgrade version
mgr/cephadm: only need to invalidate once here
mgr/cephadm: upgrade requires root mode for now

Reviewed-by: Michael Fritch <mfritch@suse.com>
5 years agoMerge PR #33800 into master
Sage Weil [Sun, 8 Mar 2020 21:38:28 +0000 (16:38 -0500)]
Merge PR #33800 into master

* refs/pull/33800/head:
mgr/cephadm: fix prom config generation when hosts have no labels or addrs

Reviewed-by: Michael Fritch <mfritch@suse.com>
5 years agoMerge PR #33795 into master
Sage Weil [Sun, 8 Mar 2020 21:38:15 +0000 (16:38 -0500)]
Merge PR #33795 into master

* refs/pull/33795/head:
mgr/orch: collapse 'SPEC' and 'PLACEMENT' columns in 'orch ls'

Reviewed-by: Michael Fritch <mfritch@suse.com>
5 years agoqa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds 33809/head
Sage Weil [Sun, 8 Mar 2020 19:52:10 +0000 (14:52 -0500)]
qa/standalone/scrub/osd-scrub-repair: force osdmap prop to osds

flush_pg_stats isn't sufficient to ensure that OSDs have the latest
OSDMap.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoqa/standalone/scrub/osd-scrub-test: wait longer for update
Sage Weil [Sun, 8 Mar 2020 19:45:00 +0000 (14:45 -0500)]
qa/standalone/scrub/osd-scrub-test: wait longer for update

Fixes: https://tracker.ceph.com/issues/43865
Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoMerge pull request #33788 from ajarr/wip-44438
Ramana Raja [Sun, 8 Mar 2020 17:36:50 +0000 (23:06 +0530)]
Merge pull request #33788 from ajarr/wip-44438

test_volumes: fix _verify_clone_attrs call

Reviewed-by: Venky Shankar <vshankar@redhat.com>
5 years agomgr/cephadm: make placement truly optional (default to count=1)
Sage Weil [Sun, 8 Mar 2020 17:05:47 +0000 (12:05 -0500)]
mgr/cephadm: make placement truly optional (default to count=1)

If no placement information is provided at all, assume 1 daemon over any
host.

This could perhaps be improved with a default placement that varies by
daemon type...

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: allow count == 0
Sage Weil [Sun, 8 Mar 2020 17:01:09 +0000 (12:01 -0500)]
mgr/cephadm: allow count == 0

Scale a service down to 0 without removing the spec.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: remove magic labels
Sage Weil [Sun, 8 Mar 2020 17:00:45 +0000 (12:00 -0500)]
mgr/cephadm: remove magic labels

Remove the magic label behavior.  It makes the code confusing, it
makes the overall behavior hard to explain, and it makes the PlacementSpec
meaning different than what Rook is doing.

Instead, if you want mons on hosts with label 'mon', then say 'label:mon'.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoMerge pull request #33686 from yuvalif/fix_data_corruption_in_cls_queue_head
Yuval Lifshitz [Sun, 8 Mar 2020 11:17:02 +0000 (13:17 +0200)]
Merge pull request #33686 from yuvalif/fix_data_corruption_in_cls_queue_head

cls/queue: fix data corruption in urgent data

5 years agoqa/tasks/ceph_manager: capture stderr for COT
Kefu Chai [Sun, 8 Mar 2020 05:39:59 +0000 (13:39 +0800)]
qa/tasks/ceph_manager: capture stderr for COT

as we are expecting the error message written to stderr, and we need to
check for the error messages in it.

this change addresses the regression introduced by
204ceee156cbb8a20bdf56efb0cd0610ee4c107e

Fixes: https://tracker.ceph.com/issues/44500
Signed-off-by: Kefu Chai <kchai@redhat.com>
5 years agomgr/cephadm: fix prom config generation when hosts have no labels or addrs 33800/head
Sage Weil [Sat, 7 Mar 2020 14:09:30 +0000 (08:09 -0600)]
mgr/cephadm: fix prom config generation when hosts have no labels or addrs

The inventory for a host might be {}, which evaluates as false.

Signed-off-by: Sage Weil <sage@redhat.com>
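
The pitfall above is that an empty dict is falsy, so a truthiness check treats a known host with no metadata the same as an unknown host. A minimal sketch of the distinction, using a hypothetical `host_addr` helper and inventory layout rather than the real mgr/cephadm code:

```python
def host_addr(inventory, host):
    """Return the address to use for host, or None if the host is unknown.

    A host whose inventory entry is {} is still a known host, so the check
    compares against None instead of relying on truthiness.
    """
    info = inventory.get(host)
    if info is None:  # `if not info:` would also reject the {} case
        return None
    return info.get('addr', host)


inventory = {'node1': {}, 'node2': {'addr': '10.0.0.2'}}
print(host_addr(inventory, 'node1'))  # falls back to the hostname
print(host_addr(inventory, 'node2'))
```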
5 years agoMerge PR #33742 into master
Sage Weil [Sun, 8 Mar 2020 02:18:23 +0000 (20:18 -0600)]
Merge PR #33742 into master

* refs/pull/33742/head:
msg/async: s/nowait/always_async/ in EventCenter::submit_to().
msg/async: perform recv reset immediately if called inside EC.

Reviewed-by: Sage Weil <sage@redhat.com>
5 years agocephadm: report prometheus, node-exporter, alertmanager versions
Sage Weil [Sat, 7 Mar 2020 22:43:14 +0000 (16:43 -0600)]
cephadm: report prometheus, node-exporter, alertmanager versions

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agocephadm: use None (not '<no value>') for monitoring daemon version
Sage Weil [Sat, 7 Mar 2020 22:34:08 +0000 (16:34 -0600)]
cephadm: use None (not '<no value>') for monitoring daemon version

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoqa/suites/rados/cephadm/upgrade: new start point 33793/head
Sage Weil [Sat, 7 Mar 2020 21:18:04 +0000 (15:18 -0600)]
qa/suites/rados/cephadm/upgrade: new start point

The starting cephadm needs to look for default ceph.conf etc in /etc/ceph
for cephadm.py to be happy.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoqa/tasks/cephadm: put bootstrap config etc directly in /etc/ceph
Sage Weil [Fri, 6 Mar 2020 21:26:20 +0000 (15:26 -0600)]
qa/tasks/cephadm: put bootstrap config etc directly in /etc/ceph

This puts the conf and keyring in /etc/ceph earlier rather than later,
making them useful for debugging a live system *during* bootstrap.  It's
also less code.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agocephadm: shell: default to config and keyring in /etc/ceph, if present
Sage Weil [Fri, 6 Mar 2020 21:20:24 +0000 (15:20 -0600)]
cephadm: shell: default to config and keyring in /etc/ceph, if present

This just makes things painless for humans: they can usually run
'cephadm shell' and have a working environment.
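
The defaulting behaviour described here can be sketched as follows. This is
an illustration under assumptions, not the cephadm implementation: the helper
name pick_path and the exact file names under /etc/ceph are hypothetical.

```python
import os


def pick_path(explicit, default_path):
    """Return the explicit path if given, else the default if it exists, else None."""
    if explicit:
        return explicit
    if os.path.exists(default_path):
        return default_path
    return None


# Assumed default locations; an explicit argument always wins.
config = pick_path(None, "/etc/ceph/ceph.conf")
keyring = pick_path(None, "/etc/ceph/ceph.client.admin.keyring")
```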

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agoMerge PR #33706 into master
Sage Weil [Sat, 7 Mar 2020 19:45:16 +0000 (13:45 -0600)]
Merge PR #33706 into master

* refs/pull/33706/head:
qa/suites/rados/cephadm/upgrade: adjust starting version
mgr/orch: from_strings -> from_string; do not accept a list
mgr/volumes: pass placement as string, not list
qa/tasks/mgr/test_orchestrator_cli: adjust placement args
qa/tasks/cephadm: pass apply placement as a single arg
mgr/orch: PlacementSpec: allow 'count:123'
mgr/orch: PlacementSpec: make pretty_str() match input
mgr/orch: take single placement argument
mgr/orch: PlacementSpec.from_strings: take a string *or* a list

Reviewed-by: Kefu Chai <kchai@redhat.com>
5 years agoMerge pull request #33625 from sebastian-philipp/python-common-drive-groups-and
Kefu Chai [Sat, 7 Mar 2020 17:47:42 +0000 (01:47 +0800)]
Merge pull request #33625 from sebastian-philipp/python-common-drive-groups-and

python-common: Make Drive Group filter by AND, instead of OR
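
To show the difference between the two semantics, here is a small sketch
with hypothetical device records and filter predicates (none of these names
come from python-common). With OR, a device is selected when any filter
matches; with AND, every filter must match.

```python
def matches_any(device, filters):
    """OR semantics: one matching filter is enough."""
    return any(f(device) for f in filters)


def matches_all(device, filters):
    """AND semantics: every filter must match."""
    return all(f(device) for f in filters)


# Hypothetical device records and filters.
devices = [
    {"path": "/dev/sda", "rotational": True, "size_gb": 4000},
    {"path": "/dev/sdb", "rotational": False, "size_gb": 500},
    {"path": "/dev/sdc", "rotational": True, "size_gb": 500},
]
filters = [
    lambda d: d["rotational"],
    lambda d: d["size_gb"] >= 1000,
]

selected_or = [d["path"] for d in devices if matches_any(d, filters)]
selected_and = [d["path"] for d in devices if matches_all(d, filters)]
```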

Reviewed-by: Joshua Schmid <jschmid@suse.de>
5 years agodoc/cephadm: fix formatting for osd section 33792/head
Sage Weil [Sat, 7 Mar 2020 17:22:47 +0000 (11:22 -0600)]
doc/cephadm: fix formatting for osd section

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agodoc/cephadm: update 'adding mons' section to suggest/prefer 'apply'
Sage Weil [Sat, 7 Mar 2020 15:14:43 +0000 (09:14 -0600)]
doc/cephadm: update 'adding mons' section to suggest/prefer 'apply'

It's nicer for users to specify the cluster/mon subnet once and let
cephadm scale mons.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agodoc/cephadm: fix formatting, typos
Sage Weil [Sat, 7 Mar 2020 15:13:23 +0000 (09:13 -0600)]
doc/cephadm: fix formatting, typos

No need for [monitor 1] when accessing the CLI--this can happen from
any node or container that has a working CLI.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: implement apply_mon
Sage Weil [Fri, 6 Mar 2020 20:57:35 +0000 (14:57 -0600)]
mgr/cephadm: implement apply_mon

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: allow mon creation without explicit ip or addr
Sage Weil [Fri, 6 Mar 2020 20:53:22 +0000 (14:53 -0600)]
mgr/cephadm: allow mon creation without explicit ip or addr

Allow mons to be created if the public_network option is defined in the
config database.

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: allow _apply_service to delete mon daemon's data
Sage Weil [Fri, 6 Mar 2020 20:02:06 +0000 (14:02 -0600)]
mgr/cephadm: allow _apply_service to delete mon daemon's data

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: remove mon from monmap before removing mon
Sage Weil [Fri, 6 Mar 2020 20:00:42 +0000 (14:00 -0600)]
mgr/cephadm: remove mon from monmap before removing mon

Check for force flag early so we don't update the monmap if the daemon
remove is going to fail anyway.
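
The ordering argument can be sketched like this: validate everything that
could make the operation fail before performing the first side effect. The
names below (RemoveRefused, remove_mon, the set-based monmap) are hypothetical
and only model the idea, not the mgr/cephadm code.

```python
class RemoveRefused(Exception):
    """Raised when a removal is rejected before any state is changed."""


def remove_mon(monmap, daemons, name, force=False):
    # Check the precondition first: nothing has been modified if we bail out.
    if not force:
        raise RemoveRefused("removing mon.%s requires force" % name)
    monmap.remove(name)   # side effect 1: drop the mon from the map
    daemons.remove(name)  # side effect 2: remove the daemon itself


monmap = {"a", "b", "c"}
daemons = {"a", "b", "c"}
try:
    remove_mon(monmap, daemons, "c")
except RemoveRefused:
    pass  # monmap and daemons are both still intact here
```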

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: do not remove mon if it breaks quorum
Sage Weil [Fri, 6 Mar 2020 19:57:18 +0000 (13:57 -0600)]
mgr/cephadm: do not remove mon if it breaks quorum
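
For intuition, a quorum check of this kind can be modelled as a strict
majority test on the monitors that would remain. The sketch below is an
assumption-laden illustration (the function name and set-based inputs are
hypothetical) and does not reproduce how cephadm actually performs the check.

```python
def safe_to_remove(all_mons, in_quorum, target):
    """Return True if the remaining mons in quorum still form a strict majority."""
    remaining = set(all_mons) - {target}
    alive = set(in_quorum) - {target}
    needed = len(remaining) // 2 + 1  # strict majority of the new monmap
    return len(alive) >= needed


# Three mons, but only a and b are currently in quorum.
mons = {"a", "b", "c"}
quorum = {"a", "b"}
print(safe_to_remove(mons, quorum, "c"))  # removing the mon that is already out
print(safe_to_remove(mons, quorum, "a"))  # removing a mon that is in quorum
```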

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: sanity check upgrade version 33802/head
Sage Weil [Sat, 7 Mar 2020 16:06:40 +0000 (10:06 -0600)]
mgr/cephadm: sanity check upgrade version

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: only need to invalidate once here
Sage Weil [Sat, 7 Mar 2020 15:55:13 +0000 (09:55 -0600)]
mgr/cephadm: only need to invalidate once here

Signed-off-by: Sage Weil <sage@redhat.com>
5 years agomgr/cephadm: upgrade requires root mode for now
Sage Weil [Sat, 7 Mar 2020 15:55:01 +0000 (09:55 -0600)]
mgr/cephadm: upgrade requires root mode for now

See https://tracker.ceph.com/issues/44429

Signed-off-by: Sage Weil <sage@redhat.com>