Sage Weil [Sun, 8 Mar 2020 17:00:45 +0000 (12:00 -0500)]
mgr/cephadm: remove magic labels
Remove the magic label behavior. It makes the code confusing, it
makes the overall behavior hard to explain, and it makes the PlacementSpec
meaning different than what Rook is doing.
Instead, if you want mons on hosts with label 'mon', then say 'label:mon'.
Sage Weil [Sat, 7 Mar 2020 19:45:16 +0000 (13:45 -0600)]
Merge PR #33706 into master
* refs/pull/33706/head:
qa/suites/rados/cephadm/upgrade: adjust starting version
mgr/orch: from_strings -> from_string; do not accept a list
mgr/volumes: pass placement as string, not list
qa/tasks/mgr/test_orchestrator_cli: adjust placement args
qa/tasks/cephadm: pass apply placement as a single arg
mgr/orch: PlacementSpec: allow 'count:123'
mgr/orch: PlacementSpec: make pretty_str() match input
mgr/orch: take single placement argument
mgr/orch: PlacementSpec.from_strings: take a string *or* a list
Xuehan Xu [Fri, 6 Mar 2020 10:55:07 +0000 (18:55 +0800)]
crimson: decouple mgr client reconnect and connect reset handling
As of now, the following invocation sequence triggers a deadlock when
closing crimson-osd's connection with mgr:
ProtocolV2::dispatch_reset() --> crimson::mgr::Client::ms_handle_reset
--> crimson::mgr::Client::reconnect --> crimson::net::SocketConnection::close
--> crimson::net::Protocol::close()
In the above invocation sequence, ProtocolV2::dispatch_reset() enters the gate
"pending_dispatch". Leaving that gate waits for crimson::net::Protocol::close()
to complete, which in turn waits for the gate's close() to complete.
Sage Weil [Tue, 3 Mar 2020 21:39:50 +0000 (15:39 -0600)]
mgr/orch: take single placement argument
This is maybe a wash on the 'ceph orch ...' portion of the CLI. However,
it means that elsewhere, like 'ceph fs volume ...', we can be consistent
and have placement be (1) optional and (2) a single arg so that it is
easier to use both positionally and as a flag (--placement=all:true).
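As a rough illustration of why a single string is convenient, forms such as 'label:mon', 'count:123' and 'all:true' can be parsed in one pass. The sketch below is hypothetical: `parse_placement` and the dict it returns are not the real `PlacementSpec.from_string` API, and treating any other input as a whitespace-separated host list is an assumption.

```python
from typing import Dict, List, Union

PlacementValue = Union[str, int, bool, List[str]]


def parse_placement(arg: str) -> Dict[str, PlacementValue]:
    """Parse one placement string into a plain dict (hypothetical helper)."""
    arg = arg.strip()
    if arg.startswith("label:"):
        return {"label": arg[len("label:"):]}
    if arg.startswith("count:"):
        return {"count": int(arg[len("count:"):])}
    if arg == "all:true":
        return {"all_hosts": True}
    # Assumption: anything else is a whitespace-separated list of host names.
    return {"hosts": arg.split()}
```

Because the whole placement is one value, the same string works positionally and as --placement=<value>.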
Sage Weil [Sat, 7 Mar 2020 03:19:49 +0000 (21:19 -0600)]
Merge PR #33700 into master
* refs/pull/33700/head:
mgr/cephadm: point dashboard at grafana automatically
doc/cephadm/monitoring: document process to set up monitoring with cephadm
Reviewed-by: Alexandra Settle <asettle@suse.com> Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
Sage Weil [Fri, 6 Mar 2020 17:26:47 +0000 (11:26 -0600)]
Merge PR #33614 into master
* refs/pull/33614/head:
mgr/cephadm: enable custom TLS certificates for grafana
mgr: enable verification of TLS certs without files
mgr/cephadm: dump config to JSON only once when creating daemons
Kefu Chai [Fri, 6 Mar 2020 04:17:40 +0000 (12:17 +0800)]
qa/tasks/ceph.py: quote "<kind>" in command line
otherwise bash will interpret "kind" as a file when handling a command like
```
sudo zgrep <kind> /var/log/ceph/valgrind/* /dev/null | sort | uniq
```
and try to feed its content to zgrep, and write the output of zgrep
to /var/log/ceph/valgrind/*. this is not the intended behavior. what we
want to do is to pass "<kind>" as an argument to zgrep, along with
the globbed file names that match "/var/log/ceph/valgrind/*".
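A minimal sketch of the quoting idea, assuming the command line is assembled as a string in Python. `build_zgrep_command` is a hypothetical helper, not the code in qa/tasks/ceph.py, and it uses `shlex.quote` (single quotes) rather than the double quotes mentioned above.

```python
import shlex


def build_zgrep_command(kind: str,
                        log_glob: str = "/var/log/ceph/valgrind/*") -> str:
    """Build the zgrep pipeline with the pattern protected from the shell."""
    # Quote only the pattern. The glob stays unquoted so the shell expands it.
    return "sudo zgrep {} {} /dev/null | sort | uniq".format(
        shlex.quote(kind), log_glob)


print(build_zgrep_command("<kind>"))
```

With the pattern quoted, the shell no longer treats `<` and `>` as redirections.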
Sage Weil [Fri, 6 Mar 2020 03:24:53 +0000 (21:24 -0600)]
mgr/cephadm: do not specify --image arg for non-ceph daemons; fix upgrade
If we are calling the cephadm script for a non-ceph daemon (prometheus,
etc), do not specify the --image argument, and do not pull it out of
the config db from sections that don't exist.
Sage Weil [Thu, 5 Mar 2020 16:42:26 +0000 (10:42 -0600)]
mgr/cephadm: make osd create on an existing LV idempotent
If we try to prepare an LV that was already prepared, ceph-volume will
return an error message and code. We want our osd create command to be
idempotent, though, so recognize the error string and continue.
This is an ugly hack, but quicker than changing ceph-volume behavior, and
it is sufficient to stop all of the teuthology failures.
The second part of this is that we have to deploy the daemon on OSDs that
are already prepared and already exist in our osdmap beforehand, but have
never started.
Works-around: https://tracker.ceph.com/issues/44313 Signed-off-by: Sage Weil <sage@redhat.com>
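The "recognize the error string and continue" approach can be sketched roughly as follows. Everything here is an assumption for illustration: `prepare_idempotent`, the `run_prepare` callable and the marker text are hypothetical, and the marker is not the actual ceph-volume error message.

```python
from typing import Callable, Tuple

# Assumption: placeholder text, not the real ceph-volume error string.
ALREADY_PREPARED_MARKER = "already prepared"


def prepare_idempotent(run_prepare: Callable[[], Tuple[int, str]]) -> bool:
    """Return True if the LV is prepared, by this call or an earlier one."""
    code, err = run_prepare()
    if code == 0:
        return True
    if ALREADY_PREPARED_MARKER in err:
        # An earlier run already did the work, so treat this as success.
        return True
    raise RuntimeError("prepare failed ({}): {}".format(code, err))
```

Matching on an error string is brittle, which is why the commit message calls it a hack.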
volumes/fs/async_cloner.py: note: In function "handle_clone_pending":
volumes/fs/async_cloner.py:71: error: "OpSmException" has no attribute "error"; maybe "errno"?
volumes/fs/async_cloner.py: note: In function "handle_clone_in_progress":
volumes/fs/async_cloner.py:139: error: "OpSmException" has no attribute "error"; maybe "errno"?
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:22:48 +0000 (08:22 -0700)]
mgr/volumes: remove unneeded assignment to `NoneType`
fixes mypy error:
volumes/fs/operations/versions/__init__.py: note: In member "get_subvolume_object" of class "SubvolumeLoader":
volumes/fs/operations/versions/__init__.py:70: error: Incompatible types in assignment (expression has type "None", variable has type "SubvolumeBase")
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:21:59 +0000 (08:21 -0700)]
mgr/volumes: add missing OpSmException import
fixes mypy error:
volumes/fs/operations/versions/__init__.py: note: In member "upgrade_legacy_subvolume" of class "SubvolumeLoader":
volumes/fs/operations/versions/__init__.py:56: error: Name 'OpSmException' is not defined
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:21:54 +0000 (08:21 -0700)]
mgr/volumes: add missing error code
fixes mypy error:
volumes/fs/operations/versions/__init__.py: note: In member "_load_supported_versions" of class "SubvolumeLoader":
volumes/fs/operations/versions/__init__.py:35: error: Too few arguments for "VolumeException"
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:21:36 +0000 (08:21 -0700)]
mgr/volumes: fixup format string args
fixes mypy errors:
volumes/fs/purge_queue.py:26: error: Cannot find replacement for positional format specifier 1
volumes/fs/async_cloner.py:37: error: Cannot find replacement for positional format specifier 1
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
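For context, mypy reports this error when a format string refers to a positional field that has no matching argument. The snippet below is a generic illustration with hypothetical names, not the code from purge_queue.py or async_cloner.py.

```python
def describe_failure(name: str, err: Exception) -> str:
    """Format an error message with one argument per positional field."""
    # Broken form: "... {0}: {1}".format(name) supplies nothing for {1}
    # and raises IndexError at runtime.
    return "operation on {0} failed: {1}".format(name, err)


print(describe_failure("vol1", ValueError("boom")))
```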
Michael Fritch [Tue, 3 Mar 2020 15:21:30 +0000 (08:21 -0700)]
mgr/volumes: add missing `mgr` param
fixes mypy errors:
volumes/fs/operations/volume.py: note: In function "create_volume":
volumes/fs/operations/volume.py:216: error: Too few arguments for "remove_pool"
volumes/fs/operations/volume.py:223: error: Too few arguments for "remove_pool"
volumes/fs/operations/volume.py:224: error: Too few arguments for "remove_pool"
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:21:25 +0000 (08:21 -0700)]
mgr/volumes: assert self.fs
fixes mypy errors:
volumes/fs/operations/volume.py: note: In member "disconnect" of class "Connection":
volumes/fs/operations/volume.py:94: error: Item "None" of "Optional[Any]" has no attribute "get_addrs"
volumes/fs/operations/volume.py:95: error: Item "None" of "Optional[Any]" has no attribute "shutdown"
volumes/fs/operations/volume.py: note: In member "abort" of class "Connection":
volumes/fs/operations/volume.py:105: error: Item "None" of "Optional[Any]" has no attribute "abort_conn"
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
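A small sketch of the pattern, assuming an attribute typed as Optional. The `Handle` and `Connection` classes below are simplified stand-ins, not the classes in volumes/fs/operations/volume.py.

```python
from typing import Optional


class Handle:
    """Stand-in for the file system handle (hypothetical)."""

    def shutdown(self) -> str:
        return "shutdown"


class Connection:
    def __init__(self) -> None:
        self.fs: Optional[Handle] = None

    def connect(self) -> None:
        self.fs = Handle()

    def disconnect(self) -> str:
        # The assert narrows Optional[Handle] to Handle for the type checker
        # and fails loudly if disconnect() is called before connect().
        assert self.fs
        result = self.fs.shutdown()
        self.fs = None
        return result
```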
Michael Fritch [Tue, 3 Mar 2020 15:21:18 +0000 (08:21 -0700)]
mgr/volumes: skip type checking on RTimer class
Unclear why mypy does not like this:
volumes/fs/operations/volume.py: note: In member "run" of class "RTimer":
volumes/fs/operations/volume.py:118: error: "RTimer" has no attribute "finished"
volumes/fs/operations/volume.py:119: error: "RTimer" has no attribute "finished"
volumes/fs/operations/volume.py:119: error: "RTimer" has no attribute "interval"
volumes/fs/operations/volume.py:120: error: "RTimer" has no attribute "function"
volumes/fs/operations/volume.py:120: error: "RTimer" has no attribute "args"
volumes/fs/operations/volume.py:120: error: "RTimer" has no attribute "kwargs"
volumes/fs/operations/volume.py:121: error: "RTimer" has no attribute "finished"
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:20:01 +0000 (08:20 -0700)]
mgr/volumes: fix positional str formatting
fixes mypy error:
volumes/fs/operations/group.py: note: In function "create_group":
volumes/fs/operations/group.py:135: error: Not all arguments converted during string formatting
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:19:53 +0000 (08:19 -0700)]
mgr/volumes: place getters and setters next to each other
workaround for mypy issue:
https://github.com/python/mypy/issues/1465
fixes mypy errors:
volumes/fs/operations/group.py: note: In class "Group":
volumes/fs/operations/group.py:44: error: Name 'uid' already defined on line 36
volumes/fs/operations/group.py:44: error: "Callable[[Group], Any]" has no attribute "setter"
volumes/fs/operations/group.py:48: error: Name 'gid' already defined on line 40
volumes/fs/operations/group.py:48: error: "Callable[[Group], Any]" has no attribute "setter"
volumes/fs/operations/group.py: note: In function "open_group":
volumes/fs/operations/group.py:170: error: Property "uid" defined in "Group" is read-only
volumes/fs/operations/group.py:171: error: Property "gid" defined in "Group" is read-only
volumes/fs/operations/versions/subvolume_base.py: note: In class "SubvolumeBase":
volumes/fs/operations/versions/subvolume_base.py:45: error: Name 'uid' already defined on line 33
volumes/fs/operations/versions/subvolume_base.py:45: error: "Callable[[SubvolumeBase], Any]" has no attribute "setter"
volumes/fs/operations/versions/subvolume_base.py:49: error: Name 'gid' already defined on line 37
volumes/fs/operations/versions/subvolume_base.py:49: error: "Callable[[SubvolumeBase], Any]" has no attribute "setter"
volumes/fs/operations/versions/subvolume_base.py:53: error: Name 'mode' already defined on line 41
volumes/fs/operations/versions/subvolume_base.py:53: error: "Callable[[SubvolumeBase], Any]" has no attribute "setter"
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
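The layout that avoids these errors keeps each setter directly after its getter. The class below is a reduced, hypothetical version of `Group` that only illustrates the ordering.

```python
class Group:
    def __init__(self, uid: int, gid: int) -> None:
        self._uid = uid
        self._gid = gid

    # Each property's getter is immediately followed by its setter.
    @property
    def uid(self) -> int:
        return self._uid

    @uid.setter
    def uid(self, val: int) -> None:
        self._uid = val

    @property
    def gid(self) -> int:
        return self._gid

    @gid.setter
    def gid(self, val: int) -> None:
        self._gid = val
```

Defining both getters first and both setters afterwards behaves the same at runtime, but it is what triggers the mypy errors above.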
Michael Fritch [Tue, 3 Mar 2020 15:18:43 +0000 (08:18 -0700)]
mgr/volumes: reverse params passed to `isinstance()`
fixes mypy error:
volumes/fs/operations/clone_index.py: note: In member "track" of class "CloneIndex":
volumes/fs/operations/clone_index.py:38: error: Argument 2 to "isinstance" has incompatible type "Union[VolumeException, Any]"; expected "Union[type, Tuple[Union[type, Tuple[Any, ...]], ...]]"
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
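As a reminder of the argument order, `isinstance()` takes the object first and the class (or tuple of classes) second. The `VolumeException` below is a bare stand-in for illustration, not the real class from mgr/volumes.

```python
class VolumeException(Exception):
    """Stand-in exception type (hypothetical, simplified)."""


def is_volume_error(exc: object) -> bool:
    # Correct order: isinstance(obj, cls).
    # The reversed call, isinstance(VolumeException, exc), raises TypeError
    # when exc is an instance rather than a type.
    return isinstance(exc, VolumeException)
```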
Michael Fritch [Tue, 3 Mar 2020 15:16:03 +0000 (08:16 -0700)]
mgr/volumes: import VolumeException
fixes mypy errors:
volumes/fs/operations/index.py: note: In member "track" of class "Index":
volumes/fs/operations/index.py:19: error: Name 'VolumeException' is not defined
volumes/fs/operations/index.py: note: In member "untrack" of class "Index":
volumes/fs/operations/index.py:22: error: Name 'VolumeException' is not defined
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:15:43 +0000 (08:15 -0700)]
mgr/volumes: add `Dict` type
fixes mypy errors:
volumes/fs/operations/op_sm.py:39: error: "object" has no attribute "get"
volumes/fs/operations/op_sm.py:49: error: "object" has no attribute "get"
volumes/fs/operations/lock.py: note: In member "__init__" of class "GlobalLock":
volumes/fs/operations/lock.py:27: error: "object" has no attribute "__enter__"
volumes/fs/operations/lock.py:27: error: "object" has no attribute "__exit__"
volumes/fs/operations/lock.py: note: In member "lock_op" of class "GlobalLock":
volumes/fs/operations/lock.py:35: error: "object" has no attribute "__enter__"
volumes/fs/operations/lock.py:35: error: "object" has no attribute "__exit__"
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
Michael Fritch [Tue, 3 Mar 2020 15:15:37 +0000 (08:15 -0700)]
mgr/volumes: import errno
fixes mypy errors:
volumes/fs/operations/op_sm.py:36: error: Name 'errno' is not defined
volumes/fs/operations/op_sm.py:39: error: Name 'errno' is not defined
volumes/fs/operations/op_sm.py:46: error: Name 'errno' is not defined
volumes/fs/operations/op_sm.py:49: error: Name 'errno' is not defined
volumes/fs/operations/template.py:5: error: Name 'errno' is not defined
volumes/fs/operations/template.py:14: error: Name 'errno' is not defined
volumes/fs/operations/template.py:23: error: Name 'errno' is not defined
volumes/fs/operations/template.py:32: error: Name 'errno' is not defined
volumes/fs/operations/template.py:42: error: Name 'errno' is not defined
volumes/fs/operations/template.py:45: error: Name 'errno' is not defined
volumes/fs/operations/template.py:62: error: Name 'errno' is not defined
volumes/fs/operations/template.py:74: error: Name 'errno' is not defined
volumes/fs/operations/template.py:85: error: Name 'errno' is not defined
volumes/fs/operations/template.py:94: error: Name 'errno' is not defined
volumes/fs/operations/template.py:103: error: Name 'errno' is not defined
volumes/fs/operations/template.py:112: error: Name 'errno' is not defined
volumes/fs/operations/template.py:121: error: Name 'errno' is not defined
volumes/fs/operations/template.py:130: error: Name 'errno' is not defined
volumes/fs/operations/template.py:139: error: Name 'errno' is not defined
volumes/fs/operations/template.py:148: error: Name 'errno' is not defined
volumes/fs/operations/template.py:158: error: Name 'errno' is not defined
volumes/fs/operations/template.py:169: error: Name 'errno' is not defined
volumes/fs/operations/template.py:180: error: Name 'errno' is not defined
volumes/fs/operations/index.py:18: error: Name 'errno' is not defined
volumes/fs/operations/index.py:21: error: Name 'errno' is not defined
Fixes: https://tracker.ceph.com/issues/44393 Signed-off-by: Michael Fritch <mfritch@suse.com>
rgw: cls_bucket_list_(un)ordered should clear results collection
Each call to cls_bucket_list_(un)ordered should have an empty
collection to populate with results. Rather than rely on the caller to
ensure this, it's more reliable to have these functions do the clear.
Additionally in some cases, a reserve call was added to the collection
to pre-allocate the space needed for the expected number of
results. This will potentially result in fewer re-allocations plus
copies.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Casey Bodley [Thu, 5 Mar 2020 16:32:43 +0000 (11:32 -0500)]
rgw: fix string_view formatting in RGWFormatter_Plain
two string_views were being passed directly to vsnprintf where it
expected null-terminated strings. the compiler didn't catch this, which
resulted in segfaults
string_views aren't guaranteed to be null-terminated, so printf formats
have to specify a length as well
Kefu Chai [Thu, 5 Mar 2020 15:42:13 +0000 (23:42 +0800)]
mgr: update metadata if an osd just joins
instead of using "front_address" to check whether a new OSD reusing
a known identity shows up in the osdmap, it'd be simpler to compare
the up_from epoch with the osdmap's epoch, as objecter will subscribe
to **every** osdmap after mgr boots. so mgr should be able to see the
osdmap in which the osd joins the cluster, where the up_from epoch is
identical to the osdmap's epoch.
this way is simpler than the existing approach, but it will involve
more overhead if osds reboot frequently without changing their
metadata.
before this change, the metadata is requested/updated only if
the public (front) address is changed.
after this change, the metadata is requested/updated whenever
an osd reboots.
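The epoch comparison can be sketched as below. This is a Python illustration of the idea only: the actual change is in the C++ mgr code, and `osds_needing_metadata` and its plain dict input are hypothetical.

```python
from typing import Dict, List


def osds_needing_metadata(osdmap_epoch: int,
                          up_from: Dict[int, int]) -> List[int]:
    """Return the ids of OSDs whose up_from epoch equals the osdmap epoch.

    up_from maps an OSD id to the epoch at which it was last marked up.
    """
    return sorted(osd for osd, epoch in up_from.items()
                  if epoch == osdmap_epoch)
```

An OSD that came up in this very epoch has just joined or rebooted, so its metadata is requested again.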
Kefu Chai [Thu, 5 Mar 2020 15:35:46 +0000 (23:35 +0800)]
mgr: update "hostname" when we already have the daemon state from the same entity
there is a chance that we reuse the identity of a daemon and deploy the
daemon on a different host. in that case, the existing daemon state
should be updated with the new hostname.
Yaarit Hatuka [Thu, 5 Mar 2020 14:42:46 +0000 (09:42 -0500)]
mgr/telemetry: force --license when sending while opted-out
Users can manually send telemetry data with 'ceph telemetry send', even
if they have not opted in and agreed to the license. In this case we
require them to explicitly add '--license' to 'ceph telemetry send'.
This also fixes an issue when opting-out ('ceph telemetry off'), where the
revision was not reset.
Sage Weil [Thu, 5 Mar 2020 14:05:49 +0000 (08:05 -0600)]
Merge PR #33728 into master
* refs/pull/33728/head:
mgr/orch: factor out nice_delta
mgr/orch: show spec age in 'orch ls'
mgr/cephadm: store timestamp with specs
mgr/orch: add created timestamp to ServiceDescription
mgr/orch: include uptime in 'orch ps'
mgr/orch: include AGE column in 'orch ps'
mgr/cephadm: populate new DaemonDescription timestamps
mgr/orch: add new timestamps in DaemonDescription
cephadm: include timestamps for configured, created
cephadm: include timestamps for started, deploy
Reviewed-by: Joshua Schmid <jschmid@suse.de> Reviewed-by: Michael Fritch <mfritch@suse.com>
The ps output names daemons like 'type.foo', e.g., 'mgr.x'. Now that
the test_orchestrator impl is less bonkers this needs to be adjusted to
match reality.
If a daemon isn't running, we don't know the image_id (hash), so we skip.
But it's also possible to have a running daemon that doesn't report an
image_id... like right after we deploy it when the container hasn't
started up yet. Skip those too.