dashboard: Resolve FQDN / hostname mismatch in hosts overview panel
In the AVG Disk Utilization panel, the result is calculated
by combining the output of node_disk_io_time_seconds_total
with the output of ceph_disk_occupation. However, the
first vector encodes the instance label with the full FQDN
while the ceph label only contains the hostname:port. In
order for these to match correctly, the domain name and port
have to be stripped from the labels.
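The label normalization described above can be illustrated outside of PromQL. This is a minimal Python sketch, not the query used in the dashboard; the helper name `short_host` and the sample label values are hypothetical.

```python
import re

# Hypothetical helper: keep only the part of the label before the first
# dot or colon, i.e. drop the domain name and the port.
_HOST_RE = re.compile(r"^([^.:]*)")


def short_host(instance: str) -> str:
    """Return the short hostname of 'host', 'host:port' or 'host.domain:port'."""
    return _HOST_RE.match(instance).group(1)


# Sample label values, made up for illustration.
fqdn_label = "osd-node1.example.com:9100"
ceph_label = "osd-node1:9283"

print(short_host(fqdn_label))  # → osd-node1
print(short_host(ceph_label))  # → osd-node1
```

Once both sides are reduced to the same short hostname, the two vectors can be matched on that label. The sketch assumes hostnames, not IP addresses, since it cuts at the first dot.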
When moving to LVM-based ceph-volume setups, several
Grafana dashboards stopped working. The problem is that
(device, instance) no longer results in unique labels,
which causes errors like:
"many-to-many matching not allowed: matching labels must be unique on one side"
The references to `$osd_hosts` etc. were encoded as
`[[osd_hosts]]` in the PromQL expression divisor, and
the panel always displayed N/A as the result of the
query.
Replacing the `[[...]]` with `$...` makes the expression
work again.
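As an illustration only, the same substitution could be applied in bulk with a regular expression. The sketch below is an assumption about how one might script it, not how the commit was produced; the function name and the sample expression (including the metric name) are hypothetical.

```python
import re

# Matches the legacy [[name]] variable syntax; single brackets such as a
# PromQL range selector like [5m] are left untouched.
_LEGACY_VAR = re.compile(r"\[\[(\w+)\]\]")


def modernize_variables(expr: str) -> str:
    """Rewrite every [[name]] reference in expr as $name."""
    return _LEGACY_VAR.sub(r"$\1", expr)


# Hypothetical expression for illustration.
legacy = 'count(some_metric{instance=~"[[osd_hosts]]"})'
print(modernize_variables(legacy))
# → count(some_metric{instance=~"$osd_hosts"})
```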
Because teuthology uses six.ensure_str, which was added in
six 1.12.0 (see https://github.com/benjaminp/six/blob/1.12.0/CHANGES),
we cannot continue using six 1.11.0 and need to switch to
six>=1.12.0. Since the latest stable version of six is now 1.14.0, let's
just use it.
Brad Hubbard [Fri, 1 Nov 2019 01:08:36 +0000 (11:08 +1000)]
tools/rados: Unmask '-o' to restore original behaviour
0b369e1aff1 masked the original behaviour of '-o' which was to indicate
'outfile' as documented in the man page. Changing the object-size option
to a capital 'O' restores the original behaviour.
osd/OSDMap: Show health warning if a pool is configured with size 1
Introduce a config option called 'mon_warn_on_pool_no_redundancy' that is
used to show a health warning if any pool in the ceph cluster is
configured with a size of 1. The user can mute/unmute the warning using
'ceph health mute/unmute POOL_NO_REDUNDANCY'.
Add standalone test to verify warning on setting pool size=1. Set the
associated warning to 'false' in ceph.conf.template under qa/tasks so
that existing tests do not break.
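The check itself lives in the C++ OSDMap code. The following Python sketch only models the idea under stated assumptions: the function name, its arguments and the sample pools are hypothetical, and `warn_enabled` stands in for the 'mon_warn_on_pool_no_redundancy' option.

```python
from typing import Dict, List


def pools_without_redundancy(pool_sizes: Dict[str, int],
                             warn_enabled: bool = True) -> List[str]:
    """Return the names of pools whose size (replica count) is 1.

    An empty list means there is nothing to warn about, either because
    every pool is redundant or because the warning is disabled.
    """
    if not warn_enabled:
        return []
    return sorted(name for name, size in pool_sizes.items() if size == 1)


# Hypothetical pools for illustration.
pools = {"rbd": 3, "scratch": 1, "cephfs_data": 2}
print(pools_without_redundancy(pools))                      # → ['scratch']
print(pools_without_redundancy(pools, warn_enabled=False))  # → []
```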
Conflicts:
PendingReleaseNotes
- Added release notes under 14.2.9
qa/standalone/mon/health-mute.sh
- Deleted the script as 'health mute/unmute' cmd is unavailable in nautilus
qa/tasks/ceph.conf.template
- Removed a flag not available in nautilus
src/common/options.cc
- Removed a flag not available in nautilus
src/osd/OSDMap.cc
mgr/DaemonServer.{h,cc} deals with raw pointers while master uses a ref_t<>
cast -- adjust to that. There is a minor conflict in the header, and the metrics
templatization is not backported to nautilus. Also, DaemonKey is a std::pair
in nautilus but a struct in master -- that requires a change in how the
daemon type and name are referenced.
Venky Shankar [Sat, 8 Feb 2020 09:36:42 +0000 (04:36 -0500)]
mgr: helper function to check if a service is a normal ceph service
This would be widely required since ceph metadata server entries are
maintained in service map (DaemonServer::pending_service_map). Such
normal ceph services would need to be filtered when processing the service
map to avoid extraneous entries getting processed.
This commit undoes the service daemon registration for the MDS. It doesn't look
absolutely necessary and it causes the MDS to be listed twice in the `ceph
versions` output:
Fixing that requires looking for duplicates or ignoring MDSs in the
service daemons when the mon processes `ceph versions`. I have a feeling
that it wasn't actually designed to be used by the MDS this way however.
Additionally, the reason for the "unknown" version is that the metadata
sent to the mgr does not include "ceph_version".
- Make explicit the check for getting removed from the MDSMap. Previously this
was only done by checking whether the MDS held a rank, which does not cover
the case where a standby is removed from the FSMap.
- Use mds_info_t::dump to simplify various debug output.
- Add a few sanity asserts for invalid state transitions.
mgr, mon: allow normal ceph services to register with manager
Additionally, introduce `task status` field in manager report
messages to forward status of executing tasks in daemons (e.g.,
status of executing scrubs in ceph metadata servers).
`task status` makes its way up to the service map, which is then used
to display the relevant information in ceph status.
"The default values are handled by mgr_module.py's _get_module_option();
the or here means that we break any non-true (0, false, none) value and
override it with the default."
Alfonso Martínez [Tue, 24 Mar 2020 08:34:55 +0000 (09:34 +0100)]
mgr/dashboard: fix error when enabling SSO with cert. file
Nautilus dedicated fix: added py2 compatibility code.
Also:
* Disabled security setting 'wantNameIdEncrypted': not all Identity Providers support this and we are already requiring encrypted assertions (which is the default).
Fixes: https://tracker.ceph.com/issues/44666
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Sage Weil [Tue, 21 Jan 2020 16:43:04 +0000 (10:43 -0600)]
pybind/mgr/*: fix config_notify handling of default values
The default values are handled by mgr_module.py's _get_module_option();
the `or` here means that any falsy value (0, False, None) is treated as
unset and overridden with the default.
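The pitfall can be reproduced in a few lines. This is a minimal sketch, not the mgr module code: `get_option` is a hypothetical stand-in for a getter that already applies the default, and `DEFAULT_INTERVAL` is a made-up option default.

```python
DEFAULT_INTERVAL = 60  # hypothetical default for illustration


def get_option(stored, default):
    """Stand-in getter: return the default only when nothing is stored."""
    return default if stored is None else stored


def buggy_read(stored):
    # The extra 'or' re-applies the default to every falsy value,
    # so an explicit 0 or False set by the user is lost.
    return get_option(stored, DEFAULT_INTERVAL) or DEFAULT_INTERVAL


def fixed_read(stored):
    # The getter already handles the unset case; no 'or' needed.
    return get_option(stored, DEFAULT_INTERVAL)


print(buggy_read(0))     # → 60
print(fixed_read(0))     # → 0
print(fixed_read(None))  # → 60
```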
Conflicts:
src/pybind/mgr/cephadm/module.py
- nautilus has no "cephadm" module. It does have an "orchestrator_cli"
module but it doesn't contain the code being patched
src/pybind/mgr/hello/module.py
- nautilus has a "hello" module, but it doesn't contain the code being
patched
Since the codebase is very different and a backport is not recommended or even
possible, I have created this commit with only the minimal code necessary.
Matthew Oliver [Tue, 4 Feb 2020 02:29:48 +0000 (13:29 +1100)]
ceph_argparse: increment matchcnt on kwargs
Currently, when you pass a param on the ceph cli as a kwarg
(--<param_name>), matchcnt isn't incremented in the validate method
which is used to choose the right command signature.
The '--realm_name' and '--zone_name' arguments aren't counted toward
matchcnt, so 'orchestrator rgw rm' isn't picked as the valid command.
This patch simply corrects this by incrementing matchcnt on the kwarg
validate path before short-circuiting the loop.
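The effect on the match count can be shown with a toy model. This sketch is much simpler than the real validate method in ceph_argparse and is not derived from it; the function `count_matches`, its parameters and the argument values are hypothetical.

```python
from typing import Iterable, List


def count_matches(prefix: List[str], params: Iterable[str],
                  args: List[str], count_kwargs: bool = True) -> int:
    """Toy matchcnt: count the arguments that fit a command signature.

    prefix       -- the fixed command words, matched positionally
    params       -- names accepted as --<name>=<value> kwargs
    count_kwargs -- False models the behaviour before the fix
    """
    params = set(params)
    words = iter(prefix)
    matchcnt = 0
    for arg in args:
        if arg.startswith("--"):
            name = arg[2:].partition("=")[0]
            if name in params and count_kwargs:
                matchcnt += 1
            continue  # kwargs do not consume a positional word
        if arg == next(words, None):
            matchcnt += 1
    return matchcnt


args = ["orchestrator", "rgw", "rm", "--realm_name=r1", "--zone_name=z1"]
sig = (["orchestrator", "rgw", "rm"], ["realm_name", "zone_name"])

print(count_matches(*sig, args, count_kwargs=False))  # → 3
print(count_matches(*sig, args, count_kwargs=True))   # → 5
```

With kwargs left out of the count, the signature scores lower than the number of arguments actually supplied, which is the kind of undercount the patch addresses.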
Fixes: https://tracker.ceph.com/issues/43803
Signed-off-by: Matthew Oliver <moliver@suse.com>
(cherry picked from commit cb37c9ee609864a078edf38d98608bd8cc18cbd7)
Conflicts:
test: exclude helper method from nosetest discovery
On nautilus the assertion helper was recognized by nosetest as a test
even though it doesn't start with the test_ prefix. Explicitly decorate it
with @nottest.
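To keep the example free of third-party imports, the sketch below uses a local stand-in that mirrors what nose.tools.nottest is understood to do, namely set `__test__ = False` on the function. The helper name is hypothetical.

```python
def nottest(func):
    """Local stand-in for nose.tools.nottest: mark func as not a test."""
    func.__test__ = False
    return func


@nottest
def assert_test_outcome(value):
    """Hypothetical assertion helper that should never be collected."""
    assert value


# Test collectors that honour the marker skip the helper, while test
# code can still call it directly.
print(getattr(assert_test_outcome, "__test__", True))  # → False
assert_test_outcome(True)
```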
Yao Zongyou [Tue, 3 Mar 2020 15:34:26 +0000 (15:34 +0000)]
rgw: clear ent_list for each loop of bucket list
If ent_list is not cleared, the old elements will be checked repeatedly
and will occupy more memory.
Fixes: http://tracker.ceph.com/issues/44394
Signed-off-by: Yao Zongyou <yaozongyou@vip.qq.com>
(cherry picked from commit f63bf47aa464c345c907c748dfdbbc5a239d8488)
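The cost of not clearing the list can be seen in a small model of a paginated listing loop. The actual fix is in the rgw C++ code; this Python sketch, including `PAGES`, `fetch_page` and `list_bucket`, is a hypothetical illustration only.

```python
from typing import List

# Three made-up result pages for illustration.
PAGES = [["a", "b"], ["c", "d"], ["e"]]


def fetch_page(page_no: int, ent_list: List[str]) -> bool:
    """Append one page to ent_list; return True if more pages follow."""
    ent_list.extend(PAGES[page_no])
    return page_no + 1 < len(PAGES)


def list_bucket(clear_each_loop: bool) -> int:
    """Return how many entries were checked over the whole listing."""
    ent_list: List[str] = []
    checked = 0
    page_no = 0
    truncated = True
    while truncated:
        if clear_each_loop:
            ent_list.clear()  # drop entries handled in earlier loops
        truncated = fetch_page(page_no, ent_list)
        checked += len(ent_list)  # every entry in the list is examined
        page_no += 1
    return checked


print(list_bucket(clear_each_loop=True))   # → 5
print(list_bucket(clear_each_loop=False))  # → 11
```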
anurag [Wed, 11 Mar 2020 14:17:05 +0000 (19:47 +0530)]
mgr/dashboard: Pool read/write OPS shows too many decimal places
Fixes: https://tracker.ceph.com/issues/39714
Signed-off-by: anurag <anurag@localhost.localdomain>
(cherry picked from commit 27a2bbb12614b7aba0561c027346d9b5427f2405)
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/datatable/table/table.component.spec.ts,
src/pybind/mgr/dashboard/frontend/src/app/shared/datatable/table/table-key-value/table-key-value.component.spec.ts:
added import of PipesModule to Angular unit tests
Venky Shankar [Tue, 14 Jan 2020 09:13:16 +0000 (04:13 -0500)]
mgr/volumes: introduce 'canceled' state in clone op state machine
When fetching the next execution state, -EINTR jumps to 'canceled'
state signifying a canceled (interrupted) operation. Also include
a helper routine to check if a given state machine is in initial
state.
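A minimal sketch of such a state machine follows. Only the 'canceled' state, the -EINTR trigger and the initial-state helper come from the description above; the other state names and the transition table are assumptions made for illustration and may differ from the mgr/volumes code.

```python
import errno

# Assumed state names; only 'canceled' is taken from the commit message.
INITIAL_STATE = "pending"
FINAL_STATES = {"complete", "failed", "canceled"}
ON_SUCCESS = {"pending": "in-progress", "in-progress": "complete"}


def next_state(current: str, ret: int) -> str:
    """Return the state that follows current, given a step's return code."""
    if current in FINAL_STATES:
        raise ValueError("no transition out of final state %r" % current)
    if ret == -errno.EINTR:
        return "canceled"  # interrupted operation
    if ret < 0:
        return "failed"    # any other error
    return ON_SUCCESS[current]


def is_initial_state(state: str) -> bool:
    """Helper: True if the state machine has not advanced yet."""
    return state == INITIAL_STATE


print(next_state("pending", 0))                 # → in-progress
print(next_state("in-progress", -errno.EINTR))  # → canceled
print(is_initial_state("pending"))              # → True
```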
volumes/fs/async_cloner.py: note: In function "handle_clone_pending":
volumes/fs/async_cloner.py:71: error: "OpSmException" has no attribute "error"; maybe "errno"?
volumes/fs/async_cloner.py: note: In function "handle_clone_in_progress":
volumes/fs/async_cloner.py:139: error: "OpSmException" has no attribute "error"; maybe "errno"?
Fixes: https://tracker.ceph.com/issues/44393
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit 7fa4da445650fd4a9f799c48ba513cb6f1a0d26c)
Michael Fritch [Tue, 3 Mar 2020 15:22:48 +0000 (08:22 -0700)]
mgr/volumes: remove unneeded assignment to `NoneType`
fixes mypy error:
volumes/fs/operations/versions/__init__.py: note: In member "get_subvolume_object" of class "SubvolumeLoader":
volumes/fs/operations/versions/__init__.py:70: error: Incompatible types in assignment (expression has type "None", variable has type "SubvolumeBase")
Fixes: https://tracker.ceph.com/issues/44393
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit a09af07cb69fbca68f08b5740036cf52aa6f3c27)
Michael Fritch [Tue, 3 Mar 2020 15:21:59 +0000 (08:21 -0700)]
mgr/volumes: add missing OpSmException import
fixes mypy error:
volumes/fs/operations/versions/__init__.py: note: In member "upgrade_legacy_subvolume" of class "SubvolumeLoader":
volumes/fs/operations/versions/__init__.py:56: error: Name 'OpSmException' is not defined
Fixes: https://tracker.ceph.com/issues/44393
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit d5c1a7cbd5e800950495b4e70a83a8becf2f0abd)
Michael Fritch [Tue, 3 Mar 2020 15:21:54 +0000 (08:21 -0700)]
mgr/volumes: add missing error code
fixes mypy error:
volumes/fs/operations/versions/__init__.py: note: In member "_load_supported_versions" of class "SubvolumeLoader":
volumes/fs/operations/versions/__init__.py:35: error: Too few arguments for "VolumeException"
Fixes: https://tracker.ceph.com/issues/44393
Signed-off-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit 22b4cb9405051b6ea17f76dac85a117b1ab34a41)