git-server-git.apps.pok.os.sepia.ceph.com Git - cephmetrics.git/log
cephmetrics.git
8 years agoAdd a note about waiting to collect data 136/head
Zack Cerza [Wed, 4 Oct 2017 21:23:41 +0000 (15:23 -0600)]
Add a note about waiting to collect data

Users might initially be confused that immediately after deployment, the
dashboard looks broken. This is because it doesn't yet have the data it
needs to function.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoMerge pull request #135 from ceph/wip-sanity
Zack Cerza [Wed, 4 Oct 2017 22:09:36 +0000 (16:09 -0600)]
Merge pull request #135 from ceph/wip-sanity

Check that all hosts are referred to by FQDN

8 years agoSkip ansible-lint on 'systemctl show' 135/head
Zack Cerza [Wed, 4 Oct 2017 21:18:09 +0000 (15:18 -0600)]
Skip ansible-lint on 'systemctl show'

The linter complains about this, but ansible doesn't have the required
functionality here.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoCheck that all hosts are referred to by FQDN
Zack Cerza [Tue, 3 Oct 2017 20:25:14 +0000 (14:25 -0600)]
Check that all hosts are referred to by FQDN

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoSkip linting on sed call
Zack Cerza [Wed, 4 Oct 2017 21:14:26 +0000 (15:14 -0600)]
Skip linting on sed call

lineinfile won't necessarily work well for this.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoMerge pull request #132 from ceph/wip-rhsm
Boris Ranto [Tue, 3 Oct 2017 18:26:44 +0000 (20:26 +0200)]
Merge pull request #132 from ceph/wip-rhsm

ansible: Do not enable rhsm repos

Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
8 years agoansible: Do not enable rhsm repos 132/head
Boris Ranto [Mon, 2 Oct 2017 09:29:01 +0000 (11:29 +0200)]
ansible: Do not enable rhsm repos

We are shipping as a regular product and as such, we cannot enable
additional repos via rhsm. The customers might not even have these repos
installed (especially the storage console repo might not be available to
them).

If we need any of the packages from these repos, we need to cross-ship
them in our product (as we already do downstream).

Signed-off-by: Boris Ranto <branto@redhat.com>
8 years agoMerge pull request #129 from ceph/wip-remove-refs-to-prod-repo
Zack Cerza [Fri, 29 Sep 2017 16:17:38 +0000 (10:17 -0600)]
Merge pull request #129 from ceph/wip-remove-refs-to-prod-repo

ansible: Remove references to old prod repo

8 years agoMerge pull request #125 from ceph/wip-collectd-vars
Zack Cerza [Thu, 28 Sep 2017 16:08:53 +0000 (10:08 -0600)]
Merge pull request #125 from ceph/wip-collectd-vars

Fix undefined collectd vars

8 years agoMerge pull request #128 from ceph/wip-prod-repos
Zack Cerza [Thu, 28 Sep 2017 16:08:33 +0000 (10:08 -0600)]
Merge pull request #128 from ceph/wip-prod-repos

Add support for custom repos

8 years agoMerge pull request #126 from ceph/selinux-fix
Boris Ranto [Thu, 28 Sep 2017 07:56:17 +0000 (09:56 +0200)]
Merge pull request #126 from ceph/selinux-fix

selinux policy update

8 years agoselinux: Update the policy 126/head
Boris Ranto [Wed, 27 Sep 2017 11:56:21 +0000 (13:56 +0200)]
selinux: Update the policy

This policy update should cover all the newly found avc denials.

Signed-off-by: Boris Ranto <branto@redhat.com>
8 years agoansible: Remove references to old prod repo 129/head
Boris Ranto [Wed, 27 Sep 2017 13:31:07 +0000 (15:31 +0200)]
ansible: Remove references to old prod repo

This fixes the build failure after the patch that removed the original
repo file.

Signed-off-by: Boris Ranto <branto@redhat.com>
8 years agoAdd support for custom repos 128/head
Zack Cerza [Mon, 25 Sep 2017 23:49:58 +0000 (17:49 -0600)]
Add support for custom repos

This is to enable testing packages built from non-master branches with
devel_mode=False

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoMerge pull request #124 from ceph/wip-remove-prod-repo
Zack Cerza [Tue, 26 Sep 2017 00:03:32 +0000 (18:03 -0600)]
Merge pull request #124 from ceph/wip-remove-prod-repo

ansible: Remove old production repo

8 years agoFix undefined collectd vars 125/head
Zack Cerza [Fri, 22 Sep 2017 21:32:43 +0000 (15:32 -0600)]
Fix undefined collectd vars

We regressed by moving the definition of a couple of these vars. Set
them in a separate task file to avoid this in the future.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoansible: Remove old production repo 124/head
Boris Ranto [Fri, 22 Sep 2017 09:20:12 +0000 (11:20 +0200)]
ansible: Remove old production repo

We no longer use download.ceph.com as a production repo. We should
be removing instead of installing it.

Signed-off-by: Boris Ranto <branto@redhat.com>
8 years agoMerge pull request #121 from ceph/fix-rgw
pcuzner [Tue, 19 Sep 2017 00:15:31 +0000 (12:15 +1200)]
Merge pull request #121 from ceph/fix-rgw

rgw: update glob pattern used to detect rgw sockets (issue #115 )

8 years agorgw: update glob pattern used to detect rgw sockets 121/head
Paul Cuzner [Tue, 19 Sep 2017 00:10:26 +0000 (12:10 +1200)]
rgw: update glob pattern used to detect rgw sockets

8 years agoMerge pull request #120 from ceph/wip-poolname
pcuzner [Thu, 14 Sep 2017 23:04:25 +0000 (11:04 +1200)]
Merge pull request #120 from ceph/wip-poolname

Mon._get_df_stats(): Sanitize pool names

8 years agoMon._get_df_stats(): Sanitize pool names 120/head
Zack Cerza [Thu, 14 Sep 2017 22:03:00 +0000 (16:03 -0600)]
Mon._get_df_stats(): Sanitize pool names

_get_pool_stats() already replaces '.' with '_'; let's copy that
behavior

Signed-off-by: Zack Cerza <zack@redhat.com>
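
A minimal sketch of the sanitizing step described above, assuming the motivation is that Graphite treats '.' as a metric path separator. The function names and the shape of the `pools` mapping are hypothetical and not taken from the cephmetrics source.

```python
def sanitize_pool_name(name: str) -> str:
    """Replace '.' with '_' so a pool name stays a single metric path component."""
    return name.replace(".", "_")


def sanitize_df_stats(pools: dict) -> dict:
    """Return a copy of a {pool_name: stats} mapping with sanitized keys.

    The mapping shape is an assumption made for this illustration.
    """
    return {sanitize_pool_name(name): stats for name, stats in pools.items()}
```

With this, a pool called `default.rgw.buckets.data` would be reported as `default_rgw_buckets_data`, matching what the message says `_get_pool_stats()` already does.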
8 years agoMerge pull request #114 from ceph/wip-health-flap
pcuzner [Thu, 14 Sep 2017 21:46:34 +0000 (09:46 +1200)]
Merge pull request #114 from ceph/wip-health-flap

alert-status: Use max() for health alert condition

8 years agoMerge pull request #119 from ceph/db-changes
Zack Cerza [Thu, 14 Sep 2017 21:10:38 +0000 (15:10 -0600)]
Merge pull request #119 from ceph/db-changes

dashboard changes

8 years agoat-a-glance: MDS and OSDS panel changes 119/head
Paul Cuzner [Tue, 12 Sep 2017 23:59:04 +0000 (11:59 +1200)]
at-a-glance: MDS and OSDS panel changes

the MDS panel has been changed to report more detail using the status-panel
plugin (replacing singlestat).

The OSDs panel queries have been changed to better calculate the out/down
values of OSDs

8 years agoiscsi-overview: multiple panel fixes
Paul Cuzner [Tue, 12 Sep 2017 23:57:41 +0000 (11:57 +1200)]
iscsi-overview: multiple panel fixes

Values were shown incorrectly in environments where the iscsi config had
been dropped and recreated. This update addresses issues in the following
panels; path summary, unused LUNs, defined capacity. In addition the
client charts only show entries for clients with i/o or load > 0.

8 years agonetwork/backend: dashboards updated to use the network interfaces by prefix
Paul Cuzner [Tue, 12 Sep 2017 23:55:20 +0000 (11:55 +1200)]
network/backend: dashboards updated to use the network interfaces by prefix

Beforehand, the network stats queries used *, and therefore included metrics for
'lo' etc., which was observed to skew reporting in some environments.
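
The dashboards do this selection in their Graphite queries; the Python below only illustrates the same idea, selecting interfaces by name prefix instead of matching everything. The function name and the example prefixes are assumptions.

```python
def select_interfaces(names, prefixes):
    """Keep only interface names that start with one of the given prefixes."""
    wanted = tuple(prefixes)
    return [name for name in names if name.startswith(wanted)]


# A wildcard would include 'lo'; a prefix filter leaves it out.
interfaces = ["lo", "eth0", "eth1", "docker0"]
selected = select_interfaces(interfaces, ["eth"])
```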

8 years agoMerge pull request #116 from ceph/wip-graphite-tz
pcuzner [Tue, 12 Sep 2017 21:56:51 +0000 (09:56 +1200)]
Merge pull request #116 from ceph/wip-graphite-tz

Set timezone in Graphite's config

8 years agoMerge pull request #117 from ceph/wip-tb
pcuzner [Tue, 12 Sep 2017 21:54:08 +0000 (09:54 +1200)]
Merge pull request #117 from ceph/wip-tb

Tell collectd to log any Python tracebacks

8 years agoTell collectd to log any Python tracebacks 117/head
Zack Cerza [Thu, 27 Jul 2017 18:48:45 +0000 (11:48 -0700)]
Tell collectd to log any Python tracebacks

So that users can generate more useful bug reports.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoSet timezone in Graphite's config 116/head
Zack Cerza [Thu, 7 Sep 2017 20:52:06 +0000 (14:52 -0600)]
Set timezone in Graphite's config

Oddly, graphite defaults to using a hardcoded timezone rather than the
system's. This throws off queries, so let's configure it.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoalert-status: Use max() for health alert condition 114/head
Zack Cerza [Thu, 7 Sep 2017 20:09:47 +0000 (14:09 -0600)]
alert-status: Use max() for health alert condition

last() appears to invite flapping.

Signed-off-by: Zack Cerza <zack@redhat.com>
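
To illustrate why last() can flap where max() does not, here is a small sketch that evaluates an alert condition over successive windows of samples. The numeric health encoding (0 for OK, 4 for a warning) and the threshold are assumptions made for the example, not values taken from the dashboard.

```python
def last(values):
    """Reducer that only looks at the most recent sample in the window."""
    return values[-1]


def alert_fires(samples, reducer, threshold):
    """Apply a reducer to the non-null samples and compare to the threshold."""
    values = [v for v in samples if v is not None]
    if not values:
        return False
    return reducer(values) >= threshold


# Three consecutive evaluation windows over a health metric that briefly
# reports OK (0) between warnings (4). Values are hypothetical.
windows = [[0, 4, 4], [4, 4, 0], [4, 0, 4]]

with_last = [alert_fires(w, last, 1) for w in windows]
with_max = [alert_fires(w, max, 1) for w in windows]
```

Here `with_last` alternates between firing and clearing, while `with_max` stays in the alerting state for as long as any sample in the window is unhealthy.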
8 years agoMerge pull request #111 from ceph/wip-health-db
pcuzner [Wed, 6 Sep 2017 20:16:22 +0000 (08:16 +1200)]
Merge pull request #111 from ceph/wip-health-db

Don't forget to deploy the new health dashboard

8 years agoMerge pull request #112 from ceph/wip-df-stats
pcuzner [Wed, 6 Sep 2017 20:15:51 +0000 (08:15 +1200)]
Merge pull request #112 from ceph/wip-df-stats

Collect more pool stats, and display capacity in dashboard

8 years agoAdd pool capacity chart to ceph-cluster.json 112/head
Zack Cerza [Tue, 5 Sep 2017 22:31:45 +0000 (16:31 -0600)]
Add pool capacity chart to ceph-cluster.json

This chart will be helpful if the user wants to inspect pool capacity
over time, as opposed to the current state.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoAdd pool capacity to ceph-pools dashboard
Zack Cerza [Wed, 30 Aug 2017 19:25:29 +0000 (13:25 -0600)]
Add pool capacity to ceph-pools dashboard

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoceph-grafana: Optionally update alert dashboard
Zack Cerza [Fri, 1 Sep 2017 16:16:53 +0000 (10:16 -0600)]
ceph-grafana: Optionally update alert dashboard

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agodashUpdater.py: Allow updating alert dashboard
Zack Cerza [Thu, 31 Aug 2017 16:00:12 +0000 (10:00 -0600)]
dashUpdater.py: Allow updating alert dashboard

This may end up only being useful for testing, but it seems important to
have the option to update the alert dashboard.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoAdd alert for pool capacity
Zack Cerza [Fri, 1 Sep 2017 15:47:43 +0000 (09:47 -0600)]
Add alert for pool capacity

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoceph-grafana: Add dashboards tag
Zack Cerza [Wed, 30 Aug 2017 18:06:08 +0000 (12:06 -0600)]
ceph-grafana: Add dashboards tag

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agomon: Add 'ceph df' pool stats
Zack Cerza [Mon, 28 Aug 2017 21:12:58 +0000 (15:12 -0600)]
mon: Add 'ceph df' pool stats

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoMerge pull request #110 from ceph/wip-eventurl
Zack Cerza [Tue, 29 Aug 2017 18:18:47 +0000 (12:18 -0600)]
Merge pull request #110 from ceph/wip-eventurl

Set EventURL in cephmetrics.conf

8 years agoDon't forget to deploy the new health dashboard 111/head
Zack Cerza [Tue, 29 Aug 2017 18:17:53 +0000 (12:17 -0600)]
Don't forget to deploy the new health dashboard

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoSet EventURL in cephmetrics.conf 110/head
Zack Cerza [Fri, 25 Aug 2017 22:37:50 +0000 (16:37 -0600)]
Set EventURL in cephmetrics.conf

So that events get submitted to graphite

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoMove some defaults into a cephmetrics-common role
Zack Cerza [Fri, 25 Aug 2017 22:36:41 +0000 (16:36 -0600)]
Move some defaults into a cephmetrics-common role

This is so that ceph-collectd and ceph-grafana can share some defaults

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoMerge pull request #108 from ceph/support-svc-restarts
Zack Cerza [Thu, 24 Aug 2017 22:37:28 +0000 (16:37 -0600)]
Merge pull request #108 from ceph/support-svc-restarts

Support svc restarts

8 years agoMerge pull request #109 from ceph/wip-pipelining
Zack Cerza [Thu, 24 Aug 2017 22:36:57 +0000 (16:36 -0600)]
Merge pull request #109 from ceph/wip-pipelining

Use SSH pipelining

8 years agoUse SSH pipelining 109/head
Zack Cerza [Thu, 24 Aug 2017 22:27:58 +0000 (16:27 -0600)]
Use SSH pipelining

See http://docs.ansible.com/ansible/latest/become.html#becoming-an-unprivileged-user

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agomon: simplify the admin_socket read logic 108/head
Paul Cuzner [Wed, 23 Aug 2017 21:14:44 +0000 (09:14 +1200)]
mon: simplify the admin_socket read logic

The initial commit placed logic in each area that called the admin
socket. This patch separates the admin socket call out to a separate
method, so it gets checked in one place.

Some tidy up and comments added too.

8 years agorgw: add warning when >1 rgw admin socket is detected
Paul Cuzner [Wed, 23 Aug 2017 21:11:52 +0000 (09:11 +1200)]
rgw: add warning when >1 rgw admin socket is detected

8 years agorgw: look for the admin_socket on each call
Paul Cuzner [Wed, 23 Aug 2017 02:58:34 +0000 (14:58 +1200)]
rgw: look for the admin_socket on each call

The admin_socket name for rgw is not fixed, unlike mon/osds. Therefore
to account for svc restarts and name changes the socket name is
determined at each get_stats cycle. If the socket isn't there, the
collector just passes back the version of radosgw to the caller and
will send stats again once a socket is detected on the host
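
A rough sketch of looking up the rgw admin socket on every cycle, including the warning for more than one match mentioned in the commit above. The glob pattern, the logger name, and the choice to use the first match in sorted order are all assumptions. The actual pattern used by the collector is not shown in this log.

```python
import glob
import logging

logger = logging.getLogger("rgw_sketch")  # hypothetical logger name

# Assumed pattern; the real collector's pattern is not given in the log.
DEFAULT_PATTERN = "/var/run/ceph/*rgw*.asok"


def find_rgw_socket(pattern=DEFAULT_PATTERN):
    """Return the rgw admin socket path for this cycle, or None if absent."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        # No socket right now (e.g. the service is restarting); try next cycle.
        return None
    if len(matches) > 1:
        logger.warning("%d rgw admin sockets found, using %s",
                       len(matches), matches[0])
    return matches[0]
```

Because the lookup runs on each call, a restarted radosgw that comes back with a different socket name is picked up without restarting the collector.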

8 years agomon: account for null dict from _admin_socket
Paul Cuzner [Wed, 23 Aug 2017 02:56:37 +0000 (14:56 +1200)]
mon: account for null dict from _admin_socket

the _admin_socket method could return a null dict if the
socket is not there (i.e. ceph-mon is down). By checking for the
empty dict, the collector can remain active while ceph-mon is
stopped and restarted during normal maintenance processes on a
host.
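
The empty-dict check might look roughly like the following. The `admin_socket` callable stands in for the `_admin_socket` method, and the `"mon"` key and return shape are assumptions for illustration only.

```python
def read_mon_stats(admin_socket):
    """Return mon stats for this cycle, or None when the socket gave nothing.

    admin_socket: a callable returning a dict, or {} when the socket is
    missing (hypothetical stand-in for the collector's _admin_socket).
    """
    response = admin_socket()
    if not response:
        # ceph-mon is probably stopped; skip this cycle instead of raising,
        # so the collector stays active until the daemon returns.
        return None
    return response.get("mon", {})
```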

8 years agoiscsi: trigger stats only when iscsi is active
Paul Cuzner [Wed, 23 Aug 2017 02:54:15 +0000 (14:54 +1200)]
iscsi: trigger stats only when iscsi is active

look for the iscsi dir in sysfs to determine when to
send the iscsi stats. If the iscsi base dir is not there
the collector will just send the version of gwcli
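
As a sketch of that gating logic: only the version is reported unless the iscsi directory exists. The sysfs path below is an assumption (the usual LIO configfs location), and the function and key names are hypothetical.

```python
import os

# Assumed location of the LIO iscsi directory; not confirmed by this log.
ISCSI_BASE_DIR = "/sys/kernel/config/target/iscsi"


def iscsi_stats(gwcli_version, collect, base_dir=ISCSI_BASE_DIR):
    """Always report the gwcli version; add iscsi stats only when active.

    collect: a callable returning a dict of iscsi metrics (hypothetical).
    """
    stats = {"gwcli_version": gwcli_version}
    if os.path.isdir(base_dir):
        stats.update(collect())
    return stats
```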

8 years agocommon: catch stderr output into the stdout pipe
Paul Cuzner [Wed, 23 Aug 2017 02:52:33 +0000 (14:52 +1200)]
common: catch stderr output into the stdout pipe

Ensure stdout and stderr output is returned to the caller
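
One common way to do this with the standard library is to redirect stderr into the stdout pipe, as sketched below. The helper name is hypothetical, and this is not necessarily how the cephmetrics helper is written.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return (returncode, combined stdout+stderr text)."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the stdout pipe
        universal_newlines=True,   # return text rather than bytes
    )
    return proc.returncode, proc.stdout


# Example: both streams end up in the returned output.
rc, output = run_command(
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr)"]
)
```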

8 years agocephmetrics: simplified the probe logic
Paul Cuzner [Wed, 23 Aug 2017 02:51:52 +0000 (14:51 +1200)]
cephmetrics: simplified the probe logic

probe logic changed in the base class, so the code here can
change to take advantage

8 years agobase: detect ceph role based on installed binaries
Paul Cuzner [Wed, 23 Aug 2017 02:51:02 +0000 (14:51 +1200)]
base: detect ceph role based on installed binaries

In addition, get_version was simplified, and the _admin_socket method logic was
tightened up to account for missing socket files.
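
Role detection from installed binaries could be sketched as follows. The role-to-binary mapping and the paths are assumptions, and the `exists` parameter is there only to make the sketch easy to exercise without a ceph installation.

```python
import os

# Hypothetical mapping of role name to the binary that implies it.
ROLE_BINARIES = {
    "mon": "/usr/bin/ceph-mon",
    "osd": "/usr/bin/ceph-osd",
    "rgw": "/usr/bin/radosgw",
}


def detect_roles(role_binaries=ROLE_BINARIES, exists=os.path.exists):
    """Return the sorted list of roles whose binary is present on this host."""
    return sorted(role for role, path in role_binaries.items() if exists(path))
```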

8 years agoMerge pull request #100 from ceph/event-info
Zack Cerza [Tue, 22 Aug 2017 18:37:14 +0000 (12:37 -0600)]
Merge pull request #100 from ceph/event-info

Add bluestore performance metrics and enhanced ceph health information

8 years agoMerge pull request #103 from ceph/wip-ubuntu-gweb
Zack Cerza [Tue, 22 Aug 2017 18:34:31 +0000 (12:34 -0600)]
Merge pull request #103 from ceph/wip-ubuntu-gweb

ansible: Use graphite-web instead of graphite-api on Ubuntu

8 years agominor fixes during PR review 100/head
Paul Cuzner [Tue, 22 Aug 2017 00:08:56 +0000 (12:08 +1200)]
minor fixes during PR review

8 years agoceph-cluster: fix queries for rgw and iscsi version tables
Paul Cuzner [Mon, 21 Aug 2017 04:58:04 +0000 (16:58 +1200)]
ceph-cluster: fix queries for rgw and iscsi version tables

8 years agodashboard query update to filter out old OSDs
Paul Cuzner [Mon, 21 Aug 2017 04:57:16 +0000 (16:57 +1200)]
dashboard query update to filter out old OSDs

Old OSDs will still exist in the TSDB, and could show as out or down.
The update uses transformNull to pick out osds with null values and
filter them out of the results shown.

8 years agobase: handle admin_socket connection failures gracefully
Paul Cuzner [Mon, 21 Aug 2017 04:52:22 +0000 (16:52 +1200)]
base: handle admin_socket connection failures gracefully

Connection errors are reported to the parent object and the cephmetrics
log file

8 years agorgw: handle the scenario where the socket doesn't respond
Paul Cuzner [Mon, 21 Aug 2017 04:51:18 +0000 (16:51 +1200)]
rgw: handle the scenario where the socket doesn't respond

The perf dump to the socket could fail, so this change handles this
scenario by just reporting the issue avoiding connectionerror
problems

8 years agonode-detail: minor fix to template for osd_ids
Paul Cuzner [Mon, 21 Aug 2017 01:38:48 +0000 (13:38 +1200)]
node-detail: minor fix to template for osd_ids

osd_id template had to be updated to avoid issues with the move of
num_osds and ceph_version metrics.

8 years agorelationships diagram updated to reflect new ceph-health dashboard
Paul Cuzner [Mon, 21 Aug 2017 01:36:52 +0000 (13:36 +1200)]
relationships diagram updated to reflect new ceph-health dashboard

8 years agoceph-pools: fix yaxis in pool iops chart
Paul Cuzner [Mon, 21 Aug 2017 01:36:29 +0000 (13:36 +1200)]
ceph-pools: fix yaxis in pool iops chart

8 years agoosd-information: fixes for null entries and time windows used on pie-charts
Paul Cuzner [Mon, 21 Aug 2017 01:36:01 +0000 (13:36 +1200)]
osd-information: fixes for null entries and time windows used on pie-charts

OSDs that fail result in nulls in the data series, so the queries were updated to
account for these gaps. In addition, a time window of 2 mins is used to
restrict the observations that have to be grabbed from graphite for the pie charts.

8 years agoceph-cluster: health info removed, ceph version tables added
Paul Cuzner [Mon, 21 Aug 2017 01:33:59 +0000 (13:33 +1200)]
ceph-cluster: health info removed, ceph version tables added

Exploits the updates to the base collector class, which now provides the ceph
version; this is presented in table form, by role.

8 years agoMerge pull request #97 from ceph/wip-branto
Gregory Meno [Fri, 18 Aug 2017 17:39:13 +0000 (10:39 -0700)]
Merge pull request #97 from ceph/wip-branto

rpm: Fix spec file source versioning

8 years agorpm: Fix spec file source versioning 97/head
Boris Ranto [Wed, 16 Aug 2017 11:11:00 +0000 (13:11 +0200)]
rpm: Fix spec file source versioning

Signed-off-by: Boris Ranto <branto@redhat.com>
8 years agoansible: Use graphite-web instead of graphite-api on Ubuntu 103/head
David Galloway [Tue, 15 Aug 2017 21:17:02 +0000 (17:17 -0400)]
ansible: Use graphite-web instead of graphite-api on Ubuntu

Because of https://github.com/brutasse/graphite-api/issues/222

Signed-off-by: David Galloway <dgallowa@redhat.com>
8 years agoMerge pull request #95 from ceph/iscsi-fix
Zack Cerza [Tue, 8 Aug 2017 22:42:05 +0000 (15:42 -0700)]
Merge pull request #95 from ceph/iscsi-fix

iscsi.py: fix to defer the import of rtslib_fb

8 years agoiscsi.py: fix to defer the import of rtslib_fb 95/head
Paul Cuzner [Tue, 8 Aug 2017 21:48:55 +0000 (09:48 +1200)]
iscsi.py: fix to defer the import of rtslib_fb

the goal of the parent module cephmetrics is to be generic across the
different ceph roles. By deferring the import of rtslib to the instantiation
of the first (and only!) ISCSIGateway object cephmetrics can import this
iscsi module without a problem regardless of the runtime environment.
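
The deferred import pattern described here can be sketched like this. Only the class name and the `rtslib_fb` module name come from the commit message. The `module_name` parameter and the attribute name are assumptions added so the sketch runs on hosts without rtslib installed.

```python
import importlib


class ISCSIGateway:
    """Sketch of a collector that imports rtslib only when instantiated."""

    def __init__(self, module_name="rtslib_fb"):
        # Importing here, not at module top level, means the parent module
        # can import this file on hosts where rtslib is not installed.
        self.rtslib = importlib.import_module(module_name)
```

Defining or importing the class never touches rtslib, so a missing library only surfaces as an ImportError when a gateway object is actually created.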

8 years agoMerge pull request #92 from ceph/iscsi-support
Zack Cerza [Tue, 8 Aug 2017 18:10:13 +0000 (11:10 -0700)]
Merge pull request #92 from ceph/iscsi-support

Add iSCSI gateway support

8 years agoiscsi.py: updates to address code review comments 92/head
Paul Cuzner [Mon, 7 Aug 2017 23:44:16 +0000 (11:44 +1200)]
iscsi.py: updates to address code review comments

8 years agoiscsi-overview: minor fixes and rename of the iscsi_gateways variable
Paul Cuzner [Mon, 7 Aug 2017 23:43:44 +0000 (11:43 +1200)]
iscsi-overview: minor fixes and rename of the iscsi_gateways variable

The client configuration panel was not excluding null entries, so when
rbds were unmasked from clients and not reused, they would still show up
in the table.

In addition the templating variable iscsi_gateway was renamed to iscsi_gateways
aligning to the naming of the osd_servers and rgw_servers

8 years agodashboard.yml : updated to show the variable needed by the iscsi dashboard
Paul Cuzner [Mon, 7 Aug 2017 23:41:50 +0000 (11:41 +1200)]
dashboard.yml : updated to show the variable needed by the iscsi dashboard

the iscsi_gateways templating variable is used to generate the graphs,
so for iscsi based deployments this variable will need to be defined to
ensure the queries work correctly in grafana

8 years agoMerge pull request #88 from ceph/wip-centos
Zack Cerza [Mon, 7 Aug 2017 22:12:24 +0000 (15:12 -0700)]
Merge pull request #88 from ceph/wip-centos

Support CentOS 7 in devel_mode

8 years agocephmetrics.py: updated to detect and collect stats for iSCSI gateways
Paul Cuzner [Mon, 7 Aug 2017 00:20:10 +0000 (12:20 +1200)]
cephmetrics.py: updated to detect and collect stats for iSCSI gateways

The probe method now looks for the sysfs kernel entries that denote an
iscsi gateway is running on the node. When this dir is found an instance
of the iscsi collector (ISCSIGateway) is created and polled during
every read callback.

8 years agodashboard.yml: updated to include the iscsi-overview dashboard
Paul Cuzner [Mon, 7 Aug 2017 00:17:54 +0000 (12:17 +1200)]
dashboard.yml: updated to include the iscsi-overview dashboard

8 years agoiscsi.py : iscsi collector added to create stats for iSCSI gateways
Paul Cuzner [Mon, 7 Aug 2017 00:17:30 +0000 (12:17 +1200)]
iscsi.py : iscsi collector added to create stats for iSCSI gateways

The iscsi module interacts with rtslib_fb to extract LIO configuration
settings to create performance and configuration metrics.

8 years agoiscsi-overview: New dashboard to monitor iscsi gateway nodes
Paul Cuzner [Mon, 7 Aug 2017 00:15:59 +0000 (12:15 +1200)]
iscsi-overview: New dashboard to monitor iscsi gateway nodes

8 years agoMerge pull request #90 from ceph/dashboard-updates
Zack Cerza [Fri, 4 Aug 2017 04:18:26 +0000 (21:18 -0700)]
Merge pull request #90 from ceph/dashboard-updates

Dashboard updates

8 years agoosd-information: minor fixes for larger environments 90/head
Paul Cuzner [Thu, 3 Aug 2017 04:53:57 +0000 (16:53 +1200)]
osd-information: minor fixes for larger environments

In a 600+ OSD environment the charts were based on averageSeries which
was taking a long time. This has now been changed, so the comparison
chart only shows current values for a given OSD for comparison

8 years agonetwork-usage : fix yaxis units on mon/rgw and osd graphs
Paul Cuzner [Thu, 3 Aug 2017 04:20:46 +0000 (16:20 +1200)]
network-usage : fix yaxis units on mon/rgw and osd graphs

Initial commit did not use the right metric - missed due to low load, more
obvious on larger systems!

8 years agoat-a-glance: link updates to reflect dashboard name changes
Paul Cuzner [Wed, 2 Aug 2017 02:20:41 +0000 (14:20 +1200)]
at-a-glance: link updates to reflect dashboard name changes

In addition to the link updates
- scrub state reflects both scrub and deep-scrub status
- the scrub panel no longer turns to "warn" when scrub is active - it's
  a natural feature of the cluster and not a problem! However, it will
  turn red, if scrub or deep-scrub is disabled
- disk util panel changed to a graph and switched to average and 95%ile
  (95%ile on a busy cluster just shows too much variance)
- OSDs panel now links to the ceph-osd-information dashboard

8 years agoceph-backend-storage: rows reorganized and additional host tables added
Paul Cuzner [Wed, 2 Aug 2017 02:16:16 +0000 (14:16 +1200)]
ceph-backend-storage: rows reorganized and additional host tables added

OSD capacity by Host table added to the summary row, showing capacity by
host

Performance data split into two rows, allowing each one to be hidden
independently

8 years agoceph-osd-information : osd dashboard now provides summary and performance data
Paul Cuzner [Wed, 2 Aug 2017 02:13:00 +0000 (14:13 +1200)]
ceph-osd-information : osd dashboard now provides summary and performance data

Summary row shows;
- osd count
- osd up count
- osd's down
- disk size summary (pie chart showing what sizes of disk are in the cluster)
- table of osd to disk size
- OSD encryption summary (how many of my OSDs are encrypted?)
- OSD type status (how many OSDs are filestore vs bluestore)

Panel includes an OSD id which is used as a filter for the filestore
performance row

The performance row now shows average OSD performance for a single OSD or
all OSDs. This can then be used for side-by-side comparison with OSD
performance across the cluster at the 95%ile.

8 years agoceph-pools: rename of the dashboard that provides pool performance data
Paul Cuzner [Wed, 2 Aug 2017 02:08:59 +0000 (14:08 +1200)]
ceph-pools: rename of the dashboard that provides pool performance data

8 years agoosd-node-detail : multiple changes to better show OSD level metrics
Paul Cuzner [Wed, 2 Aug 2017 02:08:24 +0000 (14:08 +1200)]
osd-node-detail : multiple changes to better show OSD level metrics

changes include;
- support for osd and disk name to provide filters to the graphs
- add osd overview row containing
  - raw capacity panel
  - host.disk -> disk size table
  - host.disk -> osd id
- shortcut links added to other overview type dashboards

8 years agoreplacement version for ceph-rados dashboard
Paul Cuzner [Wed, 2 Aug 2017 02:04:47 +0000 (14:04 +1200)]
replacement version for ceph-rados dashboard

Renamed to ceph-cluster to better reflect the metrics and data shown on
the dashboard

8 years agodiagram updated to reflect name changes
Paul Cuzner [Wed, 2 Aug 2017 02:03:48 +0000 (14:03 +1200)]
diagram updated to reflect name changes

8 years agoremoved dashboards
Paul Cuzner [Wed, 2 Aug 2017 02:03:31 +0000 (14:03 +1200)]
removed dashboards

8 years agoMerge pull request #89 from ceph/wip-branto
Zack Cerza [Wed, 2 Aug 2017 01:12:39 +0000 (18:12 -0700)]
Merge pull request #89 from ceph/wip-branto

rpm: Drop collectors dependency for main package

8 years agorpm: Drop collectors dependency for main package 89/head
Boris Ranto [Tue, 1 Aug 2017 20:29:39 +0000 (22:29 +0200)]
rpm: Drop collectors dependency for main package

Signed-off-by: Boris Ranto <branto@redhat.com>
8 years agoMerge pull request #87 from ceph/dmcrypt-support
Zack Cerza [Mon, 31 Jul 2017 22:08:12 +0000 (15:08 -0700)]
Merge pull request #87 from ceph/dmcrypt-support

Dmcrypt support

8 years agoMention CentOS 7 support 88/head
Zack Cerza [Fri, 28 Jul 2017 22:39:50 +0000 (15:39 -0700)]
Mention CentOS 7 support

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoceph-grafana: Require PyYAML on yum systems
Zack Cerza [Fri, 28 Jul 2017 22:38:30 +0000 (15:38 -0700)]
ceph-grafana: Require PyYAML on yum systems

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years agoosd: add % used to each OSD 87/head
Paul Cuzner [Fri, 28 Jul 2017 08:02:45 +0000 (20:02 +1200)]
osd: add % used to each OSD

Percent used was not available within the osd metric tree, only under the
physical disk. With its inclusion under the osd, queries can reference
percent_used directly by osd_id.

8 years agoosd: support device-mapper (dmcrypt) osd's/journals
Paul Cuzner [Fri, 28 Jul 2017 02:28:19 +0000 (14:28 +1200)]
osd: support device-mapper (dmcrypt) osd's/journals

dmcrypt osd/journals make use of /dev/mapper devices, so this change
supports the device mapper naming for the device links. In addition,
all disks (osd/jrnl metrics) have additional metrics; "osd_type" and
"encrypted" to help understand the status of the OSDs within the cluster.

8 years agocommon: Add osd type and encrypted state to disk objects
Paul Cuzner [Fri, 28 Jul 2017 02:23:24 +0000 (14:23 +1200)]
common: Add osd type and encrypted state to disk objects

osd_type is defined as; 0 - filestore, 1 - bluestore
encrypted is defined as; 0 - off, 1 - on
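
The two numeric encodings above could be expressed as small enums, as in the sketch below. The class and function names are hypothetical. Only the 0/1 values and the metric names osd_type and encrypted come from the commit messages.

```python
from enum import IntEnum


class OsdType(IntEnum):
    FILESTORE = 0
    BLUESTORE = 1


class Encrypted(IntEnum):
    OFF = 0
    ON = 1


def disk_flags(objectstore: str, is_encrypted: bool) -> dict:
    """Build the numeric metrics for one disk from descriptive inputs."""
    return {
        "osd_type": int(OsdType[objectstore.upper()]),
        "encrypted": int(Encrypted.ON if is_encrypted else Encrypted.OFF),
    }
```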