Paul Cuzner [Fri, 30 Jun 2017 02:05:33 +0000 (14:05 +1200)]
at-a-glance: multiple fixes to mon/osd/growth and forecast panels
MON/OSD panel queries updated to address the interpolation
problem where floats were shown. The OSD panel now also shows
total OSDs
Templating update for the disk_full_threshold (2->80)
Growth/Forecast panel queries updated to account for data coming
from multiple mons
Health panel updated to show as RED when the cluster is in an
ERROR state
Paul Cuzner [Thu, 29 Jun 2017 04:50:30 +0000 (16:50 +1200)]
at-a-glance: pg status pie chart changes
A degraded state is now shown, based on the difference between pg_active
and pg_active_clean. This intermediate metric has been added to the pie
chart so it shows: active+clean, degraded and peering.
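As a rough illustration of the derivation described above, the sketch below builds the three pie segments from PG counts. It is a minimal sketch, not the dashboard's actual query: the function name pg_pie_segments and the pg_peering input are hypothetical, and only the degraded = pg_active - pg_active_clean step comes from the commit.

```python
def pg_pie_segments(pg_active: int, pg_active_clean: int, pg_peering: int) -> dict:
    """Build the three pie-chart segments from PG state counts (illustrative only)."""
    # Degraded is derived rather than collected: PGs that are active but not clean.
    # Clamped at zero in case the two gauges are sampled at slightly different
    # moments (an assumption, not something the commit states).
    degraded = max(pg_active - pg_active_clean, 0)
    return {
        "active+clean": pg_active_clean,
        "degraded": degraded,
        # Hypothetical input: the commit does not say where this count comes from.
        "peering": pg_peering,
    }


print(pg_pie_segments(pg_active=120, pg_active_clean=112, pg_peering=3))
# {'active+clean': 112, 'degraded': 8, 'peering': 3}
```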
Paul Cuzner [Thu, 29 Jun 2017 03:15:58 +0000 (15:15 +1200)]
network-usage: dashboard updated to track enX interface stats
Graphite doesn't support blacklisting in queries, so the interface names
we're interested in have to be whitelisted. This fix now tracks enX, ethX
and bondX interface names.
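Because only whitelisted names are tracked, the effective filter behaves like the Python check below. The dashboard itself does this in its Graphite query paths; the regular expression and the tracked_interfaces name are assumptions used only to show which interface names get through and which are ignored.

```python
import re

# Hypothetical whitelist for the enX, ethX and bondX families named in the commit.
# Anything not matched (lo, docker0, virbr0, veth*, ...) is simply never queried.
TRACKED_IFACE = re.compile(r"^(en|eth|bond)\w*$")


def tracked_interfaces(names):
    """Return only the interface names that match the whitelist, in input order."""
    return [name for name in names if TRACKED_IFACE.match(name)]


print(tracked_interfaces(["lo", "eth0", "enp3s0", "bond0", "docker0", "virbr0"]))
# ['eth0', 'enp3s0', 'bond0']
```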
Paul Cuzner [Thu, 29 Jun 2017 03:12:54 +0000 (15:12 +1200)]
ceph-rados: display fixes to several charts
The health history chart was a mess with the light theme. It now uses threshold lines
(amber and red) and plots health against those lines. In addition, small fixes were made to
the capacity chart (it was stacking values!) and the monitor status table.
Paul Cuzner [Thu, 29 Jun 2017 03:09:01 +0000 (15:09 +1200)]
backend-storage: cosmetic changes to heatmap and graphs
The heatmap 'spectrum' did not show well in the light theme; this change
makes it more readable. In addition, 'info' has been added to explain the
heatmap and what it represents.
Paul Cuzner [Thu, 29 Jun 2017 03:04:17 +0000 (15:04 +1200)]
at-a-glance: query and cosmetic changes
Multiple changes as follows:
- dashboard links (top) hover-over was misaligned; fixed
- forecast value if negative now shows N/A
- forecast and growth queries updated
- disks near full now set to 0 if there aren't any issues (instead of no value)
- added descriptions on various panels
Paul Cuzner [Thu, 29 Jun 2017 02:53:43 +0000 (14:53 +1200)]
status-panel: Update the background color to support the light theme
By default the panel just uses 'green', which is too dark with the light
theme, making the text on the panel difficult to read. This Ansible step
updates the color in the CSS to make the text more readable.
Boris Ranto [Mon, 26 Jun 2017 12:30:38 +0000 (14:30 +0200)]
collectors: Pass through keyword arguments
We need this to pass the log_level keyword argument through to the base
class. Otherwise, collectd will fail because these classes receive the
unknown argument log_level.
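A minimal sketch of the failure and the fix, assuming a hypothetical BaseCollector that owns the log_level handling. The class names are illustrative, not the cephmetrics ones; the point is only that a subclass with a fixed signature rejects log_level, while one that forwards **kwargs lets it reach the base class.

```python
import logging


class BaseCollector:
    """Hypothetical base class: it is the one that understands log_level."""

    def __init__(self, cluster_name, log_level="info"):
        self.cluster_name = cluster_name
        self.logger = logging.getLogger(type(self).__name__)
        self.logger.setLevel(log_level.upper())


class StrictCollector(BaseCollector):
    """Before the fix: a fixed signature raises TypeError on log_level."""

    def __init__(self, cluster_name):
        super().__init__(cluster_name)


class PassThroughCollector(BaseCollector):
    """After the fix: keyword arguments are forwarded to the base class."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


try:
    StrictCollector("ceph", log_level="debug")
except TypeError as exc:
    print("fails:", exc)

collector = PassThroughCollector("ceph", log_level="debug")
print(collector.logger.level == logging.DEBUG)
```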
Boris Ranto [Sun, 25 Jun 2017 08:13:51 +0000 (10:13 +0200)]
ansible: Write grafana config when grafana is down
We need to start and communicate with grafana only after we push our own
config to grafana. The old config can use different locations
(e.g. for the DB), and these won't get populated properly if we run
dashUpdater or post to the grafana API too early.
Paul Cuzner [Mon, 26 Jun 2017 05:33:45 +0000 (17:33 +1200)]
at-a-glance: various enhancements
Dashboard updates:
- initial light theme support - some colour changes on charts
- shuffled panel order so the rows cover overview, client and OS
- added disks near full panel
- added growth and forecast panels
- added OSD level ram usage summary
- PGs now shown using the Grafana Labs pie chart plugin
- count of rbds now shown (on client row)
- osd host count query changed
- queries on Mons and OSDs changed to mitigate state flapping
- 'buttons' changed to make them work with light/dark themes
Paul Cuzner [Mon, 26 Jun 2017 05:12:23 +0000 (17:12 +1200)]
mon: updated for logging and additional metrics collected for rbd and osd hosts
The collector class now scans the cluster to determine the count of RBDs. Each monitor
will pick a discrete set of pools to scan, so the overall load is shared across monitors.
In addition, since the osd tree command is used to determine the up/down state of
the OSDs, the same output is used to determine the number of OSD hosts in the
configuration. Prior to this change, the host count was inferred through a
Graphite query.
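A minimal sketch of deriving both values from a single osd tree dump, assuming JSON shaped like the output of 'ceph osd tree --format json': a "nodes" list whose entries carry a "type" and, for OSDs, a "status". The function name and the sample data are hypothetical; check the field names against your Ceph release.

```python
import json


def summarise_osd_tree(tree_json: str) -> dict:
    """Count OSD hosts and up/down OSDs from one osd tree JSON dump."""
    nodes = json.loads(tree_json).get("nodes", [])
    osds = [n for n in nodes if n.get("type") == "osd"]
    return {
        # Every host bucket in the tree counts as an OSD host.
        "osd_hosts": sum(1 for n in nodes if n.get("type") == "host"),
        "osds_up": sum(1 for n in osds if n.get("status") == "up"),
        "osds_down": sum(1 for n in osds if n.get("status") == "down"),
    }


# Hand-written sample, not real cluster output.
sample = json.dumps({"nodes": [
    {"id": -1, "name": "default", "type": "root"},
    {"id": -2, "name": "osd-node1", "type": "host"},
    {"id": -3, "name": "osd-node2", "type": "host"},
    {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
    {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
]})
print(summarise_osd_tree(sample))
# {'osd_hosts': 2, 'osds_up': 1, 'osds_down': 1}
```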
Paul Cuzner [Mon, 26 Jun 2017 05:07:50 +0000 (17:07 +1200)]
cephmetrics: Use a default value type for graphite, and assign default logging
Before this change, if a variable was defined in a class but NOT defined in its attributes,
the collector would fail. With this change, a default type of gauge is assigned.
In addition, a default logging level is set for all the collectors if one is not specified
by LogLevel in the collectd.conf plugin configuration.
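A minimal sketch of both defaults, assuming a hypothetical BaseCollector with a per-class metric_types mapping. The commit does not name the default log level, so "info" below is an assumption, as is the "derive" override used for contrast.

```python
import logging

DEFAULT_VALUE_TYPE = "gauge"
# Hypothetical default: the commit does not say which level is used.
DEFAULT_LOG_LEVEL = "info"


class BaseCollector:
    # Hypothetical per-class metadata: metric name -> value type sent to graphite.
    metric_types = {}

    def __init__(self, log_level=None):
        # Fall back to a default level when LogLevel is absent from collectd.conf.
        self.logger = logging.getLogger(type(self).__name__)
        self.logger.setLevel((log_level or DEFAULT_LOG_LEVEL).upper())

    def value_type(self, metric):
        # A metric with no entry gets "gauge" instead of raising KeyError.
        return self.metric_types.get(metric, DEFAULT_VALUE_TYPE)


class OSDCollector(BaseCollector):
    metric_types = {"op_count": "derive"}


c = OSDCollector()
print(c.value_type("op_count"), c.value_type("num_pgs"))
# derive gauge
```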
Paul Cuzner [Thu, 22 Jun 2017 21:16:08 +0000 (09:16 +1200)]
add home dashboard support
This change adds a _home_dashboard setting so that the grafana home dashboard
for the admin user can be set to the ceph-at-a-glance dashboard.
Paul Cuzner [Thu, 22 Jun 2017 02:53:57 +0000 (14:53 +1200)]
at-a-glance: fix for health panel, and colour match with status panel
The singlestat panel was using a value map, but singlestat appears to interpolate
the health value, so the value map is not applied and a number appears
on the dashboard. This update uses a range map to work around this nuance.
In addition, cosmetic changes were made to the health, disk and latency panels, making their warning
colour state match the status panel warning colour for consistency.
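To see why the range map helps, the sketch below compares an exact-match value map with a range map when the panel receives an interpolated value. It assumes the 0/4/8 health encoding given in the health history commit further down this log; the band boundaries and the half-open [low, high) comparison are choices made here, not necessarily how Grafana evaluates range maps.

```python
from typing import Optional

# Health encoding from the health history commit: 0=OK, 4=WARN, 8=ERROR.
HEALTH_VALUE_MAP = {0: "OK", 4: "WARN", 8: "ERROR"}

# Hypothetical range boundaries, chosen midway between the encoded values.
HEALTH_RANGE_MAP = [
    (float("-inf"), 2.0, "OK"),
    (2.0, 6.0, "WARN"),
    (6.0, float("inf"), "ERROR"),
]


def by_value_map(value: float) -> Optional[str]:
    # Exact match only: an interpolated 2.67 has no entry, so a raw number is shown.
    return HEALTH_VALUE_MAP.get(value)


def by_range_map(value: float) -> Optional[str]:
    # Any value inside a [low, high) band gets that band's label.
    for low, high, text in HEALTH_RANGE_MAP:
        if low <= value < high:
            return text
    return None  # only reached for NaN


print(by_value_map(2.67), by_range_map(2.67))
# None WARN
```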
Paul Cuzner [Tue, 20 Jun 2017 04:16:52 +0000 (16:16 +1200)]
dashboards updated to account for null/missing values
Prior to this change, the charts had "Display/Null Value" set to null, but in
an environment where observations/metrics arrive late or are missed, the
resulting chart would be partially populated at best and, at worst, blank, with
only the hover providing an indication that data points are present.
Ideally the data should be present, but by setting this to connected, null values
will not stop the graphs from being rendered.
Paul Cuzner [Mon, 19 Jun 2017 20:47:25 +0000 (08:47 +1200)]
at-a-glance: health history chart updated
The health history query led to interpolation of health values, so you'd see 1, 2 or 3 due to averaging. This
change updates the query to use the consolidateBy function, which should keep the values as intended: 0, 4 and 8,
representing OK, WARN and ERROR.
INSTALL instructions updated, since this chart problem was listed as a Known Issue
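The averaging effect can be reproduced with a small sketch. When a series has more points than the chart can draw, Graphite consolidates them, averaging by default, so a mix of 0 and 4 becomes 2. The commit does not say which function is passed to consolidateBy; 'max' is assumed here (in a target, something like consolidateBy(<health series>, 'max')), since it keeps every point on one of the encoded values. The consolidate helper is hypothetical and only mimics the bucketing.

```python
from statistics import mean


def consolidate(points, bucket, func):
    """Collapse each run of `bucket` raw points into one value using `func`."""
    return [func(points[i:i + bucket]) for i in range(0, len(points), bucket)]


health = [0, 4, 0, 4, 8, 8, 4, 0]  # 0=OK, 4=WARN, 8=ERROR

print(consolidate(health, 2, mean))  # [2, 2, 8, 2]: 2 is not a valid health state
print(consolidate(health, 2, max))   # [4, 4, 8, 4]: worst state in each bucket
```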