git-server-git.apps.pok.os.sepia.ceph.com Git - cephmetrics.git/log
Paul Cuzner [Thu, 24 May 2018 23:54:16 +0000 (11:54 +1200)]
Fix templating definitions
To prevent template init failures, the datasource
has been added to the template definitions
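The fix above is easier to see as data. Here is a minimal sketch of a Grafana query-type template variable that names its datasource explicitly. The datasource name `Prometheus`, the variable name and its `label_values(...)` query are hypothetical placeholders, not the values used in the cephmetrics dashboards.

```python
import json

# Hypothetical datasource name; use whatever name the Grafana datasource
# was actually provisioned with.
DATASOURCE = "Prometheus"


def template_var(name, query, datasource=DATASOURCE):
    """Return a minimal Grafana query-type template variable.

    The explicit "datasource" key tells Grafana which datasource to run
    the variable query against when the dashboard initialises.
    """
    return {
        "name": name,
        "type": "query",
        "datasource": datasource,
        "query": query,
        "refresh": 1,  # 1 = refresh when the dashboard loads
    }


# Hypothetical variable: OSD hostnames taken from a metadata metric.
dashboard = {
    "title": "Example",
    "templating": {
        "list": [template_var("osd_servers", "label_values(ceph_osd_metadata, hostname)")]
    },
}
print(json.dumps(dashboard["templating"], indent=2))
```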
Paul Cuzner [Thu, 24 May 2018 22:59:17 +0000 (10:59 +1200)]
Fixes to template variables, and mon_servers
The mon_servers query was hardcoded from testing, so
it now uses the ceph_mon_metadata metric to find
the names of the mons, allowing mon network traffic
to be split out (this assumes mons aren't colocated,
though!)
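The sketch below shows how mon names can be derived from ceph_mon_metadata instead of being hardcoded. It assumes the metric carries a `hostname` label (as exposed by the ceph-mgr prometheus module) and parses the Prometheus text exposition format with a simplified regex. The resulting alternation is the kind of value that could feed a `=~` matcher. The helper names and the sample data are hypothetical.

```python
import re

# One Prometheus text-format sample line, e.g.
#   ceph_mon_metadata{ceph_daemon="mon.a",hostname="mon1"} 1.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
# Simplified label parser; does not handle escaped quotes inside values.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def mon_hostnames(exposition_text):
    """Return the sorted, de-duplicated 'hostname' labels of ceph_mon_metadata."""
    hosts = set()
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None or match.group("name") != "ceph_mon_metadata":
            continue  # comments, blank lines and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "hostname" in labels:
            hosts.add(labels["hostname"])
    return sorted(hosts)


def mon_host_regex(hosts):
    """Alternation for a Prometheus =~ matcher (matchers are fully anchored).

    Hostnames are assumed to be plain names; a '.' in an FQDN would match
    any single character, which is harmless for this purpose.
    """
    return "|".join(hosts)


SAMPLE = """\
# TYPE ceph_mon_metadata untyped
ceph_mon_metadata{ceph_daemon="mon.a",hostname="mon1",rank="0"} 1.0
ceph_mon_metadata{ceph_daemon="mon.b",hostname="mon2",rank="1"} 1.0
ceph_osd_up{ceph_daemon="osd.0"} 1.0
"""

print(mon_host_regex(mon_hostnames(SAMPLE)))
```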
Paul Cuzner [Thu, 24 May 2018 22:57:28 +0000 (10:57 +1200)]
Fixes to template variables, and health panel
Template vars now declare the datasource name and
the health panel has been updated to reflect the
error codes from mgr correctly
Paul Cuzner [Thu, 24 May 2018 22:56:08 +0000 (10:56 +1200)]
Fix templating declaration and health chart
Templating needed to have the data source defined
to prevent template init failures. The health
history chart's y-axis wasn't big enough: when
the cluster is in an error state (err=2), the line
drawn by grafana was at the top of the graph and
only just visible!
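The y-axis problem can be sketched in a few lines. Setting the axis maximum above the highest health code keeps the err=2 line inside the plot area rather than on its top edge. The commit only confirms err=2; the codes for ok (0) and warn (1), the panel title and the `headroom` parameter are assumptions made for illustration.

```python
# Health codes: only err=2 is stated in the commit above; ok=0 and warn=1
# are assumed here.
HEALTH_OK, HEALTH_WARN, HEALTH_ERR = 0, 1, 2


def health_history_yaxes(headroom=1):
    """Left/right y-axes for a Grafana graph panel plotting the health code.

    Setting max above HEALTH_ERR keeps an err=2 line below the top edge of
    the plot instead of drawing it on the border.
    """
    return [
        {"format": "short", "min": HEALTH_OK, "max": HEALTH_ERR + headroom, "show": True},
        {"format": "short", "min": None, "max": None, "show": False},
    ]


# Hypothetical panel definition using the axes above.
panel = {"type": "graph", "title": "Health History", "yaxes": health_history_yaxes()}
print(panel["yaxes"][0])
```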
Zack Cerza [Wed, 23 May 2018 20:05:45 +0000 (14:05 -0600)]
Merge pull request #183 from ceph/dashboard-refresh
Dashboard refresh
Paul Cuzner [Tue, 22 May 2018 23:51:36 +0000 (11:51 +1200)]
Dashboard uid metadata field removed
The presence of uid in the dashboard definition is causing
compatibility issues with moving to Grafana v5.1. By removing the
uid entry, the dashboards still work and can be migrated to 5.1
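As a rough illustration, here is a minimal sketch of removing the `uid` field from a dashboard definition before import. The helper name is hypothetical. It handles both a bare dashboard object and one wrapped in a `{"dashboard": {...}}` envelope, because the commit does not say which shape the cephmetrics files use.

```python
import json


def strip_uid(dashboard_json):
    """Return dashboard JSON text with any top-level "uid" removed.

    Handles both a bare dashboard object and one wrapped in a
    {"dashboard": {...}} envelope.
    """
    doc = json.loads(dashboard_json)
    target = doc.get("dashboard", doc)
    if isinstance(target, dict):
        target.pop("uid", None)  # no-op if the key is absent
    return json.dumps(doc, indent=2, sort_keys=True)


print(strip_uid('{"uid": "abc123", "title": "Ceph Cluster"}'))
```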
Paul Cuzner [Tue, 22 May 2018 23:43:03 +0000 (11:43 +1200)]
include pg unknown in pg state table
Paul Cuzner [Tue, 22 May 2018 23:42:18 +0000 (11:42 +1200)]
Include pg_unknown in PG status pie chart
Paul Cuzner [Tue, 22 May 2018 23:14:11 +0000 (11:14 +1200)]
Add RAM usage to OSD Node detail dashboard
Paul Cuzner [Wed, 16 May 2018 08:41:16 +0000 (20:41 +1200)]
Changes to health history and inclusion of pg information
Paul Cuzner [Wed, 16 May 2018 07:54:04 +0000 (19:54 +1200)]
Added pg distribution and pg per osd tables
Paul Cuzner [Wed, 16 May 2018 04:04:20 +0000 (16:04 +1200)]
add alert for high pgnums on an OSD
Paul Cuzner [Tue, 22 May 2018 23:21:06 +0000 (11:21 +1200)]
Update prometheus source name (align with ansible deployment)
Paul Cuzner [Mon, 14 May 2018 04:17:09 +0000 (16:17 +1200)]
capacity units updated for consistency
Paul Cuzner [Mon, 14 May 2018 04:15:40 +0000 (16:15 +1200)]
Drop encrypted osd panel, add dev-class piechart
Boris Ranto [Tue, 22 May 2018 12:32:56 +0000 (14:32 +0200)]
Merge pull request #181 from zmc/wip-177
Work around #177
Reviewed-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Tue, 22 May 2018 12:25:39 +0000 (14:25 +0200)]
Merge pull request #185 from zmc/wip-grafana-fixes
Support Grafana 5.1.x
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Mon, 21 May 2018 22:56:10 +0000 (16:56 -0600)]
ceph-grafana: Restart container when creating
This is necessary so we can move to newer versions
of the container image.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Mon, 21 May 2018 22:56:00 +0000 (16:56 -0600)]
ceph-grafana: Use the correct UID for 5.1.x
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Mon, 21 May 2018 22:55:09 +0000 (16:55 -0600)]
ceph-grafana: Bump container version
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 17 May 2018 17:29:10 +0000 (11:29 -0600)]
Merge pull request #167 from zmc/wip-mon-db-host
Monitor the dashboard host
Zack Cerza [Thu, 17 May 2018 17:28:56 +0000 (11:28 -0600)]
Merge pull request #182 from zmc/wip-bluestore-selinux
ceph-collectd: Only set SEL context on filestore
Zack Cerza [Thu, 10 May 2018 22:55:05 +0000 (16:55 -0600)]
ceph-collectd: Set SEL context on bluestore
... as well as filestore.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 16 May 2018 22:35:22 +0000 (16:35 -0600)]
Merge pull request #175 from zmc/wip-classic-fixes
Fixes for classic mode
Zack Cerza [Mon, 14 May 2018 18:37:18 +0000 (12:37 -0600)]
Merge pull request #180 from zmc/fix-docker-tests
ceph-docker: Update tests for 'containerized' flag
Zack Cerza [Mon, 14 May 2018 18:36:59 +0000 (12:36 -0600)]
Merge pull request #173 from zmc/wip-cpu-quota
ansible: Set CPU/RAM quotas on containers
Zack Cerza [Thu, 10 May 2018 22:30:11 +0000 (16:30 -0600)]
ceph-docker: Update tests for 'containerized' flag
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 10 May 2018 19:16:41 +0000 (13:16 -0600)]
Work around #177
Boris Ranto [Thu, 10 May 2018 11:55:02 +0000 (13:55 +0200)]
Merge pull request #166 from zmc/wip-prom-pkg
Allow installing prometheus and node_exporter from packages
Reviewed-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 10 May 2018 08:57:19 +0000 (10:57 +0200)]
Merge pull request #176 from zmc/wip-fix-notification
ceph-grafana: Use 'admin_user', not 'user'
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Wed, 9 May 2018 19:15:42 +0000 (13:15 -0600)]
ceph-grafana: Use 'admin_user', not 'user'
We missed this when merging #165 & #170
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Mon, 30 Apr 2018 12:29:59 +0000 (14:29 +0200)]
tox.ini: Unpin ansible for ansible-syntax
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Sat, 21 Apr 2018 02:20:52 +0000 (21:20 -0500)]
node_exporter: Fix RPM name
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 20 Apr 2018 23:39:22 +0000 (18:39 -0500)]
ceph-prometheus: Allow installation via packages
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 8 May 2018 18:53:50 +0000 (12:53 -0600)]
ceph-docker: Fail if using classic mode
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 8 May 2018 18:48:08 +0000 (12:48 -0600)]
classic: Fix pie charts w/ grafana 5.0
When legends are off, we're seeing
"Invalid dimensions for plot, width = 179, height = 0"
Signed-off-by: Zack Cerza <zack@redhat.com>
Boris Ranto [Mon, 7 May 2018 08:37:48 +0000 (10:37 +0200)]
Merge pull request #170 from zmc/wip-notification
ceph-grafana: Add notification channel
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Fri, 4 May 2018 18:21:06 +0000 (12:21 -0600)]
ansible: Set RAM quotas on containers
We default to 4GB RAM, and we use double that value for swap.
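"Double that value for swap" is ambiguous. One reading maps it onto Docker's `--memory` and `--memory-swap` settings, which correspond to the `memory` and `memory_swap` parameters of Ansible's docker_container module. Docker treats `--memory-swap` as the combined memory plus swap limit. The sketch below takes that reading, which is an assumption. The function name is hypothetical.

```python
def memory_limits(memory_gb=4):
    """Return docker-style memory settings for a container.

    Docker treats --memory-swap as the combined memory + swap limit, so
    memory_swap = 2 * memory allows swap equal to the RAM limit.
    """
    return {
        "memory": "%dg" % memory_gb,
        "memory_swap": "%dg" % (2 * memory_gb),
    }


print(memory_limits())
```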
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 4 May 2018 18:04:13 +0000 (12:04 -0600)]
ansible: Set CPU quotas on containers
We want to default to limiting each container to two cores. While docker
has a nicer way to do this, it isn't supported by ansible as of 2.5.2.
It's easy enough to use the slightly more awkward method, however.
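The "nicer way" is presumably Docker's `--cpus` flag. The "slightly more awkward method" is presumably the CFS pair `--cpu-period`/`--cpu-quota`, exposed as `cpu_period` and `cpu_quota` in Ansible's docker_container module. Both are assumptions, since the commit does not name the flags. With Docker's default 100 ms period, `--cpus=N` corresponds to a quota of N times the period. The helper name is hypothetical.

```python
# Docker's default CFS scheduler period, in microseconds (100 ms).
DEFAULT_CFS_PERIOD_US = 100_000


def cpu_quota_for(cores, period_us=DEFAULT_CFS_PERIOD_US):
    """Return (cpu_period, cpu_quota) equivalent to `docker run --cpus=<cores>`.

    Within every period the container may use at most `quota` microseconds
    of CPU time across all cores, so quota = cores * period.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return period_us, int(cores * period_us)


print(cpu_quota_for(2))  # two cores, the default described above
```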
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 20 Apr 2018 23:38:46 +0000 (18:38 -0500)]
playbook.yml: Add node_exporter tag
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 4 May 2018 17:48:26 +0000 (11:48 -0600)]
Merge pull request #165 from ceph/wip-branto
Add a couple of ansible fixes
Zack Cerza [Wed, 2 May 2018 16:13:00 +0000 (10:13 -0600)]
ceph-grafana: Add notification channel
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 1 May 2018 18:00:00 +0000 (12:00 -0600)]
ceph-node-exporter: Open firewall ports
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 1 May 2018 17:59:31 +0000 (11:59 -0600)]
Add dashboard for monitoring the db host itself
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 1 May 2018 17:58:27 +0000 (11:58 -0600)]
ceph-prometheus: Add separate job for db host
So that we can monitor the db host's resources, without mixing that data
in with the ceph cluster's metrics.
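A separate scrape job keeps the metrics apart because Prometheus attaches the `job_name` to every scraped series as the `job` label. Dashboards can then select one job or the other. The sketch below builds such a `scrape_configs` section as a Python structure. The job names and target addresses are hypothetical, although 9283 and 9100 are the usual default ports of the ceph-mgr prometheus module and node_exporter.

```python
import json


def scrape_configs(ceph_targets, db_host_targets):
    """Build a Prometheus scrape_configs section with a dedicated db-host job.

    Each job's series get a distinct `job` label, so cluster dashboards can
    exclude the db host's own resource metrics.
    """
    return {
        "scrape_configs": [
            {"job_name": "ceph", "static_configs": [{"targets": list(ceph_targets)}]},
            {"job_name": "db-host", "static_configs": [{"targets": list(db_host_targets)}]},
        ]
    }


config = scrape_configs(["mgr1:9283", "osd1:9100"], ["metrics-host:9100"])
print(json.dumps(config, indent=2))
```

Because JSON is generally accepted by YAML parsers, output like this could in principle be written out as prometheus.yml. A template in the Ansible role is the more natural fit, however.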
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 1 May 2018 14:52:23 +0000 (08:52 -0600)]
playbook.yml: Deploy node_exporter on db host
Signed-off-by: Zack Cerza <zack@redhat.com>
Boris Ranto [Thu, 19 Apr 2018 22:03:22 +0000 (00:03 +0200)]
Add a sample group_vars file
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 19 Apr 2018 21:16:16 +0000 (23:16 +0200)]
Fix grafana admin user/password setting
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 19 Apr 2018 21:15:30 +0000 (23:15 +0200)]
Extend purge.yml
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 19 Apr 2018 20:36:42 +0000 (22:36 +0200)]
Set home dashboard
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 19 Apr 2018 19:20:33 +0000 (21:20 +0200)]
Strip away containerized_deployment
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 19 Apr 2018 19:12:08 +0000 (21:12 +0200)]
Merge containerized variables
This will also help us fix access to these variables in the ceph-docker
role, where they were not previously defined.
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 19 Apr 2018 19:00:58 +0000 (21:00 +0200)]
Merge pull request #160 from zmc/wip-prometheus-role
Prometheus support via ceph-mgr
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Wed, 18 Apr 2018 03:44:06 +0000 (21:44 -0600)]
ceph-prometheus: Rename container to 'prometheus'
This change requires some manual steps:
- docker rename cephmetrics-prometheus prometheus
- mv /etc/systemd/system/cephmetrics-prometheus.service /etc/systemd/system/prometheus.service
- systemctl daemon-reload
And finally, rerun the playbook.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 18 Apr 2018 18:59:24 +0000 (12:59 -0600)]
ceph-grafana: Use light theme
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 17 Apr 2018 20:26:48 +0000 (14:26 -0600)]
ceph-grafana: Ship a minimal grafana.ini
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 17 Apr 2018 16:38:57 +0000 (10:38 -0600)]
ceph-grafana: Use v5.0.4 for containers
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 13 Apr 2018 22:43:17 +0000 (16:43 -0600)]
ceph-grafana: Use ansible's to_json filter
'tojson' is part of jinja2 2.9, which isn't in RHEL.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 12 Apr 2018 22:39:24 +0000 (16:39 -0600)]
ceph-grafana: Fix dashboard_dir in devel_mode
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 12 Apr 2018 16:35:49 +0000 (10:35 -0600)]
Move devel_packages out of defaults
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 29 Mar 2018 19:26:32 +0000 (13:26 -0600)]
tox: Use py.test -v for integration tests
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 29 Mar 2018 00:12:42 +0000 (18:12 -0600)]
ceph-graphite: Fix test_metrics_present
If a cluster changed over time, the test broke. When looking at stored
metrics, assert that our inventory hosts are a subset of the stored
metrics as opposed to doing an equality check.
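The difference between the two checks can be shown with a small py.test-style helper. The function name and host names are hypothetical. The point is that a subset check tolerates stale hosts that remain in the metric store after the cluster changes, whereas an equality check fails on them.

```python
def assert_metrics_present(inventory_hosts, stored_hosts):
    """Pass if every inventory host has stored metrics.

    A subset check tolerates extra hosts left over in the metric store
    from an earlier shape of the cluster, which an equality check would not.
    """
    missing = set(inventory_hosts) - set(stored_hosts)
    assert not missing, "no stored metrics for: %s" % ", ".join(sorted(missing))


# Passes: 'old-osd' was removed from the cluster but its metrics remain stored.
assert_metrics_present(["mon1", "osd1"], ["mon1", "osd1", "old-osd"])
```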
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 28 Mar 2018 22:43:47 +0000 (16:43 -0600)]
ceph-graphite: Skip irrelevant integration tests
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 22 Feb 2018 21:52:58 +0000 (14:52 -0700)]
Merge pull request #154 from jeanchlopez/master
Fix a typo in the purge.yml Ansible playbook file
Boris Ranto [Mon, 15 Jan 2018 18:42:14 +0000 (19:42 +0100)]
Merge pull request #156 from ceph/wip-req-req
rpm: Require python-requests
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
Boris Ranto [Fri, 12 Jan 2018 10:34:28 +0000 (11:34 +0100)]
rpm: Require python-requests
Signed-off-by: Boris Ranto <branto@redhat.com>
Jean-Charles Lopez [Wed, 13 Dec 2017 13:10:34 +0000 (10:10 -0300)]
Merge pull request #1 from jeanchlopez/jeanchlopez-patch-purge-playbook-1
Fix a typo in purge.yml
Jean-Charles Lopez [Wed, 13 Dec 2017 13:00:24 +0000 (10:00 -0300)]
Fix a typo in purge.yml
Author JC Lopez
Fix: replace /var/lig/graphite-web with /var/lib/graphite-web
Boris Ranto [Wed, 8 Nov 2017 20:38:44 +0000 (21:38 +0100)]
Merge pull request #149 from ceph/wip-centos
ansible: Fixes for CentOS deployments
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Thu, 26 Oct 2017 21:16:15 +0000 (15:16 -0600)]
Add python-requests to yum packages
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 26 Oct 2017 19:04:09 +0000 (13:04 -0600)]
ansible: Work with old Django versions
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 26 Oct 2017 15:28:29 +0000 (09:28 -0600)]
Merge pull request #148 from zmc/wip-fixes
Minor dashboard fixes
Zack Cerza [Wed, 25 Oct 2017 16:54:42 +0000 (10:54 -0600)]
ceph-rgw-workload.json: Set editable=False
BZ1506132
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 25 Oct 2017 16:54:18 +0000 (10:54 -0600)]
osd-node-detail.json: Don't hardcode cluster name
BZ1506200
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 24 Oct 2017 17:47:43 +0000 (11:47 -0600)]
Merge pull request #144 from ceph/wip-testinfra
Add some integration tests using testinfra
Boris Ranto [Tue, 24 Oct 2017 07:34:43 +0000 (09:34 +0200)]
Merge pull request #147 from ceph/wip-libsemanage-python
Require libsemanage-python on collector nodes
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Fri, 20 Oct 2017 20:34:54 +0000 (14:34 -0600)]
Require libsemanage-python on collector nodes
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 17 Oct 2017 15:15:28 +0000 (09:15 -0600)]
Add some integration tests using testinfra
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 12 Oct 2017 02:55:13 +0000 (20:55 -0600)]
Merge pull request #141 from ceph/wip-fix-none-osd-colld
collectors: No update on fetch osd_stats failure
Boris Ranto [Wed, 11 Oct 2017 19:43:36 +0000 (21:43 +0200)]
Merge pull request #143 from ceph/wip-pyyaml
Ensure PyYAML is installed on the Grafana node
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Wed, 11 Oct 2017 16:34:33 +0000 (10:34 -0600)]
Ensure PyYAML is installed on the Grafana node
It's needed by dashUpdater.py. BZ1497198
Signed-off-by: Zack Cerza <zack@redhat.com>
Boris Ranto [Wed, 11 Oct 2017 10:18:03 +0000 (12:18 +0200)]
Merge pull request #142 from ceph/ui-bz-fixes
Ui bz fixes
Reviewed-by: Boris Ranto <branto@redhat.com>
Paul Cuzner [Wed, 11 Oct 2017 04:16:42 +0000 (17:16 +1300)]
osd-information: disk size summary timings and metrics updated
The query interval was updated so that all samples are present on larger
configurations, and the OSD size table units were updated. BZ1496186
Paul Cuzner [Wed, 11 Oct 2017 04:14:25 +0000 (17:14 +1300)]
osd-node-detail: updated to correct disk units
All disk units are now shown as decimal rather than binary values. In
addition, the help text on the raw capacity panel has been updated to better
explain how the value is derived. BZ1496186
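The gap between decimal and binary units grows with size, which is why a 4 TB disk appears as roughly 3.64 TiB. Here is a small formatter that shows both. The function name and unit spellings are illustrative only, not the Grafana unit identifiers used in the dashboards.

```python
def format_bytes(n, binary=False):
    """Format a byte count in decimal (powers of 1000) or binary (1024) units."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB", "PiB"] if binary
             else ["B", "kB", "MB", "GB", "TB", "PB"])
    value = float(n)
    for unit in units:
        # Stop at the first unit that keeps the value below the base,
        # or at the largest unit available.
        if abs(value) < base or unit == units[-1]:
            return "%.2f %s" % (value, unit)
        value /= base


print(format_bytes(4_000_000_000_000))               # decimal
print(format_bytes(4_000_000_000_000, binary=True))  # binary
```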
Paul Cuzner [Wed, 11 Oct 2017 04:12:53 +0000 (17:12 +1300)]
ceph-pools : sort and selection fixes
sortByMaxima added to queries and filter (pool_name) added to the
recovery pool overview panel to work the same as the others. BZ1499734
Paul Cuzner [Wed, 11 Oct 2017 04:11:02 +0000 (17:11 +1300)]
at-a-glance: updates to address OSD UP state issues
OSD Hosts and RGW Hosts panels changed to the Vonage status panel to better
reflect UP states. BZ1498504
Zack Cerza [Tue, 10 Oct 2017 23:05:50 +0000 (17:05 -0600)]
Merge pull request #139 from ceph/wip-packaging
Finalize packaging changes
Boris Ranto [Tue, 10 Oct 2017 22:24:27 +0000 (00:24 +0200)]
rpm: Move the rpm patch off the gitroot
Signed-off-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Mon, 9 Oct 2017 16:07:17 +0000 (10:07 -0600)]
Merge pull request #140 from ceph/wip-vonage-color
Tweak Vonage panel color in newer versions
Boris Ranto [Sat, 7 Oct 2017 09:28:09 +0000 (11:28 +0200)]
collectors: No update on fetch osd_stats failure
We currently do not check that we were able to successfully fetch the
osd_stats for a given osd_id, which can result in collectors being down
for a while. We should check that we were able to fetch them before we
actually try to update them.
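The guard described above amounts to skipping an OSD when its stats could not be fetched, rather than letting the failure break the whole collection cycle. In this sketch, `fetch_osd_stats` is a hypothetical callable that returns a dict, or None on failure. It is not the real collector API, and the stat name is made up.

```python
import logging

log = logging.getLogger(__name__)


def update_osd_stats(osd_ids, fetch_osd_stats, store):
    """Update `store` only for OSDs whose stats were fetched successfully.

    fetch_osd_stats(osd_id) is a hypothetical helper that returns a dict of
    stats, or None when the fetch failed.
    """
    updated = 0
    for osd_id in osd_ids:
        stats = fetch_osd_stats(osd_id)
        if stats is None:
            # Skip this OSD for this cycle instead of failing the collector.
            log.warning("could not fetch osd_stats for osd.%s, skipping", osd_id)
            continue
        store[osd_id] = stats
        updated += 1
    return updated


fetched = {0: {"example_stat": 2}, 1: None}
store = {}
print(update_osd_stats([0, 1], fetched.get, store), store)
```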
Signed-off-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Fri, 6 Oct 2017 19:36:05 +0000 (13:36 -0600)]
Tweak Vonage panel color in newer versions
The plugin recently added a system for configurable colors, breaking our
earlier hack. Add a new one.
See: https://github.com/Vonage/Grafana_Status_panel/commit/8633ba23bb2f336342ba3f7002a07be597b6e816
Signed-off-by: Zack Cerza <zack@redhat.com>
Boris Ranto [Fri, 6 Oct 2017 17:34:58 +0000 (19:34 +0200)]
rpm: ipaddr ansible filter requires python-netaddr
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Thu, 5 Oct 2017 20:39:17 +0000 (22:39 +0200)]
ansible: Disable devel_mode
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Fri, 6 Oct 2017 10:00:25 +0000 (12:00 +0200)]
ansible: Fix merge_vars.yml
Currently, we override the variables in merge_vars. However, if we run
the script several times (e.g. when a host is both a grafana and a
collectd node), then vars[item] is defined but may not be a mapping. In
this case we redefine the value to an empty string, breaking the actual
values of the variables.
This way we redefine e.g. devel_mode or use_epel to an empty string,
making them false on a node that is both the grafana server and a
collectd node.
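As a sketch of the intended behaviour, here is a small Python model of the merge logic, not the actual merge_vars.yml tasks. Running the merge a second time must leave a value unchanged, and a non-mapping value such as a boolean must be kept as-is rather than replaced with an empty string. The function name is hypothetical. Treating None as "undefined" is a simplification.

```python
def merge_var(current, defaults):
    """Merge role defaults into a variable without clobbering existing values.

    - undefined (None): take the defaults
    - both mappings: defaults overlaid with the current values
    - current is not a mapping (e.g. a boolean like devel_mode): keep it
    """
    if current is None:
        return dict(defaults) if isinstance(defaults, dict) else defaults
    if isinstance(current, dict) and isinstance(defaults, dict):
        merged = dict(defaults)
        merged.update(current)
        return merged
    return current


defaults = {"version": "5", "port": 3000}
first = merge_var(None, defaults)
second = merge_var(first, defaults)  # a second run must not change the value
print(first == second, merge_var(True, defaults))
```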
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Fri, 6 Oct 2017 07:16:40 +0000 (09:16 +0200)]
Merge pull request #138 from ceph/dboard-fixes
dashboard updates to address minor usability issues
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Fri, 6 Oct 2017 01:54:45 +0000 (19:54 -0600)]
Merge pull request #137 from ceph/wip-graphite-migrate
Fix graphite migration
Paul Cuzner [Fri, 6 Oct 2017 01:37:40 +0000 (14:37 +1300)]
dashboard updates to address minor usability issues
ceph-at-a-glance: Capacity util panel now links directly to the capacity
chart in the ceph-cluster dashboard. BZ1496198
ceph-osd-information: query timespan changed from the default 1hr to 1m,
aligning with the OSDs panel query on the 'at-a-glance' dashboard.
BZ1498504
Zack Cerza [Thu, 5 Oct 2017 19:43:57 +0000 (13:43 -0600)]
Use --fake-initial for migrations if necessary
Migrations won't work correctly if the db already exists. This manifests
in an error like:
django.db.utils.OperationalError: table "django_content_type" already exists
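This is a minimal sketch of the "if necessary" decision. It adds Django's `--fake-initial` option to `graphite-manage migrate` only when the SQLite database already contains the `django_content_type` table. The database path and helper name are hypothetical, and the sketch assumes the SQLite backend. It only builds the command; running it (for example with `subprocess.check_call`) is left to the caller.

```python
import contextlib
import os
import sqlite3


def migrate_command(db_path):
    """Return the migrate command, adding --fake-initial for an existing db."""
    cmd = ["graphite-manage", "migrate"]
    # Check existence first: sqlite3.connect() would otherwise create an
    # empty database file as a side effect.
    if not os.path.exists(db_path):
        return cmd
    with contextlib.closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            ("django_content_type",),
        ).fetchone()
    if row is not None:
        cmd.append("--fake-initial")
    return cmd


print(migrate_command("/nonexistent/graphite.db"))  # hypothetical path
```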
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 5 Oct 2017 19:42:50 +0000 (13:42 -0600)]
Use 'graphite-manage migrate' instead of 'syncdb'
syncdb is deprecated.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 4 Oct 2017 22:17:21 +0000 (16:17 -0600)]
Merge pull request #136 from ceph/wip-db-wait
Add a note about waiting to collect data