git-server-git.apps.pok.os.sepia.ceph.com Git - cephmetrics.git/log
jjict [Fri, 29 Mar 2019 12:35:37 +0000 (13:35 +0100)]
Make SMTP, anonymous login and theme configurable
Zack Cerza [Wed, 5 Jun 2019 23:52:55 +0000 (17:52 -0600)]
Merge pull request #234 from marcosmamorim/pull_images
Prevent pull images for prometheus and grafana
Boris Ranto [Wed, 5 Jun 2019 15:15:53 +0000 (17:15 +0200)]
Merge pull request #236 from zmc/fix-lint
Fix issues raised by newer ansible-lint versions
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 23:32:36 +0000 (17:32 -0600)]
ansible-lint: Use delegate_to: localhost
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 23:08:45 +0000 (17:08 -0600)]
ansible-lint: Ignore pipefail warning
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 23:06:22 +0000 (17:06 -0600)]
ansible-lint: Don't compare to empty string
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 23:04:27 +0000 (17:04 -0600)]
ansible-lint: Allow a tab in this particular line
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 22:58:49 +0000 (16:58 -0600)]
Replace improperly copied file w/ symlink
The rest of the roles symlink this file; this was a simple oversight.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 22:57:35 +0000 (16:57 -0600)]
ansible-lint: Ignore these lines' length
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 22:57:05 +0000 (16:57 -0600)]
ansible-lint: Ignore missing galaxy_info
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 4 Jun 2019 22:48:13 +0000 (16:48 -0600)]
Merge pull request #235 from servesha/wip-bug-fix
firewalld: added port
Servesha Dudhgaonkar [Tue, 23 Apr 2019 06:39:48 +0000 (12:09 +0530)]
firewalld: added port
Signed-off-by: Servesha Dudhgaonkar <sdudhgao@redhat.com>
Marcos Amorim [Thu, 4 Apr 2019 18:21:10 +0000 (14:21 -0400)]
Prevent pull images for prometheus and grafana
This patch adds a new variable for prometheus and for grafana to allow installation when the images have already been pulled by docker.
Signed-off-by: Marcos Amorim <mamorim@redhat.com>
Zack Cerza [Tue, 5 Feb 2019 23:40:16 +0000 (16:40 -0700)]
Merge pull request #229 from ceph/dashboard-diagrams
Update dashboard relationship diagrams
Paul Cuzner [Thu, 31 Jan 2019 22:33:35 +0000 (11:33 +1300)]
Update dashboard relationship diagrams
The relationships have been updated and an svg
file included for future changes
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Zack Cerza [Thu, 13 Dec 2018 23:49:38 +0000 (16:49 -0700)]
Merge pull request #228 from ceph/node-detail-fix
Fix Host breakdown and disk graphs
Paul Cuzner [Thu, 13 Dec 2018 22:13:24 +0000 (11:13 +1300)]
Fix Host breakdown and disk graphs
Host breakdown was hitting duplicate labels, and
the disk graphs needed a filter to only show the
disk stats for disks that relate to the host's
OSDs.
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Zack Cerza [Tue, 9 Oct 2018 15:10:38 +0000 (09:10 -0600)]
Merge pull request #225 from ceph/wip-cluster-osds
dashboards: Fix cluster OSDs panel
Boris Ranto [Tue, 18 Sep 2018 21:31:14 +0000 (23:31 +0200)]
dashboards: Fix cluster OSDs panel
We still use the old 'id' label for getting the number of OSDs in the
ceph cluster dashboard. However, the label was superseded by the
'ceph_daemon' label in one of the updates to the prometheus exporter in
ceph-mgr. This patch changes the query to what we use in other
dashboards -- i.e. 'ceph_daemon' instead of 'id'.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627725
Signed-off-by: Boris Ranto <branto@redhat.com>
pcuzner [Tue, 28 Aug 2018 19:44:38 +0000 (07:44 +1200)]
Merge pull request #224 from zmc/wip-205
dashboards: Don't filter out 10GbE iface names
Zack Cerza [Tue, 28 Aug 2018 19:41:08 +0000 (12:41 -0700)]
Merge pull request #222 from ceph/fix-health-status
Fix health state colors
Zack Cerza [Mon, 27 Aug 2018 22:43:37 +0000 (15:43 -0700)]
dashboards: Don't filter out 10GbE iface names
https://github.com/ceph/cephmetrics/issues/205
Signed-off-by: Zack Cerza <zack@redhat.com>
pcuzner [Fri, 24 Aug 2018 02:07:55 +0000 (14:07 +1200)]
Merge pull request #212 from zmc/wip-prom-etc-hosts
ceph-prometheus: Optionally add /etc/hosts entries
pcuzner [Fri, 24 Aug 2018 02:07:20 +0000 (14:07 +1200)]
Merge pull request #219 from ceph/wip-erro-panel
dashboards: Add ceph error panel to alert status dashboard
Paul Cuzner [Fri, 24 Aug 2018 01:59:27 +0000 (13:59 +1200)]
Fix health state colors
Two dashboards were translating the values from
ceph_health_status incorrectly, resulting in
the wrong health state being shown.
Closes: https://github.com/ceph/cephmetrics/issues/202
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Boris Ranto [Tue, 21 Aug 2018 10:34:03 +0000 (12:34 +0200)]
dashboards: Add error panel+alerting
Signed-off-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Thu, 2 Aug 2018 21:24:37 +0000 (14:24 -0700)]
ceph-prometheus: Optionally add /etc/hosts entries
This only supports containerized deployments.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 14 Aug 2018 03:09:05 +0000 (20:09 -0700)]
Merge pull request #215 from zmc/wip-nexp-service-name
ceph-node-exporter: Fix defaults for service_name
Zack Cerza [Tue, 14 Aug 2018 03:08:18 +0000 (20:08 -0700)]
Merge pull request #216 from zmc/wip-mgr-firewall
Open ports 9283 and 9090
Zack Cerza [Wed, 8 Aug 2018 21:19:07 +0000 (14:19 -0700)]
ceph-prometheus: Open port 9090
https://github.com/ceph/cephmetrics/issues/214
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 7 Aug 2018 18:32:59 +0000 (11:32 -0700)]
ceph-mgr: Open port 9283
https://github.com/ceph/cephmetrics/issues/213
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 7 Aug 2018 05:51:26 +0000 (22:51 -0700)]
ceph-node-exporter: Fix defaults for service_name
Indentation was off.
Signed-off-by: Zack Cerza <zack@redhat.com>
Boris Ranto [Tue, 31 Jul 2018 18:13:00 +0000 (20:13 +0200)]
Merge pull request #211 from ceph/wip-mgr-optimize
Only run ceph-mgr role on a single node
Reviewed-by: Zack Cerza <zcerza@redhat.com>
Zack Cerza [Tue, 31 Jul 2018 16:43:46 +0000 (09:43 -0700)]
Merge pull request #210 from ceph/wip-cluster-name
ansible: Support non-default cluster name
Boris Ranto [Sat, 28 Jul 2018 12:02:28 +0000 (14:02 +0200)]
Only run ceph-mgr role on a single node
We do not need to run the ceph-mgr role multiple times. The command that
enables the module only needs to be run on one of the machines; we choose
the first as it is the easiest.
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Fri, 27 Jul 2018 23:48:19 +0000 (01:48 +0200)]
ansible: Support non-default cluster name
Signed-off-by: Boris Ranto <branto@redhat.com>
pcuzner [Thu, 26 Jul 2018 02:33:47 +0000 (14:33 +1200)]
Merge pull request #209 from ceph/wip-branto
Downstream fixes
Boris Ranto [Wed, 25 Jul 2018 16:16:02 +0000 (18:16 +0200)]
osd-node-detail: Fix value repetition
We did not sum the values for RAM usage so we ended up with a couple of
entries being shown for each RAM usage query.
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Wed, 25 Jul 2018 16:12:38 +0000 (18:12 +0200)]
ansible: Fix service_name indentation
The service_name for node_exporter was not indented properly.
Signed-off-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Mon, 16 Jul 2018 15:57:09 +0000 (09:57 -0600)]
Merge pull request #203 from ceph/wip-rpm-patch
rpm: use_epel is no longer defined
Boris Ranto [Mon, 16 Jul 2018 07:30:32 +0000 (09:30 +0200)]
Merge pull request #208 from ceph/fix-throughput-units
Fix units used for throughput
Reviewed-by: Boris Ranto <branto@redhat.com>
Paul Cuzner [Fri, 13 Jul 2018 00:24:35 +0000 (12:24 +1200)]
Fix units used on throughput charts
Applies the same change to the prometheus based
dashboards as commit
239afda4debc80544112fe98d31af3f692678f57 did for
the graphite based dashboards
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
pcuzner [Thu, 12 Jul 2018 23:51:24 +0000 (11:51 +1200)]
Merge pull request #204 from ceph/osd-info-updates
Multiple fixes to OSD information dashboard
Paul Cuzner [Thu, 12 Jul 2018 23:49:33 +0000 (11:49 +1200)]
Fix units used for throughput
Throughput units were using binary representation,
whereas the expected unit is decimal (i.e.
MB/s not MiB/s)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1496186
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
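To illustrate the distinction this commit draws, here is a minimal sketch of decimal versus binary scaling for a byte rate. The sample rate and function names are hypothetical and are not taken from the dashboards.

```python
# Illustrative only: decimal (MB/s) versus binary (MiB/s) scaling of a byte rate.
MB = 1000 ** 2   # decimal megabyte: 1,000,000 bytes
MIB = 1024 ** 2  # binary mebibyte: 1,048,576 bytes


def to_mb_per_s(bytes_per_s):
    """Scale a byte rate to decimal megabytes per second."""
    return bytes_per_s / MB


def to_mib_per_s(bytes_per_s):
    """Scale a byte rate to binary mebibytes per second."""
    return bytes_per_s / MIB


rate = 500_000_000  # hypothetical sample value, in bytes per second
print(f"{to_mb_per_s(rate):.2f} MB/s")    # decimal unit, as the fix expects
print(f"{to_mib_per_s(rate):.2f} MiB/s")  # binary unit, as previously shown
```

The same byte rate reads about 4.6% lower when scaled in binary units, so the unit label and the divisor have to agree.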
Paul Cuzner [Tue, 10 Jul 2018 23:41:30 +0000 (11:41 +1200)]
Multiple fixes to OSD information dashboard
Bluestore tables and charts updated, including:
- switched units from ms to secs, which shows µs too
- changed metric from commit to KV latency
- updated thresholds in bluestore tables
- switched from rate to irate for bluestore metrics
- updated bluestore text box description
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Boris Ranto [Fri, 29 Jun 2018 11:25:17 +0000 (13:25 +0200)]
rpm: use_epel is no longer defined
Signed-off-by: Boris Ranto <branto@redhat.com>
pcuzner [Tue, 3 Jul 2018 21:51:47 +0000 (09:51 +1200)]
Merge pull request #201 from zmc/wip-uneditable-dbs
Make dashboards uneditable by default
pcuzner [Tue, 3 Jul 2018 21:50:28 +0000 (09:50 +1200)]
Merge pull request #200 from zmc/wip-osd-info-db
ceph-osd-information: Fix numerous bugs
Zack Cerza [Thu, 28 Jun 2018 19:31:41 +0000 (13:31 -0600)]
ceph-osd-information: Fix numerous bugs
Too many to list here; almost every panel was
broken in some regard.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 3 Jul 2018 18:07:58 +0000 (12:07 -0600)]
Merge pull request #197 from ceph/add-rgw-latencies
Added RGW GET/PUT Latencies
Zack Cerza [Tue, 3 Jul 2018 18:07:02 +0000 (12:07 -0600)]
Merge pull request #196 from ceph/iscsi-db-updates
Updated to use OS metrics from default scrape job
Zack Cerza [Tue, 3 Jul 2018 18:06:08 +0000 (12:06 -0600)]
Merge pull request #199 from zmc/wip-cluster-db
ceph-cluster: Fix column styles for version tables
Zack Cerza [Fri, 29 Jun 2018 15:23:10 +0000 (09:23 -0600)]
Merge pull request #198 from zmc/wip-iscsi-prom
Scrape iSCSI-related exporters
Zack Cerza [Thu, 28 Jun 2018 22:18:51 +0000 (16:18 -0600)]
Make dashboards uneditable by default
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 27 Jun 2018 22:58:42 +0000 (16:58 -0600)]
ceph-prometheus: Scrape iscsi gateway exporter
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 27 Jun 2018 22:57:44 +0000 (16:57 -0600)]
playbook: Install node_exporter on iscsi gateways
Signed-off-by: Zack Cerza <zack@redhat.com>
Paul Cuzner [Thu, 28 Jun 2018 21:58:35 +0000 (09:58 +1200)]
Added RGW GET/PUT Latencies
Added multiple charts showing GET/PUT latencies
at overview and RGW detail levels. In addition
the failed HTTP request panel has been changed
from a singlestat to a graph to visualize the
failure rates across all RGW instances.
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Paul Cuzner [Thu, 28 Jun 2018 04:47:03 +0000 (16:47 +1200)]
Updated to use OS metrics from default scrape job
All node_exporter scrapes are now done under the
same job (called node) so the dashboard now uses
an updated template query to identify the correct
host to pull out the OS metrics by iscsi gateway
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Zack Cerza [Tue, 26 Jun 2018 20:32:19 +0000 (14:32 -0600)]
ceph-cluster: Fix column styles for version tables
We were using 'id', not 'ceph_daemon' and as a result that column wasn't
showing up.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Tue, 26 Jun 2018 18:50:30 +0000 (12:50 -0600)]
Merge pull request #195 from zmc/wip-dashboard-fixes
Fix mon_server queries in network-usage-by-node and set home db
Zack Cerza [Tue, 26 Jun 2018 18:45:09 +0000 (12:45 -0600)]
Merge pull request #187 from ceph/wip-rpm
rpm: Update spec file for recent changes
Boris Ranto [Tue, 26 Jun 2018 18:20:43 +0000 (20:20 +0200)]
rpm: Modify node_exporter service name
Signed-off-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Fri, 22 Jun 2018 21:58:16 +0000 (15:58 -0600)]
ceph-grafana: Set the admin user's home db
If a human (or API request) changes the home dashboard for the admin
account, our method of setting it at the org level will no longer be
effective. Let's keep the admin user's home dashboard set to
ceph-at-a-glance.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 22 Jun 2018 18:57:00 +0000 (12:57 -0600)]
ceph-node-exporter: Fix inaccurate task name
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Fri, 22 Jun 2018 18:55:21 +0000 (12:55 -0600)]
network-usage-by-node: Fix mon_server queries
The variable queries needed to drop 'mon.' from the mon names, and the
panel queries needed '[[mon_servers]]' to be wrapped in parentheses.
Signed-off-by: Zack Cerza <zack@redhat.com>
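A rough sketch of why both changes matter, assuming the multi-value variable expands to a '|'-separated alternation and that matching is fully anchored (as Prometheus regex matchers are). The monitor names, the target hostname and the trailing '.*' are hypothetical; the dashboard's actual queries are not shown in this log.

```python
import re


def strip_mon_prefix(name):
    """Drop a leading 'mon.' from a monitor name, e.g. 'mon.node1' -> 'node1'."""
    return name[len("mon."):] if name.startswith("mon.") else name


# Hypothetical monitor names as a variable query might return them.
mon_names = ["mon.node1", "mon.node2"]
expanded = "|".join(strip_mon_prefix(n) for n in mon_names)  # node1|node2

unwrapped = expanded + ".*"       # node1|node2.*   ('.*' binds to node2 only)
wrapped = "(" + expanded + ").*"  # (node1|node2).* ('.*' applies to both)

target = "node1.example.com"  # hypothetical instance label value
print(re.fullmatch(unwrapped, target) is not None)  # False: node1 alone cannot cover the string
print(re.fullmatch(wrapped, target) is not None)    # True
```

Without the parentheses, any suffix in the pattern attaches only to the last alternative, so all but the last selected monitor silently drop out of the panel.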
Boris Ranto [Wed, 6 Jun 2018 15:14:36 +0000 (17:14 +0200)]
rpm: Modify container name/version
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Wed, 23 May 2018 09:54:49 +0000 (11:54 +0200)]
rpm: Update spec file for recent changes
Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Wed, 20 Jun 2018 07:44:32 +0000 (09:44 +0200)]
Merge pull request #191 from zmc/wip-osp-fixes
Fixes for OSP
Reviewed-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Wed, 20 Jun 2018 07:31:23 +0000 (09:31 +0200)]
Merge pull request #192 from zmc/unpin-testinfra
Unpin testinfra
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Wed, 13 Jun 2018 21:36:51 +0000 (15:36 -0600)]
Unpin testinfra
1.14.0 has been released with the fix we needed.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 7 Jun 2018 22:40:24 +0000 (16:40 -0600)]
node_exporter: Allow custom service name
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 6 Jun 2018 19:23:22 +0000 (13:23 -0600)]
ceph-mgr: Cope with differently-named containers
We were expecting ceph-mgr@hostname, but let's also look for
ceph-mgr-hostname.
Signed-off-by: Zack Cerza <zack@redhat.com>
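The lookup described above could be sketched as follows. This is not the role's actual implementation (which is an Ansible task not shown in this log); the function names and the list of running names are hypothetical.

```python
from typing import Iterable, List, Optional


def candidate_mgr_names(hostname: str) -> List[str]:
    """Return both naming schemes mentioned in the commit, in lookup order."""
    return [f"ceph-mgr@{hostname}", f"ceph-mgr-{hostname}"]


def find_mgr_name(hostname: str, running: Iterable[str]) -> Optional[str]:
    """Return the first candidate name present in `running`, or None."""
    running_set = set(running)
    for name in candidate_mgr_names(hostname):
        if name in running_set:
            return name
    return None


# Hypothetical list of names found on a host.
print(find_mgr_name("node1", ["grafana", "ceph-mgr-node1"]))
```

Trying the previously expected name first keeps existing deployments working while also accepting the alternative scheme.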
Zack Cerza [Wed, 6 Jun 2018 18:55:59 +0000 (12:55 -0600)]
Merge pull request #188 from zmc/wip-containers
containers: grafana version; systemd unit comments
Zack Cerza [Wed, 6 Jun 2018 18:55:44 +0000 (12:55 -0600)]
Merge pull request #190 from zmc/wip-db-fixes
Add some dashboard tests, and make them pass
Zack Cerza [Thu, 31 May 2018 18:34:03 +0000 (12:34 -0600)]
dashboards/network-usage-by-node: Fix mon_hosts
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 31 May 2018 18:27:12 +0000 (12:27 -0600)]
dashboards: Remove cruft
Specifically, references to collectd, influxdb,
and hostnames.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 30 May 2018 19:46:32 +0000 (13:46 -0600)]
dashboards: Drop 'DS_LOCAL' templating
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 30 May 2018 21:01:50 +0000 (15:01 -0600)]
Add some dashboard tests
Signed-off-by: Zack Cerza <zack@redhat.com>
Boris Ranto [Thu, 24 May 2018 09:44:09 +0000 (11:44 +0200)]
Merge pull request #179 from zmc/wip-drop-use_epel
Drop use_epel flag
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Wed, 23 May 2018 22:50:42 +0000 (16:50 -0600)]
Warn against modifying container systemd units
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 23 May 2018 22:48:47 +0000 (16:48 -0600)]
ceph-grafana: Default to 'latest' container version
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Wed, 23 May 2018 20:05:45 +0000 (14:05 -0600)]
Merge pull request #183 from ceph/dashboard-refresh
Dashboard refresh
Paul Cuzner [Tue, 22 May 2018 23:51:36 +0000 (11:51 +1200)]
Dashboard uid metadata field removed
The presence of uid in the dashboard definition is causing
compatibility issues with moving to Grafana v5.1. By removing the
uid entry, the dashboards still work and can be migrated to 5.1
Paul Cuzner [Tue, 22 May 2018 23:43:03 +0000 (11:43 +1200)]
include pg unknown in pg state table
Paul Cuzner [Tue, 22 May 2018 23:42:18 +0000 (11:42 +1200)]
Include pg_unknown in PG status pie chart
Paul Cuzner [Tue, 22 May 2018 23:14:11 +0000 (11:14 +1200)]
Add RAM usage to OSD Node detail dashboard
Paul Cuzner [Wed, 16 May 2018 08:41:16 +0000 (20:41 +1200)]
Changes to health history and inclusion of pg information
Paul Cuzner [Wed, 16 May 2018 07:54:04 +0000 (19:54 +1200)]
Added pg distribution and pg per osd tables
Paul Cuzner [Wed, 16 May 2018 04:04:20 +0000 (16:04 +1200)]
add alert for high pgnums on an OSD
Paul Cuzner [Tue, 22 May 2018 23:21:06 +0000 (11:21 +1200)]
Update prometheus source name (align with ansible deployment)
Paul Cuzner [Mon, 14 May 2018 04:17:09 +0000 (16:17 +1200)]
capacity units updated for consistency
Paul Cuzner [Mon, 14 May 2018 04:15:40 +0000 (16:15 +1200)]
Drop encrypted osd panel, add dev-class piechart
Boris Ranto [Tue, 22 May 2018 12:32:56 +0000 (14:32 +0200)]
Merge pull request #181 from zmc/wip-177
Work around #177
Reviewed-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Tue, 22 May 2018 12:25:39 +0000 (14:25 +0200)]
Merge pull request #185 from zmc/wip-grafana-fixes
Support Grafana 5.1.x
Reviewed-by: Boris Ranto <branto@redhat.com>
Zack Cerza [Mon, 21 May 2018 22:56:10 +0000 (16:56 -0600)]
ceph-grafana: Restart container when creating
This is necessary so we can move to newer versions
of the container image.
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Mon, 21 May 2018 22:56:00 +0000 (16:56 -0600)]
ceph-grafana: Use the correct UID for 5.1.x
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Mon, 21 May 2018 22:55:09 +0000 (16:55 -0600)]
ceph-grafana: Bump container version
Signed-off-by: Zack Cerza <zack@redhat.com>
Zack Cerza [Thu, 17 May 2018 17:29:10 +0000 (11:29 -0600)]
Merge pull request #167 from zmc/wip-mon-db-host
Monitor the dashboard host
Zack Cerza [Thu, 17 May 2018 17:28:56 +0000 (11:28 -0600)]
Merge pull request #182 from zmc/wip-bluestore-selinux
ceph-collectd: Only set SEL context on filestore
Zack Cerza [Thu, 10 May 2018 22:55:05 +0000 (16:55 -0600)]
ceph-collectd: Set SEL context on bluestore
... as well as filestore.
Signed-off-by: Zack Cerza <zack@redhat.com>