Paul Cuzner [Wed, 11 Oct 2017 04:14:25 +0000 (17:14 +1300)]
osd-node-detail: updated to correct disk units
All disk units are now shown as decimal rather than binary values. In addition,
help text updated on the raw capacity panel to better explain how the
value is derived. BZ1496186
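
For illustration only, the difference between decimal and binary units can be
sketched in Python as below. The helper names (_scale, format_decimal,
format_binary) are hypothetical and are not taken from the dashboard, which
does its unit formatting in the panel settings.

```python
def _scale(num_bytes, base, units, precision):
    """Divide by `base` until the value fits the current unit."""
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.{precision}f} {unit}"
        value /= base
    return f"{value:.{precision}f} {units[-1]}"


def format_decimal(num_bytes, precision=1):
    """Decimal (SI) units: 1 kB = 1000 bytes."""
    return _scale(num_bytes, 1000, ["B", "kB", "MB", "GB", "TB", "PB"], precision)


def format_binary(num_bytes, precision=1):
    """Binary (IEC) units: 1 KiB = 1024 bytes."""
    return _scale(num_bytes, 1024, ["B", "KiB", "MiB", "GiB", "TiB", "PiB"], precision)


if __name__ == "__main__":
    disk = 4 * 10**12  # a nominal "4 TB" disk
    print(format_decimal(disk))
    print(format_binary(disk))
```

The same disk reads as a smaller number in binary units, which is why mixing
the two on one dashboard is confusing.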
Boris Ranto [Fri, 6 Oct 2017 10:00:25 +0000 (12:00 +0200)]
ansible: Fix merge_vars.yml
Currently, we override the variables in merge_vars. However, if we run
the script several times (e.g. a host is both a grafana and a collectd
node), vars[item] is already defined but may not be a mapping. In that
case we redefine the value to an empty string, which breaks the actual
values of the variables.
For example, devel_mode or use_epel gets redefined to an empty string,
which makes it false on a server that is both the grafana and the
collectd node.
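
The intended behaviour can be sketched in Python rather than in Ansible/Jinja.
This is an assumption about the logic, not the contents of merge_vars.yml; the
function name merge_var and the idea of merging over a defaults mapping are
hypothetical.

```python
from collections.abc import Mapping


def merge_var(current, defaults):
    """Merge `defaults` into `current` only when `current` is a mapping.

    Scalars such as devel_mode=True are returned untouched, so running the
    merge a second time cannot reset them to an empty value.
    """
    if current is None:
        # Variable not defined yet: start from the defaults.
        return dict(defaults)
    if isinstance(current, Mapping):
        merged = dict(defaults)
        merged.update(current)  # explicitly set values win over defaults
        return merged
    # Defined but not a mapping: keep the existing value as it is.
    return current
```

With this shape, a second pass over a host that is both a grafana and a
collectd node leaves use_epel and devel_mode at their original values.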
Zack Cerza [Thu, 5 Oct 2017 19:43:57 +0000 (13:43 -0600)]
Use --fake-initial for migrations if necessary
Migrations won't work correctly if the db already exists. This manifests
in an error like:
django.db.utils.OperationalError: table "django_content_type" already exists
Zack Cerza [Wed, 4 Oct 2017 21:23:41 +0000 (15:23 -0600)]
Add a note about waiting to collect data
Users might initially be confused that immediately after deployment, the
dashboard looks broken. This is because it doesn't yet have the data it
needs to function.
Boris Ranto [Mon, 2 Oct 2017 09:29:01 +0000 (11:29 +0200)]
ansible: Do not enable rhsm repos
We are shipping as a regular product and as such, we cannot enable
additional repos via rhsm. Customers might not even have these repos
installed (in particular, the storage console repo might not be
available to them).
If we need any of the packages from these repos, we need to cross-ship
them in our product (as we already do downstream).
Paul Cuzner [Tue, 12 Sep 2017 23:57:41 +0000 (11:57 +1200)]
iscsi-overview: multiple panel fixes
Values were shown incorrectly in environments where the iscsi config had
been dropped and recreated. This update addresses issues in the following
panels: path summary, unused LUNs, defined capacity. In addition, the
client charts only show entries for clients with i/o or load > 0.
Paul Cuzner [Wed, 23 Aug 2017 21:14:44 +0000 (09:14 +1200)]
mon: simplify the admin_socket read logic
The initial commit placed logic in each area that called the admin
socket. This patch separates the admin socket call out to a separate
method, so it gets checked in one place.
Paul Cuzner [Wed, 23 Aug 2017 02:58:34 +0000 (14:58 +1200)]
rgw: look for the admin_socket on each call
The admin_socket name for rgw is not fixed, unlike mon/osds. Therefore,
to account for svc restarts and name changes, the socket name is
determined at each get_stats cycle. If the socket isn't there, the
collector just passes back the version of radosgw to the caller and
will send stats again once a socket is detected on the host.
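
A rough sketch of the per-cycle lookup, not the collector's actual code. The
socket file pattern, the default run directory and the read_socket callable
are assumptions made for the example.

```python
import glob
import os


def find_rgw_socket(run_dir="/var/run/ceph"):
    """Return the first rgw admin socket found in run_dir, or None.

    The file name pattern is an assumption; the real name depends on how
    the radosgw instance is configured.
    """
    matches = sorted(glob.glob(os.path.join(run_dir, "ceph-client.rgw.*.asok")))
    return matches[0] if matches else None


def get_stats(run_dir, version, read_socket):
    """Look the socket up on every call, so restarts and renames are handled.

    `read_socket` is a hypothetical callable that takes a socket path and
    returns a dict of counters.
    """
    stats = {"version": version}
    sock = find_rgw_socket(run_dir)
    if sock is not None:
        stats.update(read_socket(sock))
    return stats
```

When no socket is present the caller still receives the version, and the
counters reappear on the first cycle after the socket comes back.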
Paul Cuzner [Wed, 23 Aug 2017 02:56:37 +0000 (14:56 +1200)]
mon: account for null dict from _admin_socket
The _admin_socket method could return an empty dict if the
socket is not there (i.e. ceph-mon is down). By checking for the
empty dict, the collector can remain active while ceph-mon is
stopped and restarted during normal maintenance processes on a
host.
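
A minimal sketch of the pattern, assuming the socket is read through the
ceph --admin-daemon command. The function names, the default command and the
None return value are illustrative, not the collector's real interface.

```python
import json
import os
import subprocess


def admin_socket_read(socket_path, cmd="perf dump"):
    """Return parsed JSON from the admin socket, or {} if it is unavailable."""
    if not os.path.exists(socket_path):
        # Daemon is down or restarting: report "no data" instead of raising.
        return {}
    try:
        out = subprocess.check_output(
            ["ceph", "--admin-daemon", socket_path] + cmd.split()
        )
        return json.loads(out)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return {}


def get_stats(socket_path):
    """Skip the cycle when the socket returned nothing; stay alive otherwise."""
    stats = admin_socket_read(socket_path)
    if not stats:
        return None  # nothing to send this cycle
    return stats
```

Keeping the existence check in one helper also means every caller gets the
same behaviour when ceph-mon is stopped.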
Paul Cuzner [Wed, 23 Aug 2017 02:54:15 +0000 (14:54 +1200)]
iscsi: trigger stats only when iscsi is active
Look for the iscsi dir in sysfs to determine when to
send the iscsi stats. If the iscsi base dir is not there,
the collector will just send the version of gwcli.
Paul Cuzner [Mon, 21 Aug 2017 04:57:16 +0000 (16:57 +1200)]
dashboard query update to filter out old OSDs
Old OSDs will still exist in the TSDB, and could show as out or down.
The update uses transformNull to pick out OSDs with null values and
filter them out of the results shown.
Paul Cuzner [Mon, 21 Aug 2017 01:36:01 +0000 (13:36 +1200)]
osd-information: fixes for null entries and time windows used on pie-charts
OSDs that fail result in nulls in the data series, so the queries were
updated to account for these gaps. In addition, a time window of 2 mins
is used to restrict the observations that have to be grabbed from
graphite for the pie charts.
Paul Cuzner [Tue, 8 Aug 2017 21:48:55 +0000 (09:48 +1200)]
iscsi.py: fix to defer the import of rtslib_fb
The goal of the parent module cephmetrics is to be generic across the
different ceph roles. By deferring the import of rtslib to the instantiation
of the first (and only!) ISCSIGateway object, cephmetrics can import this
iscsi module without a problem regardless of the runtime environment.
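
The deferred-import pattern looks roughly like this. It is a sketch, not the
contents of iscsi.py: the module_name parameter is hypothetical and exists
only so the example can run on hosts without rtslib_fb installed.

```python
import importlib


class ISCSIGateway(object):
    """Collector whose heavy dependency is imported on first instantiation."""

    def __init__(self, module_name="rtslib_fb"):
        # Imported here rather than at module level, so merely importing
        # this file never fails on hosts that lack the library.
        self._rtslib = importlib.import_module(module_name)


if __name__ == "__main__":
    try:
        ISCSIGateway()
        print("rtslib_fb available")
    except ImportError:
        print("rtslib_fb missing; the class definition still loaded")
```

The ImportError now surfaces only on a node that actually tries to create a
gateway object, which is the one place it is meaningful.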
Paul Cuzner [Mon, 7 Aug 2017 23:43:44 +0000 (11:43 +1200)]
iscsi-overview: minor fixes and rename of the iscsi_gateways variable
The client configuration panel was not excluding null entries, so when
rbds were unmasked from clients and not reused, they would still show up
in the table.
In addition, the templating variable iscsi_gateway was renamed to
iscsi_gateways, aligning with the naming of osd_servers and rgw_servers.
Paul Cuzner [Mon, 7 Aug 2017 23:41:50 +0000 (11:41 +1200)]
dashboard.yml: updated to show the variable needed by the iscsi dashboard
The iscsi_gateways templating variable is used to generate the graphs,
so for iscsi based deployments this variable will need to be defined to
ensure the queries work correctly in grafana.
Paul Cuzner [Mon, 7 Aug 2017 00:20:10 +0000 (12:20 +1200)]
cephmetrics.py: updated to detect and collect stats for iSCSI gateways
The probe method now looks for the sysfs kernel entries that denote an
iscsi gateway is running on the node. When this dir is found an instance
of the iscsi collector (ISCSIGateway) is created and polled during
every read callback.
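
In outline, the probe and read cycle could look like the sketch below. The
directory path, the stub ISCSIGateway class and the function signatures are
assumptions for illustration, not the code in cephmetrics.py.

```python
import os

# Assumed location of the kernel target entries; adjust for the real host.
ISCSI_DIR = "/sys/kernel/config/target/iscsi"


class ISCSIGateway(object):
    """Stand-in for the real collector; returns placeholder stats."""

    def get_stats(self):
        return {"iscsi": {}}


def probe(iscsi_dir=ISCSI_DIR):
    """Build the list of collectors that apply to this node."""
    collectors = []
    if os.path.isdir(iscsi_dir):
        collectors.append(ISCSIGateway())
    return collectors


def read_callback(collectors):
    """Poll every collector found by probe() once per read cycle."""
    return [collector.get_stats() for collector in collectors]
```

A node without the directory simply ends up with no iscsi collector, so the
read callback has nothing extra to poll.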
Paul Cuzner [Thu, 3 Aug 2017 04:53:57 +0000 (16:53 +1200)]
osd-information: minor fixes for larger environments
In a 600+ OSD environment the charts were based on averageSeries, which
was taking a long time. This has now been changed, so the comparison
chart only shows the current values for a given OSD.