Scenario: "Test OSD"
Given the following series:
| metrics | values |
- | ceph_osd_metadata{back_iface="",ceph_daemon="osd.0",cluster_addr="192.168.1.12",device_class="hdd",front_iface="",hostname="127.0.0.1",objectstore="bluestore",public_addr="192.168.1.12",ceph_version="ceph version 17.0.0-8967-g6932a4f702a (6932a4f702a0d557fc36df3ca7a3bca70de42667) quincy (dev)"} | 1.0 |
- | ceph_osd_metadata{back_iface="",ceph_daemon="osd.1",cluster_addr="192.168.1.12",device_class="hdd",front_iface="",hostname="127.0.0.1",objectstore="bluestore",public_addr="192.168.1.12",ceph_version="ceph version 17.0.0-8967-g6932a4f702a (6932a4f702a0d557fc36df3ca7a3bca70de42667) quincy (dev)"} | 1.0 |
- | ceph_osd_metadata{back_iface="",ceph_daemon="osd.2",cluster_addr="192.168.1.12",device_class="hdd",front_iface="",hostname="127.0.0.1",objectstore="bluestore",public_addr="192.168.1.12",ceph_version="ceph version 17.0.0-8967-g6932a4f702a (6932a4f702a0d557fc36df3ca7a3bca70de42667) quincy (dev)"} | 1.0 |
+ | ceph_osd_metadata{job="ceph",back_iface="",ceph_daemon="osd.0",cluster_addr="192.168.1.12",device_class="hdd",front_iface="",hostname="127.0.0.1",objectstore="bluestore",public_addr="192.168.1.12",ceph_version="ceph version 17.0.0-8967-g6932a4f702a (6932a4f702a0d557fc36df3ca7a3bca70de42667) quincy (dev)"} | 1.0 |
+ | ceph_osd_metadata{job="ceph",back_iface="",ceph_daemon="osd.1",cluster_addr="192.168.1.12",device_class="hdd",front_iface="",hostname="127.0.0.1",objectstore="bluestore",public_addr="192.168.1.12",ceph_version="ceph version 17.0.0-8967-g6932a4f702a (6932a4f702a0d557fc36df3ca7a3bca70de42667) quincy (dev)"} | 1.0 |
+ | ceph_osd_metadata{job="ceph",back_iface="",ceph_daemon="osd.2",cluster_addr="192.168.1.12",device_class="hdd",front_iface="",hostname="127.0.0.1",objectstore="bluestore",public_addr="192.168.1.12",ceph_version="ceph version 17.0.0-8967-g6932a4f702a (6932a4f702a0d557fc36df3ca7a3bca70de42667) quincy (dev)"} | 1.0 |
When variable `ceph_hosts` is `127.0.0.1`
Then Grafana panel `OSDs` with legend `EMPTY` shows:
| metrics | values |
Scenario: "Test Disk IOPS - Writes - Several OSDs per device"
Given the following series:
| metrics | values |
- | node_disk_writes_completed_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_writes_completed_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0 osd.1 osd.2",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.3 osd.4 osd.5",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_writes_completed_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_writes_completed_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0 osd.1 osd.2",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.3 osd.4 osd.5",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Disk IOPS` with legend `{{device}}({{ceph_daemon}}) writes` shows:
| metrics | values |
- | {ceph_daemon="osd.0 osd.1 osd.2", device="sda", instance="localhost"} | 1 |
- | {ceph_daemon="osd.3 osd.4 osd.5", device="sdb", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.0 osd.1 osd.2", device="sda", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.3 osd.4 osd.5", device="sdb", instance="localhost"} | 1 |
Scenario: "Test Disk IOPS - Writes - Single OSD per device"
Given the following series:
| metrics | values |
- | node_disk_writes_completed_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_writes_completed_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_writes_completed_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_writes_completed_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Disk IOPS` with legend `{{device}}({{ceph_daemon}}) writes` shows:
| metrics | values |
- | {ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
- | {ceph_daemon="osd.1", device="sdb", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.1", device="sdb", instance="localhost"} | 1 |
Scenario: "Test Disk IOPS - Reads - Several OSDs per device"
Given the following series:
| metrics | values |
- | node_disk_reads_completed_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_reads_completed_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0 osd.1 osd.2",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.3 osd.4 osd.5",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_reads_completed_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_reads_completed_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0 osd.1 osd.2",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.3 osd.4 osd.5",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Disk IOPS` with legend `{{device}}({{ceph_daemon}}) reads` shows:
| metrics | values |
- | {ceph_daemon="osd.0 osd.1 osd.2", device="sda", instance="localhost"} | 1 |
- | {ceph_daemon="osd.3 osd.4 osd.5", device="sdb", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.0 osd.1 osd.2", device="sda", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.3 osd.4 osd.5", device="sdb", instance="localhost"} | 1 |
Scenario: "Test Disk IOPS - Reads - Single OSD per device"
Given the following series:
| metrics | values |
- | node_disk_reads_completed_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_reads_completed_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_reads_completed_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_reads_completed_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Disk IOPS` with legend `{{device}}({{ceph_daemon}}) reads` shows:
| metrics | values |
- | {ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
- | {ceph_daemon="osd.1", device="sdb", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.1", device="sdb", instance="localhost"} | 1 |
# IOPS Panel - end
Scenario: "Test disk throughput - read"
Given the following series:
| metrics | values |
- | node_disk_read_bytes_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_read_bytes_total{device="sdb",instance="localhost:9100"} | 100+600x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_read_bytes_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_read_bytes_total{job="ceph",device="sdb",instance="localhost:9100"} | 100+600x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Throughput by Disk` with legend `{{device}}({{ceph_daemon}}) read` shows:
| metrics | values |
- | {ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
- | {ceph_daemon="osd.1", device="sdb", instance="localhost"} | 10 |
+ | {job="ceph",ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.1", device="sdb", instance="localhost"} | 10 |
Scenario: "Test disk throughput - write"
Given the following series:
| metrics | values |
- | node_disk_written_bytes_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_written_bytes_total{device="sdb",instance="localhost:9100"} | 100+600x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_written_bytes_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_written_bytes_total{job="ceph",device="sdb",instance="localhost:9100"} | 100+600x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Throughput by Disk` with legend `{{device}}({{ceph_daemon}}) write` shows:
| metrics | values |
- | {ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
- | {ceph_daemon="osd.1", device="sdb", instance="localhost"} | 10 |
+ | {job="ceph",ceph_daemon="osd.0", device="sda", instance="localhost"} | 1 |
+ | {job="ceph",ceph_daemon="osd.1", device="sdb", instance="localhost"} | 10 |
# Node disk bytes written/read panel - end
Scenario: "Test $ceph_hosts Disk Latency panel"
Given the following series:
| metrics | values |
- | node_disk_write_time_seconds_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_write_time_seconds_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | node_disk_writes_completed_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_writes_completed_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | node_disk_read_time_seconds_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_read_time_seconds_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | node_disk_reads_completed_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_reads_completed_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_write_time_seconds_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_write_time_seconds_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_writes_completed_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_writes_completed_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_read_time_seconds_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_read_time_seconds_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_reads_completed_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_reads_completed_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Disk Latency` with legend `{{device}}({{ceph_daemon}})` shows:
| metrics | values |
Scenario: "Test $ceph_hosts Disk utilization"
Given the following series:
| metrics | values |
- | node_disk_io_time_seconds_total{device="sda",instance="localhost:9100"} | 10+60x1 |
- | node_disk_io_time_seconds_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | node_disk_io_time_seconds_total{job="ceph",device="sda",instance="localhost:9100"} | 10+60x1 |
+ | node_disk_io_time_seconds_total{job="ceph",device="sdb",instance="localhost:9100"} | 10+60x1 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `ceph_hosts` is `localhost`
Then Grafana panel `$ceph_hosts Disk utilization` with legend `{{device}}({{ceph_daemon}})` shows:
| metrics | values |
- | {ceph_daemon="osd.0", device="sda", instance="localhost"} | 100 |
- | {ceph_daemon="osd.1", device="sdb", instance="localhost"} | 100 |
+ | {job="ceph",ceph_daemon="osd.0", device="sda", instance="localhost"} | 100 |
+ | {job="ceph",ceph_daemon="osd.1", device="sdb", instance="localhost"} | 100 |
| node_disk_io_time_seconds_total{device="sda",instance="localhost:9100"} | 10+60x1 |
| node_disk_io_time_seconds_total{device="sdb",instance="localhost:9100"} | 10+60x1 |
| node_disk_io_time_seconds_total{device="sdc",instance="localhost:9100"} | 10 2000 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd_hosts` is `localhost`
Then Grafana panel `AVG Disk Utilization` with legend `EMPTY` shows:
| metrics | values |
| node_disk_reads_completed_total{device="sdb",instance="localhost"} | 10 60 |
| node_disk_read_time_seconds_total{device="sda",instance="localhost"} | 100 600 |
| node_disk_read_time_seconds_total{device="sdb",instance="localhost"} | 100 600 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd` is `osd.0`
Then Grafana panel `Physical Device Latency for $osd` with legend `{{instance}}/{{device}} Reads` shows:
| metrics | values |
| node_disk_writes_completed_total{device="sdb",instance="localhost"} | 10 60 |
| node_disk_write_time_seconds_total{device="sda",instance="localhost"} | 100 600 |
| node_disk_write_time_seconds_total{device="sdb",instance="localhost"} | 100 600 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd` is `osd.0`
Then Grafana panel `Physical Device Latency for $osd` with legend `{{instance}}/{{device}} Writes` shows:
| metrics | values |
| metrics | values |
| node_disk_writes_completed_total{device="sda",instance="localhost"} | 10 100 |
| node_disk_writes_completed_total{device="sdb",instance="localhost"} | 10 100 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd` is `osd.0`
Then Grafana panel `Physical Device R/W IOPS for $osd` with legend `{{device}} on {{instance}} Writes` shows:
| metrics | values |
| metrics | values |
| node_disk_reads_completed_total{device="sda",instance="localhost"} | 10 100 |
| node_disk_reads_completed_total{device="sdb",instance="localhost"} | 10 100 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd` is `osd.0`
Then Grafana panel `Physical Device R/W IOPS for $osd` with legend `{{device}} on {{instance}} Reads` shows:
| metrics | values |
| metrics | values |
| node_disk_reads_completed_total{device="sda",instance="localhost"} | 10 100 |
| node_disk_reads_completed_total{device="sdb",instance="localhost"} | 10 100 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd` is `osd.0`
Then Grafana panel `Physical Device R/W IOPS for $osd` with legend `{{device}} on {{instance}} Reads` shows:
| metrics | values |
| metrics | values |
| node_disk_writes_completed_total{device="sda",instance="localhost"} | 10 100 |
| node_disk_writes_completed_total{device="sdb",instance="localhost"} | 10 100 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd` is `osd.0`
Then Grafana panel `Physical Device R/W IOPS for $osd` with legend `{{device}} on {{instance}} Writes` shows:
| metrics | values |
Given the following series:
| metrics | values |
| node_disk_io_time_seconds_total{device="sda",instance="localhost:9100"} | 10 100 |
- | ceph_disk_occupation_human{ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
- | ceph_disk_occupation_human{ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.0",device="/dev/sda",instance="localhost:9283"} | 1.0 |
+ | ceph_disk_occupation_human{job="ceph",ceph_daemon="osd.1",device="/dev/sdb",instance="localhost:9283"} | 1.0 |
When variable `osd` is `osd.0`
Then Grafana panel `Physical Device Util% for $osd` with legend `{{device}} on {{instance}}` shows:
| metrics | values |
And variable `rgw_servers` is `rgw.foo`
Then Grafana panel `$rgw_servers GET/PUT Latencies` with legend `GET {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance_id="58892247"} | 2.5000000000000004 |
+ | {ceph_daemon="rgw.foo", instance_id="58892247"} | 1.5 |
Scenario: "Test $rgw_servers GET/PUT Latencies - PUT"
Given the following series:
And variable `rgw_servers` is `rgw.1`
Then Grafana panel `Bandwidth by HTTP Operation` with legend `GETs {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.1", instance="127.0.0.1", instance_id="92806566", job="ceph"} | 1.6666666666666667 |
+ | {ceph_daemon="rgw.1", instance="127.0.0.1", instance_id="92806566", job="ceph"} | 1.5 |
Scenario: "Test Bandwidth by HTTP Operation - PUT"
Given the following series:
And variable `rgw_servers` is `rgw.1`
Then Grafana panel `Bandwidth by HTTP Operation` with legend `PUTs {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.1", instance="127.0.0.1", instance_id="92806566", job="ceph"} | 1 |
+ | {ceph_daemon="rgw.1", instance="127.0.0.1", instance_id="92806566", job="ceph"} | 7.5E-01 |
Scenario: "Test HTTP Request Breakdown - Requests Failed"
Given the following series:
And variable `rgw_servers` is `rgw.foo`
Then Grafana panel `HTTP Request Breakdown` with legend `Requests Failed {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 6.666666666666667e-02 |
+ | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 1E-01 |
Scenario: "Test HTTP Request Breakdown - GET"
Given the following series:
And variable `rgw_servers` is `rgw.foo`
Then Grafana panel `HTTP Request Breakdown` with legend `GETs {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | .6666666666666666 |
+ | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 1.1666666666666667 |
Scenario: "Test HTTP Request Breakdown - PUT"
Given the following series:
And variable `rgw_servers` is `rgw.foo`
Then Grafana panel `HTTP Request Breakdown` with legend `PUTs {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 2.3333333333333335 |
+ | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 1.5 |
Scenario: "Test HTTP Request Breakdown - Other"
Given the following series:
And variable `rgw_servers` is `rgw.foo`
Then Grafana panel `Workload Breakdown` with legend `Failures {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 6.666666666666667e-02 |
+ | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 1E-01 |
Scenario: "Test Workload Breakdown - GETs"
Given the following series:
And variable `rgw_servers` is `rgw.foo`
Then Grafana panel `Workload Breakdown` with legend `GETs {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | .6666666666666666 |
+ | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 1.1666666666666667 |
Scenario: "Test Workload Breakdown - PUTs"
Given the following series:
And variable `rgw_servers` is `rgw.foo`
Then Grafana panel `Workload Breakdown` with legend `PUTs {{ceph_daemon}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 2.3333333333333335 |
+ | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph"} | 1.5 |
Scenario: "Test Workload Breakdown - Other"
Given the following series:
When interval is `30s`
Then Grafana panel `Average GET/PUT Latencies` with legend `GET AVG` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo",instance="127.0.0.1", instance_id="58892247", job="ceph"} | 2.5000000000000004 |
+ | {ceph_daemon="rgw.foo",instance="127.0.0.1", instance_id="58892247", job="ceph"} | 1.5 |
Scenario: "Test Average PUT Latencies"
Given the following series:
When interval is `30s`
Then Grafana panel `Total Requests/sec by RGW Instance` with legend `{{rgw_host}}` shows:
| metrics | values |
- | {rgw_host="1"} | 1.6666666666666667 |
+ | {rgw_host="1"} | 1.5 |
Scenario: "Test GET Latencies by RGW Instance"
Given the following series:
When interval is `30s`
Then Grafana panel `GET Latencies by RGW Instance` with legend `{{rgw_host}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph", rgw_host="foo"} | 2.5000000000000004 |
+ | {ceph_daemon="rgw.foo", instance="127.0.0.1", instance_id="58892247", job="ceph", rgw_host="foo"} | 1.5 |
Scenario: "Test Bandwidth Consumed by Type- GET"
Given the following series:
And interval is `30s`
Then Grafana panel `Bandwidth Consumed by Type` with legend `GETs` shows:
| metrics | values |
- | {} | 1.6666666666666667 |
+ | {} | 1.5 |
Scenario: "Test Bandwidth Consumed by Type- PUT"
Given the following series:
And interval is `30s`
Then Grafana panel `Bandwidth Consumed by Type` with legend `PUTs` shows:
| metrics | values |
- | {} | 1 |
+ | {} | 7.5E-01 |
Scenario: "Test Bandwidth by RGW Instance"
Given the following series:
And interval is `30s`
Then Grafana panel `Bandwidth by RGW Instance` with legend `{{rgw_host}}` shows:
| metrics | values |
- | {ceph_daemon="rgw.1", instance_id="92806566", rgw_host="1"} | 2.666666666666667 |
+ | {ceph_daemon="rgw.1", instance_id="92806566", rgw_host="1"} | 2.25 |
Scenario: "Test PUT Latencies by RGW Instance"
Given the following series:
Scenario: "Test Total backend responses by HTTP code"
Given the following series:
| metrics | values |
- | haproxy_backend_http_responses_total{code="200",instance="ingress.rgw.1",proxy="backend"} | 10 100 |
- | haproxy_backend_http_responses_total{code="404",instance="ingress.rgw.1",proxy="backend"} | 20 200 |
+ | haproxy_backend_http_responses_total{job="haproxy",code="200",instance="ingress.rgw.1",proxy="backend"} | 10 100 |
+ | haproxy_backend_http_responses_total{job="haproxy",code="404",instance="ingress.rgw.1",proxy="backend"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
And variable `code` is `200`
Then Grafana panel `Total responses by HTTP code` with legend `Backend {{ code }}` shows:
Scenario: "Test Total frontend responses by HTTP code"
Given the following series:
| metrics | values |
- | haproxy_frontend_http_responses_total{code="200",instance="ingress.rgw.1",proxy="frontend"} | 10 100 |
- | haproxy_frontend_http_responses_total{code="404",instance="ingress.rgw.1",proxy="frontend"} | 20 200 |
+ | haproxy_frontend_http_responses_total{job="haproxy",code="200",instance="ingress.rgw.1",proxy="frontend"} | 10 100 |
+ | haproxy_frontend_http_responses_total{job="haproxy",code="404",instance="ingress.rgw.1",proxy="frontend"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
And variable `code` is `200`
Then Grafana panel `Total responses by HTTP code` with legend `Frontend {{ code }}` shows:
Scenario: "Test Total http frontend requests by instance"
Given the following series:
| metrics | values |
- | haproxy_frontend_http_requests_total{proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_frontend_http_requests_total{proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_frontend_http_requests_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_frontend_http_requests_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total requests / responses` with legend `Requests` shows:
| metrics | values |
Scenario: "Test Total backend response errors by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_response_errors_total{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_response_errors_total{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_response_errors_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_response_errors_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total requests / responses` with legend `Response errors` shows:
| metrics | values |
Scenario: "Test Total frontend requests errors by instance"
Given the following series:
| metrics | values |
- | haproxy_frontend_request_errors_total{proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_frontend_request_errors_total{proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_frontend_request_errors_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_frontend_request_errors_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total requests / responses` with legend `Requests errors` shows:
| metrics | values |
Scenario: "Test Total backend redispatch warnings by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_redispatch_warnings_total{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_redispatch_warnings_total{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_redispatch_warnings_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_redispatch_warnings_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total requests / responses` with legend `Backend redispatch` shows:
| metrics | values |
Scenario: "Test Total backend retry warnings by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_retry_warnings_total{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_retry_warnings_total{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_retry_warnings_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_retry_warnings_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total requests / responses` with legend `Backend retry` shows:
| metrics | values |
Scenario: "Test Total frontend requests denied by instance"
Given the following series:
| metrics | values |
- | haproxy_frontend_requests_denied_total{proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_frontend_requests_denied_total{proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_frontend_requests_denied_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_frontend_requests_denied_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total requests / responses` with legend `Request denied` shows:
| metrics | values |
Scenario: "Test Total backend current queue by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_current_queue{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_current_queue{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_current_queue{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_current_queue{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total requests / responses` with legend `Backend Queued` shows:
| metrics | values |
Scenario: "Test Total frontend connections by instance"
Given the following series:
| metrics | values |
- | haproxy_frontend_connections_total{proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_frontend_connections_total{proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_frontend_connections_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_frontend_connections_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total number of connections` with legend `Front` shows:
| metrics | values |
Scenario: "Test Total backend connections attempts by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_connection_attempts_total{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_connection_attempts_total{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_connection_attempts_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_connection_attempts_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total number of connections` with legend `Back` shows:
| metrics | values |
Scenario: "Test Total backend connections error by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_connection_errors_total{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_connection_errors_total{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_connection_errors_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_connection_errors_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Total number of connections` with legend `Back errors` shows:
| metrics | values |
Scenario: "Test Total frontend bytes incoming by instance"
Given the following series:
| metrics | values |
- | haproxy_frontend_bytes_in_total{proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_frontend_bytes_in_total{proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_frontend_bytes_in_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_frontend_bytes_in_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Current total of incoming / outgoing bytes` with legend `IN Front` shows:
| metrics | values |
Scenario: "Test Total frontend bytes outgoing by instance"
Given the following series:
| metrics | values |
- | haproxy_frontend_bytes_out_total{proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_frontend_bytes_out_total{proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_frontend_bytes_out_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_frontend_bytes_out_total{job="haproxy",proxy="frontend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Current total of incoming / outgoing bytes` with legend `OUT Front` shows:
| metrics | values |
Scenario: "Test Total backend bytes incoming by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_bytes_in_total{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_bytes_in_total{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_bytes_in_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_bytes_in_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Current total of incoming / outgoing bytes` with legend `IN Back` shows:
| metrics | values |
Scenario: "Test Total backend bytes outgoing by instance"
Given the following series:
| metrics | values |
- | haproxy_backend_bytes_out_total{proxy="backend",instance="ingress.rgw.1"} | 10 100 |
- | haproxy_backend_bytes_out_total{proxy="backend",instance="ingress.rgw.1"} | 20 200 |
+ | haproxy_backend_bytes_out_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 10 100 |
+ | haproxy_backend_bytes_out_total{job="haproxy",proxy="backend",instance="ingress.rgw.1"} | 20 200 |
When variable `ingress_service` is `ingress.rgw.1`
Then Grafana panel `Current total of incoming / outgoing bytes` with legend `OUT Back` shows:
| metrics | values |
            data['stats'][str(file)] = {'total': 0, 'tested': 0}
            add_dashboard_queries(data, dashboard_data, str(file))
            add_dashboard_variables(data, dashboard_data)
+    add_default_dashboards_variables(data)
    return data
        if 'name' in variable:
            data['variables'][variable['name']] = 'UNSET VARIABLE'
+def add_default_dashboards_variables(data: Dict[str, Any]) -> None:
+    data['variables']['job'] = 'ceph'
+    data['variables']['job_haproxy'] = 'haproxy'
+    data['variables']['__rate_interval'] = '1m'
def replace_grafana_expr_variables(expr: str, variable: str, value: Any) -> str:
    """ Replace grafana variables in expression with a value