author	Kefu Chai <k.chai@proxmox.com>
	Sat, 4 Jul 2026 00:54:17 +0000 (08:54 +0800)
committer	Kefu Chai <k.chai@proxmox.com>
	Sat, 4 Jul 2026 01:40:07 +0000 (09:40 +0800)
commit	89cb0fc152ba3c7719353f51d970ff2b8b910293
tree	e577103d32f37782056b99694379f02fb1ffca6d
parent	c11c7144fc77e331fa0b23a5298742bfe68afdd8

monitoring/ceph-mixin: scope test queries per dashboard

get_dashboards_data() keeps every query in a single map keyed by
"<panel title>-<legendFormat>", but that id is not unique: several
dashboards have a panel titled "IOPS", "OSDs" or "Throughput". The
dashboard read last overwrites the earlier entry, so one query shadows
another and never gets tested. glob() walks the files in filesystem
order, so which query survives, and whether the test passes, depends on
the order in which readdir returns them.
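
The shadowing can be reproduced with a minimal sketch. collect_flat(),
the "Read" legend and the shortened expressions are hypothetical
stand-ins, not the code in util.py; only the
"<panel title>-<legendFormat>" id shape is taken from the description
above.

```python
# Hypothetical stand-in for the flat map described above; not the real util.py.
def collect_flat(dashboards):
    """Merge every query into one dict keyed by '<panel title>-<legendFormat>'."""
    queries = {}
    for name, panels in dashboards:  # list order mimics glob()/readdir order
        for title, legend, expr in panels:
            # A later dashboard with the same id silently replaces the earlier one.
            queries[f"{title}-{legend}"] = {"dashboard": name, "expr": expr}
    return queries


# Illustrative panels; the expressions are simplified for the example.
cluster = ("ceph-cluster-advanced",
           [("IOPS", "Read", "sum(rate(ceph_osd_op_r[1m]))")])
cephfs = ("cephfsdashboard",
          [("IOPS", "Read", "sum(rate(ceph_pool_rd[1m]))")])

first = collect_flat([cluster, cephfs])
second = collect_flat([cephfs, cluster])

print(first["IOPS-Read"]["dashboard"])   # cephfsdashboard
print(second["IOPS-Read"]["dashboard"])  # ceph-cluster-advanced
```

The same id resolves to a different dashboard depending only on the
order the two entries are read in.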

run-tox-promql-query-test hit this in "Test IOPS Read"
(ceph-cluster.feature): the scenario feeds ceph_osd_op_r but the id
resolved to a CephFS pool panel.

    FAILED:
      expr: "sum(rate(ceph_pool_rd{cluster=~"mycluster|",  pool_id=~"UNSET VARIABLE"}[1m]))", time: 1m,
          exp:"{} 2.5E+01"
          got:"nil"
    HOOK-ERROR in after_scenario: AssertionError:

pool_id is "UNSET VARIABLE" because the scenario does not set $mdatapool,
and no ceph_pool_rd series were fed, so the query returns nil. It only
fails when readdir returns cephfsdashboard after ceph-cluster-advanced,
so the earlier change that walked nested rows and made the winner
deterministic did not fix it; it fixed which query wins, not that the
wrong one can win.

Key the queries per dashboard (data['queries'][<dashboard>][<id>]) and
have each .feature name its dashboard in a Background step. The lookup is
confined to that dashboard, so a title/legend used elsewhere no longer
shadows it. A duplicate within one dashboard is still an error, unless it
is in a collapsed row.
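
A rough sketch of the scoped layout, under stated assumptions:
collect_scoped(), lookup() and the panel dict fields are hypothetical
names, and skipping a duplicate found in a collapsed row is only one
possible reading of the exemption above, not necessarily what util.py
does.

```python
# Hypothetical sketch of per-dashboard scoping; names are illustrative only.
def collect_scoped(dashboards):
    """Build queries[<dashboard>][<id>]; a duplicate id inside one dashboard raises."""
    queries = {}
    for name, panels in dashboards:
        scoped = queries.setdefault(name, {})
        for panel in panels:
            query_id = f"{panel['title']}-{panel['legend']}"
            if query_id in scoped:
                if panel.get("collapsed"):
                    # Assumption: a duplicate in a collapsed row is tolerated
                    # and the first entry is kept.
                    continue
                raise ValueError(f"duplicate query id {query_id!r} in {name}")
            scoped[query_id] = panel["expr"]
    return queries


def lookup(queries, dashboard, query_id):
    """Resolve an id inside the dashboard the feature file named."""
    return queries[dashboard][query_id]


dashboards = [
    ("ceph-cluster-advanced",
     [{"title": "IOPS", "legend": "Read", "expr": "sum(rate(ceph_osd_op_r[1m]))"}]),
    ("cephfsdashboard",
     [{"title": "IOPS", "legend": "Read", "expr": "sum(rate(ceph_pool_rd[1m]))"}]),
]
queries = collect_scoped(dashboards)
print(lookup(queries, "ceph-cluster-advanced", "IOPS-Read"))
```

Because the lookup starts from the dashboard name, the order in which
the dashboards were read no longer affects which expression is
returned.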

ceph-cluster.feature covers ceph-cluster-advanced; the rest map to one
dashboard each.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

monitoring/ceph-mixin/tests_dashboards/features/ceph-cluster.feature
monitoring/ceph-mixin/tests_dashboards/features/environment.py
monitoring/ceph-mixin/tests_dashboards/features/host-details.feature
monitoring/ceph-mixin/tests_dashboards/features/hosts_overview.feature
monitoring/ceph-mixin/tests_dashboards/features/osd-device-details.feature
monitoring/ceph-mixin/tests_dashboards/features/osds-overview.feature
monitoring/ceph-mixin/tests_dashboards/features/radosgw-detail.feature
monitoring/ceph-mixin/tests_dashboards/features/radosgw_overview.feature
monitoring/ceph-mixin/tests_dashboards/util.py