From: Zack Cerza
Date: Thu, 9 Mar 2023 21:43:57 +0000 (-0700)
Subject: docs: Add new document for teuthology-exporter
X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=baff8c4dea3c75d757faf27317c678b15fd7c573;p=teuthology.git

docs: Add new document for teuthology-exporter

Signed-off-by: Zack Cerza
---

diff --git a/docs/exporter.rst b/docs/exporter.rst
new file mode 100644
index 0000000000..fb729a82c8
--- /dev/null
+++ b/docs/exporter.rst
@@ -0,0 +1,67 @@
+.. _exporter:
+
+==================================
+The Teuthology Prometheus Exporter
+==================================
+
+To help make it easier to determine the status of the lab, we've created a
+`Prometheus <https://prometheus.io/>`__ exporter (helpfully named
+`teuthology-exporter`). We use `Grafana <https://grafana.com/>`__ to visualize
+the data we collect.
+
+It listens on port 61764, and scrapes every 60 seconds by default.
+
+
+Exposed Metrics
+===============
+
+.. list-table::
+
+    * - Name
+      - Type
+      - Description
+      - Labels
+    * - beanstalk_queue_length
+      - Gauge
+      - The number of jobs in the beanstalkd queue
+      - machine type
+    * - beanstalk_queue_paused
+      - Gauge
+      - Whether or not the beanstalkd queue is paused
+      - machine type
+    * - teuthology_dispatchers
+      - Gauge
+      - The number of running teuthology-dispatcher instances
+      - machine type
+    * - teuthology_job_processes
+      - Gauge
+      - The number of running job *processes*
+      -
+    * - teuthology_job_results_total
+      - Gauge
+      - The number of completed jobs
+      - status (pass/fail/dead)
+    * - teuthology_nodes
+      - Gauge
+      - The number of test nodes
+      - up, locked
+    * - teuthology_job_duration_seconds
+      - Summary
+      - The time it took to run a job
+      - suite
+    * - teuthology_task_duration_seconds
+      - Summary
+      - The time it took for each phase of each task to run
+      - name, phase (enter/exit)
+    * - teuthology_bootstrap_duration_seconds
+      - Summary
+      - The time it took to run teuthology's bootstrap script
+      -
+    * - teuthology_node_locking_duration_seconds
+      - Summary
+      - The time it took to lock nodes
+      - machine type, count
+    * - teuthology_node_reimaging_duration_seconds
+      - Summary
+      - The time it took to reimage nodes
+      - machine type, count
diff --git a/docs/index.rst b/docs/index.rst
index a218ae781a..82db430f5d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -14,6 +14,7 @@ Content Index
    downburst_vms.rst
    INSTALL.rst
    LAB_SETUP.rst
+   exporter.rst
    commands/list.rst
    ChangeLog.rst
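
Reviewer note: the metrics table in docs/exporter.rst describes gauges such as beanstalk_queue_length, labelled by machine type. As a rough illustration of how a consumer might read those values, the sketch below parses a small piece of Prometheus text exposition output using only the standard library. The sample payload, the label name `machine_type`, the machine type `smithi`, and the helper names `parse_metrics` and `queue_length` are all assumptions made for the example; none of them come from the patch, and the parser deliberately ignores edge cases such as escaped quotes in label values.

```python
import re

# Hypothetical sample of what a scrape of the exporter might return.
# The label name "machine_type" and the value "smithi" are assumptions.
SAMPLE = """\
# HELP beanstalk_queue_length The number of jobs in the beanstalkd queue
# TYPE beanstalk_queue_length gauge
beanstalk_queue_length{machine_type="smithi"} 12.0
beanstalk_queue_paused{machine_type="smithi"} 0.0
teuthology_job_processes 7.0
"""

# metric_name{labels} value [timestamp]
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def queue_length(samples, machine_type):
    """Return the beanstalk_queue_length value for one machine type, or None."""
    for name, labels, value in samples:
        if name == "beanstalk_queue_length" and labels.get("machine_type") == machine_type:
            return value
    return None


if __name__ == "__main__":
    print(queue_length(parse_metrics(SAMPLE), "smithi"))
```

Against a live exporter, the same functions could be fed the body of an HTTP GET to port 61764; the host and the `/metrics` path would need to be confirmed for the deployment, since the document only states the port.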