git.apps.os.sepia.ceph.com Git - teuthology.git/log
teuthology.git
21 months ago Merge pull request #1902 from ceph/dispatcher-quiet
Dan Mick [Tue, 28 Nov 2023 23:28:51 +0000 (15:28 -0800)]
Merge pull request #1902 from ceph/dispatcher-quiet

dispatcher: Don't spam the journal

21 months ago Merge pull request #1792 from VallariAg/unittest-xml-scanner
Zack Cerza [Mon, 27 Nov 2023 23:25:30 +0000 (16:25 -0700)]
Merge pull request #1792 from VallariAg/unittest-xml-scanner

orch/run: Add unit test xml scanner

21 months ago util/scanner: add UnitTestScanner.num_of_total_failures 1792/head
Vallari Agrawal [Fri, 27 Oct 2023 08:58:18 +0000 (14:28 +0530)]
util/scanner: add UnitTestScanner.num_of_total_failures

In UnitTestScanner's final error message, add the total count of failures
before the first error occurrence, like "(total x failed) <message>".
Another minor change: add "..." if the failure reason is longer than 200 chars.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
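
The message format described in this commit can be sketched as below. This is only an illustration: `format_failure` and `MAX_REASON_CHARS` are hypothetical names, not the ones used in util/scanner.py, and only the "(total x failed)" prefix and the 200-character limit come from the commit message.

```python
MAX_REASON_CHARS = 200  # limit taken from the commit message


def format_failure(total_failed: int, first_reason: str) -> str:
    """Build a "(total x failed) <message>" line (hypothetical helper)."""
    reason = first_reason
    if len(reason) > MAX_REASON_CHARS:
        # Truncate long failure reasons and mark the cut with "..."
        reason = reason[:MAX_REASON_CHARS] + "..."
    return f"(total {total_failed} failed) {reason}"
```

For example, `format_failure(3, "boom")` yields `(total 3 failed) boom`.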
21 months ago add utils/tests/test_scanner.py
Vallari Agrawal [Tue, 19 Sep 2023 15:04:28 +0000 (20:34 +0530)]
add utils/tests/test_scanner.py

and test_run_unit_test in test_remote.py

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
21 months ago add Scanner, UnitTestScanner, ValgrindScanner
Vallari Agrawal [Sat, 1 Oct 2022 11:16:52 +0000 (16:46 +0530)]
add Scanner, UnitTestScanner, ValgrindScanner

1. add 'run_unit_test' to Remote
2. create util/scanner.py
3. new exception: UnitTestError
4. add `lxml` dependency in setup.cfg

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
22 months ago supervisor: Drop job output 1902/head
Zack Cerza [Fri, 17 Nov 2023 20:48:38 +0000 (13:48 -0700)]
supervisor: Drop job output

It gets logged to its own file in the job archive.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months ago Merge pull request #1901 from ceph/fog-debug-quieter
Dan Mick [Fri, 17 Nov 2023 20:43:33 +0000 (12:43 -0800)]
Merge pull request #1901 from ceph/fog-debug-quieter

fog: Drop request debug logging

22 months ago dispatcher: Drop supervisor output
Zack Cerza [Fri, 17 Nov 2023 20:34:21 +0000 (13:34 -0700)]
dispatcher: Drop supervisor output

It gets logged to its own file in the job archive.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months ago fog: Drop request debug logging 1901/head
Zack Cerza [Fri, 17 Nov 2023 20:22:58 +0000 (13:22 -0700)]
fog: Drop request debug logging

It's too noisy.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months ago Merge pull request #1892 from ceph/devstack-simplified
Zack Cerza [Thu, 26 Oct 2023 19:05:51 +0000 (13:05 -0600)]
Merge pull request #1892 from ceph/devstack-simplified

Add containers/teuthology-dev

22 months ago Add containers/teuthology-dev 1892/head
Zack Cerza [Tue, 26 Sep 2023 21:32:25 +0000 (14:32 -0700)]
Add containers/teuthology-dev

This is nearly identical to docs/docker-compose/teuthology, but with
some changes to better work with ceph-devstack. The bits in
docs/docker-compose should be able to be adapted easily to work with
this container.

Signed-off-by: Zack Cerza <zack@redhat.com>
23 months ago Merge PR #1895 into main
Patrick Donnelly [Tue, 17 Oct 2023 19:38:56 +0000 (15:38 -0400)]
Merge PR #1895 into main

* refs/pull/1895/head:
install/bin/stdin-killer: macOs (Darwin) compatibility

Reviewed-by: Zack Cerza <zack@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
23 months ago Merge pull request #1894 from VallariAg/fix-readthedocs-builds
Zack Cerza [Tue, 17 Oct 2023 16:16:46 +0000 (10:16 -0600)]
Merge pull request #1894 from VallariAg/fix-readthedocs-builds

fix readthedocs PR builds

23 months ago readthedocs: fix 'The configuration key "build.image" is deprecated' 1894/head
Vallari Agrawal [Tue, 17 Oct 2023 09:16:42 +0000 (14:46 +0530)]
readthedocs: fix 'The configuration key "build.image" is deprecated'

Builds are failing because readthedocs has fully removed support for the
deprecated "build.image" key; "build.os" must be used instead.

ref: https://blog.readthedocs.com/use-build-os-config/
error: https://readthedocs.org/projects/teuthology/builds/22250705/

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
23 months ago install/bin/stdin-killer: macOs (Darwin) compatibility 1895/head
Leonid Usov [Tue, 17 Oct 2023 10:37:27 +0000 (13:37 +0300)]
install/bin/stdin-killer: macOs (Darwin) compatibility

Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
23 months ago Merge pull request #1887 from ceph/paramiko-eoferror
Josh Durgin [Wed, 11 Oct 2023 16:42:32 +0000 (09:42 -0700)]
Merge pull request #1887 from ceph/paramiko-eoferror

orchestra: Tolerate EOFError during connect

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2 years ago Merge pull request #1888 from ceph/keyscan-tweak
Zack Cerza [Tue, 5 Sep 2023 23:29:55 +0000 (17:29 -0600)]
Merge pull request #1888 from ceph/keyscan-tweak

2 years ago misc._ssh_keyscan: Sort keys before returning any 1888/head
Zack Cerza [Tue, 5 Sep 2023 18:42:46 +0000 (12:42 -0600)]
misc._ssh_keyscan: Sort keys before returning any

ssh-keyscan's output is unsorted, so this function wasn't deterministic.

Signed-off-by: Zack Cerza <zack@redhat.com>
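
A minimal sketch of why sorting makes the result deterministic, assuming `ssh-keyscan`-style "host keytype key" lines. The function name `first_key_per_host` and the "first key wins" rule are assumptions for illustration, not the actual `misc._ssh_keyscan` logic.

```python
from typing import Dict


def first_key_per_host(keyscan_output: str) -> Dict[str, str]:
    """Pick one key per host from keyscan-like output (hypothetical helper)."""
    keys: Dict[str, str] = {}
    # Sorting first means the chosen key no longer depends on the
    # arbitrary order in which the lines were emitted.
    for line in sorted(keyscan_output.splitlines()):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, key = line.split(None, 1)
        keys.setdefault(host, key)
    return keys
```

Feeding the same lines in a different order returns the same mapping.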
2 years ago orchestra: Move connection exception handling 1887/head
Zack Cerza [Thu, 31 Aug 2023 18:10:09 +0000 (11:10 -0700)]
orchestra: Move connection exception handling

... to inside the retry loop. Also, add an increment to the safe_while
instance we use.

Signed-off-by: Zack Cerza <zack@redhat.com>
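
The pattern described here, handling connection exceptions inside the retry loop and growing the delay by an increment, might look roughly like this. It is a generic sketch and does not use teuthology's `safe_while`; `connect_with_retries` and all of its parameters are hypothetical.

```python
import time


def connect_with_retries(connect, tries=4, sleep=1.0, increment=0.5,
                         retry_on=(OSError, EOFError)):
    """Call connect() until it succeeds or the tries run out (sketch)."""
    delay = sleep
    for attempt in range(1, tries + 1):
        try:
            return connect()
        except retry_on:
            # The exception is handled inside the loop, so a transient
            # failure leads to another attempt rather than an abort.
            if attempt == tries:
                raise
            time.sleep(delay)
            delay += increment  # wait a little longer each time
```

Catching inside the loop is the key design point: an exception caught outside it would end the retries on the first failure.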
2 years ago orchestra: Treat EOFError as SSHException
Zack Cerza [Thu, 31 Aug 2023 17:34:41 +0000 (10:34 -0700)]
orchestra: Treat EOFError as SSHException

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1886 from ceph/update-paramiko
Zack Cerza [Wed, 30 Aug 2023 17:34:16 +0000 (11:34 -0600)]
Merge pull request #1886 from ceph/update-paramiko

2 years ago Merge pull request #1884 from kamoltat/wip-ksirivad-fix-62445
Kamoltat (Junior) Sirivadhna [Wed, 30 Aug 2023 15:59:27 +0000 (11:59 -0400)]
Merge pull request #1884 from kamoltat/wip-ksirivad-fix-62445

teuthology/scrape: Fix bad backtrace parsing in Teuthology.log
Reviewed-by: Zack Cerza <zcerza@redhat.com>

2 years ago Merge pull request #1885 from ceph/fix-docs-build
Kamoltat (Junior) Sirivadhna [Wed, 30 Aug 2023 15:44:58 +0000 (11:44 -0400)]
Merge pull request #1885 from ceph/fix-docs-build

Fix docs build
Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
2 years ago Update paramiko 1886/head
Zack Cerza [Mon, 28 Aug 2023 20:07:19 +0000 (14:07 -0600)]
Update paramiko

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago tox: Avoid buggy sphinx versions 1885/head
Zack Cerza [Wed, 23 Aug 2023 19:24:56 +0000 (13:24 -0600)]
tox: Avoid buggy sphinx versions

See https://github.com/ceph/teuthology/pull/1884

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago setup.cfg: Drop license_file
Zack Cerza [Wed, 23 Aug 2023 19:21:43 +0000 (13:21 -0600)]
setup.cfg: Drop license_file

It's deprecated in favor of `license_files`, but the default value is
sufficient.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago teuthology/scrape: Fix bad backtrace parsing in Teuthology.log 1884/head
Kamoltat Sirivadhna [Thu, 17 Aug 2023 16:28:00 +0000 (12:28 -0400)]
teuthology/scrape: Fix bad backtrace parsing in Teuthology.log

Problem:

- A confusing warning message stated that
the backtrace is malformed.

- We kept adding to the backtrace buffer
even after exceeding `MAX_BT_LINES`.

Solution:

- Correct the warning message to
"Ignoring backtrace that exceeds MAX_BT_LINES".
- Reset the buffer once MAX_BT_LINES is exceeded.
- Add some cases where we detect the start/end of a backtrace.

Fixes: https://tracker.ceph.com/issues/62445

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
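
The buffer reset described in the solution can be sketched as follows. This is not the teuthology/scrape code: the value of `MAX_BT_LINES`, the function name, and the `is_start`/`is_end` predicates are assumptions made for the example.

```python
MAX_BT_LINES = 100  # assumed value; the real limit is not shown in this log


def collect_backtraces(lines, is_start, is_end, max_lines=MAX_BT_LINES):
    """Collect complete backtraces, dropping oversized ones (sketch)."""
    backtraces, ignored = [], 0
    buffer = None  # None means "not currently inside a backtrace"
    for line in lines:
        if buffer is None:
            if is_start(line):
                buffer = [line]
            continue
        buffer.append(line)
        if is_end(line):
            backtraces.append(buffer)
            buffer = None
        elif len(buffer) > max_lines:
            # "Ignoring backtrace that exceeds MAX_BT_LINES": reset the
            # buffer instead of continuing to grow it.
            ignored += 1
            buffer = None
    return backtraces, ignored
```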
2 years ago Merge pull request #1883 from ceph/nuke-desc-typeerror
Dan Mick [Tue, 15 Aug 2023 19:40:36 +0000 (12:40 -0700)]
Merge pull request #1883 from ceph/nuke-desc-typeerror

nuke: Avoid a TypeError w/ null node description

2 years ago nuke: Avoid a TypeError w/ null node description 1883/head
Zack Cerza [Tue, 15 Aug 2023 18:05:38 +0000 (12:05 -0600)]
nuke: Avoid a TypeError w/ null node description

This avoids a `TypeError: argument of type 'NoneType' is not iterable`
when nuking a node whose description is None.

ex: https://sentry.ceph.com/share/issue/91172146663f4c71a6cbfe43725b2e07/

Signed-off-by: Zack Cerza <zack@redhat.com>
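
The error quoted above is what Python raises for a membership test against `None`. A minimal guard, using a hypothetical helper name rather than the actual nuke code:

```python
def description_mentions(description, needle: str) -> bool:
    """Return True if needle occurs in a possibly-None description."""
    # `needle in None` raises TypeError, so fall back to an empty string.
    return needle in (description or "")
```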
2 years ago Merge pull request #1882 from ceph/sentry-reimage-taskname
Dan Mick [Mon, 14 Aug 2023 20:34:08 +0000 (13:34 -0700)]
Merge pull request #1882 from ceph/sentry-reimage-taskname

supervisor.reimage(): Improve Sentry reporting

2 years ago supervisor.reimage(): Improve Sentry reporting 1882/head
Zack Cerza [Mon, 14 Aug 2023 18:48:47 +0000 (12:48 -0600)]
supervisor.reimage(): Improve Sentry reporting

Set the `task` tag value to 'reimage' when reporting reimage failures to
Sentry, to make searching for them in its UI easier.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1881 from ceph/stdin-killer-setpgrp
Zack Cerza [Fri, 4 Aug 2023 17:56:20 +0000 (11:56 -0600)]
Merge pull request #1881 from ceph/stdin-killer-setpgrp

stdin-killer: do not setpgrp if already leader

2 years ago stdin-killer: do not setpgrp if already leader 1881/head
Patrick Donnelly [Fri, 4 Aug 2023 13:17:28 +0000 (09:17 -0400)]
stdin-killer: do not setpgrp if already leader

Fixes failure like:

    2023-08-03T19:40:10.942 INFO:teuthology.orchestra.run.smithi100.stderr:Traceback (most recent call last):
    2023-08-03T19:40:10.942 INFO:teuthology.orchestra.run.smithi100.stderr:  File "/usr/bin/stdin-killer", line 213, in <module>
    2023-08-03T19:40:10.943 INFO:teuthology.orchestra.run.smithi100.stderr:    os.setpgrp()
    2023-08-03T19:40:10.943 INFO:teuthology.orchestra.run.smithi100.stderr:PermissionError: [Errno 1] Operation not permitted

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
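
On POSIX systems, `setpgrp()` fails with EPERM when the caller is a session leader, and a session leader always leads its own process group. A guard along the following lines avoids the call in that case. Whether stdin-killer uses exactly this check is an assumption, and both function names are hypothetical.

```python
import os


def needs_new_process_group(pid: int, pgid: int) -> bool:
    """A process whose PID equals its PGID already leads its group."""
    return pid != pgid


def become_group_leader() -> None:
    """Start a new process group unless we already lead one (POSIX only)."""
    if needs_new_process_group(os.getpid(), os.getpgrp()):
        os.setpgrp()
```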
2 years ago Merge pull request #1880 from ceph/wip-62286
Zack Cerza [Wed, 2 Aug 2023 20:25:13 +0000 (14:25 -0600)]
Merge pull request #1880 from ceph/wip-62286

2 years ago PhysicalConsole: Tolerate invalid UTF-8 characters 1880/head
Zack Cerza [Wed, 2 Aug 2023 18:03:21 +0000 (12:03 -0600)]
PhysicalConsole: Tolerate invalid UTF-8 characters

... in pexpect.spawn() calls.

Fixes: https://tracker.ceph.com/issues/62286
Signed-off-by: Zack Cerza <zack@redhat.com>
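
Tolerating invalid UTF-8 usually means decoding with a lenient error handler instead of the default strict one. The standard-library sketch below shows the effect. `pexpect.spawn()` accepts `encoding` and `codec_errors` arguments for the same purpose, but whether this commit uses them is an assumption, and `decode_console_output` is a hypothetical name.

```python
def decode_console_output(raw: bytes) -> str:
    """Decode console bytes, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")
```

With strict decoding, the same input would raise `UnicodeDecodeError`.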
2 years ago PhysicalConsole.check_status(): Use log.exception
Zack Cerza [Wed, 2 Aug 2023 17:04:21 +0000 (11:04 -0600)]
PhysicalConsole.check_status(): Use log.exception

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1845 from NitzanMordhai/wip-nitzan-correct-typo-osd-default-pool...
Zack Cerza [Wed, 2 Aug 2023 16:33:38 +0000 (10:33 -0600)]
Merge pull request #1845 from NitzanMordhai/wip-nitzan-correct-typo-osd-default-pool-size

2 years ago Merge pull request #1877 from ceph/sentry-ae
Dan Mick [Tue, 1 Aug 2023 00:51:49 +0000 (17:51 -0700)]
Merge pull request #1877 from ceph/sentry-ae

supervisor: Fix an AttributeError in reimage()

2 years ago supervisor: Fix an AttributeError in reimage() sentry-ae 1877/head
Zack Cerza [Mon, 31 Jul 2023 23:31:43 +0000 (17:31 -0600)]
supervisor: Fix an AttributeError in reimage()

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1859 from ceph/quiet-urllib3
Dan Mick [Fri, 28 Jul 2023 23:08:05 +0000 (16:08 -0700)]
Merge pull request #1859 from ceph/quiet-urllib3

Turn down logging for urllib3.util.retry

2 years ago Merge pull request #1875 from ceph/reimage-errs-sentry
Dan Mick [Fri, 28 Jul 2023 21:52:06 +0000 (14:52 -0700)]
Merge pull request #1875 from ceph/reimage-errs-sentry

Report reimage failures to Sentry

2 years ago Merge pull request #1876 from ceph/afa-sort
Dan Mick [Fri, 28 Jul 2023 21:02:07 +0000 (14:02 -0700)]
Merge pull request #1876 from ceph/afa-sort

task.ansible.FailureAnalyzer: Sort failure items

2 years ago task.ansible.FailureAnalyzer: Sort failure items afa-sort 1876/head
Zack Cerza [Fri, 28 Jul 2023 19:22:52 +0000 (13:22 -0600)]
task.ansible.FailureAnalyzer: Sort failure items

To reduce unnecessary duplication in e.g. Sentry.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1874 from ceph/fix-fog-timeout
Dan Mick [Thu, 27 Jul 2023 21:56:28 +0000 (14:56 -0700)]
Merge pull request #1874 from ceph/fix-fog-timeout

fog: Fix a connection timeout bug

2 years ago Merge pull request #1873 from ceph/console-log
Dan Mick [Thu, 27 Jul 2023 21:55:36 +0000 (14:55 -0700)]
Merge pull request #1873 from ceph/console-log

orchestra.console: Scope loggers to shortname

2 years ago supervisor.reimage: Report failures to Sentry reimage-errs-sentry 1875/head
Zack Cerza [Thu, 27 Jul 2023 17:49:16 +0000 (11:49 -0600)]
supervisor.reimage: Report failures to Sentry

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Move Sentry reporting logic to utils
Zack Cerza [Thu, 27 Jul 2023 17:25:23 +0000 (11:25 -0600)]
Move Sentry reporting logic to utils

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago FOG._wait_for_ready(): Catch ConnectionErrors 1874/head
Zack Cerza [Thu, 27 Jul 2023 17:42:46 +0000 (11:42 -0600)]
FOG._wait_for_ready(): Catch ConnectionErrors

Instead of just ConnectionResetErrors, which inherit from
ConnectionError

Signed-off-by: Zack Cerza <zack@redhat.com>
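
Because `ConnectionResetError`, `ConnectionRefusedError` and `ConnectionAbortedError` are all built-in subclasses of `ConnectionError`, catching the base class covers every one of them. A small polling sketch, where `wait_for_ready` and `probe` are hypothetical stand-ins for the real method:

```python
def wait_for_ready(probe, tries=3):
    """Call probe() until it stops raising connection errors (sketch)."""
    for _ in range(tries):
        try:
            return probe()
        except ConnectionError:
            # Also catches ConnectionResetError, ConnectionRefusedError
            # and ConnectionAbortedError, which are subclasses.
            continue
    return None
```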
2 years ago remote: Raise ConnectionError when appropriate
Zack Cerza [Thu, 27 Jul 2023 17:41:11 +0000 (11:41 -0600)]
remote: Raise ConnectionError when appropriate

Instead of just Exception.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago orchestra.console: Scope loggers to shortname 1873/head
Zack Cerza [Thu, 27 Jul 2023 16:24:25 +0000 (10:24 -0600)]
orchestra.console: Scope loggers to shortname

This will make reading console debug logging easier.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1871 from ceph/sentry-ansible
Zack Cerza [Mon, 24 Jul 2023 19:38:11 +0000 (13:38 -0600)]
Merge pull request #1871 from ceph/sentry-ansible

2 years ago exceptions.AnsibleFailedError: Add fingerprint() 1871/head
Zack Cerza [Mon, 24 Jul 2023 17:22:51 +0000 (11:22 -0600)]
exceptions.AnsibleFailedError: Add fingerprint()

This will cause Sentry to group events by their failure reasons, rather
than lumping all AnsibleFailedErrors together

Signed-off-by: Zack Cerza <zack@redhat.com>
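
Sentry groups events by a fingerprint, a list of strings attached to each event. One way an exception could supply such a list is sketched below. The class name, its `failures` attribute, and the sort-and-dedupe rule are all assumptions, and the hand-off to the Sentry SDK is deliberately left out.

```python
class GroupedFailure(Exception):
    """Hypothetical exception carrying a list of failure reasons."""

    def __init__(self, failures):
        super().__init__(failures)
        self.failures = list(failures)

    def fingerprint(self):
        # Deduplicate and sort so the same set of reasons always yields
        # the same fingerprint, regardless of the order they arrived in.
        return sorted(set(self.failures))
```

Two errors with the same reasons in a different order then produce identical fingerprints and fall into the same group.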
2 years ago Merge pull request #1869 from dmick/wip-pexpect
Dan Mick [Sat, 22 Jul 2023 03:42:43 +0000 (20:42 -0700)]
Merge pull request #1869 from dmick/wip-pexpect

orchestra/console: log output from pexpect commands

2 years ago orchestra/console: log output from pexpect commands 1869/head
Dan Mick [Thu, 20 Jul 2023 02:45:11 +0000 (19:45 -0700)]
orchestra/console: log output from pexpect commands

In case ipmitool notices and reports anything unusual,
try to display whatever it says.

Signed-off-by: Dan Mick <dmick@redhat.com>
2 years ago Merge pull request #1870 from ceph/pyyaml-fix
Dan Mick [Sat, 22 Jul 2023 00:35:03 +0000 (17:35 -0700)]
Merge pull request #1870 from ceph/pyyaml-fix

Pin PyYAML to fix CI breakage

2 years ago Update pip-tools 1870/head
Zack Cerza [Fri, 21 Jul 2023 19:53:29 +0000 (13:53 -0600)]
Update pip-tools

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Exclude PyYAML 5.4.0,5.4.1
Zack Cerza [Fri, 21 Jul 2023 19:49:52 +0000 (13:49 -0600)]
Exclude PyYAML 5.4.0,5.4.1

See https://github.com/yaml/pyyaml/issues/601

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1866 from ceph/aa-fix
Zack Cerza [Tue, 18 Jul 2023 17:48:15 +0000 (11:48 -0600)]
Merge pull request #1866 from ceph/aa-fix

2 years ago Merge pull request #1867 from ceph/keyscan-timout
Zack Cerza [Fri, 14 Jul 2023 20:58:51 +0000 (14:58 -0600)]
Merge pull request #1867 from ceph/keyscan-timout

2 years ago misc.ssh_keyscan: Always retry, and retry more 1867/head
Zack Cerza [Fri, 14 Jul 2023 20:27:32 +0000 (14:27 -0600)]
misc.ssh_keyscan: Always retry, and retry more

We started seeing reimage failures with errors like:
"teuthology.exceptions.MaxWhileTries: 'ssh_keyscan $host' reached
maximum tries (6) after waiting for 5 seconds"

Let's be quite a bit more generous.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago TestFailureAnalyzer: Add tests for dropped items 1866/head
Zack Cerza [Fri, 14 Jul 2023 18:01:35 +0000 (12:01 -0600)]
TestFailureAnalyzer: Add tests for dropped items

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago ansible.FailureAnalyzer: Drop malformed records
Zack Cerza [Fri, 14 Jul 2023 17:50:24 +0000 (11:50 -0600)]
ansible.FailureAnalyzer: Drop malformed records

If host_obj is the wrong type, we won't be able to extract anything
useful from it. In these cases, we'll end up using the raw string as we
used to do.

Signed-off-by: Zack Cerza <zack@redhat.com>
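
The behaviour described, skipping records of the wrong type and falling back to the raw string when nothing useful remains, could be sketched like this. The record shape (a dict with a "msg" key) and the function name are assumptions; the real FailureAnalyzer is not reproduced here.

```python
def summarize(raw_log: str, records) -> str:
    """Summarize failure records, else return the raw log (sketch)."""
    reasons = []
    for host_obj in records:
        if not isinstance(host_obj, dict):
            continue  # malformed record: nothing useful to extract
        msg = host_obj.get("msg")
        if msg:
            reasons.append(str(msg))
    # With no usable records, fall back to the raw string.
    return "; ".join(reasons) if reasons else raw_log
```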
2 years ago Ansible._handle_failure: YAMLErrors are special
Zack Cerza [Fri, 14 Jul 2023 17:38:54 +0000 (11:38 -0600)]
Ansible._handle_failure: YAMLErrors are special

Return to treating them differently, but also continue to catch other
exceptions here.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago ansible.FailureAnalyzer: Look for SSH errors
Zack Cerza [Fri, 14 Jul 2023 17:37:33 +0000 (11:37 -0600)]
ansible.FailureAnalyzer: Look for SSH errors

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago ansible.FailureAnalyzer: items -> values
Zack Cerza [Fri, 14 Jul 2023 17:17:19 +0000 (11:17 -0600)]
ansible.FailureAnalyzer: items -> values

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1865 from ceph/ansible-fail-tolerate-exceptions
Dan Mick [Thu, 13 Jul 2023 23:33:37 +0000 (16:33 -0700)]
Merge pull request #1865 from ceph/ansible-fail-tolerate-exceptions

Ansible._handle_failure: Catch all Exceptions

2 years ago Ansible._handle_failure: Catch all Exceptions 1865/head
Zack Cerza [Thu, 13 Jul 2023 22:58:05 +0000 (16:58 -0600)]
Ansible._handle_failure: Catch all Exceptions

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1864 from ceph/analyze-ansible
Zack Cerza [Thu, 13 Jul 2023 00:58:34 +0000 (18:58 -0600)]
Merge pull request #1864 from ceph/analyze-ansible

2 years ago ansible: Try to summarize failure logs analyze-ansible 1864/head
Zack Cerza [Wed, 5 Jul 2023 21:12:05 +0000 (15:12 -0600)]
ansible: Try to summarize failure logs

The failure logs we capture are sometimes helpful, but are often too
long, too complex, and too noisy to understand. Sentry also struggles to
associate related failures because of the presence of unique data such
as timestamps and URLs.

While I don't see a quick and generic solution to this, there are
several common failure modes that can easily be summarized. This commit
begins that work by looking for errors caused by network outages.

Signed-off-by: Zack Cerza <zack@redhat.com>
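
The idea of reducing a noisy log to a stable summary might look like the sketch below, which matches a few well-known network error strings and discards the surrounding unique data such as timestamps and URLs. The pattern list and the function name are illustrative assumptions, not the patterns this commit looks for.

```python
import re

# Assumed examples of network-outage messages; not taken from the commit.
NETWORK_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"temporary failure in name resolution",
        r"connection timed out",
        r"network is unreachable",
    )
]


def summarize_failure(log_text: str):
    """Return a short, stable summary for known network errors, else None."""
    for pattern in NETWORK_PATTERNS:
        match = pattern.search(log_text)
        if match:
            return f"network error: {match.group(0).lower()}"
    return None
```

Because the summary omits timestamps and URLs, two occurrences of the same outage produce the same string.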
2 years ago test_ansible: Use mock_open()
Zack Cerza [Wed, 5 Jul 2023 23:07:32 +0000 (17:07 -0600)]
test_ansible: Use mock_open()

This provides a more complete interface than what we were constructing.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1863 from ceph/fog-wfr-eoferror
Dan Mick [Sat, 1 Jul 2023 00:05:18 +0000 (17:05 -0700)]
Merge pull request #1863 from ceph/fog-wfr-eoferror

FOG._wait_for_ready: Tolerate EOFError

2 years ago FOG._wait_for_ready: Tolerate EOFError fog-wfr-eoferror 1863/head
Zack Cerza [Fri, 30 Jun 2023 22:11:14 +0000 (16:11 -0600)]
FOG._wait_for_ready: Tolerate EOFError

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1862 from ceph/retry-sentinel-connreset
Zack Cerza [Fri, 30 Jun 2023 19:19:08 +0000 (13:19 -0600)]
Merge pull request #1862 from ceph/retry-sentinel-connreset

2 years ago Remote.reconnect(): Use a default timeout of 30s 1862/head
Zack Cerza [Fri, 30 Jun 2023 18:29:52 +0000 (12:29 -0600)]
Remote.reconnect(): Use a default timeout of 30s

And rewrite with safe_while.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago FOG._wait_for_ready: Tolerate ConnectionResetError
Zack Cerza [Thu, 29 Jun 2023 19:08:37 +0000 (13:08 -0600)]
FOG._wait_for_ready: Tolerate ConnectionResetError

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago contextutil: Remove leftover print statement
Zack Cerza [Thu, 29 Jun 2023 19:02:47 +0000 (13:02 -0600)]
contextutil: Remove leftover print statement

Looks like this was missed during PR submission/review in 8f8d05852

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1858 from ceph/exporter-restart
Zack Cerza [Wed, 28 Jun 2023 17:56:02 +0000 (11:56 -0600)]
Merge pull request #1858 from ceph/exporter-restart

2 years ago Merge pull request #1860 from ceph/disp-rc
Dan Mick [Tue, 27 Jun 2023 22:10:08 +0000 (15:10 -0700)]
Merge pull request #1860 from ceph/disp-rc

dispatcher: Return the highest of the jobs' RCs

2 years ago dispatcher: Return the highest of the jobs' RCs 1860/head
Zack Cerza [Tue, 11 Oct 2022 19:10:53 +0000 (13:10 -0600)]
dispatcher: Return the highest of the jobs' RCs

This is so that ceph-devstack can report job failures

Signed-off-by: Zack Cerza <zack@redhat.com>
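
"Highest of the jobs' RCs" reduces to taking a maximum over the collected return codes. In this sketch, the function name and the default of 0 for an empty list are assumptions.

```python
def dispatcher_exit_code(job_return_codes) -> int:
    """Return the highest job return code, or 0 if no jobs ran (sketch)."""
    return max(job_return_codes, default=0)
```

A single failing job therefore makes the whole run report a nonzero code, which is what lets a caller detect job failures.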
2 years ago Merge pull request #1855 from ceph/ssh-ux
Dan Mick [Mon, 26 Jun 2023 23:09:43 +0000 (16:09 -0700)]
Merge pull request #1855 from ceph/ssh-ux

Improve error message when there is no SSH key

2 years ago Turn down logging for urllib3.util.retry quiet-urllib3 1859/head
Zack Cerza [Mon, 26 Jun 2023 22:54:04 +0000 (16:54 -0600)]
Turn down logging for urllib3.util.retry

This quiets messages like: "Converted retries value: 10 ->
Retry(total=10, connect=None, read=None, redirect=None, status=None)"

Signed-off-by: Zack Cerza <zack@redhat.com>
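
Quieting a single noisy logger is normally done by raising its level with the standard `logging` module. The sketch below assumes the quoted message is emitted at DEBUG level and that INFO is a suitable new level; the level the commit actually chose is not shown in this log, and `quiet_logger` is a hypothetical helper.

```python
import logging


def quiet_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Raise the threshold of one named logger (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Only this logger is affected; other loggers keep their own levels.
quiet_logger("urllib3.util.retry")
```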
2 years ago exporter: Restart every 24h 1858/head
Zack Cerza [Tue, 13 Jun 2023 23:49:48 +0000 (17:49 -0600)]
exporter: Restart every 24h

A design limitation of prometheus-client's multiprocessing mode is that
each process creates files to store its own metrics; the exporter then
has to read each file, even if the process which created it is dead.

This results in request latency growing over time, to the point of
multiple seconds when the file count gets into the thousands. This
eventually results in prometheus failing to fetch, leaving gaps in our
data.

We can work around this by restarting at a regular interval; 24h seems
like a fine place to start.

Signed-off-by: Zack Cerza <zack@redhat.com>
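
A periodic-restart workaround of this kind can be sketched as a serving loop that returns once a deadline passes, so that the process can exit and be started afresh. How the exporter actually restarts itself is not shown in this log; `serve_until_restart`, `handle_once` and the injectable `clock` are hypothetical.

```python
import time

RESTART_INTERVAL = 24 * 60 * 60  # 24h, as in the commit title


def restart_due(started_at: float, now: float,
                interval: float = RESTART_INTERVAL) -> bool:
    """True once at least `interval` seconds have elapsed."""
    return now - started_at >= interval


def serve_until_restart(handle_once, clock=time.monotonic,
                        interval: float = RESTART_INTERVAL) -> None:
    """Run handle_once() repeatedly, returning when a restart is due."""
    started_at = clock()
    while not restart_due(started_at, clock(), interval):
        handle_once()
```

Passing the clock in as a parameter keeps the loop testable without waiting a full day.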
2 years ago Merge pull request #1856 from ceph/fog-debug
Dan Mick [Fri, 16 Jun 2023 22:48:04 +0000 (15:48 -0700)]
Merge pull request #1856 from ceph/fog-debug

fog: Add more debug logging

2 years ago FOG._wait_for_ready(): Use instance logger 1856/head
Zack Cerza [Fri, 16 Jun 2023 16:24:29 +0000 (10:24 -0600)]
FOG._wait_for_ready(): Use instance logger

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago dispatcher/supervisor: Set root logger level
Zack Cerza [Fri, 16 Jun 2023 16:23:42 +0000 (10:23 -0600)]
dispatcher/supervisor: Set root logger level

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago fog: Add more debug logging
Zack Cerza [Wed, 14 Jun 2023 20:53:36 +0000 (14:53 -0600)]
fog: Add more debug logging

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1854 from ceph/bootstrap-c9s
Zack Cerza [Wed, 14 Jun 2023 15:33:28 +0000 (09:33 -0600)]
Merge pull request #1854 from ceph/bootstrap-c9s

2 years ago Improve error message when there is no SSH key 1855/head
Zack Cerza [Tue, 13 Jun 2023 20:24:12 +0000 (14:24 -0600)]
Improve error message when there is no SSH key

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1853 from ceph/reimage-no-ctx
Zack Cerza [Tue, 13 Jun 2023 19:08:40 +0000 (13:08 -0600)]
Merge pull request #1853 from ceph/reimage-no-ctx

2 years ago bootstrap: Tolerate a missing lsb_release 1854/head
Zack Cerza [Mon, 12 Jun 2023 21:48:34 +0000 (15:48 -0600)]
bootstrap: Tolerate a missing lsb_release

This fixes the lack of support for CentOS 9.Stream

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago provision: Avoid a possible AttributeError 1853/head
Zack Cerza [Mon, 12 Jun 2023 21:37:56 +0000 (15:37 -0600)]
provision: Avoid a possible AttributeError

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1846 from ceph/stdin-killer
Zack Cerza [Wed, 7 Jun 2023 23:26:37 +0000 (17:26 -0600)]
Merge pull request #1846 from ceph/stdin-killer

2 years ago teuthology/task/install: add stdin-killer helper 1846/head
Patrick Donnelly [Thu, 18 May 2023 13:24:57 +0000 (09:24 -0400)]
teuthology/task/install: add stdin-killer helper

This helper tool runs commands which may or may not take data on stdin.
Like "daemon-helper", if stdin signals EOF, stdin-killer will kill the
command but only as a last resort. It forwards EOF to the command by
closing the command's stdin (pipe) and then waiting a configurable
amount of time for the command to gracefully exit.

Additionally, if stdout or stderr are hung up -- i.e. the ssh parent
process has terminated -- then stdin-killer also detects this and
initiates the graceful shutdown of the command. This is something
daemon-helper does not do.

In general, this tool is a superior replacement for the daemon-helper
tool because you can write to the command's stdin normally.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
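
The shutdown sequence described above (forward EOF by closing the command's stdin, wait a configurable time, kill only as a last resort) can be sketched with `subprocess`. This is not the stdin-killer implementation: the function name and the `grace_seconds` default are assumptions, and the detection of hung-up stdout/stderr is not covered.

```python
import subprocess
import sys


def forward_eof_then_kill(proc: subprocess.Popen,
                          grace_seconds: float = 5.0) -> int:
    """Close the child's stdin, wait, and kill it only if it lingers."""
    if proc.stdin is not None:
        proc.stdin.close()  # the child sees EOF on its stdin
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort after the grace period
        return proc.wait()


if __name__ == "__main__":
    # A child that exits on its own once its stdin reaches EOF.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=subprocess.PIPE,
    )
    print(forward_eof_then_kill(child))
```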
2 years ago setup.cfg: install binary helpers
Patrick Donnelly [Thu, 18 May 2023 13:20:57 +0000 (09:20 -0400)]
setup.cfg: install binary helpers

These are used by vstart_runner.py for local dev operations. Install
them so they are available in the virtualenv bin directory.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2 years ago teuthology/task/install: reorganize binary helpers
Patrick Donnelly [Wed, 17 May 2023 18:32:19 +0000 (14:32 -0400)]
teuthology/task/install: reorganize binary helpers

We intend to install these so move them into an appropriately named
directory.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2 years ago Merge pull request #1803 from jdurgin/wip-configurable-timeouts
Zack Cerza [Wed, 31 May 2023 22:12:50 +0000 (16:12 -0600)]
Merge pull request #1803 from jdurgin/wip-configurable-timeouts

2 years ago Merge pull request #1851 from ceph/reimage-failures
Zack Cerza [Wed, 31 May 2023 20:00:35 +0000 (14:00 -0600)]
Merge pull request #1851 from ceph/reimage-failures

2 years ago fog: Verify reimaged machine OS 1851/head
Zack Cerza [Thu, 18 May 2023 00:12:13 +0000 (18:12 -0600)]
fog: Verify reimaged machine OS

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years ago Merge pull request #1850 from ceph/unmask-unlock-response
Zack Cerza [Fri, 26 May 2023 18:23:34 +0000 (12:23 -0600)]
Merge pull request #1850 from ceph/unmask-unlock-response

2 years ago Merge pull request #1849 from ceph/prom-reimage-results
Dan Mick [Wed, 24 May 2023 21:48:00 +0000 (14:48 -0700)]
Merge pull request #1849 from ceph/prom-reimage-results

exporter: Instrument node reimaging success/fail