git.apps.os.sepia.ceph.com Git - teuthology.git/log
teuthology.git
16 months agoteuthology: modify logic to check for multiple completed builds 1935/head
Laura Flores [Fri, 3 May 2024 17:07:40 +0000 (12:07 -0500)]
teuthology: modify logic to check for multiple completed builds

The current logic assumes that there is only one build for each distro/flavor
per SHA1. However, there is a bug in the Jenkins infrastructure that sometimes
causes multiple builds to trigger for one SHA1. In many of these cases, the first
build succeeds, but the second fails. Teuthology only looks at the latest build,
notices that it failed, and gives up. With the new logic, teuthology can
look further back and notice that there is indeed a successful build earlier in the
lineup.

Here is an example in which the first centos 8 x86_64 build succeeded, but a second
build on top of it failed. Teuthology could only detect the latest failed build:
https://shaman.ceph.com/builds/ceph/wip-pdonnell-testing-20240503.010653-debug/ec1d3bd17a3db9d74296aa618f8d63c801bb647e/

Addresses this failure in teuthology:
lflores@teuthology:~$ ./teuthology/virtualenv/bin/teuthology-suite -v -m smithi -c wip-pdonnell-testing-20240503.010653-debug -s fs --subset 111/12000 -p 75 --dry-run
2024-05-03 16:39:35,231.231 INFO:teuthology.suite:Using random seed=9685
2024-05-03 16:39:35,232.232 INFO:teuthology.suite.run:kernel sha1: distro
2024-05-03 16:39:35,673.673 DEBUG:teuthology.repo_utils:git ls-remote https://git.ceph.com/ceph-ci.git wip-pdonnell-testing-20240503.010653-debug -> ec1d3bd17a3db9d74296aa618f8d63c801bb647e
2024-05-03 16:39:35,673.673 INFO:teuthology.suite.run:ceph sha1: ec1d3bd17a3db9d74296aa618f8d63c801bb647e
2024-05-03 16:39:35,674.674 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=centos%2F8%2Fx86_64&sha1=ec1d3bd17a3db9d74296aa618f8d63c801bb647e
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:looking for centos/8 x86_64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/8 arm64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/9 x86_64 crimson
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/9 x86_64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/8 arm64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/8 x86_64 crimson
2024-05-03 16:39:36,177.177 DEBUG:teuthology.packaging:build: centos/8 x86_64 default
2024-05-03 16:39:36,178.178 INFO:teuthology.suite.util:Container build incomplete
Traceback (most recent call last):
  File "./teuthology/virtualenv/bin/teuthology-suite", line 8, in <module>
    sys.exit(main())
  File "/cephfs/home/lflores/teuthology/scripts/suite.py", line 226, in main
    return teuthology.suite.main(args)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/__init__.py", line 143, in main
    run = Run(conf)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/run.py", line 56, in __init__
    self.base_config = self.create_initial_config()
  File "/cephfs/home/lflores/teuthology/teuthology/suite/run.py", line 94, in create_initial_config
    self.choose_ceph_version(ceph_hash)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/run.py", line 216, in choose_ceph_version
    util.schedule_fail(msg, self.name, dry_run=self.args.dry_run)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/util.py", line 77, in schedule_fail
    raise ScheduleFailError(message, name)
teuthology.exceptions.ScheduleFailError: Scheduling lflores-2024-05-03_16:39:35-fs-wip-pdonnell-testing-20240503.010653-debug-distro-default-smithi failed: Packages for os_type 'centos', flavor default and ceph hash 'ec1d3bd17a3db9d74296aa618f8d63c801bb647e' not found

More work should be done to fix the "double build" issue in Jenkins, so this can be thought of as a workaround.

Signed-off-by: Laura Flores <lflores@ibm.com>
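
The idea of looking past a failed latest build can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `pick_ready_build`, the dictionary keys (`distro`, `distro_version`, `arch`, `flavor`, `status`, `id`) and the newest-first ordering are all assumptions made for the example.

```python
from typing import Iterable, Optional


def pick_ready_build(
    builds: Iterable[dict],
    distro: str,
    version: str,
    arch: str,
    flavor: str,
) -> Optional[dict]:
    """Return the newest matching build whose status is 'ready', if any.

    `builds` is assumed to be ordered newest first. The key names used
    here are hypothetical and chosen only for this sketch.
    """
    wanted = (distro, version, arch, flavor)
    for build in builds:
        found = (
            build.get("distro"),
            build.get("distro_version"),
            build.get("arch"),
            build.get("flavor"),
        )
        if found != wanted:
            continue
        # Do not give up on the first (latest) match: a failed rebuild may
        # sit on top of an earlier successful build for the same SHA1.
        if build.get("status") == "ready":
            return build
    return None


example_builds = [
    {"id": 2, "distro": "centos", "distro_version": "8",
     "arch": "x86_64", "flavor": "default", "status": "failed"},
    {"id": 1, "distro": "centos", "distro_version": "8",
     "arch": "x86_64", "flavor": "default", "status": "ready"},
]
chosen = pick_ready_build(example_builds, "centos", "8", "x86_64", "default")
```

With the example data, the failed build with id 2 is skipped and the earlier build with id 1 is selected.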
16 months agoMerge PR #1926 into main wip-lusov-pre-install
Patrick Donnelly [Wed, 1 May 2024 15:09:28 +0000 (11:09 -0400)]
Merge PR #1926 into main

* refs/pull/1926/head:
teuthology/suite: initialize lua prng using run's seed

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
17 months agoMerge pull request #1932 from ceph/fix-test-admins-kill
Dan Mick [Wed, 17 Apr 2024 20:52:53 +0000 (13:52 -0700)]
Merge pull request #1932 from ceph/fix-test-admins-kill

kill: Fix test-admins' ability to kill

17 months agokill: Fix test-admins' ability to kill 1932/head
Zack Cerza [Fri, 12 Apr 2024 18:25:23 +0000 (12:25 -0600)]
kill: Fix test-admins' ability to kill

By testing access to /bin/true, we were getting false negatives as we meant
to be testing for access to /bin/kill. With our configuration, any sudo access
indicates access to /bin/kill.

Signed-off-by: Zack Cerza <zack@redhat.com>
17 months agoMerge pull request #1929 from lxbsz/wip-64471
Zack Cerza [Thu, 4 Apr 2024 19:47:07 +0000 (13:47 -0600)]
Merge pull request #1929 from lxbsz/wip-64471

suite: add the kdb option support

17 months agoMerge pull request #1930 from VallariAg/fix-cephadmunit-start
Zack Cerza [Wed, 3 Apr 2024 15:54:50 +0000 (09:54 -0600)]
Merge pull request #1930 from VallariAg/fix-cephadmunit-start

fix cephadmunit start() method

17 months agoorchestra/daemon/cephadmunit.py: set is_started in start() 1930/head
Vallari Agrawal [Thu, 28 Mar 2024 08:32:55 +0000 (14:02 +0530)]
orchestra/daemon/cephadmunit.py: set is_started in start()

In the CephadmUnit.start() method, `is_started` isn't set to
True. When running() is called after start(), it returns
False, which is incorrect because the daemon has already been
started by calling start().
This commit fixes that issue.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
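
A minimal sketch of the state-tracking pattern described above. `UnitSketch` is a hypothetical stand-in, not the real CephadmUnit class; it only shows why start() has to record the flag that running() reports.

```python
class UnitSketch:
    """Hypothetical stand-in for a daemon unit that tracks its own state."""

    def __init__(self) -> None:
        self.is_started = False

    def start(self) -> None:
        # A real unit would issue its start command here. The important
        # part is recording the new state afterwards.
        self.is_started = True

    def stop(self) -> None:
        self.is_started = False

    def running(self) -> bool:
        return self.is_started


unit = UnitSketch()
before = unit.running()
unit.start()
after = unit.running()
```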
17 months agosuite: add the kdb option support 1929/head
Xiubo Li [Wed, 27 Mar 2024 01:58:21 +0000 (01:58 +0000)]
suite: add the kdb option support

This will allow us to disable the kdb when running the jobs.

URL: https://tracker.ceph.com/issues/64471
Signed-off-by: Xiubo Li <xiubli@redhat.com>
17 months agoorchestra/daemon/cephadmunit.py: fix start() method 1928/head
Vallari Agrawal [Tue, 26 Mar 2024 09:15:50 +0000 (14:45 +0530)]
orchestra/daemon/cephadmunit.py: fix start() method

In CephadmUnit.start() method, explicitly pass start_cmd as
"args" keyword because Remote.run expects keyword arguments.

Fixes: https://tracker.ceph.com/issues/65162
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
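
To illustrate why the keyword matters, here is a small sketch with a stand-in `run()` that only accepts keyword arguments. The function and the example command are assumptions for illustration; they are not the real Remote.run implementation.

```python
def run(**kwargs):
    """Stand-in for an API that only accepts keyword arguments."""
    return kwargs["args"]


# Hypothetical command, used only for this example.
start_cmd = ["systemctl", "start", "example.service"]

# Passing the command positionally fails before the function body runs.
try:
    run(start_cmd)
    positional_ok = True
except TypeError:
    positional_ok = False

# Passing it as the "args" keyword works.
result = run(args=start_cmd)
```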
18 months agoteuthology/suite: initialize lua prng using run's seed 1926/head
Patrick Donnelly [Tue, 19 Mar 2024 14:13:27 +0000 (10:13 -0400)]
teuthology/suite: initialize lua prng using run's seed

When a script may use Lua's prng, we want it to produce the same sequence
during a rerun.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
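
The commit concerns Lua's PRNG; the same principle can be shown in Python as a rough analogy. The helper name `sequence_for_run` and the seed value are illustrative assumptions: a generator seeded with the run's seed yields the same sequence on a rerun.

```python
import random


def sequence_for_run(seed: int, count: int = 5) -> list:
    """Draw `count` numbers from a generator seeded with the run's seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randint(0, 999) for _ in range(count)]


first_run = sequence_for_run(1234)
rerun = sequence_for_run(1234)
other_seed = sequence_for_run(4321)
```

Here `first_run` and `rerun` are identical because both start from the same seed.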
18 months agoMerge pull request #1923 from ceph/poweron-timeout
adam kraitman [Tue, 5 Mar 2024 12:03:03 +0000 (14:03 +0200)]
Merge pull request #1923 from ceph/poweron-timeout

orchestra.PhysicalConsole: Increase timeout

18 months agoorchestra.PhysicalConsole: Increase timeout 1923/head
Zack Cerza [Mon, 4 Mar 2024 17:23:53 +0000 (10:23 -0700)]
orchestra.PhysicalConsole: Increase timeout

... from 40s to 120s. A physical host being slightly slow to boot should not
cause a reimage failure.

Signed-off-by: Zack Cerza <zack@redhat.com>
19 months agoMerge pull request #1919 from ceph/wip-kernel-image-version
Ilya Dryomov [Wed, 14 Feb 2024 23:07:06 +0000 (00:07 +0100)]
Merge pull request #1919 from ceph/wip-kernel-image-version

kernel: make get_image_version() work for rpm

Reviewed-by: Ramana Raja <rraja@redhat.com>
19 months agokernel: make get_image_version() work for rpm 1919/head
Ilya Dryomov [Wed, 14 Feb 2024 19:05:54 +0000 (20:05 +0100)]
kernel: make get_image_version() work for rpm

At some point in the past, the layout of the rpm package changed.
There is no file matching "/boot/vmlinuz-" there anymore; instead there
is a "vmlinuz" file at the root of the modules directory.  For reference:

deb:

-rw-r--r-- root/root  11527168 2024-02-13 16:25 ./boot/vmlinuz-6.8.0-rc4-ga64ccd305b28
-rw-r--r-- root/root     72614 2024-02-13 16:25 ./lib/modules/6.8.0-rc4-ga64ccd305b28/modules.order

rpm:

/lib/modules/6.8.0-rc4-ga64ccd305b28/modules.order
/lib/modules/6.8.0-rc4-ga64ccd305b28/vmlinuz

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
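
A minimal sketch of how both layouts could be handled when extracting the kernel version from a package file listing. The function name `image_version` and the regular expressions are assumptions for illustration and are not taken from get_image_version().

```python
import re
from typing import Iterable, Optional

# deb layout: ./boot/vmlinuz-<version>
_DEB_RE = re.compile(r"/boot/vmlinuz-(?P<version>\S+)$")
# rpm layout: /lib/modules/<version>/vmlinuz
_RPM_RE = re.compile(r"/lib/modules/(?P<version>[^/\s]+)/vmlinuz$")


def image_version(listing: Iterable[str]) -> Optional[str]:
    """Return the kernel version found in a package file listing, if any."""
    for line in listing:
        line = line.strip()
        for pattern in (_DEB_RE, _RPM_RE):
            match = pattern.search(line)
            if match:
                return match.group("version")
    return None


deb_listing = [
    "-rw-r--r-- root/root  11527168 2024-02-13 16:25 ./boot/vmlinuz-6.8.0-rc4-ga64ccd305b28",
    "-rw-r--r-- root/root     72614 2024-02-13 16:25 ./lib/modules/6.8.0-rc4-ga64ccd305b28/modules.order",
]
rpm_listing = [
    "/lib/modules/6.8.0-rc4-ga64ccd305b28/modules.order",
    "/lib/modules/6.8.0-rc4-ga64ccd305b28/vmlinuz",
]
deb_version = image_version(deb_listing)
rpm_version = image_version(rpm_listing)
```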
19 months agoMerge pull request #1918 from VallariAg/tapi-threads
Zack Cerza [Tue, 13 Feb 2024 19:59:00 +0000 (12:59 -0700)]
Merge pull request #1918 from VallariAg/tapi-threads

teuthology/__init__.py: don't patch threads when running via teuthology_api

19 months agoteuthology/__init__.py: don't patch threads when running via teuthology_api 1918/head
Vallari Agrawal [Tue, 13 Feb 2024 13:46:01 +0000 (19:16 +0530)]
teuthology/__init__.py: don't patch threads when running via teuthology_api

The project [teuthology-api](https://github.com/ceph/teuthology-api)
requires threads not to be patched.
Currently, we are using the "teuth-api" branch of teuthology, where threads are
not patched. With this commit, we'll be able to use the "main" branch as
a dependency.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
19 months agoMerge pull request #1914 from ceph/lock-leaks
Zack Cerza [Wed, 31 Jan 2024 02:30:00 +0000 (19:30 -0700)]
Merge pull request #1914 from ceph/lock-leaks

supervisor: Disregard nuke-on-error when unlocking

19 months agotest_exit: Drop bad test_noop 1914/head
Zack Cerza [Wed, 31 Jan 2024 01:56:19 +0000 (18:56 -0700)]
test_exit: Drop bad test_noop

This test races with other tests because Exiter doesn't have a great way to
remove all installed handlers. This is a test-only issue, so we can drop this
test.

Signed-off-by: Zack Cerza <zack@redhat.com>
19 months agosupervisor: Disregard nuke-on-error when unlocking
Zack Cerza [Wed, 31 Jan 2024 01:04:01 +0000 (18:04 -0700)]
supervisor: Disregard nuke-on-error when unlocking

Signed-off-by: Zack Cerza <zack@redhat.com>
19 months agoMerge pull request #1913 from ceph/wip-64193
Zack Cerza [Mon, 29 Jan 2024 21:15:06 +0000 (14:15 -0700)]
Merge pull request #1913 from ceph/wip-64193

supervisor: Do not nuke nodes after jobs finish

19 months agosupervisor: Do not nuke nodes after jobs finish 1913/head
Zack Cerza [Fri, 26 Jan 2024 21:02:09 +0000 (14:02 -0700)]
supervisor: Do not nuke nodes after jobs finish

This was causing a bad race condition, where we could unlock a node, then unlock
it again via the nuke process after a different job had locked it.

Fixes: https://tracker.ceph.com/issues/64193
Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agoMerge pull request #1912 from ceph/kill-report-dead
Zack Cerza [Mon, 22 Jan 2024 20:30:58 +0000 (13:30 -0700)]
Merge pull request #1912 from ceph/kill-report-dead

kill: After killing a run, report it as dead

20 months agokill: After killing a run, report it as dead 1912/head
Zack Cerza [Mon, 22 Jan 2024 18:33:20 +0000 (11:33 -0700)]
kill: After killing a run, report it as dead

In case processes died a messy death.

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agoMerge pull request #1909 from ceph/deps
Zack Cerza [Mon, 8 Jan 2024 19:07:54 +0000 (12:07 -0700)]
Merge pull request #1909 from ceph/deps

Update dependencies

20 months agoJobProcesses: Ignore zombies safely 1907/head 1909/head
Zack Cerza [Wed, 3 Jan 2024 19:28:18 +0000 (12:28 -0700)]
JobProcesses: Ignore zombies safely

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agofind_dispatcher_processes: Ignore zombies safely
Zack Cerza [Wed, 3 Jan 2024 19:25:50 +0000 (12:25 -0700)]
find_dispatcher_processes: Ignore zombies safely

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agoInstall ansible collections individually
Zack Cerza [Tue, 2 Jan 2024 18:58:36 +0000 (11:58 -0700)]
Install ansible collections individually

Going forward, we can maintain our specific collection requirements in
requirements.yml.

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agoDrop ansible for ansible-core
Zack Cerza [Tue, 2 Jan 2024 18:16:15 +0000 (11:16 -0700)]
Drop ansible for ansible-core

The 'ansible' PyPI package installs _all_ collections, which ends up being
~60% the total size of our virtualenv.

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agotox.ini: Move some deps to setup.cfg
Zack Cerza [Wed, 27 Dec 2023 18:53:14 +0000 (11:53 -0700)]
tox.ini: Move some deps to setup.cfg

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agosetup.cfg: python_requires>=3.8
Zack Cerza [Wed, 27 Dec 2023 18:45:27 +0000 (11:45 -0700)]
setup.cfg: python_requires>=3.8

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorepo_utils.fetch_repo: Use less retries
Zack Cerza [Wed, 27 Dec 2023 18:22:39 +0000 (11:22 -0700)]
repo_utils.fetch_repo: Use less retries

If a particular branch cannot successfully bootstrap, it can cause an accidental
DoS.

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update via pip-compile -U
Zack Cerza [Wed, 27 Dec 2023 17:56:51 +0000 (10:56 -0700)]
requirements: Update via pip-compile -U

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agosetup.cfg: Pin urllib3 for botocore
Zack Cerza [Wed, 27 Dec 2023 18:42:47 +0000 (11:42 -0700)]
setup.cfg: Pin urllib3 for botocore

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update ansible
Zack Cerza [Wed, 27 Dec 2023 17:54:22 +0000 (10:54 -0700)]
requirements: Update ansible

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update pyjwt
Zack Cerza [Wed, 27 Dec 2023 17:51:35 +0000 (10:51 -0700)]
requirements: Update pyjwt

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update paramiko
Zack Cerza [Wed, 27 Dec 2023 17:49:59 +0000 (10:49 -0700)]
requirements: Update paramiko

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update gevent
Zack Cerza [Wed, 27 Dec 2023 17:48:40 +0000 (10:48 -0700)]
requirements: Update gevent

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update configobj
Zack Cerza [Thu, 21 Dec 2023 21:33:24 +0000 (14:33 -0700)]
requirements: Update configobj

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update certifi
Zack Cerza [Thu, 21 Dec 2023 21:32:06 +0000 (14:32 -0700)]
requirements: Update certifi

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update requests
Zack Cerza [Thu, 21 Dec 2023 21:30:39 +0000 (14:30 -0700)]
requirements: Update requests

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update PyYAML
Zack Cerza [Thu, 21 Dec 2023 21:29:21 +0000 (14:29 -0700)]
requirements: Update PyYAML

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Move openstack to its own variant
Zack Cerza [Thu, 21 Dec 2023 21:26:33 +0000 (14:26 -0700)]
requirements: Move openstack to its own variant

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update cryptography
Zack Cerza [Thu, 21 Dec 2023 21:01:36 +0000 (14:01 -0700)]
requirements: Update cryptography

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agorequirements: Update pip-tools
Zack Cerza [Thu, 21 Dec 2023 20:59:34 +0000 (13:59 -0700)]
requirements: Update pip-tools

Signed-off-by: Zack Cerza <zack@redhat.com>
20 months agoMerge pull request #1906 from ceph/kill-unbound
Zack Cerza [Wed, 27 Dec 2023 17:44:55 +0000 (10:44 -0700)]
Merge pull request #1906 from ceph/kill-unbound

kill.kill_processes: Fix possibly-unbound variables

21 months agokill.kill_processes: Fix possibly-unbound variables 1906/head
Zack Cerza [Wed, 20 Dec 2023 23:19:10 +0000 (16:19 -0700)]
kill.kill_processes: Fix possibly-unbound variables

Signed-off-by: Zack Cerza <zack@redhat.com>
21 months agoMerge pull request #1903 from ceph/wip-package-queries
Zack Cerza [Wed, 20 Dec 2023 22:41:10 +0000 (15:41 -0700)]
Merge pull request #1903 from ceph/wip-package-queries

suite: Improve package query caching

21 months agoMerge pull request #1900 from ceph/systemd
Zack Cerza [Wed, 20 Dec 2023 22:39:42 +0000 (15:39 -0700)]
Merge pull request #1900 from ceph/systemd

Add systemd units for exporter and dispatcher

21 months agorun.util.find_git_parents: Drop refresh() 1903/head
Zack Cerza [Wed, 29 Nov 2023 23:34:51 +0000 (16:34 -0700)]
run.util.find_git_parents: Drop refresh()

This takes a long time, and can time out. The mirror is updated every ten
minutes automatically.

Signed-off-by: Zack Cerza <zack@redhat.com>
21 months agoMerge pull request #1896 from ceph/dependabot/pip/urllib3-1.26.18
kyr [Sun, 10 Dec 2023 17:25:23 +0000 (18:25 +0100)]
Merge pull request #1896 from ceph/dependabot/pip/urllib3-1.26.18

build(deps): bump urllib3 from 1.26.6 to 1.26.18

21 months agobuild(deps): bump urllib3 from 1.26.6 to 1.26.18 1896/head
dependabot[bot] [Sun, 10 Dec 2023 16:21:41 +0000 (16:21 +0000)]
build(deps): bump urllib3 from 1.26.6 to 1.26.18

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.6 to 1.26.18.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.26.6...1.26.18)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
21 months agorun: Fix some pyright errors
Zack Cerza [Wed, 29 Nov 2023 18:55:28 +0000 (11:55 -0700)]
run: Fix some pyright errors

Signed-off-by: Zack Cerza <zack@redhat.com>
21 months agoorchestra.opsys: Add some newer OS codenames
Zack Cerza [Wed, 29 Nov 2023 00:27:04 +0000 (17:27 -0700)]
orchestra.opsys: Add some newer OS codenames

Signed-off-by: Zack Cerza <zack@redhat.com>
21 months agotests: Remove some gitbuilder-related tests
Zack Cerza [Wed, 29 Nov 2023 00:25:13 +0000 (17:25 -0700)]
tests: Remove some gitbuilder-related tests

Signed-off-by: Zack Cerza <zack@redhat.com>
21 months agoMake logs slightly quieter during scheduling
Zack Cerza [Wed, 22 Nov 2023 01:50:01 +0000 (18:50 -0700)]
Make logs slightly quieter during scheduling

Particularly in non-verbose mode.

Signed-off-by: Zack Cerza <zack@redhat.com>
21 months agorepo_utils.ls_remote: Memoize
Zack Cerza [Wed, 22 Nov 2023 01:54:08 +0000 (18:54 -0700)]
repo_utils.ls_remote: Memoize

Signed-off-by: Zack Cerza <zack@redhat.com>
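
Memoization of a remote lookup can be sketched with the standard library. This is an illustration under assumptions, not the code from the commit: the function body is a placeholder that records calls instead of contacting a git remote, and the URL and return value are made up.

```python
import functools

calls = []  # records every time the "expensive" lookup actually runs


@functools.lru_cache(maxsize=None)
def ls_remote(url: str, ref: str) -> str:
    """Placeholder for an expensive `git ls-remote` query."""
    calls.append((url, ref))
    return f"sha1-for-{ref}"


first = ls_remote("https://example.invalid/repo.git", "main")
second = ls_remote("https://example.invalid/repo.git", "main")
```

The second call is answered from the cache, so `calls` holds a single entry. One design consideration with this approach is that cached results live for the lifetime of the process, which suits a short-lived scheduling command better than a long-running service.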
21 months agosuite: Improve package query caching
Zack Cerza [Wed, 22 Nov 2023 01:25:56 +0000 (18:25 -0700)]
suite: Improve package query caching

We had our own "system" for caching, but it had the unfortunate characteristic
of being a big bowl of spaghetti. While eating said pasta I also noticed we
had two competing "distro defaults" concepts - so that let me delete even more
code. Yum!

Signed-off-by: Zack Cerza <zack@redhat.com>
21 months agoMerge pull request #1899 from ceph/kill-proc-perms
Zack Cerza [Wed, 29 Nov 2023 17:23:58 +0000 (10:23 -0700)]
Merge pull request #1899 from ceph/kill-proc-perms

21 months agoMerge pull request #1902 from ceph/dispatcher-quiet
Dan Mick [Tue, 28 Nov 2023 23:28:51 +0000 (15:28 -0800)]
Merge pull request #1902 from ceph/dispatcher-quiet

dispatcher: Dont spam the journal

21 months agoMerge pull request #1792 from VallariAg/unittest-xml-scanner
Zack Cerza [Mon, 27 Nov 2023 23:25:30 +0000 (16:25 -0700)]
Merge pull request #1792 from VallariAg/unittest-xml-scanner

orch/run: Add unit test xml scanner

21 months agoutil/scanner: add UnitTestScanner.num_of_total_failures 1792/head
Vallari Agrawal [Fri, 27 Oct 2023 08:58:18 +0000 (14:28 +0530)]
util/scanner: add UnitTestScanner.num_of_total_failures

In UnitTestScanner's final error message, add the total count of failures
before the first error occurrence, like "(total x failed) <message>".
Another minor change: add "..." if the failure reason is longer than 200 chars.

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
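
The message format described above can be sketched as follows. The helper name `summarize` and the constant `MAX_REASON_CHARS` are assumptions for illustration; only the "(total x failed) <message>" shape and the 200-character cutoff come from the commit message.

```python
from typing import List, Optional

MAX_REASON_CHARS = 200  # cutoff mentioned in the commit message


def summarize(failures: List[str]) -> Optional[str]:
    """Build '(total x failed) <first failure>' or None if nothing failed."""
    if not failures:
        return None
    reason = failures[0]
    if len(reason) > MAX_REASON_CHARS:
        # Truncate long reasons and mark the cut with an ellipsis.
        reason = reason[:MAX_REASON_CHARS] + "..."
    return f"(total {len(failures)} failed) {reason}"


short_message = summarize(["test_a: assertion failed", "test_b: timeout"])
long_message = summarize(["x" * 500])
```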
21 months agoadd utils/tests/test_scanner.py
Vallari Agrawal [Tue, 19 Sep 2023 15:04:28 +0000 (20:34 +0530)]
add utils/tests/test_scanner.py

and test_run_unit_test in test_remote.py

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
21 months agoadd Scanner, UnitTestScanner, ValgrindScanner
Vallari Agrawal [Sat, 1 Oct 2022 11:16:52 +0000 (16:46 +0530)]
add Scanner, UnitTestScanner, ValgrindScanner

1. add 'run_unit_test' to Remote
2. create util/scanner.py
3. new exception: UnitTestError
4. add `lxml` dependency in setup.cfg

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
22 months agokill: Don't unlock nodes if killing procs fails 1899/head
Zack Cerza [Fri, 10 Nov 2023 22:24:21 +0000 (15:24 -0700)]
kill: Don't unlock nodes if killing procs fails

... so that we don't unlock nodes while their jobs are running.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months agosupervisor: Drop job output 1902/head
Zack Cerza [Fri, 17 Nov 2023 20:48:38 +0000 (13:48 -0700)]
supervisor: Drop job output

It gets logged to its own file in the job archive.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months agoMerge pull request #1901 from ceph/fog-debug-quieter
Dan Mick [Fri, 17 Nov 2023 20:43:33 +0000 (12:43 -0800)]
Merge pull request #1901 from ceph/fog-debug-quieter

fog: Drop request debug logging

22 months agodispatcher: Drop supervisor output
Zack Cerza [Fri, 17 Nov 2023 20:34:21 +0000 (13:34 -0700)]
dispatcher: Drop supervisor output

It gets logged to its own file in the job archive.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months agofog: Drop request debug logging 1901/head
Zack Cerza [Fri, 17 Nov 2023 20:22:58 +0000 (13:22 -0700)]
fog: Drop request debug logging

It's too noisy.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months agoAdd systemd units for exporter and dispatcher 1900/head
Zack Cerza [Wed, 15 Nov 2023 20:03:25 +0000 (13:03 -0700)]
Add systemd units for exporter and dispatcher

These are copies of what is currently in use in sepia.

Signed-off-by: Zack Cerza <zack@redhat.com>
22 months agoMerge pull request #1892 from ceph/devstack-simplified
Zack Cerza [Thu, 26 Oct 2023 19:05:51 +0000 (13:05 -0600)]
Merge pull request #1892 from ceph/devstack-simplified

Add containers/teuthology-dev

23 months agoAdd containers/teuthology-dev 1892/head
Zack Cerza [Tue, 26 Sep 2023 21:32:25 +0000 (14:32 -0700)]
Add containers/teuthology-dev

This is nearly identical to docs/docker-compose/teuthology, but with
some changes to better work with ceph-devstack. The bits in
docs/docker-compose should be able to be adapted easily to work with
this container.

Signed-off-by: Zack Cerza <zack@redhat.com>
23 months agoMerge PR #1895 into main
Patrick Donnelly [Tue, 17 Oct 2023 19:38:56 +0000 (15:38 -0400)]
Merge PR #1895 into main

* refs/pull/1895/head:
install/bin/stdin-killer: macOs (Darwin) compatibility

Reviewed-by: Zack Cerza <zack@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
23 months agoMerge pull request #1894 from VallariAg/fix-readthedocs-builds
Zack Cerza [Tue, 17 Oct 2023 16:16:46 +0000 (10:16 -0600)]
Merge pull request #1894 from VallariAg/fix-readthedocs-builds

fix readthedocs PR builds

23 months agoreadthedocs: fix 'The configuration key "build.image" is deprecated' 1894/head
Vallari Agrawal [Tue, 17 Oct 2023 09:16:42 +0000 (14:46 +0530)]
readthedocs: fix 'The configuration key "build.image" is deprecated'

Builds are failing because readthedocs has fully removed support for the
deprecated "build.image" key; "build.os" must be used instead.

ref: https://blog.readthedocs.com/use-build-os-config/
error: https://readthedocs.org/projects/teuthology/builds/22250705/

Signed-off-by: Vallari Agrawal <val.agl002@gmail.com>
23 months agoinstall/bin/stdin-killer: macOs (Darwin) compatibility 1895/head
Leonid Usov [Tue, 17 Oct 2023 10:37:27 +0000 (13:37 +0300)]
install/bin/stdin-killer: macOs (Darwin) compatibility

Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
23 months agoMerge pull request #1887 from ceph/paramiko-eoferror
Josh Durgin [Wed, 11 Oct 2023 16:42:32 +0000 (09:42 -0700)]
Merge pull request #1887 from ceph/paramiko-eoferror

orchestra: Tolerate EOFError during connect

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2 years agoMerge pull request #1888 from ceph/keyscan-tweak
Zack Cerza [Tue, 5 Sep 2023 23:29:55 +0000 (17:29 -0600)]
Merge pull request #1888 from ceph/keyscan-tweak

2 years agomisc._ssh_keyscan: Sort keys before returning any 1888/head
Zack Cerza [Tue, 5 Sep 2023 18:42:46 +0000 (12:42 -0600)]
misc._ssh_keyscan: Sort keys before returning any

ssh-keyscan's output is unsorted, so this function wasn't deterministic.

Signed-off-by: Zack Cerza <zack@redhat.com>
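
A minimal sketch of making key output deterministic by sorting. The helper name `sorted_keys` and the sample key lines are made up for illustration; the real function's behavior may differ.

```python
from typing import List


def sorted_keys(keyscan_output: str) -> List[str]:
    """Return non-empty, non-comment key lines in a stable sorted order."""
    lines = (line.strip() for line in keyscan_output.splitlines())
    return sorted(line for line in lines if line and not line.startswith("#"))


# The same (fake) keys arriving in two different orders.
output_a = "host ssh-rsa AAAA1\n# comment\nhost ssh-ed25519 AAAA2\n"
output_b = "host ssh-ed25519 AAAA2\nhost ssh-rsa AAAA1\n"
keys_a = sorted_keys(output_a)
keys_b = sorted_keys(output_b)
```

Both inputs produce the same list, so any caller that takes the first key gets the same one each time.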
2 years agoorchestra: Move connection exception handling 1887/head
Zack Cerza [Thu, 31 Aug 2023 18:10:09 +0000 (11:10 -0700)]
orchestra: Move connection exception handling

... to inside the retry loop. Also, add an increment to the safe_while
instance we use.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years agoorchestra: Treat EOFError as SSHException
Zack Cerza [Thu, 31 Aug 2023 17:34:41 +0000 (10:34 -0700)]
orchestra: Treat EOFError as SSHException

Signed-off-by: Zack Cerza <zack@redhat.com>
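
The two orchestra changes above (handling exceptions inside the retry loop, with an increasing delay, and treating EOFError like a connection error) can be sketched together. Everything here is a stand-in written for illustration: `ConnectError` plays the role of an SSH-level exception, and `connect_with_retries` is not teuthology's safe_while.

```python
import time


class ConnectError(Exception):
    """Hypothetical stand-in for an SSH-level connection error."""


def connect_with_retries(connect, tries=3, sleep=0.0, increment=0.0):
    """Call `connect` until it succeeds or `tries` attempts are used up."""
    delay = sleep
    for attempt in range(1, tries + 1):
        try:
            return connect()
        except (ConnectError, EOFError):
            # EOFError is handled the same way as a connection error.
            if attempt == tries:
                raise
            time.sleep(delay)
            delay += increment  # wait a little longer before each retry


attempts = []


def flaky_connect():
    """Fail twice with EOFError, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise EOFError("connection closed during handshake")
    return "connected"


result = connect_with_retries(flaky_connect, tries=3)
```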
2 years agoMerge pull request #1886 from ceph/update-paramiko
Zack Cerza [Wed, 30 Aug 2023 17:34:16 +0000 (11:34 -0600)]
Merge pull request #1886 from ceph/update-paramiko

2 years agoMerge pull request #1884 from kamoltat/wip-ksirivad-fix-62445
Kamoltat (Junior) Sirivadhna [Wed, 30 Aug 2023 15:59:27 +0000 (11:59 -0400)]
Merge pull request #1884 from kamoltat/wip-ksirivad-fix-62445

teuthology/scrape: Fix bad backtrace parsing in Teuthology.log
Reviewed-by: Zack Cerza <zcerza@redhat.com>

2 years agoMerge pull request #1885 from ceph/fix-docs-build
Kamoltat (Junior) Sirivadhna [Wed, 30 Aug 2023 15:44:58 +0000 (11:44 -0400)]
Merge pull request #1885 from ceph/fix-docs-build

Fix docs build
Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
2 years agoUpdate paramiko 1886/head
Zack Cerza [Mon, 28 Aug 2023 20:07:19 +0000 (14:07 -0600)]
Update paramiko

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years agotox: Avoid buggy sphinx versions 1885/head
Zack Cerza [Wed, 23 Aug 2023 19:24:56 +0000 (13:24 -0600)]
tox: Avoid buggy sphinx versions

See https://github.com/ceph/teuthology/pull/1884

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years agosetup.cfg: Drop license_file
Zack Cerza [Wed, 23 Aug 2023 19:21:43 +0000 (13:21 -0600)]
setup.cfg: Drop license_file

It's deprecated in favor of `license_files`, but the default value is
sufficient.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years agoteuthology/scrape: Fix bad backtrace parsing in Teuthology.log 1884/head
Kamoltat Sirivadhna [Thu, 17 Aug 2023 16:28:00 +0000 (12:28 -0400)]
teuthology/scrape: Fix bad backtrace parsing in Teuthology.log

Problem:

- The warning message was confusing: it stated that
the backtrace is malformed

- We kept adding to the backtrace buffer
even when we exceeded `MAX_BT_LINES`

Solution:

- Correct the warning message to be
"Ignoring backtrace that exceeds MAX_BT_LINES"
- Reset the buffer once we exceed MAX_BT_LINES
- Add some cases where we detect the start/end of a backtrace.

Fixes: https://tracker.ceph.com/issues/62445

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
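
The buffer-reset behavior can be sketched as below. This is an illustration under assumptions: the function name `collect_backtraces`, the `is_start`/`is_end` callbacks, and the value of `MAX_BT_LINES` are invented for the example and do not reproduce the scrape module's parsing rules.

```python
from typing import Callable, Iterable, List

MAX_BT_LINES = 100  # illustrative value, not taken from the source


def collect_backtraces(
    lines: Iterable[str],
    is_start: Callable[[str], bool],
    is_end: Callable[[str], bool],
    max_lines: int = MAX_BT_LINES,
) -> List[List[str]]:
    """Collect complete backtraces, dropping any that grow past max_lines."""
    backtraces = []
    buffer = None  # None means "not currently inside a backtrace"
    for line in lines:
        if buffer is None:
            if is_start(line):
                buffer = [line]
            continue
        buffer.append(line)
        if is_end(line):
            backtraces.append(buffer)
            buffer = None
        elif len(buffer) > max_lines:
            # Ignoring backtrace that exceeds MAX_BT_LINES: reset the
            # buffer instead of letting it keep growing.
            buffer = None
    return backtraces


log = ["noise", "BEGIN", "frame a", "END",
       "BEGIN", "1", "2", "3", "4", "END"]
found = collect_backtraces(
    log,
    is_start=lambda l: l == "BEGIN",
    is_end=lambda l: l == "END",
    max_lines=3,
)
```

With `max_lines=3`, the first backtrace is kept and the second, oversized one is discarded.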
2 years agoMerge pull request #1883 from ceph/nuke-desc-typeerror
Dan Mick [Tue, 15 Aug 2023 19:40:36 +0000 (12:40 -0700)]
Merge pull request #1883 from ceph/nuke-desc-typeerror

nuke: Avoid a TypeError w/ null node description

2 years agonuke: Avoid a TypeError w/ null node description 1883/head
Zack Cerza [Tue, 15 Aug 2023 18:05:38 +0000 (12:05 -0600)]
nuke: Avoid a TypeError w/ null node description

This avoids a `TypeError: argument of type 'NoneType' is not iterable`
when nuking a node whose description is None.

ex: https://sentry.ceph.com/share/issue/91172146663f4c71a6cbfe43725b2e07/

Signed-off-by: Zack Cerza <zack@redhat.com>
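
The error arises because a membership test against None is not allowed in Python. A minimal sketch of a guard, with a hypothetical helper name and sample strings that are not taken from the nuke code:

```python
from typing import Optional


def description_mentions(description: Optional[str], needle: str) -> bool:
    """Membership test that treats a missing description as empty."""
    # `needle in None` raises TypeError, so fall back to an empty string.
    return needle in (description or "")


with_text = description_mentions("locked for job 123", "job 123")
with_none = description_mentions(None, "job 123")
```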
2 years agoMerge pull request #1882 from ceph/sentry-reimage-taskname
Dan Mick [Mon, 14 Aug 2023 20:34:08 +0000 (13:34 -0700)]
Merge pull request #1882 from ceph/sentry-reimage-taskname

supervisor.reimage(): Improve Sentry reporting

2 years agosupervisor.reimage(): Improve Sentry reporting 1882/head
Zack Cerza [Mon, 14 Aug 2023 18:48:47 +0000 (12:48 -0600)]
supervisor.reimage(): Improve Sentry reporting

Set the `task` tag value to 'reimage' when reporting reimage failures to
Sentry, to make searching for them in its UI easier.

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years agoMerge pull request #1881 from ceph/stdin-killer-setpgrp
Zack Cerza [Fri, 4 Aug 2023 17:56:20 +0000 (11:56 -0600)]
Merge pull request #1881 from ceph/stdin-killer-setpgrp

stdin-killer: do not setpgrp if already leader

2 years agostdin-killer: do not setpgrp if already leader 1881/head
Patrick Donnelly [Fri, 4 Aug 2023 13:17:28 +0000 (09:17 -0400)]
stdin-killer: do not setpgrp if already leader

Fixes failure like:

    2023-08-03T19:40:10.942 INFO:teuthology.orchestra.run.smithi100.stderr:Traceback (most recent call last):
    2023-08-03T19:40:10.942 INFO:teuthology.orchestra.run.smithi100.stderr:  File "/usr/bin/stdin-killer", line 213, in <module>
    2023-08-03T19:40:10.943 INFO:teuthology.orchestra.run.smithi100.stderr:    os.setpgrp()
    2023-08-03T19:40:10.943 INFO:teuthology.orchestra.run.smithi100.stderr:PermissionError: [Errno 1] Operation not permitted

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
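
One way to avoid this failure is to check whether the process already leads its process group before calling os.setpgrp(). This is a sketch of that guard with a hypothetical function name, not the code from stdin-killer.

```python
import os


def become_group_leader() -> bool:
    """Create a new process group unless this process already leads one.

    Returns True if os.setpgrp() was called, False if it was skipped.
    """
    if os.getpgrp() == os.getpid():
        # Already a group leader (session leaders always are); calling
        # setpgrp() as a session leader would raise PermissionError.
        return False
    os.setpgrp()
    return True


become_group_leader()
second_call = become_group_leader()
```

The second call is always skipped, because by then the process leads its own group.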
2 years agoMerge pull request #1880 from ceph/wip-62286
Zack Cerza [Wed, 2 Aug 2023 20:25:13 +0000 (14:25 -0600)]
Merge pull request #1880 from ceph/wip-62286

2 years agoPhysicalConsole: Tolerate invalid UTF-8 characters 1880/head
Zack Cerza [Wed, 2 Aug 2023 18:03:21 +0000 (12:03 -0600)]
PhysicalConsole: Tolerate invalid UTF-8 characters

... in pexpect.spawn() calls.

Fixes: https://tracker.ceph.com/issues/62286
Signed-off-by: Zack Cerza <zack@redhat.com>
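
The general technique of tolerating invalid UTF-8 can be shown with the standard library alone. This sketch uses plain bytes.decode() rather than pexpect, and the sample bytes are made up; it only illustrates how an error handler keeps decoding from raising.

```python
raw_console_output = b"login: \xff\xfe ok"  # contains bytes that are not valid UTF-8

# Strict decoding raises on the first invalid byte.
try:
    raw_console_output.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With errors="replace", invalid bytes become U+FFFD and decoding continues.
tolerant_text = raw_console_output.decode("utf-8", errors="replace")
```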
2 years agoPhysicalConsole.check_status(): Use log.exception
Zack Cerza [Wed, 2 Aug 2023 17:04:21 +0000 (11:04 -0600)]
PhysicalConsole.check_status(): Use log.exception

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years agoMerge pull request #1845 from NitzanMordhai/wip-nitzan-correct-typo-osd-default-pool...
Zack Cerza [Wed, 2 Aug 2023 16:33:38 +0000 (10:33 -0600)]
Merge pull request #1845 from NitzanMordhai/wip-nitzan-correct-typo-osd-default-pool-size

2 years agoMerge pull request #1877 from ceph/sentry-ae
Dan Mick [Tue, 1 Aug 2023 00:51:49 +0000 (17:51 -0700)]
Merge pull request #1877 from ceph/sentry-ae

supervisor: Fix an AttributeError in reimage()

2 years agosupervisor: Fix an AttributeError in reimage() sentry-ae 1877/head
Zack Cerza [Mon, 31 Jul 2023 23:31:43 +0000 (17:31 -0600)]
supervisor: Fix an AttributeError in reimage()

Signed-off-by: Zack Cerza <zack@redhat.com>
2 years agoMerge pull request #1859 from ceph/quiet-urllib3
Dan Mick [Fri, 28 Jul 2023 23:08:05 +0000 (16:08 -0700)]
Merge pull request #1859 from ceph/quiet-urllib3

Turn down logging for urllib3.util.retry