Patrick Donnelly [Thu, 23 May 2024 14:05:23 +0000 (10:05 -0400)]
teuthology/suite: use paddles solely for rerun info
teuthology.front no longer allows access to the run directories for general
users. In fact, it's not required because we are already getting run
information from paddles.
This commit also makes --rerun handling stricter: if
seed/subset/no_nested_subset do not match, exit with failure rather than try to
continue. Without a subset, many suites are so large that teuthology-suite will
run for several minutes before ultimately failing to schedule due to the
quantity of jobs. It also doesn't make sense to rerun a suite when these values
do not match.
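A minimal sketch of the stricter check, not the actual teuthology code: the function names and the dictionary layout are hypothetical, and the recorded values are assumed to have already been fetched from paddles.

```python
# Hypothetical illustration of the stricter --rerun check.
# "requested" holds the values given on the command line, "recorded"
# the values stored for the original run (assumed to come from paddles).
RERUN_KEYS = ("seed", "subset", "no_nested_subset")


def rerun_mismatches(requested, recorded):
    """Return one description per key whose values differ."""
    return [
        f"{key}: requested {requested.get(key)!r}, recorded {recorded.get(key)!r}"
        for key in RERUN_KEYS
        if requested.get(key) != recorded.get(key)
    ]


def check_rerun(requested, recorded):
    """Fail immediately instead of scheduling a run that cannot match."""
    problems = rerun_mismatches(requested, recorded)
    if problems:
        raise ValueError(
            "rerun parameters do not match the original run: "
            + "; ".join(problems)
        )
```

Failing before any job is generated avoids the multi-minute wait described above.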
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Laura Flores [Fri, 3 May 2024 17:07:40 +0000 (12:07 -0500)]
teuthology: modify logic to check for multiple completed builds
The current logic assumes that there is only one build for each distro/flavor
per SHA1. However, there is a bug in the Jenkins infrastructure that sometimes
causes multiple builds to trigger for one SHA1. In many of these cases, the first
build succeeds, but the second fails. Teuthology only looks at the latest build,
notices that it failed, and gives up. With the new logic, teuthology can
look further back and notice that there is indeed a successful build earlier in the
lineup.
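The idea can be sketched as follows. This is an illustration only: the function name, the dictionary keys, and the assumption that builds arrive newest first are all hypothetical and do not describe the shaman API or the actual patch.

```python
# Hypothetical sketch: scan every build for the requested
# distro/version/arch/flavor instead of judging only the newest one.
def find_ready_build(builds, distro, version, arch, flavor):
    """Return the first matching build whose status is "ready", else None.

    "builds" is assumed to be a list of dicts ordered newest first.
    """
    wanted = (distro, version, arch, flavor)
    for build in builds:
        key = (
            build["distro"],
            build["distro_version"],
            build["arch"],
            build["flavor"],
        )
        if key == wanted and build["status"] == "ready":
            return build
    return None
```

With a failed build followed by an older successful one for the same target, the older build is returned rather than the search giving up at the first failure.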
Here is an example in which the first centos 8 x86_64 build succeeded, but a second
build on top of it failed. Teuthology could only detect the latest failed build:
https://shaman.ceph.com/builds/ceph/wip-pdonnell-testing-20240503.010653-debug/ec1d3bd17a3db9d74296aa618f8d63c801bb647e/
Addresses this failure in teuthology:
lflores@teuthology:~$ ./teuthology/virtualenv/bin/teuthology-suite -v -m smithi -c wip-pdonnell-testing-20240503.010653-debug -s fs --subset 111/12000 -p 75 --dry-run
2024-05-03 16:39:35,231.231 INFO:teuthology.suite:Using random seed=9685
2024-05-03 16:39:35,232.232 INFO:teuthology.suite.run:kernel sha1: distro
2024-05-03 16:39:35,673.673 DEBUG:teuthology.repo_utils:git ls-remote https://git.ceph.com/ceph-ci.git wip-pdonnell-testing-20240503.010653-debug -> ec1d3bd17a3db9d74296aa618f8d63c801bb647e
2024-05-03 16:39:35,673.673 INFO:teuthology.suite.run:ceph sha1: ec1d3bd17a3db9d74296aa618f8d63c801bb647e
2024-05-03 16:39:35,674.674 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=centos%2F8%2Fx86_64&sha1=ec1d3bd17a3db9d74296aa618f8d63c801bb647e
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:looking for centos/8 x86_64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/8 arm64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/9 x86_64 crimson
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/9 x86_64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/8 arm64 default
2024-05-03 16:39:36,176.176 DEBUG:teuthology.packaging:build: centos/8 x86_64 crimson
2024-05-03 16:39:36,177.177 DEBUG:teuthology.packaging:build: centos/8 x86_64 default
2024-05-03 16:39:36,178.178 INFO:teuthology.suite.util:Container build incomplete
Traceback (most recent call last):
  File "./teuthology/virtualenv/bin/teuthology-suite", line 8, in <module>
    sys.exit(main())
  File "/cephfs/home/lflores/teuthology/scripts/suite.py", line 226, in main
    return teuthology.suite.main(args)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/__init__.py", line 143, in main
    run = Run(conf)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/run.py", line 56, in __init__
    self.base_config = self.create_initial_config()
  File "/cephfs/home/lflores/teuthology/teuthology/suite/run.py", line 94, in create_initial_config
    self.choose_ceph_version(ceph_hash)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/run.py", line 216, in choose_ceph_version
    util.schedule_fail(msg, self.name, dry_run=self.args.dry_run)
  File "/cephfs/home/lflores/teuthology/teuthology/suite/util.py", line 77, in schedule_fail
    raise ScheduleFailError(message, name)
teuthology.exceptions.ScheduleFailError: Scheduling lflores-2024-05-03_16:39:35-fs-wip-pdonnell-testing-20240503.010653-debug-distro-default-smithi failed: Packages for os_type 'centos', flavor default and ceph hash 'ec1d3bd17a3db9d74296aa618f8d63c801bb647e' not found
More work should be done to fix the "double build" issue in Jenkins, so this change should be treated as a workaround.
We were testing for access to /bin/true when we meant to test for access
to /bin/kill, which produced false negatives. With our configuration, any sudo
access indicates access to /bin/kill.
Vallari Agrawal [Thu, 28 Mar 2024 08:32:55 +0000 (14:02 +0530)]
orchestra/daemon/cephadmunit.py: set is_started in start()
In the CephadmUnit.start() method, `is_started` isn't set to
True. When running() is called after start(), it returns
False, which is not correct since the daemon has been
started by calling start().
This commit fixes that issue.
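A stripped-down sketch of the behaviour, assuming a stand-in class rather than the real CephadmUnit (which also runs the commands that start the daemon; those are omitted here):

```python
# Hypothetical stand-in for CephadmUnit showing only the flag handling.
class UnitSketch:
    def __init__(self):
        self.is_started = False

    def start(self):
        # The real method would start the daemon here (omitted).
        # Recording the state is the part the fix adds.
        self.is_started = True

    def running(self):
        # Reports the recorded state, so it is only correct if
        # start() remembers to update the flag.
        return self.is_started
```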
Ilya Dryomov [Wed, 14 Feb 2024 19:05:54 +0000 (20:05 +0100)]
kernel: make get_image_version() work for rpm
At some point in the past, the layout of the rpm package changed.
There is no file matching "/boot/vmlinuz-" there anymore; instead there
is a "vmlinuz" file at the root of the modules directory. For reference:
Vallari Agrawal [Tue, 13 Feb 2024 13:46:01 +0000 (19:16 +0530)]
teuthology/__init__.py: don't patch threads when running via teuthology_api
The project [teuthology-api](https://github.com/ceph/teuthology-api)
requires that threads not be patched.
Currently, we are using the "teuth-api" branch of teuthology where threads are
not patched. With this commit, we'll be able to use the "main" branch as
a dependency.
Zack Cerza [Thu, 1 Feb 2024 00:27:35 +0000 (17:27 -0700)]
Remove nuke: deletions
This commit contains only full file deletions, and the relocation of
nuke.actions.clear_firewall() to nuke/__init__.py to retain compatibility with
older ceph.git tasks.
Zack Cerza [Wed, 31 Jan 2024 01:56:19 +0000 (18:56 -0700)]
test_exit: Drop bad test_noop
This test races with other tests because Exiter doesn't have a great way to
remove all installed handlers. This is a test-only issue, so we can drop this
test.
Zack Cerza [Wed, 22 Nov 2023 01:25:56 +0000 (18:25 -0700)]
suite: Improve package query caching
We had our own "system" for caching, but it had the unfortunate characteristic
of being a big bowl of spaghetti. While eating said pasta I also noticed we
had two competing "distro defaults" concepts - so that let me delete even more
code. Yum!
In UnitTestScanner's final error message, add the total count of failures
before the first error occurrence, like "(total x failed) <message>".
Another minor change: add "..." if the failure reason is more than 200 chars.
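The message format described above could look roughly like this. The helper name and its parameters are hypothetical, and whether the real code truncates the reason to exactly 200 characters before appending "..." is an assumption.

```python
# Hypothetical helper illustrating the described message format.
def format_failure(first_error, total_failed, limit=200):
    """Prefix the first failure with the total count; shorten long reasons."""
    reason = first_error
    if len(reason) > limit:
        # Assumption: keep the first "limit" characters, then add "...".
        reason = reason[:limit] + "..."
    return f"(total {total_failed} failed) {reason}"
```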
This is nearly identical to docs/docker-compose/teuthology, but with
some changes to better work with ceph-devstack. The bits in
docs/docker-compose should be able to be adapted easily to work with
this container.