Zack Cerza [Tue, 4 Mar 2025 23:37:53 +0000 (16:37 -0700)]
node-cleanup: Grace period for inactive jobs
Once a job is marked finished, the supervisor may still be waiting to unlock its
nodes. Give jobs five minutes to clean up nodes before we consider them "stale".
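
The grace-period check can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `is_stale`, `GRACE_PERIOD`, and the `finished_at` argument are hypothetical names, and only the five-minute value comes from the message above.

```python
from datetime import datetime, timedelta, timezone

# Five minutes, as stated in the commit message; the constant name is hypothetical.
GRACE_PERIOD = timedelta(minutes=5)


def is_stale(finished_at, now=None):
    """Return True once a finished job's grace period has elapsed.

    `finished_at` is assumed to be a timezone-aware datetime recording
    when the job was marked finished.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - finished_at > GRACE_PERIOD
```

A job that finished four minutes ago would still be left alone; one that finished six minutes ago would be treated as stale.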
The original ea170935d4b1c78dc6fd6beae6b3fda65b296f57 removed a method that had
been moved to ceph.git but that broke upgrade tests from releases up to Octopus
where get_valgrind_args had not been backported. So a revert was done until all
upgrade paths have the method.
Now that Quincy is EOL, we can revert the revert.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Kyrylo Shatskyy [Wed, 26 Feb 2025 18:42:05 +0000 (19:42 +0100)]
orchestra/opsys: add tumbleweed version to the distro version map
openSUSE Tumbleweed is a rolling release, so it has a fixed version number
and a changeable version id that is represented by the date of a snapshot.
Example of a vm image:
In comparison, openSUSE Leap has regular releases with incremental versions,
for example: 15.0, 15.1, ..., 15.6, etc., and the recently introduced 16.0 alpha.
Because it makes no sense to stick to the date, which changes almost daily,
it is suggested to fix the Tumbleweed version to 1.0 to distinguish it from Leap.
As a side effect, once Teuthology locks a node imaged with Tumbleweed,
it correctly updates the os version in paddles to the release date, and
the pulpito interface displays it as that date accordingly.
Kyr Shatskyy [Fri, 10 Jan 2025 14:09:24 +0000 (15:09 +0100)]
lock/cli: don't update inventory if failed to create a vm
If we were not able to create a vm, we don't need to update the machine's
inventory: doing so establishes an ssh connection to the host, and
teuthology gets stuck indefinitely trying to connect to a machine that
does not exist.
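
The control flow described above might look roughly like this. It is only a sketch: `create_and_register`, `create_vm`, and `update_inventory` are hypothetical stand-ins passed in as callables, not the actual functions in lock/cli.

```python
def create_and_register(name, create_vm, update_inventory, log=print):
    """Create a VM and update its inventory only if creation succeeded.

    `create_vm` and `update_inventory` are hypothetical callables;
    `create_vm` is assumed to return a truthy value on success.
    """
    if not create_vm(name):
        # No machine exists, so an inventory update would try to ssh
        # to a host that is not there and hang.
        log(f"Unable to create VM {name}; skipping inventory update")
        return False
    update_inventory(name)
    return True
```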
Zack Cerza [Mon, 6 Jan 2025 22:33:08 +0000 (15:33 -0700)]
containers/teuthology-dev: Remove access token
This container is built and pushed via GitHub Actions. GHA likes to provision a
personal access token for each job that gives tightly-scoped access to the git
repository to the job. When we build our container, we end up including
`.git/config`, which contains the token. Later, in ceph-dev-stack's CI, an
`ls-remote` is run against ceph.git, which ends up causing git to prompt for
credentials even though the repo is public. Removing the token should allow
reading all the relevant repos from the built container image.
John Mulligan [Fri, 9 Aug 2024 14:15:15 +0000 (10:15 -0400)]
config: allow reading teuthology config from env var location
Allow changing the default "user" location of the teuthology
configuration yaml using the (optional) TEUTHOLOGY_CONFIG environment
variable. This change aids my effort to run a customized local
teuthology environment.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
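
A minimal sketch of the lookup, assuming the default user location is `~/.teuthology.yaml`. The helper name `user_config_path` is hypothetical and may not match how the config module is actually structured.

```python
import os

# Assumed default "user" location of the configuration file.
DEFAULT_USER_CONFIG = "~/.teuthology.yaml"


def user_config_path(environ=None):
    """Return the config path, preferring TEUTHOLOGY_CONFIG when it is set."""
    if environ is None:
        environ = os.environ
    # An unset or empty variable falls back to the default location.
    path = environ.get("TEUTHOLOGY_CONFIG") or DEFAULT_USER_CONFIG
    return os.path.expanduser(path)
```

With this shape, running with `TEUTHOLOGY_CONFIG=/tmp/custom.yaml` in the environment would select that file, and leaving the variable unset keeps the existing behavior.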
Nitzan Mordechai [Tue, 10 Dec 2024 06:12:00 +0000 (06:12 +0000)]
teuthology/misc: Add timeout parameter to stop_daemons_of_type for better flexibility
Updated stop_daemons_of_type to accept a timeout parameter,
allowing dynamic control over the timeout value passed to the
stop function of each daemon.
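
The idea of threading a timeout through to each daemon can be illustrated as below. This is a simplified stand-in, not the real function: the actual `stop_daemons_of_type` takes different arguments, and `FakeDaemon`, the `daemons` list, and the default of 300 seconds are assumptions made for the example.

```python
class FakeDaemon:
    """Hypothetical daemon handle that records the timeout it was stopped with."""

    def __init__(self, name):
        self.name = name
        self.stopped_with = None

    def stop(self, timeout=300):
        self.stopped_with = timeout


def stop_daemons_of_type(daemons, timeout=300):
    """Stop every daemon, forwarding the caller's timeout to each stop() call.

    The default of 300 seconds is an assumption for this sketch.
    """
    for daemon in daemons:
        daemon.stop(timeout=timeout)
```

A caller that needs a longer wait could then pass, for example, `timeout=600` without changing the daemons themselves.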
Patrick Donnelly [Fri, 18 Oct 2024 12:44:13 +0000 (08:44 -0400)]
teuthology/suite: merge base_config with other fragments
Presently the code tries to merge the base_config when the worker starts
running. There's no need to construct it this way and it prevents sharing the
"defaults" with the fragment merging infrastructure. It also prevents
overriding defaults like:
A YAML fragment can set kernel.client but it cannot delete the defaults for
kernel.(branch|flavor|kdb|sha1) because there's no way to remove YAML elements
via a deep merge.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
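
The limitation of deep merging can be seen in a small example. The `deep_merge` below is a generic recursive merge written for illustration, not teuthology's own implementation, and the default values shown are made up.

```python
import copy


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Made-up defaults and fragment, for illustration only.
defaults = {"kernel": {"branch": "main", "kdb": True}}
fragment = {"kernel": {"client": {"branch": "testing"}}}

merged = deep_merge(defaults, fragment)
# The fragment added kernel.client, but kernel.branch and kernel.kdb
# are still present: a merge can add or replace keys, never remove them.
```

Merging the defaults as just another fragment lets the normal fragment machinery decide what ends up in the final config, instead of applying them unconditionally later.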
Zack Cerza [Thu, 29 Aug 2024 23:03:20 +0000 (17:03 -0600)]
lock: Avoid querying paddles for non-jobs
When we encounter a node that's locked with a description that doesn't look
like it points to a job, avoid the inevitable 404 we'd get from paddles. Without
this, the cleanup process gets short-circuited.
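
One way to sketch such a check is shown below. It rests on an assumption that is not stated in this commit: that a job's lock description is a path ending in `<run_name>/<job_id>` with a numeric job id. The names `JOB_DESC_RE` and `job_from_description` are hypothetical.

```python
import re

# Assumption: job descriptions look like ".../<run_name>/<job_id>",
# where <job_id> is numeric. Anything else is treated as a non-job.
JOB_DESC_RE = re.compile(r"^/.+/(?P<run_name>[^/]+)/(?P<job_id>\d+)/?$")


def job_from_description(description):
    """Return (run_name, job_id) if the description looks like a job, else None."""
    match = JOB_DESC_RE.match(description or "")
    if match is None:
        return None
    return match.group("run_name"), match.group("job_id")
```

A caller would only query paddles when the result is not `None`, so a node locked with a free-form description no longer triggers a request that is bound to fail.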
Zack Cerza [Thu, 1 Aug 2024 18:16:04 +0000 (12:16 -0600)]
supervisor: Check for job expiration
This commit isn't strictly necessary for the feature's implementation, but will
allow testing the feature on the production teuthology cluster before merging.
This feature has two parts:
* Specifying expiration dates when scheduling test runs
* A global maximum age
Expiration dates are provided by passing `--expire` to `teuthology-suite` with
a relative value like `1d` (one day), `1w` (one week), or an absolute value like
`1999-12-31_23:59:59`.
A new configuration item, `max_job_age`, is specified in seconds. This defaults
to two weeks.
When the dispatcher checks the queue for the next job to run, it will first
compare the job's `timestamp` value, which reflects the time the job was
scheduled, against the current time. If more than `max_job_age` seconds have
passed, the job is skipped
and marked dead. It next checks for an `expire` value; if that value is in the
past, the job is skipped and marked dead. Otherwise, it will be run as usual.
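
The two checks can be sketched as follows. This is an illustration under stated assumptions, not the dispatcher's actual code: the helper names are hypothetical, the job is modeled as a plain dict, the job's `timestamp` is assumed to use the same format as the absolute `--expire` example, and the stored `expire` value is assumed to already be absolute.

```python
import re
from datetime import datetime, timedelta

MAX_JOB_AGE = 14 * 24 * 60 * 60  # two weeks, in seconds
TIMESTAMP_FMT = "%Y-%m-%d_%H:%M:%S"  # format of the absolute example above
UNITS = {"d": "days", "w": "weeks"}  # only the units named in the message


def parse_expire(value, now):
    """Turn a relative ("1d", "1w") or absolute value into a datetime."""
    match = re.fullmatch(r"(\d+)([dw])", value)
    if match:
        count, unit = match.groups()
        return now + timedelta(**{UNITS[unit]: int(count)})
    return datetime.strptime(value, TIMESTAMP_FMT)


def should_run(job, now, max_job_age=MAX_JOB_AGE):
    """Return False if the job is too old or past its expiration date."""
    scheduled = datetime.strptime(job["timestamp"], TIMESTAMP_FMT)
    if (now - scheduled).total_seconds() > max_job_age:
        return False  # older than the global maximum age
    expire = job.get("expire")
    if expire and datetime.strptime(expire, TIMESTAMP_FMT) < now:
        return False  # explicit expiration date is in the past
    return True
```

In this sketch a job for which `should_run` returns False would be skipped and marked dead by the caller.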
Zack Cerza [Thu, 1 Aug 2024 19:50:51 +0000 (13:50 -0600)]
suite: Ensure teuthology config is consistent between tests
test_init.py was making modifications to the config object that persisted
between tests. When I fixed that, initially some tests in test_run_.py started
failing because of settings in my local ~/.teuthology.yaml. This change causes
all of the tests in suite.test to use default config values.
Kyr Shatskyy [Sat, 3 Aug 2024 15:08:48 +0000 (17:08 +0200)]
orchestra/run: don't use pipes, but shlex
Finally get rid of deprecation warning for 'pipes':
teuthology/orchestra/run.py:12
/teuthology/teuthology/orchestra/run.py:12: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
import pipes
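
For reference, `shlex.quote` is the standard-library replacement for `pipes.quote`. The helper below is a hypothetical example of the substitution, not the code in orchestra/run.py.

```python
import shlex


def quote_args(args):
    """Join arguments into one shell-safe command string using shlex.quote."""
    return " ".join(shlex.quote(str(arg)) for arg in args)
```

For example, `quote_args(["echo", "hello world"])` yields `echo 'hello world'`.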
Kyr Shatskyy [Fri, 2 Aug 2024 17:32:33 +0000 (19:32 +0200)]
provision/downburst: ignore "username@" prefix in hostname
If downburst gets a hostname argument that does not have a username@
prefix (does not include the @ symbol), an index out of range error
occurs. We obviously don't want that, and we wish to allow hostnames
that don't use login names.
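
A tolerant way to drop the prefix is sketched below. The helper name `strip_user` is hypothetical, and the assumption is that the failure came from indexing the second element of a split on `@`, which does not exist when the prefix is absent.

```python
def strip_user(hostname):
    """Return the host part, whether or not a "username@" prefix is present."""
    # rsplit always returns at least one element, so [-1] is safe
    # even when the string contains no "@".
    return hostname.rsplit("@", 1)[-1]
```

Both `ubuntu@host.example.com` and `host.example.com` then resolve to `host.example.com`.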