Vallari Agrawal [Mon, 20 Jan 2025 14:05:35 +0000 (19:35 +0530)]
Add a wait time to compress_logs()
This is to ensure that files are stable
before compressing them.
Fixes: https://tracker.ceph.com/issues/67420
Valgrind files were compressed while they were
still being written to. This commit should give
them time to settle before they are zipped.
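The commit only says that a wait time was added. One way to treat a file as "stable" is to poll its size and modification time until several consecutive polls report no change, and only then gzip it. This is a minimal sketch under that assumption. It is not the code from the commit, and `wait_until_stable`, `compress_when_stable`, and their parameters are hypothetical.

```python
import gzip
import os
import shutil
import time


def wait_until_stable(path, interval=1.0, checks=3, timeout=60.0):
    """Hypothetical helper: return True once `checks` consecutive polls of
    `path` see no change in (size, mtime); return False on timeout."""
    deadline = time.monotonic() + timeout
    last = None
    unchanged = 0
    while time.monotonic() < deadline:
        st = os.stat(path)
        current = (st.st_size, st.st_mtime_ns)
        if current == last:
            unchanged += 1
            if unchanged >= checks:
                return True
        else:
            # The file changed (or this is the first poll): start counting again.
            unchanged = 0
            last = current
        time.sleep(interval)
    return False


def compress_when_stable(path, **wait_kwargs):
    """Hypothetical: gzip `path` to `path + '.gz'` once it stops changing."""
    if not wait_until_stable(path, **wait_kwargs):
        return None  # still being written; leave it for a later pass
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path
```

Polling for stability adapts to slow writers, where a fixed sleep does not. The cost is that every file waits at least `interval * (checks + 1)` seconds.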
Zack Cerza [Mon, 6 Jan 2025 22:33:08 +0000 (15:33 -0700)]
containers/teuthology-dev: Remove access token
This container is built and pushed via GitHub Actions. GHA provisions a
personal access token for each job, giving the job tightly scoped access to the
git repository. When we build our container, we end up including
`.git/config`, which contains the token. Later, in ceph-dev-stack's CI, an
`ls-remote` is run against ceph.git, which ends up causing git to prompt for
credentials even though the repo is public. Removing the token should allow
reading all the relevant repos from the built container image.
John Mulligan [Fri, 9 Aug 2024 14:15:15 +0000 (10:15 -0400)]
config: allow reading teuthology config from env var location
Allow changing the default "user" location of the teuthology
configuration yaml using the (optional) TEUTHOLOGY_CONFIG environment
variable. This change aids my effort to run a customized local
teuthology environment.
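This minimal sketch shows the lookup order the commit describes: use `TEUTHOLOGY_CONFIG` when it is set, otherwise fall back to the default "user" location. The default path is assumed to be `~/.teuthology.yaml`, the file mentioned elsewhere in this log. `user_config_path` is a hypothetical name, not teuthology's API.

```python
import os

# Assumed default "user" config location; the real default may differ.
DEFAULT_USER_CONFIG = "~/.teuthology.yaml"


def user_config_path(environ=None):
    """Hypothetical: return the config path, preferring TEUTHOLOGY_CONFIG.

    An empty TEUTHOLOGY_CONFIG is treated the same as an unset one.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("TEUTHOLOGY_CONFIG") or DEFAULT_USER_CONFIG
    return os.path.expanduser(path)
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.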
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Patrick Donnelly [Fri, 18 Oct 2024 12:44:13 +0000 (08:44 -0400)]
teuthology/suite: merge base_config with other fragments
Presently the code tries to merge the base_config when the worker starts
running. There's no need to construct it this way and it prevents sharing the
"defaults" with the fragment merging infrastructure. It also prevents
overriding defaults. For example, a YAML fragment can set kernel.client but it
cannot delete the defaults for kernel.(branch|flavor|kdb|sha1), because there's
no way to remove YAML elements via a deep merge.
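The limitation can be seen with a simplified recursive dict merge. This is not teuthology's own merge code, and the default values below are placeholders. The merge only adds or overwrites keys, so a fragment that sets `kernel.client` still inherits every other `kernel` default.

```python
import copy


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the one in `base`. Keys absent from `override` are kept,
    so a fragment can never delete a key it does not mention.
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Placeholder defaults, standing in for base_config's kernel settings.
defaults = {"kernel": {"branch": "b", "flavor": "f", "kdb": True, "sha1": "s"}}
fragment = {"kernel": {"client": {"branch": "testing"}}}
merged = deep_merge(defaults, fragment)
```

Setting a key to `None` in the fragment does not help either. The key survives the merge with a `None` value rather than being removed.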
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Zack Cerza [Thu, 29 Aug 2024 23:03:20 +0000 (17:03 -0600)]
lock: Avoid querying paddles for non-jobs
When we encounter a node that's locked with a description that doesn't look
like it points to a job, avoid the inevitable 404 we'd get from paddles. Without
this, the cleanup process gets short-circuited.
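A check like this can be a simple pattern match performed before contacting paddles. The sketch below assumes that a job-owned lock description ends in `<run_name>/<job_id>` with a numeric job ID, as an archive path would. That pattern and the helper name `parse_job_description` are assumptions for illustration only.

```python
import re
from typing import Optional, Tuple

# Assumption: job descriptions end with "<run_name>/<job_id>" (job_id all digits).
JOB_DESC_RE = re.compile(r"(?P<run>[^/\s]+)/(?P<job_id>\d+)/?$")


def parse_job_description(description: Optional[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical: return (run_name, job_id) if the lock description looks
    like a job, else None so the caller can skip the paddles lookup."""
    if not description:
        return None
    match = JOB_DESC_RE.search(description.strip())
    if match is None:
        return None
    return match.group("run"), match.group("job_id")
```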
Zack Cerza [Thu, 1 Aug 2024 18:16:04 +0000 (12:16 -0600)]
supervisor: Check for job expiration
This commit isn't strictly necessary for the feature's implementation, but will
allow testing the feature on the production teuthology cluster before merging.
This feature has two parts:
* Specifying expiration dates when scheduling test runs
* A global maximum age
Expiration dates are provided by passing `--expire` to `teuthology-suite` with
a relative value like `1d` (one day), `1w` (one week), or an absolute value like
`1999-12-31_23:59:59`.
A new configuration item, `max_job_age`, is specified in seconds. This defaults
to two weeks.
When the dispatcher checks the queue for the next job to run, it first checks
the job's `timestamp` value, which reflects the time the job was scheduled. If
more than `max_job_age` seconds have passed since then, the job is skipped and
marked dead. It next checks for an `expire` value; if that value is in the
past, the job is skipped and marked dead. Otherwise, the job runs as usual.
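The two checks can be sketched in a few lines. This sketch makes several assumptions:

* Relative `--expire` values use only the `d` and `w` suffixes shown above; the real option may accept more.
* Absolute values are interpreted as UTC.
* A job is a dict holding `datetime` objects.

`parse_expire` and `should_skip` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

# Two weeks in seconds, the documented default for max_job_age.
DEFAULT_MAX_JOB_AGE = 14 * 24 * 60 * 60

# Assumed relative units; only "d" and "w" appear in the description.
_UNITS = {"d": "days", "w": "weeks"}
_RELATIVE_RE = re.compile(r"^(\d+)([dw])$")
_ABSOLUTE_FMT = "%Y-%m-%d_%H:%M:%S"


def parse_expire(value, now):
    """Hypothetical: turn an --expire value into an aware datetime (UTC assumed)."""
    match = _RELATIVE_RE.match(value)
    if match:
        amount, unit = match.groups()
        return now + timedelta(**{_UNITS[unit]: int(amount)})
    return datetime.strptime(value, _ABSOLUTE_FMT).replace(tzinfo=timezone.utc)


def should_skip(job, now, max_job_age=DEFAULT_MAX_JOB_AGE):
    """Hypothetical: return why the job should be marked dead, or None to run it."""
    # First check: how long ago was the job scheduled?
    if (now - job["timestamp"]).total_seconds() > max_job_age:
        return "exceeded max_job_age"
    # Second check: an explicit expiration date in the past.
    expire = job.get("expire")
    if expire is not None and expire < now:
        return "expired"
    return None
```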
Zack Cerza [Thu, 1 Aug 2024 19:50:51 +0000 (13:50 -0600)]
suite: Ensure teuthology config is consistent between tests
test_init.py was making modifications to the config object that persisted
between tests. When I fixed that, initially some tests in test_run_.py started
failing because of settings in my local ~/.teuthology.yaml. This change causes
all of the tests in suite.test to use default config values.
Kyr Shatskyy [Sat, 3 Aug 2024 15:08:48 +0000 (17:08 +0200)]
orchestra/run: use shlex instead of pipes
Finally get rid of deprecation warning for 'pipes':
teuthology/orchestra/run.py:12
/teuthology/teuthology/orchestra/run.py:12: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
import pipes
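In Python 3, `pipes.quote` is the same function as `shlex.quote`, so switching imports does not change quoting behavior. The sketch below quotes each argument of a command list. `quote_args` is a hypothetical helper, not the function in `run.py`. On Python 3.8+, `shlex.join` does the same thing.

```python
import shlex


def quote_args(args):
    """Hypothetical: join arguments into one shell-safe command string.

    shlex.quote leaves safe words alone and single-quotes anything else,
    escaping embedded single quotes.
    """
    return " ".join(shlex.quote(str(a)) for a in args)


args = ["echo", "hello world", "it's"]
cmd = quote_args(args)
```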
Kyr Shatskyy [Fri, 2 Aug 2024 17:32:33 +0000 (19:32 +0200)]
provision/downburst: ignore "username@" prefix in hostname
If downburst receives a hostname argument without a username@
prefix (i.e. one that does not include an @ symbol), an index
out of range error occurs. We obviously don't want that; hostnames
without login names should be allowed.
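This kind of error typically comes from indexing the result of `split("@")`. Below is a sketch of the failure mode and a tolerant alternative. Both helper names and the host names are hypothetical; this does not reproduce downburst's actual code.

```python
def strip_login_fragile(hostname):
    # Raises IndexError without "@": split() returns a one-element list.
    return hostname.split("@")[1]


def strip_login(hostname):
    """Drop an optional "username@" prefix.

    str.rpartition never raises: with no "@", the whole string is
    returned as the last element, so both forms work.
    """
    return hostname.rpartition("@")[2]
```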
Kyr Shatskyy [Tue, 6 Aug 2024 22:54:00 +0000 (00:54 +0200)]
suite/util: call list_locks() only once for get_arch()
When scheduling a suite, we only want to try list_locks() once
to get the arch for a machine_type. Otherwise we get into an
infinite loop and the console log is flooded with useless
error messages.
Kyr Shatskyy [Sun, 4 Aug 2024 22:06:33 +0000 (00:06 +0200)]
util/scanner: get rid of FutureWarning message
Get rid of the warning:
teuthology/util/test/test_scanner.py::TestValgrindScanner::test_scan_all_files
/Users/kyr/kshtsk/teuthology/teuthology/util/scanner.py:133: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
if not xml_tree:
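The warning text appears to come from lxml. There, as in the standard library's ElementTree, an element's truth value depends on how many children it has. A parsed but childless root is therefore falsy, and `if not xml_tree:` wrongly treats it as missing. The sketch below uses the stdlib parser, and `has_tree` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def has_tree(xml_tree):
    """True if a tree was parsed, however many children its root has."""
    # Truth-testing an Element checks len(element), not whether it exists,
    # so compare against None explicitly.
    return xml_tree is not None


root = ET.fromstring("<valgrindoutput/>")  # a valid root with no children
```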
Kyr Shatskyy [Thu, 1 Aug 2024 21:58:08 +0000 (23:58 +0200)]
dispatcher: fix AccessDenied on process lookup
On macOS, while iterating over the process list, the dispatcher gets stuck
on some system processes such as launchd, logd, and systemstats, and
quits unexpectedly with PermissionError and psutil.AccessDenied
exceptions.
Zack Cerza [Tue, 25 Jun 2024 21:42:42 +0000 (15:42 -0600)]
dispatcher: Temporarily pass through to supervisor
The old dispatcher expects to be able to invoke the supervisor via
`teuthology-dispatcher --supervisor`, so add this compatibility shim for the
time being.
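A shim like this can be as small as one flag that routes the remaining arguments elsewhere. In this sketch, `dispatcher_main`, `supervisor_main`, and `run_dispatcher` are hypothetical stand-ins, not teuthology's entry points. `argparse.ArgumentParser.parse_known_args` hands unrecognized arguments through untouched.

```python
import argparse


def supervisor_main(argv):
    # Hypothetical stand-in for the supervisor entry point.
    return ("supervisor", argv)


def run_dispatcher(argv):
    # Hypothetical stand-in for the normal dispatcher code path.
    return ("dispatcher", argv)


def dispatcher_main(argv=None):
    """Hypothetical: route `teuthology-dispatcher --supervisor ...` to the supervisor."""
    parser = argparse.ArgumentParser(prog="teuthology-dispatcher")
    parser.add_argument("--supervisor", action="store_true",
                        help="compatibility shim: run the supervisor instead")
    # Unknown arguments are passed on as-is to whichever code path runs.
    args, rest = parser.parse_known_args(argv)
    if args.supervisor:
        return supervisor_main(rest)
    return run_dispatcher(rest)
```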
Zack Cerza [Mon, 24 Jun 2024 22:05:25 +0000 (16:05 -0600)]
Finish removing teuthology-worker
The dispatcher and supervisor were added in #1546, but code was copied and
pasted into the new modules, leaving the worker untouched. Also untouched were
the unit tests, meaning that the dispatcher and supervisor were never unit
tested. As the copied code changed, the dispatcher and supervisor were not being
tested for regressions, while the worker, which wasn't being used anymore, had
passing unit tests, giving a false sense of security.
This commit removes the old worker code, and adapts the old worker tests to
apply to the dispatcher and supervisor. It also splits out teuthology-supervisor
into its own command.