suite/run.py: update sha1 from base_config to parsed_yaml
While backtracking to find a build with --newest,
the new build/suite sha is set on base_config, but
the job's parsed_yaml is never updated with the
new sha.
suite/test/test_run.py: Add tests for SHA1 handling with --newest
Added two test cases to verify SHA1 handling when using --newest backtracking:
1. test_newest_success_same_branch_same_repo: Tests when ceph_branch and
suite_branch are the same. Verifies that both ceph_hash and suite_hash
are updated to the backtracked working SHA1.
2. test_newest_success_diff_branch_diff_repo: Tests when ceph_branch and
suite_branch differ. Verifies that only ceph_hash is updated to the
working SHA1, while suite_hash remains as the original suite_sha1.
Both tests verify the complete flow through collect_jobs() and
schedule_suite(), ensuring the YAML files generated for each job
contain the correct SHA1 references.
suite/run.py: suite_hash should use backtracked sha1
In schedule_suite() we only try to find a backtracked sha1
for the ceph sha1, when --newest is provided and packages are
not found in shaman. Until now, suite_sha1 did not use the
backtracked sha1.
This commit makes sure that suite_sha1 uses
the backtracked sha1 when ceph_branch and suite_branch are the same and
ceph_repo and suite_repo are the same. This ensures that suite_sha1
only uses the backtracked sha1 when the user clearly expects
shaman to not have the latest sha1 and wants to use the --newest sha1
for both the ceph code base and the suite qa code base.
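The selection rule above can be sketched as a small function. This is only an illustration under stated assumptions, not the code in suite/run.py: the function name `pick_suite_hash` and its parameters are hypothetical.

```python
def pick_suite_hash(ceph_repo, ceph_branch, suite_repo, suite_branch,
                    backtracked_sha1, original_suite_sha1):
    """Return the sha1 to use as suite_hash after --newest backtracking.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    same_source = (ceph_repo == suite_repo and ceph_branch == suite_branch)
    # Follow the backtracked sha1 only when ceph and suite come from the
    # same repo and the same branch; otherwise keep the suite sha1 as is.
    return backtracked_sha1 if same_source else original_suite_sha1
```

With identical repo and branch the backtracked sha1 is returned; if either differs, the original suite sha1 is kept.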
Vallari Agrawal [Thu, 12 Jun 2025 14:13:57 +0000 (19:43 +0530)]
teuthology/schedule.py: update parsed_yaml with base_config
In the `schedule_suite` method,
`self.base_config` gets updated many times while backtracking
when the "--newest" flag is used. These changes never reached
`parsed_yaml` (the job yaml) in `configs`,
because `configs` is initialised before backtracking.
Previously, we wrote base_config to a tmp
file and passed that to teuthology-schedule,
which took care of the updates to base_config.
But this logic was removed in https://github.com/ceph/teuthology/pull/2008/files
so the updates to base_config no longer make
it into the job yaml.
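A minimal sketch of the idea, assuming each entry in `configs` is a dict with a `parsed_yaml` key and that only a few top-level keys need to be carried over. The helper name `refresh_job_configs`, the `keys` default and the data layout are assumptions for illustration, not the actual teuthology code.

```python
import copy


def refresh_job_configs(configs, base_config, keys=("sha1", "suite_sha1")):
    """Copy selected keys from base_config into every job's parsed_yaml.

    Hypothetical helper: `configs` is assumed to be a list of dicts,
    each holding the job yaml under "parsed_yaml".
    """
    updated = []
    for job in configs:
        job = copy.deepcopy(job)  # leave the caller's data untouched
        for key in keys:
            if key in base_config:
                # The value found during backtracking wins over the
                # value captured when configs was first built.
                job["parsed_yaml"][key] = base_config[key]
        updated.append(job)
    return updated
```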
The scrape script, which tries to find backtraces in gzip log files,
can hit "TypeError: a bytes-like object is required, not 'str'"
and fail to collect results. The gzip file content needs to be decoded.
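This error appears when a `str` pattern is tested against lines read from a gzip file opened in binary mode. A minimal sketch of one way to avoid it is to open the file in text mode, so that lines arrive already decoded. The function name `find_backtrace_lines` and the default marker are hypothetical and are not taken from the scrape script.

```python
import gzip


def find_backtrace_lines(path, marker="Traceback"):
    """Return the lines of a gzip log that contain `marker`.

    Hypothetical helper for illustration only.
    """
    # Mode "rt" makes gzip decode bytes to str, so `marker in line`
    # compares str with str instead of str with bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if marker in line]
```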
task/install/rpm: drop code duplication for extra_system_packages
Introducing the install_ceph_packages option added a second
instruction that appends the extra_system_packages list, so the
packages were added twice when the option is set to False.
So we just drop the first attempt to update the install package list.
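The commit fixes the duplication at its source by dropping the extra append. As a separate, generic safeguard (not what the commit does), a package list can also be de-duplicated while keeping its order. The function name `unique_packages` is hypothetical.

```python
def unique_packages(packages):
    """Drop repeated package names, keeping the first occurrence order.

    Hypothetical helper; relies on dicts preserving insertion order.
    """
    return list(dict.fromkeys(packages))
```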
Get rid of package name duplication when extra_system_packages is provided
for rpm.
For example, when 'bzip2' and 'perl-Test-Harness' are requested to be
installed as extra system packages, the packages can be seen mentioned
5 times, see the log excerpt:
provision/downburst: add ntp or chrony to cloud-init packages
Since downburst may use default cloud images, most of which
lack ntp by default, make sure we preinstall ntp or chrony,
which the ceph cluster requires to sync the date and time; otherwise
the osd self scrub check will fail.
Zack Cerza [Tue, 4 Mar 2025 23:37:53 +0000 (16:37 -0700)]
node-cleanup: Grace period for inactive jobs
Once a job is marked finished, the supervisor may still be waiting to unlock its
nodes. Give jobs five minutes to clean up nodes before we consider them "stale".
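The grace-period check can be sketched as below. This is an assumption-laden illustration, not the node-cleanup code: the function name `is_stale`, the `GRACE_PERIOD` constant and the use of a finish timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Five minutes, as stated in the commit message.
GRACE_PERIOD = timedelta(minutes=5)


def is_stale(job_finished_at, now=None, grace=GRACE_PERIOD):
    """Return True once a finished job has outlived the grace period.

    Hypothetical helper: `job_finished_at` is a timezone-aware datetime,
    or None while the job is still active.
    """
    if job_finished_at is None:
        return False  # an active job is never stale
    if now is None:
        now = datetime.now(timezone.utc)
    return now - job_finished_at > grace
```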
The original ea170935d4b1c78dc6fd6beae6b3fda65b296f57 removed a method that had
been moved to ceph.git but that broke upgrade tests from releases up to Octopus
where get_valgrind_args had not been backported. So a revert was done until all
upgrade paths have the method.
Now that Quincy is EOL, we can revert the revert.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Kyrylo Shatskyy [Wed, 26 Feb 2025 18:42:05 +0000 (19:42 +0100)]
orchestra/opsys: add tumbleweed version to the distro version map
openSUSE Tumbleweed is a rolling release, so it has a fixed version number,
and its changeable version id is the date of the snapshot,
example of vm image:
In comparison, openSUSE Leap has regular releases with incremental versions, for example:
15.0, 15.1, ..., 15.6, etc., and the recently introduced 16.0 alpha.
Because it makes no sense to stick to the date, which changes almost daily,
it is suggested to fix the Tumbleweed version to 1.0 to distinguish it from Leap.
As a side effect, once Teuthology locks a node imaged with Tumbleweed,
it correctly updates the os version in paddles to the release date, and
the pulpito interface displays it correspondingly as the date.
Kyr Shatskyy [Fri, 10 Jan 2025 14:09:24 +0000 (15:09 +0100)]
lock/cli: don't update inventory if failed to create a vm
If we were not able to create a vm, we must not update the machine's
inventory, because the update establishes an ssh connection
to the host, and teuthology gets stuck indefinitely trying to connect to
a machine which does not exist.
Zack Cerza [Mon, 6 Jan 2025 22:33:08 +0000 (15:33 -0700)]
containers/teuthology-dev: Remove access token
This container is built and pushed via GitHub Actions. GHA likes to provision a
personal access token for each job that gives tightly-scoped access to the git
repository to the job. When we build our container, we end up including
`.git/config`, which contains the token. Later, in ceph-dev-stack's CI, an
`ls-remote` is run against ceph.git, which ends up causing git to prompt for
credentials even though the repo is public. Removing the token should allow
reading all the relevant repos from the built container image.
John Mulligan [Fri, 9 Aug 2024 14:15:15 +0000 (10:15 -0400)]
config: allow reading teuthology config from env var location
Allow changing the default "user" location of the teuthology
configuration yaml using the (optional) TEUTHOLOGY_CONFIG environment
variable. This change aids my effort to run a customized local
teuthology environment.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
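A minimal sketch of the lookup, assuming the default user config lives at `~/.teuthology.yaml`. The function name `user_config_path` and the `DEFAULT_USER_CONFIG` constant are hypothetical and may not match the actual implementation.

```python
import os

# Assumed default location of the per-user teuthology config.
DEFAULT_USER_CONFIG = "~/.teuthology.yaml"


def user_config_path(environ=None):
    """Return the config path, honouring TEUTHOLOGY_CONFIG when it is set.

    Hypothetical helper; `environ` can be passed in to ease testing.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("TEUTHOLOGY_CONFIG", DEFAULT_USER_CONFIG)
    return os.path.expanduser(path)
```

For example, running with `TEUTHOLOGY_CONFIG=/tmp/custom.yaml` in the environment would make this sketch return `/tmp/custom.yaml` instead of the default location.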
Nitzan Mordechai [Tue, 10 Dec 2024 06:12:00 +0000 (06:12 +0000)]
teuthology/misc: Add timeout parameter to stop_daemons_of_type for better flexibility
Updated stop_daemons_of_type to accept a timeout parameter,
allowing dynamic control over the timeout value passed to the
stop function of each daemon.
Patrick Donnelly [Fri, 18 Oct 2024 12:44:13 +0000 (08:44 -0400)]
teuthology/suite: merge base_config with other fragments
Presently the code tries to merge the base_config when the worker starts
running. There's no need to construct it this way and it prevents sharing the
"defaults" with the fragment merging infrastructure. It also prevents
overriding defaults like:
A YAML fragment can set kernel.client but it cannot delete the defaults for
kernel.(branch|flavor|kdb|sha1) because there's no way to remove YAML elements
via a deep merge.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
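The limitation described above can be shown with a generic recursive merge. This is not teuthology's own merge code, and the `defaults` and `fragment` values are made up for illustration: the fragment can add `kernel.client`, but nothing it contains can remove the default `kernel.branch` or `kernel.kdb`.

```python
def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`.

    Generic illustration; keys can be added or replaced, never removed.
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


# Made-up values, for illustration only.
defaults = {"kernel": {"branch": "main", "kdb": True}}
fragment = {"kernel": {"client": {"branch": "testing"}}}
merged = deep_merge(defaults, fragment)
# merged["kernel"] still carries "branch" and "kdb" from the defaults.
```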