Kamoltat [Tue, 25 Jan 2022 01:02:56 +0000 (01:02 +0000)]
docs/docker-compose: add ansible inventory to README
Added instructions that help users
set up ansible inventory files after they
have built their local teuthology container. This
step is needed when ansible tasks are executed
while jobs are running.
Dan Mick [Tue, 16 Nov 2021 01:37:23 +0000 (17:37 -0800)]
test_misc.py: fix bad assumption about LogRecord fields
The test was using LogRecord's asctime attribute to calculate a time
difference between two log entries. Although the attribute is documented
with no caveat, others have run into the problem that it does not exist
on logging.LogRecord unless a formatter with a format string referencing
{asctime} has been used. Since there's a 'created' time that's more
appropriate for this test anyway, use that instead.
This commit enables updating pytest, because pytest's logging init
code has changed: https://github.com/pytest-dev/pytest/discussions/9324
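A minimal sketch of the behavior described above, using only the standard logging module. The helper name make_record and the record contents are illustrative assumptions, not the code from test_misc.py.

```python
import logging


def make_record(msg):
    # Build a LogRecord directly, without any handler or formatter involved.
    return logging.LogRecord("demo", logging.INFO, "demo.py", 0, msg, None, None)


first = make_record("first entry")
second = make_record("second entry")

# 'asctime' is only set once a Formatter whose format string uses it
# has formatted the record.
has_asctime_before = hasattr(first, "asctime")
logging.Formatter("%(asctime)s %(message)s").format(first)
has_asctime_after = hasattr(first, "asctime")

# 'created' is always present (seconds since the epoch), so it is the
# safer field for computing a time difference between two records.
elapsed = second.created - first.created
```

Because 'created' is a float timestamp, the difference needs no string parsing.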
Dan Mick [Tue, 9 Nov 2021 19:49:09 +0000 (11:49 -0800)]
teuthology/packaging.py: fix build_complete: search for requested arch
The workaround from https://github.com/ceph/teuthology/pull/1649 was
necessary because my original algorithm was faulty: when searching
through all the builds for a ref/sha1, one must match the arch
requested by the call to build_complete (in the Builder object);
that arch's presence in the shaman api/search result is not enough
of a match, as it can contain multiple arches in multiple states
of build success. Only a failure *on the requested arch* should be
considered a "requested build not complete".
(note: this will still currently fail a request for a build whose
repo is complete but container build failed, as "build complete"
currently conflates those two statuses. Teuthology does not
contain the information whether a build is being requested for
packages, containers, or both.)
Also add testing for build_complete().
Fixes: https://tracker.ceph.com/issues/53205
Signed-off-by: Dan Mick <dmick@redhat.com>
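The arch-matching rule can be sketched as follows. This is not the code in teuthology/packaging.py: the function name and the "arch" and "status" keys are hypothetical stand-ins for whatever the shaman API actually returns.

```python
def is_build_complete_for_arch(builds, requested_arch):
    """Return True only if the build for requested_arch has completed.

    'builds' is an assumed list of dicts with hypothetical 'arch' and
    'status' keys; the real shaman response may be shaped differently.
    """
    for build in builds:
        # Builds for other arches are ignored entirely: their success or
        # failure says nothing about the arch that was requested.
        if build["arch"] != requested_arch:
            continue
        return build["status"] == "completed"
    # No build for the requested arch was found at all.
    return False
```

With one failed arm64 build and one completed x86_64 build in the list, a request for x86_64 is reported complete, while a request for arm64 is not.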
Sage Weil [Thu, 28 Oct 2021 13:58:39 +0000 (08:58 -0500)]
task/internal/__init__: print core file output before splitting
Debugging this failure:
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_c56135d151713269e811ede3163c9743c2e269de/teuthology/run_tasks.py", line 176, in run_tasks
suppress = manager.__exit__(*exc_info)
File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_c56135d151713269e811ede3163c9743c2e269de/teuthology/task/internal/__init__.py", line 398, in archive
fetch_binaries_for_coredumps(path, rem)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_c56135d151713269e811ede3163c9743c2e269de/teuthology/task/internal/__init__.py", line 320, in fetch_binaries_for_coredumps
dump_program = dump_out.split("from '")[1].split(' ')[0]
IndexError: list index out of range
...on output that should look like this:
./remote/smithi084/coredump/1635398181.133353.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/bin/podman stop ceph-462d7c58-37ab-11ec-8c28-001a4aab830c-node-exporter-smithi', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/bin/podman', platform: 'x86_64'
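For illustration only, a defensive variant of the split that returns None instead of raising IndexError when the "from '" marker is absent. The commit itself only prints the output before splitting; the function name below is hypothetical.

```python
def program_from_file_output(dump_out):
    """Extract the program path from `file` output for a core dump.

    Returns None when the "from '" marker is missing, instead of
    raising IndexError as an unguarded split(...)[1] would.
    """
    _, marker, after = dump_out.partition("from '")
    if not marker:
        return None
    # The program path is the first space-separated token after the marker.
    return after.split(" ")[0]
```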
Zack Cerza [Wed, 20 Oct 2021 18:51:30 +0000 (12:51 -0600)]
supervisor: Don't unlock nodes w/ bad description
Very rarely, we enter a situation where nodes get used by two jobs
simultaneously. We can break this cycle if jobs refuse to unlock a
node that is locked by a different job.
This will not entirely prevent the problem, but it will keep it from
perpetuating itself.
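A rough sketch of the check, assuming each node status is a dict with hypothetical "name" and "description" keys and that a job's lock is identified by its description string. The supervisor's real data structures may differ.

```python
def nodes_safe_to_unlock(statuses, expected_description):
    """Return the names of nodes whose lock description matches this job.

    Nodes locked with a different description are assumed to belong to
    another job and are left locked.
    """
    return [
        status["name"]
        for status in statuses
        if status.get("description") == expected_description
    ]
```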
gitbuilder has been replaced by the shaman project and is no longer used.
TestShamanProject already provides the same functional testing,
so there is no need to maintain tests for obsolete code.
s/basic/default: we no longer need the mapping of basic to default
We were substituting basic with default when trying to do a URI search;
we can remove this mangling and use the default flavor directly,
since we no longer use flavor 'basic' to determine the kernel flavor.
This addresses the comment:
FIXME: ceph flavor and kernel flavor are separate things
remove basic -> default(flavor) mapping & update s/basic/default in docs
Sage Weil [Thu, 7 Oct 2021 15:03:22 +0000 (10:03 -0500)]
tasks/kernel: add 'hwe' config flag for ubuntu distro hwe kernel
The hwe kernel supports nvme_loop, but the non-hwe kernel does not.
I don't want to futz with the 'distro' moniker (although that is another
valid approach) because only some tests need hwe, and I can imagine a
situation where we want to run tests on both kernels. Note that this
flag doesn't rule out adding support for something like
'-k distro-hwe' later.
In unlock_one, we currently have a retry mechanism that is only triggered on a particular exception. With this change, we retry the request to unlock no matter what the cause of failure.
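A minimal sketch of retrying on any failure rather than on one exception type. The helper name, its parameters, and the callable unlock_fn are assumptions for illustration; this is not the unlock_one implementation.

```python
import time


def retry_unlock(unlock_fn, name, tries=3, delay=1.0, sleep=time.sleep):
    """Call unlock_fn(name), retrying up to 'tries' times on any exception."""
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return unlock_fn(name)
        except Exception as exc:  # retry regardless of the cause of failure
            last_error = exc
            if attempt < tries:
                sleep(delay)
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

Passing sleep as a parameter keeps the helper easy to exercise without real delays.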
The function `try_push_job_info()` does not
update the `job_info` dictionary properly. We
want to update `job_info` with `extra_info`;
however, lines 498 and 499 assign
`job_info` a copy of `extra_info` and then update
`job_info` with `job_config`, which is incorrect.
Instead, we should assign `job_info`
a copy of `job_config` and update `job_info` with
`extra_info`.
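The difference in merge order can be sketched like this. The helper name merge_job_info is hypothetical; only the copy-then-update order is taken from the description above.

```python
import copy


def merge_job_info(job_config, extra_info=None):
    """Start from a copy of job_config, then let extra_info override it."""
    job_info = copy.deepcopy(job_config)
    if extra_info is not None:
        # On a key collision, the value from extra_info wins.
        job_info.update(extra_info)
    return job_info
```

With the order reversed (copy `extra_info`, then update with `job_config`), a stale value in `job_config` would overwrite the newer one in `extra_info`.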
The previous behavior was causing machines to get nuked before any
attempt to fetch logs. If a machine took longer than 60s to become
available, collecting logs would fail. Since we also nuke after this
step, don't bother here.
Kefu Chai [Sat, 28 Aug 2021 01:10:12 +0000 (09:10 +0800)]
bootstrap,tox.ini: set a UTF-8 locale for decoding non-latin chars
Python3 uses locale.getdefaultlocale() to get the locale, which is used
to determine the encoding of filenames.
see https://docs.python.org/3/library/locale.html
there is a chance that the specified locale is not capable of handling
filenames that contain non-latin-1 characters; in that case, pip simply fails
to unpack that file, like:
me/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
return self._prepare_linked_requirement(req, parallel_builds)
File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 550, in _prepare_linked_requirement
local_file = unpack_url(
File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 249, in unpack_url
unpack_file(file.path, location, file.content_type)
File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 256, in unpack_file
untar_file(filename, location)
File "/home/jenkins-build/build/workspace/teuthology-pull-requests/.tox/flake8/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)
In this change, UTF-8 is used, and the language part is changed to
en_US, so the output is readable to anyone who can read English.
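A small illustration of why the encoding matters, independent of pip. The filename is made up; the point is only that narrow codecs cannot represent every character while UTF-8 can.

```python
def can_encode(filename, encoding):
    """Return True if 'filename' can be represented in 'encoding'."""
    try:
        filename.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical filename mixing an accented Latin letter with CJK characters.
name = "caf\u00e9-\u65e5\u672c\u8a9e.txt"

results = {enc: can_encode(name, enc) for enc in ("ascii", "latin-1", "utf-8")}
```

Only the UTF-8 entry in results is True, which is the motivation for selecting a UTF-8 locale.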
Kyr Shatskyy [Fri, 6 Aug 2021 14:20:02 +0000 (16:20 +0200)]
bootstrap: drop hardcode LC_ALL=C
Trying to resolve UnicodeEncodeError while trying to install bootstrap:
ERROR: Exception:
Traceback (most recent call last):
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
status = self.run(options, args)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
return func(self, options, args)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 316, in run
reqs, check_supported_wheels=not options.target_dir
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 472, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 341, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
if not criterion.candidates:
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
return bool(self._sequence)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 140, in __bool__
return any(self)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 128, in <genexpr>
return (c for c in iterator if id(c) not in self._incompatible_ids)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 32, in _iter_built
candidate = func()
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 209, in _make_candidate_from_link
version=version,
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 301, in __init__
version=version,
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
self.dist = self._prepare()
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
dist = self._prepare_distribution()
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in _prepare_distribution
self._ireq, parallel_builds=True
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
return self._prepare_linked_requirement(req, parallel_builds)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 552, in _prepare_linked_requirement
self.download_dir, hashes
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 249, in unpack_url
unpack_file(file.path, location, file.content_type)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 256, in unpack_file
untar_file(filename, location)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
with open(path, "wb") as destfp:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 112: ordinal not in range(128)
suite: rename disable-num-jobs-check to job-threshold
Rename --disable-num-jobs-check to --job-threshold:
- for a shorter, more memorable name;
- to allow changing the threshold value via a parameter;
- to allow defining a default threshold value in the teuthology config.
Use `--job-threshold 0` to disable job threshold check.
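The threshold semantics might look roughly like this. The function name is hypothetical, and the default of 500 is an assumption based on the value named in the related check_num_jobs() commit.

```python
DEFAULT_JOB_THRESHOLD = 500  # assumed default, see the lead-in


def exceeds_job_threshold(num_jobs, job_threshold=DEFAULT_JOB_THRESHOLD):
    """Return True if scheduling num_jobs should be refused.

    A threshold of 0 disables the check entirely.
    """
    if job_threshold == 0:
        return False
    return num_jobs > job_threshold
```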
teuthology/suite/run.py, scripts/suite.py: disallow scheduling too many jobs
Add check_num_jobs() to prevent users from accidentally scheduling too many jobs, like
in rfriedma-2021-06-26_19:32:15-rados-wip-ronenf-scrubs-config-distro-basic-smithi.
JOBS_TO_SCHEDULE_THRESHOLD, set to 500 (most runs have fewer jobs than this),
will disallow users from scheduling more than 500 jobs. Users can schedule
more than 500 jobs by disabling this check using the --disable-num-jobs-check flag.
Sage Weil [Wed, 9 Jun 2021 19:51:37 +0000 (14:51 -0500)]
nuke: kill -9 the teuthology process
If the process has been kill -STOPped, then we'll unlock the machines, but
the process will stick around and we'll try to nuke it again later,
zapping the machines while they're being used by some other job, leading
to failures. (Usually this manifests as an error when the other job stops
where it has trouble gzipping the logs.)
Use -9 to make sure even STOPped processes are killed.
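A POSIX-only sketch showing that SIGKILL terminates a process even while it is stopped. The helper names are illustrative and the child is a throwaway Python process, not a teuthology job.

```python
import os
import signal
import subprocess
import sys


def force_kill(pid):
    # SIGKILL (kill -9) cannot be caught or blocked, and it is delivered
    # even to a process that is currently stopped.
    os.kill(pid, signal.SIGKILL)


def demo_kill_stopped_process():
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    os.kill(child.pid, signal.SIGSTOP)  # simulate kill -STOP
    force_kill(child.pid)
    # A negative return code means the child was terminated by that signal.
    return child.wait(timeout=10)
```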
teuthology-suite: pick _machine_type /etc/teuthology.yml if not specified explicitly
Right now, users must always pass --machine-type when scheduling a
run; when it is not specified, the command fails with a "no machine type
specified" error.
Instead of failing, we can fall back to `default_machine_type`, specified in
/etc/teuthology.yml, which in our case should be smithi.
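The fallback could be sketched as below. The function name and the plain dict standing in for the parsed /etc/teuthology.yml are assumptions; only the `default_machine_type` key comes from the description above.

```python
def resolve_machine_type(cli_machine_type, config):
    """Prefer the explicit --machine-type value, else the configured default."""
    machine_type = cli_machine_type or config.get("default_machine_type")
    if not machine_type:
        # Neither the command line nor the config supplied a machine type.
        raise ValueError("no machine type specified")
    return machine_type
```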