Kyr Shatskyy [Fri, 6 Aug 2021 14:20:02 +0000 (16:20 +0200)]
bootstrap: drop hardcoded LC_ALL=C
Trying to resolve UnicodeEncodeError while trying to install bootstrap:
ERROR: Exception:
Traceback (most recent call last):
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
status = self.run(options, args)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
return func(self, options, args)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 316, in run
reqs, check_supported_wheels=not options.target_dir
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 472, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 341, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
if not criterion.candidates:
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
return bool(self._sequence)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 140, in __bool__
return any(self)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 128, in <genexpr>
return (c for c in iterator if id(c) not in self._incompatible_ids)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 32, in _iter_built
candidate = func()
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 209, in _make_candidate_from_link
version=version,
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 301, in __init__
version=version,
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
self.dist = self._prepare()
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
dist = self._prepare_distribution()
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in _prepare_distribution
self._ireq, parallel_builds=True
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
return self._prepare_linked_requirement(req, parallel_builds)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 552, in _prepare_linked_requirement
self.download_dir, hashes
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 249, in unpack_url
unpack_file(file.path, location, file.content_type)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 256, in unpack_file
untar_file(filename, location)
File "/home/opensuse/teuthology/virtualenv/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
with open(path, "wb") as destfp:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 112: ordinal not in range(128)
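The last frame fails when a file name containing 'é' has to be turned into bytes using an ASCII filesystem encoding, which is what Python 3.6 selects under LC_ALL=C. The sketch below reproduces only that encoding step in isolation. The file name and the helper `encode_path` are made up for illustration and are not part of pip or bootstrap.

```python
def encode_path(name, encoding):
    """Encode a file name roughly the way Python does before calling the OS.

    Python uses the filesystem encoding with the 'surrogateescape' handler;
    a real non-ASCII character still cannot be encoded as ASCII.
    """
    return name.encode(encoding, "surrogateescape")


name = "caf\xe9.txt"  # hypothetical archive member name containing 'é'

try:
    encode_path(name, "ascii")  # what happens under LC_ALL=C on Python 3.6
    ascii_ok = True
    ascii_error = ""
except UnicodeEncodeError as exc:
    ascii_ok = False
    ascii_error = str(exc)

# With a UTF-8 locale the same name encodes without error.
utf8_bytes = encode_path(name, "utf-8")
```

Not forcing LC_ALL=C lets the environment's own locale (typically UTF-8) decide the encoding, which is the second case above.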
suite: rename disable-num-jobs-check to job-threshold
Rename --disable-num-jobs-check to --job-threshold:
- for a shorter, easier-to-remember name;
- to allow changing the threshold value via a parameter;
- to allow defining a default threshold value in the teuthology config.
Use `--job-threshold 0` to disable job threshold check.
teuthology/suite/run.py, scripts/suite.py: disallow scheduling too many jobs
Add check_num_jobs() to prevent users from accidentally scheduling too many jobs, like
in rfriedma-2021-06-26_19:32:15-rados-wip-ronenf-scrubs-config-distro-basic-smithi.
JOBS_TO_SCHEDULE_THRESHOLD, set to 500 (most runs have fewer jobs than this),
will disallow users from scheduling more than 500 jobs. Users can schedule
more than 500 jobs by disabling this check using the --disable-num-jobs-check flag.
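A minimal sketch of the threshold check described in the two commits above, assuming a plain function that receives the job count and the threshold. The name `check_num_jobs` and the value 500 come from the commit messages; the signature, the return value, and the constant name `DEFAULT_JOB_THRESHOLD` are assumptions, not the actual teuthology code.

```python
DEFAULT_JOB_THRESHOLD = 500  # value quoted in the commit message


def check_num_jobs(num_jobs, job_threshold=DEFAULT_JOB_THRESHOLD):
    """Return True if a run with num_jobs jobs may be scheduled.

    A threshold of 0 disables the check, mirroring `--job-threshold 0`.
    """
    if job_threshold == 0:
        return True
    return num_jobs <= job_threshold
```

For example, `check_num_jobs(501)` is rejected with the default threshold, while `check_num_jobs(501, job_threshold=0)` is allowed.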
Sage Weil [Wed, 9 Jun 2021 19:51:37 +0000 (14:51 -0500)]
nuke: kill -9 the teuthology process
If the process has been kill -STOPped, then we'll unlock the machines, but
the process will stick around and we'll try to nuke it again later,
zapping the machines while they're being used by some other job, leading
to failures. (Usually this manifests as an error when the other job stops
where it has trouble gzipping the logs.)
Use -9 to make sure even STOPped processes are killed.
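The behaviour this relies on can be shown with a small Linux-only sketch: a stopped process does not act on SIGTERM until it is continued, whereas SIGKILL terminates it immediately. The helper `term_then_kill` and the sleeping child are illustrative only and are not taken from the nuke code.

```python
import os
import signal
import subprocess
import sys


def term_then_kill(term_wait=0.5):
    """Stop a child, send SIGTERM, then SIGKILL.

    Returns (survived_term, returncode).
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        os.kill(child.pid, signal.SIGSTOP)
        # Block until the child is reported as stopped (this does not reap it).
        os.waitpid(child.pid, os.WUNTRACED)

        # SIGTERM stays pending while the process is stopped.
        os.kill(child.pid, signal.SIGTERM)
        try:
            child.wait(timeout=term_wait)
            survived_term = False
        except subprocess.TimeoutExpired:
            survived_term = True

        # SIGKILL cannot be blocked or deferred, even for a stopped process.
        if survived_term:
            os.kill(child.pid, signal.SIGKILL)
        return survived_term, child.wait(timeout=5)
    finally:
        if child.poll() is None:
            child.kill()
```

On Linux the child is expected to survive the SIGTERM and to exit only after the SIGKILL, with a negative return code naming that signal.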
teuthology-suite: pick default_machine_type from /etc/teuthology.yml if not specified explicitly
Right now, users always have to pass --machine-type when scheduling a
run; when it is not specified, the command fails with a "no machine type
specified" error.
Instead of failing, we can fall back to `default_machine_type` from
/etc/teuthology.yml, which in our case should be smithi.
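The fallback can be sketched as follows, assuming the parsed contents of /etc/teuthology.yml are available as a dictionary. The function name `resolve_machine_type` and its arguments are hypothetical; only the `default_machine_type` key and the error wording come from the commit message.

```python
def resolve_machine_type(cli_value, config):
    """Prefer the explicit --machine-type, else the configured default."""
    if cli_value:
        return cli_value
    default = config.get("default_machine_type")
    if not default:
        # Neither the command line nor the config names a machine type.
        raise ValueError("no machine type specified")
    return default


# Example config, standing in for the parsed /etc/teuthology.yml.
example_config = {"default_machine_type": "smithi"}
```

With this config, `resolve_machine_type(None, example_config)` yields "smithi", and an explicit value still takes precedence.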
Sage Weil [Fri, 14 May 2021 15:31:25 +0000 (10:31 -0500)]
task/internal/syslog: ignore misc.log
These regexes are all intended for kernel errors. Ceph daemon
logs may leak into misc.log (*shakes fist at systemd-journald*)
and cause false positives (i.e., test failures).
WARNING: --use-feature=2020-resolver no longer has any effect, since it
is now the default dependency resolver in pip. This will become an error
in pip 21.0.
Retry paddles requests, and for get_status() return an empty dict
rather than None so callers behave.
get_status() failing in particular has caused the dispatcher and jobs
to fail several times over the past few weeks. With this change, we
should be able to run multiple paddles workers again, since all the
common callers will retry on error.
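A rough sketch of the retry-and-default idea, assuming the paddles request is wrapped in a callable. `with_retries`, the `fetch` argument, and the attempt and delay values are hypothetical; the real change may use a different retry mechanism and different limits.

```python
import time


def with_retries(func, attempts=3, delay=0.0):
    """Call func(), retrying on any exception up to `attempts` times."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # a real client would narrow this
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error


def get_status(fetch, attempts=3, delay=0.0):
    """Return the status dict, or {} (never None) if it cannot be fetched."""
    try:
        result = with_retries(fetch, attempts, delay)
    except Exception:
        return {}
    return result if result is not None else {}
```

Returning `{}` means callers can keep using `status.get(...)` without first checking for None.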
Rishabh Dave [Thu, 4 Mar 2021 11:08:37 +0000 (16:38 +0530)]
orchestra: move methods for shell commands from remote.Remote
Move methods that issue commands via shell and that don't necessarily
need to depend on SSH from class Remote to a different class. This
enables applications like vstart_runner.py (in Ceph repo) to reuse these
methods for running tests locally without necessarily depending on SSH
and without duplicating them in vstart_runner.py.
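The refactoring pattern can be sketched like this: helpers that only need some `run()` method live in a base class, and each transport supplies its own `run()`. The class names `ShellCommands` and `LocalRunner` and the method `sh` are illustrative assumptions here, not the names or signatures used in teuthology or vstart_runner.py.

```python
import subprocess


class ShellCommands:
    """Shell helpers that depend only on run(), not on SSH."""

    def run(self, args):
        raise NotImplementedError("subclasses decide how commands are executed")

    def sh(self, script):
        """Run a shell snippet and return its standard output."""
        return self.run(["sh", "-c", script])


class LocalRunner(ShellCommands):
    """Executes commands on the local machine instead of over SSH."""

    def run(self, args):
        done = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        return done.stdout
```

An SSH-backed class would inherit the same helpers and override only `run()`, so a local test runner can reuse `sh()` without copying it.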
This is causing log gzipping to fail because the logs already exist as .gz files.
My guess is that the logs are left over from a previous run, but I'm not sure how
that would happen.
In any case, the merge of this PR corresponds exactly to when we started seeing
the log gzip failures.
Sage Weil [Fri, 12 Mar 2021 17:58:47 +0000 (11:58 -0600)]
task/internal: ignore systemd-sysusers core file
This is related to dnsmasq. When installing the kubic podman 3.0.1
packages:
Running scriptlet: dnsmasq-2.79-13.el8_3.1.x86_64 14/16
/var/tmp/rpm-tmp.6MFp00: line 5: 9079 Segmentation fault (core dumped) systemd-sysusers - &> /dev/null <<SYSTEMD_INLINE_EOF
u dnsmasq - "Dnsmasq DHCP and DNS server" /var/lib/dnsmasq
SYSTEMD_INLINE_EOF
Installing : dnsmasq-2.79-13.el8_3.1.x86_64 14/16
warning: group dnsmasq does not exist - using root
warning: group dnsmasq does not exist - using root
warning: group dnsmasq does not exist - using root
Kyr Shatskyy [Fri, 26 Feb 2021 13:13:31 +0000 (14:13 +0100)]
requirements.in: pin ansible to version 2.8
Since we are not ready for ansible 3 from ceph-cm-ansible point of view:
2021-02-26T12:45:17.668 INFO:teuthology.task.ansible.out:ERROR! couldn't resolve module/action 'firewalld'. This often indicates a misspelling, missing collection, or incorrect module path.
Josh Durgin [Tue, 9 Feb 2021 21:16:46 +0000 (21:16 +0000)]
supervisor: kill processes before gathering logs
When we hit the max job timeout, we need to stop the test programs
before collecting logs or else we run into errors like 'file size
changed while zipping' trying to compress them, and we can't save them
or stop the job.
Josh Durgin [Tue, 9 Feb 2021 19:24:02 +0000 (19:24 +0000)]
nuke: allow not rebooting again
The default behavior was changed to always reboot in 1d47a121b385e2656e9314e9d63faf68a8e865e4 but the --reboot-all option
remained. Keep the original option around for compatibility with
existing scripts.
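One way to keep an obsolete flag accepted while adding an opt-out is sketched below with argparse. Only `--reboot-all` is named in the commit message; the `--no-reboot` flag name, the parser layout, and the helper `should_reboot` are assumptions for illustration and may not match the real nuke command line.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="nuke-sketch")
    # Hypothetical opt-out flag: skip the reboot that is now the default.
    parser.add_argument("--no-reboot", action="store_true",
                        help="do not reboot the machines")
    # Still accepted so existing scripts keep working; it changes nothing.
    parser.add_argument("--reboot-all", action="store_true",
                        help="accepted for compatibility; reboot is the default")
    return parser


def should_reboot(argv):
    """Return True unless the caller explicitly opted out of rebooting."""
    args = build_parser().parse_args(argv)
    return not args.no_reboot
```

Scripts that still pass `--reboot-all` behave exactly as if no flag were given.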