Shraddha Agrawal [Fri, 21 Aug 2020 16:14:46 +0000 (21:44 +0530)]
don't try to delete job from server if first or last in suite
This commit stops trying to delete a job from the server when the
job is first or last in the suite. Since the first and last jobs are
not reported to the server in the first place, there is nothing to
delete.
Shraddha Agrawal [Wed, 19 Aug 2020 13:52:58 +0000 (19:22 +0530)]
archive logs before nuking machines when job times out
This commit does the following:
1. teuthology/task/internal/__init__.py: Adds the file path of the ceph
log directory to the job's info.yaml log file.
2. teuthology/dispatcher/supervisor.py: Compresses and transfers the
log dirs listed in info.yaml to the teuthology host before nuking test
machines in case the job times out.
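A minimal sketch of the compression step described above, under stated assumptions: the helper name `compress_log_dirs` is hypothetical, the directory list is passed in directly rather than read from info.yaml, and everything runs locally, whereas the supervisor has to fetch the logs from remote test machines.

```python
import os
import tarfile
from typing import Iterable, List


def compress_log_dirs(log_dirs: Iterable[str], dest_dir: str) -> List[str]:
    """Create one .tar.gz per existing log directory; return the archive paths.

    Hypothetical helper: the real supervisor works on remote machines.
    """
    os.makedirs(dest_dir, exist_ok=True)
    archives = []
    for log_dir in log_dirs:
        if not os.path.isdir(log_dir):
            continue  # skip directories that do not exist on this host
        name = os.path.basename(os.path.normpath(log_dir))
        archive = os.path.join(dest_dir, name + ".tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            # arcname keeps paths inside the archive relative to the log dir
            tar.add(log_dir, arcname=name)
        archives.append(archive)
    return archives
```

Archiving has to finish before the machines are nuked, since nuking wipes the logs that explain why the job timed out.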
Shraddha Agrawal [Mon, 17 Aug 2020 18:22:18 +0000 (23:52 +0530)]
use block_and_lock_machines method to lock machines
This commit adds the method block_and_lock_machines to ops.py
to enable locking machines in the dispatcher. This ensures that the
lock_machines task is not used in the dispatcher.
Shraddha Agrawal [Tue, 11 Aug 2020 14:27:06 +0000 (19:57 +0530)]
send config file path instead of file descriptor
This commit saves the job config in its archive dir and sends its
path instead of the file descriptor of a temporary file in the
dispatcher and the supervisor.
Shraddha Agrawal [Mon, 10 Aug 2020 13:44:55 +0000 (19:14 +0530)]
unlock targets after job run in teuthology-dispatcher supervisor
This commit unlocks target machines in teuthology-dispatcher's
supervisor mode after the job completes its run and the teuthology
subprocess exits. It either unlocks the nodes or nukes them depending
on the job's status.
1. a new cmd teuthology-dispatcher: It watches a queue and takes a
job from it, locks the required nodes without reimaging them and runs
the job as its subprocess by invoking teuthology-dispatcher in
supervisor mode. Supervisor mode reimages the target machines in
the config, and invokes the teuthology cmd to run the job.
2. refactors task/internal/lock_machines.py: doing so enables
locking machines in dispatcher while following DRY.
3. refactors reimaging logic in lock/ops.py: doing so enables
reimaging machines in dispatcher's supervisor mode while following
DRY.
4. adds an argument, reimage, to lock_many in lock/ops.py: enables
optional reimaging of machines depending on the value. Defaults
to True. Used in dispatcher to lock machines without reimaging them.
please note, some packages not related to python-openstackclient are
also updated. the reason is that before this change all packages
were pinned to a certain version, but after this change only the
direct dependencies are pinned using requirements.txt. so, if any of
the direct dependencies does not pin its own dependencies to a
certain version, pip-compile picks the latest available version from
pypi to fulfill the requirement, and generates requirements.txt
accordingly.
requirements.in: add requirements.in back for tracking pinned direct requirements
and update `update-requirements.sh` to use requirements.in for building
requirements.txt
for more on install_requires versus requirements.txt, see
https://packaging.python.org/discussions/install-requires-vs-requirements/.
in an ideal world, we should not have put those non-essential dependencies in
``setup.py``.
teuthology: don't use git to define version for released teuthology
After installing teuthology using pip, error messages appear in the
log when running any command, for example 'teuthology --version':
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Can't get version from git rev-parse Command '['git', 'rev-parse', '--short', 'HEAD']' returned non-zero exit status 128.
1.0.0
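One way to avoid these messages, sketched under assumptions: the function name `get_version`, the `base_version` default and the `<version>-<sha>` format are illustrative and not taken from the teuthology source. The idea is to silence git's stderr and fall back to the packaged version when the code is not running from a git checkout.

```python
import subprocess


def get_version(base_version: str = "1.0.0") -> str:
    """Return base_version, suffixed with the short git sha when available.

    Hypothetical helper; names and output format are assumptions.
    """
    try:
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            stderr=subprocess.DEVNULL,  # keep "fatal: not a git repository" out of the log
        ).decode().strip()
    except (subprocess.CalledProcessError, OSError):
        # Not a git checkout, or git is not installed: use the released version.
        return base_version
    return f"{base_version}-{sha}" if sha else base_version
```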
Kyr Shatskyy [Sun, 10 May 2020 14:45:09 +0000 (16:45 +0200)]
suite: allow override architecture
By default, the 'arch' field used for generating job configurations
and filtering them is determined automatically via a request to the
paddles database for the given machine_type.
This makes it possible to override the 'arch' value with the
teuthology-suite '--arch' parameter. At the moment this is only
useful when a user doesn't want to make any request to paddles.
Originally the arch is used to filter out the suites
which are not supposed to be queued on the given nodes.
In future we probably need to have tests with heterogeneous
configuration which will use multiple architectures.
run_tasks.py: Python 3 doesn't have exception.message anymore.
This was probably working fine thanks to the six library, which was removed
recently: https://github.com/ceph/teuthology/commit/9f99b9298265be583c350c71de47109bac4bf6f1
Error seen:
```
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/worker/src/github.com_ceph_teuthology_master/teuthology/contextutil.py", line 33, in nested
yield vars
File "/home/worker/src/github.com_ceph_teuthology_master/teuthology/task/install/__init__.py", line 612, in task
yield
File "/home/worker/src/github.com_ceph_teuthology_master/teuthology/run_tasks.py", line 151, in run_tasks
if e.message == 'too many values to unpack':
AttributeError: 'ValueError' object has no attribute 'message'
```
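A minimal sketch of a check that does not rely on `exception.message`; the helper name `is_unpack_error` is hypothetical and this is not necessarily the fix applied in run_tasks.py. It uses `str(e)`, and matches on a substring because Python 3 appends text such as "(expected 2)" to the message, so an equality test against 'too many values to unpack' would not match.

```python
def is_unpack_error(exc: Exception) -> bool:
    """Return True if exc looks like a tuple-unpacking ValueError."""
    # str(exc) replaces the Python 2 only attribute exc.message.
    return isinstance(exc, ValueError) and "too many values to unpack" in str(exc)


try:
    a, b = (1, 2, 3)  # deliberately unpack three values into two names
except ValueError as e:
    print(is_unpack_error(e))
```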
Nathan Cutler [Tue, 30 Jun 2020 12:14:45 +0000 (14:14 +0200)]
kernel: add kernel version matching code for opensuse
The current code for checking if the running kernel version matches the
most recent kernel version in the repos does not work on opensuse when
"-k distro" is given.
This commit adds an opensuse-specific codepath with a version match
check that works in the opensuse testing environment.
Vasu Kulkarni [Thu, 24 Jan 2019 21:47:58 +0000 (13:47 -0800)]
scripts: add cli tools for reimaging nodes without locking
Add teuthology-reimage cli tool to be able to provision nodes using
Fog or Pelagos without locking and unlocking.
This is useful, for example, when someone locks a node for
development or debugging purposes and does not want to release it
while resetting the image, because there may be no free nodes
available.
Kyr Shatskyy [Tue, 12 May 2020 16:27:49 +0000 (18:27 +0200)]
schedule: do not report status for first and last in suite jobs
Addresses the issue when a teuthology run gets stuck with
first_in_suite or last_in_suite jobs in queued state.
Attention: This change requires the following steps,
which are not mutually exclusive:
1) the teuthology workers on the server must be restarted,
otherwise the old worker code will try to remove the
reported job from paddles and exit with an unexpected exception.
2) the user's teuthology runner environment should be
updated to recent code, because new workers will
not clean up FIS and LIS jobs, so they will remain
in paddles and the run will get stuck.
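The rule described above can be sketched as a small predicate. This is an illustration only: the function name `should_report_job` is hypothetical, and treating `first_in_suite` and `last_in_suite` as boolean keys in the job config is an assumption.

```python
def should_report_job(job_config: dict) -> bool:
    """Return False for the first_in_suite / last_in_suite marker jobs."""
    is_marker = job_config.get("first_in_suite") or job_config.get("last_in_suite")
    return not is_marker
```

A scheduler and a worker that both apply the same predicate stay consistent: jobs that were never reported are never looked up for deletion.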
Kefu Chai [Fri, 19 Jun 2020 07:59:49 +0000 (15:59 +0800)]
drop requirements.in
dependabot adds an entry in generated requirements.txt like
```
-e file:///home/dependabot/dependabot-updater/tmp/dependabot_20200617-72-1n8af4b # via -r requirements.in
```
this is expected, as dependabot does not read docs/INSTALL.rst and
hence fails to apply
```
sed '/^-e / d'
```
to the generated file.
instead of teaching dependabot to read the manual, it's simpler
to just ditch requirements.in.
see also #1367
* update-requirements.sh: add a helper script for updating
requirements.txt
* requirements.in: removed
* docs/INSTALL.rst: updated to point user to the new
helper script
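For reference, the sed filter mentioned in this message deletes every line that starts with `-e ` followed by a space. A rough Python equivalent, with the hypothetical name `drop_editable_lines`, shows what has to be removed from the generated file:

```python
def drop_editable_lines(text: str) -> str:
    """Remove lines starting with '-e ', mirroring the sed expression /^-e / d."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.startswith("-e ")
    ]
    return "".join(kept)
```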
Kefu Chai [Tue, 23 Jun 2020 07:25:00 +0000 (15:25 +0800)]
suite/run.py: check config['verify_ceph_hash'] before verifying ceph packages
there is a chance that we don't have ceph packages built for the
combination of the specified os_type and os_version. for instance, in
the case of rados/thrash_old_clients, older ceph clients are installed on
el7, but the ceph cluster is deployed using cephadm, which in turn pulls
ceph container images built using the ceph being tested and el8.
since we've dropped the build of master on el7, there is no need to
verify that the ceph package is available if cephadm is used for
deploying the cluster.
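A minimal sketch of the guard named in the subject line. The helper name `should_verify_packages` and the default of True when the key is absent are assumptions, not taken from suite/run.py.

```python
def should_verify_packages(config: dict) -> bool:
    """Return True unless the config explicitly disables hash verification."""
    # Assumption: verification stays on when 'verify_ceph_hash' is absent.
    return bool(config.get("verify_ceph_hash", True))
```

A suite that deploys with cephadm could then set `verify_ceph_hash: false` to skip the package check.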