teuthology/task/install/valgrind.supp: add suppression for Boost.Thread
Boost.Thread passes `tls_destructor` to `pthread_key_create()` in the hope
of freeing the memory stored in the TLS key `current_thread_tls_key`, but
neither Boost.Thread nor we ever call `pthread_exit()`, which is what would
run those cleanup functions; Boost.Thread is in fact against using
`pthread_exit()`, see [0,1]. Boost.Thread does offer a preprocessor macro
that defines a global variable whose destructor calls `tls_destructor()`,
but per [2], this macro is not defined by default, and per [3], it could
cause an assertion failure in Boost. so it seems advisable not to define
it, even though we could do so in BuildBoost.cmake.
and since this `Leak_StillReachable` leak is a one-shot thing, i am adding
it to the suppression file.
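for reference, a suppression entry of this shape might do (the frames here
are illustrative; the real entry should be generated with valgrind's
--gen-suppressions=all):

    {
       boost_thread_current_thread_tls_key
       Memcheck:Leak
       match-leak-kinds: reachable
       fun:_Znwm
       ...
       obj:*libboost_thread*
    }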
Add the word "seconds" at the end of the log message, since "time.sleep()"
takes a number which is always interpreted as the number of seconds to sleep
for.
Before this commit, the log said:
INFO:teuthology.task.sleep:Sleeping for 10
After:
INFO:teuthology.task.sleep:Sleeping for 10 seconds
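A minimal sketch of the adjusted task code, assuming the duration comes
from the task's config (the function name is illustrative):

    import logging
    import time

    log = logging.getLogger(__name__)

    def sleep_for(duration):
        # time.sleep() always interprets its argument as seconds
        log.info('Sleeping for {} seconds'.format(duration))
        time.sleep(duration)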
before d488b9bd, these params were mandatory; after d488b9bd, they are
optional, because
- these parameters are passed in only for the "first-in-suite" job
- subset is not mandatory even for "first-in-suite", because there is a
  chance that the user wants to run the full combination of the test matrix.
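in code, this roughly means treating them as optional lookups (a sketch;
the real config keys may differ):

    def read_schedule_params(job_config):
        # None means the flag was not passed; for subset, that simply
        # means "run the full combination of the test matrix"
        return job_config.get('subset'), job_config.get('seed')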
worker: create archive_dir before putting log file in it
before d488b9bd, the memo for rerunning a suite was noted down by the
last-in-suite job. by the time the last-in-suite job was performed, the
archive_dir had already been created by the jobs which perform the tests;
see the `Creating archive dir` line in run_job() in teuthology/worker.py.
but after d488b9bd, the memo is logged by the first-in-suite job, and by
then none of the test jobs has been performed, so their archive dirs have
not been created. i think that's why the first-in-suite job fails to write
the memo to $archive_dir/results.log.
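a minimal sketch of the fix, assuming archive_dir is taken from the job
config:

    import os

    def write_results_log(archive_dir, text):
        # the first-in-suite job runs before any test job, so the archive
        # dir may not exist yet; create it before writing into it
        if not os.path.isdir(archive_dir):
            os.makedirs(archive_dir)
        with open(os.path.join(archive_dir, 'results.log'), 'a') as f:
            f.write(text)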
worker: do not pass --timeout to first-in-suite job
likewise, do not pass --seed or --subset to the last-in-suite job.
otherwise, teuthology/schedule.py will raise a ValueError upon seeing
--subset or --seed without --first-in-suite, or --email or --timeout
without --last-in-suite.
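the validation being sidestepped looks roughly like this (a sketch, not
the verbatim code in teuthology/schedule.py; args stands for a
docopt-style flag dict):

    def check_job_flags(args):
        # flags specific to one job type must come with that job type
        if ((args.get('--subset') or args.get('--seed'))
                and not args.get('--first-in-suite')):
            raise ValueError('--subset/--seed require --first-in-suite')
        if ((args.get('--email') or args.get('--timeout'))
                and not args.get('--last-in-suite')):
            raise ValueError('--email/--timeout require --last-in-suite')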
Kefu Chai [Fri, 24 Aug 2018 13:15:31 +0000 (21:15 +0800)]
suite/run,schedule,result: write rerun memo as the first job in suite
so we don't need to wait for the job that writes the results before
rerunning the test suite. without this change, the "result" job is
normally the last job in the suite to be scheduled, so it's likely we will
not have results.log until the suite is almost complete. after this
change, a "first-in-suite" job is scheduled as the very first job, to note
down the subset and seed used to run the suite.
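the recall side could then look something like this sketch
(recall_rerun_memo and the key/value layout are hypothetical; the real
results.log format may differ):

    def recall_rerun_memo(results_log_path):
        # hypothetical parser: recover the subset and seed noted down by
        # the first-in-suite job so --rerun can reuse them
        memo = {}
        with open(results_log_path) as f:
            for line in f:
                key, sep, value = line.partition(':')
                if sep:
                    memo[key.strip()] = value.strip()
        return memo.get('subset'), memo.get('seed')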
Kefu Chai [Sat, 4 Aug 2018 00:42:02 +0000 (00:42 +0000)]
setup.py,requirements.txt: add pytest
pytest requires pluggy >= 0.7, while we have always used pluggy 0.6, as
specified by requirements.txt, since that version is good enough for tox.
but in tox.ini we do use pytest, with no version specified, so we have a
good chance of running into https://github.com/pytest-dev/pytest/issues/3753
also, remove pytest from tox.ini, as this dependency has been added to
requirements.txt.
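the upshot in requirements.txt is roughly the following (the exact pins
are illustrative):

    # pluggy 0.6 is too old for recent pytest; allow a compatible version
    pluggy>=0.7
    pytest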
install: support extra_system_packages config option
On DEB systems, packages specified via the extra_packages option are installed
while forcing the same package version number as the version of the project
(i.e. Ceph) under test. So extra_packages can only be used to specify
additional project (Ceph) packages.
If we wanted to specify additional system (non-project, non-Ceph) packages to
install, we were out of luck. This commit implements an extra_system_packages
option for specifying extra non-project packages.
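A job yaml using the new option might look like this (a sketch; the exact
schema, e.g. the per-packaging-system keys, may differ):

    tasks:
    - install:
        extra_system_packages:
          deb:
          - bison
          rpm:
          - bison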
A branch name containing a slash is perfectly legal in git, but
teuthology uses branch names verbatim in run names, which causes POSTs
to fail when submitting runs to paddles. Replace all '/' in run names
with ':' to allow for branches with slashes in their names.
Signed-off-by: Adam Wolfe Gordon <awg@digitalocean.com>
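The substitution itself is a one-liner (a sketch; the function name is
illustrative):

    def sanitize_run_name(name):
        # '/' is legal in git branch names, but breaks POSTs to paddles
        return name.replace('/', ':')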
results,schedule,worker: persist --seed and --subset in results.log
to create a repeatable test suite, in addition to `--seed <SEED>`, we also
need to pass the same `--subset <SUBSET>` to teuthology-suite when
rerunning the failed tests. it would be handy if teuthology-suite could
remember these settings and recall them when `--rerun <RUN>` is used.
in this change, we repurpose the last job, which sends the email reporting
the test result, to also note down the subset and seed used for scheduling
the test suite. for the moment, these variables are stored in results.log.
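a sketch of noting the memo down (this rendering is illustrative; the
actual format written to results.log may differ):

    import os

    def note_down_memo(archive_dir, subset, seed):
        # persist the scheduling knobs so a later --rerun can reuse them
        with open(os.path.join(archive_dir, 'results.log'), 'a') as f:
            f.write('subset: {}\nseed: {}\n'.format(subset, seed))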
    I could not figure out the owner of the requested job.
    Please pass --owner <owner>.
The worker dies with an unhandled exception in run_with_watchdog if it
cannot figure out the owner of a job that it tries to kill when the job
runs longer than the given time limit.
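A sketch of the guard (names illustrative), reusing the message above:

    def owner_of(job_info, log):
        # return None instead of letting run_with_watchdog die with an
        # unhandled exception when the owner cannot be determined
        owner = job_info.get('owner')
        if not owner:
            log.error('I could not figure out the owner of the requested '
                      'job. Please pass --owner <owner>.')
        return owner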
Kyr Shatskyy [Mon, 7 May 2018 15:01:57 +0000 (18:01 +0300)]
Restart dead workers
This patch allows restarting dead workers individually, without stopping
the rest of the teuthology components and, more importantly, the
beanstalkd service. That also makes it possible to scale up the number of
workers. Additionally, either of pulpito and paddles can be restarted on
its own.
Kefu Chai [Sat, 16 Jun 2018 16:09:36 +0000 (00:09 +0800)]
teuthology-suite: add --seed option for repeatable random test
currently --rerun does not match a test description containing
'supported-random-distro$/ubuntu_latest.yaml' against one containing
'supported-random-distro$/centos_latest.yaml'. the former could be part of
the description of a failed test, while the latter is part of a job
description generated by build_matrix(). because the '$' operator
instructs teuthology to choose a random file under the directory whose
name ends with '$', and we expand the '$' to a randomly picked file
*before* filtering the generated job list with the filter collected from
the failed tests, there is a good chance that the job descriptions of the
failed jobs in self.args.filter_in cannot match the randomly generated
ones.
so, we introduce a '--seed' argument for teuthology-suite to make the
random tests repeatable. this argument allows the user to specify a seed
for the RNG used by build_matrix().
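the idea in a sketch (function and variable names illustrative):

    import random

    def expand_dollar(candidates, seed):
        # with the same seed, the same file is picked for a given '$'
        # directory, so a rerun regenerates matching job descriptions
        rng = random.Random(seed)
        return rng.choice(sorted(candidates))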
placeholder: whitelist MDS_ALL_DOWN, MDS_UP_LESS_THAN_MAX by default
because, in ceph/qa/tasks/ceph.py, we start mon, mgr, osd, and then mds,
there is a time window where no mds is around while the mgr is checking
the mdsmap for MDS_ALL_DOWN errors, and there is no way to disable this
check during that window. so we just whitelist MDS_ALL_DOWN here.
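in job yaml terms this corresponds to something like (a sketch):

    overrides:
      ceph:
        log-whitelist:
        - MDS_ALL_DOWN
        - MDS_UP_LESS_THAN_MAX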
Zack Cerza [Wed, 21 Mar 2018 23:37:32 +0000 (17:37 -0600)]
task.ansible: Allow passing in custom group_vars
Up until now, if you wanted to inject vars into a playbook run, you had to
use --extra-vars, which don't behave the same way that group_vars do.
This commit adds that functionality.
We look for a 'group_vars' dict in the task's config object. If it's
there, we create group_vars files with names taken from the keys, and
content taken from the values.
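For example (a sketch; the playbook, group, and var names are
illustrative):

    tasks:
    - ansible:
        playbook: teuthology.yml
        group_vars:
          all:            # key gives the group_vars file its name
            my_var: 42    # value becomes the content of that file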