Josh Durgin [Fri, 29 Mar 2013 23:33:49 +0000 (16:33 -0700)]
locker: try to make up for apache timeouts
If the lock request succeeds in updating the db, but the client gets a
timeout from apache, they can now try again and get back the machines
they just locked.
Only automatic runs have a description set when locking several
machines, so this does not affect users of teuthology-lock
--lock-many, where no description can be set in the same request.
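A minimal sketch of the retry idea, assuming an in-memory dict in place of the lock database. The function name, signature and field names are hypothetical and are not the locker's actual API:

```python
def lock_many(machines, user, count, description=None):
    """Lock `count` free machines for `user`, tolerating a retried request.

    `machines` maps a machine name to a dict with 'locked_by' and
    'description' keys; this layout is an assumption for illustration.
    """
    if description is not None:
        # A retry after a lost response: the earlier attempt may already
        # have locked machines under this user and description.
        already = sorted(
            name for name, m in machines.items()
            if m["locked_by"] == user and m["description"] == description
        )
        if len(already) == count:
            return already

    free = sorted(name for name, m in machines.items() if m["locked_by"] is None)
    if len(free) < count:
        raise RuntimeError("not enough free machines")
    chosen = free[:count]
    for name in chosen:
        machines[name]["locked_by"] = user
        machines[name]["description"] = description
    return chosen
```

Requests without a description skip the retry check entirely, which is why --lock-many behaviour is unchanged.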
Sam Lang [Wed, 27 Mar 2013 13:48:45 +0000 (08:48 -0500)]
task/mds_thrash: Log mds dump after long delay
In cases where the mds thrasher continuously loops
waiting for an mds to be removed from the map, or
for a new mds to become active, we want to start logging
the mds state for debugging.
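The wait loop could look roughly like this. It is a sketch only: the helper name, the 60 second threshold and the callbacks are assumptions, not the thrasher's real code:

```python
import time

def wait_for(condition, dump_state, log, long_delay=60.0, interval=2.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `condition`; once `long_delay` seconds pass, log state each loop."""
    start = clock()
    while not condition():
        if clock() - start > long_delay:
            # Only start dumping once the wait looks suspiciously long.
            log("still waiting; mds dump: %s" % dump_state())
        sleep(interval)
```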
Sage Weil [Sun, 24 Mar 2013 03:58:46 +0000 (20:58 -0700)]
verify /var/lib/ceph not present on start
Verify there is no /var/lib/ceph, just like we do with the cephtest
directory. We will need to change this (or make it optional) when we
allow runs against an existing cluster, but a whole bunch of other
things will need to change then as well.
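A sketch of the check, assuming it runs against the local filesystem (the real task would run it on each remote host); the function name is hypothetical:

```python
import os

def check_no_leftovers(paths=("/var/lib/ceph",)):
    """Raise if any path from a previous run still exists."""
    leftovers = [p for p in paths if os.path.exists(p)]
    if leftovers:
        raise RuntimeError("stale paths found, aborting: %s" % ", ".join(leftovers))
```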
Warren Usui [Sat, 16 Mar 2013 01:18:56 +0000 (18:18 -0700)]
Fixed ceph-fuse mount point cleanup bug
Tested for the existence of /sys/fs/fuse/connections/*/abort
before clobbering it. This problem occurred when all
the machines were virtual CentOS machines.
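The guard amounts to expanding the glob first and writing only to files that exist. A local sketch (the function name is hypothetical, and the real cleanup runs on the remote host):

```python
import glob

def abort_fuse_connections(pattern="/sys/fs/fuse/connections/*/abort"):
    """Write '1' to each existing abort file; return the files written."""
    written = []
    for path in glob.glob(pattern):  # empty list when nothing matches
        with open(path, "w") as f:
            f.write("1")
        written.append(path)
    return written
```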
Sam Lang [Mon, 11 Mar 2013 18:22:10 +0000 (13:22 -0500)]
task/restart: Restart task for testing daemon kill
The ceph daemons support being killed at a specific code point
with a config option. In some cases, we want to test a kill point
only once for a given daemon run (such as replay that only occurs
during daemon startup). This task allows running a script or executable
and (when the script sends a command to the task) restarting it with
a temporary config that has the appropriate kill point set. Once
the daemon asserts and gets restarted, the original config is used.
Adds a specific restart_with_args() method to the DaemonState in the
ceph task.
Right now this task follows the workunit task closely, but uses stdout/stdin
to specify when to restart a daemon.
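A sketch of the stdout/stdin control channel. The command words ('restart', 'done'), the 'restarted' reply and the restart_daemon callback are all hypothetical; only the use of stdout/stdin comes from the description above:

```python
def serve_restart_requests(script_stdout, script_stdin, restart_daemon):
    """Handle 'restart <daemon> <kill-point>' lines until the script says 'done'."""
    for line in script_stdout:
        words = line.split()
        if not words:
            continue
        if words[0] == "done":
            break
        if words[0] == "restart" and len(words) == 3:
            daemon, kill_point = words[1], words[2]
            # Restart with a temporary config that sets the kill point.
            restart_daemon(daemon, kill_point)
            # Tell the script it may continue.
            script_stdin.write("restarted\n")
            script_stdin.flush()
```

Any other output from the script is ignored, so ordinary test logging can share the same stream.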
Warren Usui [Thu, 14 Mar 2013 21:24:50 +0000 (14:24 -0700)]
Added el6 install functionality for CentOS systems.
install_packages, remove_packages and remove_sources are now the
installation and removal functions used by teuthology. Debian
references have been removed outside of tasks/install.py. CentOS
functionality parallel to Debian has been added to tasks/install.py,
and el6 references have been added to nuke.py, task/ceph-fuse.py and
task/install.py.
Some files created by CentOS are removed with rm -fr. This should
be changed once the installation/removal rpm procedure is implemented.
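A sketch of dispatching on the package type. The signature, the 'deb'/'rpm' labels and the exact commands are assumptions for illustration, not what tasks/install.py necessarily does:

```python
def install_packages(system_type, packages, run):
    """Install `packages` using the package manager for `system_type`.

    `run` is a callable that executes an argument list on the target host.
    """
    if system_type == "deb":
        run(["sudo", "apt-get", "-y", "install"] + list(packages))
    elif system_type == "rpm":
        run(["sudo", "yum", "-y", "install"] + list(packages))
    else:
        raise ValueError("unsupported system type: %r" % system_type)
```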
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sam Lang [Wed, 13 Mar 2013 15:11:06 +0000 (10:11 -0500)]
task/thrashosds: Ipmi checking/setup in thrashosds
We don't need to set up the ipmi console on runs that
don't use powercycling, so delay setup of the RemoteConsole
with ipmi to the thrashosds task, and only do it there if the
powercycle config is set. This avoids spurious test failures
from flaky ipmi.
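In outline, the deferred setup is a guard on the powercycle option. The function and the make_console factory below are hypothetical stand-ins for the RemoteConsole setup:

```python
def consoles_for_thrashing(config, hosts, make_console):
    """Create remote consoles only when the 'powercycle' option is set."""
    if not config.get("powercycle"):
        return {}  # no ipmi contact at all, so flaky ipmi cannot fail the run
    return {host: make_console(host) for host in hosts}
```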
Warren Usui [Wed, 27 Feb 2013 19:32:37 +0000 (11:32 -0800)]
Implement email task.
Email.py was added so that the emailto attribute could be passed,
and to prevent 'module object has no attribute: email' errors from
happening. Run.py actually performs the email operation and calls
suite.email_results to send the mail. The information passed right
now is the summary and the config.
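A sketch of assembling such a message with the standard library. The function name, sender address and subject are assumptions; this is not the body of suite.email_results:

```python
from email.message import EmailMessage

def build_results_email(emailto, summary, config, sender="teuthology@localhost"):
    """Build a plain-text message carrying the run summary and config."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = emailto
    msg["Subject"] = "teuthology run summary"
    msg.set_content("summary:\n%s\n\nconfig:\n%s\n" % (summary, config))
    return msg
```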
Joe Buck [Wed, 20 Feb 2013 19:58:45 +0000 (11:58 -0800)]
teuthology: add an extra_packages flag to install
Some tests require additional packages
(e.g., java bindings, hadoop bindings).
Extend the install task to allow for those
packages to be specified in the yaml files.
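A sketch of how the extra packages could be merged into the install list once the yaml has been parsed into a dict. The helper name and the package names in the usage line are illustrative only:

```python
def packages_to_install(install_config, default_packages):
    """Return the default packages followed by any 'extra_packages'."""
    extra = (install_config or {}).get("extra_packages", [])
    # Skip extras that are already in the default list.
    return list(default_packages) + [p for p in extra if p not in default_packages]
```

For example, packages_to_install({"extra_packages": ["libcephfs-java"]}, ["ceph"]) yields the default list with the Java bindings appended.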
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
task: mon_thrash: Thrash multiple monitors and 'maintain-quorum' option
We now add a new option 'thrash-many' that, when set to true, overrides
the default behaviour of killing only one monitor at a time. Instead,
this option will select up to the maximum number of killable monitors to
kill in each round.
We also add a new 'maintain-quorum' option that limits the number of
monitors that can be killed in each thrashing round. If set to true, at
most (n/2-1) monitors are killable. This means that on a configuration
with at most two configured monitors and 'maintain-quorum' set to true,
this task won't run, as there are no killable monitors -- in such a
scenario, this option should be set to false.
Furthermore, if 'store-thrash' is set to true, then 'maintain-quorum' must
also be set to true, as we cannot let the task thrash all the monitor
stores, or we wouldn't be able to sync from other monitors, nor can we
let quorum be dropped, or we won't be able to resync our way into quorum.
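A sketch of the two constraints. It assumes n/2 is rounded up for odd n, which gives the largest count that still leaves a strict majority alive; the function names are hypothetical:

```python
def max_killable(num_mons, maintain_quorum):
    """Upper bound on monitors to kill in one thrashing round."""
    if not maintain_quorum:
        return num_mons
    # Keep a strict majority alive: 2 mons -> 0, 3 -> 1, 4 -> 1, 5 -> 2.
    return max((num_mons - 1) // 2, 0)

def validate(num_mons, maintain_quorum, store_thrash):
    """Reject option combinations the task cannot run with."""
    if store_thrash and not maintain_quorum:
        raise ValueError("'store-thrash' requires 'maintain-quorum'")
    if max_killable(num_mons, maintain_quorum) == 0:
        raise ValueError("no killable monitors; set 'maintain-quorum' to false")
```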
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
task: mon_thrash: Add 'seed' and 'store-thrash' options
This patch introduces an option to thrash a monitor store when we thrash
the monitors, as well as a 'store-thrash-probability' option (defaulting
to 50%).
We also took this opportunity to introduce a new 'seed' option, which
should allow a given run of this task to be reproducible. This might come
in handy when attempting to reproduce a behavior that would otherwise
be randomly triggered.
You should note that while the 'seed' option will indeed mimic past
behaviors, this only applies to a past behavior of this task: other tasks
are not affected by this value, nor are any workunits or even ceph daemons.
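A sketch of why a seed makes the task's own choices repeatable without touching anything else: the task draws from a private generator. The function and its parameters are hypothetical; only the 50% default comes from the text above:

```python
import random

def plan_rounds(mons, rounds, seed, store_thrash_probability=0.5):
    """Pick a victim per round and decide whether to thrash its store."""
    rng = random.Random(seed)  # private generator; other code's randomness is untouched
    plan = []
    for _ in range(rounds):
        victim = rng.choice(mons)
        thrash_store = rng.random() < store_thrash_probability
        plan.append((victim, thrash_store))
    return plan
```

Calling plan_rounds twice with the same monitors, round count and seed returns the same plan.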
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>