Sam Lang [Fri, 1 Feb 2013 13:46:04 +0000 (07:46 -0600)]
run.py: Fix argument parsing for --name
With the addition of the --name argument to the
teuthology program (run.py), jobs were failing
because --name was being treated as a non-arg
option, even though the name was being supplied
by the workers. Fix that and give it a metavar.
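A minimal sketch of the kind of fix described above, assuming an argparse-based parser. The parser below is illustrative only and is not the actual run.py code; the description and help strings are made up for the example.

```python
import argparse


def build_parser():
    # Illustrative parser, not teuthology's real one.
    parser = argparse.ArgumentParser(description="example parser")
    # --name takes a value; metavar controls how that value is shown in --help.
    parser.add_argument("--name", metavar="NAME", help="name for this run")
    return parser


args = build_parser().parse_args(["--name", "my-run"])
print(args.name)
```

Declared this way, the option consumes the following token as its value instead of acting as a bare flag.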
Sam Lang [Wed, 23 Jan 2013 02:27:41 +0000 (20:27 -0600)]
Assign devices to osds using the device wwn
Linux doesn't guarantee device names (/dev/sdb, etc.)
are always mapped to the same disk. Instead of assigning
nominal devices to osds, we map devices by their wwn
(/dev/disk/by-id/wwn-*) to an osd (both data and journal).
Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
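The idea can be sketched roughly as follows. This is not the code from the patch: the function name and the pairing policy (two distinct wwn devices per osd, one for data and one for journal, taken in sorted order) are assumptions made for illustration.

```python
def assign_devices_by_wwn(osd_ids, wwn_paths):
    """Pair each osd with a data and a journal device, identified by wwn path.

    Hypothetical policy: devices are handed out in sorted order, two per osd.
    """
    paths = sorted(wwn_paths)  # stable order regardless of discovery order
    if len(paths) < 2 * len(osd_ids):
        raise ValueError("need two wwn devices (data, journal) per osd")
    mapping = {}
    for i, osd in enumerate(osd_ids):
        mapping[osd] = {"data": paths[2 * i], "journal": paths[2 * i + 1]}
    return mapping
```

On a real host the path list could come from something like glob.glob("/dev/disk/by-id/wwn-*"); the sketch takes it as an argument so it can be exercised without hardware.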
Sam Lang [Wed, 23 Jan 2013 20:37:39 +0000 (14:37 -0600)]
Replace /tmp/cephtest/ with configurable path
Teuthology uses /tmp/cephtest/ as the scratch test directory for
a run. This patch replaces /tmp/cephtest/ everywhere with a
per-run directory: {basedir}/{rundir} where {basedir} is a directory
configured in .teuthology.yaml (/tmp/cephtest if not specified),
and {rundir} is the name of the run, as given in --name. If no name
is specified, {user}-{timestamp} is used.
To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest
in .teuthology.yaml.
This change was motivated by #3782, which requires a test dir that
survives across reboots, but also resolves #3767.
Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
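The path resolution described above might look roughly like this. It is a sketch under assumptions, not the patch itself: the function name, the "base_test_dir" config key, and the timestamp format are hypothetical. Only test_path, the /tmp/cephtest default, and the {user}-{timestamp} fallback come from the commit message.

```python
import getpass
import os
import time

DEFAULT_BASEDIR = "/tmp/cephtest"


def get_testdir(config, name=None, user=None, timestamp=None):
    # test_path pins the directory outright (the old behavior).
    if "test_path" in config:
        return config["test_path"]
    # "base_test_dir" is a hypothetical key standing in for {basedir}.
    basedir = config.get("base_test_dir", DEFAULT_BASEDIR)
    if name is None:
        user = user or getpass.getuser()
        # The timestamp format is an assumption.
        timestamp = timestamp or time.strftime("%Y-%m-%d_%H:%M:%S")
        name = "{user}-{timestamp}".format(user=user, timestamp=timestamp)
    return os.path.join(basedir, name)
```

For example, get_testdir({}, name="nightly") gives /tmp/cephtest/nightly, while get_testdir({"test_path": "/tmp/cephtest"}, name="nightly") gives /tmp/cephtest.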
Sam Lang [Thu, 31 Jan 2013 13:56:56 +0000 (07:56 -0600)]
Scripts to use pyflakes to check python syntax.
pyflakes runs a basic syntax checker against Python code.
The added check-syntax.sh script and Makefile run pyflakes
on the Python code within the teuthology directory, reporting
any syntax errors that are found.
Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
task: mon_clock_skew_check: use absolute value when comparing mon_skew
The monitors may report either positive or negative clock skews, and by
not using an absolute value we were constantly ignoring reported negative
clock skews.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
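A small illustration of the comparison, assuming skews are floats in seconds. The function name is hypothetical; the 0.05 default is the max-skew default documented for this task.

```python
def skew_exceeds(mon_skew, max_skew=0.05):
    # Compare the magnitude, so a monitor that is behind (negative skew)
    # is treated the same as one that is ahead.
    return abs(mon_skew) > max_skew


print(skew_exceeds(-0.2))  # magnitude 0.2 is above the limit
print(-0.2 > 0.05)         # the comparison without abs() misses it
```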
task: mon_clock_skew_check: mark as ran once if an expected skew was found
... even if we didn't get a clean/finished result from the monitors
This ought to significantly cut the waiting time if something else (or
someone else) is leaving the leader hanging, and thus unable to finish a
given timecheck round.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
task: mon_clock_skew_check: increase timeout and kick it off only on stop
We were kicking off the timeout as soon as we started; it is better, however,
to kick it off only when we are told to stop (as long as 'at-least-once'
is true).
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Sandon Van Ness [Wed, 23 Jan 2013 22:08:53 +0000 (14:08 -0800)]
Use ceph git repo instead of github.
Instead of pulling the tarball from github, which can be unreliable at
times, this change uses the ceph repo mirror, which serves the same
function. It now uses git archive and no longer uses wget. Because of
this, less tar-fu is needed to extract the necessary files, as that can
be done directly through git archive.
Signed-off-by: Sandon Van Ness <sandon@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
Joe Buck [Mon, 14 Jan 2013 20:09:56 +0000 (12:09 -0800)]
test: create /tmp/cephtest/mnt.{id}
The workunit task assumes that a mount exists
at /tmp/cephtest/mnt.{id}.
This patch creates the path if it doesn't
exist, enabling workunits to run in the absence
of kclient or ceph-fuse tasks.
Signed-off-by: Joe Buck <jbbuck@gmail.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
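A local sketch of "create the path if it doesn't exist". The real task works against remote test nodes; this version uses the local filesystem purely to show the check, and the function name is hypothetical.

```python
import os


def ensure_mnt_dir(testdir, client_id):
    # Build <testdir>/mnt.<id>, matching the layout named in the commit.
    mnt = os.path.join(testdir, "mnt.{id}".format(id=client_id))
    created = not os.path.isdir(mnt)
    if created:
        # Only create the directory when no earlier task provided it.
        os.makedirs(mnt)
    return mnt, created
```

Returning whether the directory was created lets a caller decide whether it is also responsible for removing it afterwards.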
task: mon_clock_skew_check: add option to run at least one timecheck
  at-least-once          Runs at least once, even if we are told to stop.
                         (default: True)
  at-least-once-timeout  If we were told to stop but we are attempting to
                         run at least once, timeout after this many
                         seconds. (default: 300)
Fixes: #3854 Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Alex Elder [Fri, 18 Jan 2013 18:47:34 +0000 (12:47 -0600)]
rbd.py: update scratch and test image sizes
Test 167 was failing due to running out of space on the scratch
file system. The test reserves 21MB in a file, and repeats 50
times. It required just over 1GB, so I bumped the default size
for the testing device to 1200 MB. I increased the test device
size as well.
This resolves http://tracker.newdream.net/issues/3864.
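The sizing can be checked with quick arithmetic, assuming the 50 reservations of 21MB accumulate rather than being released between repetitions (an assumption; the commit only says the test needed "just over 1GB").

```python
RESERVE_MB = 21      # size reserved per repetition (from the commit message)
REPEATS = 50         # number of repetitions (from the commit message)
NEW_DEVICE_MB = 1200

needed_mb = RESERVE_MB * REPEATS        # 21 * 50 = 1050
print(needed_mb)                        # 1050
print(needed_mb > 1024)                 # True: just over 1GB
print(NEW_DEVICE_MB - needed_mb)        # 150 MB of headroom
```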
Loic Dachary [Wed, 16 Jan 2013 10:48:05 +0000 (11:48 +0100)]
When running teuthology with targets provisioned on OpenStack and kvm, the disks will show up under /dev/vda, /dev/vdb, etc. Add them to the list of devices to inspect and use for tests.
Josh Durgin [Tue, 15 Jan 2013 01:55:43 +0000 (17:55 -0800)]
Add cram task
This runs cram tests, which are an easy way to test that output
stays consistent. We already use cram for basic cli tests with no cluster,
and now we can use it for whole system tests too.
Sage Weil [Mon, 14 Jan 2013 06:52:00 +0000 (22:52 -0800)]
ceph.conf: separate replicas across osds
ceph.git master now separates across crush hosts without this setting.
For teuthology clusters, we don't want that (unless the test specifies
otherwise).
Sam Lang [Thu, 27 Dec 2012 23:33:07 +0000 (17:33 -0600)]
task/pexec: Add barrier capability
This patch adds the ability to barrier between
parallel exec tasks so that all tasks will perform
the following step (after the barrier) at the same
time.
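The barrier semantics can be illustrated locally with threads. This is only an analogy for the behavior described: the pexec task coordinates commands on remote hosts, and the names below are hypothetical.

```python
import threading


def run_with_barrier(n_tasks):
    barrier = threading.Barrier(n_tasks)
    lock = threading.Lock()
    done_before = []     # records tasks that finished the pre-barrier step
    seen_at_after = []   # how many pre-barrier steps each task observed

    def task(i):
        with lock:
            done_before.append(i)       # step before the barrier
        barrier.wait()                  # block until every task has arrived
        with lock:
            seen_at_after.append(len(done_before))  # step after the barrier

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_at_after
```

Every task reaches the step after the barrier only once all tasks have completed the step before it, so each entry in the returned list equals n_tasks.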
Sam Lang [Fri, 14 Dec 2012 17:30:15 +0000 (07:30 -1000)]
task/pexec: More fixes for all case, exec on hosts
We don't want to do an exec per role, but per-host. We
were already doing an exec per host, but the names were confusing.
This fixes the names up and removes the role parameters.
task: mon_clock_skew_check.py: Check for clock skews on the monitors
Will run for as long as teuthology runs. By default, fails if any clock
skews higher than 0.05 seconds are detected, but will only fail when the
teuthology run finishes and after reporting a list of all the detected
skews.
Accepted options:
  interval     number of seconds to wait in-between checks. (default: 30.0)
  max-skew     maximum skew, in seconds, that is considered tolerable
               before issuing a warning. (default: 0.05)
  expect-skew  'true' or 'false', to indicate whether to expect a skew
               during the run or not. If 'true', the test will fail if no
               skew is found, and succeed if a skew is indeed found; if
               'false', it's the other way around. (default: false)
  never-fail   Don't fail the run if a skew is detected and we weren't
               expecting it, or if no skew is detected and we were
               expecting it. (default: False)
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
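How the options above combine into a pass/fail result can be sketched as follows. The function and its signature are hypothetical, not the task's actual code; the defaults mirror the documented ones, and the absolute-value comparison follows the later fix to this task.

```python
def check_outcome(skews, max_skew=0.05, expect_skew=False, never_fail=False):
    """Return True if the run should pass, given all skews seen (seconds)."""
    # A skew counts as detected when its magnitude exceeds max-skew.
    found = any(abs(s) > max_skew for s in skews)
    if never_fail:
        # never-fail overrides both mismatch cases.
        return True
    # Pass only when what we found matches what we expected.
    return found == expect_skew
```

With the defaults, check_outcome([0.01, -0.2]) fails the run because of the -0.2 skew, while the same list passes with expect_skew=True.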
Joe Buck [Thu, 6 Dec 2012 22:19:55 +0000 (14:19 -0800)]
Adding a Hadoop task.
This task configures and starts a Hadoop cluster.
It does not run any jobs; that must be done after
this task runs.
Can run on either Ceph or HDFS.
Joe Buck [Thu, 6 Dec 2012 22:18:41 +0000 (14:18 -0800)]
New ssh task that adds keys for node -> node ssh.
This generates a new keypair, pushes it to all nodes
in the context, and adds all hosts to every other host's
.ssh/authorized_keys file.
Cleans up all keys and authorized_keys entries
afterwards.
Signed-off-by: Joe Buck <jbbuck@gmail.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
Josh Durgin [Tue, 20 Nov 2012 22:01:03 +0000 (14:01 -0800)]
xfstests: run in parallel on multiple machines
xfstests itself still seems to have some global dependencies that
make it hard to run more than one instance per node, so keep
the one client per node restriction.
Name the image after the client using it, and only run the
nested context managers once, so this task can work with
more than one client.
Samuel Just [Fri, 9 Nov 2012 00:22:40 +0000 (16:22 -0800)]
Add divergent_priors test
Tests scenario where merge_old_entry encounters a divergent
entry where the prior_version is prior to log_tail. This
is a problem since it will go into the missing set, but won't
be re-added to the missing set during read_log() if the node
restarts prior to recovering the object.
Sam Lang [Thu, 8 Nov 2012 14:55:36 +0000 (08:55 -0600)]
workunit: Move cleanup to separate run
Removing the scratchdir in the remote run command
at the end of the script invocation will do the remove
once the first script finishes. With a scratch dir that may
be shared across workunit clients, we want to wait to
remove the scratch dir until all the workunit scripts have
completed.
Samuel Just [Wed, 7 Nov 2012 20:36:37 +0000 (12:36 -0800)]
ceph_manager: add test_min_size action
Thrasher can now test min_size, with configurable frequency, by
taking down all but one osd, waiting, killing that osd and bringing
back the others, and verifying that the cluster goes clean.
Alex Elder [Thu, 1 Nov 2012 18:32:56 +0000 (13:32 -0500)]
rbd task: support xfstests repeat count
This adds the ability to use the new repeat count argument to the
run_xfstests.sh script. By default, the test suite will be run
once. If a count is specified, the script will execute the suite
that many times, but will only perform the setup (building the
tests, etc.) once.
Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>