Sage Weil [Fri, 15 Feb 2013 23:39:02 +0000 (15:39 -0800)]
ceph: simpilfy package removal
apt-get doesn't have a nice way to tell if the package is not install and
we don't need to purge it. Well, not one I found in 5 minutes. Just
do a big purge and assume it works, or failed because there was nothing to
be done.
Sander Pool [Wed, 6 Feb 2013 19:16:52 +0000 (19:16 +0000)]
Install ceph debs and use installed debs
The ceph task installs ceph using the debian
packages now, and all invocations of binaries installed
in {tmpdir}/binary/usr/local/bin/ are replace with
the use of the binaries installed in standard locations
by the debs.
Author: Sander Pool <sander.pool@inktank.com> Signed-off-by: Sam Lang <sam.lang@inktank.com>
Sandon Van Ness [Fri, 8 Feb 2013 00:34:14 +0000 (16:34 -0800)]
Merge to include --machine-type and changes to --summary
Added the ability to support multiple types of machines with
--machine-type added to teuthology-lock when used with --lock-many
or --machine-type with teuthology --lock (automated tests). It
defaults to 'plana' and the 'vps' type is currently unused but
should be in the future.
Also updated teutholoy-lock --summary to be machine type aware
and sort things in a nice output.
Signed-off-by: Sandon Van Ness <sandon@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sandon Van Ness [Tue, 5 Feb 2013 20:53:08 +0000 (12:53 -0800)]
Added support for multiple types of machines.
Added the ability to support multiple types of machines with
--machine-type added to teuthology-lock when used with --lock-many
or --machine-type with teuthology --lock (automated tests). It
defaults to 'plana' and the 'vps' type is currently unused but
should be in the future.
Signed-off-by: Sandon Van Ness <sandon@van-ness.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Josh Durgin [Wed, 6 Feb 2013 07:31:37 +0000 (23:31 -0800)]
nuke: don't try unmount if we're rebooting everything anyway
This can cause issues when unmount hangs. Our automatic runs reboot
everything unconditionally, so this caused a bunch of unecessary hangs
when an fs was accidentally rendered un-unmountable.
Sam Lang [Tue, 5 Feb 2013 22:20:52 +0000 (16:20 -0600)]
misc: Close connections on reboot
When nodes are rebooted, the connections remain open
even after calling reconnect and setting up new ssh
sessions to the rebooted nodes. This causes ECONNRESET
errors to show up in the teuthology output.
Close the existing connections before trying to reconnect.
Sam Lang [Tue, 5 Feb 2013 16:38:48 +0000 (10:38 -0600)]
task/ceph_manager: Fix NoneType config issue
kill_mon is getting a config set to None, which blows
up now due to the check for powercycle. Initialize
the config to an empty dict if we don't get anything
on init. This is the error showing up in teuthology:
2013-02-04T15:04:16.595 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1fcafd0>
Traceback (most recent call last):
File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 45, in run_tasks
suppress = manager.__exit__(*exc_info)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/var/lib/teuthworker/teuthology-master/teuthology/task/mon_thrash.py", line 142, in task
thrash_proc.do_join()
File "/var/lib/teuthworker/teuthology-master/teuthology/task/mon_thrash.py", line 69, in do_join
self.thread.get()
File "/var/lib/teuthworker/teuthology-master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 308, in get
raise self._exception
AttributeError: 'NoneType' object has no attribute 'get'
Sage Weil [Sun, 3 Feb 2013 05:01:08 +0000 (21:01 -0800)]
ceph_manager: use get() for self.config powercycle checks
I think this is what is going on...
Traceback (most recent call last):
File "/var/lib/teuthworker/teuthology-master/teuthology/contextutil.py", line 27, in nested
yield vars
File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph.py", line 1158, in task
yield
File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 25, in run_tasks
manager = _run_one_task(taskname, ctx=ctx, config=config)
File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 14, in _run_one_task
return fn(**kwargs)
File "/var/lib/teuthworker/teuthology-master/teuthology/task/dump_stuck.py", line 93, in task
manager.kill_osd(id_)
File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph_manager.py", line 665, in kill_osd
if 'powercycle' in self.config and self.config['powercycle']:
TypeError: argument of type 'NoneType' is not iterable
Sam Lang [Fri, 1 Feb 2013 17:45:04 +0000 (11:45 -0600)]
nuke: Fix cleanup of test dir
Nuke used to remove /tmp/cephtest, now it tries to
remove the test dir, which it may not have the name
for. Instead of removing the test dir, we just
remove the base directory for all test directories,
which may or may not be /tmp/cephtest.
Sam Lang [Fri, 1 Feb 2013 13:46:04 +0000 (07:46 -0600)]
run.py: Fix argument parsing for --name
With the addition of the --name argument to the
teuthology program (run.py), jobs were failing
because --name was being treated as a non-arg
option, even though the name was being supplied
by the workers. Fix that and give it a metavar.
Sam Lang [Wed, 23 Jan 2013 02:27:41 +0000 (20:27 -0600)]
Assign devices to osds using the device wwn
Linux doesn't guarantee device names (/dev/sdb, etc.)
are always mapped to the same disk. Instead of assigning
nominal devices to osds, we map devices by their wwn
(/dev/disk/by-id/wwn-*) to an osd (both data and journal).
Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sam Lang [Wed, 23 Jan 2013 20:37:39 +0000 (14:37 -0600)]
Replace /tmp/cephtest/ with configurable path
Teuthology uses /tmp/cephtest/ as the scratch test directory for
a run. This patch replaces /tmp/cephtest/ everywhere with a
per-run directory: {basedir}/{rundir} where {basedir} is a directory
configured in .teuthology.yaml (/tmp/cephtest if not specified),
and {rundir} is the name of the run, as given in --name. If no name
is specified, {user}-{timestamp} is used.
To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest
in .teuthology.yaml.
This change was modivated by #3782, which requires a test dir that
survives across reboots, but also resolves #3767.
Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sam Lang [Thu, 31 Jan 2013 13:56:56 +0000 (07:56 -0600)]
Scripts to use pyflakes to check python syntax.
pyflakes runs a basic syntax checker against python code.
The added check-syntax.sh script and Makefile run pyflakes
on the python code within the teuthology directory reporting
any syntax errors that are found.
Signed-off-by: Sam Lang <sam.lang@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
task: mon_clock_skew_check: use absolute value when comparing mon_skew
The monitors may report either positive or negative clock skews, and by
not using an absolute value we were constantly ignoring reported negative
clock skews.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
task: mon_clock_skew_check: mark as ran once if an expected skew was found
... even if we didn't get a clean/finished result from the monitors
This ought to significantly cut the waiting time if something else (or
someone else) is leaving the leader hanging thus unable to finish a given
timecheck round.
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
task: mon_clock_skew_check: increase timeout and kick it off only on stop
We were kicking-off the timeout as soon as we started; it's better however
to kick if off only when we are told to stop (as long as 'at-least-once'
is true).
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Sandon Van Ness [Wed, 23 Jan 2013 22:08:53 +0000 (14:08 -0800)]
Use ceph git repo instead of github.
This code change is so that instead of pulling the tarball of github
which can be unreliable at times it instead uses the ceph repo mirror
and serves as the same function. Now it is using git archive and no
longer uses wget. Because of this less tar-fu is needed to extract
the necessary files as it can be done directly through git archive.
Signed-off-by: Sandon Van Ness <sandon@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
Joe Buck [Mon, 14 Jan 2013 20:09:56 +0000 (12:09 -0800)]
test: create /tmp/cephtest/mnt.{id}
The workunit task assumes that a mount exists
at /tmp/cephtest/mnt.{id}
This patch creates the path if it doesn't
exist, enabling workunits to run in the absense
of kclient or ceph-fuse tasks.
Signed-off-by: Joe Buck <jbbuck@gmail.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>