Zack Cerza [Tue, 28 Jan 2014 16:05:03 +0000 (10:05 -0600)]
Attempt to fix #7241
This involves moving everything in build_ceph_cluster() inside the try:
block, so that if an exception is raised, the cleanup in the finally:
block will actually be executed.
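Here is a minimal sketch of why the placement matters. The names `build_cluster_sketch`, `steps`, and `cleanup` are hypothetical and are not the actual task code. The point is that only work started inside the try: block is guaranteed to reach the finally: clause, so every setup step belongs there.

```python
def build_cluster_sketch(steps, cleanup):
    """Run setup steps and always run cleanup, even if a step raises.

    Hypothetical stand-in for build_ceph_cluster(): 'steps' is a list
    of callables and 'cleanup' undoes whatever they did.
    """
    try:
        # Every step lives inside the try: block, so an exception in
        # any of them still reaches the finally: clause below.
        for step in steps:
            step()
    finally:
        cleanup()
```

If some of the steps were run before the try: instead, an exception there would propagate without cleanup ever running.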
Dan Mick [Thu, 16 Jan 2014 20:51:39 +0000 (12:51 -0800)]
lock.py: request only rsa keys from ssh-keyscan
Newer versions of ssh-keyscan return two key types when possible; this
breaks the comparison of "number of lines of output from keyscan"
to "number of hosts we request keys from". Fix by asking for only
one type of key (as older versions of ssh-keyscan did).
Fixes: #7164
Signed-off-by: Dan Mick <dan.mick@inktank.com>
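The failure mode can be illustrated with a small sketch. `-t rsa` is a real ssh-keyscan option, but the helper names `keyscan_command` and `parse_keyscan` and the parsing details are hypothetical and are not lock.py's actual code. The sketch shows why one key type per host keeps the line count equal to the host count.

```python
def keyscan_command(hosts):
    """Build an ssh-keyscan argv that asks for RSA host keys only."""
    # '-t rsa' limits output to one key type, so each reachable host
    # yields one line, as older ssh-keyscan versions did by default.
    return ["ssh-keyscan", "-t", "rsa"] + list(hosts)


def parse_keyscan(output, hosts):
    """Return {host: 'keytype key'}; fail if line count != host count."""
    # Skip blank lines and any '#' comment lines in the output.
    lines = [line for line in output.splitlines()
             if line.strip() and not line.startswith("#")]
    if len(lines) != len(hosts):
        raise RuntimeError(
            "expected %d key lines, got %d" % (len(hosts), len(lines)))
    keys = {}
    for line in lines:
        host, keytype, key = line.split(None, 2)
        keys[host] = "%s %s" % (keytype, key)
    return keys
```

With two key types per host, a single host produces two lines and the count check fails even though nothing is wrong with the host.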
Sage Weil [Fri, 10 Jan 2014 19:02:06 +0000 (11:02 -0800)]
valgrind: ignore tcmalloc uninitialized memory
This is the main source of noise when running valgrind +
tcmalloc. Apparently there are other issues, so I think we
still need the notcmalloc gitbuilder, but this gets us part of
the way.
Ilya Dryomov [Fri, 10 Jan 2014 10:26:09 +0000 (12:26 +0200)]
kernel: use utsrelease string for need_to_install() purposes
Currently, to see if a node has rebooted into the right kernel,
need_to_install() compares a given 40-char commit hash with a 7-char
commit hash abbreviation it pulls from the output of 'uname -r'.
gitbuilders can now export the UTS_RELEASE kernel version string through
a .../$SHA1/version file. Use this string instead of the 40-char commit
hash and compare it with the output of 'uname -r' directly. This saves
us the parsing exercise and, more importantly, makes it possible to
install clean tagged kernels using the 'tag:' element, which wasn't
possible before because the version string of such kernels doesn't have
a commit hash in it.
If the version file is unavailable, fall back to the existing way of doing
things.
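Here is a rough sketch of the comparison logic, with the fallback. `need_to_install_sketch` is a hypothetical name, not the real need_to_install() signature. The fallback also assumes that 'uname -r' ends in a git-describe style `-g<abbrev>` suffix, for example `3.13.0-rc7-ceph-00123-g1234567`. That format is an illustration only.

```python
def need_to_install_sketch(uname_r, sha1, version_file=None):
    """Return True if the running kernel differs from the wanted one.

    uname_r: output of 'uname -r' on the node.
    sha1: 40-char commit hash of the wanted kernel.
    version_file: contents of the gitbuilder version file, or None
    if it could not be fetched (hypothetical interface).
    """
    if version_file is not None:
        # Preferred path: compare the UTS_RELEASE string directly.
        return uname_r.strip() != version_file.strip()
    # Fallback (assumed format): take the 7-char abbreviation after
    # the last '-g' in uname -r and compare it with sha1[:7].
    if "-g" not in uname_r:
        return True
    running = uname_r.strip().rsplit("-g", 1)[1][:7]
    return running != sha1[:7]
```

A tagged kernel such as `3.13.0` has no `-g` suffix, so only the direct UTS_RELEASE comparison can recognize it as already installed.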
Warren Usui [Tue, 7 Jan 2014 22:22:57 +0000 (14:22 -0800)]
Fix a bug where ctx.config['targets'] was looped through a second time
in connect(). The bug caused VM-specific behavior to be applied to a
target if any of the machines in the cluster was a VM. The code was
also changed to set the key to None only if RSA or DSA keys were used
on a VM.
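Here is a minimal sketch of the per-target decision. It assumes that targets map 'user@host' to a host key string such as "ssh-rsa AAAA...". The function name and the `vm_hosts` set are hypothetical inputs and are not teuthology's connect() code. The fix is to decide per target, not "if any target is a VM".

```python
def host_keys_for_connect(targets, vm_hosts):
    """Return the host key to verify for each target (None = skip).

    targets: dict of 'user@host' -> host key string (assumed format).
    vm_hosts: set of hostnames known to be VMs (hypothetical input).
    """
    keys = {}
    for target, key in targets.items():
        host = target.split("@")[-1]
        keytype = key.split()[0] if key else None
        # Only this target's own VM status matters; an RSA or DSA key
        # on a VM is dropped, and every other key is kept as-is.
        if host in vm_hosts and keytype in ("ssh-rsa", "ssh-dss"):
            keys[target] = None
        else:
            keys[target] = key
    return keys
```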
Ilya Dryomov [Mon, 23 Dec 2013 17:54:11 +0000 (19:54 +0200)]
rbd: bump the default scratch size for xfstests to 10G
autobuild-ceph.git commit 53db7a34aba5 had silently changed the default
elevator from cfq to deadline, which made xfstests 167 very unhappy.
It looks like with the deadline and noop elevators it requires a ~6G
scratch partition. Bump the default scratch image size to 10G.
Sandon Van Ness [Wed, 18 Dec 2013 20:38:50 +0000 (12:38 -0800)]
Use saucy gitbuilder for arm package checking.
I somehow missed that it checks both the sha1 and the package version
file, and the package version check still pointed at the quantal
gitbuilder, which won't work because that hardware is down.
This was causing scheduling failures.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Zack Cerza [Thu, 12 Dec 2013 23:33:53 +0000 (17:33 -0600)]
Make sure to report all results.
If a just-finished job was using a teuthology branch not known to
contain the reporting feature, then report the job via the
teuthology-report script. Note that in some cases this will result in
double reporting, but the extra load should be negligible.
John Spray [Thu, 12 Dec 2013 21:33:19 +0000 (13:33 -0800)]
Fix FSID not being set in ceph.conf
The symptom was that 'ceph --admin-daemon... config get fsid'
returned zeros, while the correct fsid was present in the cluster maps.
Fix it by extracting the FSID from the monmap and writing it to
ceph.conf.
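Here is a minimal sketch of the fix using Python's standard configparser. It assumes that the monmap has been dumped as text containing a line of the form `fsid <uuid>`. The helper names are hypothetical and are not the task's actual code.

```python
import configparser
import io


def fsid_from_monmap_dump(dump_text):
    """Return the fsid from a text monmap dump (assumed 'fsid <uuid>' line)."""
    for line in dump_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "fsid":
            return parts[1]
    raise ValueError("no fsid line found in monmap dump")


def set_fsid(conf_text, fsid):
    """Return ceph.conf text with 'fsid' set in the [global] section."""
    conf = configparser.ConfigParser()
    conf.read_string(conf_text)
    if not conf.has_section("global"):
        conf.add_section("global")
    conf.set("global", "fsid", fsid)
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()
```

Daemons started afterwards would then read the same fsid that the cluster maps already contain.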
Sandon Van Ness [Thu, 12 Dec 2013 02:07:43 +0000 (18:07 -0800)]
Longer timeout after sync/reboot.
With only a 5-second sleep via ssh and Python, it looks like a race
condition was sometimes being hit in which the machine was considered
back up before the reboot command had completed.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Sage Weil [Mon, 9 Dec 2013 19:42:12 +0000 (11:42 -0800)]
nuke: ignore exceptions while issuing reboot command
I'm seeing failed tasks (and nuke) leak machines. It looks like we are
getting an exception on the '... reboot -f -n' command when we should be
ignoring it and waiting for the machine to restart.
For example:
http://qa-proxy.ceph.com/teuthology/sage-2013-12-08_19:25:06-rados:thrash-wip-tier-foo-basic-plana/136321/teuthology.log
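Here is a minimal sketch of the reboot pattern. `reboot_and_wait`, `run_reboot`, and `is_up` are hypothetical names and are not nuke's actual code. An error from the reboot command is treated as expected, because the connection usually drops mid-command. The code then sleeps before each liveness check, so that the node is not considered up before it has actually gone down.

```python
import time


def reboot_and_wait(run_reboot, is_up, attempts=30, interval=10,
                    sleep=time.sleep):
    """Issue a reboot, ignore its errors, then poll until the node is up.

    run_reboot: callable that sends e.g. 'sync && sudo reboot -f -n'.
    is_up: callable returning True once the node answers again.
    Returns True if the node came back within 'attempts' checks.
    """
    try:
        run_reboot()
    except Exception:
        # The ssh session is usually killed by the reboot itself, so
        # an exception here is expected rather than a real failure.
        pass
    for _ in range(attempts):
        # Sleep first so a node that has not gone down yet is not
        # mistaken for one that has already come back.
        sleep(interval)
        if is_up():
            return True
    return False
```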
Warren Usui [Thu, 5 Dec 2013 01:49:21 +0000 (17:49 -0800)]
create_if_vm was called more than once when a lock-many style lock
was performed. This caused downburst to run twice, and the second
downburst run failed because the first one had already run.
Warren Usui [Mon, 2 Dec 2013 22:37:12 +0000 (14:37 -0800)]
Implement --downburst-conf parameter for teuthology-lock.
Load the appropriate yaml information when found (this formerly
did not work). Make sure teuthology --lock works with a downburst
entry in the yaml files. Document how this works in README.rst.
Warren Usui [Wed, 4 Dec 2013 02:16:04 +0000 (18:16 -0800)]
Added docstrings and cleaned up code: broke up long lines, removed unused
variable references, pep8-formatted most of the code (one set of long lines
remains), and changed some variable and method names to conform to pylint
standards.