There's no need for an explicit cleanup function, so move it back
to where it came from (except in s3roundtrip, which did not have it).
Instead, since these use a nested contextmanager, pass through
and yield to the top-level run_tasks after the nested
contextmanager has finished (and thus run all the cleanup steps
in the subtasks for this test).
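The ordering described above can be sketched roughly as follows. This is only an illustration: subtask, outer_task, run and the events list are hypothetical names, not taken from teuthology. The outer generator yields to its caller only after the nested context managers have exited, so every subtask cleanup has already run by then.

```python
import contextlib

events = []  # hypothetical trace of what ran, in order


@contextlib.contextmanager
def subtask(name):
    # Stand-in for a subtask that sets something up and cleans it up.
    events.append(f"setup {name}")
    try:
        yield
    finally:
        events.append(f"cleanup {name}")


@contextlib.contextmanager
def outer_task():
    # Run the nested context managers to completion first ...
    with contextlib.ExitStack() as stack:
        for name in ("a", "b"):
            stack.enter_context(subtask(name))
        events.append("test body")
    # ... and only then yield to the caller; no extra cleanup is needed.
    events.append("yield to caller")
    yield


def run():
    events.clear()
    with outer_task():
        events.append("caller body")
    return list(events)
```

Calling run() shows the subtask cleanups happening before the "yield to caller" entry.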
Sage Weil [Sun, 28 Apr 2013 16:35:45 +0000 (09:35 -0700)]
install: prefer 'branch' over 'sha1'
The upgrade tasks specify 'branch' in the job file, but the
schedule_suite.sh script sets a sha1 in the overrides. Make
the upgrade tests actually test an upgrade by preferring branch
over sha1 when both are specified.
This is fragile, but ought to do the trick for now!
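A minimal sketch of the preference order, assuming the job file and the overrides have already been merged into one dict. The function name choose_ref and the example values are hypothetical and are not the install task's actual code.

```python
def choose_ref(config):
    """Return (kind, value) for the ref to install, preferring 'branch'.

    Hypothetical helper: 'branch' wins over 'sha1' when both are set.
    """
    for key in ("branch", "sha1"):
        value = config.get(key)
        if value:
            return key, value
    return None, None


# The job file asks for a branch; the overrides add a sha1.
job = {"branch": "bobtail"}
overrides = {"sha1": "0123abcd"}
merged = dict(job, **overrides)
```

With merged as above, choose_ref(merged) picks the branch, so the job installs what it asked for rather than the sha1 from the overrides.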
Sam Lang [Wed, 17 Apr 2013 00:08:45 +0000 (19:08 -0500)]
misc: Check for 'None' string from yaml
The description attribute from the machines yaml returned by the
locker might be the string 'None'. Need to explicitly check for
that to avoid using a test dir of /tmp/cephtest/None.
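The check might look something like this. It is a sketch only; usable_description is a hypothetical helper name and not the function in misc.py.

```python
def usable_description(description):
    """Return the description, or None if it is missing or the string 'None'.

    Hypothetical helper: treats the literal string 'None' the same as a
    missing value.
    """
    if description is None:
        return None
    text = str(description).strip()
    if not text or text == "None":
        return None
    return text
```

A caller can then fall back to its default test dir whenever this returns None, instead of building a path that ends in /None.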
Sam Lang [Fri, 12 Apr 2013 17:55:54 +0000 (12:55 -0500)]
lock: Fix import cycle breakage
fa2049f caused an import cycle between lock.py and misc.py. Move the
needed functions from lock.py to lockstatus.py so that we can avoid the
import cycle.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Conflicts:
teuthology/lock.py
Sam Lang [Thu, 11 Apr 2013 14:23:10 +0000 (09:23 -0500)]
misc: Use job id and make short path for testdir
Nightlies run on teuthology currently use a testdir of
/home/ubuntu/cephtest, but this causes stale job errors occasionally
from the previous tests not getting properly cleaned up, which prevents
the nightlies from running successfully.
The misc.py get_testdir() function can specify a testdir that is
specific to the job, but previously the path was too long and would
cause separate job failures.
This patch does two things to resolve that. First, it uses the job id
from the teuthology run if one exists. This should be a relatively
short number that will identify the job run effectively. Second,
if the job id isn't available, it creates a shortened form of the
job's name, for example the job name:
Sage Weil [Wed, 17 Apr 2013 03:50:50 +0000 (20:50 -0700)]
ceph-deploy: purge before archiving
Purge will uninstall and (in so doing) stop the daemons. This avoids trying
to tar up the mon data or logs while they are being written to, which
avoids errors like
2013-04-16T20:21:47.103 INFO:teuthology.task.ceph-deploy:Archiving mon data...
2013-04-16T20:21:47.545 INFO:teuthology.orchestra.run.err:tar: ./ceph-mira089/store.db/000009.log: file changed as we read it
Also drop the unnecessary uninstall (it is implied by purge).
Sigh. As it turns out, /etc/default/grub being hacked also
causes the same problem. I think there's a way to fix that cleanly
as well, but until then, replacing the "accept installed version"
hack here so jobs can run.
With the changes to ceph-qa-chef and the teuthology kernel task,
we're no longer touching packaged file /etc/grub.d/10_linux, which
was the reason for this apt forcing. Remove so that we find other
package problems that might be masked by this; we can always
put it back if there are such problems until we can fix those as well.
Dan Mick [Tue, 9 Apr 2013 22:53:49 +0000 (15:53 -0700)]
kernel.py: put submenu name in 01_ceph_kernel if necessary
We had been writing 01_ceph_kernel with the kernel title, and
relying on the fact that grub.cfg would never have submenus in it
(implemented by a hack to /etc/grub.d/10_linux which neutered its
submenu creation). However, that hack was modifying a package file,
and got in the way of later apt commands. Rather than doing it
that way, this divines the title of the submenu and sets the
default variable to "submenu>kernel", which works to select the
desired kernel.
It depends on there being only one level of submenu, and on the
format of the menuentry and submenu commands, dictated by grub2.
None of this is likely to work at all outside Ubuntu.
Fixes: #4496
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 52aec32a7da07ca6e9a22ecedde78dafb4b74dfc)
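As an illustration of the idea (not the code in kernel.py), the sketch below scans grub.cfg text for a single level of submenu and builds the "submenu>kernel" value. The function name grub_default, the regular expression and the sample titles are assumptions made for this example.

```python
import re

# Matches the quoted title of a submenu or menuentry command.
TITLE = re.compile(r"""^\s*(submenu|menuentry)\s+(['"])(.*?)\2""")


def grub_default(cfg_text, kernel_title):
    """Return the default entry string for kernel_title, or None.

    Assumes at most one level of submenu, as the commit message notes.
    """
    submenu = None        # title of the submenu we are inside, if any
    submenu_depth = 0     # brace depth at which that submenu was opened
    depth = 0
    for line in cfg_text.splitlines():
        match = TITLE.match(line)
        if match:
            kind, title = match.group(1), match.group(3)
            if kind == "submenu":
                submenu, submenu_depth = title, depth
            elif title == kernel_title:
                return f"{submenu}>{title}" if submenu else title
        depth += line.count("{") - line.count("}")
        if submenu is not None and depth <= submenu_depth:
            submenu = None  # the closing brace of the submenu was seen
    return None
```

A kernel listed at the top level comes back unchanged, while one nested in a submenu comes back prefixed with the submenu title and ">".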
Josh Durgin [Fri, 29 Mar 2013 23:33:49 +0000 (16:33 -0700)]
locker: try to make up for apache timeouts
If the lock request succeeds in updating the db, but the client gets a
timeout from apache, they can now try again and get back the machines
they just locked.
Only automatic runs have a description set when locking several
machines, so this does not affect users of teuthology-lock
--lock-many, where no description can be set in the same request.
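The retry behaviour can be pictured with an in-memory stand-in for the lock database. Everything here (lock_many, the dict layout, the rule that a retry must match both user and description and the same count) is an assumption for illustration, not the lock server's real schema or logic.

```python
def lock_many(machines, user, count, description=None):
    """Lock `count` free machines, or hand back an earlier matching lock.

    `machines` maps a name to {"locked_by": ..., "description": ...}.
    """
    if description is not None:
        # A retry after a timeout: the same user and description already
        # hold exactly the requested number of machines.
        mine = sorted(
            name for name, info in machines.items()
            if info["locked_by"] == user and info["description"] == description
        )
        if len(mine) == count:
            return mine
    free = sorted(
        name for name, info in machines.items() if info["locked_by"] is None
    )
    if len(free) < count:
        return []
    chosen = free[:count]
    for name in chosen:
        machines[name]["locked_by"] = user
        machines[name]["description"] = description
    return chosen
```

Because the match depends on the description, a request without one never gets machines "back" and simply locks new ones, which mirrors the note about --lock-many above.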
Sam Lang [Wed, 27 Mar 2013 13:48:45 +0000 (08:48 -0500)]
task/mds_thrash: Log mds dump after long delay
In cases where the mds thrasher continuously loops
waiting for an mds to be removed from the map, or
for a new mds to become active, we want to start logging
the mds state for debugging.
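One way to picture this is a polling loop that stays quiet at first and starts logging a state dump once it has waited too long. The sketch below uses hypothetical names (wait_until, max_quiet_polls, dump_state) and an assumed poll interval; it is not the thrasher's actual loop.

```python
import time


def wait_until(condition, dump_state, log, max_quiet_polls=3,
               interval=2, sleep=time.sleep):
    """Poll until condition() is true; log a dump on every late poll.

    Returns the number of polls that found the condition still false.
    """
    polls = 0
    while not condition():
        polls += 1
        if polls > max_quiet_polls:
            # We have waited long enough that the state is worth recording.
            log(dump_state())
        sleep(interval)
    return polls
```

Passing sleep as a parameter keeps the sketch easy to exercise without real delays.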
Sage Weil [Sun, 24 Mar 2013 03:58:46 +0000 (20:58 -0700)]
verify /var/lib/ceph not present on start
Verify there is no /var/lib/ceph, just like we do with the cephtest
directory. We will need to change this (or make it optional) when we
allow runs against an existing cluster, but then a whole bunch of other
things will need to change as well.
Warren Usui [Sat, 16 Mar 2013 01:18:56 +0000 (18:18 -0700)]
Fixed ceph-fuse mount point cleanup bug
Test for the existence of /sys/fs/fuse/connections/*/abort
before clobbering it. The problem showed up when all
the machines were virtual CentOS machines.
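A rough Python equivalent of that guard, for illustration only: the function name abort_fuse_connections and the root parameter are hypothetical, and the real task may do this through a shell command instead. Only abort files that actually exist are written to.

```python
import glob
import os


def abort_fuse_connections(root="/sys/fs/fuse/connections"):
    """Write '1' to each existing <root>/*/abort file; return their paths."""
    aborted = []
    # glob only returns paths that exist, so a host with no fuse
    # connections (or no such directory) is simply a no-op.
    for path in sorted(glob.glob(os.path.join(root, "*", "abort"))):
        with open(path, "w") as handle:
            handle.write("1")
        aborted.append(path)
    return aborted
```

On a machine without any matching files the function returns an empty list rather than failing.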
Sam Lang [Mon, 11 Mar 2013 18:22:10 +0000 (13:22 -0500)]
task/restart: Restart task for testing daemon kill
The ceph daemons support being killed at a specific code point
with a config option. In some cases, we want to test a kill point
only once for a given daemon run (such as replay that only occurs
during daemon startup). This task allows running a script or executable
and (when the script sends a command to the task) restarting it with
a temporary config that has the appropriate kill point set. Once
the daemon asserts and gets restarted, the original config is used.
Adds a specific restart_with_args() method to the DaemonState in the
ceph task.
Right now this task follows the workunit task closely, but uses stdout/stdin
to specify when to restart a daemon.
Warren Usui [Thu, 14 Mar 2013 21:24:50 +0000 (14:24 -0700)]
Added el6 install functionality for CentOS systems.
install_packages, remove_packages and remove_sources are now the
installation and removal functions used by teuthology. Debian
references have been removed outside of tasks/install.py. CentOS
functionality parallel to Debian has been added to tasks/install.py,
and el6 references have been added to nuke.py, task/ceph-fuse.py and
task/install.py.
Some files created by CentOS are removed with rm -fr. This should
be changed once the installation/removal rpm procedure is implemented.
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sam Lang [Wed, 13 Mar 2013 15:11:06 +0000 (10:11 -0500)]
task/thrashosds: Ipmi checking/setup in thrashosds
We don't need to set up the ipmi console on runs that
don't use powercycling, so delay setup of the RemoteConsole
with ipmi to the thrashosds task, and only do it if the powercycle
config is set. This avoids spurious test failures from flaky
ipmi.