With the changes to ceph-qa-chef and the teuthology kernel task,
we're no longer touching packaged file /etc/grub.d/10_linux, which
was the reason for this apt forcing. Remove it so that we find other
package problems that it might be masking; we can always put it
back, if such problems turn up, until we can fix those as well.
Dan Mick [Tue, 9 Apr 2013 22:53:49 +0000 (15:53 -0700)]
kernel.py: put submenu name in 01_ceph_kernel if necessary
We had been writing 01_ceph_kernel with the kernel title, and
relying on the fact that grub.cfg would never have submenus in it
(implemented by a hack to /etc/grub.d/10_linux which neutered its
submenu creation). However, that hack was modifying a package file,
and got in the way of later apt commands. Rather than doing it
that way, this divines the title of the submenu and sets the
default variable to "submenu>kernel", which works to select the
desired kernel.
It depends on there being only one level of submenu, and on the
format of the menuentry and submenu commands, dictated by grub2.
None of this is likely to work at all outside Ubuntu.
Fixes: #4496
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
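A minimal sketch of the submenu lookup described above, not the actual kernel.py code. The helper name grub_default_for and the regular expressions are assumptions made for illustration; like the commit, it assumes a single level of submenu and the menuentry/submenu line format that grub2 generates.

```python
import re

# Assumed line formats (grub2-generated grub.cfg):
#   submenu 'Title' {        menuentry 'Title' --class ... {
SUBMENU_RE = re.compile(r"""^\s*submenu\s+(['"])(.+?)\1""")
MENUENTRY_RE = re.compile(r"""^\s*menuentry\s+(['"])(.+?)\1""")


def grub_default_for(grub_cfg_text, kernel_title):
    """Hypothetical helper: return the value for grub's default variable.

    Returns "submenu>kernel" when the entry sits inside a submenu, the bare
    title when it is at top level, and None when no entry matches.
    """
    submenu = None  # title of the submenu we are currently inside, if any
    depth = 0       # brace depth relative to the start of that submenu
    for line in grub_cfg_text.splitlines():
        if submenu is None:
            found = SUBMENU_RE.match(line)
            if found:
                submenu = found.group(2)
                depth = line.count("{") - line.count("}")
                continue
        entry = MENUENTRY_RE.match(line)
        if entry and entry.group(2) == kernel_title:
            if submenu is not None:
                return submenu + ">" + kernel_title
            return kernel_title
        if submenu is not None:
            # Track braces so we notice when the submenu block closes.
            depth += line.count("{") - line.count("}")
            if depth <= 0:
                submenu = None
    return None
```

The returned string is what would be written as the default in 01_ceph_kernel; a second level of nesting is deliberately not handled.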
Sam Lang [Thu, 11 Apr 2013 14:23:10 +0000 (09:23 -0500)]
misc: Use job id and make short path for testdir
Nightlies run on teuthology currently use a testdir of
/home/ubuntu/cephtest, but this causes stale job errors occasionally
from the previous tests not getting properly cleaned up, which prevents
the nightlies from running successfully.
The misc.py get_testdir() function can specify a testdir that is
specific to the job, but previously the path was too long and would
cause separate job failures.
This patch does two things to resolve that. First, it uses the job id
from the teuthology run if one exists. This should be a relatively
short number that will identify the job run effectively. Second,
if the job id isn't available, it creates a shortened form of the
job's name, for example the job name:
This is a fix for issue #4677 which was caused by kdb output being
hard-coded to ttyS1 which is fine for all our hardware except mira
machines. This change just checks to see if mira is in the host's
name and uses ttyS2 instead (simple fix).
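The check amounts to something like the following sketch. The function name kdb_serial_port is hypothetical and only illustrates the hostname test described above; it is not the code from the patch.

```python
def kdb_serial_port(hostname):
    """Hypothetical helper: pick the serial device for kdb output.

    mira machines use ttyS2; all other hardware keeps the ttyS1 default.
    """
    return "ttyS2" if "mira" in hostname else "ttyS1"
```

A plain substring match is the simplest thing that works, but it would also match any non-mira host that happens to have "mira" in its name.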
Joe Buck [Wed, 20 Mar 2013 04:26:16 +0000 (21:26 -0700)]
teuthology: remove previous test ssh keys
Updated the ssh-keys task to clean up
any left-over keys from previous tasks
(indicated by the user being 'ssh-keys-user').
Also, some of the functions in the ssh_keys task seem
like they could be useful in general.
This patch refactors them into misc.py.
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
Downburst create is used to reinstall a VM when it is locked.
Downburst destroy is used to remove a VM when it is unlocked.
Host keys are regenerated on each VM instantiation, so the keys
need to be checked prior to use.
If needed, ceph-qa-chef is run on newly installed systems to ensure that
they are fully functional.
Josh Durgin [Fri, 29 Mar 2013 23:33:49 +0000 (16:33 -0700)]
locker: try to make up for apache timeouts
If the lock request succeeds in updating the db, but the client gets a
timeout from apache, they can now try again and get back the machines
they just locked.
Only automatic runs have a description set when locking several
machines, so this does not affect users of teuthology-lock
--lock-many, where no description can be set in the same request.
Sam Lang [Wed, 27 Mar 2013 13:48:45 +0000 (08:48 -0500)]
task/mds_thrash: Log mds dump after long delay
In cases where the mds thrasher continuously loops
waiting for an mds to be removed from the map, or
for a new mds to become active, we want to start logging
the mds state for debugging.
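The idea can be sketched as a polling loop that stays quiet at first and starts dumping state once the wait has gone on too long. Everything here (wait_until, dump_state, log_after, interval) is a hypothetical name chosen for illustration, not the mds_thrash implementation.

```python
import itertools
import time


def wait_until(condition, dump_state, log_after=5, interval=2, sleep=time.sleep):
    """Poll condition() until it returns True; return the number of polls.

    Once more than log_after polls have been made without success,
    dump_state() is called after every further failed poll, so a stuck
    wait leaves a trail in the log.
    """
    for attempt in itertools.count(1):
        if condition():
            return attempt
        if attempt > log_after:
            dump_state()
        sleep(interval)
```

In the thrasher, condition would check the mds map and dump_state would log the mds dump; both are passed in here so the loop itself stays generic.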
Sage Weil [Sun, 24 Mar 2013 03:58:46 +0000 (20:58 -0700)]
verify /var/lib/ceph not present on start
Verify there is no /var/lib/ceph, just like we do with the cephtest
directory. We will need to change this (or make it optional) when we
allow runs against an existing cluster, but a whole bunch of other
things will need to change then as well.
Warren Usui [Sat, 16 Mar 2013 01:18:56 +0000 (18:18 -0700)]
Fixed ceph-fuse mount point cleanup bug
Test for the existence of /sys/fs/fuse/connections/*/abort
before clobbering it. The problem showed up when all
the machines were virtual CentOS machines.
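A local Python sketch of the same guard, assuming nothing about how the task runs it on remote hosts. The function name abort_fuse_connections is hypothetical; the pattern is a parameter so the behaviour can be exercised outside /sys.

```python
import glob


def abort_fuse_connections(pattern="/sys/fs/fuse/connections/*/abort"):
    """Write to each existing abort file; return the paths written.

    glob only yields paths that exist, so when no fuse connection is
    present nothing is written and an empty list comes back.
    """
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as abort_file:
            abort_file.write("1")
        written.append(path)
    return written
```

Writing to a real abort file requires root and aborts that fuse connection, so the default pattern should only be used during cleanup.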
Sam Lang [Mon, 11 Mar 2013 18:22:10 +0000 (13:22 -0500)]
task/restart: Restart task for testing daemon kill
The ceph daemons support being killed at a specific code point
with a config option. In some cases, we want to test a kill point
only once for a given daemon run (such as replay that only occurs
during daemon startup). This task allows running a script or executable
and (when the script sends a command to the task) restarting it with
a temporary config that has the appropriate kill point set. Once
the daemon asserts and gets restarted, the original config is used.
Adds a specific restart_with_args() method to the DaemonState in the
ceph task.
Right now this task follows the workunit task closely, but uses stdout/stdin
to specify when to restart a daemon.
Warren Usui [Thu, 14 Mar 2013 21:24:50 +0000 (14:24 -0700)]
Added el6 install functionality for CentOS systems.
install_packages, remove_packages and remove_sources are now the
installation and removal functions used by teuthology. Debian
references have been removed outside of task/install.py. CentOS
functionality parallel to Debian's has been added to task/install.py,
and el6 references have been added to nuke.py, task/ceph-fuse.py and
task/install.py.
Some files created by CentOS are removed with rm -fr. This should
be changed once the installation/removal rpm procedure is implemented.
Signed-off-by: Warren Usui <warren.usui@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sam Lang [Wed, 13 Mar 2013 15:11:06 +0000 (10:11 -0500)]
task/thrashosds: Ipmi checking/setup in thrashosds
We don't need to set up the ipmi console on runs that
don't use powercycling, so delay setup of the RemoteConsole
with ipmi to the thrashosds task, and do it only if the powercycle
config is set. This avoids spurious test failures from flaky
ipmi.
Warren Usui [Wed, 27 Feb 2013 19:32:37 +0000 (11:32 -0800)]
Implement email task.
Email.py was added so that the emailto attribute could be passed,
and to prevent 'module object has no attribute: email' errors from
happening. Run.py actually performs the email operation and calls
suite.email_results to send the mail. The
information passed right now is the summary and config information.