git.apps.os.sepia.ceph.com Git - ceph.git/log
12 years ago testrados -> ceph_test_rados
Sage Weil [Mon, 18 Feb 2013 21:39:20 +0000 (13:39 -0800)]
testrados -> ceph_test_rados

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago cleanup-run.sh: <owner> <run name>
Sage Weil [Mon, 18 Feb 2013 21:14:59 +0000 (13:14 -0800)]
cleanup-run.sh: <owner> <run name>

Sloppy... this assumes the run is in the description as the archive dir.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago lock: allow filtering by description, description substring
Sage Weil [Mon, 18 Feb 2013 21:13:11 +0000 (13:13 -0800)]
lock: allow filtering by description, description substring

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago rgw: sudo
Sage Weil [Mon, 18 Feb 2013 20:14:14 +0000 (12:14 -0800)]
rgw: sudo

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago fix a few archive/log stragglers
Sage Weil [Mon, 18 Feb 2013 20:14:12 +0000 (12:14 -0800)]
fix a few archive/log stragglers

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: make gitbuilder host configurable
Sage Weil [Mon, 18 Feb 2013 19:59:26 +0000 (11:59 -0800)]
ceph: make gitbuilder host configurable

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: install -dbg packages, too
Sage Weil [Mon, 18 Feb 2013 17:45:05 +0000 (09:45 -0800)]
ceph: install -dbg packages, too

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: create /var/run/ceph
Sage Weil [Mon, 18 Feb 2013 17:41:00 +0000 (09:41 -0800)]
ceph: create /var/run/ceph

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph-fuse: sudo
Sage Weil [Sun, 17 Feb 2013 17:59:04 +0000 (09:59 -0800)]
ceph-fuse: sudo

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago sudo for admin socket commands
Sage Weil [Sun, 17 Feb 2013 17:23:23 +0000 (09:23 -0800)]
sudo for admin socket commands

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago cfuse -> ceph-fuse
Sage Weil [Sun, 17 Feb 2013 07:53:23 +0000 (23:53 -0800)]
cfuse -> ceph-fuse

12 years ago ceph: store logs in normal location
Sage Weil [Sun, 17 Feb 2013 07:44:03 +0000 (23:44 -0800)]
ceph: store logs in normal location

We need to switch around how these are compressed and pulled, since they
aren't in the regular archive dir anymore.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: use default data, keyring locations
Sage Weil [Sun, 17 Feb 2013 06:32:16 +0000 (22:32 -0800)]
ceph: use default data, keyring locations

This required reordering the cluster setup so that we do the ceph-osd
--mkfs --mkkey prior to gathering keys and initializing the monitors.

Also, run daemons as root.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: don't uninstall librados, librbd
Sage Weil [Sun, 17 Feb 2013 05:53:46 +0000 (21:53 -0800)]
ceph: don't uninstall librados, librbd

This forces uninstall of kvm too, which is expensive.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: pass package version to apt-get install
Sage Weil [Sun, 17 Feb 2013 05:31:50 +0000 (21:31 -0800)]
ceph: pass package version to apt-get install

This avoids problems when a different or newer version of the package is
already installed.

Signed-off-by: Sage Weil <sage@inktank.com>
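
A minimal sketch of the idea in this commit, assuming the task builds its apt-get command line as a list of arguments. The helper name `apt_install_args`, the package list, and the version string are hypothetical; only the `name=version` form accepted by `apt-get install` is standard.

```python
def apt_install_args(packages, version):
    """Return an apt-get command that pins each package to one version."""
    # "name=version" is apt-get's syntax for requesting an exact version.
    pinned = ['{0}={1}'.format(pkg, version) for pkg in packages]
    return ['sudo', 'apt-get', '-y', 'install'] + pinned


# The package list and version string are made up for illustration.
print(' '.join(apt_install_args(['ceph', 'ceph-dbg'], '0.56.3-1precise')))
```

Pinning every package to the same version string keeps apt from silently keeping a newer build that is already on the node.
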
12 years ago avoid secretfile, except for kclient
Sage Weil [Sun, 17 Feb 2013 05:30:57 +0000 (21:30 -0800)]
avoid secretfile, except for kclient

Only mount.ceph needs the secret in a standalone file.  Remove other users,
and simplify that one.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago rgw: specify keyring location
Sage Weil [Sun, 17 Feb 2013 03:37:17 +0000 (19:37 -0800)]
rgw: specify keyring location

Otherwise we look at the default /var/lib/ceph/radosgw/ceph-$id/keyring.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago nuke: tolerate failed dpkg --configure -a/apt-get -f install
Sage Weil [Sun, 17 Feb 2013 03:36:45 +0000 (19:36 -0800)]
nuke: tolerate failed dpkg --configure -a/apt-get -f install

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago install radosgw
Sage Weil [Sun, 17 Feb 2013 00:20:07 +0000 (16:20 -0800)]
install radosgw

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago rbd: remove merge cruft
Sage Weil [Sun, 17 Feb 2013 00:18:44 +0000 (16:18 -0800)]
rbd: remove merge cruft

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: simplify apt-key management
Sage Weil [Sat, 16 Feb 2013 01:41:46 +0000 (17:41 -0800)]
ceph: simplify apt-key management

Run apt-key as root. No need to initialize ubuntu user's gpg.  Fix
whitespace.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: put client keyrings in /etc/ceph/ceph.$name.keyring
Sage Weil [Sat, 16 Feb 2013 01:19:32 +0000 (17:19 -0800)]
ceph: put client keyrings in /etc/ceph/ceph.$name.keyring

And make it world readable, for ubuntu's sake.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago nuke: dpkg --configure -a and apt-get -f install
Sage Weil [Sat, 16 Feb 2013 00:58:40 +0000 (16:58 -0800)]
nuke: dpkg --configure -a and apt-get -f install

Installing debs means we are more likely to hit a case where we interrupt
apt/dpkg.  Try to mop up as best we can in nuke.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago nuke: whitespace
Sage Weil [Sat, 16 Feb 2013 00:48:41 +0000 (16:48 -0800)]
nuke: whitespace

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: simplify package removal
Sage Weil [Fri, 15 Feb 2013 23:39:02 +0000 (15:39 -0800)]
ceph: simplify package removal

apt-get doesn't have a nice way to tell if the package is not installed and
we don't need to purge it.  Well, not one I found in 5 minutes.  Just
do a big purge and assume it works, or failed because there was nothing to
be done.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago nuke: remove librados, librbd
Sage Weil [Fri, 15 Feb 2013 23:38:08 +0000 (15:38 -0800)]
nuke: remove librados, librbd

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: install ceph-mds, ceph-common
Sage Weil [Fri, 15 Feb 2013 23:17:25 +0000 (15:17 -0800)]
ceph: install ceph-mds, ceph-common

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ceph: fix purge
Sage Weil [Fri, 15 Feb 2013 23:17:18 +0000 (15:17 -0800)]
ceph: fix purge

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago Install ceph debs and use installed debs
Sander Pool [Wed, 6 Feb 2013 19:16:52 +0000 (19:16 +0000)]
Install ceph debs and use installed debs

The ceph task installs ceph using the debian
packages now, and all invocations of binaries installed
in {tmpdir}/binary/usr/local/bin/ are replaced with
the use of the binaries installed in standard locations
by the debs.

Author:    Sander Pool <sander.pool@inktank.com>
Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago nuke: testrados -> ceph_test_rados
Sage Weil [Mon, 18 Feb 2013 21:38:54 +0000 (13:38 -0800)]
nuke: testrados -> ceph_test_rados

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago misc: replace : with - in testdir name
Sage Weil [Mon, 18 Feb 2013 06:13:45 +0000 (22:13 -0800)]
misc: replace : with - in testdir name

The :'s break the list in $PATH.

Signed-off-by: Sage Weil <sage@inktank.com>
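
To illustrate why the colons matter, here is a small sketch. The helper names and the example run name are hypothetical; the only fact relied on is that PATH-style variables use `:` as the entry separator.

```python
def sanitize_run_name(name):
    """Replace colons, which act as separators in PATH-style variables."""
    return name.replace(':', '-')


def path_entries(path_value):
    """Split a PATH-style value the way a POSIX shell would."""
    return path_value.split(':')


raw = 'ubuntu-2013-02-18_06:13:45'  # hypothetical run name with a timestamp
# With the raw name, the single test directory is split into three entries.
print(path_entries('/usr/bin:/tmp/cephtest/' + raw))
# After sanitizing, the directory stays in one piece.
print(path_entries('/usr/bin:/tmp/cephtest/' + sanitize_run_name(raw)))
```
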
12 years ago schedule_suite.sh: fix s3branch
Sage Weil [Fri, 15 Feb 2013 17:33:27 +0000 (09:33 -0800)]
schedule_suite.sh: fix s3branch

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago rbd_fsx: binary name now has ceph_ prefix
Sage Weil [Fri, 15 Feb 2013 17:12:25 +0000 (09:12 -0800)]
rbd_fsx: binary name now has ceph_ prefix

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago rados: testrados -> ceph_test_rados
Sage Weil [Wed, 13 Feb 2013 22:10:33 +0000 (14:10 -0800)]
rados: testrados -> ceph_test_rados

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago schedule_suite.sh: choose s3branch based on teuthology branch
Sage Weil [Wed, 13 Feb 2013 16:50:46 +0000 (08:50 -0800)]
schedule_suite.sh: choose s3branch based on teuthology branch

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago schedule_suite.sh: take optional teuthology branch arg
Sage Weil [Wed, 13 Feb 2013 05:15:52 +0000 (21:15 -0800)]
schedule_suite.sh: take optional teuthology branch arg

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago schedule_suite.sh: ensure ceph and kernel branches exist
Sage Weil [Wed, 13 Feb 2013 05:24:16 +0000 (21:24 -0800)]
schedule_suite.sh: ensure ceph and kernel branches exist

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago peer: add recovery delay to make test behave
Sage Weil [Mon, 11 Feb 2013 14:59:17 +0000 (06:59 -0800)]
peer: add recovery delay to make test behave

Otherwise it was (very) racy!

12 years ago Merge to include --machine-type and changes to --summary
Sandon Van Ness [Fri, 8 Feb 2013 00:34:14 +0000 (16:34 -0800)]
Merge to include --machine-type and changes to --summary

Added the ability to support multiple types of machines with
--machine-type added to teuthology-lock when used with --lock-many
or --machine-type with teuthology --lock (automated tests). It
defaults to 'plana' and the 'vps' type is currently unused but
should be in the future.

Also updated teuthology-lock --summary to be machine type aware
and sort things in a nice output.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago Made teuthology-lock --summary machine type aware.
Sandon Van Ness [Fri, 8 Feb 2013 00:06:21 +0000 (16:06 -0800)]
Made teuthology-lock --summary machine type aware.

Signed-off-by: Sandon Van Ness <sandon@van-ness.com>
12 years ago Added support for multiple types of machines.
Sandon Van Ness [Tue, 5 Feb 2013 20:53:08 +0000 (12:53 -0800)]
Added support for multiple types of machines.

Added the ability to support multiple types of machines with
--machine-type added to teuthology-lock when used with --lock-many
or --machine-type with teuthology --lock (automated tests). It
defaults to 'plana' and the 'vps' type is currently unused but
should be in the future.

Signed-off-by: Sandon Van Ness <sandon@van-ness.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago rgw: parse testdir into apache.conf
Sage Weil [Thu, 7 Feb 2013 06:02:10 +0000 (22:02 -0800)]
rgw: parse testdir into apache.conf

Also fix up the template to use {{field}} for stuff we don't want to parse.
There is probably a better way...

Signed-off-by: Sage Weil <sage@inktank.com>
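
The `{{field}}` convention mentioned here is how Python's `str.format` escapes literal braces. A minimal sketch, assuming the template is rendered with `str.format`; the template text and the `render` helper are invented for illustration and are not the real apache.conf template.

```python
# Hypothetical two-line template: {testdir} is substituted, while the
# doubled braces are emitted as single literal braces.
TEMPLATE = (
    'ServerRoot {testdir}/apache\n'
    '# doubled braces survive as literals: {{not_a_field}}\n'
)


def render(testdir):
    """Fill in the test directory and leave escaped braces untouched."""
    return TEMPLATE.format(testdir=testdir)


print(render('/tmp/cephtest/myrun'))
```
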
12 years ago osd_recovery: add missing testdir arg
Sage Weil [Thu, 7 Feb 2013 05:43:58 +0000 (21:43 -0800)]
osd_recovery: add missing testdir arg

12 years ago ceph_manager: take int or string to osd_admin_socket
Sage Weil [Thu, 7 Feb 2013 01:14:24 +0000 (17:14 -0800)]
ceph_manager: take int or string to osd_admin_socket

This fixes a failure on dump_stuck.

12 years ago radosbench: fix missing format value
Sage Weil [Wed, 6 Feb 2013 17:06:35 +0000 (09:06 -0800)]
radosbench: fix missing format value

tdir is substituted in at the end.  There is probably a better way to do
this.

12 years ago rgw: fix testdir format on f
Sage Weil [Wed, 6 Feb 2013 17:04:37 +0000 (09:04 -0800)]
rgw: fix testdir format on f

Format the path, not filehandle

12 years ago nuke: don't try unmount if we're rebooting everything anyway
Josh Durgin [Wed, 6 Feb 2013 07:31:37 +0000 (23:31 -0800)]
nuke: don't try unmount if we're rebooting everything anyway

This can cause issues when unmount hangs. Our automatic runs reboot
everything unconditionally, so this caused a bunch of unnecessary hangs
when an fs was accidentally rendered un-unmountable.

12 years ago nuke: make tmpfs check only umount tmpfs
Josh Durgin [Wed, 6 Feb 2013 07:28:08 +0000 (23:28 -0800)]
nuke: make tmpfs check only umount tmpfs

This would catch things like /tmp/cephtest/mnt.client.0, which are
used by cfuse, rbd, and kclient.

12 years ago rbd: fix rbd image unmount
Sage Weil [Wed, 6 Feb 2013 07:19:23 +0000 (23:19 -0800)]
rbd: fix rbd image unmount

The testdir param was missing.  Avoid this class of errors by unmounting
exactly what we mounted.

12 years ago rbd: set env before running sudo
Sage Weil [Wed, 6 Feb 2013 07:01:25 +0000 (23:01 -0800)]
rbd: set env before running sudo

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago misc: Close connections on reboot
Sam Lang [Tue, 5 Feb 2013 22:20:52 +0000 (16:20 -0600)]
misc:  Close connections on reboot

When nodes are rebooted, the connections remain open
even after calling reconnect and setting up new ssh
sessions to the rebooted nodes.  This causes ECONNRESET
errors to show up in the teuthology output.

Close the existing connections before trying to reconnect.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago task/ceph_manager: Fix NoneType config issue
Sam Lang [Tue, 5 Feb 2013 16:38:48 +0000 (10:38 -0600)]
task/ceph_manager:  Fix NoneType config issue

kill_mon is getting a config set to None, which blows
up now due to the check for powercycle.  Initialize
the config to an empty dict if we don't get anything
on init.  This is the error showing up in teuthology:

2013-02-04T15:04:16.595 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1fcafd0>
Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/mon_thrash.py", line 142, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/mon_thrash.py", line 69, in do_join
    self.thread.get()
  File "/var/lib/teuthworker/teuthology-master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
AttributeError: 'NoneType' object has no attribute 'get'

Signed-off-by: Sam Lang <sam.lang@inktank.com>
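
A minimal sketch of the fix described in this commit, combined with the `get()` lookup from the earlier powercycle commit. The class name `ManagerSketch` and the method `wants_powercycle` are hypothetical stand-ins, not the real ceph_manager API.

```python
class ManagerSketch(object):
    """Hypothetical stand-in for the manager object that holds a config."""

    def __init__(self, config=None):
        # Fall back to an empty dict so later lookups never see None.
        self.config = config if config is not None else {}

    def wants_powercycle(self):
        # dict.get() returns None for a missing key instead of raising.
        return bool(self.config.get('powercycle'))


print(ManagerSketch(None).wants_powercycle())
print(ManagerSketch({'powercycle': True}).wants_powercycle())
```

Normalizing the config once in the constructor means every later check can use plain dict operations without guarding against `None`.
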
12 years ago misc: don't use colon in default run name
Josh Durgin [Mon, 4 Feb 2013 18:39:21 +0000 (10:39 -0800)]
misc: don't use colon in default run name

LD_LIBRARY_PATH does not work with colons (and backslash does not escape them.)

12 years ago Add testdir param to get_valgrind_args() calls
Sam Lang [Mon, 4 Feb 2013 04:08:40 +0000 (22:08 -0600)]
Add testdir param to get_valgrind_args() calls

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago Merge branch 'wip-misc-fixes'
Sam Lang [Sun, 3 Feb 2013 17:38:10 +0000 (11:38 -0600)]
Merge branch 'wip-misc-fixes'

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago nuke.py: Allow name of job/run to be specified
Sam Lang [Sun, 3 Feb 2013 17:09:49 +0000 (11:09 -0600)]
nuke.py:  Allow name of job/run to be specified

Nuke will cleanup the base test directory by default, but can
cleanup the test directory for a given run if specified.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago run.py: Add target name to logging info
Sam Lang [Sun, 3 Feb 2013 17:09:04 +0000 (11:09 -0600)]
run.py: Add target name to logging info

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago rbd: fix .format() call with {1} syntax
Sage Weil [Sun, 3 Feb 2013 16:18:52 +0000 (08:18 -0800)]
rbd: fix .format() call with {1} syntax

IndexError: tuple index out of range
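
The class of error named above can be reproduced with a short sketch. The message text and function names are hypothetical; the behaviour shown is standard `str.format`: a positional field such as `{1}` needs a matching positional argument.

```python
def mount_message(image, testdir):
    # Two positional fields, {0} and {1}, need two arguments.
    return '{0} mounted under {1}'.format(image, testdir)


def broken_mount_message(image):
    # Only one argument is supplied, so looking up {1} raises IndexError.
    return '{0} mounted under {1}'.format(image)


print(mount_message('foo', '/tmp/cephtest/myrun'))
try:
    broken_mount_message('foo')
except IndexError as exc:
    print('IndexError: {0}'.format(exc))
```
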

12 years ago ceph_manager: use get() for self.config powercycle checks
Sage Weil [Sun, 3 Feb 2013 05:01:08 +0000 (21:01 -0800)]
ceph_manager: use get() for self.config powercycle checks

I think this is what is going on...

Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology-master/teuthology/contextutil.py", line 27, in nested
    yield vars
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph.py", line 1158, in task
    yield
  File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 25, in run_tasks
    manager = _run_one_task(taskname, ctx=ctx, config=config)
  File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 14, in _run_one_task
    return fn(**kwargs)
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/dump_stuck.py", line 93, in task
    manager.kill_osd(id_)
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph_manager.py", line 665, in kill_osd
    if 'powercycle' in self.config and self.config['powercycle']:
TypeError: argument of type 'NoneType' is not iterable

12 years ago Fixup latest commits that use /tmp/cephtest.
Sam Lang [Sat, 2 Feb 2013 17:00:17 +0000 (11:00 -0600)]
Fixup latest commits that use /tmp/cephtest.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago task/chdir-coredump: Use readlink -e
Sam Lang [Fri, 1 Feb 2013 22:07:29 +0000 (16:07 -0600)]
task/chdir-coredump:  Use readlink -e

realpath isn't available everywhere, use readlink -e instead.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago task/ceph: Fix typo in previous commit
Sam Lang [Fri, 1 Feb 2013 20:07:10 +0000 (14:07 -0600)]
task/ceph: Fix typo in previous commit

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago nuke: get_testdir_base needs to be imported
Sam Lang [Fri, 1 Feb 2013 19:01:25 +0000 (13:01 -0600)]
nuke: get_testdir_base needs to be imported

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago nuke: Fix cleanup of test dir
Sam Lang [Fri, 1 Feb 2013 17:45:04 +0000 (11:45 -0600)]
nuke: Fix cleanup of test dir

Nuke used to remove /tmp/cephtest, now it tries to
remove the test dir, which it may not have the name
for.  Instead of removing the test dir, we just
remove the base directory for all test directories,
which may or may not be /tmp/cephtest.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago task/ceph: Initialize disk_config maps
Sam Lang [Fri, 1 Feb 2013 17:37:13 +0000 (11:37 -0600)]
task/ceph: Initialize disk_config maps

The mount_options and fstype maps need to be
initialized properly for later.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago misc: Don't include existing partitions in devs
Sam Lang [Fri, 1 Feb 2013 16:53:47 +0000 (10:53 -0600)]
misc: Don't include existing partitions in devs

We don't want to include /dev/sda1, etc. in the
list of devices to use.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
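
One way to express this filter, as a sketch only: it assumes sd-style device names, where a trailing digit marks a partition. The helper name `whole_disks` is hypothetical, and the rule would not hold for naming schemes such as nvme0n1.

```python
import re


def whole_disks(devs):
    """Keep whole-disk names and drop partitions such as /dev/sda1."""
    # Assumption: for sd-style names, a trailing digit marks a partition.
    return [dev for dev in devs if not re.search(r'\d$', dev)]


print(whole_disks(['/dev/sda', '/dev/sda1', '/dev/sdb', '/dev/sdb2']))
```
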
12 years ago task/ceph: Fix device list
Sam Lang [Fri, 1 Feb 2013 16:16:44 +0000 (10:16 -0600)]
task/ceph: Fix device list

dict.items() returns (key, value) tuples, whereas we want
the values().

Signed-off-by: Sam Lang <sam.lang@inktank.com>
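
The difference between the two dict methods, sketched with a made-up mapping. The keys, the device paths, and the `device_list` helper are hypothetical.

```python
# Hypothetical mapping from a wwn identifier to a device path.
wwn_to_dev = {
    'wwn-0x5000c5000000aaaa': '/dev/sdb',
    'wwn-0x5000c5000000bbbb': '/dev/sdc',
}


def device_list(id_map):
    """Return only the device paths, sorted for a stable order."""
    return sorted(id_map.values())


# items() yields (key, value) pairs, which is not a list of devices.
print(sorted(wwn_to_dev.items()))
# values() yields just the device paths.
print(device_list(wwn_to_dev))
```
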
12 years ago misc: get_wwn_id_map() needs to return dict
Sam Lang [Fri, 1 Feb 2013 15:13:48 +0000 (09:13 -0600)]
misc: get_wwn_id_map() needs to return dict

If we can't find device ids, we need to return
a dict, not a list.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago nuke: Optionally check console status
Sam Lang [Fri, 1 Feb 2013 14:24:41 +0000 (08:24 -0600)]
nuke:  Optionally check console status

Only check the ipmi console status if the ipmi
parameters have been defined in .teuthology.yaml.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago misc: Fix get_wwn_id_map() to be optional
Sam Lang [Fri, 1 Feb 2013 14:20:43 +0000 (08:20 -0600)]
misc: Fix get_wwn_id_map() to be optional

Not all plana nodes have symlinks setup when
we check /dev/disk/by-id/wwn-*.  Instead of failing
here, just use the /dev/disk/sd* devices.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago run.py: Fix argument parsing for --name
Sam Lang [Fri, 1 Feb 2013 13:46:04 +0000 (07:46 -0600)]
run.py: Fix argument parsing for --name

With the addition of the --name argument to the
teuthology program (run.py), jobs were failing
because --name was being treated as a non-arg
option, even though the name was being supplied
by the workers.  Fix that and give it a metavar.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
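
A sketch of an option that takes a value and carries a metavar, written with the standard `argparse` module. The parser layout, the program name, and the positional `config` argument are assumptions for illustration, not a copy of run.py.

```python
import argparse


def build_parser():
    """Build a small parser where --name consumes the following argument."""
    parser = argparse.ArgumentParser(prog='teuthology-sketch')
    # The default "store" action makes --name take a value; metavar only
    # changes how the value is shown in the help text.
    parser.add_argument('--name', metavar='NAME', help='name for this run')
    parser.add_argument('config', nargs='*', help='job config files')
    return parser


args = build_parser().parse_args(['--name', 'myrun', 'job.yaml'])
print(args.name, args.config)
```
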
12 years ago ceph_manager: wait for admin socket on restart, use for set_config
Samuel Just [Thu, 31 Jan 2013 00:45:46 +0000 (16:45 -0800)]
ceph_manager: wait for admin socket on restart, use for set_config

Fixes: #3966
Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago thrashosds: note assumption for powercycling
Josh Durgin [Thu, 31 Jan 2013 17:14:06 +0000 (09:14 -0800)]
thrashosds: note assumption for powercycling

12 years ago Remove console.py
Sam Lang [Wed, 23 Jan 2013 22:27:32 +0000 (16:27 -0600)]
Remove console.py

Handling of ipmi via the console is now done through the
Console class in teuthology/orchestra/remote.py.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago Assign devices to osds using the device wwn
Sam Lang [Wed, 23 Jan 2013 02:27:41 +0000 (20:27 -0600)]
Assign devices to osds using the device wwn

Linux doesn't guarantee device names (/dev/sdb, etc.)
are always mapped to the same disk.  Instead of assigning
nominal devices to osds, we map devices by their wwn
(/dev/disk/by-id/wwn-*) to an osd (both data and journal).

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
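
The wwn lookup can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the function name `get_wwn_id_map_sketch`, its parameter, and the direction of the mapping (resolved device to wwn symlink) are all hypothetical.

```python
import glob
import os


def get_wwn_id_map_sketch(by_id_dir='/dev/disk/by-id'):
    """Map each resolved device path to its stable wwn-* symlink."""
    mapping = {}
    for link in sorted(glob.glob(os.path.join(by_id_dir, 'wwn-*'))):
        # realpath() follows the symlink to the current kernel device name.
        mapping[os.path.realpath(link)] = link
    # With no wwn-* entries this is an empty dict, never a list.
    return mapping


print(get_wwn_id_map_sketch())
```

Returning an empty dict when no symlinks exist lets callers fall back to plain device names without special-casing the return type.
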
12 years ago Support power cycling osds/nodes through ipmi
Sam Lang [Wed, 23 Jan 2013 02:13:19 +0000 (20:13 -0600)]
Support power cycling osds/nodes through ipmi

This patch defines a RemoteConsole class associated
with each Remote class instance, allowing
power cycling a target through ipmi.

Fixes/Implements #3782.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago add --name option to teuthology
Sam Lang [Wed, 23 Jan 2013 03:53:14 +0000 (21:53 -0600)]
add --name option to teuthology

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago Replace /tmp/cephtest/ with configurable path
Sam Lang [Wed, 23 Jan 2013 20:37:39 +0000 (14:37 -0600)]
Replace /tmp/cephtest/ with configurable path

Teuthology uses /tmp/cephtest/ as the scratch test directory for
a run.  This patch replaces /tmp/cephtest/ everywhere with a
per-run directory: {basedir}/{rundir} where {basedir} is a directory
configured in .teuthology.yaml (/tmp/cephtest if not specified),
and {rundir} is the name of the run, as given in --name.  If no name
is specified, {user}-{timestamp} is used.

To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest
in .teuthology.yaml.

This change was motivated by #3782, which requires a test dir that
survives across reboots, but also resolves #3767.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
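
The path rules described in this commit can be sketched as below. The function name `get_testdir_sketch`, the config key `base_test_dir`, and the timestamp format are assumptions; only `test_path`, the `/tmp/cephtest` default, and the `{user}-{timestamp}` fallback come from the message.

```python
import getpass
import os
import time

DEFAULT_BASEDIR = '/tmp/cephtest'


def get_testdir_sketch(config, name=None, user=None, timestamp=None):
    """Return the scratch directory for a run, given a parsed yaml config."""
    # An explicit test_path restores the old single-directory behaviour.
    test_path = config.get('test_path')
    if test_path:
        return test_path
    # 'base_test_dir' is a hypothetical key name for {basedir}.
    basedir = config.get('base_test_dir', DEFAULT_BASEDIR)
    if name is None:
        user = user or getpass.getuser()
        # Colon-free timestamp, so the path is safe in PATH-style variables.
        timestamp = timestamp or time.strftime('%Y-%m-%d_%H-%M-%S')
        name = '{0}-{1}'.format(user, timestamp)
    return os.path.join(basedir, name)


print(get_testdir_sketch({}, name='myrun'))
```
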
12 years ago Fixes for syntax errors found by pyflakes.
Sam Lang [Thu, 31 Jan 2013 13:58:57 +0000 (07:58 -0600)]
Fixes for syntax errors found by pyflakes.

This patch includes minor fixes to the teuthology
python code for syntax errors found by running
check-syntax.sh (which runs pyflakes on each file).

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago Scripts to use pyflakes to check python syntax.
Sam Lang [Thu, 31 Jan 2013 13:56:56 +0000 (07:56 -0600)]
Scripts to use pyflakes to check python syntax.

pyflakes runs a basic syntax checker against python code.
The added check-syntax.sh script and Makefile run pyflakes
on the python code within the teuthology directory reporting
any syntax errors that are found.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago task: mon_clock_skew_check: use absolute value when comparing mon_skew
Joao Eduardo Luis [Wed, 30 Jan 2013 20:52:39 +0000 (20:52 +0000)]
task: mon_clock_skew_check: use absolute value when comparing mon_skew

The monitors may report either positive or negative clock skews, and by
not using an absolute value we were constantly ignoring reported negative
clock skews.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
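
The comparison at issue, as a minimal sketch. The function name and the threshold values are hypothetical; the point is only that a signed comparison never fires for negative skews.

```python
def skew_exceeds(skew, max_skew):
    """Report a skew in either direction once it passes the threshold."""
    # Without abs(), a skew of -0.2 would compare as "less than" max_skew.
    return abs(skew) > max_skew


print(skew_exceeds(-0.2, 0.05))
print(skew_exceeds(0.01, 0.05))
```
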
12 years ago task: mon_clock_skew_check: mark as ran once if an expected skew was found
Joao Eduardo Luis [Wed, 30 Jan 2013 20:52:03 +0000 (20:52 +0000)]
task: mon_clock_skew_check: mark as ran once if an expected skew was found

... even if we didn't get a clean/finished result from the monitors

This ought to significantly cut the waiting time if something else (or
someone else) is leaving the leader hanging thus unable to finish a given
timecheck round.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years ago peer: fix filtering out of scrub from pg state
Sage Weil [Tue, 29 Jan 2013 22:04:09 +0000 (14:04 -0800)]
peer: fix filtering out of scrub from pg state

12 years ago admin_socket: don't bother remote executing if there is no test
Sage Weil [Tue, 29 Jan 2013 11:45:45 +0000 (03:45 -0800)]
admin_socket: don't bother remote executing if there is no test

12 years ago osd_recovery: use --no-cleanup for rados bench
Samuel Just [Tue, 29 Jan 2013 03:36:17 +0000 (19:36 -0800)]
osd_recovery: use --no-cleanup for rados bench

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago osd_recovery: inject a recovery delay
Samuel Just [Tue, 29 Jan 2013 03:22:42 +0000 (19:22 -0800)]
osd_recovery: inject a recovery delay

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago osd_backfill: --no-cleanup for rados bench
Sage Weil [Tue, 29 Jan 2013 03:53:34 +0000 (19:53 -0800)]
osd_backfill: --no-cleanup for rados bench

12 years ago cram: fix for runs with coverage enabled
Josh Durgin [Mon, 28 Jan 2013 22:53:43 +0000 (14:53 -0800)]
cram: fix for runs with coverage enabled

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago osdthrasher: inject pause on a live (on in) osd
Sage Weil [Sat, 26 Jan 2013 21:13:08 +0000 (13:13 -0800)]
osdthrasher: inject pause on a live (on in) osd

12 years ago task: mon_clock_skew_check: increase timeout and kick it off only on stop
Joao Eduardo Luis [Fri, 25 Jan 2013 12:09:49 +0000 (12:09 +0000)]
task: mon_clock_skew_check: increase timeout and kick it off only on stop

We were kicking-off the timeout as soon as we started; it's better however
to kick it off only when we are told to stop (as long as 'at-least-once'
is true).

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years ago task: mon_clock_skew_check: distinguish between on-going and finished check
Joao Eduardo Luis [Thu, 24 Jan 2013 18:00:39 +0000 (18:00 +0000)]
task: mon_clock_skew_check: distinguish between on-going and finished check

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years ago ceph_manager: turn long stall injection off by default
Samuel Just [Fri, 25 Jan 2013 01:31:38 +0000 (17:31 -0800)]
ceph_manager: turn long stall injection off by default

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago osd_recovery: fix up incomplete test
Sage Weil [Fri, 25 Jan 2013 00:24:16 +0000 (16:24 -0800)]
osd_recovery: fix up incomplete test

- stop rados bench from cleaning up
- flush pg stats
- fix sleep call

One or more of these helped fix this test, don't really care which.

12 years ago ceph_manager: fix get_num_active_recovered()
Sage Weil [Fri, 25 Jan 2013 00:23:33 +0000 (16:23 -0800)]
ceph_manager: fix get_num_active_recovered()

The states now have 'backfill' *or* 'recover' in them.
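
A rough sketch of what such a count might look like, assuming each pg is a dict with a `state` string of `+`-joined flags. The function name, the data layout, and the sample states are assumptions, not the real ceph_manager code.

```python
def num_active_recovered_sketch(pgs):
    """Count pgs that are active and neither recovering nor backfilling."""
    count = 0
    for pg in pgs:
        state = pg['state']
        # Substring checks also catch variants such as 'recovering'.
        if ('active' in state and 'recover' not in state
                and 'backfill' not in state):
            count += 1
    return count


# Hypothetical pg states for illustration.
sample = [
    {'state': 'active+clean'},
    {'state': 'active+recovering'},
    {'state': 'active+backfill'},
    {'state': 'peering'},
]
print(num_active_recovered_sketch(sample))
```
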

12 years ago workunit: pass java path as env variable
Sage Weil [Thu, 24 Jan 2013 23:20:47 +0000 (15:20 -0800)]
workunit: pass java path as env variable

The libcephfs-java test needs this.

12 years ago ceph_manager: use 80/70 as pause_long, pause_check_after defaults
Samuel Just [Thu, 24 Jan 2013 20:50:24 +0000 (12:50 -0800)]
ceph_manager: use 80/70 as pause_long, pause_check_after defaults

OSD::op_tp suicides after 150.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago ceph_manager: use do_rados for rmpool
Samuel Just [Thu, 24 Jan 2013 18:07:10 +0000 (10:07 -0800)]
ceph_manager: use do_rados for rmpool

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago Merge remote-tracking branch 'gh/wip_heartbeat'
Sage Weil [Thu, 24 Jan 2013 02:43:02 +0000 (18:43 -0800)]
Merge remote-tracking branch 'gh/wip_heartbeat'

12 years ago ceph_manager: default chance_down to 0.4
Samuel Just [Thu, 24 Jan 2013 01:44:05 +0000 (17:44 -0800)]
ceph_manager: default chance_down to 0.4

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago ceph_manager: add filestore and heartbeat stalls
Samuel Just [Thu, 24 Jan 2013 00:13:22 +0000 (16:13 -0800)]
ceph_manager: add filestore and heartbeat stalls

Signed-off-by: Samuel Just <sam.just@inktank.com>