git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agorbd_fsx: binary name now has ceph_ prefix
Sage Weil [Fri, 15 Feb 2013 17:12:25 +0000 (09:12 -0800)]
rbd_fsx: binary name now has ceph_ prefix

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agorados: testrados -> ceph_test_rados
Sage Weil [Wed, 13 Feb 2013 22:10:33 +0000 (14:10 -0800)]
rados: testrados -> ceph_test_rados

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoschedule_suite.sh: choose s3branch based on teuthology branch
Sage Weil [Wed, 13 Feb 2013 16:50:46 +0000 (08:50 -0800)]
schedule_suite.sh: choose s3branch based on teuthology branch

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoschedule_suite.sh: take option teuthology branch arg
Sage Weil [Wed, 13 Feb 2013 05:15:52 +0000 (21:15 -0800)]
schedule_suite.sh: take option teuthology branch arg

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoschedule_suite.sh: ensure ceph and kernel branches exist
Sage Weil [Wed, 13 Feb 2013 05:24:16 +0000 (21:24 -0800)]
schedule_suite.sh: ensure ceph and kernel branches exist

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agopeer: add recovery delay to make test behave
Sage Weil [Mon, 11 Feb 2013 14:59:17 +0000 (06:59 -0800)]
peer: add recovery delay to make test behave

Otherwise it was (very) racy!

12 years agoMerge to include --machine-type and changes to --summary
Sandon Van Ness [Fri, 8 Feb 2013 00:34:14 +0000 (16:34 -0800)]
Merge to include --machine-type and changes to --summary

Added the ability to support multiple types of machines with
--machine-type added to teuthology-lock when used with --lock-many
or --machine-type with teuthology --lock (automated tests). It
defaults to 'plana' and the 'vps' type is currently unused but
should be in the future.

Also updated teuthology-lock --summary to be machine type aware
and sort things in a nice output.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoMade teuthology-lock --summary machine type aware.
Sandon Van Ness [Fri, 8 Feb 2013 00:06:21 +0000 (16:06 -0800)]
Made teuthology-lock --summary machine type aware.

Signed-off-by: Sandon Van Ness <sandon@van-ness.com>
12 years agoAdded support for multiple types of machines.
Sandon Van Ness [Tue, 5 Feb 2013 20:53:08 +0000 (12:53 -0800)]
Added support for multiple types of machines.

Added the ability to support multiple types of machines with
--machine-type added to teuthology-lock when used with --lock-many
or --machine-type with teuthology --lock (automated tests). It
defaults to 'plana' and the 'vps' type is currently unused but
should be in the future.

Signed-off-by: Sandon Van Ness <sandon@van-ness.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agorgw: parse testdir into apache.conf
Sage Weil [Thu, 7 Feb 2013 06:02:10 +0000 (22:02 -0800)]
rgw: parse testdir into apache.conf

Also fix up the template to use {{field}} for stuff we don't want to parse.
There is probably a better way...

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd_recovery: add missing testdir arg
Sage Weil [Thu, 7 Feb 2013 05:43:58 +0000 (21:43 -0800)]
osd_recovery: add missing testdir arg

12 years agoceph_manager: take int or string to osd_admin_socket
Sage Weil [Thu, 7 Feb 2013 01:14:24 +0000 (17:14 -0800)]
ceph_manager: take int or string to osd_admin_socket

This fixes a failure on dump_stuck.

12 years agoradosbench: fix missing format value
Sage Weil [Wed, 6 Feb 2013 17:06:35 +0000 (09:06 -0800)]
radosbench: fix missing format value

tdir is substituted in at the end.  There is probably a better way to do
this.

12 years agorgw: fix testdir format on f
Sage Weil [Wed, 6 Feb 2013 17:04:37 +0000 (09:04 -0800)]
rgw: fix testdir format on f

Format the path, not filehandle

12 years agonuke: don't try unmount if we're rebooting everything anyway
Josh Durgin [Wed, 6 Feb 2013 07:31:37 +0000 (23:31 -0800)]
nuke: don't try unmount if we're rebooting everything anyway

This can cause issues when unmount hangs. Our automatic runs reboot
everything unconditionally, so this caused a bunch of unnecessary hangs
when an fs was accidentally rendered un-unmountable.

12 years agonuke: make tmpfs check only umount tmpfs
Josh Durgin [Wed, 6 Feb 2013 07:28:08 +0000 (23:28 -0800)]
nuke: make tmpfs check only umount tmpfs

This would catch things like /tmp/cephtest/mnt.client.0, which are
used by cfuse, rbd, and kclient.

12 years agorbd: fix rbd image unmount
Sage Weil [Wed, 6 Feb 2013 07:19:23 +0000 (23:19 -0800)]
rbd: fix rbd image unmount

The testdir param was missing.  Avoid this class of errors by unmounting
exactly what we mounted.

12 years agorbd: set env before running sudo
Sage Weil [Wed, 6 Feb 2013 07:01:25 +0000 (23:01 -0800)]
rbd: set env before running sudo

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomisc: Close connections on reboot
Sam Lang [Tue, 5 Feb 2013 22:20:52 +0000 (16:20 -0600)]
misc:  Close connections on reboot

When nodes are rebooted, the connections remain open
even after calling reconnect and setting up new ssh
sessions to the rebooted nodes.  This causes ECONNRESET
errors to show up in the teuthology output.

Close the existing connections before trying to reconnect.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/ceph_manager: Fix NoneType config issue
Sam Lang [Tue, 5 Feb 2013 16:38:48 +0000 (10:38 -0600)]
task/ceph_manager:  Fix NoneType config issue

kill_mon is getting a config set to None, which blows
up now due to the check for powercycle.  Initialize
the config to an empty dict if we don't get anything
on init.  This is the error showing up in teuthology:

2013-02-04T15:04:16.595 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1fcafd0>
Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/mon_thrash.py", line 142, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/mon_thrash.py", line 69, in do_join
    self.thread.get()
  File "/var/lib/teuthworker/teuthology-master/virtualenv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
AttributeError: 'NoneType' object has no attribute 'get'

Signed-off-by: Sam Lang <sam.lang@inktank.com>
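
The fix described above can be sketched as follows. This is a minimal
illustration, not the teuthology code: the class name ManagerSketch and the
method wants_powercycle are hypothetical. Only the idea of replacing a None
config with an empty dict comes from the commit message.

```python
class ManagerSketch(object):
    """Hypothetical stand-in for a manager that receives an optional config."""

    def __init__(self, config=None):
        # Fall back to an empty dict so later .get() calls are safe.
        if config is None:
            config = {}
        self.config = config

    def wants_powercycle(self):
        # With a dict, .get() returns None for a missing key instead of raising.
        return bool(self.config.get('powercycle'))
```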
12 years agomisc: don't use colon in default run name
Josh Durgin [Mon, 4 Feb 2013 18:39:21 +0000 (10:39 -0800)]
misc: don't use colon in default run name

LD_LIBRARY_PATH does not work with colons (and backslash does not escape them.)
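
LD_LIBRARY_PATH is a colon-separated list, so a directory whose name contains
a colon cannot be expressed in it. A minimal sketch of a colon-free default
name, assuming the {user}-{timestamp} shape; the function name and the exact
timestamp format are hypothetical, not taken from the commit:

```python
import getpass
import time


def default_run_name(user=None, now=None):
    """Build a run name of the form user-timestamp without any colons."""
    user = user or getpass.getuser()
    # Use '-' between hours, minutes and seconds instead of ':'.
    stamp = time.strftime('%Y-%m-%d_%H-%M-%S', time.localtime(now))
    return '{user}-{stamp}'.format(user=user, stamp=stamp)
```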

12 years agoAdd testdir param to get_valgrind_args() calls
Sam Lang [Mon, 4 Feb 2013 04:08:40 +0000 (22:08 -0600)]
Add testdir param to get_valgrind_args() calls

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agoMerge branch 'wip-misc-fixes'
Sam Lang [Sun, 3 Feb 2013 17:38:10 +0000 (11:38 -0600)]
Merge branch 'wip-misc-fixes'

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agonuke.py: Allow name of job/run to be specified
Sam Lang [Sun, 3 Feb 2013 17:09:49 +0000 (11:09 -0600)]
nuke.py:  Allow name of job/run to be specified

Nuke will clean up the base test directory by default, but can
clean up the test directory for a given run if specified.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agorun.py: Add target name to logging info
Sam Lang [Sun, 3 Feb 2013 17:09:04 +0000 (11:09 -0600)]
run.py: Add target name to logging info

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agorbd: fix .format() call with {1} syntax
Sage Weil [Sun, 3 Feb 2013 16:18:52 +0000 (08:18 -0800)]
rbd: fix .format() call with {1} syntax

IndexError: tuple index out of range
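
For illustration, this class of error can be reproduced in isolation. The
template strings and the device path below are made up for the example and are
not the ones from rbd.py:

```python
# '{1}' asks for the second positional argument, but only one is supplied.
template = 'mount {1}'

try:
    template.format('/dev/rbd0')
    failed = False
except IndexError:
    failed = True

# Referring to index 0 (or supplying two arguments) avoids the error.
fixed = 'mount {0}'.format('/dev/rbd0')
```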

12 years agoceph_manager: use get() for self.config powercycle checks
Sage Weil [Sun, 3 Feb 2013 05:01:08 +0000 (21:01 -0800)]
ceph_manager: use get() for self.config powercycle checks

I think this is what is going on...

Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology-master/teuthology/contextutil.py", line 27, in nested
    yield vars
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph.py", line 1158, in task
    yield
  File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 25, in run_tasks
    manager = _run_one_task(taskname, ctx=ctx, config=config)
  File "/var/lib/teuthworker/teuthology-master/teuthology/run_tasks.py", line 14, in _run_one_task
    return fn(**kwargs)
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/dump_stuck.py", line 93, in task
    manager.kill_osd(id_)
  File "/var/lib/teuthworker/teuthology-master/teuthology/task/ceph_manager.py", line 665, in kill_osd
    if 'powercycle' in self.config and self.config['powercycle']:
TypeError: argument of type 'NoneType' is not iterable
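
A short sketch of the two forms of the check, assuming config is either a
plain dict or None. The helper names are hypothetical. Note that .get() by
itself does not protect against a None config; it only shortens the check
when the config is a dict.

```python
def powercycle_with_in(config):
    # The form shown in the traceback; raises TypeError when config is None.
    return bool('powercycle' in config and config['powercycle'])


def powercycle_with_get(config):
    # The .get() form; same result for dicts, AttributeError when config is None.
    return bool(config.get('powercycle'))
```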

12 years agoFixup latest commits that use /tmp/cephtest.
Sam Lang [Sat, 2 Feb 2013 17:00:17 +0000 (11:00 -0600)]
Fixup latest commits that use /tmp/cephtest.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/chdir-coredump: Use readlink -e
Sam Lang [Fri, 1 Feb 2013 22:07:29 +0000 (16:07 -0600)]
task/chdir-coredump:  Use readlink -e

realpath isn't available everywhere, use readlink -e instead.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/ceph: Fix typo in previous commit
Sam Lang [Fri, 1 Feb 2013 20:07:10 +0000 (14:07 -0600)]
task/ceph: Fix typo in previous commit

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agonuke: get_testdir_base needs to be imported
Sam Lang [Fri, 1 Feb 2013 19:01:25 +0000 (13:01 -0600)]
nuke: get_testdir_base needs to be imported

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agonuke: Fix cleanup of test dir
Sam Lang [Fri, 1 Feb 2013 17:45:04 +0000 (11:45 -0600)]
nuke: Fix cleanup of test dir

Nuke used to remove /tmp/cephtest, now it tries to
remove the test dir, which it may not have the name
for.  Instead of removing the test dir, we just
remove the base directory for all test directories,
which may or may not be /tmp/cephtest.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/ceph: Initialize disk_config maps
Sam Lang [Fri, 1 Feb 2013 17:37:13 +0000 (11:37 -0600)]
task/ceph: Initialize disk_config maps

The mount_options and fstype maps need to be
initialized properly for later.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomisc: Don't include existing partitions in devs
Sam Lang [Fri, 1 Feb 2013 16:53:47 +0000 (10:53 -0600)]
misc: Don't include existing partitions in devs

We don't want to include /dev/sda1, etc. in the
list of devices to use.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
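
One way to express this filter, as a sketch only: the function name and the
regular expression are assumptions, and the real code may decide differently
which devices to skip.

```python
import re

# Whole disks look like /dev/sdb; partitions carry a trailing number (/dev/sda1).
_WHOLE_DISK = re.compile(r'^/dev/sd[a-z]+$')


def whole_disks(devs):
    """Return only the entries that name a whole disk, preserving order."""
    return [d for d in devs if _WHOLE_DISK.match(d)]
```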
12 years agotask/ceph: Fix device list
Sam Lang [Fri, 1 Feb 2013 16:16:44 +0000 (10:16 -0600)]
task/ceph: Fix device list

dict.items() returns (key, value) tuples, whereas we want
the values().

Signed-off-by: Sam Lang <sam.lang@inktank.com>
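
The difference is easy to see on a small dict. The ids and device names here
are invented for the example:

```python
devs_by_id = {'wwn-0x5000a': '/dev/sdb', 'wwn-0x5000b': '/dev/sdc'}

# items() yields (key, value) pairs, which is not a list of devices.
pairs = sorted(devs_by_id.items())

# values() yields just the devices.
devices = sorted(devs_by_id.values())
```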
12 years agomisc: get_wwn_id_map() needs to return dict
Sam Lang [Fri, 1 Feb 2013 15:13:48 +0000 (09:13 -0600)]
misc: get_wwn_id_map() needs to return dict

If we can't find device ids, we need to return
a dict, not a list.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agonuke: Optionally check console status
Sam Lang [Fri, 1 Feb 2013 14:24:41 +0000 (08:24 -0600)]
nuke:  Optionally check console status

Only check the ipmi console status if the ipmi
parameters have been defined in .teuthology.yaml.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomisc: Fix get_wwn_id_map() to be optional
Sam Lang [Fri, 1 Feb 2013 14:20:43 +0000 (08:20 -0600)]
misc: Fix get_wwn_id_map() to be optional

Not all plana nodes have symlinks set up when
we check /dev/disk/by-id/wwn-*.  Instead of failing
here, just use the /dev/disk/sd* devices.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agorun.py: Fix argument parsing for --name
Sam Lang [Fri, 1 Feb 2013 13:46:04 +0000 (07:46 -0600)]
run.py: Fix argument parsing for --name

With the addition of the --name argument to the
teuthology program (run.py), jobs were failing
because --name was being treated as a non-arg
option, even though the name was being supplied
by the workers.  Fix that and give it a metavar.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
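
A minimal sketch of an option that takes a value, assuming an argparse-based
parser. The program name, help strings and the positional argument are
hypothetical and only serve the illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog='teuthology-sketch')
# Without action='store_true', --name consumes the next token as its value;
# metavar only controls how the value is shown in the help output.
parser.add_argument('--name', metavar='NAME', help='name for this run')
parser.add_argument('config', nargs='*', help='config files')

args = parser.parse_args(['--name', 'nightly-run', 'job.yaml'])
```

If --name were declared as a flag without a value, 'nightly-run' would be
parsed as a positional argument instead.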
12 years agoceph_manager: wait for admin socket on restart, use for set_config
Samuel Just [Thu, 31 Jan 2013 00:45:46 +0000 (16:45 -0800)]
ceph_manager: wait for admin socket on restart, use for set_config

Fixes: #3966
Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agothrashosds: note assumption for powercycling
Josh Durgin [Thu, 31 Jan 2013 17:14:06 +0000 (09:14 -0800)]
thrashosds: note assumption for powercycling

12 years agoRemove console.py
Sam Lang [Wed, 23 Jan 2013 22:27:32 +0000 (16:27 -0600)]
Remove console.py

Handling of ipmi via the console is now done through the
Console class in teuthology/orchestra/remote.py.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoAssign devices to osds using the device wwn
Sam Lang [Wed, 23 Jan 2013 02:27:41 +0000 (20:27 -0600)]
Assign devices to osds using the device wwn

Linux doesn't guarantee device names (/dev/sdb, etc.)
are always mapped to the same disk.  Instead of assigning
nominal devices to osds, we map devices by their wwn
(/dev/disk/by-id/wwn-*) to an osd (both data and journal).

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoSupport power cycling osds/nodes through ipmi
Sam Lang [Wed, 23 Jan 2013 02:13:19 +0000 (20:13 -0600)]
Support power cycling osds/nodes through ipmi

This patch defines a RemoteConsole class associated
with each Remote class instance, allowing
power cycling a target through ipmi.

Fixes/Implements #3782.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoadd --name option to teuthology
Sam Lang [Wed, 23 Jan 2013 03:53:14 +0000 (21:53 -0600)]
add --name option to teuthology

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoReplace /tmp/cephtest/ with configurable path
Sam Lang [Wed, 23 Jan 2013 20:37:39 +0000 (14:37 -0600)]
Replace /tmp/cephtest/ with configurable path

Teuthology uses /tmp/cephtest/ as the scratch test directory for
a run.  This patch replaces /tmp/cephtest/ everywhere with a
per-run directory: {basedir}/{rundir} where {basedir} is a directory
configured in .teuthology.yaml (/tmp/cephtest if not specified),
and {rundir} is the name of the run, as given in --name.  If no name
is specified, {user}-{timestamp} is used.

To get the old behavior (/tmp/cephtest), set test_path: /tmp/cephtest
in .teuthology.yaml.

This change was motivated by #3782, which requires a test dir that
survives across reboots, but also resolves #3767.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
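
The path selection described above can be sketched roughly as follows. This
is not the teuthology implementation: the function name, the key name
'base_test_dir' for {basedir}, the timestamp format, and the reading that
test_path overrides the whole path are all assumptions.

```python
import getpass
import os
import time


def get_testdir_sketch(config, name=None, user=None, timestamp=None):
    """Return the scratch directory for a run (illustrative only)."""
    # Assumption: test_path, when set, restores the old fixed directory.
    if 'test_path' in config:
        return config['test_path']
    # 'base_test_dir' is a hypothetical key name for {basedir}.
    basedir = config.get('base_test_dir', '/tmp/cephtest')
    if name is None:
        # No --name given: fall back to {user}-{timestamp}.
        user = user or getpass.getuser()
        timestamp = timestamp or time.strftime('%Y-%m-%d_%H-%M-%S')
        name = '{user}-{timestamp}'.format(user=user, timestamp=timestamp)
    return os.path.join(basedir, name)
```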
12 years agoFixes for syntax errors found by pyflakes.
Sam Lang [Thu, 31 Jan 2013 13:58:57 +0000 (07:58 -0600)]
Fixes for syntax errors found by pyflakes.

This patch includes minor fixes to the teuthology
python code for syntax errors found by running
check-syntax.sh (which runs pyflakes on each file).

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoScripts to use pyflakes to check python syntax.
Sam Lang [Thu, 31 Jan 2013 13:56:56 +0000 (07:56 -0600)]
Scripts to use pyflakes to check python syntax.

pyflakes runs a basic syntax checker against python code.
The added check-syntax.sh script and Makefile run pyflakes
on the python code within the teuthology directory reporting
any syntax errors that are found.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agotask: mon_clock_skew_check: use absolute value when comparing mon_skew
Joao Eduardo Luis [Wed, 30 Jan 2013 20:52:39 +0000 (20:52 +0000)]
task: mon_clock_skew_check: use absolute value when comparing mon_skew

The monitors may report either positive or negative clock skews, and by
not using an absolute value we were constantly ignoring reported negative
clock skews.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
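
In code, the comparison amounts to something like the following sketch. The
function name is hypothetical and the 0.05 second default is only used here
as an example threshold.

```python
def skew_exceeds(skew, max_skew=0.05):
    """True if the skew is too large in either direction."""
    # Comparing the raw value would let negative skews such as -0.2 slip through.
    return abs(skew) > max_skew
```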
12 years agotask: mon_clock_skew_check: mark as ran once if an expected skew was found
Joao Eduardo Luis [Wed, 30 Jan 2013 20:52:03 +0000 (20:52 +0000)]
task: mon_clock_skew_check: mark as ran once if an expected skew was found

... even if we didn't get a clean/finished result from the monitors

This ought to significantly cut the waiting time if something else (or
someone else) is leaving the leader hanging thus unable to finish a given
timecheck round.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years agopeer: fix filtering out of scrub from pg state
Sage Weil [Tue, 29 Jan 2013 22:04:09 +0000 (14:04 -0800)]
peer: fix filtering out of scrub from pg state

12 years agoadmin_socket: don't bother remote executing if there is no test
Sage Weil [Tue, 29 Jan 2013 11:45:45 +0000 (03:45 -0800)]
admin_socket: don't bother remote executing if there is no test

12 years agoosd_recovery: use --no-cleanup for rados bench
Samuel Just [Tue, 29 Jan 2013 03:36:17 +0000 (19:36 -0800)]
osd_recovery: use --no-cleanup for rados bench

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoosd_recovery: inject a recovery delay
Samuel Just [Tue, 29 Jan 2013 03:22:42 +0000 (19:22 -0800)]
osd_recovery: inject a recovery delay

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoosd_backfill: --no-cleanup for rados bench
Sage Weil [Tue, 29 Jan 2013 03:53:34 +0000 (19:53 -0800)]
osd_backfill: --no-cleanup for rados bench

12 years agocram: fix for runs with coverage enabled
Josh Durgin [Mon, 28 Jan 2013 22:53:43 +0000 (14:53 -0800)]
cram: fix for runs with coverage enabled

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoosdthrasher: inject pause on a live (on in) osd
Sage Weil [Sat, 26 Jan 2013 21:13:08 +0000 (13:13 -0800)]
osdthrasher: inject pause on a live (on in) osd

12 years agotask: mon_clock_skew_check: increase timeout and kick it off only on stop
Joao Eduardo Luis [Fri, 25 Jan 2013 12:09:49 +0000 (12:09 +0000)]
task: mon_clock_skew_check: increase timeout and kick it off only on stop

We were kicking-off the timeout as soon as we started; it's better however
to kick it off only when we are told to stop (as long as 'at-least-once'
is true).

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years agotask: mon_clock_skew_check: distinguish between on-going and finished check
Joao Eduardo Luis [Thu, 24 Jan 2013 18:00:39 +0000 (18:00 +0000)]
task: mon_clock_skew_check: distinguish between on-going and finished check

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years agoceph_manager: turn long stall injection off by default
Samuel Just [Fri, 25 Jan 2013 01:31:38 +0000 (17:31 -0800)]
ceph_manager: turn long stall injection off by default

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoosd_recovery: fix up incomplete test
Sage Weil [Fri, 25 Jan 2013 00:24:16 +0000 (16:24 -0800)]
osd_recovery: fix up incomplete test

- stop rados bench from cleaning up
- flush pg stats
- fix sleep call

One or more of these helped fix this test, don't really care which.

12 years agoceph_manager: fix get_num_active_recovered()
Sage Weil [Fri, 25 Jan 2013 00:23:33 +0000 (16:23 -0800)]
ceph_manager: fix get_num_active_recovered()

The states now have 'backfill' *or* 'recover' in them.
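
As a rough sketch of such a filter, assuming pg states are '+'-joined strings
such as 'active+clean' or 'active+recovering' (the function name and the
sample states are illustrative, not taken from ceph_manager.py):

```python
def count_active_recovered(pg_states):
    """Count pgs that are active and neither recovering nor backfilling."""
    return sum(
        1 for state in pg_states
        if 'active' in state
        and 'recover' not in state
        and 'backfill' not in state
    )
```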

12 years agoworkunit: pass java path as env variable
Sage Weil [Thu, 24 Jan 2013 23:20:47 +0000 (15:20 -0800)]
workunit: pass java path as env variable

The libcephfs-java test needs this.

12 years agoceph_manager: use 80/70 as pause_long, pause_check_after defaults
Samuel Just [Thu, 24 Jan 2013 20:50:24 +0000 (12:50 -0800)]
ceph_manager: use 80/70 as pause_long, pause_check_after defaults

OSD::op_tp suicides after 150.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoceph_manager: use do_rados for rmpool
Samuel Just [Thu, 24 Jan 2013 18:07:10 +0000 (10:07 -0800)]
ceph_manager: use do_rados for rmpool

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip_heartbeat'
Sage Weil [Thu, 24 Jan 2013 02:43:02 +0000 (18:43 -0800)]
Merge remote-tracking branch 'gh/wip_heartbeat'

12 years agoceph_manager: default chance_down to 0.4
Samuel Just [Thu, 24 Jan 2013 01:44:05 +0000 (17:44 -0800)]
ceph_manager: default chance_down to 0.4

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoceph_manager: add filestore and heartbeat stalls
Samuel Just [Thu, 24 Jan 2013 00:13:22 +0000 (16:13 -0800)]
ceph_manager: add filestore and heartbeat stalls

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoUse ceph git repo instead of github.
Sandon Van Ness [Wed, 23 Jan 2013 22:08:53 +0000 (14:08 -0800)]
Use ceph git repo instead of github.

Instead of pulling the tarball from GitHub, which can be unreliable at
times, this change uses the ceph repo mirror, which serves the same
function. It now uses git archive and no longer uses wget. As a result,
less tar-fu is needed to extract the necessary files, since this can be
done directly through git archive.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoosd: Testing of deep-scrub omap changes
David Zafman [Sat, 19 Jan 2013 01:11:09 +0000 (17:11 -0800)]
osd: Testing of deep-scrub omap changes

Fix scrub_test.py and add omap corruption test

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agotest: create /tmp/cephtest/mnt.{id}
Joe Buck [Mon, 14 Jan 2013 20:09:56 +0000 (12:09 -0800)]
test: create /tmp/cephtest/mnt.{id}

The workunit task assumes that a mount exists
at /tmp/cephtest/mnt.{id}
This patch creates the path if it doesn't
exist, enabling workunits to run in the absence
of kclient or ceph-fuse tasks.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agotask: mon_clock_skew_check: add option to run at least one timecheck
Joao Eduardo Luis [Fri, 18 Jan 2013 17:43:47 +0000 (17:43 +0000)]
task: mon_clock_skew_check: add option to run at least one timecheck

  at-least-once          Runs at least once, even if we are told to stop.
                         (default: True)
  at-least-once-timeout  If we were told to stop but we are attempting to
                         run at least once, timeout after this many
                         seconds. (default: 300)

Fixes: #3854
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years agotask/mds_thrasher: New task for thrashing the mds
Sam Lang [Wed, 9 Jan 2013 22:02:42 +0000 (16:02 -0600)]
task/mds_thrasher:  New task for thrashing the mds

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agorbd.py: update scratch and test image sizes
Alex Elder [Fri, 18 Jan 2013 18:47:34 +0000 (12:47 -0600)]
rbd.py: update scratch and test image sizes

Test 167 was failing due to running out of space on the scratch
file system.  The test reserves 21MB in a file, and repeats 50
times.  It required just over 1GB, so I bumped the default size
for the testing device to 1200 MB.  I increased the test device
size as well.

This resolves http://tracker.newdream.net/issues/3864.

Signed-off-by: Alex Elder <elder@inktank.com>
12 years agoceph: pass ceph.conf to osdmaptool
Sage Weil [Thu, 17 Jan 2013 20:26:51 +0000 (12:26 -0800)]
ceph: pass ceph.conf to osdmaptool

This ensures it sees the chooseleaf option and generates the proper
CRUSH rules.

12 years agoWhen running teuthology with targets provisioned on OpenStack and kvm, the disks...
Loic Dachary [Wed, 16 Jan 2013 10:48:05 +0000 (11:48 +0100)]
When running teuthology with targets provisioned on OpenStack and kvm, the disks will show under /dev/vda, /dev/vdb etc. Add them to the list of devices to inspect and use for tests.

Signed-off-by: Loic Dachary <loic@dachary.org>
12 years agoAdd cram task
Josh Durgin [Tue, 15 Jan 2013 01:55:43 +0000 (17:55 -0800)]
Add cram task

This runs cram tests, which are an easy way to test that output
stays consistent. We already use cram for basic cli tests with no cluster,
and now we can use it for whole system tests too.

12 years agoRevert "task/kclient: chmod root to 1777."
Greg Farnum [Tue, 15 Jan 2013 00:14:08 +0000 (16:14 -0800)]
Revert "task/kclient: chmod root to 1777."

This reverts commit f17847e537802671c6f90bd1a0cdaa0e9d1e6f7a. It had
a typo and we hopefully don't need it.

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoceph.conf: separate replicas across osds
Sage Weil [Mon, 14 Jan 2013 06:52:00 +0000 (22:52 -0800)]
ceph.conf: separate replicas across osds

ceph.git master now separates across crush hosts without this setting.
For teuthology clusters, we don't want that (unless the test specifies
otherwise).

12 years agoMerge remote-tracking branch 'gh/wip-3633'
Sage Weil [Fri, 11 Jan 2013 02:04:55 +0000 (18:04 -0800)]
Merge remote-tracking branch 'gh/wip-3633'

12 years agotask/kclient: chmod root to 1777.
Greg Farnum [Tue, 8 Jan 2013 18:11:03 +0000 (10:11 -0800)]
task/kclient: chmod root to 1777.

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agotask/mpi: Allow working directory to be specified
Sam Lang [Tue, 8 Jan 2013 15:56:41 +0000 (09:56 -0600)]
task/mpi:  Allow working directory to be specified

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask: A task to setup mpi
Sam Lang [Fri, 14 Dec 2012 03:54:26 +0000 (17:54 -1000)]
task: A task to setup mpi

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/ceph-fuse: chmod root to 1777
Sam Lang [Fri, 14 Dec 2012 17:10:39 +0000 (07:10 -1000)]
task/ceph-fuse: chmod root to 1777

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/pexec: Add barrier capability
Sam Lang [Thu, 27 Dec 2012 23:33:07 +0000 (17:33 -0600)]
task/pexec: Add barrier capability

This patch adds the ability to barrier between
parallel exec tasks so that all tasks will perform
the following step (after the barrier) at the same
time.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
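
The barrier semantics can be illustrated with the standard library. This
sketch uses threading.Barrier as a stand-in; it is an assumption for the sake
of the example, and pexec itself may implement the barrier differently.

```python
import threading


def run_with_barrier(n_workers):
    """Run n_workers threads that all wait at a barrier between two steps."""
    barrier = threading.Barrier(n_workers)
    events = []
    lock = threading.Lock()

    def worker(idx):
        with lock:
            events.append(('before', idx))
        # Nobody proceeds past this point until every worker has arrived.
        barrier.wait()
        with lock:
            events.append(('after', idx))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

Every 'before' event is recorded ahead of any 'after' event, regardless of
how the threads are scheduled.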
12 years agotask/pexec: More fixes for all case, exec on hosts
Sam Lang [Fri, 14 Dec 2012 17:30:15 +0000 (07:30 -1000)]
task/pexec: More fixes for all case, exec on hosts

We don't want to do an exec per role, but per-host.  We
were already doing an exec per host, but the names were confusing.
This fixes the names up and removes the role parameters.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/pexec: Fix when 'all' is used
Sam Lang [Tue, 11 Dec 2012 16:53:57 +0000 (06:53 -1000)]
task/pexec: Fix when 'all' is used

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask: mon_clock_skew_check.py: Check for clock skews on the monitors
Joao Eduardo Luis [Fri, 4 Jan 2013 18:16:58 +0000 (18:16 +0000)]
task: mon_clock_skew_check.py: Check for clock skews on the monitors

Will run for as long as teuthology runs. By default, fails if any clock
skews higher than 0.05 seconds are detected, but will only fail when the
teuthology run finishes and after reporting a list of all the detected
skews.

Accepted options:

 interval     number of seconds to wait between checks. (default: 30.0)
 max-skew     maximum skew, in seconds, that is considered tolerable
              before issuing a warning. (default: 0.05)
 expect-skew  'true' or 'false', to indicate whether to expect a skew
              during the run or not. If 'true', the test will fail if no
              skew is found, and succeed if a skew is indeed found; if
              'false', it's the other way around. (default: false)
 never-fail   Don't fail the run if a skew is detected and we weren't
              expecting it, or if no skew is detected and we were
              expecting it. (default: False)

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
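
The interplay of expect-skew and never-fail can be summarized in a small
sketch. The function and parameter names are hypothetical; only the decision
rules are taken from the option descriptions above.

```python
def run_should_fail(skews_found, expect_skew=False, never_fail=False):
    """Decide whether the run fails, given whether any skew was detected."""
    if never_fail:
        # never-fail overrides both mismatch cases.
        return False
    if expect_skew:
        # A skew was expected: fail only if none was found.
        return not skews_found
    # No skew was expected: fail if one was found.
    return bool(skews_found)
```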
12 years agotask: ceph_manager: add 'get_mon_health' function
Joao Eduardo Luis [Fri, 4 Jan 2013 17:03:55 +0000 (17:03 +0000)]
task: ceph_manager: add 'get_mon_health' function

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years agossh_keys.py: pull the keys out of targets entry
Joe Buck [Thu, 13 Dec 2012 22:42:09 +0000 (14:42 -0800)]
ssh_keys.py: pull the keys out of targets entry
rather than the host's known_hosts file.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoceph: malloc check =3 means we hear on stderr too
Sage Weil [Thu, 3 Jan 2013 04:44:33 +0000 (20:44 -0800)]
ceph: malloc check =3 means we hear on stderr too

12 years agoceph: enable malloc debugging for ceph-osd
Sage Weil [Wed, 2 Jan 2013 20:31:48 +0000 (12:31 -0800)]
ceph: enable malloc debugging for ceph-osd

12 years agotask: ceph: don't wait for 'healthy' if 'wait-for-healthy' is false.
Joao Eduardo Luis [Mon, 31 Dec 2012 16:11:50 +0000 (16:11 +0000)]
task: ceph: don't wait for 'healthy' if 'wait-for-healthy' is false.

This new config option defaults to 'true', both to maintain
compatibility and because it makes sense.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
12 years agorgw: enable logging in ceph.conf
Sage Weil [Sat, 29 Dec 2012 16:28:44 +0000 (08:28 -0800)]
rgw: enable logging in ceph.conf

12 years agotask/swift: change upstream repository url
Yehuda Sadeh [Fri, 21 Dec 2012 18:20:02 +0000 (10:20 -0800)]
task/swift: change upstream repository url

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoCephManager: add ability to test split
Samuel Just [Tue, 11 Dec 2012 22:21:48 +0000 (14:21 -0800)]
CephManager: add ability to test split

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agopexec.py: Parse out role ID from the back.
Joe Buck [Thu, 6 Dec 2012 22:17:16 +0000 (14:17 -0800)]
pexec.py: Parse out role ID from the back.
Also, do not assume that the command needs to run from a specific directory.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
12 years agoAdding a Hadoop task.
Joe Buck [Thu, 6 Dec 2012 22:19:55 +0000 (14:19 -0800)]
Adding a Hadoop task.
This task configures and starts a Hadoop cluster.
It does not run any jobs; that must be done after
this task runs.
Can run on either Ceph or HDFS.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
12 years agoNew ssh task that adds keys for node -> node ssh.
Joe Buck [Thu, 6 Dec 2012 22:18:41 +0000 (14:18 -0800)]
New ssh task that adds keys for node -> node ssh.
This generates a new keypair, pushes it to all nodes
in the context and adds all hosts to all other hosts'
.ssh/authorized_keys files.
Cleans up all keys and authorized_keys entries
afterwards.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoceph.conf: default to smaller recovery chunk
Samuel Just [Mon, 10 Dec 2012 22:33:41 +0000 (14:33 -0800)]
ceph.conf: default to smaller recovery chunk

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>