git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agofix misc checks that wait for N osds to be up
Sage Weil [Thu, 19 Apr 2012 19:43:54 +0000 (12:43 -0700)]
fix misc checks that wait for N osds to be up

These all cut&pasted broken code, blah!

13 years agowhitelist xfs_fsr syslog noise
Sage Weil [Wed, 18 Apr 2012 18:21:10 +0000 (11:21 -0700)]
whitelist xfs_fsr syslog noise

Ignore lines like

2012-04-17T13:44:11-07:00 plana59 fsr[5454]: DEBUG: fsize=450560 blsz_dio=450560 d_min=512 d_max=2147483136 pgsz=4096

13 years agoAdd task for running fsx on an rbd image.
Josh Durgin [Thu, 12 Apr 2012 01:03:44 +0000 (18:03 -0700)]
Add task for running fsx on an rbd image.

13 years agofilestore_idempotent: use new sequence-based tester
Sage Weil [Sat, 14 Apr 2012 21:06:12 +0000 (14:06 -0700)]
filestore_idempotent: use new sequence-based tester

random seed, inject at 50-300.

13 years agorbd.py: add xfstests functionality
Sage Weil [Sat, 14 Apr 2012 05:28:05 +0000 (22:28 -0700)]
rbd.py: add xfstests functionality

Add tasks for running xfstests over a pair of rbd volumes.  The main
one is called xfstests, and it sets up rbd volumes of specified size
and runs a set of likely-to-be-successful tests.  The other one is
used by the first, and is called run_xfstests.  This provides a
generic (device rather than rbd device oriented) interface to
xfstests, and should probably be made standalone and distinct from
rbd at some point.

Using multiple rbd devices required the rbd udev rule manipulation
to ignore errors, since it appears that each device caused a
teardown attempt, which leads to failures the second time around.
There's probably a more robust solution, but this works for now.

Signed-off-by: Alex Elder <elder@dreamhost.com>
13 years agoceph_manager: don't try to start greenlet twice
Josh Durgin [Tue, 10 Apr 2012 23:23:58 +0000 (16:23 -0700)]
ceph_manager: don't try to start greenlet twice

spawn already scheduled it. Trying to start it again hits an assert.

13 years agokernel: kludge around mysterious 0-byte .git/HEAD files
Sage Weil [Tue, 10 Apr 2012 20:41:16 +0000 (13:41 -0700)]
kernel: kludge around mysterious 0-byte .git/HEAD files

No idea where these are coming from, but they break nodes with behavior
like

ubuntu@plana08:~$ sudo install -d -m0755 /lib/firmware/updates && cd /lib/firmware/updates && sudo git init
Reinitialized existing Git repository in /lib/firmware/updates/.git/
ubuntu@plana08:/lib/firmware/updates$ sudo git --git-dir=/lib/firmware/updates/.git config --get remote.origin.url >/dev/null || sudo git --git-dir=/lib/firmware/updates/.git remote add origin git://ceph.newdream.net/git/linux-firmware.git
ubuntu@plana08:/lib/firmware/updates$ cd /lib/firmware/updates && sudo git pull origin master
fatal: Not a git repository (or any of the parent directories): .git

where the .git directory looks like

total 32
drwxr-xr-x 7 root root 4096 2012-04-10 12:52 .
drwxr-xr-x 3 root root 4096 2012-04-06 13:54 ..
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 branches
-rwxr--r-- 1 root root  236 2012-04-10 11:33 config
-rw-r--r-- 1 root root    0 2012-04-10 12:52 config.lock
-rw-r--r-- 1 root root    0 2012-04-06 13:54 description
-rw-r--r-- 1 root root    0 2012-04-06 13:54 FETCH_HEAD
-rw-r--r-- 1 root root    0 2012-04-06 13:54 HEAD
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 hooks
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 info
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 objects
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 refs

Hopefully someone can figure out what is causing this and revert this
later.

13 years agokernel: reset to remote firmware branch; don't pull
Sage Weil [Tue, 10 Apr 2012 16:17:24 +0000 (09:17 -0700)]
kernel: reset to remote firmware branch; don't pull

Pull might merge if upstream rebases.  Just make our branch match the
remote one.
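
A minimal sketch of the idea, assuming a remote named origin and a branch named master (the names that appear in the `git pull origin master` transcript in this log). The helper name is hypothetical and the exact commands used by this commit are not shown here:

```python
def firmware_sync_commands(remote="origin", branch="master"):
    """Return git argv lists that make the local branch match the remote one.

    Hypothetical helper for illustration. `git pull` may create a merge
    commit when upstream has been rebased; fetch followed by a hard reset
    just moves the local branch to the fetched remote-tracking ref.
    """
    return [
        ["git", "fetch", remote],
        ["git", "reset", "--hard", "{0}/{1}".format(remote, branch)],
    ]


for argv in firmware_sync_commands():
    print(" ".join(argv))
```

Note that a hard reset discards local changes, which is acceptable only because the checkout is a throwaway mirror of upstream.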

13 years agokernel: change git incantation for firmware pull
Sage Weil [Tue, 10 Apr 2012 16:12:01 +0000 (09:12 -0700)]
kernel: change git incantation for firmware pull

The 'git pull <uri>' seemed to consistently fail on some nodes.  Can't be
sure this was really the problem with them all down now, but this is more
common, and works.

13 years agols: another newline
Sage Weil [Tue, 10 Apr 2012 15:59:47 +0000 (08:59 -0700)]
ls: another newline

13 years agols: remove stray newline
Sage Weil [Tue, 10 Apr 2012 15:57:19 +0000 (08:57 -0700)]
ls: remove stray newline

13 years agoChange to local mirror of linux-firmware repo to try to stop failures
Dan Mick [Mon, 9 Apr 2012 23:58:59 +0000 (16:58 -0700)]
Change to local mirror of linux-firmware repo to try to stop failures

13 years agoKernel: Pull linux-firmware from git
Mark Nelson [Tue, 27 Mar 2012 22:25:41 +0000 (17:25 -0500)]
Kernel: Pull linux-firmware from git

Signed-off-by: Mark Nelson <nhm@clusterfaq.org>
13 years agocleanup-and-unlock.sh: helper to nuke and then unlock a set of nodes
Sage Weil [Wed, 4 Apr 2012 20:56:10 +0000 (13:56 -0700)]
cleanup-and-unlock.sh: helper to nuke and then unlock a set of nodes

I usually do something like

 teuthology-lock --list-targets --owner scheduled_sage@metropolis > /tmp/b
 ./cleanup-and-unlock.sh /tmp/b scheduled_sage@metropolis

It's a huge headache when some of the nodes are down, though.  A better
thing would be if nuke had an --unlock option, and would continue with the
nodes that didn't error out.

But, this is still useful as is.

13 years agoschedule_suite.sh: helper to schedule a suite
Sage Weil [Wed, 4 Apr 2012 20:54:43 +0000 (13:54 -0700)]
schedule_suite.sh: helper to schedule a suite

There's a bunch of stuff hardcoded in here, similar to the nightly, but
it's a useful starting point.

13 years agoAdded assertion to check that targets > roles
Mark Nelson [Tue, 3 Apr 2012 21:53:17 +0000 (14:53 -0700)]
Added assertion to check that targets > roles

Signed-off-by: Mark Nelson <mark.nelson@dreamhost.com>
13 years agonuke: don't run umount when no xargs args
Sage Weil [Tue, 3 Apr 2012 22:56:36 +0000 (15:56 -0700)]
nuke: don't run umount when no xargs args

Gets rid of this noise:

INFO:teuthology.nuke:Unmount any osd data directories...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err:       umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err:       umount [-d] [-f] [-r] [-n] [-v] special | node...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err:       umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err:       umount [-d] [-f] [-r] [-n] [-v] special | node...
...
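
The usage noise above is what umount prints when it is invoked with no arguments. A rough sketch of the guard, with a hypothetical helper name (with GNU xargs, the `-r` / `--no-run-if-empty` flag has the same effect; whether this commit used that flag is not shown here):

```python
def umount_argv(mount_points):
    """Build a umount command, or return None when there is nothing to unmount.

    Hypothetical helper: skipping the call entirely avoids running umount
    with an empty argument list, which only prints its usage text.
    """
    mount_points = list(mount_points)
    if not mount_points:
        return None
    return ["sudo", "umount"] + mount_points


print(umount_argv([]))
print(umount_argv(["/tmp/cephtest/data/osd.0.data"]))
```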

13 years agoceph.conf: enable 'osd recover clone overlap'
Sage Weil [Fri, 30 Mar 2012 23:15:20 +0000 (16:15 -0700)]
ceph.conf: enable 'osd recover clone overlap'

to test the recovery cloning in qa.  this was redone, but forgot to enable
it in qa.

13 years agomake Thrasher not inherit from Greenlet
Samuel Just [Fri, 30 Mar 2012 01:07:30 +0000 (18:07 -0700)]
make Thrasher not inherit from Greenlet

13 years agoAdd test for object source marked down
Samuel Just [Fri, 30 Mar 2012 01:07:10 +0000 (18:07 -0700)]
Add test for object source marked down

13 years agoallow use of a separate journal block device
Samuel Just [Tue, 27 Mar 2012 22:05:11 +0000 (15:05 -0700)]
allow use of a separate journal block device

13 years agorbd: fix typo in default config
Josh Durgin [Mon, 26 Mar 2012 18:54:49 +0000 (11:54 -0700)]
rbd: fix typo in default config

pyflakes would have caught this if 'all' weren't a built-in function

13 years agoadd osd_recovery task to test divergent osd logs
Sage Weil [Sat, 24 Mar 2012 23:42:47 +0000 (16:42 -0700)]
add osd_recovery task to test divergent osd logs

13 years agobackfill: use 'rbd' pool instead of 'data'
Sage Weil [Sat, 24 Mar 2012 23:43:19 +0000 (16:43 -0700)]
backfill: use 'rbd' pool instead of 'data'

(data has a replay interval, which makes writes take longer to resume
after repeering)

13 years agorename backfill -> osd_backfill
Sage Weil [Sat, 24 Mar 2012 23:05:11 +0000 (16:05 -0700)]
rename backfill -> osd_backfill

13 years agoput filestore xattr option in [global]
Sage Weil [Sat, 24 Mar 2012 22:35:43 +0000 (15:35 -0700)]
put filestore xattr option in [global]

...for test_filestore_idempotent's benefit

13 years agosuite: add missing print statement
Josh Durgin [Wed, 21 Mar 2012 19:00:55 +0000 (12:00 -0700)]
suite: add missing print statement

13 years agosuite: fix print statement when summary doesn't exist
Josh Durgin [Wed, 21 Mar 2012 18:58:17 +0000 (11:58 -0700)]
suite: fix print statement when summary doesn't exist

13 years agoAdd watch op to rados.py
Samuel Just [Wed, 21 Mar 2012 01:56:20 +0000 (18:56 -0700)]
Add watch op to rados.py

Signed-off-by: Samuel Just <sam.just@dreamhost.com>
13 years agosuite: failed runs might not have durations
Josh Durgin [Tue, 20 Mar 2012 14:48:45 +0000 (07:48 -0700)]
suite: failed runs might not have durations

This was one cause of emails not being sent - stale /tmp/cephtest dirs
fail without recording a duration.
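
A sketch of the defensive lookup, assuming the per-job summary is a dict; the key names `success` and `duration` and the helper name are assumptions for illustration:

```python
def describe_job(summary):
    """Format one result line without assuming a duration was recorded."""
    status = "pass" if summary.get("success") else "fail"
    duration = summary.get("duration")  # may be absent for jobs that died early
    if duration is None:
        return "{0} (duration unknown)".format(status)
    return "{0} ({1:.0f}s)".format(status, duration)


print(describe_job({"success": True, "duration": 12.4}))
print(describe_job({"success": False}))
```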

13 years agosuite, coverage: use absolute dirs for isdir checks
Josh Durgin [Mon, 19 Mar 2012 21:16:14 +0000 (14:16 -0700)]
suite, coverage: use absolute dirs for isdir checks

This fixes the results to wait for all jobs to complete again.
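
For context, `os.listdir()` returns bare names, so passing them straight to `os.path.isdir()` resolves them against the current working directory rather than the directory that was listed. A small sketch of the corrected pattern (the function and directory names are hypothetical):

```python
import os
import tempfile


def list_job_dirs(archive_dir):
    """Return the names of subdirectories of archive_dir.

    Each name is joined with archive_dir before the isdir check, so the
    result does not depend on the process's current working directory.
    """
    return sorted(
        name
        for name in os.listdir(archive_dir)
        if os.path.isdir(os.path.join(archive_dir, name))
    )


with tempfile.TemporaryDirectory() as archive:
    os.mkdir(os.path.join(archive, "job1"))
    open(os.path.join(archive, "results.log"), "w").close()
    found = list_job_dirs(archive)

print(found)
```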

13 years agofilestore_idempotent: get coverage and coredumps
Josh Durgin [Mon, 19 Mar 2012 18:57:02 +0000 (11:57 -0700)]
filestore_idempotent: get coverage and coredumps

13 years agosuite: more results logging
Josh Durgin [Mon, 19 Mar 2012 18:31:33 +0000 (11:31 -0700)]
suite: more results logging

13 years agoceph.conf: no comment
Sage Weil [Sun, 18 Mar 2012 18:56:18 +0000 (11:56 -0700)]
ceph.conf: no comment

13 years agoceph.conf: set 'filestore xattr use omap = true'
Sage Weil [Sun, 18 Mar 2012 18:06:05 +0000 (11:06 -0700)]
ceph.conf: set 'filestore xattr use omap = true'

13 years agofix teuthology-ls isdir check
Sage Weil [Sun, 18 Mar 2012 17:50:17 +0000 (10:50 -0700)]
fix teuthology-ls isdir check

13 years agorun valgrind with cwd set to /tmp/cephtest/archive/coredump
Sage Weil [Wed, 14 Mar 2012 20:20:54 +0000 (13:20 -0700)]
run valgrind with cwd set to /tmp/cephtest/archive/coredump

This lets us capture the vgcore.* files, which always go to valgrind's
cwd.
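
A sketch of running a child process with a chosen working directory, using a temporary directory and a trivial Python child in place of the real archive path and valgrind command:

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(argv, workdir):
    """Run argv with its working directory set to workdir and capture stdout."""
    return subprocess.run(
        argv, cwd=workdir, capture_output=True, text=True, check=True
    )


with tempfile.TemporaryDirectory() as coredump_dir:
    # The child just reports its cwd; files it wrote "to cwd" would land here.
    child = [sys.executable, "-c", "import os; print(os.getcwd())"]
    reported = run_in_dir(child, coredump_dir).stdout.strip()
    same = os.path.realpath(reported) == os.path.realpath(coredump_dir)

print(same)
```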

Fixes: #1953
13 years agosuite: log results and coverage generation
Josh Durgin [Fri, 16 Mar 2012 18:40:17 +0000 (11:40 -0700)]
suite: log results and coverage generation

Need to figure out where and when results emails are failing.

13 years agoresults: make sure email is sent before anything else fails
Josh Durgin [Thu, 15 Mar 2012 23:21:33 +0000 (16:21 -0700)]
results: make sure email is sent before anything else fails

13 years agoMerge branch 'master' of github.com:ceph/teuthology
Mark Nelson [Wed, 14 Mar 2012 20:32:23 +0000 (15:32 -0500)]
Merge branch 'master' of github.com:ceph/teuthology

13 years agogitbuilder: put flavor last
Sage Weil [Tue, 13 Mar 2012 17:09:18 +0000 (10:09 -0700)]
gitbuilder: put flavor last

in case we refine the field later

13 years agoPull from new gitbuilder.ceph.com locations.
Sage Weil [Tue, 13 Mar 2012 17:02:26 +0000 (10:02 -0700)]
Pull from new gitbuilder.ceph.com locations.

Simplifies the flavor stuff into a tuple of

<package,type,flavor,dist,arch>

where package is ceph, kernel, etc.
type is tarball, deb
flavor is basic, gcov, notcmalloc
arch is x86_64, i686 (uname -m)
dist is oneiric, etc. (lsb_release -s -c)
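
One way to picture the tuple is as a named record. This is only an illustration: the class name `BuildSpec` is hypothetical, and the field order follows the tuple as written above:

```python
from collections import namedtuple

# Field order mirrors <package,type,flavor,dist,arch> from the commit message.
BuildSpec = namedtuple("BuildSpec", ["package", "type", "flavor", "dist", "arch"])

spec = BuildSpec(
    package="ceph",
    type="tarball",
    flavor="basic",
    dist="oneiric",   # as reported by `lsb_release -s -c`
    arch="x86_64",    # as reported by `uname -m`
)

print(spec)
```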

13 years agoMade the example better with multiple roles.
Mark Nelson [Mon, 12 Mar 2012 20:13:36 +0000 (15:13 -0500)]
Made the example better with multiple roles.

13 years agoAdded some example yaml files and an example parallel execution task.
Mark Nelson [Mon, 12 Mar 2012 19:33:10 +0000 (14:33 -0500)]
Added some example yaml files and an example parallel execution task.

13 years agoautotest: pull from github.com/ceph/autotest
Sage Weil [Sun, 11 Mar 2012 03:15:21 +0000 (19:15 -0800)]
autotest: pull from github.com/ceph/autotest

13 years agoworkunit: include python2.7 path too
Sage Weil [Sat, 10 Mar 2012 23:34:19 +0000 (15:34 -0800)]
workunit: include python2.7 path too

13 years agorados.py: include setattr and rmattr
Samuel Just [Fri, 17 Feb 2012 00:10:45 +0000 (16:10 -0800)]
rados.py: include setattr and rmattr

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agolock: Improved logging when there aren't enough nodes available to lock-many.
Mark Nelson [Wed, 7 Mar 2012 16:34:55 +0000 (08:34 -0800)]
lock: Improved logging when there aren't enough nodes available to lock-many.

13 years agolock: Added a --locked flag to teuthology-lock.
Mark Nelson [Wed, 7 Mar 2012 17:02:39 +0000 (09:02 -0800)]
lock: Added a --locked flag to teuthology-lock.

Can be used to restrict searches based on lock status, e.g.
'teuthology-lock --list -a --locked false --status up' shows available nodes.

13 years agonuke: unmount osd data directories
Sage Weil [Tue, 6 Mar 2012 17:34:38 +0000 (09:34 -0800)]
nuke: unmount osd data directories

This helps us avoid reboot to clean up osd data directories that are left
mounted.

13 years agoUse non-zero exit status if any tests failed
Josh Durgin [Mon, 5 Mar 2012 18:28:35 +0000 (10:28 -0800)]
Use non-zero exit status if any tests failed

Fixes: #1989
13 years agogithub.com/NewDreamNetwork -> github.com/ceph
Sage Weil [Fri, 2 Mar 2012 18:55:19 +0000 (10:55 -0800)]
github.com/NewDreamNetwork -> github.com/ceph

13 years agodump_stuck: note required ceph configuration
Josh Durgin [Wed, 29 Feb 2012 23:47:17 +0000 (15:47 -0800)]
dump_stuck: note required ceph configuration

13 years agodump_stuck: verify that 'ceph health' mentions the right number of inactive/unclean...
Josh Durgin [Tue, 28 Feb 2012 21:55:46 +0000 (13:55 -0800)]
dump_stuck: verify that 'ceph health' mentions the right number of inactive/unclean/stale pgs

13 years agopeer: ignore +scrubbing portion of pg state
Sage Weil [Tue, 28 Feb 2012 17:50:29 +0000 (09:50 -0800)]
peer: ignore +scrubbing portion of pg state

It can cause the mon state and osd states to not match.
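
A sketch of comparing states while ignoring the scrubbing flag, assuming pg states are "+"-joined strings such as active+clean; the helper name is hypothetical:

```python
def comparable_state(state, ignored=("scrubbing",)):
    """Reduce a '+'-joined pg state string to an order-independent set.

    Flags listed in `ignored` are dropped, so a pg that is scrubbing
    compares equal to the same pg when it is not.
    """
    return frozenset(state.split("+")) - frozenset(ignored)


mon_view = comparable_state("active+clean+scrubbing")
osd_view = comparable_state("clean+active")
print(mon_view == osd_view)
```

Using a set also avoids depending on the order in which the flags are listed.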

13 years agopeer: wait for peering to complete, or block
Sage Weil [Sun, 26 Feb 2012 05:05:00 +0000 (21:05 -0800)]
peer: wait for peering to complete, or block

We need to wait for peering to either complete, or block because it is
waiting for another PG.  _Then_ look at all the PG states and compare the
mon values with what we get from querying the OSDs directly.

13 years agopeer: remove unused variable
Josh Durgin [Fri, 24 Feb 2012 23:01:34 +0000 (15:01 -0800)]
peer: remove unused variable

13 years agomisc: always return a usable result from get_valgrind_args
Josh Durgin [Fri, 24 Feb 2012 22:55:49 +0000 (14:55 -0800)]
misc: always return a usable result from get_valgrind_args

13 years agorgw: simplify valgrind args
Josh Durgin [Fri, 24 Feb 2012 22:55:23 +0000 (14:55 -0800)]
rgw: simplify valgrind args

13 years agoadd peer task
Sage Weil [Fri, 24 Feb 2012 23:05:17 +0000 (15:05 -0800)]
add peer task

Force a pg to get stuck in 'down' state, verify we can query the peering
state, then start the OSD so it can recover.

13 years agolost_unfound: list missing/unfound for each pg and verify the unfound counts
Sage Weil [Fri, 24 Feb 2012 19:11:59 +0000 (11:11 -0800)]
lost_unfound: list missing/unfound for each pg and verify the unfound counts

This also tests the pg list_missing functionality.

13 years agoceph_manager: list_pg_missing
Sage Weil [Fri, 24 Feb 2012 17:22:03 +0000 (09:22 -0800)]
ceph_manager: list_pg_missing

List missing objects for the given pgid.

13 years agoWhitespace and unnecessary formatting fixes
Josh Durgin [Fri, 24 Feb 2012 20:04:58 +0000 (12:04 -0800)]
Whitespace and unnecessary formatting fixes

13 years agoceph, ceph-fuse: simplify valgrind argument additions
Josh Durgin [Fri, 24 Feb 2012 19:21:04 +0000 (11:21 -0800)]
ceph, ceph-fuse: simplify valgrind argument additions

13 years agorefactor all valgrind users to use a get_valgrind_args() helper
Sage Weil [Wed, 22 Feb 2012 17:18:17 +0000 (09:18 -0800)]
refactor all valgrind users to use a get_valgrind_args() helper

This avoids much annoying, duplicated code.

13 years agoceph: always create valgrind logs dir
Sage Weil [Wed, 22 Feb 2012 01:06:50 +0000 (17:06 -0800)]
ceph: always create valgrind logs dir

Other tasks use it too.  It's more annoying to conditionally create it.

13 years agoceph: always try to process valgrind logs
Sage Weil [Wed, 22 Feb 2012 00:10:37 +0000 (16:10 -0800)]
ceph: always try to process valgrind logs

Check for errors in valgrind logs even if there is no valgrind option in
the ceph task config stanza.  Other tasks can run via valgrind (ceph-fuse,
rgw).  If the logs aren't there, this is harmless.

13 years agorgw: add valgrind support
Sage Weil [Wed, 22 Feb 2012 00:08:21 +0000 (16:08 -0800)]
rgw: add valgrind support

tasks:
- ceph:
- rgw:
   client.a:
     valgrind: [--tool=memcheck]

13 years agorgw: accept dict
Sage Weil [Tue, 21 Feb 2012 23:47:32 +0000 (15:47 -0800)]
rgw: accept dict

e.g.,

tasks:
...
- rgw:
    client.0:
    client.1:

13 years agolost_unfound: new mark_unfound_lost syntax
Sage Weil [Fri, 24 Feb 2012 04:07:24 +0000 (20:07 -0800)]
lost_unfound: new mark_unfound_lost syntax

13 years agodump_stuck: flush stats before waiting for recovery/clean
Josh Durgin [Fri, 24 Feb 2012 01:07:26 +0000 (17:07 -0800)]
dump_stuck: flush stats before waiting for recovery/clean

13 years agoAdd a task for testing stuck pg visibility.
Josh Durgin [Tue, 21 Feb 2012 21:11:05 +0000 (13:11 -0800)]
Add a task for testing stuck pg visibility.

13 years agoMove duration calculation to an internal task
Josh Durgin [Tue, 21 Feb 2012 23:01:45 +0000 (15:01 -0800)]
Move duration calculation to an internal task

This excludes all generic start up costs, like waiting for locks,
rebooting into a new kernel, etc.

13 years agoAdd necessary imports for s3 tasks, and keep them alphabetical.
Josh Durgin [Tue, 21 Feb 2012 22:54:33 +0000 (14:54 -0800)]
Add necessary imports for s3 tasks, and keep them alphabetical.

13 years agos3roundtrip, s3readwrite: access key uses url safe chars
Yehuda Sadeh [Tue, 21 Feb 2012 20:23:38 +0000 (12:23 -0800)]
s3roundtrip, s3readwrite: access key uses url safe chars

Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years agorgw: access key uses url safe chars
Yehuda Sadeh [Tue, 21 Feb 2012 20:12:03 +0000 (12:12 -0800)]
rgw: access key uses url safe chars

Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years agoceph: valgrind trumps coverage when picking a flavor
Sage Weil [Mon, 20 Feb 2012 23:17:52 +0000 (15:17 -0800)]
ceph: valgrind trumps coverage when picking a flavor

valgrind will crash if we don't use notcmalloc; coverage will silently
fail to collect coverage info.
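
The precedence can be sketched as below. The flavor names (basic, gcov, notcmalloc) come from the gitbuilder commit in this log; the config keys `valgrind` and `coverage`, the mapping of coverage to gcov, and the function name are assumptions:

```python
def pick_flavor(config):
    """Choose a build flavor; valgrind takes priority over coverage."""
    if config.get("valgrind"):
        # valgrind crashes unless the build avoids tcmalloc
        return "notcmalloc"
    if config.get("coverage"):
        return "gcov"
    return "basic"


print(pick_flavor({"valgrind": ["--tool=memcheck"], "coverage": True}))
```

When both options are set, coverage data will not be collected, since the notcmalloc flavor is chosen instead of gcov.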

13 years agoceph.conf: no lockdep by default
Sage Weil [Mon, 20 Feb 2012 22:54:10 +0000 (14:54 -0800)]
ceph.conf: no lockdep by default

13 years agosuite.results: include test duration in output
Sage Weil [Mon, 20 Feb 2012 21:38:06 +0000 (13:38 -0800)]
suite.results: include test duration in output

13 years agocfuse -> ceph-fuse
Sage Weil [Mon, 20 Feb 2012 15:12:53 +0000 (07:12 -0800)]
cfuse -> ceph-fuse

13 years agoceph: allow valgrind per-type (not just per-name)
Sage Weil [Mon, 20 Feb 2012 15:04:45 +0000 (07:04 -0800)]
ceph: allow valgrind per-type (not just per-name)

13 years agolost_unfound: mark osds in when we revive them
Sage Weil [Mon, 20 Feb 2012 03:40:45 +0000 (19:40 -0800)]
lost_unfound: mark osds in when we revive them

so that we test what we meant to.  It also lets us actually go clean at the
very end.

13 years agoceph_manager: ignore stale states when counting
Sage Weil [Sat, 18 Feb 2012 22:44:53 +0000 (14:44 -0800)]
ceph_manager: ignore stale states when counting

also remove assumptions about ordering of states

13 years agowait_till_clean -> wait_for_clean and wait_for_recovery
Sage Weil [Sat, 18 Feb 2012 05:53:25 +0000 (21:53 -0800)]
wait_till_clean -> wait_for_clean and wait_for_recovery

Clean now also means the correct number of replicas, whereas recovered
means we have done all the work we can do given the replicas/osds we have.
For example, degraded and clean are now mutually exclusive.

Also move away from 'till'.

13 years agobackfill: wait for clean before writing+blackholing
Sage Weil [Tue, 14 Feb 2012 23:24:11 +0000 (15:24 -0800)]
backfill: wait for clean before writing+blackholing

If we have straggler pgs and blackhole osd.1, we can deadlock because we
need info from that osd to repeer and continue.  Make sure we're clean, and
then start the write + blackhole + kill test.

13 years agonuke: nuke testrados too
Sage Weil [Tue, 14 Feb 2012 23:23:19 +0000 (15:23 -0800)]
nuke: nuke testrados too

Slightly fewer nuke -r's

13 years agoceph_manager: mark in a bit more often than out
Sage Weil [Sun, 12 Feb 2012 22:36:11 +0000 (14:36 -0800)]
ceph_manager: mark in a bit more often than out

Otherwise we can get into cases where many/most nodes are out, and things
don't work as well.  e.g., crush may start to fail.

13 years agoceph: use any fs, not just btrfs, on scratch devices
Sage Weil [Sat, 11 Feb 2012 22:24:39 +0000 (14:24 -0800)]
ceph: use any fs, not just btrfs, on scratch devices

The

  btrfs: true

syntax is replaced with

  fs: btrfs

or ext4, xfs.

13 years agonuke: nuke testrados and rados processes, too
Sage Weil [Sat, 11 Feb 2012 22:20:41 +0000 (14:20 -0800)]
nuke: nuke testrados and rados processes, too

So that -r is needed slightly less often.

13 years agomisc: make get_scratch_devices look for (almost) any disk that's not mounted
Sage Weil [Sat, 11 Feb 2012 22:20:18 +0000 (14:20 -0800)]
misc: make get_scratch_devices look for (almost) any disk that's not mounted

13 years agohammer.sh: assume path is set
Sage Weil [Sat, 11 Feb 2012 22:19:49 +0000 (14:19 -0800)]
hammer.sh: assume path is set

13 years agoceph: always add logger for daemons
Josh Durgin [Thu, 2 Feb 2012 17:29:03 +0000 (09:29 -0800)]
ceph: always add logger for daemons

The extra log function added redundant info and didn't allow different
levels.

13 years agoceph: rename type parameter to type_
Josh Durgin [Thu, 2 Feb 2012 17:27:11 +0000 (09:27 -0800)]
ceph: rename type parameter to type_

type is a built-in and shouldn't be aliased.
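
A small illustration of why the trailing underscore helps (the function below is hypothetical, not the one changed by this commit): with the parameter named `type_`, the built-in `type()` is still usable inside the function body.

```python
def describe(type_, id_):
    """Format a daemon name; the built-in type() stays available."""
    return "{0}.{1} ({2})".format(type_, id_, type(id_).__name__)


print(describe("osd", 0))
```

Had the parameter been called `type`, the call `type(id_)` would have tried to call the string "osd" and raised a TypeError.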

13 years agoceph: use the correct comparison operator
Josh Durgin [Thu, 2 Feb 2012 17:27:04 +0000 (09:27 -0800)]
ceph: use the correct comparison operator

'is' compares identity (i.e. address in cpython), not value.
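
A short demonstration of the difference, using lists so the outcome does not depend on interpreter caching of small strings or integers:

```python
first = ["osd", "0"]
second = ["osd", "0"]
alias = first

equal_value = first == second   # same contents
same_object = first is second   # two separate list objects
alias_same = alias is first     # both names refer to one object

print(equal_value, same_object, alias_same)
```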

13 years agoceph: sync before unmounting btrfs devices
Josh Durgin [Thu, 2 Feb 2012 17:26:45 +0000 (09:26 -0800)]
ceph: sync before unmounting btrfs devices

There may still be writes in flight, since the osds may not have
shutdown cleanly. This should prevent EBUSY when unmounting.

Fixes: #1997
13 years agoceph: delay raising exceptions until all daemons are stopped
Josh Durgin [Thu, 2 Feb 2012 17:26:25 +0000 (09:26 -0800)]
ceph: delay raising exceptions until all daemons are stopped

If a daemon crashes, the exception is raised when we stop it. This
caused some daemons to continue running during cleanup, since the rest
of the daemons of the same type would not be shut down. Also log each
daemon that crashed, for easier debugging.
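
The pattern can be sketched as follows. `FakeDaemon` and `stop_all` are hypothetical stand-ins, not the task's real classes: every daemon is stopped first, each failure is logged, and a single exception is raised at the end.

```python
class FakeDaemon:
    """Stand-in for a daemon handle; stop() raises if the daemon had crashed."""

    def __init__(self, name, crashed=False):
        self.name = name
        self.crashed = crashed
        self.running = True

    def stop(self):
        self.running = False
        if self.crashed:
            raise RuntimeError("{0} exited abnormally".format(self.name))


def stop_all(daemons, log=print):
    """Stop every daemon, then raise once if any of them had crashed."""
    crashed = []
    for daemon in daemons:
        try:
            daemon.stop()
        except Exception as exc:
            # Record and keep going so the remaining daemons still get stopped.
            log("daemon {0} crashed: {1}".format(daemon.name, exc))
            crashed.append(daemon.name)
    if crashed:
        raise RuntimeError("crashed daemons: " + ", ".join(crashed))
```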

Fixes: #1744
13 years agoadd backfill task
Sage Weil [Wed, 1 Feb 2012 00:25:53 +0000 (16:25 -0800)]
add backfill task

This does a basic test of backfill functionality, including a divergent
log on a backfill target (#1983).

13 years agoceph_manager: add manager.blackhole_kill_osd()
Sage Weil [Wed, 1 Feb 2012 00:13:59 +0000 (16:13 -0800)]
ceph_manager: add manager.blackhole_kill_osd()

This will suspend disk writes for a couple seconds and then kill the
daemon.  It helps us simulate a hardware failure.

13 years agoAllow user to disable lock checking.
Tommi Virtanen [Tue, 31 Jan 2012 16:05:36 +0000 (08:05 -0800)]
Allow user to disable lock checking.

The new plana hardware isn't in the old sepia lock database,
and the machine pools are risky to merge as nothing in the
software guarantees allocation from just one pool. This allows
us to hand-allocate machines temporarily.

13 years agoAllow user to provide flavor to use.
Tommi Virtanen [Tue, 31 Jan 2012 15:59:26 +0000 (07:59 -0800)]
Allow user to provide flavor to use.

With this, you can use Ubuntu 11.10 machines with teuthology by saying::

  tasks:
  - ceph:
      flavor: oneiric
  ...