git.apps.os.sepia.ceph.com Git - ceph.git/log

Dan Mick [Thu, 7 Jun 2012 20:20:02 +0000 (13:20 -0700)]

--summary: add total counts, also note free machines

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

Dan Mick [Thu, 7 Jun 2012 03:29:28 +0000 (20:29 -0700)]

new variable lock hid lock() function

Dan Mick [Wed, 6 Jun 2012 22:15:47 +0000 (15:15 -0700)]

teuthology-lock: add --summary and --brief options

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

Sage Weil [Wed, 6 Jun 2012 23:00:55 +0000 (16:00 -0700)]

pull s3-tests.git using git, not http

Sage Weil [Wed, 6 Jun 2012 20:32:56 +0000 (13:32 -0700)]

ceph: simplify 'cluster' mon log handling

It's not a special file in the mon_data directory anymore, but instead
something in archive that will get slurped up normally. Make sure we
grep for badness from the proper location.

Dan Mick [Wed, 6 Jun 2012 01:41:45 +0000 (18:41 -0700)]

Pass up unmodified exceptions from connection.connect()

This allows useful errors to be reported from things like
mismatched hostkeys, etc.
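
A minimal sketch of the difference, not the actual teuthology code. `HostKeyMismatch`, `fake_ssh_connect` and both `connect_*` functions are hypothetical names invented for illustration.

```python
class HostKeyMismatch(Exception):
    """Stand-in for a specific low-level SSH error (hypothetical)."""


def fake_ssh_connect(host):
    # Hypothetical transport call that always fails with a specific error.
    raise HostKeyMismatch("host key for %s does not match known_hosts" % host)


def connect_masking(host):
    # Anti-pattern: the specific cause is replaced by a generic message.
    try:
        return fake_ssh_connect(host)
    except Exception:
        raise RuntimeError("could not connect to %s" % host)


def connect_passthrough(host):
    # No try/except: the original exception propagates unmodified,
    # so the caller sees the real reason for the failure.
    return fake_ssh_connect(host)
```

With the pass-through variant, the caller can report the host key problem directly instead of a generic connection failure.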

Dan Mick [Wed, 6 Jun 2012 01:33:36 +0000 (18:33 -0700)]

More shortnames fixes:
- Allow shortnames in teuthology-updatekeys as well
- Use list comprehensions instead of map()
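
A rough sketch of short-name expansion with a list comprehension. It assumes the canonical form is user@host.domain. The default user and domain below are placeholders, and `canonicalize` and `canonicalize_all` are hypothetical helper names, not the ones in the commit.

```python
DEFAULT_USER = "ubuntu"          # assumption for illustration
DEFAULT_DOMAIN = "example.com"   # placeholder, not the real lab domain


def canonicalize(name):
    """Expand a short name like 'plana14' to 'user@host.domain'."""
    user, sep, host = name.rpartition("@")
    if not sep:
        # No '@' present: the whole string is the host.
        user = DEFAULT_USER
    if "." not in host:
        host = "%s.%s" % (host, DEFAULT_DOMAIN)
    return "%s@%s" % (user, host)


def canonicalize_all(names):
    # A list comprehension instead of map(): it reads more directly
    # and always produces a list.
    return [canonicalize(n) for n in names]
```

Names that already carry a user or a domain pass through unchanged.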

Eleanor Cawthon [Tue, 5 Jun 2012 22:30:51 +0000 (15:30 -0700)]

task/: Added object map benchmarking test

Signed-off-by: Eleanor Cawthon <eleanor.cawthon@inktank.com>

Dan Mick [Tue, 5 Jun 2012 00:41:57 +0000 (17:41 -0700)]

Allow short names to teuthology-lock (e.g. "plana14")

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>

Sage Weil [Fri, 1 Jun 2012 04:39:33 +0000 (21:39 -0700)]

fix up dist var

This lets you override the default (now precise) in the ceph config yaml,
e.g.

- ceph:
    dist: oneiric
    branch: master

Dan Mick [Fri, 1 Jun 2012 00:09:20 +0000 (17:09 -0700)]

Change hardcoded oneiric to precise

Signed-off-by: Dan Mick <dan.mick@inktank.com>

Sam Zaydel [Thu, 24 May 2012 16:37:46 +0000 (09:37 -0700)]

Added python-dev to list of required packages.

Sage Weil [Mon, 21 May 2012 03:50:19 +0000 (20:50 -0700)]

rbd.xfstests: default to 250mb instead of 100mb

Sage Weil [Sun, 6 May 2012 04:22:40 +0000 (21:22 -0700)]

schedule_suite: fix 'slow request' whitelist

Sage Weil [Sun, 6 May 2012 04:22:30 +0000 (21:22 -0700)]

rbd_fsx: resize to byte boundaries (not object multiples)

Sage Weil [Sat, 5 May 2012 16:30:41 +0000 (09:30 -0700)]

ceph.newdream.net -> ceph.com

Sage Weil [Wed, 2 May 2012 05:26:03 +0000 (22:26 -0700)]

ignore syslog cron noise

Sage Weil [Mon, 30 Apr 2012 18:13:02 +0000 (11:13 -0700)]

osd_recovery: test no* osdmap flags

Josh Durgin [Wed, 25 Apr 2012 00:51:16 +0000 (17:51 -0700)]

nuke: refactor to run in parallel and add unlock option

nuke-on-error already did this, but now teuthology-nuke does it
too. Also outputs targets that couldn't be nuked at the end.
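
The shape of such a refactor might look like the sketch below. It swaps in a standard-library thread pool purely for illustration; the commit itself may use a different concurrency mechanism. `nuke_all` and the `nuke_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def nuke_all(targets, nuke_one, max_workers=8):
    """Run nuke_one(target) concurrently; return the targets that failed."""
    def attempt(target):
        try:
            nuke_one(target)
            return None
        except Exception:
            # Remember the failure and keep going with the other targets.
            return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, targets))
    # pool.map preserves input order, so failures are reported in order.
    return [t for t in results if t is not None]
```

The returned list can be printed at the end of the run so that nodes which could not be nuked are easy to spot.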

Josh Durgin [Wed, 25 Apr 2012 00:47:51 +0000 (17:47 -0700)]

parallel: obey iterator protocol

Once it raises StopIteration, it must continue to do so on subsequent calls to next().
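
A small sketch of the rule, using a hypothetical `ResultIterator` class rather than the real parallel code. The point is the `_done` flag: after the first StopIteration, later calls keep raising it even if more items arrive.

```python
class ResultIterator(object):
    """Yields queued results, then stays exhausted."""

    def __init__(self, results):
        self._results = list(results)
        self._done = False

    def put(self, result):
        # Items may still be queued late, but they are never yielded
        # once the iterator has reported exhaustion.
        self._results.append(result)

    def __iter__(self):
        return self

    def __next__(self):
        if self._done or not self._results:
            self._done = True
            raise StopIteration
        return self._results.pop(0)

    next = __next__  # Python 2 spelling of the same method
```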

Sage Weil [Mon, 23 Apr 2012 16:21:02 +0000 (09:21 -0700)]

nuke: ignore ntpdate errors

We keep seeing a race between ntpd startup and our stop + ntpdate + start
sequence. Ignore errors here.

Sage Weil [Sat, 21 Apr 2012 20:36:27 +0000 (13:36 -0700)]

filestore_idempotent: url has changed

Sage Weil [Fri, 20 Apr 2012 18:32:30 +0000 (11:32 -0700)]

hammer.sh: -a to archive each run

Signed-off-by: Sage Weil <sage@newdream.net>

Sage Weil [Thu, 19 Apr 2012 20:32:01 +0000 (13:32 -0700)]

rbd_fsx: show progress

The updated fsx takes this arg.

Signed-off-by: Sage Weil <sage@newdream.net>

Sage Weil [Thu, 19 Apr 2012 19:43:54 +0000 (12:43 -0700)]

fix misc checks that wait for N osds to be up

These all cut&pasted broken code, blah!

Sage Weil [Wed, 18 Apr 2012 18:21:10 +0000 (11:21 -0700)]

whitelist xfs_fsr syslog noise

Ignore lines like

2012-04-17T13:44:11-07:00 plana59 fsr[5454]: DEBUG: fsize=450560 blsz_dio=450560 d_min=512 d_max=2147483136 pgsz=4096

Josh Durgin [Thu, 12 Apr 2012 01:03:44 +0000 (18:03 -0700)]

Add task for running fsx on an rbd image.

Sage Weil [Sat, 14 Apr 2012 21:06:12 +0000 (14:06 -0700)]

filestore_idempotent: use new sequence-based tester

random seed, inject at 50-300.

Sage Weil [Sat, 14 Apr 2012 05:28:05 +0000 (22:28 -0700)]

rbd.py: add xfstests functionality

Add tasks for running xfstests over a pair of rbd volumes.  The main
one is called xfstests, and it sets up rbd volumes of specified size
and runs a set of likely-to-be-successful tests.  The other one is
used by the first, and is called run_xfstests.  This provides a
generic (device rather than rbd device oriented) interface to
xfstests, and should probably be made standalone and distinct from
rbd at some point.

Using multiple rbd devices required the rbd udev rule manipulation
to ignore errors, since it appears that each device caused a
teardown attempt, which leads to failures the second time around.
There's probably a more robust solution, but this works for now.

Signed-off-by: Alex Elder <elder@dreamhost.com>

Josh Durgin [Tue, 10 Apr 2012 23:23:58 +0000 (16:23 -0700)]

ceph_manager: don't try to start greenlet twice

spawn already scheduled it. Trying to start it again hits an assert.

Sage Weil [Tue, 10 Apr 2012 20:41:16 +0000 (13:41 -0700)]

kernel: kludge around mysterious 0-byte .git/HEAD files

No idea where these are coming from, but they break nodes with behavior
like

ubuntu@plana08:~$ sudo install -d -m0755 /lib/firmware/updates && cd /lib/firmware/updates && sudo git init
Reinitialized existing Git repository in /lib/firmware/updates/.git/
ubuntu@plana08:/lib/firmware/updates$ sudo git --git-dir=/lib/firmware/updates/.git config --get remote.origin.url >/dev/null || sudo git --git-dir=/lib/firmware/updates/.git remote add origin git://ceph.newdream.net/git/linux-firmware.git
ubuntu@plana08:/lib/firmware/updates$ cd /lib/firmware/updates && sudo git pull origin master
fatal: Not a git repository (or any of the parent directories): .git

where the .git directory looks like

total 32
drwxr-xr-x 7 root root 4096 2012-04-10 12:52 .
drwxr-xr-x 3 root root 4096 2012-04-06 13:54 ..
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 branches
-rwxr--r-- 1 root root  236 2012-04-10 11:33 config
-rw-r--r-- 1 root root    0 2012-04-10 12:52 config.lock
-rw-r--r-- 1 root root    0 2012-04-06 13:54 description
-rw-r--r-- 1 root root    0 2012-04-06 13:54 FETCH_HEAD
-rw-r--r-- 1 root root    0 2012-04-06 13:54 HEAD
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 hooks
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 info
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 objects
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 refs

Hopefully someone can figure out what is causing this and revert this
later.

Sage Weil [Tue, 10 Apr 2012 16:17:24 +0000 (09:17 -0700)]

kernel: reset to remote firmware branch; don't pull

Pull might merge if upstream rebases. Just make our branch match the
remote one.
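
One way to express "match the remote branch" is a fetch followed by a hard reset, sketched below. This is an illustration rather than the exact commands in the commit; `sync_commands` and `sync_to_remote` are hypothetical helpers, and the remote and branch defaults are assumptions.

```python
import subprocess


def sync_commands(remote="origin", branch="master"):
    """Return the git commands that make the local branch match the remote."""
    return [
        # Update the remote-tracking refs without touching local history.
        ["git", "fetch", remote],
        # Move the current branch and work tree to the fetched commit.
        # Unlike 'git pull', this never creates a merge.
        ["git", "reset", "--hard", "%s/%s" % (remote, branch)],
    ]


def sync_to_remote(repo_dir, remote="origin", branch="master"):
    # Run each command inside the repository; raise if any of them fails.
    for argv in sync_commands(remote, branch):
        subprocess.check_call(argv, cwd=repo_dir)
```

Note that a hard reset discards local changes, which is acceptable only for a checkout that is meant to mirror upstream.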

Sage Weil [Tue, 10 Apr 2012 16:12:01 +0000 (09:12 -0700)]

kernel: change git incantation for firmware pull

The 'git pull <uri>' seemed to consistently fail on some nodes. Can't be
sure this was really the problem with them all down now, but this is more
common, and works.

Sage Weil [Tue, 10 Apr 2012 15:59:47 +0000 (08:59 -0700)]

ls: another newline

Sage Weil [Tue, 10 Apr 2012 15:57:19 +0000 (08:57 -0700)]

ls: remove stray newline

Dan Mick [Mon, 9 Apr 2012 23:58:59 +0000 (16:58 -0700)]

Change to local mirror of linux-firmware repo to try to stop failures

Mark Nelson [Tue, 27 Mar 2012 22:25:41 +0000 (17:25 -0500)]

Kernel: Pull linux-firmware from git

Signed-off-by: Mark Nelson <nhm@clusterfaq.org>

Sage Weil [Wed, 4 Apr 2012 20:56:10 +0000 (13:56 -0700)]

cleanup-and-unlock.sh: helper to nuke and then unlock a set of nodes

I usually do something like

teuthology-lock --list-targets --owner scheduled_sage@metropolis > /tmp/b
./cleanup-and-unlock.sh /tmp/b scheduled_sage@metropolis

It's a huge headache when some of the nodes are down, though. A better
thing would be if nuke had an --unlock option, and would continue with the
nodes that didn't error out.

But, this is still useful as is.

Sage Weil [Wed, 4 Apr 2012 20:54:43 +0000 (13:54 -0700)]

schedule_suite.sh: helper to schedule a suite

There's a bunch of stuff hardcoded in here, similar to the nightly, but
it's a useful starting point.

Mark Nelson [Tue, 3 Apr 2012 21:53:17 +0000 (14:53 -0700)]

Added assertion to check that targets > roles

Signed-off-by: Mark Nelson <mark.nelson@dreamhost.com>

Sage Weil [Tue, 3 Apr 2012 22:56:36 +0000 (15:56 -0700)]

nuke: don't run umount when no xargs args

Gets rid of this noise:

INFO:teuthology.nuke:Unmount any osd data directories...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err:       umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err:       umount [-d] [-f] [-r] [-n] [-v] special | node...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err:       umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err:       umount [-d] [-f] [-r] [-n] [-v] special | node...
...
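
The usage noise appears because umount is invoked with no arguments. With GNU xargs, the -r (--no-run-if-empty) flag avoids that; whether the commit uses it is not stated here. The same guard in Python might look like this sketch, where both function names and the `/tmp/cephtest` marker are assumptions.

```python
def osd_mount_points(proc_mounts_text, marker="/tmp/cephtest"):
    """Return mount points from /proc/mounts-style text that contain marker."""
    points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 is the mount point.
        if len(fields) >= 2 and marker in fields[1]:
            points.append(fields[1])
    return points


def umount_command(mount_points):
    # Return None instead of a bare 'umount', which would only print usage.
    if not mount_points:
        return None
    return ["sudo", "umount", "-f"] + list(mount_points)
```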

Sage Weil [Fri, 30 Mar 2012 23:15:20 +0000 (16:15 -0700)]

ceph.conf: enable 'osd recover clone overlap'

This tests the recovery cloning in qa. It was redone, but we forgot to
enable it in qa.

Samuel Just [Fri, 30 Mar 2012 01:07:30 +0000 (18:07 -0700)]

make Thrasher not inherit from Greenlet

Samuel Just [Fri, 30 Mar 2012 01:07:10 +0000 (18:07 -0700)]

Add test for object source marked down

Samuel Just [Tue, 27 Mar 2012 22:05:11 +0000 (15:05 -0700)]

allow use of a separate journal block device

Josh Durgin [Mon, 26 Mar 2012 18:54:49 +0000 (11:54 -0700)]

rbd: fix typo in default config

pyflakes would have caught this if 'all' weren't a built-in function

Sage Weil [Sat, 24 Mar 2012 23:42:47 +0000 (16:42 -0700)]

add osd_recovery task to test divergent osd logs

Sage Weil [Sat, 24 Mar 2012 23:43:19 +0000 (16:43 -0700)]

backfill: use 'rbd' pool instead of 'data'

(data has a replay interval, which makes writes take longer to resume
after repeering)

Sage Weil [Sat, 24 Mar 2012 23:05:11 +0000 (16:05 -0700)]

rename backfill -> osd_backfill

Sage Weil [Sat, 24 Mar 2012 22:35:43 +0000 (15:35 -0700)]

put filestore xattr option in [global]

...for test_filestore_idempotent's benefit

Josh Durgin [Wed, 21 Mar 2012 19:00:55 +0000 (12:00 -0700)]

suite: add missing print statement

Josh Durgin [Wed, 21 Mar 2012 18:58:17 +0000 (11:58 -0700)]

suite: fix print statement when summary doesn't exist

Samuel Just [Wed, 21 Mar 2012 01:56:20 +0000 (18:56 -0700)]

Add watch op to rados.py

Signed-off-by: Samuel Just <sam.just@dreamhost.com>

Josh Durgin [Tue, 20 Mar 2012 14:48:45 +0000 (07:48 -0700)]

suite: failed runs might not have durations

This was one cause of emails not being sent - stale /tmp/cephtest dirs
fail without recording a duration.
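
A sketch of tolerating the missing field. It assumes each job summary is a dict and that the key is named `duration`; both helper names are hypothetical.

```python
def format_duration(summary):
    """Return a printable duration for a job summary dict."""
    duration = summary.get("duration")  # absent for some failed runs
    if duration is None:
        return "unknown"
    return "%ds" % int(duration)


def total_duration(summaries):
    # Missing durations count as zero instead of raising KeyError,
    # so one stale job cannot abort the whole results report.
    return sum(s.get("duration", 0) for s in summaries)
```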

Josh Durgin [Mon, 19 Mar 2012 21:16:14 +0000 (14:16 -0700)]

suite, coverage: use absolute dirs for isdir checks

This fixes the results to wait for all jobs to complete again.
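
The underlying pitfall: os.listdir() returns bare names, and os.path.isdir() resolves a bare name against the current working directory. A minimal sketch of the corrected check follows; `list_job_dirs` is a hypothetical helper name.

```python
import os


def list_job_dirs(archive_dir):
    """Return the names of the subdirectories of archive_dir, sorted."""
    return sorted(
        name
        for name in os.listdir(archive_dir)
        # Join with archive_dir first; isdir(name) alone would be checked
        # relative to the process's current working directory.
        if os.path.isdir(os.path.join(archive_dir, name))
    )
```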

Josh Durgin [Mon, 19 Mar 2012 18:57:02 +0000 (11:57 -0700)]

filestore_idempotent: get coverage and coredumps

Josh Durgin [Mon, 19 Mar 2012 18:31:33 +0000 (11:31 -0700)]

suite: more results logging

Sage Weil [Sun, 18 Mar 2012 18:56:18 +0000 (11:56 -0700)]

ceph.conf: no comment

Sage Weil [Sun, 18 Mar 2012 18:06:05 +0000 (11:06 -0700)]

ceph.conf: set 'filestore xattr use omap = true'

Sage Weil [Sun, 18 Mar 2012 17:50:17 +0000 (10:50 -0700)]

fix teuthology-ls isdir check

Sage Weil [Wed, 14 Mar 2012 20:20:54 +0000 (13:20 -0700)]

run valgrind with cwd set to /tmp/cephtest/archive/coredump

This lets us capture the vgcore.* files, which always go to valgrind's
cwd.

Fixes: #1953
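
For a locally spawned process, the same idea can be sketched with the `cwd` argument of the subprocess module. teuthology runs commands on remote nodes, so this is only an analogy; `run_with_cwd` is a hypothetical helper.

```python
import subprocess

COREDUMP_DIR = "/tmp/cephtest/archive/coredump"  # path from the commit subject


def run_with_cwd(argv, cwd=COREDUMP_DIR):
    """Run argv with its working directory set to cwd; return the exit status."""
    # Files the child writes by relative path (such as vgcore.*) land in cwd.
    return subprocess.call(argv, cwd=cwd)
```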

Josh Durgin [Fri, 16 Mar 2012 18:40:17 +0000 (11:40 -0700)]

suite: log results and coverage generation

Need to figure out where and when results emails are failing.

Josh Durgin [Thu, 15 Mar 2012 23:21:33 +0000 (16:21 -0700)]

results: make sure email is sent before anything else fails

Mark Nelson [Wed, 14 Mar 2012 20:32:23 +0000 (15:32 -0500)]

Merge branch 'master' of github.com:ceph/teuthology

Sage Weil [Tue, 13 Mar 2012 17:09:18 +0000 (10:09 -0700)]

gitbuilder: put flavor last

in case we refine the field later

Sage Weil [Tue, 13 Mar 2012 17:02:26 +0000 (10:02 -0700)]

Pull from new gitbuilder.ceph.com locations.

Simplifies the flavor stuff into a tuple of

<package,type,flavor,dist,arch>

where package is ceph, kernel, etc.
type is tarball, deb
flavor is basic, gcov, notcmalloc
arch is x86_64, i686 (uname -m)
dist is oneiric, etc. (lsb_release -s -c)
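
The tuple could be modelled as a named tuple, as in the sketch below. `BuildSpec` and `default_spec` are hypothetical names, the default values are taken from the examples in the message, and the gitbuilder URL layout is deliberately not reproduced.

```python
from collections import namedtuple

# Field order follows the tuple given in the commit message.
BuildSpec = namedtuple(
    "BuildSpec", ["package", "type", "flavor", "dist", "arch"]
)


def default_spec(**overrides):
    """Return a BuildSpec with illustrative defaults, optionally overridden."""
    spec = BuildSpec(
        package="ceph",
        type="tarball",
        flavor="basic",
        dist="oneiric",   # as reported by: lsb_release -s -c
        arch="x86_64",    # as reported by: uname -m
    )
    return spec._replace(**overrides)
```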

Mark Nelson [Mon, 12 Mar 2012 20:13:36 +0000 (15:13 -0500)]

Made the example better with multiple roles.

Mark Nelson [Mon, 12 Mar 2012 19:33:10 +0000 (14:33 -0500)]

Added some example yaml files and an example parallel execution task.

Sage Weil [Sun, 11 Mar 2012 03:15:21 +0000 (19:15 -0800)]

autotest: pull from github.com/ceph/autotest

Sage Weil [Sat, 10 Mar 2012 23:34:19 +0000 (15:34 -0800)]

workunit: include python2.7 path too

Samuel Just [Fri, 17 Feb 2012 00:10:45 +0000 (16:10 -0800)]

rados.py: include setattr and rmattr

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>

Mark Nelson [Wed, 7 Mar 2012 16:34:55 +0000 (08:34 -0800)]

lock: Improved logging when there aren't enough nodes available to lock-many.

Mark Nelson [Wed, 7 Mar 2012 17:02:39 +0000 (09:02 -0800)]

lock: Added a --locked flag to teuthology-lock.

Can be used to restrict searches based on lock status, e.g.
'teuthology-lock --list -a --locked false --status up' shows available nodes.

Sage Weil [Tue, 6 Mar 2012 17:34:38 +0000 (09:34 -0800)]

nuke: unmount osd data directories

This helps us avoid reboot to clean up osd data directories that are left
mounted.

Josh Durgin [Mon, 5 Mar 2012 18:28:35 +0000 (10:28 -0800)]

Use non-zero exit status if any tests failed

Fixes: #1989
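
A sketch of the convention, assuming results are collected as a mapping from test name to pass/fail; `exit_status` is a hypothetical helper.

```python
def exit_status(results):
    """Map test results (name -> True if passed) to a process exit status."""
    # Zero means success; any failure yields a non-zero status so that
    # shell scripts and schedulers can detect it.
    return 1 if any(not ok for ok in results.values()) else 0
```

A command-line entry point would then end with sys.exit(exit_status(results)).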

Sage Weil [Fri, 2 Mar 2012 18:55:19 +0000 (10:55 -0800)]

github.com/NewDreamNetwork -> github.com/ceph

Josh Durgin [Wed, 29 Feb 2012 23:47:17 +0000 (15:47 -0800)]

dump_stuck: note required ceph configuration

Josh Durgin [Tue, 28 Feb 2012 21:55:46 +0000 (13:55 -0800)]

dump_stuck: verify that 'ceph health' mentions the right number of inactive/unclean/stale pgs

Sage Weil [Tue, 28 Feb 2012 17:50:29 +0000 (09:50 -0800)]

peer: ignore +scrubbing portion of pg state

It can cause the mon state and osd states to not match.

Sage Weil [Sun, 26 Feb 2012 05:05:00 +0000 (21:05 -0800)]

peer: wait for peering to complete, or block

We need to wait for peering to either complete, or block because it is
waiting for another PG. _Then_ look at all the PG states and compare the
mon values with what we get from querying the OSDs directly.
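
The wait-then-compare ordering can be sketched with a generic polling helper. `wait_until` is a hypothetical function, and the predicate that decides whether peering has completed or blocked is left to the caller.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.time):
    """Poll predicate() until it returns true or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Only after this returns True would the test compare the PG states reported by the monitor with those reported by the OSDs.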

Josh Durgin [Fri, 24 Feb 2012 23:01:34 +0000 (15:01 -0800)]

peer: remove unused variable

Josh Durgin [Fri, 24 Feb 2012 22:55:49 +0000 (14:55 -0800)]

misc: always return a usable result from get_valgrind_args

Josh Durgin [Fri, 24 Feb 2012 22:55:23 +0000 (14:55 -0800)]

rgw: simplify valgrind args

Sage Weil [Fri, 24 Feb 2012 23:05:17 +0000 (15:05 -0800)]

add peer task

Force a pg to get stuck in 'down' state, verify we can query the peering
state, then start the OSD so it can recover.

Sage Weil [Fri, 24 Feb 2012 19:11:59 +0000 (11:11 -0800)]

lost_unfound: list missing/unfound for each pg and verify the unfound counts

This also tests the pg list_missing functionality.

Sage Weil [Fri, 24 Feb 2012 17:22:03 +0000 (09:22 -0800)]

ceph_manager: list_pg_missing

List missing objects for the given pgid.

Josh Durgin [Fri, 24 Feb 2012 20:04:58 +0000 (12:04 -0800)]

Whitespace and unnecessary formatting fixes

Josh Durgin [Fri, 24 Feb 2012 19:21:04 +0000 (11:21 -0800)]

ceph, ceph-fuse: simplify valgrind argument additions

Sage Weil [Wed, 22 Feb 2012 17:18:17 +0000 (09:18 -0800)]

refactor all valgrind users to use a get_valgrind_args() helper

This avoids much annoying, duplicated code.
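
A helper of this kind might look roughly like the following. The signature, the log path and the return shape are assumptions for illustration and may differ from the real get_valgrind_args().

```python
def get_valgrind_args(name, extra):
    """Return an argv prefix for running a daemon under valgrind.

    Hypothetical signature. An empty list is returned when valgrind is
    not requested, so callers can always concatenate the result.
    """
    if extra is None:
        return []
    if not isinstance(extra, list):
        extra = [extra]
    # Assumed log location; one log file per daemon or client name.
    log = "/tmp/cephtest/archive/log/valgrind/%s.log" % name
    return ["valgrind", "--log-file=" + log] + extra
```

A caller can then build its command as get_valgrind_args(name, conf) + daemon_argv without checking whether valgrind was requested.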

Sage Weil [Wed, 22 Feb 2012 01:06:50 +0000 (17:06 -0800)]

ceph: always create valgrind logs dir

Other tasks use it too. It's more annoying to conditionally create it.

Sage Weil [Wed, 22 Feb 2012 00:10:37 +0000 (16:10 -0800)]

ceph: always try to process valgrind logs

Check for errors in valgrind logs even if there is no valgrind option in
the ceph task config stanza. Other tasks can run via valgrind (ceph-fuse,
rgw). If the logs aren't there, this is harmless.

Sage Weil [Wed, 22 Feb 2012 00:08:21 +0000 (16:08 -0800)]

rgw: add valgrind support

tasks:
- ceph:
- rgw:
    client.a:
      valgrind: [--tool=memcheck]

Sage Weil [Tue, 21 Feb 2012 23:47:32 +0000 (15:47 -0800)]

rgw: accept dict

e.g.,

tasks:
...
- rgw:
    client.0:
    client.1:
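
Accepting both shapes usually comes down to normalizing the config up front. A sketch, with `normalize_rgw_config` as a hypothetical helper name:

```python
def normalize_rgw_config(config):
    """Accept a list of client names or a dict of per-client settings.

    Returns a dict mapping each client name to a (possibly empty) dict.
    """
    if isinstance(config, dict):
        # A YAML key with no value, such as 'client.0:', parses as None.
        return dict((name, conf or {}) for name, conf in config.items())
    return dict((name, {}) for name in (config or []))
```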

Sage Weil [Fri, 24 Feb 2012 04:07:24 +0000 (20:07 -0800)]

lost_unfound: new mark_unfound_lost syntax

Josh Durgin [Fri, 24 Feb 2012 01:07:26 +0000 (17:07 -0800)]

dump_stuck: flush stats before waiting for recovery/clean