git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Wed, 18 Jul 2012 18:04:30 +0000 (11:04 -0700)]
nuke: log what pid we are killing when we kill it
Sage Weil [Tue, 17 Jul 2012 17:00:59 +0000 (10:00 -0700)]
ceph: archive mon data to a .tgz
Saves bandwidth, time, and space.
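The commit does not show how the tarball is built. Teuthology most likely ran tar on the remote host. The sketch below is a local Python stand-in using the standard `tarfile` module. The helper name `archive_dir_to_tgz` is hypothetical.

```python
import os
import tarfile


def archive_dir_to_tgz(src_dir, dest_path):
    """Pack src_dir into a gzip-compressed tarball at dest_path.

    Members are stored under the directory's own basename, so the archive
    unpacks into a single top-level directory.
    """
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(dest_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top)  # tar.add recurses into directories by default
    return dest_path
```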
Sage Weil [Mon, 16 Jul 2012 17:53:25 +0000 (10:53 -0700)]
set machine description to ctx.archive when auto-locking machines for a run
Sage Weil [Sat, 14 Jul 2012 20:02:04 +0000 (13:02 -0700)]
schedule/suite: schedule job, suite N times
Sage Weil [Fri, 13 Jul 2012 20:57:22 +0000 (13:57 -0700)]
kernel: fix kernel installation when kdb: is specified
This normalize check would only trigger if a *single* key was specified.
Change it so that it triggers as long as all keys are in the list of valid
keys. This lets us specify both kdb: true and a sha1/branch/tag.
Phew!
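The change in the check can be sketched as follows. The set of valid keys and the `{"all": ...}` normalized shape are assumptions for illustration and are not taken from the kernel task's code.

```python
# Hypothetical set of kernel task options; the real list may differ.
VALID_KEYS = frozenset(["branch", "tag", "sha1", "deb", "kdb"])


def old_check(config):
    # Old behavior: normalize only when exactly one valid key was given.
    return len(config) == 1 and next(iter(config)) in VALID_KEYS


def new_check(config):
    # New behavior: normalize whenever every key is a valid option.
    return bool(config) and set(config) <= VALID_KEYS


def normalize_config(config):
    """Apply a flat option dict to all nodes; leave per-role dicts alone."""
    if new_check(config):
        return {"all": dict(config)}
    return dict(config)
```

With `kdb: true` plus a branch, the old check fails because two keys are present. The new check passes.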
Sage Weil [Fri, 13 Jul 2012 19:36:56 +0000 (12:36 -0700)]
schedule_suite.sh: use workunits from ceph commit
Use the workunits from the same ceph branch we are testing.
Sage Weil [Fri, 13 Jul 2012 18:30:21 +0000 (11:30 -0700)]
ceph: add default btrfs mkfs options
Sage Weil [Fri, 13 Jul 2012 18:30:07 +0000 (11:30 -0700)]
ceph: cleanup/simplify mount/mkfs options
Sage Weil [Fri, 13 Jul 2012 18:13:31 +0000 (11:13 -0700)]
workunit: allow overrides
Pull top-level overrides into our config. This lets you do:
overrides:
  workunit:
    branch: foo
tasks:
...
- workunit:
    clients:
      all:
        - foo
...
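Pulling overrides into the task config amounts to a recursive dict merge. The sketch below assumes overrides win over task-level values. The helper name `deep_merge` is hypothetical.

```python
def deep_merge(base, overrides):
    """Recursively merge overrides into base (in place) and return base.

    Nested dicts are merged key by key; any other value in overrides
    replaces the corresponding value in base.
    """
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base
```

Applied to the example above, the workunit task config `{"clients": {"all": ["foo"]}}` picks up `branch: foo` from `overrides.workunit`.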
Sage Weil [Fri, 13 Jul 2012 18:12:31 +0000 (11:12 -0700)]
workunit: allow branch/sha1/tag to be specified
Pull the workunit(s) from the branch/tag/sha1 specified in the config.
Josh Durgin [Fri, 13 Jul 2012 17:00:50 +0000 (10:00 -0700)]
workunit: pass branch/sha1 to test
Some tests download things from the ceph repo. Let them know which
version to use through the CEPH_REF environment variable.
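Workunits actually run on remote test nodes. The sketch below is a local approximation using `subprocess`, showing only how a test process sees `CEPH_REF`. The helper name `run_workunit` is hypothetical.

```python
import os
import subprocess


def run_workunit(cmd, ceph_ref):
    """Run a test command with CEPH_REF set so it can fetch files from the
    matching version of the ceph repo."""
    env = dict(os.environ, CEPH_REF=ceph_ref)  # inherit the environment, add CEPH_REF
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
```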
tamil [Fri, 13 Jul 2012 01:02:29 +0000 (18:02 -0700)]
Added functionality to get mkfs and mount options for file systems
from the config file, if present. Otherwise, default options are used.
The default value for inode size is changed to 2k when creating xfs.
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
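A minimal sketch of "config if present, otherwise defaults". Only the 2k xfs inode size (`mkfs.xfs -i size=2048`) comes from the commit message. The config key names `mkfs_options` and `mount_options` and the empty defaults for other file systems are hypothetical.

```python
# Only the xfs inode size comes from the commit; everything else is hypothetical.
DEFAULT_MKFS_OPTIONS = {"xfs": ["-i", "size=2048"]}
DEFAULT_MOUNT_OPTIONS = {}


def fs_options(config, fs):
    """Return (mkfs_options, mount_options) for fs, preferring the config."""
    mkfs = config.get("mkfs_options", DEFAULT_MKFS_OPTIONS.get(fs, []))
    mount = config.get("mount_options", DEFAULT_MOUNT_OPTIONS.get(fs, []))
    return list(mkfs), list(mount)
```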
tamil [Thu, 12 Jul 2012 23:36:40 +0000 (16:36 -0700)]
fixed typo
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
Sage Weil [Thu, 12 Jul 2012 15:33:29 +0000 (08:33 -0700)]
radosgw-admin: use --bucket instead of old --bucket-id
The --bucket-id support was removed.
Sage Weil [Wed, 11 Jul 2012 21:23:51 +0000 (14:23 -0700)]
nuke: honor 'check-locks: ...' field in targets file
If you are nuking a yaml file with check-locks: false, don't check locks.
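The sketch below operates on an already-parsed targets file. It assumes a top-level `targets` mapping and an optional `check-locks` field that defaults to true. The helper name is hypothetical.

```python
def targets_needing_lock_check(config):
    """Targets whose locks should be verified before nuking.

    Honors an optional top-level 'check-locks' field; when it is false,
    no lock checks are done.
    """
    if not config.get("check-locks", True):
        return []
    return sorted(config.get("targets", {}))
```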
Sage Weil [Wed, 11 Jul 2012 21:14:46 +0000 (14:14 -0700)]
internal: archive mon data dirs
These can be useful for debugging, and are usually pretty small.
Fixes: #2714
Sage Weil [Wed, 11 Jul 2012 16:22:50 +0000 (09:22 -0700)]
internal: move pulling archive w/ tar to helper
Sage Weil [Sat, 7 Jul 2012 03:15:55 +0000 (20:15 -0700)]
use sudo to kill teuthology proc
Sage Weil [Thu, 5 Jul 2012 20:43:19 +0000 (13:43 -0700)]
run: make -a short for --archive
Sage Weil [Wed, 4 Jul 2012 21:47:05 +0000 (14:47 -0700)]
watch-suite: stupid script to watch teuth run progress
Sage Weil [Tue, 3 Jul 2012 23:22:38 +0000 (16:22 -0700)]
nuke: be more careful about kill; simplify
If the archive dir is specified, make sure we are killing the right
process.
Also drop the kill_process helper; it's simple enough to open-code.
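One way to make sure "we are killing the right process" is to check that the pid's command line names the archive dir before killing it. This is a Linux-only sketch that reads `/proc/<pid>/cmdline`. The matching rule (a bare path argument, or `--archive=PATH`) and the use of SIGTERM are assumptions.

```python
import os
import signal


def read_cmdline(pid):
    """Return the argv of a running process as a list of str (Linux /proc)."""
    with open("/proc/%d/cmdline" % pid, "rb") as f:
        raw = f.read()
    return [a.decode() for a in raw.split(b"\0") if a]


def is_run_process(argv, archive_dir):
    """True if argv passes archive_dir, as '-a PATH', '--archive PATH' or '--archive=PATH'."""
    want = os.path.normpath(archive_dir)
    for arg in argv:
        if arg.startswith("--archive="):
            arg = arg[len("--archive="):]
        if os.path.normpath(arg) == want:
            return True
    return False


def kill_run(pid, archive_dir):
    """Kill pid only if its command line references archive_dir."""
    if not is_run_process(read_cmdline(pid), archive_dir):
        raise RuntimeError("pid %d does not belong to %s" % (pid, archive_dir))
    os.kill(pid, signal.SIGTERM)
```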
Sage Weil [Tue, 3 Jul 2012 19:53:08 +0000 (12:53 -0700)]
nuke: nuke based on archive path
Use path/config.yaml for targets, path/pid for pid to kill, and
path/owner for job owner.
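The archive layout from the commit message can be read as shown below. Returning `None` for missing files and leaving `config.yaml` unparsed (no YAML library is assumed) are choices made for this sketch. The helper name is hypothetical.

```python
import os


def load_run_info(archive_path):
    """Collect what nuke needs from a run's archive directory.

    config.yaml holds the targets, pid holds the process to kill, and
    owner holds the job owner. Missing files yield None.
    """
    def read(name):
        path = os.path.join(archive_path, name)
        if not os.path.exists(path):
            return None
        with open(path) as f:
            return f.read()

    pid_text = read("pid")
    owner_text = read("owner")
    return {
        "config_yaml": read("config.yaml"),  # raw text; parse with a YAML loader
        "pid": int(pid_text) if pid_text and pid_text.strip() else None,
        "owner": owner_text.strip() if owner_text else None,
    }
```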
Sage Weil [Wed, 4 Jul 2012 21:29:55 +0000 (14:29 -0700)]
valgrind: add strptime suppressions
Precise's strptime triggers valgrind false positives.
Use ship_utilities to push the valgrind.supp file over, which is a bit
slippy.
tamil [Tue, 3 Jul 2012 23:04:12 +0000 (16:04 -0700)]
Added a debug message
The debug message prints the string that is expected to be JSON.
This is to track a nightly run failure.
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
Sage Weil [Tue, 3 Jul 2012 19:49:39 +0000 (12:49 -0700)]
schedule_suite: use the sha1, not branch name
Keep the entire suite run on the same commit. We were resolving the sha1,
but not using it.
tamil [Tue, 3 Jul 2012 19:22:26 +0000 (12:22 -0700)]
nuke: optionally kill the hung process
Added a function kill_process to kill a process that hangs in the nightly runs.
It takes the pid as an optional argument.
Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
Sage Weil [Mon, 2 Jul 2012 15:44:38 +0000 (08:44 -0700)]
ceph: fix valgrind error check
grep all the logs, not the dir... doh!
Mark Nelson [Fri, 29 Jun 2012 19:36:58 +0000 (14:36 -0500)]
Merge branch 'master' of github.com:ceph/teuthology
Mark Nelson [Fri, 29 Jun 2012 19:36:30 +0000 (14:36 -0500)]
Now using daemon-helper
Signed-off-by: Mark Nelson <nhm@clusterfaq.org>
Sage Weil [Thu, 28 Jun 2012 18:14:15 +0000 (11:14 -0700)]
add cleanup-user.sh script
big hammer, use with care
Sage Weil [Tue, 26 Jun 2012 16:26:03 +0000 (09:26 -0700)]
schedule_suite.sh: drop -x
Mark Nelson [Thu, 28 Jun 2012 16:47:16 +0000 (11:47 -0500)]
cleaned up commented code
Signed-off-by: Mark Nelson <nhm@clusterfaq.org>
Mark Nelson [Thu, 28 Jun 2012 00:38:12 +0000 (19:38 -0500)]
Added blktrace task
Signed-off-by: Mark Nelson <nhm@clusterfaq.org>
Sage Weil [Mon, 25 Jun 2012 22:20:19 +0000 (15:20 -0700)]
ignore DEADLOCK line inside lockdep splat
Josh Durgin [Fri, 22 Jun 2012 02:23:42 +0000 (19:23 -0700)]
Add script to create a vm image with extra packages
Josh Durgin [Tue, 19 Jun 2012 21:13:39 +0000 (14:13 -0700)]
Add a task to run a test against rbd inside of qemu.
For now this task does not set up networking for the vm,
and simply runs an executable downloaded from a specified url.
It does support adding multiple rbd devices, but making use
of that with e.g. xfstests requires a bit more work.
Dan Mick [Thu, 21 Jun 2012 21:32:51 +0000 (14:32 -0700)]
Check for machine args based on local, not ctx.machines
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Thu, 21 Jun 2012 20:20:18 +0000 (13:20 -0700)]
whitelist current lockdep warnings in syslog
These cause too much noise in the qa runs to leave in place, and #2617 is
non-trivial enough that whitelisting them is worthwhile in the interim. A
better mechanism will remove these coarse whitelist items and replace them
with something that specifically matches the failures we want to ignore.
Sage Weil [Wed, 20 Jun 2012 18:35:43 +0000 (11:35 -0700)]
record owner at start of run
So that we can clean up easily even when we don't finish and there is no
summary.yaml.
Josh Durgin [Wed, 20 Jun 2012 17:13:48 +0000 (10:13 -0700)]
teuthology-ls: tolerate non-existent 'success' key in summary file
Sage Weil [Wed, 20 Jun 2012 00:29:32 +0000 (17:29 -0700)]
schedule_suite: enable kdb
Among other things, you can attach to the console after the fact and type
'dmesg' to see wtf happened.
Sage Weil [Wed, 20 Jun 2012 00:24:01 +0000 (17:24 -0700)]
kernel: enable/disable kdb
This hard-codes ttyS1, which is what we use on sepia.
Yehuda Sadeh [Tue, 19 Jun 2012 21:30:00 +0000 (14:30 -0700)]
add usage log tests to radosgw-admin tasks
tests 'usage show' and 'usage trim'
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Sun, 17 Jun 2012 19:16:25 +0000 (12:16 -0700)]
sync clock at start of every run
Sage Weil [Sun, 17 Jun 2012 03:14:35 +0000 (20:14 -0700)]
tolerate 250ms clock drift
Sage Weil [Sat, 16 Jun 2012 22:05:46 +0000 (15:05 -0700)]
include suite in archive dir
Sage Weil [Sat, 16 Jun 2012 20:59:46 +0000 (13:59 -0700)]
whitelist 'slow request' in qa runs
Sage Weil [Thu, 14 Jun 2012 21:03:39 +0000 (14:03 -0700)]
radosgw-admin: fix for non-numeric bucket ids
Sage Weil [Thu, 14 Jun 2012 21:03:29 +0000 (14:03 -0700)]
radosgw-admin: test max buckets limit
Sage Weil [Thu, 14 Jun 2012 21:02:40 +0000 (14:02 -0700)]
radosgw-admin: remove buckets before user
Otherwise user delete will fail.
Sage Weil [Thu, 14 Jun 2012 21:00:57 +0000 (14:00 -0700)]
radosgw-admin: fix swift subuser/key tests
Need to do 'subuser (add|rm)', not 'key (add|rm)'.
Sage Weil [Thu, 14 Jun 2012 20:23:24 +0000 (13:23 -0700)]
schedule_suite.sh: add flavors, check/fix sha1s, optional templates
This should be everything we need to use this for the nightlies, with the
exception of updating the git trees, which can happen explicitly in the
crontab.
Josh Durgin [Mon, 11 Jun 2012 19:31:22 +0000 (12:31 -0700)]
workunit: grab 'all' config from the right variable
Josh Durgin [Mon, 11 Jun 2012 01:43:35 +0000 (18:43 -0700)]
workunit: allow setting environment variables
This is useful for e.g. running the same tests against rbd in new and
old formats.
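A common way to pass per-test environment variables to a remote command is to prefix its argv with the POSIX `env` utility. The helper name `with_env` and the variable `RBD_CREATE_ARGS` are illustrative assumptions, not the workunit task's actual code.

```python
def with_env(args, env):
    """Prefix an argv with `env K=V ...` so the command sees the variables.

    Keys are sorted for a stable command line. Values containing spaces
    need quoting if the argv is later joined into a shell string.
    """
    if not env:
        return list(args)
    return ["env"] + ["%s=%s" % (k, v) for k, v in sorted(env.items())] + list(args)
```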
Dan Mick [Thu, 7 Jun 2012 20:20:02 +0000 (13:20 -0700)]
--summary: add total counts, also note free machines
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
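A sketch of what a `--summary` view could compute. The machine record fields `name`, `locked` and `locked_by` are hypothetical, and this does not reproduce teuthology-lock's output format.

```python
from collections import Counter


def lock_summary(machines):
    """Summarize lock state: per-owner counts, totals, and free machines."""
    owners = Counter(m["locked_by"] for m in machines if m["locked"])
    free = sorted(m["name"] for m in machines if not m["locked"])
    return {
        "by_owner": dict(owners),
        "locked": sum(owners.values()),
        "free": free,
        "total": len(machines),
    }
```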
Dan Mick [Thu, 7 Jun 2012 03:29:28 +0000 (20:29 -0700)]
new variable 'lock' hid the lock() function
Dan Mick [Wed, 6 Jun 2012 22:15:47 +0000 (15:15 -0700)]
teuthology-lock: add --summary and --brief options
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sage Weil [Wed, 6 Jun 2012 23:00:55 +0000 (16:00 -0700)]
pull s3-tests.git using git, not http
Sage Weil [Wed, 6 Jun 2012 20:32:56 +0000 (13:32 -0700)]
ceph: simplify 'cluster' mon log handling
It's not a special file in the mon_data directory anymore, but instead
something in archive that will get slurped up normally. Make sure we
grep for badness from the proper location.
Dan Mick [Wed, 6 Jun 2012 01:41:45 +0000 (18:41 -0700)]
Pass up unmodified exceptions from connection.connect()
This allows useful errors to be reported from things like
mismatched hostkeys, etc.
Dan Mick [Wed, 6 Jun 2012 01:33:36 +0000 (18:33 -0700)]
More shortnames fixes:
- Allow shortnames in teuthology-updatekeys as well
- Use list comprehensions instead of map()
Eleanor Cawthon [Tue, 5 Jun 2012 22:30:51 +0000 (15:30 -0700)]
task/: Added object map benchmarking test
Signed-off-by: Eleanor Cawthon <eleanor.cawthon@inktank.com>
Dan Mick [Tue, 5 Jun 2012 00:41:57 +0000 (17:41 -0700)]
Allow short names to teuthology-lock (e.g. "plana14")
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
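Expanding a short name such as "plana14" into a full target could look like the sketch below. The default user `ubuntu` and domain `front.sepia.ceph.com` are assumptions for illustration.

```python
def canonicalize(name, user="ubuntu", domain="front.sepia.ceph.com"):
    """Expand a short machine name like "plana14" to user@host.domain.

    Names that already contain '@' are returned unchanged; a '.' means the
    host is already qualified, so only the user is added.
    """
    if "@" in name:
        return name
    if "." not in name:
        name = "%s.%s" % (name, domain)
    return "%s@%s" % (user, name)
```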
Sage Weil [Fri, 1 Jun 2012 04:39:33 +0000 (21:39 -0700)]
fix up dist var
This lets you override the default (now precise) in the ceph config yaml,
e.g.
- ceph:
dist: oneiric
branch: master
Dan Mick [Fri, 1 Jun 2012 00:09:20 +0000 (17:09 -0700)]
Change hardcoded oneiric to precise
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Sam Zaydel [Thu, 24 May 2012 16:37:46 +0000 (09:37 -0700)]
Added python-dev to list of required packages.
Sage Weil [Mon, 21 May 2012 03:50:19 +0000 (20:50 -0700)]
rbd.xfstests: default to 250mb instead of 100mb
Sage Weil [Sun, 6 May 2012 04:22:40 +0000 (21:22 -0700)]
schedule_suite: fix 'slow request' whitelist
Sage Weil [Sun, 6 May 2012 04:22:30 +0000 (21:22 -0700)]
rbd_fsx: resize to byte boundaries (not object multiples)
Sage Weil [Sat, 5 May 2012 16:30:41 +0000 (09:30 -0700)]
ceph.newdream.net -> ceph.com
Sage Weil [Wed, 2 May 2012 05:26:03 +0000 (22:26 -0700)]
ignore syslog cron noise
Sage Weil [Mon, 30 Apr 2012 18:13:02 +0000 (11:13 -0700)]
osd_recovery: test no* osdmap flags
Josh Durgin [Wed, 25 Apr 2012 00:51:16 +0000 (17:51 -0700)]
nuke: refactor to run in parallel and add unlock option
nuke-on-error already did this, but now teuthology-nuke does it
too. Also outputs targets that couldn't be nuked at the end.
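Teuthology's own `parallel` helper is gevent-based. The sketch below swaps in the standard `concurrent.futures.ThreadPoolExecutor` to show the same pattern: nuke each target independently, unlock the ones that succeed, and report the rest at the end. `nuke_one` and `unlock_one` are hypothetical callables.

```python
from concurrent.futures import ThreadPoolExecutor


def nuke_all(targets, nuke_one, unlock_one=None, max_workers=8):
    """Nuke every target in parallel; optionally unlock the ones that succeed.

    Returns the sorted list of targets that could not be nuked (or
    unlocked), so one bad node does not stop the others.
    """
    def attempt(target):
        try:
            nuke_one(target)
            if unlock_one is not None:
                unlock_one(target)
        except Exception:
            return target, False
        return target, True

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, targets))
    return sorted(t for t, ok in results if not ok)
```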
Josh Durgin [Wed, 25 Apr 2012 00:47:51 +0000 (17:47 -0700)]
parallel: obey iterator protocol
Once it raises StopIteration, it must continue to do so on subsequent calls to next().
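The rule can be illustrated with a small iterator over a shared queue. The class name `LatchedIterator` is hypothetical and is not the `parallel` class itself. Without the `_exhausted` latch, items appended after the first StopIteration would be returned by later `next()` calls.

```python
import collections


class LatchedIterator:
    """Iterate over a shared deque; once exhausted, stay exhausted."""

    def __init__(self, items=None):
        # Other code may keep appending to this deque.
        self._items = items if items is not None else collections.deque()
        self._exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._exhausted:
            raise StopIteration
        if not self._items:
            self._exhausted = True  # latch: every later call also stops
            raise StopIteration
        return self._items.popleft()
```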
Sage Weil [Mon, 23 Apr 2012 16:21:02 +0000 (09:21 -0700)]
nuke: ignore ntpdate errors
We keep seeing a race between ntpd startup and our stop + ntpdate + start
sequence. Ignore errors here.
Sage Weil [Sat, 21 Apr 2012 20:36:27 +0000 (13:36 -0700)]
filestore_idempotent: url has changed
Sage Weil [Fri, 20 Apr 2012 18:32:30 +0000 (11:32 -0700)]
hammer.sh: -a to archive each run
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 19 Apr 2012 20:32:01 +0000 (13:32 -0700)]
rbd_fsx: show progress
The updated fsx takes this arg.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 19 Apr 2012 19:43:54 +0000 (12:43 -0700)]
fix misc checks that wait for N osds to be up
These all cut&pasted broken code, blah!
Sage Weil [Wed, 18 Apr 2012 18:21:10 +0000 (11:21 -0700)]
whitelist xfs_fsr syslog noise
Ignore lines like
2012-04-17T13:44:11-07:00 plana59 fsr[5454]: DEBUG: fsize=450560 blsz_dio=450560 d_min=512 d_max=2147483136 pgsz=4096
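A sketch of syslog scanning with a whitelist. The "badness" pattern is an assumption. It shows that a plain search for `BUG` also matches the `DEBUG` in the xfs_fsr line, which is one plausible way such lines get flagged. Only the fsr whitelist entry is derived from the example line above.

```python
import re

# Hypothetical "badness" pattern; note BUG also matches inside DEBUG.
BAD = re.compile(r"BUG|INFO|DEADLOCK")

WHITELIST = [
    re.compile(r"\bfsr\[\d+\]: DEBUG: "),  # xfs_fsr noise, as in the example line
]


def flagged_lines(lines, whitelist=WHITELIST):
    """Return lines that look bad and are not covered by the whitelist."""
    return [
        line for line in lines
        if BAD.search(line) and not any(p.search(line) for p in whitelist)
    ]
```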
Josh Durgin [Thu, 12 Apr 2012 01:03:44 +0000 (18:03 -0700)]
Add task for running fsx on an rbd image.
Sage Weil [Sat, 14 Apr 2012 21:06:12 +0000 (14:06 -0700)]
filestore_idempotent: use new sequence-based tester
random seed, inject at 50-300.
Sage Weil [Sat, 14 Apr 2012 05:28:05 +0000 (22:28 -0700)]
rbd.py: add xfstests functionality
Add tasks for running xfstests over a pair of rbd volumes. The main
one is called xfstests, and it sets up rbd volumes of specified size
and runs a set of likely-to-be-successful tests. The other one is
used by the first, and is called run_xfstests. This provides a
generic (device rather than rbd device oriented) interface to
xfstests, and should probably be made standalone and distinct from
rbd at some point.
Using multiple rbd devices required the rbd udev rule manipulation
to ignore errors, since it appears that each device caused a
teardown attempt, which leads to failures the second time around.
There's probably a more robust solution, but this works for now.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Josh Durgin [Tue, 10 Apr 2012 23:23:58 +0000 (16:23 -0700)]
ceph_manager: don't try to start greenlet twice
spawn already scheduled it. Trying to start it again hits an assert.
Sage Weil [Tue, 10 Apr 2012 20:41:16 +0000 (13:41 -0700)]
kernel: kludge around mysterious 0-byte .git/HEAD files
No idea where these are coming from, but they break nodes with behavior
like
ubuntu@plana08:~$ sudo install -d -m0755 /lib/firmware/updates && cd /lib/firmware/updates && sudo git init
Reinitialized existing Git repository in /lib/firmware/updates/.git/
ubuntu@plana08:/lib/firmware/updates$ sudo git --git-dir=/lib/firmware/updates/.git config --get remote.origin.url >/dev/null || sudo git --git-dir=/lib/firmware/updates/.git remote add origin git://ceph.newdream.net/git/linux-firmware.git
ubuntu@plana08:/lib/firmware/updates$ cd /lib/firmware/updates && sudo git pull origin master
fatal: Not a git repository (or any of the parent directories): .git
where the .git directory looks like
total 32
drwxr-xr-x 7 root root 4096 2012-04-10 12:52 .
drwxr-xr-x 3 root root 4096 2012-04-06 13:54 ..
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 branches
-rwxr--r-- 1 root root 236 2012-04-10 11:33 config
-rw-r--r-- 1 root root 0 2012-04-10 12:52 config.lock
-rw-r--r-- 1 root root 0 2012-04-06 13:54 description
-rw-r--r-- 1 root root 0 2012-04-06 13:54 FETCH_HEAD
-rw-r--r-- 1 root root 0 2012-04-06 13:54 HEAD
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 hooks
drwxr-xr-x 2 root root 4096 2012-04-06 13:54 info
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 objects
drwxr-xr-x 4 root root 4096 2012-04-06 13:54 refs
Hopefully someone can figure out what is causing this and revert this
later.
Sage Weil [Tue, 10 Apr 2012 16:17:24 +0000 (09:17 -0700)]
kernel: reset to remote firmware branch; don't pull
Pull might merge if upstream rebases. Just make our branch match the
remote one.
Sage Weil [Tue, 10 Apr 2012 16:12:01 +0000 (09:12 -0700)]
kernel: change git incantation for firmware pull
The 'git pull <uri>' seemed to consistently fail on some nodes. Can't be
sure this was really the problem with them all down now, but this is more
common, and works.
Sage Weil [Tue, 10 Apr 2012 15:59:47 +0000 (08:59 -0700)]
ls: another newline
Sage Weil [Tue, 10 Apr 2012 15:57:19 +0000 (08:57 -0700)]
ls: remove stray newline
Dan Mick [Mon, 9 Apr 2012 23:58:59 +0000 (16:58 -0700)]
Change to local mirror of linux-firmware repo to try to stop failures
Mark Nelson [Tue, 27 Mar 2012 22:25:41 +0000 (17:25 -0500)]
Kernel: Pull linux-firmware from git
Signed-off-by: Mark Nelson <nhm@clusterfaq.org>
Sage Weil [Wed, 4 Apr 2012 20:56:10 +0000 (13:56 -0700)]
cleanup-and-unlock.sh: helper to nuke and then unlock a set of nodes
I usually do something like
teuthology-lock --list-targets --owner scheduled_sage@metropolis > /tmp/b
./cleanup-and-unlock.sh /tmp/b scheduled_sage@metropolis
It's a huge headache when some of the nodes are down, though. A better
thing would be if nuke had an --unlock option, and would continue with the
nodes that didn't error out.
But, this is still useful as is.
Sage Weil [Wed, 4 Apr 2012 20:54:43 +0000 (13:54 -0700)]
schedule_suite.sh: helper to schedule a suite
There's a bunch of stuff hardcoded in here, similar to the nightly, but
it's a useful starting point.
Mark Nelson [Tue, 3 Apr 2012 21:53:17 +0000 (14:53 -0700)]
Added assertion to check that targets > roles
Signed-off-by: Mark Nelson <mark.nelson@dreamhost.com>
Sage Weil [Tue, 3 Apr 2012 22:56:36 +0000 (15:56 -0700)]
nuke: don't run umount when no xargs args
Gets rid of this noise:
INFO:teuthology.nuke:Unmount any osd data directories...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err: umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err: umount [-d] [-f] [-r] [-n] [-v] special | node...
INFO:teuthology.orchestra.run.err:Usage: umount -h | -V
INFO:teuthology.orchestra.run.err: umount -a [-d] [-f] [-r] [-n] [-v] [-t vfstypes] [-O opts]
INFO:teuthology.orchestra.run.err: umount [-d] [-f] [-r] [-n] [-v] special | node...
...
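Running umount with no arguments only prints the usage text shown above. In a shell pipeline, GNU `xargs -r` (`--no-run-if-empty`) skips the command when there is no input. The Python sketch below applies the same guard. The mount-point prefix and helper names are hypothetical.

```python
def osd_mountpoints(proc_mounts_text, prefix):
    """Mount points under prefix, parsed from /proc/mounts-style text."""
    points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].startswith(prefix):
            points.append(fields[1])
    return points


def umount_command(mountpoints):
    """Return a umount argv, or None when there is nothing to unmount."""
    if not mountpoints:
        return None  # avoid running a bare 'umount'
    return ["sudo", "umount", *mountpoints]
```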
Sage Weil [Fri, 30 Mar 2012 23:15:20 +0000 (16:15 -0700)]
ceph.conf: enable 'osd recover clone overlap'
Enable it to test recovery cloning in qa. This was redone, but we forgot
to enable it in qa.
Samuel Just [Fri, 30 Mar 2012 01:07:30 +0000 (18:07 -0700)]
make Thrasher not inherit from Greenlet
Samuel Just [Fri, 30 Mar 2012 01:07:10 +0000 (18:07 -0700)]
Add test for object source marked down
Samuel Just [Tue, 27 Mar 2012 22:05:11 +0000 (15:05 -0700)]
allow use of a separate journal block device
Josh Durgin [Mon, 26 Mar 2012 18:54:49 +0000 (11:54 -0700)]
rbd: fix typo in default config
pyflakes would have caught this if 'all' weren't a built-in function