git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years ago Add scripts to analyze coverage for a single teuthology run.
Josh Durgin [Tue, 14 Jun 2011 18:57:29 +0000 (11:57 -0700)]
Add scripts to analyze coverage for a single teuthology run.

14 years ago thrasher: improve documentation a little
Greg Farnum [Thu, 25 Aug 2011 22:27:30 +0000 (15:27 -0700)]
thrasher: improve documentation a little

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago thrasher: add option to mark OSDs down instead of out.
Greg Farnum [Thu, 25 Aug 2011 22:19:30 +0000 (15:19 -0700)]
thrasher: add option to mark OSDs down instead of out.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago thrasher: allow a config to set values
Greg Farnum [Thu, 25 Aug 2011 22:18:42 +0000 (15:18 -0700)]
thrasher: allow a config to set values

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago thrasher: remove redundant wait_till_clean()
Greg Farnum [Thu, 25 Aug 2011 21:38:34 +0000 (14:38 -0700)]
thrasher: remove redundant wait_till_clean()

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago coverage: create dir conditionally
Greg Farnum [Wed, 24 Aug 2011 23:48:14 +0000 (16:48 -0700)]
coverage: create dir conditionally

We don't need to create the dir if we aren't using coverage.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago lockfile: add a lockfile task
Greg Farnum [Wed, 17 Aug 2011 21:44:39 +0000 (14:44 -0700)]
lockfile: add a lockfile task

This allows pretty highly configurable testing of
fcntl locking via a teuthology task.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago lock: --list-targets: list locks and dump result in targets: yaml format.
Sage Weil [Wed, 24 Aug 2011 17:03:43 +0000 (10:03 -0700)]
lock: --list-targets: list locks and dump result in targets: yaml format.

14 years ago check ceph cluster log for badness (ERR, WRN, SEC)
Sage Weil [Wed, 24 Aug 2011 04:00:26 +0000 (21:00 -0700)]
check ceph cluster log for badness (ERR, WRN, SEC)

14 years ago ceph: copy cluster log file to archive/ceph.log
Sage Weil [Tue, 23 Aug 2011 05:04:57 +0000 (22:04 -0700)]
ceph: copy cluster log file to archive/ceph.log

14 years ago workunits: set CEPH_CONF environment
Sage Weil [Mon, 22 Aug 2011 00:26:15 +0000 (17:26 -0700)]
workunits: set CEPH_CONF environment

This allows any ceph util we run (including the rados-api tests) to find
the config and keyrings they need.

14 years ago rbd: make default image 10G instead of 1G
Sage Weil [Sun, 21 Aug 2011 22:14:02 +0000 (15:14 -0700)]
rbd: make default image 10G instead of 1G

14 years ago suite: support a suite consisting of multiple collections
Sage Weil [Wed, 10 Aug 2011 20:34:38 +0000 (13:34 -0700)]
suite: support a suite consisting of multiple collections

suite = many collections, and maybe some shared files
collection = a collection of facets
facet = a config fragment

14 years ago valgrind: Document!
Greg Farnum [Wed, 17 Aug 2011 17:35:37 +0000 (10:35 -0700)]
valgrind: Document!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago Merge branch 'wip-valgrind'
Greg Farnum [Wed, 17 Aug 2011 17:32:57 +0000 (10:32 -0700)]
Merge branch 'wip-valgrind'

14 years ago include log in valgrind log file names
Greg Farnum [Wed, 17 Aug 2011 17:06:58 +0000 (10:06 -0700)]
include log in valgrind log file names

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago ceph task: split up arguments a little more
Greg Farnum [Wed, 17 Aug 2011 17:05:13 +0000 (10:05 -0700)]
ceph task: split up arguments a little more

This allows selective daemon kill signal changes. With valgrind
daemons we want term instead of kill, for instance.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago valgrind: move valgrind logs to log dir
Greg Farnum [Wed, 17 Aug 2011 17:04:31 +0000 (10:04 -0700)]
valgrind: move valgrind logs to log dir

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago ceph: split up daemon-running arguments and insert valgrind ones
Greg Farnum [Mon, 15 Aug 2011 22:35:42 +0000 (15:35 -0700)]
ceph: split up daemon-running arguments and insert valgrind ones

This setup should let us insert other kinds of things too, if we
need them.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago ceph: Set up valgrind as a flavor, and create a dir for logging.
Greg Farnum [Mon, 15 Aug 2011 22:32:23 +0000 (15:32 -0700)]
ceph: Set up valgrind as a flavor, and create a dir for logging.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago ceph task: pass the full config to the daemon startup subs
Greg Farnum [Mon, 15 Aug 2011 22:31:18 +0000 (15:31 -0700)]
ceph task: pass the full config to the daemon startup subs

So far as I can tell there is no reason to reduce them to
the coverage config, and I want the full config for my
soon-to-exist valgrind options.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago Add assert to catch simple typos in roles list.
Tommi Virtanen [Mon, 15 Aug 2011 16:36:06 +0000 (09:36 -0700)]
Add assert to catch simple typos in roles list.

Input of "roles:\n- [mds,1]" used to make teuthology crash
in a non-obvious way.
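
A minimal sketch of the kind of check this commit describes, not the actual teuthology code. It assumes the parsed `roles` value is a list of lists of role strings such as "mds.1". YAML parses `[mds,1]` as the two items 'mds' and the integer 1, so a string check catches the typo early. The function name `validate_roles` is hypothetical.

```python
def validate_roles(roles):
    """Fail early if `roles` is not a list of lists of role strings.

    Hypothetical helper: the real assert added by the commit may differ.
    """
    assert isinstance(roles, list), "roles must be a list: %r" % (roles,)
    for row in roles:
        assert isinstance(row, list), "each roles entry must be a list: %r" % (row,)
        for role in row:
            # "[mds,1]" yields the int 1 here instead of the string "mds.1".
            assert isinstance(role, str), "role must be a string like 'mds.1': %r" % (role,)
    return roles


# A well-formed roles list passes through unchanged.
validate_roles([["mon.a", "mds.a", "osd.0"], ["client.0"]])
```

With the typo from the commit message, `validate_roles([["mds", 1]])` raises an AssertionError that names the offending value.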

14 years ago Merge branch 'wip-nuke'
Greg Farnum [Wed, 10 Aug 2011 23:16:11 +0000 (16:16 -0700)]
Merge branch 'wip-nuke'

Conflicts:
teuthology/task/kernel.py

14 years ago manypools: remove commented-out code
Greg Farnum [Tue, 9 Aug 2011 20:30:47 +0000 (13:30 -0700)]
manypools: remove commented-out code

This accidentally got left in from my development.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago teuthology-nuke: split the big main function
Greg Farnum [Wed, 10 Aug 2011 23:06:45 +0000 (16:06 -0700)]
teuthology-nuke: split the big main function

It was getting a bit big, but now all the functions fit on
one screen each.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago teuthology-nuke: move it into its own file.
Greg Farnum [Wed, 10 Aug 2011 22:38:57 +0000 (15:38 -0700)]
teuthology-nuke: move it into its own file.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago teuthology-nuke: identify and reboot machines with kernel mounts
Greg Farnum [Wed, 10 Aug 2011 21:19:23 +0000 (14:19 -0700)]
teuthology-nuke: identify and reboot machines with kernel mounts

This includes untested code for just force-unmounting them
when that works again, but for now it does a full reboot-and-
reconnect cycle.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago teuthology-nuke: use a more robust cfuse mount finder
Greg Farnum [Wed, 10 Aug 2011 17:55:02 +0000 (10:55 -0700)]
teuthology-nuke: use a more robust cfuse mount finder

This way it can remove cfuse mounts in any location on
the system.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago teuthology-nuke: split out different pieces into different loops
Greg Farnum [Wed, 10 Aug 2011 17:47:50 +0000 (10:47 -0700)]
teuthology-nuke: split out different pieces into different loops

This will let us behave more intelligently on things like
nuking kernel mounts.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago Move reconnect function from kernel task to misc.py
Greg Farnum [Wed, 10 Aug 2011 17:37:04 +0000 (10:37 -0700)]
Move reconnect function from kernel task to misc.py

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago Configure grub to default to the right kernel, not the greatest installed one.
Tommi Virtanen [Wed, 10 Aug 2011 20:40:00 +0000 (13:40 -0700)]
Configure grub to default to the right kernel, not the greatest installed one.

This is sticky; that is, even if you install other kernels (manually/via fab/etc),
grub will keep booting up the one that was last enabled via teuthology config.
Use teuthology to switch kernels and it'll just work.

If the kernel the grub default points to is removed, grub will fall back to
booting the kernel with the greatest version number.

Closes: http://tracker.newdream.net/issues/1364
14 years ago Handle socket.timeout when waiting for a reconnect.
Tommi Virtanen [Wed, 10 Aug 2011 20:22:14 +0000 (13:22 -0700)]
Handle socket.timeout when waiting for a reconnect.

Now it gets ignored, just like the other harmless socket errors.

14 years ago Wait up to 300 seconds for a reboot.
Tommi Virtanen [Wed, 10 Aug 2011 20:21:39 +0000 (13:21 -0700)]
Wait up to 300 seconds for a reboot.

At least sepia86 was reliably slower than the previous 180 second default.
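
The two reconnect commits above (ignore socket.timeout, wait up to 300 seconds) can be sketched as a retry loop. This is an illustration under stated assumptions, not the teuthology implementation: `wait_for_reconnect` and its parameters are hypothetical, `connect` stands for whatever callable opens the connection, and every socket error is treated as harmless until the deadline passes.

```python
import socket
import time


def wait_for_reconnect(connect, timeout=300, interval=1,
                       clock=time.monotonic, sleep=time.sleep):
    """Call `connect` until it succeeds or `timeout` seconds have elapsed.

    Hypothetical helper; `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        except (socket.timeout, socket.error):
            # Assumption: all socket errors are retryable while the
            # machine is rebooting. Past the deadline, re-raise the last one.
            if clock() >= deadline:
                raise
            sleep(interval)
```

The default of 300 seconds mirrors the value in the commit title; a slower machine would need a larger `timeout`.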

14 years ago ceph: fix max_mds calculation
Sage Weil [Wed, 10 Aug 2011 19:47:20 +0000 (12:47 -0700)]
ceph: fix max_mds calculation

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago kernel: comment reconnect task, clean up reporting
Greg Farnum [Wed, 10 Aug 2011 00:17:08 +0000 (17:17 -0700)]
kernel: comment reconnect task, clean up reporting

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago manypools: remove commented-out code
Greg Farnum [Tue, 9 Aug 2011 20:30:47 +0000 (13:30 -0700)]
manypools: remove commented-out code

This accidentally got left in from my development.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago Make rbd task use mnt.N not mnt.client.N as mountpoint.
Tommi Virtanen [Tue, 9 Aug 2011 23:25:00 +0000 (16:25 -0700)]
Make rbd task use mnt.N not mnt.client.N as mountpoint.

Everything else expects this, so e.g. workunits wouldn't work with rbd.

14 years ago Make sure workunit task does not create mnt.N by itself.
Tommi Virtanen [Tue, 9 Aug 2011 23:11:32 +0000 (16:11 -0700)]
Make sure workunit task does not create mnt.N by itself.

This used to hide a bug in the rbd task, where rbd
created the mountpoint with the wrong name. The workunits
ended up running against the local filesystem.

14 years ago Add interactive-on-error, to pause and explore on error.
Tommi Virtanen [Tue, 9 Aug 2011 22:42:17 +0000 (15:42 -0700)]
Add interactive-on-error, to pause and explore on error.

Closes: http://tracker.newdream.net/issues/1291
14 years ago allow s3tests.create_users defaults be overridden
Stephon Striplin [Tue, 9 Aug 2011 20:43:46 +0000 (13:43 -0700)]
allow s3tests.create_users defaults be overridden

14 years ago Add simple unit test for get_clients.
Tommi Virtanen [Tue, 9 Aug 2011 20:40:56 +0000 (13:40 -0700)]
Add simple unit test for get_clients.

14 years ago Revert "fix get_clients"
Sage Weil [Tue, 9 Aug 2011 20:23:58 +0000 (13:23 -0700)]
Revert "fix get_clients"

This reverts commit 83b6678e79904793bf31e82bbecad7bf16c1b2b5.  The bug I was
hitting was actually fixed by 06e3e69c293b20c0ce5df526fa923a979c1d8cfc.

14 years ago teuthology: add task manypools
Gregory Farnum [Mon, 1 Aug 2011 20:19:15 +0000 (13:19 -0700)]
teuthology: add task manypools

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago new gitbuilder ref/branch naming
Sage Weil [Fri, 5 Aug 2011 21:35:22 +0000 (14:35 -0700)]
new gitbuilder ref/branch naming

no origin_ prefix

14 years ago cfuse, kclient: print remote host
Sage Weil [Thu, 4 Aug 2011 22:03:05 +0000 (15:03 -0700)]
cfuse, kclient: print remote host

14 years ago fix get_clients
Sage Weil [Thu, 4 Aug 2011 22:01:49 +0000 (15:01 -0700)]
fix get_clients

Only return the clients that are listed (not _all_ clients).  There might
be a combination of cfuse and kclient (or other) clients here!

14 years ago tasks/kclient: don't clobber remote
Sage Weil [Thu, 4 Aug 2011 17:41:50 +0000 (10:41 -0700)]
tasks/kclient: don't clobber remote

14 years ago use coverage_dir
Sage Weil [Thu, 28 Jul 2011 17:28:57 +0000 (10:28 -0700)]
use coverage_dir

14 years ago kernel: install in parallel
Josh Durgin [Fri, 5 Aug 2011 18:17:28 +0000 (11:17 -0700)]
kernel: install in parallel

14 years ago kernel: debug weird socket exceptions
Josh Durgin [Fri, 5 Aug 2011 18:08:02 +0000 (11:08 -0700)]
kernel: debug weird socket exceptions

14 years ago kernel: reboot immediately after installing
Josh Durgin [Fri, 5 Aug 2011 18:07:40 +0000 (11:07 -0700)]
kernel: reboot immediately after installing

This hides the latency of rebooting when installing on many machines.

14 years ago Down machines shouldn't be considered free.
Josh Durgin [Fri, 5 Aug 2011 17:59:16 +0000 (10:59 -0700)]
Down machines shouldn't be considered free.

14 years ago Make scheduled tasks leave some machines free.
Josh Durgin [Fri, 5 Aug 2011 01:32:57 +0000 (18:32 -0700)]
Make scheduled tasks leave some machines free.

14 years ago Log connections to targets
Josh Durgin [Thu, 4 Aug 2011 22:19:13 +0000 (15:19 -0700)]
Log connections to targets

This way you can tell which machines have problems in case of an
error.

14 years ago teuthology-worker: log to a file with timestamps
Josh Durgin [Wed, 3 Aug 2011 22:28:46 +0000 (15:28 -0700)]
teuthology-worker: log to a file with timestamps

14 years ago teuthology-nuke: run in parallel, and print each node being nuked
Josh Durgin [Wed, 3 Aug 2011 21:52:55 +0000 (14:52 -0700)]
teuthology-nuke: run in parallel, and print each node being nuked

14 years ago Set success at the beginning of a run.
Josh Durgin [Wed, 3 Aug 2011 20:59:57 +0000 (13:59 -0700)]
Set success at the beginning of a run.

This way internal tasks like locking can tell whether the run
succeeded, and unlock nodes if it did.

14 years ago teuthology-nuke: reset rsyslog config
Josh Durgin [Wed, 3 Aug 2011 18:21:32 +0000 (11:21 -0700)]
teuthology-nuke: reset rsyslog config

14 years ago teuthology-worker: keep machines locked on error
Josh Durgin [Wed, 3 Aug 2011 00:56:49 +0000 (17:56 -0700)]
teuthology-worker: keep machines locked on error

This prevents a failure to clean up in one case from affecting the
rest of the tests.

14 years ago teuthology-lock: update usage
Josh Durgin [Tue, 2 Aug 2011 23:13:28 +0000 (16:13 -0700)]
teuthology-lock: update usage

14 years ago teuthology-lock: allow list of locks to be filtered by owner and status
Josh Durgin [Tue, 2 Aug 2011 22:53:37 +0000 (15:53 -0700)]
teuthology-lock: allow list of locks to be filtered by owner and status

14 years ago teuthology: convert from bzip2 to gzip.
Greg Farnum [Fri, 29 Jul 2011 17:35:02 +0000 (10:35 -0700)]
teuthology: convert from bzip2 to gzip.

gzip is much, much faster on large log files. With a 7.7GB client log, gzip
took 2:45 to compress it to 624MB. bzip2 took 34:38 to compress it to
366MB. For our purposes the space savings are not worth the time loss.
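
The trade-off in these figures can be worked through as below. This is only arithmetic on the numbers quoted in the message; it assumes the times are minutes:seconds and that 7.7GB means 7700 MB (decimal units). The names `to_seconds`, `INPUT_MB`, and `RESULTS` are illustrative.

```python
def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string to seconds (assumed format)."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


INPUT_MB = 7.7 * 1000  # assumption: decimal gigabytes

# Figures quoted in the commit message.
RESULTS = {
    "gzip": {"seconds": to_seconds("2:45"), "size_mb": 624},
    "bzip2": {"seconds": to_seconds("34:38"), "size_mb": 366},
}

for name, r in RESULTS.items():
    ratio = INPUT_MB / r["size_mb"]          # compression ratio
    throughput = INPUT_MB / r["seconds"]     # MB of input per second
    print("%-5s ratio %.1fx, %.1f MB/s" % (name, ratio, throughput))

speedup = RESULTS["bzip2"]["seconds"] / RESULTS["gzip"]["seconds"]
extra_mb = RESULTS["gzip"]["size_mb"] - RESULTS["bzip2"]["size_mb"]
print("gzip is %.1fx faster and %d MB larger" % (speedup, extra_mb))
```

Under these assumptions gzip finishes roughly 12.6 times faster than bzip2 at a cost of 258 MB of extra archive size for this log.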

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago set max_mds based on non-standbys
Sage Weil [Thu, 28 Jul 2011 17:25:30 +0000 (10:25 -0700)]
set max_mds based on non-standbys

14 years ago no ++ in python
Sage Weil [Wed, 27 Jul 2011 18:45:20 +0000 (11:45 -0700)]
no ++ in python

14 years ago roles/3-simple: include a standby mds
Sage Weil [Wed, 27 Jul 2011 18:45:13 +0000 (11:45 -0700)]
roles/3-simple: include a standby mds

14 years ago configure mds's with -s suffix as standby
Sage Weil [Wed, 27 Jul 2011 17:04:37 +0000 (10:04 -0700)]
configure mds's with -s suffix as standby

14 years ago roles: use letters for mon, mds names
Sage Weil [Wed, 27 Jul 2011 05:06:49 +0000 (22:06 -0700)]
roles: use letters for mon, mds names

14 years ago tolerate named (not numbered) mons
Sage Weil [Wed, 27 Jul 2011 04:46:47 +0000 (21:46 -0700)]
tolerate named (not numbered) mons

14 years ago specify and clean up admin socket
Sage Weil [Wed, 27 Jul 2011 04:52:39 +0000 (21:52 -0700)]
specify and clean up admin socket

14 years ago lock server: configure for apache with mod_wsgi
Josh Durgin [Wed, 20 Jul 2011 01:37:05 +0000 (18:37 -0700)]
lock server: configure for apache with mod_wsgi

14 years ago Set content-type with PUT.
Josh Durgin [Wed, 20 Jul 2011 01:34:42 +0000 (18:34 -0700)]
Set content-type with PUT.

14 years ago schedule: make default owner different from that of a normal run
Josh Durgin [Wed, 20 Jul 2011 00:24:49 +0000 (17:24 -0700)]
schedule: make default owner different from that of a normal run

This way the machines locked by scheduled jobs aren't confused
with those locked by manual runs, so they're harder to accidentally
unlock.

14 years ago Update example targets in readme.
Josh Durgin [Wed, 20 Jul 2011 00:11:12 +0000 (17:11 -0700)]
Update example targets in readme.

14 years ago Remove print that clutters the worker logs.
Josh Durgin [Tue, 19 Jul 2011 23:24:50 +0000 (16:24 -0700)]
Remove print that clutters the worker logs.

14 years ago Connect without using any known_hosts files.
Josh Durgin [Fri, 15 Jul 2011 22:04:08 +0000 (15:04 -0700)]
Connect without using any known_hosts files.

14 years ago Make targets a dictionary mapping hosts to ssh host keys.
Josh Durgin [Thu, 14 Jul 2011 23:47:29 +0000 (16:47 -0700)]
Make targets a dictionary mapping hosts to ssh host keys.

14 years ago Add command to update ssh hostkeys.
Josh Durgin [Thu, 14 Jul 2011 00:14:52 +0000 (17:14 -0700)]
Add command to update ssh hostkeys.

14 years ago lock server: return host pubkeys with locked machine names
Josh Durgin [Thu, 14 Jul 2011 22:26:49 +0000 (15:26 -0700)]
lock server: return host pubkeys with locked machine names

14 years ago lock server: allow sshpubkey to be updated
Josh Durgin [Thu, 14 Jul 2011 22:10:50 +0000 (15:10 -0700)]
lock server: allow sshpubkey to be updated

14 years ago Update lock db schema.
Josh Durgin [Fri, 15 Jul 2011 21:59:33 +0000 (14:59 -0700)]
Update lock db schema.

14 years ago Add an overrides section for the ceph task.
Josh Durgin [Sat, 16 Jul 2011 00:15:09 +0000 (17:15 -0700)]
Add an overrides section for the ceph task.

This lets you run a suite against a particular version of ceph, or
with special debug settings.

14 years ago Better interface for running functions in parallel.
Josh Durgin [Thu, 14 Jul 2011 20:57:07 +0000 (13:57 -0700)]
Better interface for running functions in parallel.

14 years ago Merge branch 'wip-parallel'
Josh Durgin [Thu, 14 Jul 2011 18:15:55 +0000 (11:15 -0700)]
Merge branch 'wip-parallel'

14 years ago ceph.conf: remove other random bits
Sage Weil [Tue, 12 Jul 2011 03:37:48 +0000 (20:37 -0700)]
ceph.conf: remove other random bits

obsolete sections, mds tuning.  stick with defaults.

14 years ago fusermount runs on a single mount point.
Josh Durgin [Wed, 13 Jul 2011 20:15:28 +0000 (13:15 -0700)]
fusermount runs on a single mount point.

14 years ago Download ceph binaries in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:57:16 +0000 (10:57 -0700)]
Download ceph binaries in parallel.

14 years ago Run workunits on different clients in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:56:40 +0000 (10:56 -0700)]
Run workunits on different clients in parallel.

14 years ago Download and run autotests on multiple clients in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:53:10 +0000 (10:53 -0700)]
Download and run autotests on multiple clients in parallel.

These clients must still be on different machines,
or they'll clobber each other's results.

14 years ago Add a utility for running functions in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:50:09 +0000 (10:50 -0700)]
Add a utility for running functions in parallel.

14 years ago Merge branch 'localdir'
Tommi Virtanen [Wed, 13 Jul 2011 19:38:12 +0000 (12:38 -0700)]
Merge branch 'localdir'

Conflicts:
teuthology/task/ceph.py

14 years ago Feed locally-created binary tarball to remotes in parallel.
Tommi Virtanen [Wed, 13 Jul 2011 19:34:39 +0000 (12:34 -0700)]
Feed locally-created binary tarball to remotes in parallel.

This should be faster as long as we have the bandwidth for it.

14 years ago Use a nameless tempfile for local tarball, avoids cleanup.
Tommi Virtanen [Wed, 13 Jul 2011 19:18:55 +0000 (12:18 -0700)]
Use a nameless tempfile for local tarball, avoids cleanup.
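
A rough sketch of the idea in this commit title, assuming the tarball is built with Python's standard `tarfile` module; the actual teuthology code may build it differently. `tempfile.TemporaryFile` has no reliable name on disk and is removed when closed, so no explicit cleanup step is needed. The helper name `build_tarball` and the file names are hypothetical.

```python
import io
import tarfile
import tempfile


def build_tarball(files):
    """Write `files` (a dict of name -> bytes) into a gzip tarball.

    Returns an open, nameless temporary file positioned at the start.
    Closing it discards the data, so there is nothing to clean up.
    """
    tmp = tempfile.TemporaryFile()
    with tarfile.open(fileobj=tmp, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    tmp.seek(0)  # rewind so a reader can stream it to a remote
    return tmp


# Hypothetical contents, for illustration only.
tarball = build_tarball({"bin/example": b"binary data"})
```

The returned file object can be read in chunks and fed to each remote; closing it is the only cleanup required.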

14 years ago More careful error checking, avoid need for shell quoting.
Tommi Virtanen [Wed, 13 Jul 2011 19:07:36 +0000 (12:07 -0700)]
More careful error checking, avoid need for shell quoting.

14 years ago Clean up tarball tmpdir in all cases.
Tommi Virtanen [Wed, 13 Jul 2011 18:32:28 +0000 (11:32 -0700)]
Clean up tarball tmpdir in all cases.

Prefer shutil.rmtree over os.system('rm -rf ...').
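
The pattern named here, cleaning up the temp dir in all cases with shutil.rmtree, might look like the following. It is a sketch only: `with_scratch_dir`, the `work` callback, and the directory prefix are hypothetical and not taken from the teuthology source.

```python
import os
import shutil
import tempfile


def with_scratch_dir(work):
    """Run `work(tmpdir)` and remove the directory whether or not it fails."""
    tmpdir = tempfile.mkdtemp(prefix="tarball-")
    try:
        return work(tmpdir)
    finally:
        # shutil.rmtree avoids spawning a shell, so odd characters in the
        # path need no quoting, unlike os.system('rm -rf ...').
        shutil.rmtree(tmpdir)


def write_marker(tmpdir):
    path = os.path.join(tmpdir, "marker")
    with open(path, "w") as f:
        f.write("ok")
    return path


marker_path = with_scratch_dir(write_marker)
```

After the call returns, `marker_path` no longer exists on disk, because its parent directory was removed in the `finally` block.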

14 years ago Use tempfile instead of ad hoc temp dir creation.
Tommi Virtanen [Wed, 13 Jul 2011 17:58:01 +0000 (10:58 -0700)]
Use tempfile instead of ad hoc temp dir creation.

14 years ago Remove TODO note covered by teuthology-nuke.
Tommi Virtanen [Wed, 13 Jul 2011 17:44:33 +0000 (10:44 -0700)]
Remove TODO note covered by teuthology-nuke.

14 years ago Avoid identifier clash with builtin "dir".
Tommi Virtanen [Wed, 13 Jul 2011 17:17:04 +0000 (10:17 -0700)]
Avoid identifier clash with builtin "dir".

14 years ago ceph.conf: clean out random debug level changes
Sage Weil [Tue, 12 Jul 2011 03:32:34 +0000 (20:32 -0700)]
ceph.conf: clean out random debug level changes

keep it simple!

14 years ago include sha1 in summary
Sage Weil [Tue, 12 Jul 2011 03:32:07 +0000 (20:32 -0700)]
include sha1 in summary

Redundant (there's also a ceph-sha1 file), but convenient.

14 years ago ls: mention directories without summary.yaml
Sage Weil [Tue, 12 Jul 2011 03:31:37 +0000 (20:31 -0700)]
ls: mention directories without summary.yaml