git.apps.os.sepia.ceph.com Git - ceph.git/log
14 years agoceph task: pass the full config to the daemon startup subs
Greg Farnum [Mon, 15 Aug 2011 22:31:18 +0000 (15:31 -0700)]
ceph task: pass the full config to the daemon startup subs

So far as I can tell there is no reason to reduce them to
the coverage config, and I want the full config for my
soon-to-exist valgrind options.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoMerge branch 'wip-nuke'
Greg Farnum [Wed, 10 Aug 2011 23:16:11 +0000 (16:16 -0700)]
Merge branch 'wip-nuke'

Conflicts:
teuthology/task/kernel.py

14 years agomanypools: remove commented-out code
Greg Farnum [Tue, 9 Aug 2011 20:30:47 +0000 (13:30 -0700)]
manypools: remove commented-out code

This accidentally got left in from my development.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoteuthology-nuke: split the big main function
Greg Farnum [Wed, 10 Aug 2011 23:06:45 +0000 (16:06 -0700)]
teuthology-nuke: split the big main function

It was getting a bit big, but now all the functions fit on
one screen each.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoteuthology-nuke: move it into its own file.
Greg Farnum [Wed, 10 Aug 2011 22:38:57 +0000 (15:38 -0700)]
teuthology-nuke: move it into its own file.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoteuthology-nuke: identify and reboot machines with kernel mounts
Greg Farnum [Wed, 10 Aug 2011 21:19:23 +0000 (14:19 -0700)]
teuthology-nuke: identify and reboot machines with kernel mounts

This includes untested code for just force-unmounting them
when that works again, but for now it does a full reboot-and-
reconnect cycle.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoteuthology-nuke: use a more robust cfuse mount finder
Greg Farnum [Wed, 10 Aug 2011 17:55:02 +0000 (10:55 -0700)]
teuthology-nuke: use a more robust cfuse mount finder

This way it can remove cfuse mounts in any location on
the system.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoteuthology-nuke: split out different pieces into different loops
Greg Farnum [Wed, 10 Aug 2011 17:47:50 +0000 (10:47 -0700)]
teuthology-nuke: split out different pieces into different loops

This will let us behave more intelligently on things like
nuking kernel mounts.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoMove reconnect function from kernel task to misc.py
Greg Farnum [Wed, 10 Aug 2011 17:37:04 +0000 (10:37 -0700)]
Move reconnect function from kernel task to misc.py

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoConfigure grub to default to the right kernel, not the greatest installed one.
Tommi Virtanen [Wed, 10 Aug 2011 20:40:00 +0000 (13:40 -0700)]
Configure grub to default to the right kernel, not the greatest installed one.

This is sticky; that is, even if you install other kernels (manually/via fab/etc),
grub will keep booting up the one that was last enabled via teuthology config.
Use teuthology to switch kernels and it'll just work.

If the kernel the grub default points to is removed, grub will fall back to
booting the kernel with the greatest version number.

Closes: http://tracker.newdream.net/issues/1364
14 years agoHandle socket.timeout when waiting for a reconnect.
Tommi Virtanen [Wed, 10 Aug 2011 20:22:14 +0000 (13:22 -0700)]
Handle socket.timeout when waiting for a reconnect.

Now it gets ignored, just like the other harmless socket errors.

14 years agoWait up to 300 seconds for a reboot.
Tommi Virtanen [Wed, 10 Aug 2011 20:21:39 +0000 (13:21 -0700)]
Wait up to 300 seconds for a reboot.

At least sepia86 was reliably slower than the previous 180 second default.
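
Note: the two commits above (ignoring socket.timeout while waiting, and waiting up to 300 seconds) describe a retry loop. A minimal sketch of that idea, using only the standard library. The name wait_for_reconnect, the connect callable, and the interval parameter are hypothetical and are not taken from the teuthology source.

```python
import socket
import time


def wait_for_reconnect(connect, timeout=300, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds or `timeout` seconds have passed.

    socket.timeout and other socket errors are treated as "host not back
    yet" and retried; once the deadline passes, the last error is re-raised.
    """
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        # On Python 3, socket.timeout is already a subclass of OSError
        # (socket.error); it is listed explicitly for readability.
        except (socket.timeout, socket.error):
            if clock() >= deadline:
                raise
            sleep(interval)
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting.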

14 years agoceph: fix max_mds calculation
Sage Weil [Wed, 10 Aug 2011 19:47:20 +0000 (12:47 -0700)]
ceph: fix max_mds calculation

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agokernel: comment reconnect task, clean up reporting
Greg Farnum [Wed, 10 Aug 2011 00:17:08 +0000 (17:17 -0700)]
kernel: comment reconnect task, clean up reporting

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agomanypools: remove commented-out code
Greg Farnum [Tue, 9 Aug 2011 20:30:47 +0000 (13:30 -0700)]
manypools: remove commented-out code

This accidentally got left in from my development.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoMake rbd task use mnt.N not mnt.client.N as mountpoint.
Tommi Virtanen [Tue, 9 Aug 2011 23:25:00 +0000 (16:25 -0700)]
Make rbd task use mnt.N not mnt.client.N as mountpoint.

Everything else expects this, so e.g. workunits wouldn't work with rbd.

14 years agoMake sure workunit task does not create mnt.N by itself.
Tommi Virtanen [Tue, 9 Aug 2011 23:11:32 +0000 (16:11 -0700)]
Make sure workunit task does not create mnt.N by itself.

This used to hide a bug in the rbd task, where rbd
created the mountpoint with the wrong name. The workunits
ended up running against the local filesystem.

14 years agoAdd interactive-on-error, to pause and explore on error.
Tommi Virtanen [Tue, 9 Aug 2011 22:42:17 +0000 (15:42 -0700)]
Add interactive-on-error, to pause and explore on error.

Closes: http://tracker.newdream.net/issues/1291
14 years agoallow s3tests.create_users defaults to be overridden
Stephon Striplin [Tue, 9 Aug 2011 20:43:46 +0000 (13:43 -0700)]
allow s3tests.create_users defaults to be overridden

14 years agoAdd simple unit test for get_clients.
Tommi Virtanen [Tue, 9 Aug 2011 20:40:56 +0000 (13:40 -0700)]
Add simple unit test for get_clients.

14 years agoRevert "fix get_clients"
Sage Weil [Tue, 9 Aug 2011 20:23:58 +0000 (13:23 -0700)]
Revert "fix get_clients"

This reverts commit 83b6678e79904793bf31e82bbecad7bf16c1b2b5. The bug I was
hitting was actually fixed by 06e3e69c293b20c0ce5df526fa923a979c1d8cfc.

14 years agoteuthology: add task manypools
Gregory Farnum [Mon, 1 Aug 2011 20:19:15 +0000 (13:19 -0700)]
teuthology: add task manypools

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agonew gitbuilder ref/branch naming
Sage Weil [Fri, 5 Aug 2011 21:35:22 +0000 (14:35 -0700)]
new gitbuilder ref/branch naming

no origin_ prefix

14 years agocfuse, kclient: print remote host
Sage Weil [Thu, 4 Aug 2011 22:03:05 +0000 (15:03 -0700)]
cfuse, kclient: print remote host

14 years agofix get_clients
Sage Weil [Thu, 4 Aug 2011 22:01:49 +0000 (15:01 -0700)]
fix get_clients

Only return the clients that are listed (not _all_ clients).  There might
be a combination of cfuse and kclient (or other) clients here!

14 years agotasks/kclient: don't clobber remote
Sage Weil [Thu, 4 Aug 2011 17:41:50 +0000 (10:41 -0700)]
tasks/kclient: don't clobber remote

14 years agouse coverage_dir
Sage Weil [Thu, 28 Jul 2011 17:28:57 +0000 (10:28 -0700)]
use coverage_dir

14 years agokernel: install in parallel
Josh Durgin [Fri, 5 Aug 2011 18:17:28 +0000 (11:17 -0700)]
kernel: install in parallel

14 years agokernel: debug weird socket exceptions
Josh Durgin [Fri, 5 Aug 2011 18:08:02 +0000 (11:08 -0700)]
kernel: debug weird socket exceptions

14 years agokernel: reboot immediately after installing
Josh Durgin [Fri, 5 Aug 2011 18:07:40 +0000 (11:07 -0700)]
kernel: reboot immediately after installing

This hides the latency of rebooting when installing on many machines.

14 years agoDown machines shouldn't be considered free.
Josh Durgin [Fri, 5 Aug 2011 17:59:16 +0000 (10:59 -0700)]
Down machines shouldn't be considered free.

14 years agoMake scheduled tasks leave some machines free.
Josh Durgin [Fri, 5 Aug 2011 01:32:57 +0000 (18:32 -0700)]
Make scheduled tasks leave some machines free.

14 years agoLog connections to targets
Josh Durgin [Thu, 4 Aug 2011 22:19:13 +0000 (15:19 -0700)]
Log connections to targets

This way you can tell which machines have problems in case of an
error.

14 years agoteuthology-worker: log to a file with timestamps
Josh Durgin [Wed, 3 Aug 2011 22:28:46 +0000 (15:28 -0700)]
teuthology-worker: log to a file with timestamps

14 years agoteuthology-nuke: run in parallel, and print each node being nuked
Josh Durgin [Wed, 3 Aug 2011 21:52:55 +0000 (14:52 -0700)]
teuthology-nuke: run in parallel, and print each node being nuked

14 years agoSet success at the beginning of a run.
Josh Durgin [Wed, 3 Aug 2011 20:59:57 +0000 (13:59 -0700)]
Set success at the beginning of a run.

This way internal tasks like locking can tell whether the run
succeeded, and unlock nodes if it did.

14 years agoteuthology-nuke: reset rsyslog config
Josh Durgin [Wed, 3 Aug 2011 18:21:32 +0000 (11:21 -0700)]
teuthology-nuke: reset rsyslog config

14 years agoteuthology-worker: keep machines locked on error
Josh Durgin [Wed, 3 Aug 2011 00:56:49 +0000 (17:56 -0700)]
teuthology-worker: keep machines locked on error

This prevents a failure to clean up in one case from affecting the
rest of the tests.

14 years agoteuthology-lock: update usage
Josh Durgin [Tue, 2 Aug 2011 23:13:28 +0000 (16:13 -0700)]
teuthology-lock: update usage

14 years agoteuthology-lock: allow list of locks to be filtered by owner and status
Josh Durgin [Tue, 2 Aug 2011 22:53:37 +0000 (15:53 -0700)]
teuthology-lock: allow list of locks to be filtered by owner and status

14 years agoteuthology: convert from bzip2 to gzip.
Greg Farnum [Fri, 29 Jul 2011 17:35:02 +0000 (10:35 -0700)]
teuthology: convert from bzip2 to gzip.

gzip is much, much faster on large log files. With a 7.7GB client log, gzip
took 2:45 to compress it to 624MB. bzip2 took 34:38 to compress it to
366MB. For our purposes the space savings are not worth the time loss.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
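
Note: the figures quoted in the commit message above can be turned into a speedup and compression ratios. This is only a worked calculation on those numbers; it assumes 1GB = 1000MB, which the message does not state.

```python
def seconds(minutes, secs):
    """Convert a m:ss duration to seconds."""
    return minutes * 60 + secs


INPUT_MB = 7.7 * 1000            # 7.7GB client log (assuming 1GB = 1000MB)

gzip_time = seconds(2, 45)       # 165 s
bzip2_time = seconds(34, 38)     # 2078 s

speedup = bzip2_time / gzip_time     # about 12.6: gzip is roughly 12.6x faster
gzip_ratio = INPUT_MB / 624          # about 12.3:1
bzip2_ratio = INPUT_MB / 366         # about 21.0:1
extra_space_mb = 624 - 366           # gzip output is 258MB larger

print(f"gzip is {speedup:.1f}x faster and costs {extra_space_mb}MB more space")
```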
14 years agoset max_mds based on non-standbys
Sage Weil [Thu, 28 Jul 2011 17:25:30 +0000 (10:25 -0700)]
set max_mds based on non-standbys

14 years agono ++ in python
Sage Weil [Wed, 27 Jul 2011 18:45:20 +0000 (11:45 -0700)]
no ++ in python

14 years agoroles/3-simple: include a standby mds
Sage Weil [Wed, 27 Jul 2011 18:45:13 +0000 (11:45 -0700)]
roles/3-simple: include a standby mds

14 years agoconfigure mds's with -s suffix as standby
Sage Weil [Wed, 27 Jul 2011 17:04:37 +0000 (10:04 -0700)]
configure mds's with -s suffix as standby
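
Note: taken together with "set max_mds based on non-standbys", the -s suffix convention suggests a calculation like the one below. This is a sketch only: the role-name format (mds.a, mds.b-s) and the function name count_active_mds are assumptions, not the code from these commits.

```python
def count_active_mds(roles):
    """Count mds roles that are not standbys (names not ending in '-s')."""
    count = 0
    for role in roles:
        if role.startswith('mds.') and not role.endswith('-s'):
            count += 1    # Python has no ++ operator
    return count


# Hypothetical role list: one active mds and one standby.
roles = ['mon.a', 'mds.a', 'mds.b-s', 'osd.0', 'osd.1', 'client.0']
max_mds = count_active_mds(roles)
```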

14 years agoroles: use letters for mon, mds names
Sage Weil [Wed, 27 Jul 2011 05:06:49 +0000 (22:06 -0700)]
roles: use letters for mon, mds names

14 years agotolerate named (not numbered) mons
Sage Weil [Wed, 27 Jul 2011 04:46:47 +0000 (21:46 -0700)]
tolerate named (not numbered) mons

14 years agospecify and clean up admin socket
Sage Weil [Wed, 27 Jul 2011 04:52:39 +0000 (21:52 -0700)]
specify and clean up admin socket

14 years agolock server: configure for apache with mod_wsgi
Josh Durgin [Wed, 20 Jul 2011 01:37:05 +0000 (18:37 -0700)]
lock server: configure for apache with mod_wsgi

14 years agoSet content-type with PUT.
Josh Durgin [Wed, 20 Jul 2011 01:34:42 +0000 (18:34 -0700)]
Set content-type with PUT.

14 years agoschedule: make default owner different from that of a normal run
Josh Durgin [Wed, 20 Jul 2011 00:24:49 +0000 (17:24 -0700)]
schedule: make default owner different from that of a normal run

This way the machines locked by scheduled jobs aren't confused
with those locked by manual runs, so they're harder to accidentally
unlock.

14 years agoUpdate example targets in readme.
Josh Durgin [Wed, 20 Jul 2011 00:11:12 +0000 (17:11 -0700)]
Update example targets in readme.

14 years agoRemove print that clutters the worker logs.
Josh Durgin [Tue, 19 Jul 2011 23:24:50 +0000 (16:24 -0700)]
Remove print that clutters the worker logs.

14 years agoConnect without using any known_hosts files.
Josh Durgin [Fri, 15 Jul 2011 22:04:08 +0000 (15:04 -0700)]
Connect without using any known_hosts files.

14 years agoMake targets a dictionary mapping hosts to ssh host keys.
Josh Durgin [Thu, 14 Jul 2011 23:47:29 +0000 (16:47 -0700)]
Make targets a dictionary mapping hosts to ssh host keys.

14 years agoAdd command to update ssh hostkeys.
Josh Durgin [Thu, 14 Jul 2011 00:14:52 +0000 (17:14 -0700)]
Add command to update ssh hostkeys.

14 years agolock server: return host pubkeys with locked machine names
Josh Durgin [Thu, 14 Jul 2011 22:26:49 +0000 (15:26 -0700)]
lock server: return host pubkeys with locked machine names

14 years agolock server: allow sshpubkey to be updated
Josh Durgin [Thu, 14 Jul 2011 22:10:50 +0000 (15:10 -0700)]
lock server: allow sshpubkey to be updated

14 years agoUpdate lock db schema.
Josh Durgin [Fri, 15 Jul 2011 21:59:33 +0000 (14:59 -0700)]
Update lock db schema.

14 years agoAdd an overrides section for the ceph task.
Josh Durgin [Sat, 16 Jul 2011 00:15:09 +0000 (17:15 -0700)]
Add an overrides section for the ceph task.

This lets you run a suite against a particular version of ceph, or
with special debug settings.

14 years agoBetter interface for running functions in parallel.
Josh Durgin [Thu, 14 Jul 2011 20:57:07 +0000 (13:57 -0700)]
Better interface for running functions in parallel.

14 years agoMerge branch 'wip-parallel'
Josh Durgin [Thu, 14 Jul 2011 18:15:55 +0000 (11:15 -0700)]
Merge branch 'wip-parallel'

14 years agoceph.conf: remove other random bits
Sage Weil [Tue, 12 Jul 2011 03:37:48 +0000 (20:37 -0700)]
ceph.conf: remove other random bits

obsolete sections, mds tuning.  stick with defaults.

14 years agofusermount runs on a single mount point.
Josh Durgin [Wed, 13 Jul 2011 20:15:28 +0000 (13:15 -0700)]
fusermount runs on a single mount point.

14 years agoDownload ceph binaries in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:57:16 +0000 (10:57 -0700)]
Download ceph binaries in parallel.

14 years agoRun workunits on different clients in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:56:40 +0000 (10:56 -0700)]
Run workunits on different clients in parallel.

14 years agoDownload and run autotests on multiple clients in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:53:10 +0000 (10:53 -0700)]
Download and run autotests on multiple clients in parallel.

These clients must still be on different machines,
or they'll clobber each other's results.

14 years agoAdd a utility for running functions in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:50:09 +0000 (10:50 -0700)]
Add a utility for running functions in parallel.
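
Note: a utility of this kind can be sketched with the standard library's concurrent.futures. This is an illustration of the general idea, not the helper this commit added; the name run_in_parallel and the max_workers parameter are hypothetical, and the original may rely on a different concurrency library.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(func, items, max_workers=8):
    """Run func(item) for every item concurrently; return results in input order.

    If any call raises, the exception from the earliest failing item
    (in input order) propagates to the caller once the pool has shut down.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, item) for item in items]
        return [future.result() for future in futures]
```

For example, run_in_parallel(download, remotes) would call a (hypothetical) download function once per remote, all at the same time.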

14 years agoMerge branch 'localdir'
Tommi Virtanen [Wed, 13 Jul 2011 19:38:12 +0000 (12:38 -0700)]
Merge branch 'localdir'

Conflicts:
teuthology/task/ceph.py

14 years agoFeed locally-created binary tarball to remotes in parallel.
Tommi Virtanen [Wed, 13 Jul 2011 19:34:39 +0000 (12:34 -0700)]
Feed locally-created binary tarball to remotes in parallel.

This should be faster as long as we have the bandwidth for it.

14 years agoUse a nameless tempfile for local tarball, avoids cleanup.
Tommi Virtanen [Wed, 13 Jul 2011 19:18:55 +0000 (12:18 -0700)]
Use a nameless tempfile for local tarball, avoids cleanup.
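
Note: a "nameless" temp file is what tempfile.TemporaryFile provides: the file is removed automatically when it is closed, so no cleanup code is needed. A minimal sketch, with a hypothetical build_tarball helper that is not the function from this commit:

```python
import io
import tarfile
import tempfile


def build_tarball(files):
    """Write {name: bytes} into a gzip tarball held in an anonymous temp file."""
    tmp = tempfile.TemporaryFile()          # binary read/write, deleted on close
    with tarfile.open(fileobj=tmp, mode='w:gz') as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    tmp.seek(0)                             # rewind so a reader starts at byte 0
    return tmp
```

The caller reads from the returned file object and closes it when done; nothing is left behind on disk.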

14 years agoMore careful error checking, avoid need for shell quoting.
Tommi Virtanen [Wed, 13 Jul 2011 19:07:36 +0000 (12:07 -0700)]
More careful error checking, avoid need for shell quoting.

14 years agoClean up tarball tmpdir in all cases.
Tommi Virtanen [Wed, 13 Jul 2011 18:32:28 +0000 (11:32 -0700)]
Clean up tarball tmpdir in all cases.

Prefer shutil.rmtree over os.system('rm -rf ...').
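
Note: "in all cases" usually means try/finally around the work that uses the directory. A minimal sketch combining tempfile.mkdtemp with shutil.rmtree; the helper name with_scratch_dir and the directory prefix are hypothetical.

```python
import os
import shutil
import tempfile


def with_scratch_dir(work):
    """Run work(path) in a fresh temp directory and always remove it afterwards."""
    tmpdir = tempfile.mkdtemp(prefix='teuthology-tarball.')
    try:
        return work(tmpdir)
    finally:
        # No shell is involved, so unusual characters in the path need no quoting.
        shutil.rmtree(tmpdir)


def make_marker(path):
    """Example workload: create one file inside the scratch directory."""
    marker = os.path.join(path, 'marker')
    open(marker, 'w').close()
    return os.path.exists(marker)
```

with_scratch_dir(make_marker) creates the file, and the directory is gone again by the time the call returns, whether or not the workload raised.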

14 years agoUse tempfile instead of ad hoc temp dir creation.
Tommi Virtanen [Wed, 13 Jul 2011 17:58:01 +0000 (10:58 -0700)]
Use tempfile instead of ad hoc temp dir creation.

14 years agoRemove TODO note covered by teuthology-nuke.
Tommi Virtanen [Wed, 13 Jul 2011 17:44:33 +0000 (10:44 -0700)]
Remove TODO note covered by teuthology-nuke.

14 years agoAvoid identifier clash with builtin "dir".
Tommi Virtanen [Wed, 13 Jul 2011 17:17:04 +0000 (10:17 -0700)]
Avoid identifier clash with builtin "dir".

14 years agoceph.conf: clean out random debug level changes
Sage Weil [Tue, 12 Jul 2011 03:32:34 +0000 (20:32 -0700)]
ceph.conf: clean out random debug level changes

keep it simple!

14 years agoinclude sha1 in summary
Sage Weil [Tue, 12 Jul 2011 03:32:07 +0000 (20:32 -0700)]
include sha1 in summary

Redundant (there's also a ceph-sha1 file), but convenient.

14 years agols: mention directories without summary.yaml
Sage Weil [Tue, 12 Jul 2011 03:31:37 +0000 (20:31 -0700)]
ls: mention directories without summary.yaml

14 years agoClean up from pyflakes.
Josh Durgin [Tue, 12 Jul 2011 01:04:09 +0000 (18:04 -0700)]
Clean up from pyflakes.

14 years agoWhitespace and style cleanup.
Josh Durgin [Tue, 12 Jul 2011 01:00:03 +0000 (18:00 -0700)]
Whitespace and style cleanup.

14 years agoRemove unused variable.
Josh Durgin [Tue, 12 Jul 2011 00:39:10 +0000 (17:39 -0700)]
Remove unused variable.

14 years agoSuccess of test may not have been set yet.
Josh Durgin [Tue, 12 Jul 2011 00:34:36 +0000 (17:34 -0700)]
Success of test may not have been set yet.

14 years agoadd locktest task
Greg Farnum [Mon, 11 Jul 2011 23:40:29 +0000 (16:40 -0700)]
add locktest task

This will retrieve xfstests' locktest and run it on two clients.

I still need to tweak this so the logging output we get is more useful, and
so that we test extra features like wait locks, but it does execute.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agotask ceph: distribute monmap to all nodes, not just mons.
Greg Farnum [Thu, 7 Jul 2011 22:40:37 +0000 (15:40 -0700)]
task ceph: distribute monmap to all nodes, not just mons.

And clean up the monmap, too!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoAdd an option to keep machines locked if a test fails.
Josh Durgin [Mon, 11 Jul 2011 22:48:42 +0000 (15:48 -0700)]
Add an option to keep machines locked if a test fails.

14 years agolock: specify machines as input yaml targets: clause
Sage Weil [Mon, 11 Jul 2011 22:25:36 +0000 (15:25 -0700)]
lock: specify machines as input yaml targets: clause

14 years agoprint --lock-many result as yaml targets: stanza
Sage Weil [Mon, 11 Jul 2011 21:49:53 +0000 (14:49 -0700)]
print --lock-many result as yaml targets: stanza

14 years agoclean up locked machine list
Sage Weil [Mon, 11 Jul 2011 22:27:50 +0000 (15:27 -0700)]
clean up locked machine list

14 years agotell user which machines you locked
Sage Weil [Mon, 11 Jul 2011 21:39:21 +0000 (14:39 -0700)]
tell user which machines you locked

14 years agonuke: use default owner
Sage Weil [Mon, 11 Jul 2011 21:39:04 +0000 (14:39 -0700)]
nuke: use default owner

14 years agomake connect work if no roles are specified
Sage Weil [Mon, 11 Jul 2011 21:23:31 +0000 (14:23 -0700)]
make connect work if no roles are specified

This is useful for teuthology-nuke.

14 years agosuite: schedule jobs instead of executing each configuration serially.
Josh Durgin [Mon, 11 Jul 2011 19:52:07 +0000 (12:52 -0700)]
suite: schedule jobs instead of executing each configuration serially.

14 years agoAdd teuthology-schedule and teuthology-worker.
Josh Durgin [Fri, 8 Jul 2011 18:37:20 +0000 (11:37 -0700)]
Add teuthology-schedule and teuthology-worker.

schedule puts jobs in a beanstalk queue, worker takes them out and runs them.
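
Note: the producer/consumer split described above can be illustrated with an in-process queue. This sketch uses queue.Queue as a stand-in for the beanstalk queue so that it runs without a server; the function names and the job format are hypothetical.

```python
import json
import queue


def schedule(job_queue, job_config):
    """Producer: serialize a job description and put it on the queue."""
    job_queue.put(json.dumps(job_config))


def worker(job_queue, run_job):
    """Consumer: take jobs off the queue and run them until it is empty.

    A long-running worker would block waiting for new jobs instead of
    returning; stopping on an empty queue keeps the sketch testable.
    """
    results = []
    while True:
        try:
            body = job_queue.get_nowait()
        except queue.Empty:
            return results
        results.append(run_job(json.loads(body)))
```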

14 years agoAdd httplib2 to setup.py.
Josh Durgin [Fri, 8 Jul 2011 00:06:18 +0000 (17:06 -0700)]
Add httplib2 to setup.py.

14 years agoteuthology-suite: pass --lock and --block to teuthology
Josh Durgin [Thu, 7 Jul 2011 23:19:26 +0000 (16:19 -0700)]
teuthology-suite: pass --lock and --block to teuthology

14 years agoAdd --block option to retry until machines are locked.
Josh Durgin [Thu, 7 Jul 2011 23:15:18 +0000 (16:15 -0700)]
Add --block option to retry until machines are locked.

If there are not enough machines up, fail immediately.
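
Note: the behaviour described above (retry while machines are merely busy, fail at once if too few are up) can be sketched as follows. The callables try_lock and count_up, and the retry_interval parameter, are hypothetical stand-ins for calls to the lock server.

```python
import time


def lock_machines(try_lock, count_up, needed, block=False,
                  retry_interval=10, sleep=time.sleep):
    """Return a list of locked machines, or None if locking failed.

    try_lock(needed) returns a list of machines, or None when too few are free.
    count_up() returns how many machines are up, locked or not.
    """
    while True:
        if count_up() < needed:
            return None           # waiting cannot help: fail immediately
        machines = try_lock(needed)
        if machines is not None:
            return machines
        if not block:
            return None           # without --block, give up after one attempt
        sleep(retry_interval)
```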

14 years agoCheck more invalid argument combinations for teuthology-lock.
Josh Durgin [Thu, 7 Jul 2011 21:56:12 +0000 (14:56 -0700)]
Check more invalid argument combinations for teuthology-lock.

14 years agoRemove locking from TODO.
Josh Durgin [Thu, 7 Jul 2011 19:16:45 +0000 (12:16 -0700)]
Remove locking from TODO.

14 years agoUpdate readme for locking.
Josh Durgin [Thu, 7 Jul 2011 19:16:10 +0000 (12:16 -0700)]
Update readme for locking.