git.apps.os.sepia.ceph.com Git - ceph.git/log
Greg Farnum [Wed, 10 Aug 2011 21:19:23 +0000 (14:19 -0700)]
teuthology-nuke: identify and reboot machines with kernel mounts
This includes untested code for just force-unmounting them
when that works again, but for now it does a full reboot-and-
reconnect cycle.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 17:55:02 +0000 (10:55 -0700)]
teuthology-nuke: use a more robust cfuse mount finder
This way it can remove cfuse mounts in any location on
the system.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 17:47:50 +0000 (10:47 -0700)]
teuthology-nuke: split out different pieces into different loops
This will let us behave more intelligently on things like
nuking kernel mounts.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 17:37:04 +0000 (10:37 -0700)]
Move reconnect function from kernel task to misc.py
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 00:17:08 +0000 (17:17 -0700)]
kernel: comment reconnect task, clean up reporting
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Tue, 9 Aug 2011 20:30:47 +0000 (13:30 -0700)]
manypools: remove commented-out code
This accidentally got left in from my development.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Tue, 9 Aug 2011 23:25:00 +0000 (16:25 -0700)]
Make rbd task use mnt.N not mnt.client.N as mountpoint.
Everything else expects this, so e.g. workunits wouldn't work with rbd.
Tommi Virtanen [Tue, 9 Aug 2011 23:11:32 +0000 (16:11 -0700)]
Make sure workunit task does not create mnt.N by itself.
This used to hide a bug in the rbd task, where rbd
created the mountpoint with the wrong name. The workunits
ended up running against the local filesystem.
Tommi Virtanen [Tue, 9 Aug 2011 22:42:17 +0000 (15:42 -0700)]
Add interactive-on-error, to pause and explore on error.
Closes: http://tracker.newdream.net/issues/1291
Stephon Striplin [Tue, 9 Aug 2011 20:43:46 +0000 (13:43 -0700)]
allow s3tests.create_users defaults to be overridden
Tommi Virtanen [Tue, 9 Aug 2011 20:40:56 +0000 (13:40 -0700)]
Add simple unit test for get_clients.
Sage Weil [Tue, 9 Aug 2011 20:23:58 +0000 (13:23 -0700)]
Revert "fix get_clients"
This reverts commit
83b6678e79904793bf31e82bbecad7bf16c1b2b5. The bug I was
hitting was actually fixed by
06e3e69c293b20c0ce5df526fa923a979c1d8cfc.
Gregory Farnum [Mon, 1 Aug 2011 20:19:15 +0000 (13:19 -0700)]
teuthology: add task manypools
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Fri, 5 Aug 2011 21:35:22 +0000 (14:35 -0700)]
new gitbuilder ref/branch naming
no origin_ prefix
Sage Weil [Thu, 4 Aug 2011 22:03:05 +0000 (15:03 -0700)]
cfuse, kclient: print remote host
Sage Weil [Thu, 4 Aug 2011 22:01:49 +0000 (15:01 -0700)]
fix get_clients
Only return the clients that are listed (not _all_ clients). There might
be a combination of cfuse and kclient (or other) clients here!
Sage Weil [Thu, 4 Aug 2011 17:41:50 +0000 (10:41 -0700)]
tasks/kclient: don't clobber remote
Sage Weil [Thu, 28 Jul 2011 17:28:57 +0000 (10:28 -0700)]
use coverage_dir
Josh Durgin [Fri, 5 Aug 2011 18:17:28 +0000 (11:17 -0700)]
kernel: install in parallel
Josh Durgin [Fri, 5 Aug 2011 18:08:02 +0000 (11:08 -0700)]
kernel: debug weird socket exceptions
Josh Durgin [Fri, 5 Aug 2011 18:07:40 +0000 (11:07 -0700)]
kernel: reboot immediately after installing
This hides the latency of rebooting when installing on many machines.
Josh Durgin [Fri, 5 Aug 2011 17:59:16 +0000 (10:59 -0700)]
Down machines shouldn't be considered free.
Josh Durgin [Fri, 5 Aug 2011 01:32:57 +0000 (18:32 -0700)]
Make scheduled tasks leave some machines free.
Josh Durgin [Thu, 4 Aug 2011 22:19:13 +0000 (15:19 -0700)]
Log connections to targets
This way you can tell which machines have problems in case of an
error.
Josh Durgin [Wed, 3 Aug 2011 22:28:46 +0000 (15:28 -0700)]
teuthology-worker: log to a file with timestamps
Josh Durgin [Wed, 3 Aug 2011 21:52:55 +0000 (14:52 -0700)]
teuthology-nuke: run in parallel, and print each node being nuked
Josh Durgin [Wed, 3 Aug 2011 20:59:57 +0000 (13:59 -0700)]
Set success at the beginning of a run.
This way internal tasks like locking can tell whether the run
succeeded, and unlock nodes if it did.
Josh Durgin [Wed, 3 Aug 2011 18:21:32 +0000 (11:21 -0700)]
teuthology-nuke: reset rsyslog config
Josh Durgin [Wed, 3 Aug 2011 00:56:49 +0000 (17:56 -0700)]
teuthology-worker: keep machines locked on error
This prevents a failure to clean up in one case from affecting the
rest of the tests.
Josh Durgin [Tue, 2 Aug 2011 23:13:28 +0000 (16:13 -0700)]
teuthology-lock: update usage
Josh Durgin [Tue, 2 Aug 2011 22:53:37 +0000 (15:53 -0700)]
teuthology-lock: allow list of locks to be filtered by owner and status
Greg Farnum [Fri, 29 Jul 2011 17:35:02 +0000 (10:35 -0700)]
teuthology: convert from bzip2 to gzip.
gzip is much, much faster on large log files. With a 7.7GB client log, gzip
took 2:45 to compress it to 624MB. bzip2 took 34:38 to compress it to
366MB. For our purposes the space savings are not worth the time loss.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
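The trade-off described in the bzip2-to-gzip commit above can be worked through from the figures it quotes. The snippet below is only an illustrative calculation; the function and constant names are made up for this note and are not part of teuthology. With these numbers, bzip2 took roughly 12.6 times as long as gzip to save a further 258 MB.

```python
# Figures quoted in the commit message above.
GZIP_SECONDS = 2 * 60 + 45     # 2:45
BZIP2_SECONDS = 34 * 60 + 38   # 34:38
GZIP_MB = 624
BZIP2_MB = 366


def tradeoff(fast_seconds, slow_seconds, fast_mb, slow_mb):
    """Return (slowdown factor, extra MB saved) of the slower compressor."""
    return slow_seconds / fast_seconds, fast_mb - slow_mb


slowdown, saved_mb = tradeoff(GZIP_SECONDS, BZIP2_SECONDS, GZIP_MB, BZIP2_MB)
print("bzip2 took %.1fx as long and saved %d MB more" % (slowdown, saved_mb))
```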
Sage Weil [Thu, 28 Jul 2011 17:25:30 +0000 (10:25 -0700)]
set max_mds based on non-standbys
Sage Weil [Wed, 27 Jul 2011 18:45:20 +0000 (11:45 -0700)]
no ++ in python
Sage Weil [Wed, 27 Jul 2011 18:45:13 +0000 (11:45 -0700)]
roles/3-simple: include a standby mds
Sage Weil [Wed, 27 Jul 2011 17:04:37 +0000 (10:04 -0700)]
configure mds's with -s suffix as standby
Sage Weil [Wed, 27 Jul 2011 05:06:49 +0000 (22:06 -0700)]
roles: use letters for mon, mds names
Sage Weil [Wed, 27 Jul 2011 04:46:47 +0000 (21:46 -0700)]
tolerate named (not numbered) mons
Sage Weil [Wed, 27 Jul 2011 04:52:39 +0000 (21:52 -0700)]
specify and clean up admin socket
Josh Durgin [Wed, 20 Jul 2011 01:37:05 +0000 (18:37 -0700)]
lock server: configure for apache with mod_wsgi
Josh Durgin [Wed, 20 Jul 2011 01:34:42 +0000 (18:34 -0700)]
Set content-type with PUT.
Josh Durgin [Wed, 20 Jul 2011 00:24:49 +0000 (17:24 -0700)]
schedule: make default owner different from that of a normal run
This way the machines locked by scheduled jobs aren't confused
with those locked by manual runs, so they're harder to accidentally
unlock.
Josh Durgin [Wed, 20 Jul 2011 00:11:12 +0000 (17:11 -0700)]
Update example targets in readme.
Josh Durgin [Tue, 19 Jul 2011 23:24:50 +0000 (16:24 -0700)]
Remove print that clutters the worker logs.
Josh Durgin [Fri, 15 Jul 2011 22:04:08 +0000 (15:04 -0700)]
Connect without using any known_hosts files.
Josh Durgin [Thu, 14 Jul 2011 23:47:29 +0000 (16:47 -0700)]
Make targets a dictionary mapping hosts to ssh host keys.
Josh Durgin [Thu, 14 Jul 2011 00:14:52 +0000 (17:14 -0700)]
Add command to update ssh hostkeys.
Josh Durgin [Thu, 14 Jul 2011 22:26:49 +0000 (15:26 -0700)]
lock server: return host pubkeys with locked machine names
Josh Durgin [Thu, 14 Jul 2011 22:10:50 +0000 (15:10 -0700)]
lock server: allow sshpubkey to be updated
Josh Durgin [Fri, 15 Jul 2011 21:59:33 +0000 (14:59 -0700)]
Update lock db schema.
Josh Durgin [Sat, 16 Jul 2011 00:15:09 +0000 (17:15 -0700)]
Add an overrides section for the ceph task.
This lets you run a suite against a particular version of ceph, or
with special debug settings.
Josh Durgin [Thu, 14 Jul 2011 20:57:07 +0000 (13:57 -0700)]
Better interface for running functions in parallel.
Josh Durgin [Thu, 14 Jul 2011 18:15:55 +0000 (11:15 -0700)]
Merge branch 'wip-parallel'
Sage Weil [Tue, 12 Jul 2011 03:37:48 +0000 (20:37 -0700)]
ceph.conf: remove other random bits
obsolete sections, mds tuning. stick with defaults.
Josh Durgin [Wed, 13 Jul 2011 20:15:28 +0000 (13:15 -0700)]
fusermount runs on a single mount point.
Josh Durgin [Wed, 22 Jun 2011 17:57:16 +0000 (10:57 -0700)]
Download ceph binaries in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:56:40 +0000 (10:56 -0700)]
Run workunits on different clients in parallel.
Josh Durgin [Wed, 22 Jun 2011 17:53:10 +0000 (10:53 -0700)]
Download and run autotests on multiple clients in parallel.
These clients must still be on different machines,
or they'll clobber each other's results.
Josh Durgin [Wed, 22 Jun 2011 17:50:09 +0000 (10:50 -0700)]
Add a utility for running functions in parallel.
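As a rough illustration of what a "run functions in parallel" helper can look like, here is a minimal sketch using the standard library's thread pool. It is not the code from this commit; `run_parallel` and `fake_download` are hypothetical names, and the real utility may use a different concurrency mechanism entirely.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(func, items, max_workers=8):
    """Call func(item) for each item concurrently; return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every call to finish and re-raises an exception
        # from any call that failed.
        return list(pool.map(func, items))


def fake_download(host):
    # Stand-in for real remote work such as fetching binaries on a host.
    return "%s: done" % host


results = run_parallel(fake_download, ["host-a", "host-b", "host-c"])
print(results)
```

Collecting results through `map` keeps the output order aligned with the input list, which makes per-host reporting straightforward.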
Tommi Virtanen [Wed, 13 Jul 2011 19:38:12 +0000 (12:38 -0700)]
Merge branch 'localdir'
Conflicts:
teuthology/task/ceph.py
Tommi Virtanen [Wed, 13 Jul 2011 19:34:39 +0000 (12:34 -0700)]
Feed locally-created binary tarball to remotes in parallel.
This should be faster as long as we have the bandwidth for it.
Tommi Virtanen [Wed, 13 Jul 2011 19:18:55 +0000 (12:18 -0700)]
Use a nameless tempfile for local tarball, avoids cleanup.
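The idea of a nameless temporary file can be sketched with `tempfile.TemporaryFile`, which on POSIX systems has no directory entry and disappears when closed, so no explicit cleanup step is needed. This is an assumed illustration, not the commit's code; `build_tarball` and the member name are hypothetical.

```python
import io
import tarfile
import tempfile


def build_tarball(members):
    """Write a gzip tarball of {name: bytes} into an anonymous temp file."""
    tmp = tempfile.TemporaryFile()  # removed automatically when closed
    with tarfile.open(fileobj=tmp, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    tmp.seek(0)  # rewind so the caller can read from the start
    return tmp


with build_tarball({"bin/hello": b"hi\n"}) as fobj:
    with tarfile.open(fileobj=fobj, mode="r:gz") as tar:
        names = tar.getnames()
print(names)
```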
Tommi Virtanen [Wed, 13 Jul 2011 19:07:36 +0000 (12:07 -0700)]
More careful error checking, avoid need for shell quoting.
Tommi Virtanen [Wed, 13 Jul 2011 18:32:28 +0000 (11:32 -0700)]
Clean up tarball tmpdir in all cases.
Prefer shutil.rmtree over os.system('rm -rf ...').
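A minimal sketch of the pattern named in this commit, assuming a helper of our own invention (`with_scratch_dir` is hypothetical): create the directory with `tempfile.mkdtemp`, and remove it with `shutil.rmtree` in a `finally` block so it is cleaned up whether or not the work succeeds. Unlike `os.system('rm -rf ...')`, no shell is involved, so unusual characters in paths need no quoting.

```python
import os
import shutil
import tempfile


def with_scratch_dir(work):
    """Run work(path) in a fresh temp dir and always remove the dir afterwards."""
    path = tempfile.mkdtemp(prefix="scratch-")
    try:
        return work(path)
    finally:
        # Runs on success and on error alike.
        shutil.rmtree(path)


seen = []


def touch(path):
    seen.append(path)
    # A name that would need careful quoting if passed through a shell.
    with open(os.path.join(path, "a file; with 'odd' name"), "w") as f:
        f.write("x")
    return os.path.isdir(path)


existed = with_scratch_dir(touch)
print(existed, os.path.exists(seen[0]))
```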
Tommi Virtanen [Wed, 13 Jul 2011 17:58:01 +0000 (10:58 -0700)]
Use tempfile instead of ad hoc temp dir creation.
Tommi Virtanen [Wed, 13 Jul 2011 17:44:33 +0000 (10:44 -0700)]
Remove TODO note covered by teuthology-nuke.
Tommi Virtanen [Wed, 13 Jul 2011 17:17:04 +0000 (10:17 -0700)]
Avoid identifier clash with builtin "dir".
Sage Weil [Tue, 12 Jul 2011 03:32:34 +0000 (20:32 -0700)]
ceph.conf: clean out random debug level changes
keep it simple!
Sage Weil [Tue, 12 Jul 2011 03:32:07 +0000 (20:32 -0700)]
include sha1 in summary
Redundant (there's also a ceph-sha1 file), but convenient.
Sage Weil [Tue, 12 Jul 2011 03:31:37 +0000 (20:31 -0700)]
ls: mention directories without summary.yaml
Josh Durgin [Tue, 12 Jul 2011 01:04:09 +0000 (18:04 -0700)]
Clean up from pyflakes.
Josh Durgin [Tue, 12 Jul 2011 01:00:03 +0000 (18:00 -0700)]
Whitespace and style cleanup.
Josh Durgin [Tue, 12 Jul 2011 00:39:10 +0000 (17:39 -0700)]
Remove unused variable.
Josh Durgin [Tue, 12 Jul 2011 00:34:36 +0000 (17:34 -0700)]
Success of test may not have been set yet.
Greg Farnum [Mon, 11 Jul 2011 23:40:29 +0000 (16:40 -0700)]
add locktest task
This will retrieve xfstests' locktest and run it on two clients.
I still need to tweak this so the logging output we get is more useful, and
so that we test extra features like wait locks, but it does execute.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 7 Jul 2011 22:40:37 +0000 (15:40 -0700)]
task ceph: distribute monmap to all nodes, not just mons.
And clean up the monmap, too!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Mon, 11 Jul 2011 22:48:42 +0000 (15:48 -0700)]
Add an option to keep machines locked if a test fails.
Sage Weil [Mon, 11 Jul 2011 22:25:36 +0000 (15:25 -0700)]
lock: specify machines as input yaml targets: clause
Sage Weil [Mon, 11 Jul 2011 21:49:53 +0000 (14:49 -0700)]
print --lock-many result as yaml targets: stanza
Sage Weil [Mon, 11 Jul 2011 22:27:50 +0000 (15:27 -0700)]
clean up locked machine list
Sage Weil [Mon, 11 Jul 2011 21:39:21 +0000 (14:39 -0700)]
tell user which machines you locked
Sage Weil [Mon, 11 Jul 2011 21:39:04 +0000 (14:39 -0700)]
nuke: use default owner
Sage Weil [Mon, 11 Jul 2011 21:23:31 +0000 (14:23 -0700)]
make connect work if no roles are specified
This is useful for teuthology-nuke.
Josh Durgin [Mon, 11 Jul 2011 19:52:07 +0000 (12:52 -0700)]
suite: schedule jobs instead of executing each configuration serially.
Josh Durgin [Fri, 8 Jul 2011 18:37:20 +0000 (11:37 -0700)]
Add teuthology-schedule and teuthology-worker.
schedule puts jobs in a beanstalk queue, worker takes them out and runs them.
Josh Durgin [Fri, 8 Jul 2011 00:06:18 +0000 (17:06 -0700)]
Add httplib2 to setup.py.
Josh Durgin [Thu, 7 Jul 2011 23:19:26 +0000 (16:19 -0700)]
teuthology-suite: pass --lock and --block to teuthology
Josh Durgin [Thu, 7 Jul 2011 23:15:18 +0000 (16:15 -0700)]
Add --block option to retry until machines are locked.
If there are not enough machines up, fail immediately.
Josh Durgin [Thu, 7 Jul 2011 21:56:12 +0000 (14:56 -0700)]
Check more invalid argument combinations for teuthology-lock.
Josh Durgin [Thu, 7 Jul 2011 19:16:45 +0000 (12:16 -0700)]
Remove locking from TODO.
Josh Durgin [Thu, 7 Jul 2011 19:16:10 +0000 (12:16 -0700)]
Update readme for locking.
Josh Durgin [Thu, 7 Jul 2011 18:43:35 +0000 (11:43 -0700)]
Read lock server from ~/teuthology.yaml.
Josh Durgin [Wed, 6 Jul 2011 22:55:17 +0000 (15:55 -0700)]
Verify that machines are locked before nuking them.
Josh Durgin [Wed, 6 Jul 2011 21:22:43 +0000 (14:22 -0700)]
Check that all machines are locked, and add an option to lock machines instead of providing targets.
Josh Durgin [Sat, 2 Jul 2011 01:18:03 +0000 (18:18 -0700)]
Add command line tool for locking machines.
Josh Durgin [Sat, 2 Jul 2011 01:15:52 +0000 (18:15 -0700)]
Move username to a utility method.
Josh Durgin [Wed, 6 Jul 2011 00:16:08 +0000 (17:16 -0700)]
Add simple lock server HTTP interface.
Greg Farnum [Wed, 6 Jul 2011 23:44:46 +0000 (16:44 -0700)]
task ceph: set_max_mds so multiple MDS nodes are used
The current check will be insufficient when we handle standby-replays,
standbys, etc, but it's a lot better than the current situation where
it starts up all the daemons but only one is active!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 6 Jul 2011 22:28:52 +0000 (15:28 -0700)]
workunits task: clean up properly if there's an error.
Previously it would fail out and leave the workunits directory, causing
final cleanup to fail.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Wed, 6 Jul 2011 21:17:24 +0000 (14:17 -0700)]
Skip s3-tests marked fails_on_rgw, they will fail anyway.