git.apps.os.sepia.ceph.com Git - ceph.git/log
Josh Durgin [Thu, 1 Sep 2011 17:44:46 +0000 (10:44 -0700)]
nuke: remove unused import
Josh Durgin [Thu, 1 Sep 2011 17:33:20 +0000 (10:33 -0700)]
nuke: localize imports again so they occur after gevent monkey-patching
This is necessary to make ssh work properly.
Josh Durgin [Thu, 1 Sep 2011 02:46:10 +0000 (19:46 -0700)]
nuke: reboot if rbd is mounted
Josh Durgin [Thu, 1 Sep 2011 00:43:14 +0000 (17:43 -0700)]
schedule: add a way to delete jobs from the queue
Josh Durgin [Thu, 1 Sep 2011 00:13:06 +0000 (17:13 -0700)]
parallel: don't hang if no tasks were spawned
This makes 6d919152178cfbd69dc5d50cdab40fc99db166a6 work.
Josh Durgin [Wed, 31 Aug 2011 23:48:58 +0000 (16:48 -0700)]
workunits: remove unused variable
Josh Durgin [Wed, 31 Aug 2011 21:36:32 +0000 (14:36 -0700)]
nuke: add option to reboot all nodes
Josh Durgin [Wed, 31 Aug 2011 21:36:01 +0000 (14:36 -0700)]
Fix pyflakes warnings.
Josh Durgin [Wed, 31 Aug 2011 00:21:36 +0000 (17:21 -0700)]
coverage: remove debugging
Josh Durgin [Wed, 31 Aug 2011 00:12:14 +0000 (17:12 -0700)]
workunit: save coverage and coredumps
Anything that runs a ceph utility should be using these commands.
Greg Farnum [Tue, 30 Aug 2011 22:48:58 +0000 (15:48 -0700)]
workunits: rework a little bit to allow "all" clients in a run
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Wed, 24 Aug 2011 21:07:11 +0000 (14:07 -0700)]
cfuse: support running through valgrind
Also switch up the config code so we can take per-client options.
Greg Farnum [Mon, 29 Aug 2011 23:47:22 +0000 (16:47 -0700)]
valgrind: don't run valgrind_post if there's no valgrind
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 29 Aug 2011 20:58:09 +0000 (13:58 -0700)]
valgrind: scan logs for bad results
It's not sophisticated but it will warn you about a node
if at least one node has issues.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 29 Aug 2011 19:39:38 +0000 (12:39 -0700)]
valgrind: use xml output for tools that support it
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Mon, 29 Aug 2011 19:42:45 +0000 (12:42 -0700)]
suite: add option to send an email if the entire suite passed
Josh Durgin [Fri, 26 Aug 2011 00:11:33 +0000 (17:11 -0700)]
Generate coverage at the end of a suite run,
and optionally email failures and ongoing jobs.
Josh Durgin [Fri, 26 Aug 2011 00:09:03 +0000 (17:09 -0700)]
queue: delete every job when it finishes, so only running jobs are buried
Josh Durgin [Thu, 4 Aug 2011 01:08:14 +0000 (18:08 -0700)]
Add teuthology-coverage for analyzing test coverage for a suite run.
Josh Durgin [Tue, 14 Jun 2011 18:57:29 +0000 (11:57 -0700)]
Add scripts to analyze coverage for a single teuthology run.
Greg Farnum [Thu, 25 Aug 2011 22:27:30 +0000 (15:27 -0700)]
thrasher: improve documentation a little
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 22:19:30 +0000 (15:19 -0700)]
thrasher: add option to mark OSDs down instead of out.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 22:18:42 +0000 (15:18 -0700)]
thrasher: allow a config to set values
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 21:38:34 +0000 (14:38 -0700)]
thrasher: remove redundant wait_till_clean()
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 24 Aug 2011 23:48:14 +0000 (16:48 -0700)]
coverage: create dir conditionally
We don't need to create the dir if we aren't using coverage.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 21:44:39 +0000 (14:44 -0700)]
lockfile: add a lockfile task
This allows pretty highly configurable testing of
fcntl locking via a teuthology task.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Wed, 24 Aug 2011 17:03:43 +0000 (10:03 -0700)]
lock: --list-targets: list locks and dump result in targets: yaml format.
Sage Weil [Wed, 24 Aug 2011 04:00:26 +0000 (21:00 -0700)]
check ceph cluster log for badness (ERR, WRN, SEC)
Sage Weil [Tue, 23 Aug 2011 05:04:57 +0000 (22:04 -0700)]
ceph: copy cluster log file to archive/ceph.log
Sage Weil [Mon, 22 Aug 2011 00:26:15 +0000 (17:26 -0700)]
workunits: set CEPH_CONF environment
This allows any ceph util we run (including the rados-api tests) to find
the config and keyrings they need.
Sage Weil [Sun, 21 Aug 2011 22:14:02 +0000 (15:14 -0700)]
rbd: make default image 10G instead of 1G
Sage Weil [Wed, 10 Aug 2011 20:34:38 +0000 (13:34 -0700)]
suite: support a suite consisting of multiple collections
suite = many collections, and maybe some shared files
collection = a collection of facets
facet = a config fragment
Greg Farnum [Wed, 17 Aug 2011 17:35:37 +0000 (10:35 -0700)]
valgrind: Document!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 17:32:57 +0000 (10:32 -0700)]
Merge branch 'wip-valgrind'
Greg Farnum [Wed, 17 Aug 2011 17:06:58 +0000 (10:06 -0700)]
include log in valgrind log file names
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 17:05:13 +0000 (10:05 -0700)]
ceph task: split up arguments a little more
This allows selective daemon kill signal changes. With valgrind
daemons we want term instead of kill, for instance.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 17:04:31 +0000 (10:04 -0700)]
valgrind: move valgrind logs to log dir
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 15 Aug 2011 22:35:42 +0000 (15:35 -0700)]
ceph: split up daemon-running arguments and insert valgrind ones
This setup should let us insert other kinds of things too, if we
need them.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 15 Aug 2011 22:32:23 +0000 (15:32 -0700)]
ceph: Set up valgrind as a flavor, and create a dir for logging.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 15 Aug 2011 22:31:18 +0000 (15:31 -0700)]
ceph task: pass the full config to the daemon startup subs
So far as I can tell there is no reason to reduce them to
the coverage config, and I want the full config for my
soon-to-exist valgrind options.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Mon, 15 Aug 2011 16:36:06 +0000 (09:36 -0700)]
Add assert to catch simple typos in roles list.
Input of "roles:\n- [mds,1]" used to make teuthology crash
in a non-obvious way.
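As an illustration only, a check of this kind might look like the sketch below. The function name `validate_roles` and the assumption that each role is a string such as `mds.1` are hypothetical and not taken from the commit itself; the point is that the YAML typo `[mds,1]` parses to a string and an integer rather than one role string.

```python
def validate_roles(roles):
    """Hypothetical check: roles is a list of lists of role strings."""
    assert isinstance(roles, list), "roles must be a list"
    for host_roles in roles:
        assert isinstance(host_roles, list), \
            "each roles entry must be a list, got %r" % (host_roles,)
        for role in host_roles:
            # "- [mds.1]" parses to ['mds.1'], but the typo
            # "- [mds,1]" parses to ['mds', 1]; the integer is caught here.
            assert isinstance(role, str), \
                "role must be a string like 'mds.1', got %r" % (role,)
    return roles


# A well-formed roles list (role names assumed for illustration).
validate_roles([["mon.0", "mds.1", "osd.0"]])
```

Failing early with a message that names the offending value is what turns the non-obvious crash into an obvious one.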
Greg Farnum [Wed, 10 Aug 2011 23:16:11 +0000 (16:16 -0700)]
Merge branch 'wip-nuke'
Conflicts:
teuthology/task/kernel.py
Greg Farnum [Tue, 9 Aug 2011 20:30:47 +0000 (13:30 -0700)]
manypools: remove commented-out code
This accidentally got left in from my development.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 23:06:45 +0000 (16:06 -0700)]
teuthology-nuke: split the big main function
It was getting a bit big, but now all the functions fit on
one screen each.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 22:38:57 +0000 (15:38 -0700)]
teuthology-nuke: move it into its own file.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 21:19:23 +0000 (14:19 -0700)]
teuthology-nuke: identify and reboot machines with kernel mounts
This includes untested code for just force-unmounting them
when that works again, but for now it does a full reboot-and-
reconnect cycle.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 17:55:02 +0000 (10:55 -0700)]
teuthology-nuke: use a more robust cfuse mount finder
This way it can remove cfuse mounts in any location on
the system.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 17:47:50 +0000 (10:47 -0700)]
teuthology-nuke: split out different pieces into different loops
This will let us behave more intelligently on things like
nuking kernel mounts.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 10 Aug 2011 17:37:04 +0000 (10:37 -0700)]
Move reconnect function from kernel task to misc.py
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Wed, 10 Aug 2011 20:40:00 +0000 (13:40 -0700)]
Configure grub to default to the right kernel, not the greatest installed one.
This is sticky; that is, even if you install other kernels (manually/via fab/etc),
grub will keep booting up the one that was last enabled via teuthology config.
Use teuthology to switch kernels and it'll just work.
If the kernel the grub default points to is removed, grub will fall back to
booting the kernel with the greatest version number.
Closes: http://tracker.newdream.net/issues/1364
Tommi Virtanen [Wed, 10 Aug 2011 20:22:14 +0000 (13:22 -0700)]
Handle socket.timeout when waiting for a reconnect.
Now it gets ignored, just like the other harmless socket errors.
Tommi Virtanen [Wed, 10 Aug 2011 20:21:39 +0000 (13:21 -0700)]
Wait up to 300 seconds for a reboot.
At least sepia86 was reliably slower than the previous 180 second default.
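The two reconnect commits above can be pictured with a minimal sketch, assuming a caller-supplied `connect` callable. The function name `wait_for_reconnect` and the `interval`, `clock`, and `sleep` parameters are hypothetical; only the 300 second limit and the idea of ignoring `socket.timeout` come from the commit messages.

```python
import socket
import time


def wait_for_reconnect(connect, timeout=300, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        except (socket.timeout, socket.error):
            # The host is probably still rebooting, so the error is
            # treated as harmless until the deadline is reached.
            if clock() >= deadline:
                raise
            sleep(interval)
```

Passing `clock` and `sleep` as parameters is a design choice of this sketch so that the loop can be exercised without actually waiting.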
Sage Weil [Wed, 10 Aug 2011 19:47:20 +0000 (12:47 -0700)]
ceph: fix max_mds calculation
Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Wed, 10 Aug 2011 00:17:08 +0000 (17:17 -0700)]
kernel: comment reconnect task, clean up reporting
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Tue, 9 Aug 2011 20:30:47 +0000 (13:30 -0700)]
manypools: remove commented-out code
This accidentally got left in from my development.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Tue, 9 Aug 2011 23:25:00 +0000 (16:25 -0700)]
Make rbd task use mnt.N not mnt.client.N as mountpoint.
Everything else expects this, so e.g. workunits wouldn't work with rbd.
Tommi Virtanen [Tue, 9 Aug 2011 23:11:32 +0000 (16:11 -0700)]
Make sure workunit task does not create mnt.N by itself.
This used to hide a bug in the rbd task, where rbd
created the mountpoint with the wrong name. The workunits
ended up running against the local filesystem.
Tommi Virtanen [Tue, 9 Aug 2011 22:42:17 +0000 (15:42 -0700)]
Add interactive-on-error, to pause and explore on error.
Closes: http://tracker.newdream.net/issues/1291
Stephon Striplin [Tue, 9 Aug 2011 20:43:46 +0000 (13:43 -0700)]
allow s3tests.create_users defaults to be overridden
Tommi Virtanen [Tue, 9 Aug 2011 20:40:56 +0000 (13:40 -0700)]
Add simple unit test for get_clients.
Sage Weil [Tue, 9 Aug 2011 20:23:58 +0000 (13:23 -0700)]
Revert "fix get_clients"
This reverts commit 83b6678e79904793bf31e82bbecad7bf16c1b2b5. The bug I was
hitting was actually fixed by 06e3e69c293b20c0ce5df526fa923a979c1d8cfc.
Gregory Farnum [Mon, 1 Aug 2011 20:19:15 +0000 (13:19 -0700)]
teuthology: add task manypools
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Fri, 5 Aug 2011 21:35:22 +0000 (14:35 -0700)]
new gitbuilder ref/branch naming
no origin_ prefix
Sage Weil [Thu, 4 Aug 2011 22:03:05 +0000 (15:03 -0700)]
cfuse, kclient: print remote host
Sage Weil [Thu, 4 Aug 2011 22:01:49 +0000 (15:01 -0700)]
fix get_clients
Only return the clients that are listed (not _all_ clients). There might
be a combination of cfuse and kclient (or other) clients here!
Sage Weil [Thu, 4 Aug 2011 17:41:50 +0000 (10:41 -0700)]
tasks/kclient: don't clobber remote
Sage Weil [Thu, 28 Jul 2011 17:28:57 +0000 (10:28 -0700)]
use coverage_dir
Josh Durgin [Fri, 5 Aug 2011 18:17:28 +0000 (11:17 -0700)]
kernel: install in parallel
Josh Durgin [Fri, 5 Aug 2011 18:08:02 +0000 (11:08 -0700)]
kernel: debug weird socket exceptions
Josh Durgin [Fri, 5 Aug 2011 18:07:40 +0000 (11:07 -0700)]
kernel: reboot immediately after installing
This hides the latency of rebooting when installing on many machines.
Josh Durgin [Fri, 5 Aug 2011 17:59:16 +0000 (10:59 -0700)]
Down machines shouldn't be considered free.
Josh Durgin [Fri, 5 Aug 2011 01:32:57 +0000 (18:32 -0700)]
Make scheduled tasks leave some machines free.
Josh Durgin [Thu, 4 Aug 2011 22:19:13 +0000 (15:19 -0700)]
Log connections to targets
This way you can tell which machines have problems in case of an
error.
Josh Durgin [Wed, 3 Aug 2011 22:28:46 +0000 (15:28 -0700)]
teuthology-worker: log to a file with timestamps
Josh Durgin [Wed, 3 Aug 2011 21:52:55 +0000 (14:52 -0700)]
teuthology-nuke: run in parallel, and print each node being nuked
Josh Durgin [Wed, 3 Aug 2011 20:59:57 +0000 (13:59 -0700)]
Set success at the beginning of a run.
This way internal tasks like locking can tell whether the run
succeeded, and unlock nodes if it did.
Josh Durgin [Wed, 3 Aug 2011 18:21:32 +0000 (11:21 -0700)]
teuthology-nuke: reset rsyslog config
Josh Durgin [Wed, 3 Aug 2011 00:56:49 +0000 (17:56 -0700)]
teuthology-worker: keep machines locked on error
This prevents a failure to clean up in one case from affecting the
rest of the tests.
Josh Durgin [Tue, 2 Aug 2011 23:13:28 +0000 (16:13 -0700)]
teuthology-lock: update usage
Josh Durgin [Tue, 2 Aug 2011 22:53:37 +0000 (15:53 -0700)]
teuthology-lock: allow list of locks to be filtered by owner and status
Greg Farnum [Fri, 29 Jul 2011 17:35:02 +0000 (10:35 -0700)]
teuthology: convert from bzip2 to gzip.
gzip is much, much faster on large log files. With a 7.7GB client log, gzip
took 2:45 to compress it to 624MB. bzip2 took 34:38 to compress it to
366MB. For our purposes the space savings are not worth the time loss.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
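A comparison like the one described in this commit can be reproduced on a small scale with Python's standard `gzip` and `bz2` modules. This is a rough sketch, not the measurement behind the numbers quoted above: the helper name `compare_compressors` is hypothetical, and the sample input is synthetic, so timings and ratios on real logs will differ.

```python
import bz2
import gzip
import time


def compare_compressors(data):
    """Return {name: (seconds, compressed_size)} for gzip and bzip2."""
    results = {}
    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        start = time.perf_counter()
        compressed = compress(data)
        results[name] = (time.perf_counter() - start, len(compressed))
    return results


# Synthetic, repetitive log-like input (an assumption for illustration).
sample = b"2011-07-29 10:35:02.000000 osd0 do_op sample log line\n" * 20000
for name, (seconds, size) in compare_compressors(sample).items():
    print("%s: %.3fs, %d bytes" % (name, seconds, size))
```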
Sage Weil [Thu, 28 Jul 2011 17:25:30 +0000 (10:25 -0700)]
set max_mds based on non-standbys
Sage Weil [Wed, 27 Jul 2011 18:45:20 +0000 (11:45 -0700)]
no ++ in python
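For context, a short generic illustration (not the code this commit changed): Python has no increment operator. A postfix `x++` is a syntax error, and a prefix `++x` is parsed as two unary plus operators, which leaves the value unchanged.

```python
x = 1
y = ++x   # parsed as +(+x): two unary plus operators, so x is unchanged
x += 1    # the idiomatic way to increment
```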
Sage Weil [Wed, 27 Jul 2011 18:45:13 +0000 (11:45 -0700)]
roles/3-simple: include a standby mds
Sage Weil [Wed, 27 Jul 2011 17:04:37 +0000 (10:04 -0700)]
configure mds's with -s suffix as standby
Sage Weil [Wed, 27 Jul 2011 05:06:49 +0000 (22:06 -0700)]
roles: use letters for mon, mds names
Sage Weil [Wed, 27 Jul 2011 04:46:47 +0000 (21:46 -0700)]
tolerate named (not numbered) mons
Sage Weil [Wed, 27 Jul 2011 04:52:39 +0000 (21:52 -0700)]
specify and clean up admin socket
Josh Durgin [Wed, 20 Jul 2011 01:37:05 +0000 (18:37 -0700)]
lock server: configure for apache with mod_wsgi
Josh Durgin [Wed, 20 Jul 2011 01:34:42 +0000 (18:34 -0700)]
Set content-type with PUT.
Josh Durgin [Wed, 20 Jul 2011 00:24:49 +0000 (17:24 -0700)]
schedule: make default owner different from that of a normal run
This way the machines locked by scheduled jobs aren't confused
with those locked by manual runs, so they're harder to accidentally
unlock.
Josh Durgin [Wed, 20 Jul 2011 00:11:12 +0000 (17:11 -0700)]
Update example targets in readme.
Josh Durgin [Tue, 19 Jul 2011 23:24:50 +0000 (16:24 -0700)]
Remove print that clutters the worker logs.
Josh Durgin [Fri, 15 Jul 2011 22:04:08 +0000 (15:04 -0700)]
Connect without using any known_hosts files.
Josh Durgin [Thu, 14 Jul 2011 23:47:29 +0000 (16:47 -0700)]
Make targets a dictionary mapping hosts to ssh host keys.
Josh Durgin [Thu, 14 Jul 2011 00:14:52 +0000 (17:14 -0700)]
Add command to update ssh hostkeys.
Josh Durgin [Thu, 14 Jul 2011 22:26:49 +0000 (15:26 -0700)]
lock server: return host pubkeys with locked machine names
Josh Durgin [Thu, 14 Jul 2011 22:10:50 +0000 (15:10 -0700)]
lock server: allow sshpubkey to be updated
Josh Durgin [Fri, 15 Jul 2011 21:59:33 +0000 (14:59 -0700)]
Update lock db schema.
Josh Durgin [Sat, 16 Jul 2011 00:15:09 +0000 (17:15 -0700)]
Add an overrides section for the ceph task.
This lets you run a suite against a particular version of ceph, or
with special debug settings.