git.apps.os.sepia.ceph.com Git - ceph.git/log
Tommi Virtanen [Tue, 22 Nov 2011 00:00:19 +0000 (16:00 -0800)]
Properly handle case where first error is inside a context manager __exit__.
Closes: http://tracker.newdream.net/issues/1743
Sage Weil [Sun, 20 Nov 2011 04:56:26 +0000 (20:56 -0800)]
nuke: don't specify full path
/tmp/cephtest/binary may have been removed; kill stray daemons by name
only. we really don't care about false positives here!
Sage Weil [Thu, 17 Nov 2011 21:52:17 +0000 (13:52 -0800)]
ceph_manager: %
Josh Durgin [Fri, 18 Nov 2011 21:53:51 +0000 (13:53 -0800)]
Save summary after nuking machines.
This way you can tell when tests are entirely finished running.
Josh Durgin [Fri, 18 Nov 2011 20:22:18 +0000 (12:22 -0800)]
Add an example overrides file for running regression tests.
Josh Durgin [Fri, 18 Nov 2011 01:26:21 +0000 (17:26 -0800)]
suite: put common config before facets
This lets you add tasks to the beginning of a run, like the chef task.
Josh Durgin [Fri, 18 Nov 2011 01:14:05 +0000 (17:14 -0800)]
suite: schedule a list of collections for running instead of a single suite directory
Yehuda Sadeh [Fri, 18 Nov 2011 00:53:21 +0000 (16:53 -0800)]
testswift: fix config
Tommi Virtanen [Fri, 18 Nov 2011 01:00:44 +0000 (17:00 -0800)]
Clean up C++isms.
Tommi Virtanen [Fri, 18 Nov 2011 00:49:47 +0000 (16:49 -0800)]
Add a task for easily running chef-solo on all the nodes.
Sage Weil [Thu, 17 Nov 2011 21:46:02 +0000 (13:46 -0800)]
ceph_manager: fix logging
Josh Durgin [Thu, 17 Nov 2011 21:07:03 +0000 (13:07 -0800)]
ceph: deep merge overrides, so e.g. log whitelists can be overridden
Josh Durgin [Thu, 17 Nov 2011 21:06:36 +0000 (13:06 -0800)]
misc: move deep_merge out of the MergeConfig class - it's generic
Josh Durgin [Thu, 17 Nov 2011 19:57:07 +0000 (11:57 -0800)]
Save config after locking nodes, so targets are included.
Josh Durgin [Thu, 17 Nov 2011 19:18:24 +0000 (11:18 -0800)]
filestore_idempotent: remove unused import
Josh Durgin [Thu, 17 Nov 2011 19:15:47 +0000 (11:15 -0800)]
mon_recovery: remove unused code and import
Josh Durgin [Thu, 17 Nov 2011 19:11:33 +0000 (11:11 -0800)]
thrashosds: timeout for every clean check, not just the last one
Josh Durgin [Thu, 17 Nov 2011 19:05:12 +0000 (11:05 -0800)]
ceph_manager: add a default timeout of 5 minutes for mon quorum
Josh Durgin [Thu, 17 Nov 2011 18:45:19 +0000 (10:45 -0800)]
ceph_manager: log mon quorum status so the logs show progress (or lack thereof)
Yehuda Sadeh [Thu, 17 Nov 2011 00:00:01 +0000 (16:00 -0800)]
rgw: add swift task
still not completely working (for some reason it skips all the tests)
Sage Weil [Fri, 11 Nov 2011 05:35:11 +0000 (21:35 -0800)]
filestore_idempotent.py: simple task to test non-idempotent osd ops
Write some non-idempotent events to the osd. Simulate a failure. Verify
the result is correct on replay.
This must be preceded by the ceph task just so that we get the binaries
installed. Should clean this up later if/when the installation gets
factored out of ceph.py.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 10 Nov 2011 22:13:24 +0000 (14:13 -0800)]
misc: allow >1 monitor per role in get_mon_names()
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 9 Nov 2011 21:37:02 +0000 (13:37 -0800)]
add hammer.sh
simple script to repeat a test until it fails. can probably do something much more sophisticated
here, but this works.
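
The repeat-until-failure idea can be sketched roughly as below. This is a minimal illustration in Python, not the contents of hammer.sh; the function name `hammer` and the `max_runs` parameter are hypothetical.

```python
import subprocess
import sys


def hammer(cmd, max_runs=None):
    """Run cmd repeatedly until it exits non-zero.

    Returns the 1-based attempt number of the first failure, or None
    if max_runs attempts all succeeded. (Hypothetical helper.)
    """
    attempt = 0
    while max_runs is None or attempt < max_runs:
        attempt += 1
        if subprocess.call(cmd) != 0:
            return attempt
    return None


if __name__ == "__main__":
    # A command that always succeeds, capped at 3 runs so the demo terminates.
    always_ok = [sys.executable, "-c", "pass"]
    print(hammer(always_ok, max_runs=3))
```

Leaving `max_runs` unset loops until the first failure, which matches the "repeat a test until it fails" description.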
Josh Durgin [Wed, 9 Nov 2011 18:39:56 +0000 (10:39 -0800)]
nuke: increase reboot timeout
Some sepia nodes are very slow to reboot.
Sage Weil [Wed, 9 Nov 2011 06:06:43 +0000 (22:06 -0800)]
mon_recovery: add task to test monitor cluster failure recovery
Some simple tests to start with. We still need some sort of mon cluster
thrashing.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 9 Nov 2011 06:02:58 +0000 (22:02 -0800)]
ceph_manager: manipulate monitors
Sage Weil [Wed, 9 Nov 2011 06:00:32 +0000 (22:00 -0800)]
ceph: keep ceph.conf at ctx.ceph.conf
Signed-off-by: Sage Weil <sage@newdream.net>
Josh Durgin [Wed, 9 Nov 2011 00:06:33 +0000 (16:06 -0800)]
Remove unused imports and variable.
Josh Durgin [Wed, 9 Nov 2011 00:01:39 +0000 (16:01 -0800)]
Add nuke-on-error option.
This lets automated jobs nuke and unlock machines after failed
tests. Each machine is nuked individually, so one down machine won't
keep others from being nuked and unlocked.
Tommi Virtanen [Mon, 7 Nov 2011 21:05:14 +0000 (13:05 -0800)]
Fix leftover orchestra import clause.
This seems to be a leftover from
a2372fce12b6bd1818e155d1d8ed5134dbd8fd4a ,
no idea how it stayed hidden this long.
Josh Durgin [Thu, 3 Nov 2011 20:27:44 +0000 (13:27 -0700)]
ceph_manager: log ceph -s output so progress is visible in the logs
Josh Durgin [Thu, 3 Nov 2011 20:08:39 +0000 (13:08 -0700)]
Keep each ssh connection alive.
With long-running jobs like thrashing, ssh connections were timing
out.
Josh Durgin [Thu, 3 Nov 2011 20:07:21 +0000 (13:07 -0700)]
connection: allow the caller to specify whether keep-alive should be used
Josh Durgin [Thu, 3 Nov 2011 18:26:45 +0000 (11:26 -0700)]
locker: fix race in locking
The isolation level is lower than I thought. This made it possible for
two clients to think they both locked the same machines, since the
update would still be modifying each row to change the locked_since
time.
Samuel Just [Wed, 2 Nov 2011 18:33:37 +0000 (11:33 -0700)]
testrados: set CEPH_CLIENT_ID without a ;
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Mon, 31 Oct 2011 21:26:41 +0000 (14:26 -0700)]
testrados: specify CEPH_CONF directly
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Yehuda Sadeh [Thu, 27 Oct 2011 19:11:28 +0000 (12:11 -0700)]
rgw: add user suspend/enable test
Yehuda Sadeh [Thu, 27 Oct 2011 18:32:12 +0000 (11:32 -0700)]
rgw: log-to-stderr is now a binary flag
Samuel Just [Mon, 24 Oct 2011 21:23:48 +0000 (14:23 -0700)]
testrados: rename testsnaps to testrados and make snap testing optional
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Mon, 24 Oct 2011 20:52:29 +0000 (13:52 -0700)]
workunit: set PYTHONPATH so we can test python bindings
Sage Weil [Sun, 23 Oct 2011 17:30:27 +0000 (10:30 -0700)]
ceph.conf: python parser doesn't like ; comments
Sage Weil [Sun, 23 Oct 2011 05:16:39 +0000 (22:16 -0700)]
ceph.conf: more frequent osd scrubbing; remove old cruft
Sage Weil [Wed, 19 Oct 2011 17:04:07 +0000 (10:04 -0700)]
ceph_manager: count active+clean+<something else> as active+clean
In my case, one pg was active+clean+scrubbing.
Signed-off-by: Sage Weil <sage@newdream.net>
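
One way to express this check is to split the pg state on "+" and look for both flags. This is a sketch only; the helper name `is_active_clean` is an assumption, and the real ceph_manager code may test the state differently.

```python
def is_active_clean(state):
    """Return True if a pg state string contains both 'active' and 'clean'.

    Extra flags such as 'scrubbing' are ignored. (Hypothetical helper.)
    """
    flags = state.split("+")
    return "active" in flags and "clean" in flags


for state in ("active+clean", "active+clean+scrubbing", "active+degraded"):
    print(state, is_active_clean(state))
```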
Josh Durgin [Thu, 20 Oct 2011 23:28:29 +0000 (16:28 -0700)]
coverage: don't remove ceph tarball
We want to keep it for examining core files, and we're already
fetching it here, once per suite run.
Sage Weil [Mon, 17 Oct 2011 22:32:22 +0000 (15:32 -0700)]
add lost_unfound task
Also some misc useful bits to ceph_manager.
Josh Durgin [Mon, 17 Oct 2011 21:42:03 +0000 (14:42 -0700)]
ceph: add whitelist for cluster log errors
Some messages are expected when thrashing osds or creating unfound
objects.
Fixes: #1622
Josh Durgin [Mon, 17 Oct 2011 17:40:16 +0000 (10:40 -0700)]
nuke: reset syslog configuration after rebooting
Previously we removed a file and rebooted without syncing, so the file
was never deleted.
Yehuda Sadeh [Wed, 12 Oct 2011 22:37:33 +0000 (15:37 -0700)]
radosgw-admin: test swift keys creation/removal
Josh Durgin [Fri, 7 Oct 2011 21:51:46 +0000 (14:51 -0700)]
teuthology-worker: remove --keep-locked-on-error
Josh Durgin [Fri, 7 Oct 2011 21:45:01 +0000 (14:45 -0700)]
Remove --keep-locked-on-error, and behave as if it were specified
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
Josh Durgin [Fri, 7 Oct 2011 00:18:35 +0000 (17:18 -0700)]
reconnect: ignore SSHExceptions before the timeout expires
Fixes: #1587
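
The retry pattern described here might look like the sketch below. It is generic on purpose: `retry_until`, `connect`, and `retry_on` are hypothetical names, and a caller would pass the SSH library's exception class as `retry_on` instead of the `ConnectionError` used in the demo.

```python
import time


def retry_until(connect, timeout, interval=1.0, retry_on=(Exception,)):
    """Call connect() until it succeeds or timeout seconds have passed.

    Exceptions listed in retry_on are swallowed before the deadline;
    once the deadline has passed, the last one is re-raised.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return connect()
        except retry_on:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Demo: a stand-in "connection" that fails twice before succeeding.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("host still rebooting")
    return "connected"


print(retry_until(flaky_connect, timeout=5, interval=0, retry_on=(ConnectionError,)))
```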
Samuel Just [Thu, 6 Oct 2011 20:33:17 +0000 (13:33 -0700)]
task/watch_notify_stress: watch_notify_stress now thrashes clients
This should exercise the watch notify timeout code.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Wed, 5 Oct 2011 22:54:57 +0000 (15:54 -0700)]
rgw: keep radosgw in foreground
It defaults to a daemon now.
Josh Durgin [Wed, 5 Oct 2011 00:19:56 +0000 (17:19 -0700)]
Retry listing machines if the lock server goes down.
Sage Weil [Tue, 4 Oct 2011 23:09:32 +0000 (16:09 -0700)]
rgw: use normal logging mechanism
Keep capturing stdout/err, even though it should end up empty.
Signed-off-by: Sage Weil <sage@newdream.net>
Josh Durgin [Tue, 4 Oct 2011 19:32:58 +0000 (12:32 -0700)]
teuthology-worker: clean up last_in_suite jobs
There's no reason not to delete them once they start.
Josh Durgin [Tue, 4 Oct 2011 19:16:30 +0000 (12:16 -0700)]
daemon-helper: detect the signal actually sent
I thought I fixed this when I implemented coverage collection, but I
guess it got lost in a rebase or something.
Josh Durgin [Tue, 4 Oct 2011 00:49:53 +0000 (17:49 -0700)]
ceph_manager: remove unused raw_pg_status method
Josh Durgin [Tue, 4 Oct 2011 00:49:13 +0000 (17:49 -0700)]
ceph_manager: run ceph -s as a normal program
This allows failures from it to be detected better.
Josh Durgin [Tue, 4 Oct 2011 00:05:33 +0000 (17:05 -0700)]
teuthology-results: include passed tests in email
Josh Durgin [Tue, 4 Oct 2011 00:00:45 +0000 (17:00 -0700)]
teuthology-results: include reasons for failure in email
Josh Durgin [Mon, 3 Oct 2011 23:32:42 +0000 (16:32 -0700)]
teuthology-ls: show reasons for failures with -v
Josh Durgin [Mon, 3 Oct 2011 23:08:49 +0000 (16:08 -0700)]
Add failure_reason to summary for the first failure detected.
For now, this is the exception raised during a task, the error found
in the central log, or coredumps found. More specific errors
(i.e. s3-tests had 3 failures) can be added later as exceptions raised
by tasks.
Josh Durgin [Mon, 3 Oct 2011 23:41:17 +0000 (16:41 -0700)]
radosbench: get coverage and cores
Samuel Just [Mon, 3 Oct 2011 21:04:53 +0000 (14:04 -0700)]
watch_notify_stress.py: add ceph flags option
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Mon, 3 Oct 2011 21:03:36 +0000 (14:03 -0700)]
ceph.py: add btrfs option
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Mon, 3 Oct 2011 16:55:58 +0000 (09:55 -0700)]
nuke: keep up with renaming cfuse -> ceph-fuse
Sage Weil [Fri, 30 Sep 2011 16:12:45 +0000 (09:12 -0700)]
radosgw-admin: test additional keys, log list/show/rm
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 29 Sep 2011 05:20:38 +0000 (22:20 -0700)]
tasks/radosgw-admin: test radosgw-admin tool
Not yet complete...
Sage Weil [Thu, 29 Sep 2011 03:50:24 +0000 (20:50 -0700)]
nuke: killall apache2 and radosgw too
Greg Farnum [Fri, 30 Sep 2011 16:26:42 +0000 (09:26 -0700)]
s3-tests: use radosgw-admin instead of radosgw_admin
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Thu, 29 Sep 2011 16:09:31 +0000 (09:09 -0700)]
ceph_manager: parse osd numbers with dots
This is necessary since wip-dot-names was merged.
Sage Weil [Fri, 23 Sep 2011 15:57:18 +0000 (08:57 -0700)]
rename c* -> ceph-*
Leave cfuse task name unchanged for now...
Josh Durgin [Fri, 23 Sep 2011 01:23:36 +0000 (18:23 -0700)]
queue: results_timeout needs to be converted to a string
Samuel Just [Thu, 22 Sep 2011 20:23:05 +0000 (13:23 -0700)]
task/watch_notify_stress.py: add simple watch_notify stress test
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Wed, 21 Sep 2011 18:05:18 +0000 (11:05 -0700)]
schedule: put results timeout in the job
The default was always being used instead.
Greg Farnum [Tue, 20 Sep 2011 17:04:01 +0000 (10:04 -0700)]
lockfile: increase interval to prevent incorrect locking orders
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 15 Sep 2011 16:24:52 +0000 (09:24 -0700)]
lockfile: don't fail cleanup if no lock procs exist
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:32:15 +0000 (11:32 -0700)]
workunit: Fetch source from github.
Needed an elaborate dance because Github won't let us download
an archive of a subdirectory.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:09:45 +0000 (11:09 -0700)]
s3tests: Clone repository from github.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:08:38 +0000 (11:08 -0700)]
coverage: Fetch source from github.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Samuel Just [Fri, 16 Sep 2011 00:26:03 +0000 (17:26 -0700)]
ceph.py: remove unused variables mds_daemons and mon_daemons
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Wed, 14 Sep 2011 23:31:58 +0000 (16:31 -0700)]
ceph.py/cephmanager.py: add ctx.daemons for restarting daemons
ctx.daemons will now be an instance of CephState.
Use ctx.daemons.get_daemon(role, id).stop() to stop a daemon, restart() to
restart it, etc.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Wed, 14 Sep 2011 23:28:06 +0000 (16:28 -0700)]
testsnaps: LD_PRELOAD needed for librados
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Tommi Virtanen [Tue, 13 Sep 2011 21:53:02 +0000 (14:53 -0700)]
Move orchestra to teuthology.orchestra so there's just one top-level package.
Tommi Virtanen [Tue, 13 Sep 2011 21:10:12 +0000 (14:10 -0700)]
Merge orchestra into teuthology.
There are too many things called Orchestra out there,
including Ubuntu's new multi-machine service orchestration
framework. The code might still be beneficial outside of
teuthology, but it can be spun off at that time.
Conflicts:
bootstrap
requirements.txt
setup.py
Tommi Virtanen [Fri, 9 Sep 2011 20:22:03 +0000 (13:22 -0700)]
Callers of task s3tests.create_users don't need to provide dummy "fixtures" dict.
Josh Durgin [Fri, 9 Sep 2011 17:31:08 +0000 (10:31 -0700)]
thrashosds: fix timeout when no options are specified
Josh Durgin [Fri, 9 Sep 2011 01:09:11 +0000 (18:09 -0700)]
thrashosds: fail if cluster doesn't finally become clean in 5 minutes
Josh Durgin [Thu, 8 Sep 2011 21:09:13 +0000 (14:09 -0700)]
thrasher: get coverage and cores from calling ceph commands
Josh Durgin [Thu, 8 Sep 2011 21:07:23 +0000 (14:07 -0700)]
thrashosds: wait for every pg to go active and clean before exiting
Josh Durgin [Thu, 8 Sep 2011 19:54:23 +0000 (12:54 -0700)]
thrasher: clean up a bit
Josh Durgin [Thu, 8 Sep 2011 00:50:12 +0000 (17:50 -0700)]
autotest: allow tests to be run on all clients
Josh Durgin [Wed, 7 Sep 2011 23:54:24 +0000 (16:54 -0700)]
rbd: allow specifying all clients
Greg Farnum [Tue, 6 Sep 2011 18:29:04 +0000 (11:29 -0700)]
locktest: don't fail cleanup if the dir doesn't exist
We're doing this the cheapest way possible: make the dir!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Sat, 3 Sep 2011 22:07:21 +0000 (15:07 -0700)]
teuthology: do a deep merge of input yaml fragments
Concatenate lists, and recursively combine dicts.
If you specify inputs like
  foo:
  - a
  - b
and
  foo:
  - c
you should get
  foo:
  - a
  - b
  - c
Dicts should also be merged (last one wins), and the merging is deep. E.g.
  foo:
    a:
      b:
        c: 1
and
  foo:
    a:
      b:
        c: 2
is
  foo:
    a:
      b:
        c: 2
Fixes: #1497
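
The merge rules in this message (concatenate lists, recursively combine dicts, last one wins otherwise) can be sketched as follows. The name `deep_merge` appears elsewhere in this log, but the signature and body here are assumptions, not the committed code.

```python
def deep_merge(a, b):
    """Return the result of merging b on top of a without mutating either."""
    if isinstance(a, list) and isinstance(b, list):
        return a + b  # lists are concatenated
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged
    return b  # scalars and mismatched types: last one wins


first = {"foo": ["a", "b"], "bar": {"a": {"b": {"c": 1}}}}
second = {"foo": ["c"], "bar": {"a": {"b": {"c": 2}}}}
print(deep_merge(first, second))
```

The two inputs mirror the list and dict examples in the commit message, combined into one document under the illustrative keys `foo` and `bar`.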
Josh Durgin [Sat, 3 Sep 2011 02:12:16 +0000 (19:12 -0700)]
lock: default to only listing machines you have locked
--all removes this restriction
Josh Durgin [Sat, 3 Sep 2011 00:58:19 +0000 (17:58 -0700)]
rgw: run as an external fastcgi server to match dho
Sage Weil [Fri, 2 Sep 2011 18:07:10 +0000 (11:07 -0700)]
don't eat exceptions for breakfast
fixes
0c2bee1514c1b1e65ca5d52459062e5a45da2d7b
Greg Farnum [Wed, 31 Aug 2011 21:40:55 +0000 (14:40 -0700)]
locktest: make it actually run the executable test
This was missing an argument (the file to run on!) and apparently
that didn't cause the command to output a failure return code.
Additionally, the ceph wrappers were blocking a crash and falsely
reporting success back to teuthology. (Yikes!)
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>