git.apps.os.sepia.ceph.com Git - ceph.git/log
Josh Durgin [Wed, 9 Nov 2011 00:01:39 +0000 (16:01 -0800)]
Add nuke-on-error option.
This lets automated jobs nuke and unlock machines after failed
tests. Each machine is nuked individually, so one down machine won't
keep others from being nuked and unlocked.
Tommi Virtanen [Mon, 7 Nov 2011 21:05:14 +0000 (13:05 -0800)]
Fix leftover orchestra import clause.
This seems to be a leftover from
a2372fce12b6bd1818e155d1d8ed5134dbd8fd4a,
no idea how it stayed hidden this long.
Josh Durgin [Thu, 3 Nov 2011 20:27:44 +0000 (13:27 -0700)]
ceph_manager: log ceph -s output so progress is visible in the logs
Josh Durgin [Thu, 3 Nov 2011 20:08:39 +0000 (13:08 -0700)]
Keep each ssh connection alive.
With long-running jobs like thrashing, ssh connections were timing
out.
Josh Durgin [Thu, 3 Nov 2011 20:07:21 +0000 (13:07 -0700)]
connection: allow the caller to specify whether keep-alive should be used
Josh Durgin [Thu, 3 Nov 2011 18:26:45 +0000 (11:26 -0700)]
locker: fix race in locking
The isolation level is lower than I thought. This made it possible for
two clients to think they both locked the same machines, since the
update would still be modifying each row to change the locked_since
time.
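The commit message does not show the fix itself. As a minimal sketch of one common way to avoid this kind of race, the update below only matches rows that are still unlocked, and the caller checks how many rows were changed. The table layout, the `try_lock` helper, and every column name except `locked_since` are assumptions made for illustration, and sqlite3 stands in for whatever database the lock server actually uses.

```python
import sqlite3


def try_lock(conn, name, owner):
    """Lock a machine only if it is currently unlocked; return True on success."""
    cur = conn.execute(
        "UPDATE machine "
        "SET locked = 1, locked_by = ?, locked_since = CURRENT_TIMESTAMP "
        "WHERE name = ? AND locked = 0",  # an already-locked row is never matched
        (owner, name),
    )
    conn.commit()
    # Exactly one changed row means this caller obtained the lock.
    return cur.rowcount == 1


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE machine ("
    "name TEXT PRIMARY KEY, locked INTEGER, locked_by TEXT, locked_since TEXT)"
)
conn.execute("INSERT INTO machine VALUES ('host1', 0, NULL, NULL)")

first = try_lock(conn, "host1", "alice")
second = try_lock(conn, "host1", "bob")
```

Because the `WHERE` clause excludes rows that are already locked, a second client cannot touch `locked_since` on a machine someone else holds, so the row count is a reliable success signal.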
Samuel Just [Wed, 2 Nov 2011 18:33:37 +0000 (11:33 -0700)]
testrados: set CEPH_CLIENT_ID without a ;
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Mon, 31 Oct 2011 21:26:41 +0000 (14:26 -0700)]
testrados: specify CEPH_CONF directly
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Yehuda Sadeh [Thu, 27 Oct 2011 19:11:28 +0000 (12:11 -0700)]
rgw: add user suspend/enable test
Yehuda Sadeh [Thu, 27 Oct 2011 18:32:12 +0000 (11:32 -0700)]
rgw: log-to-stderr is now a binary flag
Samuel Just [Mon, 24 Oct 2011 21:23:48 +0000 (14:23 -0700)]
testrados: rename testsnaps to testrados and make snap testing optional
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Mon, 24 Oct 2011 20:52:29 +0000 (13:52 -0700)]
workunit: set PYTHONPATH so we can test python bindings
Sage Weil [Sun, 23 Oct 2011 17:30:27 +0000 (10:30 -0700)]
ceph.conf: python parser doesn't like ; comments
Sage Weil [Sun, 23 Oct 2011 05:16:39 +0000 (22:16 -0700)]
ceph.conf: more frequent osd scrubbing; remove old cruft
Sage Weil [Wed, 19 Oct 2011 17:04:07 +0000 (10:04 -0700)]
ceph_manager: count active+clean+<something else> as active+clean
In my case, one pg was active+clean+scrubbing.
Signed-off-by: Sage Weil <sage@newdream.net>
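A minimal sketch of the idea, assuming pg states are reported as `+`-separated flag strings as in the example above. The helper names are hypothetical and this is not the ceph_manager code.

```python
def is_active_clean(state):
    """True if a pg state string contains both the active and clean flags."""
    flags = set(state.split("+"))
    return {"active", "clean"} <= flags


def count_active_clean(states):
    """Count pgs that are active+clean, whatever other flags they carry."""
    return sum(1 for state in states if is_active_clean(state))


# Hypothetical pg states, for illustration only.
states = ["active+clean", "active+clean+scrubbing", "active+degraded", "peering"]
total = count_active_clean(states)
```

Checking flags as a set rather than comparing the whole string means an extra flag such as scrubbing no longer hides a pg that is otherwise healthy.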
Josh Durgin [Thu, 20 Oct 2011 23:28:29 +0000 (16:28 -0700)]
coverage: don't remove ceph tarball
We want to keep it for examining core files, and we're already
fetching it here, once per suite run.
Sage Weil [Mon, 17 Oct 2011 22:32:22 +0000 (15:32 -0700)]
add lost_unfound task
Also some misc useful bits to ceph_manager.
Josh Durgin [Mon, 17 Oct 2011 21:42:03 +0000 (14:42 -0700)]
ceph: add whitelist for cluster log errors
Some messages are expected when thrashing osds or creating unfound
objects.
Fixes: #1622
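A minimal sketch of how such a whitelist could be applied, assuming error lines carry an `[ERR]` marker and whitelist entries are regular expressions. The function name, the marker, and the sample log lines are assumptions for illustration, not the task's actual code or configuration.

```python
import re


def unexpected_errors(log_lines, whitelist):
    """Return error lines that match none of the whitelisted patterns."""
    patterns = [re.compile(p) for p in whitelist]
    return [
        line
        for line in log_lines
        if "[ERR]" in line and not any(p.search(line) for p in patterns)
    ]


# Hypothetical log lines and whitelist entry, for illustration only.
lines = [
    "osd.1 [ERR] object marked unfound",
    "osd.2 [ERR] something unexpected",
    "mon.a [INF] all good",
]
remaining = unexpected_errors(lines, [r"marked unfound"])
```

A run would then fail only if `remaining` is non-empty, so expected noise from thrashing does not mask real errors.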
Josh Durgin [Mon, 17 Oct 2011 17:40:16 +0000 (10:40 -0700)]
nuke: reset syslog configuration after rebooting
Previously we removed a file and rebooted without syncing, so the file
was never deleted.
Yehuda Sadeh [Wed, 12 Oct 2011 22:37:33 +0000 (15:37 -0700)]
radosgw-admin: test swift keys creation/removal
Josh Durgin [Fri, 7 Oct 2011 21:51:46 +0000 (14:51 -0700)]
teuthology-worker: remove --keep-locked-on-error
Josh Durgin [Fri, 7 Oct 2011 21:45:01 +0000 (14:45 -0700)]
Remove --keep-locked-on-error, and behave as if it were specified
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
Josh Durgin [Fri, 7 Oct 2011 00:18:35 +0000 (17:18 -0700)]
reconnect: ignore SSHExceptions before the timeout expires
Fixes: #1587
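A minimal sketch of the retry pattern the subject line describes: connection errors are swallowed until a deadline passes, and only then propagated. `TransientSSHError` is a hypothetical stand-in for the SSH library's exception, and the `reconnect` signature is an assumption, not teuthology's.

```python
import time


class TransientSSHError(Exception):
    """Hypothetical stand-in for the SSH library's connection exception."""


def reconnect(connect, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds; re-raise only once timeout has expired."""
    deadline = clock() + timeout
    while True:
        try:
            return connect()
        except TransientSSHError:
            if clock() >= deadline:
                raise  # the timeout has expired, so report the failure
            sleep(interval)


# Example: a connection that fails twice before succeeding.
attempts = []


def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientSSHError("host not up yet")
    return "connected"


result = reconnect(flaky_connect, timeout=60, sleep=lambda _: None)
```

Passing `clock` and `sleep` as parameters is only a convenience for testing the loop without real delays.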
Samuel Just [Thu, 6 Oct 2011 20:33:17 +0000 (13:33 -0700)]
task/watch_notify_stress: watch_notify_stress now thrashes clients
This should exercise the watch notify timeout code.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Wed, 5 Oct 2011 22:54:57 +0000 (15:54 -0700)]
rgw: keep radosgw in foreground
It defaults to a daemon now.
Josh Durgin [Wed, 5 Oct 2011 00:19:56 +0000 (17:19 -0700)]
Retry listing machines if the lock server goes down.
Sage Weil [Tue, 4 Oct 2011 23:09:32 +0000 (16:09 -0700)]
rgw: use normal logging mechanism
Keep capturing stdout/err, even though it should end up empty.
Signed-off-by: Sage Weil <sage@newdream.net>
Josh Durgin [Tue, 4 Oct 2011 19:32:58 +0000 (12:32 -0700)]
teuthology-worker: clean up last_in_suite jobs
There's no reason not to delete them once they start.
Josh Durgin [Tue, 4 Oct 2011 19:16:30 +0000 (12:16 -0700)]
daemon-helper: detect the signal actually sent
I thought I fixed this when I implemented coverage collection, but I
guess it got lost in a rebase or something.
Josh Durgin [Tue, 4 Oct 2011 00:49:53 +0000 (17:49 -0700)]
ceph_manager: remove unused raw_pg_status method
Josh Durgin [Tue, 4 Oct 2011 00:49:13 +0000 (17:49 -0700)]
ceph_manager: run ceph -s as a normal program
This allows failures from it to be detected better.
Josh Durgin [Tue, 4 Oct 2011 00:05:33 +0000 (17:05 -0700)]
teuthology-results: include passed tests in email
Josh Durgin [Tue, 4 Oct 2011 00:00:45 +0000 (17:00 -0700)]
teuthology-results: include reasons for failure in email
Josh Durgin [Mon, 3 Oct 2011 23:32:42 +0000 (16:32 -0700)]
teuthology-ls: show reasons for failures with -v
Josh Durgin [Mon, 3 Oct 2011 23:08:49 +0000 (16:08 -0700)]
Add failure_reason to summary for the first failure detected.
For now, this is the exception raised during a task, the error found
in the central log, or coredumps found. More specific errors
(e.g. s3-tests had 3 failures) can be added later as exceptions raised
by tasks.
Josh Durgin [Mon, 3 Oct 2011 23:41:17 +0000 (16:41 -0700)]
radosbench: get coverage and cores
Samuel Just [Mon, 3 Oct 2011 21:04:53 +0000 (14:04 -0700)]
watch_notify_stress.py: add ceph flags option
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Mon, 3 Oct 2011 21:03:36 +0000 (14:03 -0700)]
ceph.py: add btrfs option
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Mon, 3 Oct 2011 16:55:58 +0000 (09:55 -0700)]
nuke: keep up with renaming cfuse -> ceph-fuse
Sage Weil [Fri, 30 Sep 2011 16:12:45 +0000 (09:12 -0700)]
radosgw-admin: test additional keys, log list/show/rm
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 29 Sep 2011 05:20:38 +0000 (22:20 -0700)]
tasks/radosgw-admin: test radosgw-admin tool
Not yet complete...
Sage Weil [Thu, 29 Sep 2011 03:50:24 +0000 (20:50 -0700)]
nuke: killall apache2 and radosgw too
Greg Farnum [Fri, 30 Sep 2011 16:26:42 +0000 (09:26 -0700)]
s3-tests: use radosgw-admin instead of radosgw_admin
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Thu, 29 Sep 2011 16:09:31 +0000 (09:09 -0700)]
ceph_manager: parse osd numbers with dots
This is necessary since wip-dot-names was merged.
Sage Weil [Fri, 23 Sep 2011 15:57:18 +0000 (08:57 -0700)]
rename c* -> ceph-*
Leave cfuse task name unchanged for now...
Josh Durgin [Fri, 23 Sep 2011 01:23:36 +0000 (18:23 -0700)]
queue: results_timeout needs to be converted to a string
Samuel Just [Thu, 22 Sep 2011 20:23:05 +0000 (13:23 -0700)]
task/watch_notify_stress.py: add simple watch_notify stress test
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Wed, 21 Sep 2011 18:05:18 +0000 (11:05 -0700)]
schedule: put results timeout in the job
The default was always being used instead.
Greg Farnum [Tue, 20 Sep 2011 17:04:01 +0000 (10:04 -0700)]
lockfile: increase interval to prevent incorrect locking orders
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 15 Sep 2011 16:24:52 +0000 (09:24 -0700)]
lockfile: don't fail cleanup if no lock procs exist
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:32:15 +0000 (11:32 -0700)]
workunit: Fetch source from github.
Needed an elaborate dance because Github won't let us download
an archive of a subdirectory.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:09:45 +0000 (11:09 -0700)]
s3tests: Clone repository from github.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:08:38 +0000 (11:08 -0700)]
coverage: Fetch source from github.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Samuel Just [Fri, 16 Sep 2011 00:26:03 +0000 (17:26 -0700)]
ceph.py: remove unused variables mds_daemons and mon_daemons
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Wed, 14 Sep 2011 23:31:58 +0000 (16:31 -0700)]
ceph.py/cephmanager.py: add ctx.daemons for restarting daemons
ctx.daemons will now be an instance of CephState.
ctx.daemons.get_daemon(role, id).stop() to stop daemon, restart() to
restart the daemon, etc.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Wed, 14 Sep 2011 23:28:06 +0000 (16:28 -0700)]
testsnaps: LD_PRELOAD needed for librados
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Tommi Virtanen [Tue, 13 Sep 2011 21:53:02 +0000 (14:53 -0700)]
Move orchestra to teuthology.orchestra so there's just one top-level package.
Tommi Virtanen [Tue, 13 Sep 2011 21:10:12 +0000 (14:10 -0700)]
Merge orchestra into teuthology.
There are too many things called Orchestra out there,
including Ubuntu's new multi-machine service orchestration
framework. The code might still be beneficial outside of
teuthology, but it can be spun off at that time.
Conflicts:
bootstrap
requirements.txt
setup.py
Tommi Virtanen [Fri, 9 Sep 2011 20:22:03 +0000 (13:22 -0700)]
Callers of task s3tests.create_users don't need to provide dummy "fixtures" dict.
Josh Durgin [Fri, 9 Sep 2011 17:31:08 +0000 (10:31 -0700)]
thrashosds: fix timeout when no options are specified
Josh Durgin [Fri, 9 Sep 2011 01:09:11 +0000 (18:09 -0700)]
thrashosds: fail if cluster doesn't finally become clean in 5 minutes
Josh Durgin [Thu, 8 Sep 2011 21:09:13 +0000 (14:09 -0700)]
thrasher: get coverage and cores from calling ceph commands
Josh Durgin [Thu, 8 Sep 2011 21:07:23 +0000 (14:07 -0700)]
thrashosds: wait for every pg to go active and clean before exiting
Josh Durgin [Thu, 8 Sep 2011 19:54:23 +0000 (12:54 -0700)]
thrasher: clean up a bit
Josh Durgin [Thu, 8 Sep 2011 00:50:12 +0000 (17:50 -0700)]
autotest: allow tests to be run on all clients
Josh Durgin [Wed, 7 Sep 2011 23:54:24 +0000 (16:54 -0700)]
rbd: allow specifying all clients
Greg Farnum [Tue, 6 Sep 2011 18:29:04 +0000 (11:29 -0700)]
locktest: don't fail cleanup if the dir doesn't exist
We're doing this the cheapest way possible: make the dir!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Sat, 3 Sep 2011 22:07:21 +0000 (15:07 -0700)]
teuthology: do a deep merge of input yaml fragments
Concatenate lists, and recursively combine dicts.
If you specify inputs like
    foo:
      - a
      - b
and
    foo:
      - c
you should get
    foo:
      - a
      - b
      - c
Dicts should also be merged (last one wins), and the merging is deep. E.g.
    foo:
      a:
        b:
          c: 1
and
    foo:
      a:
        b:
          c: 2
is
    foo:
      a:
        b:
          c: 2
Fixes: #1497
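The merge rules described above can be sketched as follows. This is a minimal illustration operating on already-parsed YAML (plain dicts and lists), and `deep_merge` is a hypothetical name, not necessarily the function teuthology uses.

```python
def deep_merge(a, b):
    """Merge b into a: lists concatenate, dicts merge recursively, else b wins."""
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged
    # Scalars or mismatched types: the last input wins.
    return b


lists = deep_merge({"foo": ["a", "b"]}, {"foo": ["c"]})
dicts = deep_merge(
    {"foo": {"a": {"b": {"c": 1}}}},
    {"foo": {"a": {"b": {"c": 2}}}},
)
```

Here `lists` reproduces the first example (`foo` holding a, b, c) and `dicts` the second (`c: 2`). Neither input is modified, since the function builds new containers.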
Josh Durgin [Sat, 3 Sep 2011 02:12:16 +0000 (19:12 -0700)]
lock: default to only listing machines you have locked
--all removes this restriction
Josh Durgin [Sat, 3 Sep 2011 00:58:19 +0000 (17:58 -0700)]
rgw: run as an external fastcgi server to match dho
Sage Weil [Fri, 2 Sep 2011 18:07:10 +0000 (11:07 -0700)]
don't eat exceptions for breakfast
fixes
0c2bee1514c1b1e65ca5d52459062e5a45da2d7b
Greg Farnum [Wed, 31 Aug 2011 21:40:55 +0000 (14:40 -0700)]
locktest: make it actually run the executable test
This was missing an argument (the file to run on!) and apparently
that didn't cause the command to output a failure return code.
Additionally, the ceph wrappers were blocking a crash and falsely
reporting success back to teuthology. (Yikes!)
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Thu, 1 Sep 2011 22:35:27 +0000 (15:35 -0700)]
nuke: synchronize clocks after reboot, and optionally synchronize all clocks
Sage Weil [Wed, 31 Aug 2011 20:56:42 +0000 (13:56 -0700)]
thrashosds: make it work when first mon isn't mon.0
Sage Weil [Wed, 31 Aug 2011 20:21:30 +0000 (13:21 -0700)]
thrashosds: no camelcaps, add some whitespace
Josh Durgin [Thu, 1 Sep 2011 17:44:46 +0000 (10:44 -0700)]
nuke: remove unused import
Josh Durgin [Thu, 1 Sep 2011 17:33:20 +0000 (10:33 -0700)]
nuke: localize imports again so they occur after gevent monkey-patching
This is necessary to make ssh work properly.
Josh Durgin [Thu, 1 Sep 2011 02:46:10 +0000 (19:46 -0700)]
nuke: reboot if rbd is mounted
Josh Durgin [Thu, 1 Sep 2011 00:43:14 +0000 (17:43 -0700)]
schedule: add a way to delete jobs from the queue
Josh Durgin [Thu, 1 Sep 2011 00:13:06 +0000 (17:13 -0700)]
parallel: don't hang if no tasks were spawned
This makes
6d919152178cfbd69dc5d50cdab40fc99db166a6 work.
Josh Durgin [Wed, 31 Aug 2011 23:48:58 +0000 (16:48 -0700)]
workunits: remove unused variable
Josh Durgin [Wed, 31 Aug 2011 21:36:32 +0000 (14:36 -0700)]
nuke: add option to reboot all nodes
Josh Durgin [Wed, 31 Aug 2011 21:36:01 +0000 (14:36 -0700)]
Fix pyflakes warnings.
Josh Durgin [Wed, 31 Aug 2011 00:21:36 +0000 (17:21 -0700)]
coverage: remove debugging
Josh Durgin [Wed, 31 Aug 2011 00:12:14 +0000 (17:12 -0700)]
workunit: save coverage and coredumps
Anything that runs a ceph utility should be using these commands.
Greg Farnum [Tue, 30 Aug 2011 22:48:58 +0000 (15:48 -0700)]
workunits: rework a little bit to allow "all" clients in a run
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Wed, 24 Aug 2011 21:07:11 +0000 (14:07 -0700)]
cfuse: support running through valgrind
Also switch up the config code so we can take per-client options.
Greg Farnum [Mon, 29 Aug 2011 23:47:22 +0000 (16:47 -0700)]
valgrind: don't run valgrind_post if there's no valgrind
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 29 Aug 2011 20:58:09 +0000 (13:58 -0700)]
valgrind: scan logs for bad results
It's not sophisticated but it will warn you about a node
if at least one node has issues.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 29 Aug 2011 19:39:38 +0000 (12:39 -0700)]
valgrind: use xml output for tools that support it
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Mon, 29 Aug 2011 19:42:45 +0000 (12:42 -0700)]
suite: add option to send an email if the entire suite passed
Josh Durgin [Fri, 26 Aug 2011 00:11:33 +0000 (17:11 -0700)]
Generate coverage at the end of a suite run,
and optionally email failures and ongoing jobs.
Josh Durgin [Fri, 26 Aug 2011 00:09:03 +0000 (17:09 -0700)]
queue: delete every job when it finishes, so only running jobs are buried
Josh Durgin [Thu, 4 Aug 2011 01:08:14 +0000 (18:08 -0700)]
Add teuthology-coverage for analyzing test coverage for a suite run.
Josh Durgin [Tue, 14 Jun 2011 18:57:29 +0000 (11:57 -0700)]
Add scripts to analyze coverage for a single teuthology run.
Greg Farnum [Thu, 25 Aug 2011 22:27:30 +0000 (15:27 -0700)]
thrasher: improve documentation a little
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 22:19:30 +0000 (15:19 -0700)]
thrasher: add option to mark OSDs down instead of out.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 22:18:42 +0000 (15:18 -0700)]
thrasher: allow a config to set values
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 21:38:34 +0000 (14:38 -0700)]
thrasher: remove redundant wait_till_clean()
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 24 Aug 2011 23:48:14 +0000 (16:48 -0700)]
coverage: create dir conditionally
We don't need to create the dir if we aren't using coverage.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>