git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Sun, 23 Oct 2011 17:30:27 +0000 (10:30 -0700)]
ceph.conf: python parser doesn't like ; comments
Sage Weil [Sun, 23 Oct 2011 05:16:39 +0000 (22:16 -0700)]
ceph.conf: more frequent osd scrubbing; remove old cruft
Sage Weil [Wed, 19 Oct 2011 17:04:07 +0000 (10:04 -0700)]
ceph_manager: count active+clean+<something else> as active+clean
In my case, one pg was active+clean+scrubbing.
Signed-off-by: Sage Weil <sage@newdream.net>
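A minimal sketch of the counting rule this commit describes, not the actual ceph_manager code. The helper names `is_active_clean` and `count_active_clean` are hypothetical, and the sketch assumes pg states arrive as '+'-joined strings such as active+clean+scrubbing:

```python
def is_active_clean(state):
    """Return True if a '+'-joined pg state contains both 'active' and 'clean'.

    Extra flags such as 'scrubbing' do not disqualify the pg.
    """
    flags = set(state.split('+'))
    return {'active', 'clean'} <= flags


def count_active_clean(states):
    """Count how many pg state strings qualify as active+clean."""
    return sum(1 for state in states if is_active_clean(state))


if __name__ == '__main__':
    sample = ['active+clean', 'active+clean+scrubbing', 'active+degraded']
    print(count_active_clean(sample))
```

Comparing sets of flags rather than whole strings is one way to keep the check independent of which additional flags are present.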
Josh Durgin [Thu, 20 Oct 2011 23:28:29 +0000 (16:28 -0700)]
coverage: don't remove ceph tarball
We want to keep it for examining core files, and we're already
fetching it here, once per suite run.
Sage Weil [Mon, 17 Oct 2011 22:32:22 +0000 (15:32 -0700)]
add lost_unfound task
Also some misc useful bits to ceph_manager.
Josh Durgin [Mon, 17 Oct 2011 21:42:03 +0000 (14:42 -0700)]
ceph: add whitelist for cluster log errors
Some messages are expected when thrashing osds or creating unfound
objects.
Fixes: #1622
Josh Durgin [Mon, 17 Oct 2011 17:40:16 +0000 (10:40 -0700)]
nuke: reset syslog configuration after rebooting
Previously we removed a file and rebooted without syncing, so the file
was never deleted.
Yehuda Sadeh [Wed, 12 Oct 2011 22:37:33 +0000 (15:37 -0700)]
radosgw-admin: test swift keys creation/removal
Josh Durgin [Fri, 7 Oct 2011 21:51:46 +0000 (14:51 -0700)]
teuthology-worker: remove --keep-locked-on-error
Josh Durgin [Fri, 7 Oct 2011 21:45:01 +0000 (14:45 -0700)]
Remove --keep-locked-on-error, and behave as if it were specified
This will help prevent machines with cephtest dirs still present from
being used. It's easy to unlock machines - the targets yaml fragment
is output during a run.
Josh Durgin [Fri, 7 Oct 2011 00:18:35 +0000 (17:18 -0700)]
reconnect: ignore SSHExceptions before the timeout expires
Fixes: #1587
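The behaviour named in this commit, retrying a connection and ignoring a given exception until a timeout expires, can be sketched roughly as follows. This is an illustration only, not teuthology's reconnect code: `retry_until` and its parameters are hypothetical names, and the exception type to ignore is passed in by the caller rather than imported from an SSH library.

```python
import time


def retry_until(connect, ignored, timeout, interval=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or `timeout` seconds have passed.

    Exceptions of type `ignored` are swallowed while time remains; once
    the deadline has passed, the last such exception is re-raised.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return connect()
        except ignored:
            if time.monotonic() >= deadline:
                raise
            sleep(interval)
```

A caller would pass the connection attempt as a function, for example `retry_until(open_connection, SomeSSHError, timeout=300)`, where both names stand in for whatever the calling code actually uses.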
Samuel Just [Thu, 6 Oct 2011 20:33:17 +0000 (13:33 -0700)]
task/watch_notify_stress: watch_notify_stress now thrashes clients
This should exercise the watch notify timeout code.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Wed, 5 Oct 2011 22:54:57 +0000 (15:54 -0700)]
rgw: keep radosgw in foreground
It defaults to a daemon now.
Josh Durgin [Wed, 5 Oct 2011 00:19:56 +0000 (17:19 -0700)]
Retry listing machines if the lock server goes down.
Sage Weil [Tue, 4 Oct 2011 23:09:32 +0000 (16:09 -0700)]
rgw: use normal logging mechanism
Keep capturing stdout/err, even though it should end up empty.
Signed-off-by: Sage Weil <sage@newdream.net>
Josh Durgin [Tue, 4 Oct 2011 19:32:58 +0000 (12:32 -0700)]
teuthology-worker: clean up last_in_suite jobs
There's no reason not to delete them once they start.
Josh Durgin [Tue, 4 Oct 2011 19:16:30 +0000 (12:16 -0700)]
daemon-helper: detect the signal actually sent
I thought I fixed this when I implemented coverage collection, but I
guess it got lost in a rebase or something.
Josh Durgin [Tue, 4 Oct 2011 00:49:53 +0000 (17:49 -0700)]
ceph_manager: remove unused raw_pg_status method
Josh Durgin [Tue, 4 Oct 2011 00:49:13 +0000 (17:49 -0700)]
ceph_manager: run ceph -s as a normal program
This allows failures from it to be detected better.
Josh Durgin [Tue, 4 Oct 2011 00:05:33 +0000 (17:05 -0700)]
teuthology-results: include passed tests in email
Josh Durgin [Tue, 4 Oct 2011 00:00:45 +0000 (17:00 -0700)]
teuthology-results: include reasons for failure in email
Josh Durgin [Mon, 3 Oct 2011 23:32:42 +0000 (16:32 -0700)]
teuthology-ls: show reasons for failures with -v
Josh Durgin [Mon, 3 Oct 2011 23:08:49 +0000 (16:08 -0700)]
Add failure_reason to summary for the first failure detected.
For now, this is the exception raised during a task, the error found
in the central log, or coredumps found. More specific errors
(e.g. s3-tests had 3 failures) can be added later as exceptions raised
by tasks.
Josh Durgin [Mon, 3 Oct 2011 23:41:17 +0000 (16:41 -0700)]
radosbench: get coverage and cores
Samuel Just [Mon, 3 Oct 2011 21:04:53 +0000 (14:04 -0700)]
watch_notify_stress.py: add ceph flags option
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Mon, 3 Oct 2011 21:03:36 +0000 (14:03 -0700)]
ceph.py: add btrfs option
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Mon, 3 Oct 2011 16:55:58 +0000 (09:55 -0700)]
nuke: keep up with renaming cfuse -> ceph-fuse
Sage Weil [Fri, 30 Sep 2011 16:12:45 +0000 (09:12 -0700)]
radosgw-admin: test additional keys, log list/show/rm
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 29 Sep 2011 05:20:38 +0000 (22:20 -0700)]
tasks/radosgw-admin: test radosgw-admin tool
Not yet complete...
Sage Weil [Thu, 29 Sep 2011 03:50:24 +0000 (20:50 -0700)]
nuke: killall apache2 and radosgw too
Greg Farnum [Fri, 30 Sep 2011 16:26:42 +0000 (09:26 -0700)]
s3-tests: use radosgw-admin instead of radosgw_admin
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Thu, 29 Sep 2011 16:09:31 +0000 (09:09 -0700)]
ceph_manager: parse osd numbers with dots
This is necessary since wip-dot-names was merged.
Sage Weil [Fri, 23 Sep 2011 15:57:18 +0000 (08:57 -0700)]
rename c* -> ceph-*
Leave cfuse task name unchanged for now...
Josh Durgin [Fri, 23 Sep 2011 01:23:36 +0000 (18:23 -0700)]
queue: results_timeout needs to be converted to a string
Samuel Just [Thu, 22 Sep 2011 20:23:05 +0000 (13:23 -0700)]
task/watch_notify_stress.py: add simple watch_notify stress test
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Wed, 21 Sep 2011 18:05:18 +0000 (11:05 -0700)]
schedule: put results timeout in the job
The default was always being used instead.
Greg Farnum [Tue, 20 Sep 2011 17:04:01 +0000 (10:04 -0700)]
lockfile: increase interval to prevent incorrect locking orders
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 15 Sep 2011 16:24:52 +0000 (09:24 -0700)]
lockfile: don't fail cleanup if no lock procs exist
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:32:15 +0000 (11:32 -0700)]
workunit: Fetch source from github.
Needed an elaborate dance because Github won't let us download
an archive of a subdirectory.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:09:45 +0000 (11:09 -0700)]
s3tests: Clone repository from github.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Tommi Virtanen [Fri, 16 Sep 2011 18:08:38 +0000 (11:08 -0700)]
coverage: Fetch source from github.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Samuel Just [Fri, 16 Sep 2011 00:26:03 +0000 (17:26 -0700)]
ceph.py: remove unused variables mds_daemons and mon_daemons
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Wed, 14 Sep 2011 23:31:58 +0000 (16:31 -0700)]
ceph.py/cephmanager.py: add ctx.daemons for restarting daemons
ctx.daemons will now be an instance of CephState.
Use ctx.daemons.get_daemon(role, id).stop() to stop a daemon, restart() to
restart it, etc.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Wed, 14 Sep 2011 23:28:06 +0000 (16:28 -0700)]
testsnaps: LD_PRELOAD needed for librados
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Tommi Virtanen [Tue, 13 Sep 2011 21:53:02 +0000 (14:53 -0700)]
Move orchestra to teuthology.orchestra so there's just one top-level package.
Tommi Virtanen [Tue, 13 Sep 2011 21:10:12 +0000 (14:10 -0700)]
Merge orchestra into teuthology.
There are too many things called Orchestra out there,
including Ubuntu's new multi-machine service orchestration
framework. The code might still be beneficial outside of
teuthology, but it can be spun off at that time.
Conflicts:
bootstrap
requirements.txt
setup.py
Tommi Virtanen [Fri, 9 Sep 2011 20:22:03 +0000 (13:22 -0700)]
Callers of task s3tests.create_users don't need to provide dummy "fixtures" dict.
Josh Durgin [Fri, 9 Sep 2011 17:31:08 +0000 (10:31 -0700)]
thrashosds: fix timeout when no options are specified
Josh Durgin [Fri, 9 Sep 2011 01:09:11 +0000 (18:09 -0700)]
thrashosds: fail if cluster doesn't finally become clean in 5 minutes
Josh Durgin [Thu, 8 Sep 2011 21:09:13 +0000 (14:09 -0700)]
thrasher: get coverage and cores from calling ceph commands
Josh Durgin [Thu, 8 Sep 2011 21:07:23 +0000 (14:07 -0700)]
thrashosds: wait for every pg to go active and clean before exiting
Josh Durgin [Thu, 8 Sep 2011 19:54:23 +0000 (12:54 -0700)]
thrasher: clean up a bit
Josh Durgin [Thu, 8 Sep 2011 00:50:12 +0000 (17:50 -0700)]
autotest: allow tests to be run on all clients
Josh Durgin [Wed, 7 Sep 2011 23:54:24 +0000 (16:54 -0700)]
rbd: allow specifying all clients
Greg Farnum [Tue, 6 Sep 2011 18:29:04 +0000 (11:29 -0700)]
locktest: don't fail cleanup if the dir doesn't exist
We're doing this the cheapest way possible: make the dir!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Sat, 3 Sep 2011 22:07:21 +0000 (15:07 -0700)]
teuthology: do a deep merge of input yaml fragments
Concatenate lists, and recursively combine dicts.
If you specify inputs like
    foo:
    - a
    - b
and
    foo:
    - c
you should get
    foo:
    - a
    - b
    - c
Dicts should also be merged (last one wins), and the merging is deep. E.g.
    foo:
      a:
        b:
          c: 1
and
    foo:
      a:
        b:
          c: 2
is
    foo:
      a:
        b:
          c: 2
Fixes: #1497
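The merge rules in this commit message (concatenate lists, recursively combine dicts, last one wins otherwise) can be sketched as below. This is an illustrative version written from the description, not the code the commit added; the function name `deep_merge` is an assumption, and the inputs are plain Python dicts and lists as they would look after YAML parsing.

```python
def deep_merge(a, b):
    """Merge b into a and return the result without modifying either.

    Two lists are concatenated, two dicts are merged key by key
    (recursively), and in every other case b replaces a.
    """
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged
    return b


if __name__ == '__main__':
    print(deep_merge({'foo': ['a', 'b']}, {'foo': ['c']}))
    print(deep_merge({'foo': {'a': {'b': {'c': 1}}}},
                     {'foo': {'a': {'b': {'c': 2}}}}))
```

Several fragments can be folded together in order with `functools.reduce(deep_merge, fragments, {})`, so that later fragments take precedence over earlier ones.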
Josh Durgin [Sat, 3 Sep 2011 02:12:16 +0000 (19:12 -0700)]
lock: default to only listing machines you have locked
--all removes this restriction
Josh Durgin [Sat, 3 Sep 2011 00:58:19 +0000 (17:58 -0700)]
rgw: run as an external fastcgi server to match dho
Sage Weil [Fri, 2 Sep 2011 18:07:10 +0000 (11:07 -0700)]
don't eat exceptions for breakfast
fixes
0c2bee1514c1b1e65ca5d52459062e5a45da2d7b
Greg Farnum [Wed, 31 Aug 2011 21:40:55 +0000 (14:40 -0700)]
locktest: make it actually run the executable test
This was missing an argument (the file to run on!) and apparently
that didn't cause the command to output a failure return code.
Additionally, the ceph wrappers were blocking a crash and falsely
reporting success back to teuthology. (Yikes!)
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Thu, 1 Sep 2011 22:35:27 +0000 (15:35 -0700)]
nuke: synchronize clocks after reboot, and optionally synchronize all clocks
Sage Weil [Wed, 31 Aug 2011 20:56:42 +0000 (13:56 -0700)]
thrashosds: make it work when first mon isn't mon.0
Sage Weil [Wed, 31 Aug 2011 20:21:30 +0000 (13:21 -0700)]
thrashosds: no camelcaps, add some whitespace
Josh Durgin [Thu, 1 Sep 2011 17:44:46 +0000 (10:44 -0700)]
nuke: remove unused import
Josh Durgin [Thu, 1 Sep 2011 17:33:20 +0000 (10:33 -0700)]
nuke: localize imports again so they occur after gevent monkey-patching
This is necessary to make ssh work properly.
Josh Durgin [Thu, 1 Sep 2011 02:46:10 +0000 (19:46 -0700)]
nuke: reboot if rbd is mounted
Josh Durgin [Thu, 1 Sep 2011 00:43:14 +0000 (17:43 -0700)]
schedule: add a way to delete jobs from the queue
Josh Durgin [Thu, 1 Sep 2011 00:13:06 +0000 (17:13 -0700)]
parallel: don't hang if no tasks were spawned
This makes
6d919152178cfbd69dc5d50cdab40fc99db166a6 work.
Josh Durgin [Wed, 31 Aug 2011 23:48:58 +0000 (16:48 -0700)]
workunits: remove unused variable
Josh Durgin [Wed, 31 Aug 2011 21:36:32 +0000 (14:36 -0700)]
nuke: add option to reboot all nodes
Josh Durgin [Wed, 31 Aug 2011 21:36:01 +0000 (14:36 -0700)]
Fix pyflakes warnings.
Josh Durgin [Wed, 31 Aug 2011 00:21:36 +0000 (17:21 -0700)]
coverage: remove debugging
Josh Durgin [Wed, 31 Aug 2011 00:12:14 +0000 (17:12 -0700)]
workunit: save coverage and coredumps
Anything that runs a ceph utility should be using these commands.
Greg Farnum [Tue, 30 Aug 2011 22:48:58 +0000 (15:48 -0700)]
workunits: rework a little bit to allow "all" clients in a run
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Wed, 24 Aug 2011 21:07:11 +0000 (14:07 -0700)]
cfuse: support running through valgrind
Also switch up the config code so we can take per-client options.
Greg Farnum [Mon, 29 Aug 2011 23:47:22 +0000 (16:47 -0700)]
valgrind: don't run valgrind_post if there's no valgrind
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 29 Aug 2011 20:58:09 +0000 (13:58 -0700)]
valgrind: scan logs for bad results
It's not sophisticated but it will warn you about a node
if at least one node has issues.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Mon, 29 Aug 2011 19:39:38 +0000 (12:39 -0700)]
valgrind: use xml output for tools that support it
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Josh Durgin [Mon, 29 Aug 2011 19:42:45 +0000 (12:42 -0700)]
suite: add option to send an email if the entire suite passed
Josh Durgin [Fri, 26 Aug 2011 00:11:33 +0000 (17:11 -0700)]
Generate coverage at the end of a suite run,
and optionally email failures and ongoing jobs.
Josh Durgin [Fri, 26 Aug 2011 00:09:03 +0000 (17:09 -0700)]
queue: delete every job when it finishes, so only running jobs are buried
Josh Durgin [Thu, 4 Aug 2011 01:08:14 +0000 (18:08 -0700)]
Add teuthology-coverage for analyzing test coverage for a suite run.
Josh Durgin [Tue, 14 Jun 2011 18:57:29 +0000 (11:57 -0700)]
Add scripts to analyze coverage for a single teuthology run.
Greg Farnum [Thu, 25 Aug 2011 22:27:30 +0000 (15:27 -0700)]
thrasher: improve documentation a little
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 22:19:30 +0000 (15:19 -0700)]
thrasher: add option to mark OSDs down instead of out.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 22:18:42 +0000 (15:18 -0700)]
thrasher: allow a config to set values
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 25 Aug 2011 21:38:34 +0000 (14:38 -0700)]
thrasher: remove redundant wait_till_clean()
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 24 Aug 2011 23:48:14 +0000 (16:48 -0700)]
coverage: create dir conditionally
We don't need to create the dir if we aren't using coverage.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 21:44:39 +0000 (14:44 -0700)]
lockfile: add a lockfile task
This allows pretty highly configurable testing of
fcntl locking via a teuthology task.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Wed, 24 Aug 2011 17:03:43 +0000 (10:03 -0700)]
lock: --list-targets: list locks and dump result in targets: yaml format.
Sage Weil [Wed, 24 Aug 2011 04:00:26 +0000 (21:00 -0700)]
check ceph cluster log for badness (ERR, WRN, SEC)
Sage Weil [Tue, 23 Aug 2011 05:04:57 +0000 (22:04 -0700)]
ceph: copy cluster log file to archive/ceph.log
Sage Weil [Mon, 22 Aug 2011 00:26:15 +0000 (17:26 -0700)]
workunits: set CEPH_CONF environment
This allows any ceph util we run (including the rados-api tests) to find
the config and keyrings they need.
Sage Weil [Sun, 21 Aug 2011 22:14:02 +0000 (15:14 -0700)]
rbd: make default image 10G instead of 1G
Sage Weil [Wed, 10 Aug 2011 20:34:38 +0000 (13:34 -0700)]
suite: support a suite consisting of multiple collections
suite = many collections, and maybe some shared files
collection = a collection of facets
facet = a config fragment
Greg Farnum [Wed, 17 Aug 2011 17:35:37 +0000 (10:35 -0700)]
valgrind: Document!
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 17:32:57 +0000 (10:32 -0700)]
Merge branch 'wip-valgrind'
Greg Farnum [Wed, 17 Aug 2011 17:06:58 +0000 (10:06 -0700)]
include log in valgrind log file names
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 17:05:13 +0000 (10:05 -0700)]
ceph task: split up arguments a little more
This allows selective daemon kill signal changes. With valgrind
daemons we want term instead of kill, for instance.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 17 Aug 2011 17:04:31 +0000 (10:04 -0700)]
valgrind: move valgrind logs to log dir
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>