git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agoMerge branch 'next'
Sage Weil [Mon, 22 Apr 2013 20:01:11 +0000 (13:01 -0700)]
Merge branch 'next'

12 years agoceph-deploy: fix stop command
Sage Weil [Mon, 22 Apr 2013 20:01:02 +0000 (13:01 -0700)]
ceph-deploy: fix stop command

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoadded ceph.client.admin.keyring on the client to run rbd and rados tests
tamil [Sat, 20 Apr 2013 01:23:54 +0000 (18:23 -0700)]
added ceph.client.admin.keyring on the client to run rbd and rados tests

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
12 years agoadded extra packages required by ceph-deploy for rbd and rados tests
tamil [Sat, 20 Apr 2013 01:13:01 +0000 (18:13 -0700)]
added extra packages required by ceph-deploy for rbd and rados tests

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
12 years agoMerge branch 'next'
Sage Weil [Thu, 18 Apr 2013 18:11:38 +0000 (11:11 -0700)]
Merge branch 'next'

12 years agoceph-deploy: stop daemons, archive, then purge[data]
Sage Weil [Thu, 18 Apr 2013 15:06:52 +0000 (08:06 -0700)]
ceph-deploy: stop daemons, archive, then purge[data]

Purge removes logs, and we want to archive those, so explicitly shut down
all daemons before doing the archiving step.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoceph.conf: lower mon disk avail warning threshold
Sage Weil [Wed, 3 Apr 2013 15:38:52 +0000 (08:38 -0700)]
ceph.conf: lower mon disk avail warning threshold

Only warn when we hit 90% instead of the default 70%.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cf4bf09b2c4dae034332fb893cf96ed31adb7a4b)

12 years agoMerge branch 'next'
Sam Lang [Wed, 17 Apr 2013 23:09:39 +0000 (18:09 -0500)]
Merge branch 'next'

Conflicts:
teuthology/lock.py
teuthology/lockstatus.py
teuthology/misc.py
teuthology/task/install.py

12 years agomisc: Fix for case status['description'] == None
Sam Lang [Wed, 17 Apr 2013 22:38:36 +0000 (17:38 -0500)]
misc: Fix for case status['description'] == None

Skip any machine that has a description key whose
value is None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Warren Usui <warren.usui@inktank.com>
12 years agoradosgw-admin-rest: Add task for RESTful admin api.
caleb miles [Wed, 17 Apr 2013 15:45:59 +0000 (08:45 -0700)]
radosgw-admin-rest: Add task for RESTful admin api.

Signed-off-by: caleb miles <caleb.miles@inktank.com>

12 years agoradosgw-admin-rest: Add task for RESTful admin api.
caleb miles [Wed, 17 Apr 2013 15:45:59 +0000 (08:45 -0700)]
radosgw-admin-rest: Add task for RESTful admin api.

Signed-off-by: caleb miles <caleb.miles@inktank.com>

12 years agomisc: Check for 'None' string from yaml
Sam Lang [Wed, 17 Apr 2013 00:08:45 +0000 (19:08 -0500)]
misc: Check for 'None' string from yaml

The description attribute from the machines yaml returned by the
locker might be the string 'None'.  Need to explicitly check for
that to avoid using a test dir of /tmp/cephtest/None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
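A minimal sketch of the check described above, covering both a real None and the string 'None' that can come back from yaml. The helper name usable_description and the shape of the status dict are assumptions for illustration, not the actual misc.py code.

```python
def usable_description(status):
    """Return the machine description, or None if it is missing or unusable.

    `status` is assumed to be a dict parsed from the locker's yaml output.
    """
    desc = status.get('description')
    # The locker may return a real None or the literal string 'None';
    # neither should be used to build a test directory name.
    if desc is None or desc == 'None':
        return None
    return desc
```

With a check like this, a caller can fall back to a default test dir instead of ending up with /tmp/cephtest/None.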
12 years agolock: Fix import cycle breakage
Sam Lang [Fri, 12 Apr 2013 17:55:54 +0000 (12:55 -0500)]
lock: Fix import cycle breakage

fa2049f caused an import cycle between lock.py and misc.py.  Move the
needed functions from lock.py to lockstatus.py so that we can avoid the
import cycle.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Conflicts:
teuthology/lock.py

12 years agoRevert "Revert "Install.py: Prevent prompts from breaking apt""
Dan Mick [Mon, 15 Apr 2013 18:24:31 +0000 (11:24 -0700)]
Revert "Revert "Install.py: Prevent prompts from breaking apt""

This reverts commit 67a616a97927efdc4fbcc5edb0d0cf4a724d90e2.

Sigh.  As it turns out, /etc/default/grub being hacked also
causes the same problem.  I think there's a way to fix that cleanly
as well, but until then, replacing the "accept installed version"
hack here so jobs can run.

12 years agoRevert "Install.py: Prevent prompts from breaking apt"
Dan Mick [Fri, 12 Apr 2013 17:56:14 +0000 (10:56 -0700)]
Revert "Install.py: Prevent prompts from breaking apt"

This reverts commit 5995ae7e78dd19f4036f891db9db9fec97d6eab5.

With the changes to ceph-qa-chef and the teuthology kernel task,
we're no longer touching packaged file /etc/grub.d/10_linux, which
was the reason for this apt forcing.  Remove so that we find other
package problems that might be masked by this; we can always
put it back if there are such problems until we can fix those as well.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit c2b0828b19a426c6d73cb2f614333200ae66bac3)

12 years agoInstall.py: Prevent prompts from breaking apt
Sandon Van Ness [Fri, 5 Apr 2013 02:15:14 +0000 (19:15 -0700)]
Install.py: Prevent prompts from breaking apt

Change apt commands to prevent prompts from coming up (forcing
non-interactive mode) so things like grub or other stuff don't
break teuthology runs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
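A rough sketch of what a prompt-free apt invocation can look like. The commit message does not show the exact flags that were used, so the option set below (DEBIAN_FRONTEND=noninteractive plus the dpkg --force-confdef and --force-confold options, which keep the installed version of a modified config file) is an assumption, and the function name is hypothetical.

```python
def noninteractive_apt_install(packages):
    """Build an argv list for an apt-get install that should not prompt.

    Nothing is executed here; the list is meant to be handed to whatever
    runs remote commands.
    """
    argv = [
        'sudo', 'env', 'DEBIAN_FRONTEND=noninteractive',
        'apt-get', '-y',
        # Keep the currently installed config file when a package ships
        # a new one, instead of asking the user.
        '-o', 'Dpkg::Options::=--force-confdef',
        '-o', 'Dpkg::Options::=--force-confold',
        'install',
    ]
    return argv + list(packages)
```

Building the command as a list keeps the quoting of the Dpkg options out of the shell.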
12 years agomisc: Check for 'None' string from yaml
Sam Lang [Wed, 17 Apr 2013 00:08:45 +0000 (19:08 -0500)]
misc: Check for 'None' string from yaml

The description attribute from the machines yaml returned by the
locker might be the string 'None'.  Need to explicitly check for
that to avoid using a test dir of /tmp/cephtest/None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomisc: Use pythonic 'is not None' for jobid case
Sam Lang [Sat, 13 Apr 2013 15:12:45 +0000 (10:12 -0500)]
misc: Use pythonic 'is not None' for jobid case

The conditional 'if global_jobid:' evaluates to true
in some cases even when global_jobid is None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomisc: Fix name parsing
Sam Lang [Fri, 12 Apr 2013 22:02:07 +0000 (17:02 -0500)]
misc: Fix name parsing

Use last two digits of year.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agolock: Fix import cycle breakage
Sam Lang [Fri, 12 Apr 2013 17:55:54 +0000 (12:55 -0500)]
lock: Fix import cycle breakage

fa2049f caused an import cycle between lock.py and misc.py.  Move the
needed functions from lock.py to lockstatus.py so that we can avoid the
import cycle.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Conflicts:
teuthology/lock.py

12 years agomisc: Use job id and make short path for testdir
Sam Lang [Thu, 11 Apr 2013 14:23:10 +0000 (09:23 -0500)]
misc: Use job id and make short path for testdir

Nightlies run on teuthology currently use a testdir of
/home/ubuntu/cephtest, but this causes stale job errors occasionally
from the previous tests not getting properly cleaned up, which prevents
the nightlies from running successfully.

The misc.py get_testdir() function can specify a testdir that is
specific to the job, but previously the path was too long and would
cause separate job failures.

This patch does two things to resolve that.  First, it uses the job id
from the teuthology run if one exists.  This should be a relatively
short number that will identify the job run effectively.  Second,
if the job id isn't available, it creates a shortened form of the
job's name, for example the job name:

teuthology-2013-04-09_23:51:49-rgw-next-testing-basic

becomes:

te1304092351rntb

Signed-off-by: Sam Lang <sam.lang@inktank.com>
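The shortening in the example above can be sketched as follows. This is an illustration that reproduces the one example from the commit message, assuming the name has the form prefix-YYYY-MM-DD_HH:MM:SS-word-word-...; the function name and the regular expression are hypothetical and are not taken from misc.py.

```python
import re

# Assumed job name layout: <prefix>-<date>_<time>-<dash separated words>
_JOB_NAME = re.compile(
    r'^(?P<prefix>[^-]+)-'
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})_'
    r'(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})-'
    r'(?P<rest>.+)$'
)


def shorten_job_name(name):
    """Return a short form of a job name, or the name itself if it does not parse."""
    m = _JOB_NAME.match(name)
    if m is None:
        return name
    # First letter of each remaining dash-separated word.
    initials = ''.join(word[0] for word in m.group('rest').split('-') if word)
    return (
        m.group('prefix')[:2]      # first two letters of the prefix
        + m.group('year')[2:]      # last two digits of the year
        + m.group('month') + m.group('day')
        + m.group('hour') + m.group('minute')
        + initials
    )
```

The seconds are dropped and only word initials are kept, so two different jobs could in principle map to the same short name; the job id path avoids that.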
12 years agoceph-deploy: purge before archiving
Sage Weil [Wed, 17 Apr 2013 03:50:50 +0000 (20:50 -0700)]
ceph-deploy: purge before archiving

Purge will uninstall and (in so doing) stop the daemons. This avoids trying
to tar up the mon data or logs while they are being written to, which
avoids errors like

2013-04-16T20:21:47.103 INFO:teuthology.task.ceph-deploy:Archiving mon data...
2013-04-16T20:21:47.545 INFO:teuthology.orchestra.run.err:tar: ./ceph-mira089/store.db/000009.log: file changed as we read it

Also drop the unnecessary uninstall (it is implied by purge).

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4befae4fbea2413c9a8848ba195f336293619938)

12 years agoceph-deploy: purge before archiving
Sage Weil [Wed, 17 Apr 2013 03:50:50 +0000 (20:50 -0700)]
ceph-deploy: purge before archiving

Purge will uninstall and (in so doing) stop the daemons. This avoids trying
to tar up the mon data or logs while they are being written to, which
avoids errors like

2013-04-16T20:21:47.103 INFO:teuthology.task.ceph-deploy:Archiving mon data...
2013-04-16T20:21:47.545 INFO:teuthology.orchestra.run.err:tar: ./ceph-mira089/store.db/000009.log: file changed as we read it

Also drop the unnecessary uninstall (it is implied by purge).

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoscheduled_suite.sh: check clock skew at start and end of run
Sage Weil [Wed, 3 Apr 2013 21:00:25 +0000 (14:00 -0700)]
scheduled_suite.sh: check clock skew at start and end of run

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5c80201ec4a4a97367e4d7243cd046a3a8c808fa)

12 years agomisc: Fix close() call to pass in fd
Sam Lang [Mon, 15 Apr 2013 21:26:22 +0000 (16:26 -0500)]
misc: Fix close() call to pass in fd

fd is an int, so we need to use os.close().

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomisc: Fix bug in calling function remote_mktemp()
Sam Lang [Mon, 15 Apr 2013 21:16:34 +0000 (16:16 -0500)]
misc: Fix bug in calling function remote_mktemp()

The function that gets a remote temporary filename was renamed, so
all the locations where it gets called need to be updated.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agoMerge pull request #11 from ceph/wip-4717
Sam Lang [Mon, 15 Apr 2013 18:44:59 +0000 (11:44 -0700)]
Merge pull request #11 from ceph/wip-4717

misc: Use tempfile.mkstemp() instead of tempnam

12 years agomisc: Use tempfile.mkstemp() instead of tempnam
Sam Lang [Fri, 12 Apr 2013 20:52:47 +0000 (15:52 -0500)]
misc: Use tempfile.mkstemp() instead of tempnam

tempnam() is considered a security risk because the filename
generated is easy to guess and can be symlinked in advance.  Use
mkstemp() instead.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Joe Buck <jbbuck@gmail.com>
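A short sketch of the mkstemp() pattern, using only the standard library. tempfile.mkstemp() creates the file atomically and returns an integer file descriptor together with the path, which is why the descriptor has to be closed with os.close() rather than a file object's close(). The helper name make_temp_path is an assumption for illustration.

```python
import os
import tempfile


def make_temp_path(suffix=''):
    """Create a temporary file safely and return its path.

    The file is created by mkstemp() itself, so nobody can place a
    symlink at a guessed name first, which is the problem with tempnam().
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    # fd is a plain int, not a file object, so close it via os.close().
    os.close(fd)
    return path
```

The caller owns the file afterwards and is responsible for removing it.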
12 years agoRevert "Revert "Install.py: Prevent prompts from breaking apt""
Dan Mick [Mon, 15 Apr 2013 18:24:31 +0000 (11:24 -0700)]
Revert "Revert "Install.py: Prevent prompts from breaking apt""

This reverts commit 67a616a97927efdc4fbcc5edb0d0cf4a724d90e2.

Sigh.  As it turns out, /etc/default/grub being hacked also
causes the same problem.  I think there's a way to fix that cleanly
as well, but until then, replacing the "accept installed version"
hack here so jobs can run.

12 years agomisc: Use pythonic 'is not None' for jobid case
Sam Lang [Sat, 13 Apr 2013 15:12:45 +0000 (10:12 -0500)]
misc: Use pythonic 'is not None' for jobid case

The conditional 'if global_jobid:' evaluates to true
in some cases even when global_jobid is None.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomisc: Fix name parsing
Sam Lang [Fri, 12 Apr 2013 22:02:07 +0000 (17:02 -0500)]
misc: Fix name parsing

Use last two digits of year.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agoRevert "Install.py: Prevent prompts from breaking apt"
Dan Mick [Fri, 12 Apr 2013 17:56:14 +0000 (10:56 -0700)]
Revert "Install.py: Prevent prompts from breaking apt"

This reverts commit 5995ae7e78dd19f4036f891db9db9fec97d6eab5.

With the changes to ceph-qa-chef and the teuthology kernel task,
we're no longer touching packaged file /etc/grub.d/10_linux, which
was the reason for this apt forcing.  Remove so that we find other
package problems that might be masked by this; we can always
put it back if there are such problems until we can fix those as well.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit c2b0828b19a426c6d73cb2f614333200ae66bac3)

12 years agokernel.py: put submenu name in 01_ceph_kernel if necessary
Dan Mick [Tue, 9 Apr 2013 22:53:49 +0000 (15:53 -0700)]
kernel.py: put submenu name in 01_ceph_kernel if necessary

We had been writing 01_ceph_kernel with the kernel title, and
relying on the fact that grub.cfg would never have submenus in it
(implemented by a hack to /etc/grub.d/10_linux which neutered its
submenu creation).  However, that hack was modifying a package file,
and got in the way of later apt commands.  Rather than doing it
that way, this divines the title of the submenu and sets the
default variable to "submenu>kernel", which works to select the
desired kernel.

It depends on there being only one level of submenu, and on the
format of the menuentry and submenu commands, dictated by grub2.
None of this is likely to work at all outside Ubuntu.

Fixes: #4496
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 52aec32a7da07ca6e9a22ecedde78dafb4b74dfc)
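A simplified sketch of how a "submenu>kernel" default could be derived from grub.cfg text. It is not the kernel.py implementation: the function name is hypothetical, and, like the commit, it assumes a single level of submenu and the grub2 menuentry/submenu line format. It also does not track closing braces, so every menuentry after a submenu line is treated as belonging to that submenu.

```python
import re

# Matches a single- or double-quoted title and captures it as group 2.
_QUOTED = r"""(['"])(.*?)\1"""
_SUBMENU = re.compile(r'submenu\s+' + _QUOTED)
_MENUENTRY = re.compile(r'menuentry\s+' + _QUOTED)


def grub_default_for(cfg_text, kernel_title):
    """Return the value for grub's default variable, or None if not found."""
    submenu = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        m = _SUBMENU.match(line)
        if m:
            submenu = m.group(2)
            continue
        m = _MENUENTRY.match(line)
        if m and m.group(2) == kernel_title:
            if submenu is None:
                return kernel_title
            return submenu + '>' + kernel_title
    return None
```

Returning None when the title is missing lets the caller fail loudly instead of writing a default that selects nothing.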

12 years agopeer.py: we can't assume pg query state will match mon pg state
Samuel Just [Fri, 12 Apr 2013 22:01:04 +0000 (15:01 -0700)]
peer.py: we can't assume pg query state will match mon pg state

The pg state could easily have changed in the mean time,
for example, from recovery_wait to recovering.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoMerge pull request #10 from ceph/wip-fix-importcycle
Sam Lang [Fri, 12 Apr 2013 20:06:55 +0000 (13:06 -0700)]
Merge pull request #10 from ceph/wip-fix-importcycle

lock: Fix import cycle breakage

Reviewed-by: Warren Usui <warren.usui@inktank.com>
12 years agoRevert "Install.py: Prevent prompts from breaking apt"
Dan Mick [Fri, 12 Apr 2013 17:56:14 +0000 (10:56 -0700)]
Revert "Install.py: Prevent prompts from breaking apt"

This reverts commit 5995ae7e78dd19f4036f891db9db9fec97d6eab5.

With the changes to ceph-qa-chef and the teuthology kernel task,
we're no longer touching packaged file /etc/grub.d/10_linux, which
was the reason for this apt forcing.  Remove so that we find other
package problems that might be masked by this; we can always
put it back if there are such problems until we can fix those as well.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agolock: Fix import cycle breakage
Sam Lang [Fri, 12 Apr 2013 17:55:54 +0000 (12:55 -0500)]
lock: Fix import cycle breakage

fa2049f caused an import cycle between lock.py and misc.py.  Move the
needed functions from lock.py to lockstatus.py so that we can avoid the
import cycle.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agoMerge pull request #9 from ceph/wip-short-testdir
wusui [Fri, 12 Apr 2013 17:22:15 +0000 (10:22 -0700)]
Merge pull request #9 from ceph/wip-short-testdir

misc: Use job id and make short path for testdir

Reviewed-by: Warren Usui <warren.usui@inktank.com>
12 years agokernel.py: put submenu name in 01_ceph_kernel if necessary
Dan Mick [Tue, 9 Apr 2013 22:53:49 +0000 (15:53 -0700)]
kernel.py: put submenu name in 01_ceph_kernel if necessary

We had been writing 01_ceph_kernel with the kernel title, and
relying on the fact that grub.cfg would never have submenus in it
(implemented by a hack to /etc/grub.d/10_linux which neutered its
submenu creation).  However, that hack was modifying a package file,
and got in the way of later apt commands.  Rather than doing it
that way, this divines the title of the submenu and sets the
default variable to "submenu>kernel", which works to select the
desired kernel.

It depends on there being only one level of submenu, and on the
format of the menuentry and submenu commands, dictated by grub2.
None of this is likely to work at all outside Ubuntu.

Fixes: #4496
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agomisc: Use job id and make short path for testdir
Sam Lang [Thu, 11 Apr 2013 14:23:10 +0000 (09:23 -0500)]
misc: Use job id and make short path for testdir

Nightlies run on teuthology currently use a testdir of
/home/ubuntu/cephtest, but this causes stale job errors occasionally
from the previous tests not getting properly cleaned up, which prevents
the nightlies from running successfully.

The misc.py get_testdir() function can specify a testdir that is
specific to the job, but previously the path was too long and would
cause separate job failures.

This patch does two things to resolve that.  First, it uses the job id
from the teuthology run if one exists.  This should be a relatively
short number that will identify the job run effectively.  Second,
if the job id isn't available, it creates a shortened form of the
job's name, for example the job name:

teuthology-2013-04-09_23:51:49-rgw-next-testing-basic

becomes:

te1304092351rntb

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agoFix for kdb: doesn't work on mira nodes
Sandon Van Ness [Tue, 9 Apr 2013 20:12:58 +0000 (13:12 -0700)]
Fix for kdb: doesn't work on mira nodes

This is a fix for issue #4677 which was caused by kdb output being
hard-coded to ttyS1 which is fine for all our hardware except mira
machines. This change just checks to see if mira is in the host's
name and uses ttyS2 instead (simple fix).
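The host name check can be sketched in a few lines. The function name kdb_console is an assumption; only the ttyS1/ttyS2 choice and the "mira in the host's name" test come from the commit message.

```python
def kdb_console(hostname):
    """Pick the serial port used for kdb output on a given host."""
    # mira machines use ttyS2; all other hardware uses ttyS1.
    if 'mira' in hostname:
        return 'ttyS2'
    return 'ttyS1'
```

A substring test is simple but would also match any non-mira host whose name happens to contain "mira".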

12 years agoFix: kdb: doesn't work on mira nodes
Sandon Van Ness [Tue, 9 Apr 2013 20:09:39 +0000 (13:09 -0700)]
Fix: kdb: doesn't work on mira nodes

Change kernel.py to use ttyS2 for kdb output instead of ttyS1 when
the node is a mira machine. This is a fix for issue #4677

12 years agoteuthology: fix for ssh-keys-task
Joe Buck [Fri, 5 Apr 2013 00:47:29 +0000 (17:47 -0700)]
teuthology: fix for ssh-keys-task

Resolves an issue where we
were not properly escaping the generated
public key when doing matches against it.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoradosgw-admin: Test bucket list for bucket starting with underscore.
caleb miles [Wed, 3 Apr 2013 13:30:42 +0000 (09:30 -0400)]
radosgw-admin: Test bucket list for bucket starting with underscore.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
12 years agoInstall.py: Prevent prompts from breaking apt
Sandon Van Ness [Fri, 5 Apr 2013 02:15:14 +0000 (19:15 -0700)]
Install.py: Prevent prompts from breaking apt

Change apt commands to prevent prompts from coming up (forcing
non-interactive mode) so things like grub or other stuff don't
break teuthology runs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoInstall.py: Prevent prompts from breaking apt
Sandon Van Ness [Fri, 5 Apr 2013 02:15:14 +0000 (19:15 -0700)]
Install.py: Prevent prompts from breaking apt

Change apt commands to prevent prompts from coming up (forcing
non-interactive mode) so things like grub or other stuff don't
break teuthology runs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoteuthology: extend Hadoop task to support branches
Joe Buck [Wed, 27 Mar 2013 03:11:21 +0000 (20:11 -0700)]
teuthology: extend Hadoop task to support branches

Modify the Hadoop task to support branches
being specified for both the Apache and Inktank
Hadoop branches.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoteuthology: remove previous test ssh keys
Joe Buck [Wed, 20 Mar 2013 04:26:16 +0000 (21:26 -0700)]
teuthology: remove previous test ssh keys

Updated the ssh-keys task to cleanup
any left-over keys from previous tasks
(indicated by the user being 'ssh-keys-user').
Also, some of the functions in the ssh_keys task seem
like they could be useful in general.
This patch refactors them into misc.py.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoMerge pull request #8 from ceph/wip_4510
Sage Weil [Thu, 4 Apr 2013 15:50:27 +0000 (08:50 -0700)]
Merge pull request #8 from ceph/wip_4510

repair_test: add test for repairing read errs and truncations

12 years agoworkunit: sudo rm -rf ...
Sage Weil [Thu, 4 Apr 2013 05:01:01 +0000 (22:01 -0700)]
workunit: sudo rm -rf ...

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoscheduled_suite.sh: check clock skew at start and end of run
Sage Weil [Wed, 3 Apr 2013 21:00:25 +0000 (14:00 -0700)]
scheduled_suite.sh: check clock skew at start and end of run

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-teuthologyvminstall-wusui'
Warren Usui [Wed, 3 Apr 2013 19:34:01 +0000 (12:34 -0700)]
Merge branch 'wip-teuthologyvminstall-wusui'

12 years agoImplement full reinstallation of a VM system.
Warren Usui [Wed, 3 Apr 2013 01:27:38 +0000 (18:27 -0700)]
Implement full reinstallation of a VM system.

Downburst create is used to reinstall a VM when it is locked.
Downburst destroy is used to remove a VM when it is unlocked.
Host keys are regenerated on each vm instantiation, so the keys
need to be checked prior to use.
If needed, ceph-qa-chef is run on newly installed systems to ensure that
they are fully functional.

Signed-off-by: Warren Usui <warren.usui@inktank.com>
12 years agoceph.conf: lower mon disk avail warning threshold
Sage Weil [Wed, 3 Apr 2013 15:38:52 +0000 (08:38 -0700)]
ceph.conf: lower mon disk avail warning threshold

Only warn when we hit 90% instead of the default 70%.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoradosgw-admin: Add test of duplicate user email specification.
caleb miles [Wed, 3 Apr 2013 14:34:57 +0000 (10:34 -0400)]
radosgw-admin: Add test of duplicate user email specification.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
12 years agoradosgw-admin: Test subuser mask durability when creating new key.
caleb miles [Wed, 3 Apr 2013 12:49:58 +0000 (08:49 -0400)]
radosgw-admin: Test subuser mask durability when creating new key.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
12 years agoradosgw-admin: cluster info -> zone info
caleb miles [Tue, 2 Apr 2013 03:46:30 +0000 (20:46 -0700)]
radosgw-admin: cluster info -> zone info

Signed-off-by: caleb.miles <caleb.miles@inktank.com>

12 years agorepair_test: add test for repairing read errs and truncations
Samuel Just [Wed, 27 Mar 2013 19:11:04 +0000 (12:11 -0700)]
repair_test: add test for repairing read errs and truncations

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agorepair_test: add test for repairing read errs and truncations
Samuel Just [Wed, 27 Mar 2013 19:11:04 +0000 (12:11 -0700)]
repair_test: add test for repairing read errs and truncations

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agolocker: try to make up for apache timeouts
Josh Durgin [Fri, 29 Mar 2013 23:33:49 +0000 (16:33 -0700)]
locker: try to make up for apache timeouts

If the lock request succeeds in updating the db, but the client gets a
timeout from apache, they can now try again and get back the machines
they just locked.

Only automatic runs have a description set when locking several
machines, so this does not affect users of teuthology-lock
--lock-many, where no description can be set in the same request.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agodo not archive on pass if 'archive-on-error: True'
Sage Weil [Fri, 29 Mar 2013 19:19:46 +0000 (12:19 -0700)]
do not archive on pass if 'archive-on-error: True'

Optional flag makes us skip pulling down the archive (mostly the logs, which
might be huge for some debugging tests) unless the test has failed.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agolocker: log desc too
Sage Weil [Fri, 29 Mar 2013 21:27:04 +0000 (14:27 -0700)]
locker: log desc too

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agorun: clean up machine_type thing
Sage Weil [Fri, 29 Mar 2013 19:16:39 +0000 (12:16 -0700)]
run: clean up machine_type thing

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoceph_manager: retry set_pool_property on EAGAIN
Sage Weil [Thu, 28 Mar 2013 22:24:33 +0000 (15:24 -0700)]
ceph_manager: retry set_pool_property on EAGAIN

Retry indefinitely, for now.

Signed-off-by: Sage Weil <sage@inktank.com>
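A generic sketch of retrying indefinitely on EAGAIN. How ceph_manager actually detects EAGAIN from set_pool_property is not shown in the commit, so this example assumes the operation is a callable that raises OSError with errno set to EAGAIN; the helper name and the delay parameter are hypothetical.

```python
import errno
import time


def retry_on_eagain(operation, delay=1.0):
    """Call operation() until it stops failing with EAGAIN.

    Any other error is re-raised. There is no retry limit, which
    mirrors the "retry indefinitely, for now" note in the commit.
    """
    while True:
        try:
            return operation()
        except OSError as e:
            if e.errno != errno.EAGAIN:
                raise
            time.sleep(delay)
```

An unbounded loop is the simplest choice, at the cost of hanging the run if the condition never clears.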
12 years agorun: machine-type: foo, not machine_type: foo
Sage Weil [Thu, 28 Mar 2013 17:50:40 +0000 (10:50 -0700)]
run: machine-type: foo, not machine_type: foo

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #6 from ceph/wip-mds-thrasher-logging
Sage Weil [Wed, 27 Mar 2013 15:56:04 +0000 (08:56 -0700)]
Merge pull request #6 from ceph/wip-mds-thrasher-logging

task/mds_thrash: Log mds dump after long delay

12 years agotask/mds_thrash: Log mds dump after long delay
Sam Lang [Wed, 27 Mar 2013 13:48:45 +0000 (08:48 -0500)]
task/mds_thrash: Log mds dump after long delay

In cases where the mds thrasher continuously loops
waiting for an mds to be removed from the map, or
for a new mds to become active, we want to start logging
the mds state for debugging.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agolocker: make desc optional
Sage Weil [Tue, 26 Mar 2013 20:27:53 +0000 (13:27 -0700)]
locker: make desc optional

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoceph.conf: osd debug op order = true
Sage Weil [Thu, 7 Mar 2013 05:35:41 +0000 (21:35 -0800)]
ceph.conf: osd debug op order = true

Debug the osd op ordering by default.  Most of the runs have a small number
of clients, which makes the STL maps cheap.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agolocker/api: fix DELETE
Sage Weil [Tue, 26 Mar 2013 18:40:13 +0000 (11:40 -0700)]
locker/api: fix DELETE

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-lock'
Sage Weil [Tue, 26 Mar 2013 18:34:33 +0000 (11:34 -0700)]
Merge branch 'wip-lock'

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agolock: pass desc to lock operation; leave on unlock
Sage Weil [Mon, 25 Mar 2013 23:46:48 +0000 (16:46 -0700)]
lock: pass desc to lock operation; leave on unlock

Pass the desc to the lock operation.

The unlock operation now clears desc for us; no need to do it ourselves.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agolocker: set desc on lock
Sage Weil [Mon, 25 Mar 2013 23:42:59 +0000 (16:42 -0700)]
locker: set desc on lock

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agolocker: clear desc on unlock
Sage Weil [Mon, 25 Mar 2013 23:41:15 +0000 (16:41 -0700)]
locker: clear desc on unlock

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agothrashosds: add test_backfill_full
Samuel Just [Thu, 21 Mar 2013 21:37:38 +0000 (14:37 -0700)]
thrashosds: add test_backfill_full

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agothrashosds.py: fix line length
Samuel Just [Thu, 21 Mar 2013 21:10:13 +0000 (14:10 -0700)]
thrashosds.py: fix line length

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agolocker: log updates
Josh Durgin [Mon, 25 Mar 2013 22:01:26 +0000 (15:01 -0700)]
locker: log updates

Note whenever locks are acquired/released, or a machine's description is updated.
Under apache, these will go to error.log.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoceph-deploy: purge /var/lib/ceph data on finish
Sage Weil [Sun, 24 Mar 2013 22:12:59 +0000 (15:12 -0700)]
ceph-deploy: purge /var/lib/ceph data on finish

The install task does this now that the package doesn't; we
need to too.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoverify /var/lib/ceph not present on start
Sage Weil [Sun, 24 Mar 2013 03:58:46 +0000 (20:58 -0700)]
verify /var/lib/ceph not present on start

Verify there is no /var/lib/ceph, just like we do with the cephtest
directory.  We will need to change this (or make it optional) when we
allow runs against an existing cluster, but then a whole bunch of other
things will need to change then as well.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoinstall: need sudo when purging /var/lib/ceph
Sage Weil [Sun, 24 Mar 2013 03:53:51 +0000 (20:53 -0700)]
install: need sudo when purging /var/lib/ceph

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoinstall, nuke: explicitly purge /var/lib/ceph
Sage Weil [Thu, 21 Mar 2013 05:51:24 +0000 (22:51 -0700)]
install, nuke: explicitly purge /var/lib/ceph

The packages won't do this anymore.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoteuthology: cleanup client dirs for workunit task
Joe Buck [Fri, 22 Mar 2013 18:56:50 +0000 (11:56 -0700)]
teuthology: cleanup client dirs for workunit task

This patch corrects an issue where a workunit task is
not cleaning up generated directories
if the 'all' key is used to specify clients.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agostop ignoring osd leaks
Sage Weil [Fri, 22 Mar 2013 03:40:48 +0000 (20:40 -0700)]
stop ignoring osd leaks

Note that the mds is the only one left that we are ignoring.

12 years agomoving client.keyring creation out of ceph task
tamil [Thu, 21 Mar 2013 23:14:54 +0000 (16:14 -0700)]
moving client.keyring creation out of ceph task

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
12 years agolock: make do_summary() respect --machine-type
Dan Mick [Thu, 21 Mar 2013 01:30:29 +0000 (18:30 -0700)]
lock: make do_summary() respect --machine-type

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agostop and restart daemons as restart only starts.
tamil [Thu, 21 Mar 2013 00:40:46 +0000 (17:40 -0700)]
stop and restart daemons as restart only starts.

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
12 years agotask/ceph: Revert extra check for running status
Sam Lang [Tue, 19 Mar 2013 20:42:51 +0000 (15:42 -0500)]
task/ceph:  Revert extra check for running status

Don't use exit status info to track daemon state.  We need to find
a better way to do this for the restart task.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agosuite: shorten subject a bit
Sage Weil [Tue, 19 Mar 2013 19:02:14 +0000 (12:02 -0700)]
suite: shorten subject a bit

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-4197'
David Zafman [Tue, 19 Mar 2013 18:30:41 +0000 (11:30 -0700)]
Merge branch 'wip-4197'

12 years agoosd: data loss: low space handling
David Zafman [Fri, 15 Mar 2013 04:53:44 +0000 (21:53 -0700)]
osd: data loss: low space handling

Automated test cases for feature #4197

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reported-by: Sam Just <sam.just@inktank.com>
12 years agoFixed so that installation works on a brand new CentOS system.
Warren Usui [Mon, 18 Mar 2013 22:25:59 +0000 (15:25 -0700)]
Fixed so that installation works on a brand new CentOS system.

Do yum install rather than yum reinstall for CentOS.
When exiting CentOS, yum erase the ceph-release rpm.

Signed-off-by: Warren Usui <warren.usui@inktank.com>
12 years agotask/restart: Handle error from script correctly
Sam Lang [Tue, 19 Mar 2013 13:08:05 +0000 (08:08 -0500)]
task/restart:  Handle error from script correctly

The exitstatus on the process is a gevent.AsyncResult
(not an int).  Use the try/except pattern for handling
errors instead.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agos/dist-upgrade/upgrade
tamil [Mon, 18 Mar 2013 23:29:18 +0000 (16:29 -0700)]
s/dist-upgrade/upgrade

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
12 years agoFixed ceph-fuse mount point cleanup bug
Warren Usui [Sat, 16 Mar 2013 01:18:56 +0000 (18:18 -0700)]
Fixed ceph-fuse mount point cleanup bug

Tested for the existence of /sys/fs/fuse/connections/*/abort
before clobbering it.  This problem was generated when all
the machines were virtual CentOS machines.

Signed-off-by: Warren Usui <warren.usui@inktank.com>
12 years agotask/restart: Cleanup in finally
Sam Lang [Mon, 18 Mar 2013 16:28:51 +0000 (11:28 -0500)]
task/restart:  Cleanup in finally

Need to cleanup the files created for this test from
the testdir.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/restart: Fix check for done
Sam Lang [Mon, 18 Mar 2013 16:27:11 +0000 (11:27 -0500)]
task/restart: Fix check for done

The last command a restart script outputs is 'done'
indicating the script does not require being restarted
further.  Handle this case properly.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agotask/restart: Restart task for testing daemon kill
Sam Lang [Mon, 11 Mar 2013 18:22:10 +0000 (13:22 -0500)]
task/restart: Restart task for testing daemon kill

The ceph daemons support being killed at a specific code point
with a config option.  In some cases, we want to test a kill point
only once for a given daemon run (such as replay that only occurs
during daemon startup).  This task allows running a script or executable
and (when the script sends a command to the task) restarting it with
a temporary config that has the appropriate kill point set.  Once
the daemon asserts and gets restarted, the original config is used.

Adds a specific restart_with_args() method to the DaemonState in the
ceph task.

Right now this task follows the workunit task closely, but uses stdout/stdin
to specify when to restart a daemon.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agoadded ceph_health check and a few log messages
tamil [Fri, 15 Mar 2013 22:50:52 +0000 (15:50 -0700)]
added ceph_health check and a few log messages

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
12 years agoFixed 'clock:' on Centos
Warren Usui [Fri, 15 Mar 2013 01:06:17 +0000 (18:06 -0700)]
Fixed 'clock:' on Centos

ntpdc commands were formerly returning -127 on CentOS

Signed-off-by: Warren Usui <warren.usui@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoadded install.upgrade task
tamil [Fri, 15 Mar 2013 01:26:03 +0000 (18:26 -0700)]
added install.upgrade task

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>