git.apps.os.sepia.ceph.com Git - teuthology.git/log
teuthology.git
8 years ago add_remotes: Correctly map remotes to roles wip-flex-locking 1051/head
Zack Cerza [Fri, 17 Mar 2017 20:27:51 +0000 (14:27 -0600)]
add_remotes: Correctly map remotes to roles

We used to use the 'targets' object to map remotes to roles. This
worked fine before multi-OS locking, but broke down because of the
unordered nature of dicts.

Signed-off-by: Zack Cerza <zack@redhat.com>
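
The ordering problem described in this commit can be illustrated with a minimal sketch. The helper name `map_remotes_to_roles` and the sample hostnames are hypothetical and are not teuthology's actual code; the idea is only to pair each role list with a machine by list position instead of relying on dict iteration order.

```python
def map_remotes_to_roles(roles, hostnames):
    """Pair each role list with a hostname by position (hypothetical helper)."""
    if len(roles) != len(hostnames):
        raise ValueError("need exactly one host per role list")
    # zip() preserves list order, so the pairing never depends on dict ordering.
    return list(zip(hostnames, roles))


mapping = map_remotes_to_roles(
    [["osd.0", "mon.a"], ["client.0"]],
    ["ubuntu@host1", "ubuntu@host2"],
)
```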
8 years ago Allow locking nodes with mixed OSes
Zack Cerza [Tue, 14 Mar 2017 19:13:13 +0000 (13:13 -0600)]
Allow locking nodes with mixed OSes

Previously, a job either left the OS type/version unspecified and took
whatever was available, or requested the same OS type/version for all of
its nodes. Allow requesting arbitrary mixes of OSes instead.

The path forward is going to be replacing things like:
roles:
- ['osd.0', 'mon.a']
- ['osd.1', 'osd.2']
- ['client.0']

With things like:
nodes:
- os_type: ubuntu
  os_version: '16.04'
  roles: ['osd.0', 'mon.a']
- os_type: ubuntu
  os_version: '14.04'
  roles: ['osd.1', 'osd.2']
- os_type: centos
  os_version: '7.3'
  roles: ['client.0']

Signed-off-by: Zack Cerza <zack@redhat.com>
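
To lock machines for a `nodes:` list like the one above, a locker would presumably need to know how many machines of each OS to request. The sketch below is one way to count that; the function name `count_os_requests` is hypothetical, and the `nodes` value simply mirrors the YAML in the commit message as already-parsed Python data.

```python
from collections import Counter


def count_os_requests(nodes):
    """Count how many nodes are requested per (os_type, os_version) pair."""
    return Counter(
        (node.get("os_type"), node.get("os_version")) for node in nodes
    )


# The parsed equivalent of the 'nodes:' example in the commit message.
nodes = [
    {"os_type": "ubuntu", "os_version": "16.04", "roles": ["osd.0", "mon.a"]},
    {"os_type": "ubuntu", "os_version": "14.04", "roles": ["osd.1", "osd.2"]},
    {"os_type": "centos", "os_version": "7.3", "roles": ["client.0"]},
]
requests_per_os = count_os_requests(nodes)
```

A node that omits `os_type` or `os_version` is counted under `None` in this sketch, which could stand for "whatever is available".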
8 years ago lock_machines: Split out some functionality
Zack Cerza [Wed, 15 Mar 2017 22:20:29 +0000 (16:20 -0600)]
lock_machines: Split out some functionality

A little pre-overhaul refactor...

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago lock_machines: Linter fixes
Zack Cerza [Wed, 15 Mar 2017 22:17:59 +0000 (16:17 -0600)]
lock_machines: Linter fixes

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Merge pull request #1007 from ceph/wip-rh-internal-task
Zack Cerza [Tue, 21 Mar 2017 17:12:15 +0000 (11:12 -0600)]
Merge pull request #1007 from ceph/wip-rh-internal-task

Add redhat internal tasks that setup repo

8 years ago run redhat internal tasks based on config item 1007/head
Vasu Kulkarni [Mon, 20 Mar 2017 16:55:19 +0000 (09:55 -0700)]
run redhat internal tasks based on config item

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
8 years ago Always run tests with latest kernel on RHEL
Vasu Kulkarni [Mon, 20 Mar 2017 16:52:33 +0000 (09:52 -0700)]
Always run tests with latest kernel on RHEL

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
8 years ago Add redhat internal task to setup repos
Vasu Kulkarni [Mon, 20 Mar 2017 16:48:10 +0000 (09:48 -0700)]
Add redhat internal task to setup repos

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
8 years ago Merge pull request #1047 from ceph/wip-daemon-register
Zack Cerza [Fri, 17 Mar 2017 20:00:48 +0000 (14:00 -0600)]
Merge pull request #1047 from ceph/wip-daemon-register

orchestra/daemon: separate register_daemon from add_daemon

8 years ago orchestra/daemon: separate register_daemon from add_daemon 1047/head
Sage Weil [Fri, 17 Mar 2017 19:46:22 +0000 (15:46 -0400)]
orchestra/daemon: separate register_daemon from add_daemon

The former registers without starting; the latter does both (as before).

Signed-off-by: Sage Weil <sage@redhat.com>
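
The split described above can be sketched with toy classes. `DaemonGroup` and `FakeDaemon` here are stand-ins written for illustration, not the real orchestra/daemon classes, and the method signatures are assumptions; only the division of labour (register without starting versus register and start) comes from the commit message.

```python
class FakeDaemon:
    """Minimal daemon stand-in that only tracks whether it was started."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


class DaemonGroup:
    """Toy registry (hypothetical; not teuthology's real class)."""

    def __init__(self):
        self.daemons = {}

    def register_daemon(self, role, daemon):
        # Record the daemon without starting it.
        self.daemons[role] = daemon
        return daemon

    def add_daemon(self, role, daemon):
        # Register, then start: the combined behaviour that existed before.
        self.register_daemon(role, daemon)
        daemon.start()
        return daemon


group = DaemonGroup()
registered = group.register_daemon("osd.0", FakeDaemon())
added = group.add_daemon("osd.1", FakeDaemon())
```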
8 years ago Merge pull request #1050 from ceph/wip-pg-remap
Sage Weil [Fri, 17 Mar 2017 18:59:39 +0000 (13:59 -0500)]
Merge pull request #1050 from ceph/wip-pg-remap

ceph.conf: mon osd allow pg remap = true

8 years ago ceph.conf: mon osd allow pg remap = true 1050/head
Sage Weil [Fri, 17 Mar 2017 18:58:50 +0000 (14:58 -0400)]
ceph.conf: mon osd allow pg remap = true

Signed-off-by: Sage Weil <sage@redhat.com>
8 years ago Merge pull request #1045 from ceph/wip-daemon
Dan Mick [Thu, 16 Mar 2017 17:39:53 +0000 (10:39 -0700)]
Merge pull request #1045 from ceph/wip-daemon

Fix a couple bugs re: remote daemon execution

Reviewed-by: Dan Mick <dmick@redhat.com>
8 years ago Merge pull request #1048 from dmick/master
David Galloway [Thu, 16 Mar 2017 00:19:44 +0000 (20:19 -0400)]
Merge pull request #1048 from dmick/master

setup.py: disallow requests 2.13.0

8 years ago setup.py: disallow requests 2.13.0 1048/head
Dan Mick [Thu, 16 Mar 2017 00:01:03 +0000 (17:01 -0700)]
setup.py: disallow requests 2.13.0

python-cinderclient and/or keystoneauth1 apparently don't want
requests 2.13.0, so make it illegal for now

Signed-off-by: Dan Mick <dan.mick@redhat.com>
8 years ago orchestra.run: More consistently notice failures 1045/head
Zack Cerza [Fri, 10 Mar 2017 23:14:30 +0000 (16:14 -0700)]
orchestra.run: More consistently notice failures

check_status=True in combination with wait=False resulted in
CommandFailedError never being raised if poll() was used instead of
wait(). Fix that.

Signed-off-by: Zack Cerza <zack@redhat.com>
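
A local analogue of this fix, using `subprocess` rather than a remote connection, might look like the sketch below. The `Process` class and this `CommandFailedError` are stand-ins defined here for illustration, not the orchestra.run implementation; the point is that `poll()` and `wait()` share one status check, so a failure is raised on either path.

```python
import subprocess
import sys
import time


class CommandFailedError(Exception):
    """Local stand-in for the exception named in the commit message."""


class Process:
    """Toy local wrapper around Popen (hypothetical)."""

    def __init__(self, args, check_status=True):
        self.check_status = check_status
        self._proc = subprocess.Popen(args)

    def _check(self, returncode):
        # Shared by poll() and wait() so both paths raise on failure.
        if self.check_status and returncode not in (None, 0):
            raise CommandFailedError("exit status %d" % returncode)
        return returncode

    def poll(self):
        return self._check(self._proc.poll())

    def wait(self):
        return self._check(self._proc.wait())


# Poll a command that exits non-zero; the failure surfaces without wait().
failing = Process([sys.executable, "-c", "raise SystemExit(3)"])
try:
    while failing.poll() is None:
        time.sleep(0.05)
    outcome = "no error"
except CommandFailedError as exc:
    outcome = str(exc)
```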
8 years ago orchestra.run: Notice when short-lived procs exit
Zack Cerza [Fri, 10 Mar 2017 16:41:46 +0000 (09:41 -0700)]
orchestra.run: Notice when short-lived procs exit

If a command exits immediately, there is a race between greenlet
completion (which flushes the ChannelFile buffers) and the call to
exit_status_ready(). Waiting for 0.1s on the greenlets removes the race
condition.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago orchestra.run: Fix bug in RemoteProcess.finished
Zack Cerza [Thu, 9 Mar 2017 22:27:31 +0000 (15:27 -0700)]
orchestra.run: Fix bug in RemoteProcess.finished

Previously, if wait() had not been called, we didn't have the returncode
attribute set. Set it from within the finished property if it isn't
already.

Signed-off-by: Zack Cerza <zack@redhat.com>
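
The same idea can be shown with a local process. `LocalProcess` below is a hypothetical stand-in, not `RemoteProcess` itself: its `finished` property records the exit status as a side effect, so `returncode` is populated even if `wait()` is never called.

```python
import subprocess
import sys
import time


class LocalProcess:
    """Toy process wrapper whose 'finished' property also sets returncode."""

    def __init__(self, args):
        self._proc = subprocess.Popen(args)
        self.returncode = None

    @property
    def finished(self):
        status = self._proc.poll()
        # Set returncode here if nothing else has set it yet.
        if status is not None and self.returncode is None:
            self.returncode = status
        return status is not None


proc = LocalProcess([sys.executable, "-c", "pass"])
while not proc.finished:
    time.sleep(0.05)
```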
8 years ago Merge pull request #1044 from ceph/wip-vps-docs
Dan Mick [Wed, 8 Mar 2017 23:02:59 +0000 (15:02 -0800)]
Merge pull request #1044 from ceph/wip-vps-docs

docs: Update libvirt conf to match current vps_hosts list

Reviewed-by: Dan Mick <dmick@redhat.com>
8 years ago docs: Update libvirt conf to match current vps_hosts list 1044/head
David Galloway [Wed, 8 Mar 2017 20:07:41 +0000 (15:07 -0500)]
docs: Update libvirt conf to match current vps_hosts list

Signed-off-by: David Galloway <dgallowa@redhat.com>
8 years ago Merge pull request #1043 from ceph/wip-fix-1041
Zack Cerza [Tue, 7 Mar 2017 16:50:26 +0000 (09:50 -0700)]
Merge pull request #1043 from ceph/wip-fix-1041

provision.downburst: Fix a bug in PR #1041

8 years ago provision.downburst: Fix a bug in PR #1041 1043/head
Zack Cerza [Tue, 7 Mar 2017 16:18:56 +0000 (09:18 -0700)]
provision.downburst: Fix a bug in PR #1041

Looks like an argument was dropped to a string format().

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Merge pull request #1042 from dmick/wip-downburst-logging
Zack Cerza [Tue, 7 Mar 2017 00:57:30 +0000 (17:57 -0700)]
Merge pull request #1042 from dmick/wip-downburst-logging

provision: ctx does not always have 'config' attribute

8 years ago provision: ctx does not always have 'config' attribute 1042/head
Dan Mick [Tue, 7 Mar 2017 00:14:30 +0000 (16:14 -0800)]
provision: ctx does not always have 'config' attribute

Signed-off-by: Dan Mick <dan.mick@redhat.com>
8 years ago Merge pull request #1041 from dmick/wip-downburst-logging
Zack Cerza [Tue, 7 Mar 2017 00:05:22 +0000 (17:05 -0700)]
Merge pull request #1041 from dmick/wip-downburst-logging

Invoke downburst with logging

8 years ago downburst: always log output and error, check returncode for failure 1041/head
Dan Mick [Mon, 6 Mar 2017 23:41:36 +0000 (15:41 -0800)]
downburst: always log output and error, check returncode for failure

The more info the better; always log everything about the downburst
execution to the teuthology log.  Check for command failure by
checking for returncode != 0 rather than "presence of stderr", since
logging always happens to stderr.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
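
The check described above can be sketched generically. `run_and_log` is a hypothetical helper, not the provisioning code itself, and it runs the Python interpreter as a placeholder command instead of downburst; it logs both streams unconditionally and decides success from the exit status alone.

```python
import logging
import subprocess
import sys

log = logging.getLogger(__name__)


def run_and_log(args):
    """Run a command, log all of its output, and report success by exit code."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = proc.communicate()
    # Log both streams; stderr may contain ordinary log lines, not errors.
    for line in out.splitlines():
        log.info(line)
    for line in err.splitlines():
        log.info(line)
    return proc.returncode == 0


# Writing to stderr alone does not count as a failure; a non-zero exit does.
ok = run_and_log([sys.executable, "-c", "import sys; print('note', file=sys.stderr)"])
bad = run_and_log([sys.executable, "-c", "raise SystemExit(1)"])
```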
8 years ago provision: invoke downburst with -v and --logfile
Dan Mick [Mon, 6 Mar 2017 23:40:05 +0000 (15:40 -0800)]
provision: invoke downburst with -v and --logfile

Verbose output isn't voluminous enough to be a problem, and can be helpful
when tracking down weirdness.  Also, log to a private file in case
downburst hangs mid-operation, to avoid having to do any
select() madness in teuthology.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
8 years ago Merge pull request #1040 from ceph/wip-downburst-unlock
Dan Mick [Mon, 6 Mar 2017 21:43:12 +0000 (13:43 -0800)]
Merge pull request #1040 from ceph/wip-downburst-unlock

provision.downburst: Tweak destroy return code

Reviewed-by: Dan Mick <dmick@redhat.com>
8 years ago provision.downburst: Tweak destroy return code 1040/head
Zack Cerza [Mon, 6 Mar 2017 19:35:27 +0000 (19:35 +0000)]
provision.downburst: Tweak destroy return code

Specifically, if the instance doesn't even exist, consider the destroy
op to have succeeded.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Merge pull request #1035 from ceph/wip-f25
Zack Cerza [Mon, 6 Mar 2017 16:19:28 +0000 (09:19 -0700)]
Merge pull request #1035 from ceph/wip-f25

bootstrap: Update required package names for Fedora 25

8 years ago Merge pull request #1039 from dmick/wip-clock
Zack Cerza [Fri, 3 Mar 2017 21:38:21 +0000 (14:38 -0700)]
Merge pull request #1039 from dmick/wip-clock

Include missing pieces from ntpd clock task change

8 years ago ntpdc is deprecated; use ntpq (in the ntp package everywhere) 1039/head
Dan Mick [Mon, 30 Jan 2017 21:25:57 +0000 (13:25 -0800)]
ntpdc is deprecated; use ntpq (in the ntp package everywhere)

See https://www.eecis.udel.edu/~mills/ntp/html/ntpdc.html

Signed-off-by: Dan Mick <dan.mick@redhat.com>
8 years ago clock: handle different service names
Dan Mick [Fri, 13 Jan 2017 22:49:05 +0000 (14:49 -0800)]
clock: handle different service names

Restarting ntpd involves two different service names (thanks
again, distro maintainers, much appreciated)

Signed-off-by: Dan Mick <dan.mick@redhat.com>
8 years ago Merge pull request #1038 from tchaikov/wip-yaml-with-repo
Zack Cerza [Fri, 3 Mar 2017 18:10:46 +0000 (11:10 -0700)]
Merge pull request #1038 from tchaikov/wip-yaml-with-repo

docs/detailed_test_config.rst: fix the sample yaml

8 years ago docs/detailed_test_config.rst: fix the sample yaml 1038/head
Kefu Chai [Fri, 3 Mar 2017 10:54:22 +0000 (18:54 +0800)]
docs/detailed_test_config.rst: fix the sample yaml

so it is able to work with latest teuthology

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years ago Merge pull request #1037 from ceph/wip-cloud-volume-retry
Dan Mick [Wed, 1 Mar 2017 23:43:57 +0000 (15:43 -0800)]
Merge pull request #1037 from ceph/wip-cloud-volume-retry

cloud.openstack: Also retry on BaseHTTPError

8 years ago cloud.openstack: Also retry on BaseHTTPError 1037/head
Zack Cerza [Wed, 1 Mar 2017 23:20:35 +0000 (16:20 -0700)]
cloud.openstack: Also retry on BaseHTTPError

We attach volumes immediately after creating them; sometimes they are
still momentarily in the 'creating' state, causing the attach call to
throw a BaseHTTPError. When that happens, simply retry the request
instead of failing node creation, starting the entire cycle all over
again.

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago bootstrap: Update required package names for Fedora 25 1035/head
David Galloway [Thu, 16 Feb 2017 22:04:26 +0000 (17:04 -0500)]
bootstrap: Update required package names for Fedora 25

Signed-off-by: David Galloway <dgallowa@redhat.com>
8 years ago Merge pull request #1012 from ceph/wip-libcloud
Dan Mick [Tue, 28 Feb 2017 20:55:54 +0000 (12:55 -0800)]
Merge pull request #1012 from ceph/wip-libcloud

libcloud provisioning backend with OpenStack support

Reviewed-by: Dan Mick <dmick@redhat.com>
8 years ago Merge pull request #1034 from ceph/wip-nuke-nfsganesha
Zack Cerza [Fri, 24 Feb 2017 20:56:46 +0000 (13:56 -0700)]
Merge pull request #1034 from ceph/wip-nuke-nfsganesha

nuke: Remove nfs-ganesha repos

8 years ago nuke: Remove nfs-ganesha repos 1034/head
David Galloway [Fri, 24 Feb 2017 19:01:12 +0000 (14:01 -0500)]
nuke: Remove nfs-ganesha repos

Fixes: http://tracker.ceph.com/issues/18974
Signed-off-by: David Galloway <dgallowa@redhat.com>
8 years ago cloud.openstack: Exclude windows-specific sizes 1012/head
Zack Cerza [Tue, 21 Feb 2017 20:17:43 +0000 (13:17 -0700)]
cloud.openstack: Exclude windows-specific sizes

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago nuke: Do not log the namespace object
Zack Cerza [Fri, 10 Feb 2017 21:32:01 +0000 (14:32 -0700)]
nuke: Do not log the namespace object

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago cloud.openstack: Cache authentication tokens
Zack Cerza [Fri, 10 Feb 2017 17:25:53 +0000 (10:25 -0700)]
cloud.openstack: Cache authentication tokens

Constantly causing Keystone to regenerate auth tokens was the cause of
our hitting rate limits during testing. This will let us reuse auth
tokens - including across processes - to avoid hitting those limits.

Signed-off-by: Zack Cerza <zack@redhat.com>
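
One way to reuse a token across processes is to keep it in a file with an expiry time. The class below is a hypothetical sketch and is not cloud.util.AuthToken; the file layout, the `expires_at` field, and the atomic-rename approach are all assumptions made for illustration, and cross-process locking is left out.

```python
import json
import os
import tempfile
import time


class CachedToken:
    """File-backed token cache (hypothetical; not cloud.util.AuthToken)."""

    def __init__(self, path):
        self.path = path

    def read(self, now=None):
        """Return the cached token, or None if it is missing or expired."""
        now = time.time() if now is None else now
        try:
            with open(self.path) as f:
                data = json.load(f)
        except (OSError, ValueError):
            return None
        if data["expires_at"] <= now:
            return None
        return data["token"]

    def write(self, token, expires_at):
        """Store the token atomically so readers never see a partial file."""
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"token": token, "expires_at": expires_at}, f)
        os.replace(tmp_path, self.path)
```

A caller would try `read()` first and only request a fresh token, followed by `write()`, when it returns `None`.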
8 years ago Add cloud.util.AuthToken
Zack Cerza [Thu, 9 Feb 2017 20:50:20 +0000 (13:50 -0700)]
Add cloud.util.AuthToken

This provides a mechanism for caching OpenStack authentication tokens

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Move FileLock out of repo_utils
Zack Cerza [Thu, 9 Feb 2017 20:53:57 +0000 (13:53 -0700)]
Move FileLock out of repo_utils

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago cloud: Volume creation fails node creation
Zack Cerza [Thu, 9 Feb 2017 00:13:51 +0000 (17:13 -0700)]
cloud: Volume creation fails node creation

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago cloud: Retry failed requests in libcloud
Zack Cerza [Mon, 6 Feb 2017 21:16:13 +0000 (14:16 -0700)]
cloud: Retry failed requests in libcloud

It's common to see "429 Rate limit exceeded", at least with OVH. When we
encounter the exception associated with that response, back off and
retry for an interval before eventually giving up.

Signed-off-by: Zack Cerza <zack@redhat.com>
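
The back-off-and-retry behaviour can be sketched in plain Python. `RateLimitError` and `retry` are hypothetical names; the real change works with libcloud's own exception types, and the delays and attempt count below are illustrative defaults only.

```python
import time


class RateLimitError(Exception):
    """Stand-in for whatever exception the client raises on HTTP 429."""


def retry(func, tries=5, initial_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call func(), sleeping with exponential backoff between failed attempts."""
    delay = initial_delay
    for attempt in range(1, tries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == tries:
                # Out of attempts: give up and let the caller see the error.
                raise
            sleep(delay)
            delay *= backoff
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.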
8 years ago updatekeys: Fix argument parsing
Zack Cerza [Fri, 20 Jan 2017 20:00:38 +0000 (13:00 -0700)]
updatekeys: Fix argument parsing

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago .gitignore: .idea
Zack Cerza [Wed, 18 Jan 2017 21:03:03 +0000 (14:03 -0700)]
.gitignore: .idea

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Don't unlock VMs when destroy fails
Zack Cerza [Tue, 10 Jan 2017 19:47:21 +0000 (12:47 -0700)]
Don't unlock VMs when destroy fails

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Update SSH keys when creating VMs
Zack Cerza [Fri, 6 Jan 2017 20:47:59 +0000 (13:47 -0700)]
Update SSH keys when creating VMs

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Handle VMs missing consoles safely
Zack Cerza [Fri, 6 Jan 2017 20:01:42 +0000 (13:01 -0700)]
Handle VMs missing consoles safely

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Make teuthology.lock a subpackage
Zack Cerza [Fri, 6 Jan 2017 00:04:11 +0000 (17:04 -0700)]
Make teuthology.lock a subpackage

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago More intelligently implement is_vm()
Zack Cerza [Thu, 5 Jan 2017 20:24:11 +0000 (13:24 -0700)]
More intelligently implement is_vm()

Move to using one decent implementation instead of multiple (some naive)
implementations

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Locking changes needed for libcloud
Zack Cerza [Wed, 4 Jan 2017 17:22:44 +0000 (10:22 -0700)]
Locking changes needed for libcloud

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Add libcloud backend
Zack Cerza [Thu, 8 Dec 2016 03:30:57 +0000 (20:30 -0700)]
Add libcloud backend

Initially this supports OpenStack but will grow to support other methods
of cloud-like deployment. Some assumptions are made regarding supporting
infrastructure (FIXME document these)

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Merge pull request #1033 from ceph/wip-wait-osd-timeout
Dan Mick [Fri, 24 Feb 2017 01:25:20 +0000 (17:25 -0800)]
Merge pull request #1033 from ceph/wip-wait-osd-timeout

misc.wait_until_osds_up(): timeout after 5min

8 years ago misc.wait_until_osds_up(): timeout after 5min 1033/head
Zack Cerza [Fri, 24 Feb 2017 00:36:05 +0000 (17:36 -0700)]
misc.wait_until_osds_up(): timeout after 5min

It doesn't make any sense to wait more than a few minutes for OSDs to
come up. If they take more than five minutes, fail the job.

Signed-off-by: Zack Cerza <zack@redhat.com>
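
A bounded polling loop of the kind described above might look like this. `wait_until` is a hypothetical generic helper, not `misc.wait_until_osds_up()`; only the five-minute limit comes from the commit message, and the polling interval is an arbitrary illustrative value.

```python
import time


def wait_until(condition, timeout=300, interval=6,
               sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it is true, failing after 'timeout' seconds."""
    deadline = clock() + timeout
    while not condition():
        if clock() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        sleep(interval)
```

Raising instead of waiting indefinitely lets the job fail promptly, which is the behaviour the commit asks for.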
8 years ago Merge pull request #1030 from tchaikov/wip-suite-sha1-for-workunit
Zack Cerza [Thu, 23 Feb 2017 23:34:36 +0000 (16:34 -0700)]
Merge pull request #1030 from tchaikov/wip-suite-sha1-for-workunit

suite: use "suite_hash" as the default sha1 for workunit

8 years ago Merge pull request #1032 from ceph/wip-nuke-samba
Zack Cerza [Thu, 23 Feb 2017 17:17:35 +0000 (10:17 -0700)]
Merge pull request #1032 from ceph/wip-nuke-samba

nuke: Remove samba repos

8 years ago nuke: Remove samba repos 1032/head
David Galloway [Thu, 23 Feb 2017 16:21:47 +0000 (11:21 -0500)]
nuke: Remove samba repos

Fixes: http://tracker.ceph.com/issues/19061
Signed-off-by: David Galloway <dgallowa@redhat.com>
8 years ago run: allow using alternate suite repo 1030/head
Ilya Dryomov [Fri, 17 Feb 2017 11:56:20 +0000 (12:56 +0100)]
run: allow using alternate suite repo

Do the same thing we do for ceph repo to make ceph.git commit
1f82b9b9446d ("qa/tasks/workunit: use the suite repo for cloning
workunit") work for scheduled jobs.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
8 years ago suite: use "suite_hash" as the default sha1 for workunit
Kefu Chai [Thu, 16 Feb 2017 06:59:21 +0000 (14:59 +0800)]
suite: use "suite_hash" as the default sha1 for workunit

as "workunits" reside in ceph/qa/workunits, it's more intuitive to
respect suite-branch option when cloning workunits.

Signed-off-by: Kefu Chai <kchai@redhat.com>
8 years ago Merge pull request #1027 from ceph/wip-ceph-ansible-boot
Zack Cerza [Fri, 10 Feb 2017 20:43:36 +0000 (13:43 -0700)]
Merge pull request #1027 from ceph/wip-ceph-ansible-boot

ceph_ansible: update pip and ansible versions

8 years ago Merge pull request #1013 from ceph/wip-clock
Zack Cerza [Thu, 9 Feb 2017 23:08:21 +0000 (16:08 -0700)]
Merge pull request #1013 from ceph/wip-clock

Change to use task 'clock' instead of 'clock.check'

8 years ago update default ansible version to 2.2.1 1027/head
Vasu Kulkarni [Thu, 9 Feb 2017 00:16:55 +0000 (16:16 -0800)]
update default ansible version to 2.2.1

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
8 years ago upgrade pip first before setuptools
Vasu Kulkarni [Thu, 9 Feb 2017 00:15:38 +0000 (16:15 -0800)]
upgrade pip first before setuptools

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
8 years ago Merge pull request #1018 from jan--f/wip-bootstrap-tumbleweed
Zack Cerza [Wed, 8 Feb 2017 22:11:04 +0000 (15:11 -0700)]
Merge pull request #1018 from jan--f/wip-bootstrap-tumbleweed

bootstrap: add support for opensuse tumbleweed.

8 years ago Merge pull request #1021 from ceph/wip-suite-default
Zack Cerza [Wed, 8 Feb 2017 22:10:22 +0000 (15:10 -0700)]
Merge pull request #1021 from ceph/wip-suite-default

teuthology-suite: Drop default for --ceph

8 years ago Merge pull request #1004 from SUSE/wip-17981
Zack Cerza [Wed, 8 Feb 2017 22:00:19 +0000 (15:00 -0700)]
Merge pull request #1004 from SUSE/wip-17981

nuke: Use pkill -KILL to unconditionally wipe out hadoop processes

8 years ago Merge pull request #1026 from dmick/master
Zack Cerza [Tue, 7 Feb 2017 20:22:40 +0000 (13:22 -0700)]
Merge pull request #1026 from dmick/master

prune: use the shortest time of -p or -r to decide on processing

8 years ago prune: use the shortest time of -p or -r to decide on processing 1026/head
Dan Mick [Tue, 7 Feb 2017 20:07:47 +0000 (12:07 -0800)]
prune: use the shortest time of -p or -r to decide on processing

invoking -p 7 -r 30 was only removing passed jobs 30 days or older.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
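
The decision described above can be sketched as follows. `should_process` is a hypothetical helper, and reading `-p` and `-r` as two age thresholds in days is an assumption drawn from the message rather than from the prune code itself.

```python
def should_process(age_days, pass_days=None, remotes_days=None):
    """Decide whether a run is old enough for any requested pruning step."""
    # None means the corresponding option was not given (an assumption).
    thresholds = [d for d in (pass_days, remotes_days) if d is not None]
    if not thresholds:
        return False
    # Use the shortest threshold so the stricter option is never skipped.
    return age_days >= min(thresholds)
```

With thresholds of 7 and 30, a 10-day-old run is processed under this rule, whereas comparing against the longer threshold would skip it until it reached 30 days, which matches the symptom in the commit message.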
8 years ago Merge pull request #1025 from ceph/wip-console-force-into-spy
Zack Cerza [Tue, 7 Feb 2017 17:06:09 +0000 (10:06 -0700)]
Merge pull request #1025 from ceph/wip-console-force-into-spy

console: force existing connections into spy mode if !readonly

8 years ago console: force existing connections into spy mode if !readonly 1025/head
Ilya Dryomov [Tue, 7 Feb 2017 09:55:45 +0000 (10:55 +0100)]
console: force existing connections into spy mode if !readonly

If someone watching the console didn't think of using "console -s", we
end up power cycling the node in an attempt to get the login prompt.
This is futile -- if the watcher is still there after the node comes
back up, our connection will get dropped to spy mode again.

Use -f to temporarily force existing connections into spy mode when we
attach to save a power cycle.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
8 years ago Merge pull request #1024 from ceph/wip-kclient-nuke-nosync
Zack Cerza [Fri, 3 Feb 2017 20:55:48 +0000 (13:55 -0700)]
Merge pull request #1024 from ceph/wip-kclient-nuke-nosync

nuke: work around a reboot -n trouble

8 years ago nuke: work around a reboot -n trouble 1024/head
Ilya Dryomov [Fri, 3 Feb 2017 10:21:02 +0000 (11:21 +0100)]
nuke: work around a reboot -n trouble

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
8 years ago nuke: improve stale_kernel_mount() check
Ilya Dryomov [Fri, 3 Feb 2017 09:59:43 +0000 (10:59 +0100)]
nuke: improve stale_kernel_mount() check

Commit 7db9e8b76fd5 ("nuke: bring stale kernel client handling back")
resurrected the check that was removed in commit 1d47a121b385 ("Fix
nuke, redo some cleanup functions").  It isn't sufficient though -- for
example, if a workunit already issued a umount, /etc/mtab won't have
a '^/dev/rbd' entry.

debugfs is enabled and mounted on all distros we care about.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
8 years ago Merge pull request #1022 from ceph/wip-unbreak-kclient-nuke
Zack Cerza [Wed, 1 Feb 2017 23:58:59 +0000 (16:58 -0700)]
Merge pull request #1022 from ceph/wip-unbreak-kclient-nuke

nuke: bring stale kernel client handling back

8 years ago nuke: drop remove_kernel_mounts() 1022/head
Ilya Dryomov [Wed, 1 Feb 2017 19:37:49 +0000 (20:37 +0100)]
nuke: drop remove_kernel_mounts()

Calling remove_kernel_mounts() after reboot() is pretty useless.  Also,
as explained in the previous commit, there isn't much we can do in the
krbd case, so just drop it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
8 years ago nuke: bring stale kernel client handling back
Ilya Dryomov [Wed, 1 Feb 2017 19:37:49 +0000 (20:37 +0100)]
nuke: bring stale kernel client handling back

Commit 1d47a121b385 ("Fix nuke, redo some cleanup functions") broke
stale kernel client map/mount handling by dropping reboot arguments.
While for kcephfs we can use 'umount -f' to avoid sync (it used to not
work, but is mostly fixed now, I believe), currently there is nothing
we can do for a local filesystem mounted on top of krbd.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
8 years ago teuthology-suite: Drop default for --ceph 1021/head
Zack Cerza [Wed, 25 Jan 2017 22:20:43 +0000 (15:20 -0700)]
teuthology-suite: Drop default for --ceph

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Merge pull request #1020 from ceph/wip-waitpid
Dan Mick [Wed, 25 Jan 2017 21:23:10 +0000 (13:23 -0800)]
Merge pull request #1020 from ceph/wip-waitpid

Tell gevent not to patch os.waitpid()

Reviewed-by: Dan Mick <dmick@redhat.com>
8 years ago Tell gevent not to patch os.waitpid() 1020/head
Zack Cerza [Wed, 25 Jan 2017 21:08:35 +0000 (14:08 -0700)]
Tell gevent not to patch os.waitpid()

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago bootstrap: add support for opensuse tumbleweed. 1018/head
Jan Fajerski [Wed, 25 Jan 2017 13:38:39 +0000 (14:38 +0100)]
bootstrap: add support for opensuse tumbleweed.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
8 years ago Merge pull request #1017 from ceph/wip-manhole
Dan Mick [Tue, 24 Jan 2017 19:27:17 +0000 (11:27 -0800)]
Merge pull request #1017 from ceph/wip-manhole

Use manhole to provide a way to debug hung jobs

Reviewed-by: Dan Mick <dmick@redhat.com>
8 years ago Use manhole to provide a way to debug hung jobs 1017/head
Zack Cerza [Tue, 24 Jan 2017 18:56:49 +0000 (11:56 -0700)]
Use manhole to provide a way to debug hung jobs

https://pypi.python.org/pypi/manhole

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Merge pull request #1014 from jcsp/wip-18594
Zack Cerza [Tue, 24 Jan 2017 16:47:29 +0000 (09:47 -0700)]
Merge pull request #1014 from jcsp/wip-18594

pcp: use a timeout when downloading graphite graphs

8 years ago pcp: use a timeout when downloading graphite graphs 1014/head
John Spray [Thu, 19 Jan 2017 13:47:59 +0000 (13:47 +0000)]
pcp: use a timeout when downloading graphite graphs

Fixes: http://tracker.ceph.com/issues/18597
Signed-off-by: John Spray <john.spray@redhat.com>
8 years ago Merge pull request #1011 from ceph/wip-kernel-kill-vsplitter
Zack Cerza [Tue, 17 Jan 2017 17:41:41 +0000 (10:41 -0700)]
Merge pull request #1011 from ceph/wip-kernel-kill-vsplitter

kernel: be more flexible about sha1 matching

8 years ago kernel: be more flexible about sha1 matching 1011/head
Ilya Dryomov [Tue, 17 Jan 2017 14:03:46 +0000 (15:03 +0100)]
kernel: be more flexible about sha1 matching

Some rpm scripts don't allow dashes in the Release field, so let's
accept both -g and _g.  Kill _vsplitter() as Calxeda is no more.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
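
Accepting both separators can be done with a single regular expression. The function name `extract_sha1` and the sample version strings are hypothetical; only the `-g` / `_g` distinction comes from the commit message.

```python
import re

# Match an abbreviated or full sha1 introduced by either "-g" or "_g".
SHA1_RE = re.compile(r"[-_]g([0-9a-f]{7,40})")


def extract_sha1(version):
    """Return the sha1 embedded in a kernel version string, or None."""
    match = SHA1_RE.search(version)
    return match.group(1) if match else None


dash_form = extract_sha1("4.10.0-rc1-ceph-g1a2b3c4d5e6f")
underscore_form = extract_sha1("4.10.0_rc1_ceph_g1a2b3c4d5e6f.x86_64")
```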
8 years ago clock: Fix clock's use of Remote.run() 1013/head
Dan Mick [Tue, 17 Jan 2017 00:11:30 +0000 (16:11 -0800)]
clock: Fix clock's use of Remote.run()

Signed-off-by: Dan Mick <dan.mick@redhat.com>
8 years ago Use clock by default (instead of clock.check)
Dan Mick [Fri, 13 Jan 2017 22:49:05 +0000 (14:49 -0800)]
Use clock by default (instead of clock.check)

We're seeing clocks desynchronized.  My theory is that this might
be because ntp can take five minutes or more to actually sync the
clocks, and clock.check doesn't do any setting of the clocks, just
reporting.  clock, OTOH, stops ntpd and does an ntpdate, and then
restarts ntpd, which should kickstart it with a much-closer-to-correct
time.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
8 years ago Merge pull request #1009 from ceph/wip-archive
Dan Mick [Tue, 17 Jan 2017 00:19:41 +0000 (16:19 -0800)]
Merge pull request #1009 from ceph/wip-archive

Avoid a race condition with job archive creation

Reviewed-by: Dan Mick <dmick@redhat.com>
8 years ago Fix a conflict with openstack requirements 1009/head
Zack Cerza [Mon, 16 Jan 2017 23:39:49 +0000 (16:39 -0700)]
Fix a conflict with openstack requirements

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago worker: Create job archive directories
Zack Cerza [Mon, 16 Jan 2017 23:16:41 +0000 (16:16 -0700)]
worker: Create job archive directories

... not just run archive directories. This is to resolve a race
condition between the job creating its archive directory and the worker
symlinking its log into that directory.

Signed-off-by: Zack Cerza <zack@redhat.com>
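
When two processes may both try to create the same directory, the usual way to avoid the race is to make creation idempotent. The helper below is a hypothetical sketch of that pattern, not the worker's actual code, and the `archive_root` / `job_id` parameters are illustrative names.

```python
import os


def ensure_job_archive(archive_root, job_id):
    """Create the job's archive directory, tolerating an existing one."""
    path = os.path.join(archive_root, str(job_id))
    # exist_ok=True means whichever process gets here second does not fail.
    os.makedirs(path, exist_ok=True)
    return path
```

Either the worker or the job can then call the helper first, and the other call becomes a no-op.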
8 years ago run: Don't fail if the archive dir exists
Zack Cerza [Mon, 16 Jan 2017 23:14:49 +0000 (16:14 -0700)]
run: Don't fail if the archive dir exists

Signed-off-by: Zack Cerza <zack@redhat.com>
8 years ago Merge pull request #1006 from ceph/wip-cd-debug
Zack Cerza [Tue, 10 Jan 2017 22:47:01 +0000 (15:47 -0700)]
Merge pull request #1006 from ceph/wip-cd-debug

Remove high level of debug logging which is not required

8 years ago Remove high level of debug logging which is not required 1006/head
Vasu Kulkarni [Tue, 10 Jan 2017 17:45:45 +0000 (09:45 -0800)]
Remove high level of debug logging which is not required

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
8 years ago Merge pull request #1005 from ceph/wip-prune-compress
Dan Mick [Wed, 4 Jan 2017 00:09:42 +0000 (16:09 -0800)]
Merge pull request #1005 from ceph/wip-prune-compress

prune: Compress teuthology.log

Reviewed-by: Dan Mick <dmick@redhat.com>