Zack Cerza [Thu, 5 Nov 2015 17:43:13 +0000 (10:43 -0700)]
Increase polling interval
When we used the filesystem, we polled for a run's status every 10s;
that was more aggressive than necessary. Increase to 60s to avoid
overloading paddles.
Loic Dachary [Wed, 4 Nov 2015 08:58:56 +0000 (09:58 +0100)]
openstack: poll every 30 seconds instead of 2
No cloud provider boots an instance within 30 seconds, so there is no
need to poll aggressively. Also raise the total wait from 1200 seconds
to 3000 seconds because some providers may take more than 20 minutes to
complete the boot sequence.
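A minimal sketch of polling with a fixed interval and a total timeout, using the 30 second and 3000 second values from the message above. The function name wait_for and the injectable sleep parameter are hypothetical and are not taken from the teuthology sources.

```python
import time


def wait_for(predicate, interval=30, timeout=3000, sleep=time.sleep):
    """Poll predicate() every `interval` seconds.

    Returns True as soon as predicate() is true, or False once `timeout`
    seconds have been spent sleeping without success.
    """
    waited = 0
    while True:
        if predicate():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Passing a fake sleep makes the loop testable without actually waiting.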
Loic Dachary [Thu, 29 Oct 2015 03:01:49 +0000 (12:01 +0900)]
suite: add the ceph-qa-suite hash to the job config
To be able to re-run a job we need:
* the ceph repository and hash
* the ceph-qa-suite repository and hash
* the job description
When reading the job log from the archive directory, we currently only
have the ceph-qa-suite branch name. If and when the branch name moves
on, the job description may cease to be valid and re-running the job
may no longer be possible.
Loic Dachary [Wed, 28 Oct 2015 01:34:52 +0000 (10:34 +0900)]
openstack: get user-data path relative to the sources
The user-data files are in the sources, in the teuthology/openstack
directory. By default the template path being used is
teuthology/openstack/openstack-{os_type}-{os_version}-user-data.txt and
assumes teuthology-suite / teuthology-openstack is run from the root of
the source directory.
If the user-data path is relative, make it absolute based on the
directory in which the openstack module is found.
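As a rough illustration of the path handling described above, a relative path can be resolved against the directory of a module file. The helper name absolute_user_data and its module_file argument are assumptions made for this sketch.

```python
import os


def absolute_user_data(path, module_file):
    """Return `path` unchanged if it is absolute; otherwise resolve it
    against the directory that contains `module_file`."""
    if os.path.isabs(path):
        return path
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, path)
```

Inside a module, the caller would typically pass its own __file__ as module_file.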
suite: add config.suite_verify_ceph_hash when no gitbuilder
The suite_verify_ceph_hash configuration option is added to disable the
gitbuilder package verifications.
If True, teuthology-suite verifies that a package matching the ceph
branch exists in the gitbuilder. If False, no verification is done and
teuthology-suite assumes the packages are either not necessary to run
the task or they are created on demand.
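The behaviour of the option can be sketched as below. The function verify_ceph_hash, the lookup callable standing in for the gitbuilder query, and the default of True are all assumptions for illustration; only the option name comes from the message above.

```python
def verify_ceph_hash(config, branch, lookup):
    """Return the hash found for `branch`, or None when verification is off.

    `lookup(branch)` is a hypothetical stand-in for the gitbuilder query;
    it returns a hash string or None when no package exists.
    """
    if not config.get("suite_verify_ceph_hash", True):
        # Verification disabled: assume packages are not needed or are
        # created on demand.
        return None
    found = lookup(branch)
    if found is None:
        raise RuntimeError("no package found for ceph branch %s" % branch)
    return found
```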
Loic Dachary [Fri, 30 Oct 2015 19:06:58 +0000 (04:06 +0900)]
openstack: more robust server deletion
When removing a server and the volumes attached to it, the following can
happen:
* the server goes into ERROR status instead of being deleted
* if removed before the server is deleted, the volume may fail to detach
and consequently the deletion can fail with "still attached, detach
volume first"
* some volumes go into ERROR status instead of being deleted
This will cause problems when and if an instance is assigned the same IP
as an instance in ERROR state. They will both have the same name and
confuse teuthology.
Instead of assuming deletion is a reliable operation, assume it is a
best effort:
* rename the server and volumes to REMOVE-ME
* delete the server and then the volumes: ignore failures
The names of instances and volumes in error state no longer conflict
with the names of running instances. They can be dealt with at a later
time, or kept around for the cloud provider support to investigate.
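The rename-then-delete sequence can be sketched as follows. The run callable (executes a command given as a list of arguments and raises on failure), the function name, and the "REMOVE-ME-" plus ID naming scheme are assumptions for illustration. In this sketch every step is best effort, including the renames.

```python
def best_effort_remove(run, server_id, volume_ids):
    """Rename the server and its volumes, then delete the server and the
    volumes, ignoring failures. Returns the commands that failed."""
    steps = [["openstack", "server", "set",
              "--name", "REMOVE-ME-" + server_id, server_id]]
    steps += [["openstack", "volume", "set",
               "--name", "REMOVE-ME-" + v, v] for v in volume_ids]
    # Delete the server first, then the volumes that were attached to it.
    steps.append(["openstack", "server", "delete", server_id])
    steps += [["openstack", "volume", "delete", v] for v in volume_ids]

    failed = []
    for cmd in steps:
        try:
            run(cmd)
        except Exception:
            # Best effort: remember the failure and carry on.
            failed.append(cmd)
    return failed
```

Returning the failed commands lets a caller log them without aborting the cleanup.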
Loic Dachary [Wed, 28 Oct 2015 23:20:46 +0000 (08:20 +0900)]
openstack: global server creation lock
An attempt to reduce the instance error rate on OVH. When all workers
are on the same machine, the lock makes each of them wait for an
instance creation to complete before starting another one. Without it,
200 workers can run 200 server creations simultaneously. While this
should be throttled by OVH, these bursts may be the cause of occasional
instance creation errors: few tenants are likely to have such a pattern.
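One way to serialize work across processes on a single machine is an exclusive advisory lock on a shared file. The sketch below uses fcntl.flock from the Python standard library (Unix only); the context manager name creation_lock and the lock file path are hypothetical, and the actual commit may use a different mechanism.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def creation_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the
    with-block. Other processes calling this block until it is released."""
    with open(path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A worker would wrap its server creation call in "with creation_lock(some_path):" so that only one creation runs at a time on that machine.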
Loic Dachary [Mon, 26 Oct 2015 09:09:47 +0000 (18:09 +0900)]
openstack: server show must always return array
If cliff-tablib is not present, the output of openstack server show is a
dictionary, otherwise it is a list of Field/Value pairs. Since
cliff-tablib is not a required dependency of python-openstackclient,
the output format of openstack server show may vary. Add cliff-tablib as
a dependency so that the output format is always the same.
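To make the two output shapes concrete, the sketch below converts the dictionary form into a list of Field/Value pairs. This is an illustration only: the commit fixes the problem by adding the dependency, not by normalizing, and the exact keys "Field" and "Value" as well as the function name are assumptions.

```python
def as_field_value_list(output):
    """Return `output` as a list of {"Field": ..., "Value": ...} dicts.

    A plain dict is converted (sorted by key for a stable order); a list
    is returned as a shallow copy.
    """
    if isinstance(output, dict):
        return [{"Field": key, "Value": value}
                for key, value in sorted(output.items())]
    return list(output)
```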
Zack Cerza [Fri, 23 Oct 2015 22:00:06 +0000 (16:00 -0600)]
misc.sh(): Don't log.exception() before raise
log.exception() logs the traceback, and raise will also cause it to be
logged. There's no need to have it logged twice; additionally, when sh()
was being called within a try/except clause we were confusingly logging
an expected failure. Callers can choose to log if they want.
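A minimal sketch of the pattern: the helper raises without logging, and a caller that expects failures decides how to log them. This sh() is a simplified stand-in written for illustration, not the actual misc.sh() implementation, and probe() is a hypothetical caller.

```python
import logging
import subprocess

log = logging.getLogger(__name__)


def sh(command):
    """Run a shell command and return its output as text.

    On failure, subprocess.CalledProcessError propagates to the caller;
    nothing is logged here.
    """
    output = subprocess.check_output(
        command, shell=True, stderr=subprocess.STDOUT)
    return output.decode("utf-8")


def probe(command):
    """Hypothetical caller for which a failure is an expected outcome."""
    try:
        return sh(command)
    except subprocess.CalledProcessError as e:
        # The caller chooses the log level; no traceback is emitted.
        log.debug("%s failed with status %d", command, e.returncode)
        return None
```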
Loic Dachary [Thu, 22 Oct 2015 13:57:11 +0000 (15:57 +0200)]
openstack: implement OpenStack.set_provider
Setting the provider name depending on the OS_AUTH_URL content is
generally useful, so it is moved to the OpenStack base class. There is a
need to cope with the special cases of public OpenStack providers, even
if temporarily (e.g. no volumes on OVH, no security group on RackSpace
etc.).
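A sketch of deriving a provider name from OS_AUTH_URL. The substrings matched and the function names are assumptions for illustration; the real mapping in the OpenStack base class may differ.

```python
import os

# Hypothetical (substring, provider name) pairs.
KNOWN_PROVIDERS = (
    ("ovh", "ovh"),
    ("rackspace", "rackspace"),
)


def provider_from_auth_url(auth_url):
    """Return a provider name guessed from the auth URL, or None."""
    lowered = auth_url.lower()
    for needle, name in KNOWN_PROVIDERS:
        if needle in lowered:
            return name
    return None


def current_provider():
    """Guess the provider from the OS_AUTH_URL environment variable."""
    return provider_from_auth_url(os.environ.get("OS_AUTH_URL", ""))
```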
Loic Dachary [Thu, 22 Oct 2015 23:18:05 +0000 (01:18 +0200)]
openstack: reset the gitbuilder_host on stop
The package-repository instance is destroyed and requests to it will
time out, which takes time. Reverting to the default gitbuilder.ceph.com
is quicker and easier.
Loic Dachary [Thu, 22 Oct 2015 14:00:02 +0000 (16:00 +0200)]
openstack: resources hint is the max of all hints
Exactly one OpenStack resources hint can be included in a given job, as
part of an existing facet. This is error prone: it is sometimes not
trivial to figure out how a given job is composed, and if two resources
hints are included only one of them is taken into account, which can
lead to problems that are difficult to diagnose. Another undesirable
side effect is an artificial increase in resource usage: from the test
maintainer's point of view it is easier and more reliable to increase
the resources of all jobs when a few need more RAM or disk than to
figure out where to write the hints so that they are used by these jobs
and these jobs only.
Instead of being a fixed hint for a given job, the max of all hints
found in each facet is used. For instance, rados/thrash can have a facet
requiring that all jobs are given 3 devices.
And a job composed of rados/thrash/{cluster/openstack.yaml
tasks/bigworkunit.yaml} is aggregated as the max of all resources,
including the default, that is:
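The aggregation rule itself (take the maximum of every value across all hints, including the default) can be sketched as below. The section and key names and all numbers are hypothetical; they are not the values from the original example.

```python
def merge_hints(hints):
    """Merge resource hints by taking, for every key in every section,
    the maximum value seen across all hints."""
    merged = {}
    for hint in hints:
        for section, values in hint.items():
            target = merged.setdefault(section, {})
            for key, value in values.items():
                target[key] = max(target.get(key, value), value)
    return merged


# Hypothetical hints: a default plus two facets.
default = {"machine": {"ram": 8000, "disk": 40},
           "volumes": {"count": 0, "size": 1}}
cluster = {"volumes": {"count": 3, "size": 10}}
bigworkunit = {"machine": {"ram": 16000}}

result = merge_hints([default, cluster, bigworkunit])
```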
Zack Cerza [Thu, 22 Oct 2015 16:39:36 +0000 (10:39 -0600)]
OpenStack.exists(): Don't list every instance
Instead of "openstack server list", which dumps the tenant's entire list
of instances, use "openstack server show" to show a single instance.
While "list" can accept a "--name" argument to filter, it does not have
an "--id" argument.
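A sketch of an existence check built on "openstack server show". Treating any non-zero exit status as "does not exist" is an assumption of this sketch, as are the function signature and the injectable run parameter; this is not the actual OpenStack.exists() code.

```python
import subprocess


def exists(name_or_id, run=subprocess.check_output):
    """Return True if "openstack server show" succeeds for the instance."""
    try:
        run(["openstack", "server", "show", name_or_id],
            stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError:
        return False
    return True
```

The run parameter allows the check to be exercised without a real cloud by passing a fake callable.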