suite: add config.suite_verify_ceph_hash when no gitbuilder
The suite_verify_ceph_hash configuration option is added to disable the
gitbuilder package verifications.
If True, teuthology-suite verifies that a package matching the ceph
branch exists in the gitbuilder. If False, no verification is done and
teuthology-suite assumes the packages are either not necessary to run
the task or they are created on demand.
Loic Dachary [Wed, 28 Oct 2015 23:20:46 +0000 (08:20 +0900)]
openstack: global server creation lock
An attempt to reduce the instance error rate on OVH. When all workers
run on the same machine, the lock makes each of them wait for the
current instance creation to complete before starting another one.
Without it, 200 workers can run 200 server creations simultaneously.
While this should be throttled by OVH, these bursts may be the cause of
occasional instance creation errors: few tenants are likely to have such
a pattern.
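
A minimal sketch of such a machine-wide lock, assuming a file lock taken
with fcntl.flock. The lock path and the create callable are hypothetical
and are not taken from the teuthology source:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file shared by every worker on the machine.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "server-creation.lock")


@contextmanager
def creation_lock(path=LOCK_PATH):
    """Hold an exclusive lock shared by all processes on this machine."""
    with open(path, "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def create_server(name, create):
    """Run the (hypothetical) create callable while holding the lock."""
    with creation_lock():
        return create(name)
```

Workers that call create_server() on the same machine are serialized:
only one creation runs at a time and the others wait their turn.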
Loic Dachary [Mon, 26 Oct 2015 09:09:47 +0000 (18:09 +0900)]
openstack: server show must always return array
If cliff-tablib is not present, the output of openstack server show is a
dictionary, otherwise it is a list of Field/Value pairs. Since
cliff-tablib is not a required dependency of python-openstackclient,
the output format of openstack server show may vary. Add cliff-tablib as
a dependency so that the output format is always the same.
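
The commit settles the format by adding the dependency. Purely to
illustrate the two shapes described above, here is a sketch of a
normalizer that accepts either one. The function name and the sample
values are hypothetical:

```python
import json


def server_show_as_dict(raw_json):
    """Return a {field: value} dict whichever JSON shape was printed."""
    data = json.loads(raw_json)
    if isinstance(data, dict):
        # Shape 1: a plain dictionary.
        return data
    # Shape 2: a list of {"Field": ..., "Value": ...} pairs.
    return {item["Field"]: item["Value"] for item in data}


# Hypothetical sample outputs, one per shape.
as_dict = '{"id": "1234", "status": "ACTIVE"}'
as_pairs = ('[{"Field": "id", "Value": "1234"},'
            ' {"Field": "status", "Value": "ACTIVE"}]')
```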
Zack Cerza [Fri, 23 Oct 2015 22:00:06 +0000 (16:00 -0600)]
misc.sh(): Don't log.exception() before raise
log.exception() logs the traceback, and raise will also cause it to be
logged. There's no need to have it logged twice; additionally, when sh()
was being called within a try/except clause we were confusingly logging
an expected failure. Callers can choose to log if they want.
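
A simplified stand-in for sh() that shows the pattern, not the actual
teuthology implementation. The helper raises without logging the
traceback, and the hypothetical caller probe() decides what to do with
the failure:

```python
import logging
import subprocess

log = logging.getLogger(__name__)


def sh(command):
    """Run a shell command and return its output; raise on failure."""
    log.debug("running: %s", command)
    # check_output raises CalledProcessError on a non-zero exit status.
    # There is no log.exception() here: the caller chooses whether to log.
    output = subprocess.check_output(
        command, shell=True, stderr=subprocess.STDOUT)
    return output.decode()


def probe():
    """Hypothetical caller for which a failure is an expected outcome."""
    try:
        sh("exit 3")
    except subprocess.CalledProcessError as e:
        # Expected failure: handled quietly, no traceback in the logs.
        return e.returncode
    return 0
```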
Loic Dachary [Thu, 22 Oct 2015 13:57:11 +0000 (15:57 +0200)]
openstack: implement OpenStack.set_provider
Setting the provider name depending on the OS_AUTH_URL content is
generally useful, so it is moved to the OpenStack base class. There is a
need to cope with the special cases of public OpenStack providers, even
if temporarily (e.g. no volumes on OVH, no security groups on RackSpace,
etc.).
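
A rough sketch of deriving a provider name from OS_AUTH_URL. The
substring table, the "any" fallback, and the function name are
assumptions made for illustration and are not the values used by
teuthology:

```python
import os

# Hypothetical (substring, provider name) pairs.
PROVIDERS = (
    ("ovh", "ovh"),
    ("rackspace", "rackspace"),
)


def provider_from_auth_url(auth_url):
    """Return the first provider whose substring appears in the URL."""
    for needle, name in PROVIDERS:
        if needle in auth_url:
            return name
    # Hypothetical fallback for providers without special cases.
    return "any"


def current_provider():
    """Read OS_AUTH_URL from the environment and map it to a provider."""
    return provider_from_auth_url(os.environ.get("OS_AUTH_URL", ""))
```

Provider-specific workarounds can then test the returned name instead of
parsing the URL again.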
Loic Dachary [Thu, 22 Oct 2015 23:18:05 +0000 (01:18 +0200)]
openstack: reset the gitbuilder_host on stop
The package-repository instance is destroyed, so requests to it will
time out, which takes time. Reverting to the default gitbuilder.ceph.com
is quicker and easier.
Loic Dachary [Thu, 22 Oct 2015 14:00:02 +0000 (16:00 +0200)]
openstack: resources hint is the max of all hints
Previously, exactly one OpenStack resources hint could be included in a
given job, as part of an existing facet. That is error prone: it is
sometimes not trivial to figure out how a given job is composed, and if
two resources hints are included only one of them is taken into account,
which can lead to problems that are difficult to diagnose. Another
undesirable side effect is an artificial increase in resource usage: it
is easier and more reliable (from the test maintainer's point of view)
to increase the resources of all jobs when a few need more RAM or disk
than to figure out where to write the hints so that they are used by
these jobs and these jobs only.
Instead of being a fixed hint for a given job, the max of all hints
found in each facet is used. For instance, rados/thrash can have a facet
requiring that all jobs are given 3 devices.
And a job composed of rados/thrash/{cluster/openstack.yaml
tasks/bigworkunit.yaml} is aggregated as the max of all resources,
including the default, that is:
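
A minimal sketch of the max-of-all-hints aggregation. The key names and
every value below are hypothetical and are not taken from the suite:

```python
def merge_hints(*hints):
    """Merge resources hints, keeping the max of each numeric field."""
    merged = {}
    for hint in hints:
        for section, values in hint.items():
            target = merged.setdefault(section, {})
            for key, value in values.items():
                target[key] = max(target.get(key, value), value)
    return merged


# Hypothetical hints: a default, one from a cluster facet, one from a task.
default_hint = {"machine": {"ram": 8000, "disk": 20},
                "volumes": {"count": 0, "size": 1}}
cluster_hint = {"volumes": {"count": 3, "size": 10}}
task_hint = {"machine": {"ram": 16000}}

resources = merge_hints(default_hint, cluster_hint, task_hint)
```

With these made-up inputs, the job gets the largest RAM, disk, volume
count and volume size found in any of the three hints.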
Zack Cerza [Thu, 22 Oct 2015 16:39:36 +0000 (10:39 -0600)]
OpenStack.exists(): Don't list every instance
Instead of "openstack server list", which dumps the tenant's entire list
of instances, use "openstack server show" to show a single instance. While
"list" can accept a "--name" argument to filter, it does not have an
"--id" argument.
Zack Cerza [Mon, 19 Oct 2015 21:35:37 +0000 (15:35 -0600)]
Use safe_while to work around an OVH problem
We're seeing intermittent network failures when running inside OVH; they
present as:
https://github.com/kennethreitz/requests/issues/2364
This should help work around the issue.
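
As an illustration of the retry pattern only, here is a generic retry
helper. It is a stand-in written for this sketch and not teuthology's
safe_while, and its name and parameters are hypothetical:

```python
import time


def retry(action, tries=5, sleep=1.0, retry_on=(IOError,)):
    """Call action() until it succeeds or tries attempts have failed."""
    for attempt in range(1, tries + 1):
        try:
            return action()
        except retry_on:
            if attempt == tries:
                # Out of attempts: let the last error propagate.
                raise
            time.sleep(sleep)
```

A request that fails intermittently is wrapped in a callable and passed
as action. Only the exception types listed in retry_on are retried.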
openstack: clear the buildpackages directory on stop
When running /etc/init.d/teuthology stop, all OpenStack resources are
destroyed, including the instance hosting the repository where the
buildpackages task artefacts are archived. Remove /tmp/stampsdir so that
everything gets rebuilt.
config: ~/.teuthology.yaml check_package_signatures hint
If check_package_signatures is false, the tasks installing
packages (install, ceph-deploy, ...) are allowed to skip package
signature verification.
Set this as the default for a cluster dynamically generated by the
OpenStack backend.
Dan Mick [Fri, 9 Oct 2015 01:22:25 +0000 (18:22 -0700)]
run_tasks.py: fix Sentry URL
I don't know when or why it changed, but the existing URL format, which
uses '/search?q=<id>', fails; what works, by observing the web UI's URL
submission and by testing, is to omit the 'search' part of the path:
'/?q=<id>'
Loic Dachary [Thu, 8 Oct 2015 21:16:45 +0000 (23:16 +0200)]
install: split the upgrade_common function
The upgrade_common function implements nontrivial logic that defines
how overrides are applied to the install.upgrade task, as well as the
way upgrades are applied to the desired targets.
The function is split in two:
upgrade_common which remains the entry point
upgrade_remote_to_config which encapsulates the logic
This allows other parts of teuthology to obey the same logic by calling
the function instead of replicating it.
config: add ceph_git_url and ceph_qa_suite_git_url
The ~/.teuthology.yaml ceph_git_base_url configuration does not allow
modifying the URL of the Ceph repository without also modifying the URL
of the teuthology repository. Although it is frequently necessary to
point to an alternate ceph or ceph-qa-suite repository, it is rarely
necessary to point to an alternate teuthology repository.
This is not a blocker: it is enough to mirror the teuthology,
ceph-cm-ansible, ceph-deploy and maybe a few other repositories to
satisfy this requirement. This is however inconvenient because the
exact list of repositories that need to be mirrored is not easily
accessible. In addition, unless the user is careful about updating the
mirrors prior to running teuthology, there is a good chance that an
obsolete version of the repository will be used and this may lead to
problems difficult to diagnose.
The ceph_git_url and ceph_qa_suite_git_url configuration variables are
added to specify the URL of the ceph and ceph-qa-suite repositories
without modifying the ceph_git_base_url value so that all other
repositories retain their default location.
For easier consumption within teuthology and ceph-qa-suite, the
get_ceph_git_url() and get_ceph_qa_suite_git_url() accessors are added
to the config class. They use the user provided value, if available, and
otherwise fall back to constructing the URL with ceph_git_base_url, which
is the legacy behavior.
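
A simplified sketch of the fallback logic. The class below is a
stand-in, not the teuthology config class. The option and accessor names
follow the commit title, and the default base URL and the ".git"
suffixes are assumptions:

```python
class Config(object):
    """Stand-in config holding only the three options discussed here."""

    def __init__(self, ceph_git_base_url="https://github.com/ceph/",
                 ceph_git_url=None, ceph_qa_suite_git_url=None):
        self.ceph_git_base_url = ceph_git_base_url
        self.ceph_git_url = ceph_git_url
        self.ceph_qa_suite_git_url = ceph_qa_suite_git_url

    def get_ceph_git_url(self):
        # User-provided value first, else build it from the base URL.
        return self.ceph_git_url or self.ceph_git_base_url + "ceph.git"

    def get_ceph_qa_suite_git_url(self):
        return (self.ceph_qa_suite_git_url or
                self.ceph_git_base_url + "ceph-qa-suite.git")


# Hypothetical override: only the ceph repository points to a mirror.
config = Config(ceph_git_url="https://git.example.com/me/ceph.git")
```

Here only the ceph URL changes. The ceph-qa-suite URL, like every other
repository, is still derived from the base URL.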
Loic Dachary [Tue, 6 Oct 2015 16:18:44 +0000 (18:18 +0200)]
misc: wait_until_osds_up must verify 'up' in state
It is not enough to count the number of entries in the osds
array: wait_until_osds_up must count the ones that are actually up by
checking whether the string "up" is in the "state" array.
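
A minimal sketch of the check, assuming a parsed OSD dump shaped like
the description above. The helper name and the sample data are
hypothetical:

```python
def count_up_osds(osd_dump):
    """Count only the OSDs whose "state" array contains "up"."""
    return sum(1 for osd in osd_dump["osds"] if "up" in osd["state"])


# Hypothetical dump: three entries, only two of them are up.
sample_dump = {
    "osds": [
        {"osd": 0, "state": ["exists", "up"]},
        {"osd": 1, "state": ["exists"]},
        {"osd": 2, "state": ["exists", "up"]},
    ]
}
```

Counting entries alone would report three OSDs here, while only two are
actually up.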
openstack: throttling helps the instance running the cluster
The instance throttling (not launching more than X instances per minute)
helps the instance running the teuthology cluster when running multiple
workers. The workload does not spike when launching a suite, which
allows running more workers on a machine with the same hardware
configuration.
openstack: do not rely on gitbuilder.ceph.com by default
Make it so the default when using the OpenStack backend is to build the
packages transparently using the OpenStack cluster instead of relying on
http://ceph.com/gitbuilder.cgi.
If a buildpackages task is found, ensure it is always before the install
task because it is intended to produce the packages that will be used by
the install task.
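
One possible way to enforce that ordering, sketched under the assumption
that a job's tasks are a list of single-key dictionaries. The source
does not say whether teuthology reorders the tasks or rejects the job,
and the helper name is hypothetical:

```python
def order_buildpackages(tasks):
    """Return the tasks with buildpackages moved just before install."""
    names = [next(iter(task)) for task in tasks]
    if "buildpackages" not in names or "install" not in names:
        return list(tasks)
    build_index = names.index("buildpackages")
    install_index = names.index("install")
    if build_index < install_index:
        # Already in the right order.
        return list(tasks)
    reordered = list(tasks)
    # build_index > install_index, so popping does not shift install.
    build_task = reordered.pop(build_index)
    reordered.insert(install_index, build_task)
    return reordered
```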