Sébastien Han [Mon, 23 Oct 2017 10:03:01 +0000 (12:03 +0200)]
Test ansible 2.4.1
We now test with Ansible 2.4. We had to change testinfra's version since
only recent versions work with 2.4. See:
https://github.com/philpep/testinfra/issues/249
Closes: https://github.com/ceph/ceph-ansible/issues/2087 Signed-off-by: Sébastien Han <seb@redhat.com>
John Fulton [Wed, 25 Oct 2017 23:46:02 +0000 (23:46 +0000)]
Make acls and mode parameters of opentack_keys optional
Only chmod or setfacl the requested keyring(s) in the
opentack_keys data structure when the mode or acls keys
of that data structure exist.
User may specify four permission combinations for the
keyring file(s): 1. only set ACL, 2. only set mode,
3. set neither mode nor ACL, 4. set mode and then ACL.
Sébastien Han [Thu, 26 Oct 2017 12:18:38 +0000 (14:18 +0200)]
purge: do not reboot by default
Rebooting servers is really intrusive and perhaps this is not what the
operator wants. So we disable the reboot by default now. Note that the
reboot might not happen all the time.
It can be enabled by default by running the purge playbook with -e
reboot_osd_node=True
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011 Signed-off-by: Sébastien Han <seb@redhat.com>
Andy McCrae [Mon, 23 Oct 2017 13:57:24 +0000 (14:57 +0100)]
Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
Use "ceph_tcmalloc_max_total_thread_cache" to set the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for
Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs.
By default this is set to 0, so the default package value will be used,
if specified this value will be changed to match the variable, and ceph
osd services will be restarted.
Sébastien Han [Fri, 20 Oct 2017 13:15:38 +0000 (15:15 +0200)]
osd: bring backward compatibility with old Jewel images
There was a huge resync from luminous to jewel in ceph-docker:
https://github.com/ceph/ceph-docker/pull/797
This change brought a new handy function to discover partitions tight to
an OSD. This function doesn't exist in the old image so the
ceph-osd-run.sh script breaks when trying to deploy Jewel OSD with that
old Jewel image version.
Sébastien Han [Wed, 18 Oct 2017 16:03:30 +0000 (18:03 +0200)]
all: backward compatibility between stable-2.2 and 3.0
stable-3.0 brought numerous changes in ceph-ansible variables, this PR
aims to maintain backward compatibility for someone running stable-2.2
upgrading to stable-3.0 but keeps its groups_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.
We will drop this in a near future, maybe 3.1 or 3.2.
upgrade: fix upgrade jewel to luminous for nfs nodes
nfs nodes can't be upgraded from jewel to luminous because ceph-nfs role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
package is upgraded in `ceph-nfs` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.
upgrade: fix upgrade jewel to luminous for mgr nodes
mgr nodes can't be upgraded from jewel to luminous because ceph-mgr role
is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
ceph-mgr package is upgraded in `ceph-mgr` role, therefore,
`ceph_release` is still set to the old version. It means the when can't
be satisfied.
3a58757 introduced an issue for Jewel deployments, since this role is
skipped, `enabled_ceph_mgr_modules.stdout` doesn't exist, therefore, it
ends up with an attribute error.
Uses `.get()` to retrieve `stdout` with a default value so it won't fail
if this attribute doesn't exist (jewel).
In Jewel, we don't use bootstrap-rbd keyring for rbd-mirror nodes, it
results with a socket path/name different according to which ceph
release you are deploying.
Sébastien Han [Thu, 5 Oct 2017 14:22:04 +0000 (16:22 +0200)]
ci: new osd scenarios
This commit add new osd scenarios, it aims to simplify the CI setup and
brings a better coverage on the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
ceph-defaults: fix handlers that are always triggered
Handlers are always triggered in ceph-ansible because ceph.conf file is
generated with a randomly order for the different keys/values pairs
in sections.
In python, a dict is not sorted. It means in our case each time we try
to generate the ceph.conf file it will be rendered with a random order
since the mecanism behind consist of rendering a file from a python dict
with keys/values. Therefore, as a quick workaround, forcing this dict to be
sorted before rendering the configuration file will ensure that it will be
rendered always the same way.
Sébastien Han [Wed, 27 Sep 2017 22:11:53 +0000 (00:11 +0200)]
site-docker.yml try to fetch images in //
The container deployment is serialized, adding this task as a best
effort. If docker is already present we pull the image otherwise we wait
for the role to play.
Sébastien Han [Thu, 12 Oct 2017 21:41:02 +0000 (23:41 +0200)]
ci: reboot with ansible instead of vagrant reload
vagrant is serialized and takes a lot of time compare to simple reboot.
See the benchmarks below for 3 VMs:
[leseb@rick docker]$ time ANSIBLE_SSH_ARGS="-F
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/vagrant_ssh_config" ansible-playbook -i /home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/hosts reboot.yml
PLAY [mons]
****************************************************************************************************************************************************************************************************
[leseb@rick docker]$ time vagrant reload
==> mon0: Halting domain...
==> mon0: Starting domain.
==> mon0: Waiting for domain to get an IP address...
==> mon0: Waiting for SSH to become available...
==> mon0: Creating shared folders metadata...
==> mon0: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon0: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon0: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon1: Halting domain...
==> mon1: Starting domain.
==> mon1: Waiting for domain to get an IP address...
==> mon1: Waiting for SSH to become available...
==> mon1: Creating shared folders metadata...
==> mon1: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon1: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon1: flag to force provisioning. Provisioners marked to run always
will still run.
==> mon2: Halting domain...
==> mon2: Starting domain.
==> mon2: Waiting for domain to get an IP address...
==> mon2: Waiting for SSH to become available...
==> mon2: Creating shared folders metadata...
==> mon2: Rsyncing folder:
/home/leseb/reproduce-ci/tmp.zgGC7d5mIC/build/workspace/ceph-ansible/tests/functional/centos/7/docker/
=> /home/vagrant/sync
==> mon2: Machine already provisioned. Run `vagrant provision` or use
the `--provision`
==> mon2: flag to force provisioning. Provisioners marked to run always
will still run.
real 1m31.850s
user 0m7.387s
sys 0m0.796s
Reboot via Ansible: 0m35.112s
Reboot via vagrant: 1m31.850s
Major Hayden [Wed, 11 Oct 2017 18:21:20 +0000 (13:21 -0500)]
Simplify NTP checks/install
This patch simplifies the checks and installation tasks for NTP.
Debian and Red Hat had a check for NTP's presence but would then
install NTP right afterwards anyways. In addition, there were
tasks for atomic that weren't used anywhere else in the role.
This patch also uses a dynamic include to reduce delays from
skipped tasks.
Major Hayden [Thu, 12 Oct 2017 16:43:29 +0000 (11:43 -0500)]
Enable profile_tasks callback plugin
This patch adds the `profile_tasks` callback plugin to the whitelist
so that we can identify the tasks which are taking the longest amount
of time to run.
Major Hayden [Thu, 12 Oct 2017 16:27:36 +0000 (11:27 -0500)]
Remove jinja2 delimiters from `when` keys
This patch changes the `when:` keys so that they have no jinja2
delimiters. This avoids Ansible warnings which could turn into
errors in a future Ansible release.