The entrypoint used to generate user keyrings is `ceph-authtool`, therefore,
it cannot expand the `$(ceph-authtool --gen-print-key)` inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyrings are properly filled when
`user_config: true`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab1dd3027a4b9932e58f28b86ab46979eb1f1682) Signed-off-by: Sébastien Han <seb@redhat.com>
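The keyring check mentioned above could look roughly like the following Python sketch. The entry layout (`name` and `key` fields), the sample values and the function name are assumptions made for illustration; they are not the playbook's actual task.

```python
def find_unfilled_keys(keys):
    """Return the names of entries whose keyring secret is missing or unexpanded."""
    bad = []
    for entry in keys:
        secret = (entry.get("key") or "").strip()
        # A literal "$(...)" means the shell substitution was never expanded.
        if not secret or secret.startswith("$("):
            bad.append(entry.get("name", "<unnamed>"))
    return bad


# Hypothetical user-defined keys; the first secret is a placeholder value.
keys = [
    {"name": "client.alice", "key": "AQ...placeholder=="},
    {"name": "client.bob", "key": "$(ceph-authtool --gen-print-key)"},
    {"name": "client.carol"},
]
print(find_unfilled_keys(keys))
```

A deployment would fail early when the returned list is not empty, rather than creating users with unusable keyrings.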
Sébastien Han [Thu, 14 Dec 2017 10:31:28 +0000 (11:31 +0100)]
default: look for the right return code on socket stat in-use
As reported in https://github.com/ceph/ceph-ansible/issues/2254, the
check with fuser is not ideal. If fuser is not available the return code
is 127. Here we want to make sure that we are looking for the correct
return code, so 1.
Closes: https://github.com/ceph/ceph-ansible/issues/2254 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7eaf444328c8c381c673883913cf71b8ebe9d064) Signed-off-by: Sébastien Han <seb@redhat.com>
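As a rough illustration of why the exact return code matters, the sketch below maps the exit status of a shell call to `fuser` onto a state. The function name and the state labels are assumptions for illustration only; the playbook itself expresses this as a condition on the registered return code.

```python
def socket_state(return_code):
    """Interpret the exit status of a `fuser <socket>` call run through a shell."""
    if return_code == 0:
        return "in-use"      # fuser found at least one process using the socket
    if return_code == 1:
        return "not-in-use"  # fuser ran and reported no access (or a fuser error)
    if return_code == 127:
        return "unknown"     # the shell could not find the fuser binary
    return "error"


for rc in (0, 1, 127):
    print(rc, socket_state(rc))
```

Treating any non-zero status as "not in use" would wrongly include 127, which only says that the check could not run.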
This task hangs because `{{ inventory_hostname }}` doesn't resolve to an
actual IP address.
Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']`
should fix this because it will reach the node with its actual IP
address.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aaaf980140832de694ef0ffe3282dabbf0b90081) Signed-off-by: Sébastien Han <seb@redhat.com>
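The lookup above is a plain nested dictionary access on the gathered facts. A minimal Python sketch, using a made-up host name and a documentation-range address as sample data:

```python
def node_address(hostvars, inventory_hostname):
    """Return the default IPv4 address recorded in the host's gathered facts."""
    return hostvars[inventory_hostname]["ansible_default_ipv4"]["address"]


# Hypothetical facts for a single node; 192.0.2.10 is a documentation address.
hostvars = {"mon0": {"ansible_default_ipv4": {"address": "192.0.2.10"}}}
print(node_address(hostvars, "mon0"))
```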
Sébastien Han [Wed, 22 Nov 2017 16:11:50 +0000 (17:11 +0100)]
common: install ceph-common on all the machines
Since some daemons now install their own packages the task checking the
ceph version fails on Debian systems. So the 'ceph-common' package must
be installed on all the machines.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bb7b29a9fcc33e7316bbe7dad3dc3cd5395ef8ab) Signed-off-by: Sébastien Han <seb@redhat.com>
John Fulton [Thu, 16 Nov 2017 16:29:59 +0000 (11:29 -0500)]
Make openstack_keys param support no acls list
A recent change [1] required that the openstack_keys
param always contain an acls list. However, it's
possible it might not contain that list. Thus, this
change defaults that list to empty if it is not in
the structure as defined by the user.
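The defaulting behaviour can be sketched in Python as follows. The entry fields other than `acls` and the sample values are assumptions for illustration; in the playbook the equivalent would typically be a Jinja2 default on the loop item.

```python
def with_default_acls(openstack_keys):
    """Return copies of the entries, each guaranteed to carry an 'acls' list."""
    return [{**key, "acls": key.get("acls", [])} for key in openstack_keys]


# Hypothetical entries: only the first one defines an acls list.
openstack_keys = [
    {"name": "client.glance", "acls": ["u:glance:r--"]},
    {"name": "client.cinder"},
]
print(with_default_acls(openstack_keys))
```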
Sébastien Han [Thu, 16 Nov 2017 13:55:08 +0000 (14:55 +0100)]
osd: fix bad activation for dmcrypt
We were activating dmcrypt devices with the wrong command. Basically the
first task executes the wrong activate command. The task fails but
continues because of the 'failed_when: false'. Then the right activation
sequence is done by the next task.
John Fulton [Mon, 6 Nov 2017 22:24:48 +0000 (17:24 -0500)]
Set permissions and ACLs of OpenStack keys on all ceph-mons
If ceph-ansible deploys a Ceph cluster with "openstack_config: true"
and sets the openstack_keys map to have certain ACLs or permissions,
the requested ACLs or permissions are only set on one of the monitor
nodes [2] when they should be set on all of them.
This patch solves [3] the above issue by having the chmod and setfacl
tasks iterate over the list of mon nodes (including the mon node that the
task was delegated to) to apply the chmod or setfacl to the keys in
openstack_keys.
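The iteration described above amounts to a cross product of monitor nodes and keys. A minimal Python sketch, with hypothetical host and key names:

```python
from itertools import product


def permission_targets(mons, openstack_keys):
    """Pair every monitor node with every key so that no monitor is skipped."""
    return [(mon, key["name"]) for mon, key in product(mons, openstack_keys)]


# Hypothetical inventory and keys, for illustration only.
mons = ["mon0", "mon1", "mon2"]
openstack_keys = [{"name": "client.glance"}, {"name": "client.cinder"}]
for mon, key_name in permission_targets(mons, openstack_keys):
    print(mon, key_name)
```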
Like 80d32dec, the path to the fact is not correct.
In any case, we will retrieve the IP address in hostvars; the variable
is the way we get the interface name according to where it has been set
(e.g. inventory host file vs. group_vars/)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510906 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 44df3f9102773c10011c82b5c1a20e7ae46e0001) Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-ansible is now being tested against Ansible 2.2 and Ansible 2.4. We
need to update tox.ini so we use the right version of testinfra
depending on which Ansible version we are using.
purge-docker-cluster must remove all osd_disk_prepare logs in
`{{ ceph_osd_docker_run_script_path }}`, otherwise if you purge your
cluster and try to redeploy it, OSDs will fail to start because they
will try to retrieve a partition UUID which doesn't exist.
The path to the fact is not correct.
In any case, we will retrieve the IP address in hostvars; the variable
is the way we get the interface name according to where it has been set
(e.g. inventory host file vs. group_vars/)
Add a missing test `test_rbd_mirror_service_is_running_from_luminous()`.
Also use bash -c "<cmd>" to make testinfra aware that later in
the upgrade process we are running the `luminous` ceph release, so we
must skip the rbd tests related to the `jewel` ceph release.
Sébastien Han [Wed, 8 Nov 2017 00:33:10 +0000 (11:33 +1100)]
osd: do not use dm when osd_auto_discovery
The current code will also return LVM devices such as /dev/dm-2. This
kind of device is not supported by ceph-disk at the moment, so now we
just ignore them.
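The filtering idea can be sketched in Python as below. The device names are sample data and the function name is hypothetical; the role itself works on the device facts gathered by Ansible.

```python
def discoverable_devices(device_names):
    """Drop device-mapper entries (dm-N), which ceph-disk cannot handle."""
    return [name for name in device_names if not name.startswith("dm-")]


# Hypothetical device list as it might appear under /dev.
print(discoverable_devices(["sda", "sdb", "dm-0", "dm-2", "sdc"]))
```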
Sébastien Han [Thu, 2 Nov 2017 15:17:38 +0000 (16:17 +0100)]
osd: enhance backward compatibility
During the initial implementation of this 'old' thing we were falling
into this issue without noticing
https://github.com/moby/moby/issues/30341 and were blindly using --rm.
Now that this is fixed, the prepare container disappears and thus
activation fails.
I'm fixing this for old jewel images.
Also this fixes the machine reboot case where the docker logs are
purged. In the old scenario, we now store the log locally in the same
directory as the ceph-osd-run.sh script.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d4ed9a2064e503ac4a4fe978cb9e196ca9150272) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 5 Oct 2017 14:22:04 +0000 (16:22 +0200)]
ci: new osd scenarios
This commit adds new OSD scenarios. It aims to simplify the CI setup and
bring better coverage of the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a53aa9e8b41606e2ff996f036a7a86679126cd92) Signed-off-by: Sébastien Han <seb@redhat.com>
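The resulting test matrix can be enumerated with a short Python sketch. The identifiers are made up for illustration, and mapping each of the six OSD setups to one VM is an assumption drawn from the "6 OSD VMs" figure above.

```python
from itertools import product

OBJECTSTORES = ("filestore", "bluestore")
DEPLOYMENTS = ("container", "non-container")
OSD_SETUPS = (
    "collocated",
    "collocated dmcrypt",
    "non-collocated",
    "non-collocated dmcrypt",
    "auto discovery collocated",
    "auto discovery collocated dmcrypt",
)

# One CI scenario per (objectstore, deployment) pair.
scenarios = list(product(OBJECTSTORES, DEPLOYMENTS))
for objectstore, deployment in scenarios:
    print(objectstore, deployment, len(OSD_SETUPS), "OSD setups")
```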
Sébastien Han [Mon, 23 Oct 2017 10:03:01 +0000 (12:03 +0200)]
Test ansible 2.4.1
We now test with Ansible 2.4. We had to change testinfra's version since
only recent versions work with 2.4. See:
https://github.com/philpep/testinfra/issues/249
Closes: https://github.com/ceph/ceph-ansible/issues/2087 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c4ad2477188c2d226a4ea2e0fa6693967d5b103c) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 27 Oct 2017 09:46:15 +0000 (11:46 +0200)]
default: remove dup variable
ceph_repository_type was declared multiple times. This commit fixes
this.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d2575c7f5e5520f6ee65c5007853b3248d2c7a10) Signed-off-by: Sébastien Han <seb@redhat.com>
John Fulton [Wed, 25 Oct 2017 23:46:02 +0000 (23:46 +0000)]
Make acls and mode parameters of openstack_keys optional
Only chmod or setfacl the requested keyring(s) in the
openstack_keys data structure when the mode or acls keys
of that data structure exist.
Users may specify four permission combinations for the
keyring file(s): 1. only set ACL, 2. only set mode,
3. set neither mode nor ACL, 4. set mode and then ACL.
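The four combinations can be illustrated with a small Python sketch that lists the steps for one key entry. The function name and the sample mode and ACL strings are assumptions; only the `mode` and `acls` keys come from the description above.

```python
def permission_steps(key):
    """List the permission steps for one key entry, mode first and then ACLs."""
    steps = []
    if "mode" in key:
        steps.append(("chmod", key["mode"]))
    for acl in key.get("acls", []):
        steps.append(("setfacl", acl))
    return steps


# Hypothetical entries covering the four combinations.
print(permission_steps({"acls": ["u:glance:r--"]}))
print(permission_steps({"mode": "0600"}))
print(permission_steps({}))
print(permission_steps({"mode": "0600", "acls": ["u:glance:r--"]}))
```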
Sébastien Han [Thu, 26 Oct 2017 12:18:38 +0000 (14:18 +0200)]
purge: do not reboot by default
Rebooting servers is really intrusive and perhaps this is not what the
operator wants. So we disable the reboot by default now. Note that the
reboot might not happen all the time.
It can be enabled by running the purge playbook with -e
reboot_osd_node=True
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2837d0a22e258cee583f14e402a99d89c9a16cd6) Signed-off-by: Sébastien Han <seb@redhat.com>
Andy McCrae [Mon, 23 Oct 2017 13:57:24 +0000 (14:57 +0100)]
Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
Use "ceph_tcmalloc_max_total_thread_cache" to set the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for
Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs.
By default this is set to 0, so the default package value will be used.
If specified, the value in the file will be changed to match the
variable, and the ceph osd services will be restarted.
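The behaviour described above can be sketched in Python as follows. The function name and the example value are assumptions; the file paths and the meaning of 0 are taken from the commit message, and "Debian" and "RedHat" are the usual `ansible_os_family` values.

```python
ENV_FILE_BY_OS_FAMILY = {
    "Debian": "/etc/default/ceph",
    "RedHat": "/etc/sysconfig/ceph",
}


def tcmalloc_setting(os_family, ceph_tcmalloc_max_total_thread_cache=0):
    """Return (file, line) to write, or None when the package default is kept."""
    if ceph_tcmalloc_max_total_thread_cache == 0:
        return None  # 0 means: leave the value shipped by the package untouched
    path = ENV_FILE_BY_OS_FAMILY[os_family]
    line = "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES={}".format(
        ceph_tcmalloc_max_total_thread_cache
    )
    return path, line


print(tcmalloc_setting("Debian"))
print(tcmalloc_setting("RedHat", 134217728))  # example value, 128 MiB
```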
Sébastien Han [Wed, 25 Oct 2017 13:45:37 +0000 (15:45 +0200)]
rgw/nfs: fix section duplication
Once and for all, hopefully...
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8670b45ef2cfcf35bac5d7f83b93099bfa1d9f9e) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 20 Oct 2017 13:15:38 +0000 (15:15 +0200)]
osd: bring backward compatibility with old Jewel images
There was a huge resync from luminous to jewel in ceph-docker:
https://github.com/ceph/ceph-docker/pull/797
This change brought a new handy function to discover partitions tied to
an OSD. This function doesn't exist in the old image so the
ceph-osd-run.sh script breaks when trying to deploy a Jewel OSD with that
old Jewel image version.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 968ef04324e9064fcecfe88bc5464ad9c2673a13) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 18 Oct 2017 16:03:30 +0000 (18:03 +0200)]
all: backward compatibility between stable-2.2 and 3.0
stable-3.0 brought numerous changes in ceph-ansible variables. This PR
aims to maintain backward compatibility for someone running stable-2.2
who upgrades to stable-3.0 but keeps their group_vars untouched.
We will then determine the right options to make sure the upgrade works
but we are expecting that new variables should be used.
We will drop this in the near future, maybe 3.1 or 3.2.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4413511b6619e22007b7988ab9929d618e0dcd01) Signed-off-by: Sébastien Han <seb@redhat.com>
upgrade: fix upgrade jewel to luminous for nfs nodes
nfs nodes can't be upgraded from jewel to luminous because the ceph-nfs
role is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
the package is upgraded in the `ceph-nfs` role, therefore,
`ceph_release` is still set to the old version. It means the `when`
can't be satisfied.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 982326373b9474231015639eac8fc52a3b0878a3) Signed-off-by: Sébastien Han <seb@redhat.com>
upgrade: fix upgrade jewel to luminous for mgr nodes
mgr nodes can't be upgraded from jewel to luminous because the ceph-mgr
role is skipped because of the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
the ceph-mgr package is upgraded in the `ceph-mgr` role, therefore,
`ceph_release` is still set to the old version. It means the `when`
can't be satisfied.
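The circular dependency behind both the nfs and the mgr upgrade failures can be reproduced with a small Python sketch of the quoted condition. The mapping below contains only three releases with their Ceph major version numbers, and the function name is hypothetical.

```python
# Ceph major version numbers for the releases involved in this upgrade.
ceph_release_num = {"jewel": 10, "kraken": 11, "luminous": 12}


def role_would_run(ceph_release):
    """Evaluate the `when` condition quoted above for a detected release."""
    return ceph_release_num[ceph_release] >= ceph_release_num["luminous"]


# Before the role runs, the node still reports jewel, so the role is skipped,
# and the package upgrade that lives inside the role never happens.
print(role_would_run("jewel"))
print(role_would_run("luminous"))
```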