git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
7 years agocommon: upgrade/install ceph-test RPM first v3.0.30
Ken Dreyer [Thu, 5 Apr 2018 19:40:15 +0000 (13:40 -0600)]
common: upgrade/install ceph-test RPM first

Prior to this change, if a user had ceph-test-12.2.1 installed, and
upgraded to ceph v12.2.3 or newer, the RPM upgrade process would
fail.

The problem is that the ceph-test RPM did not depend on an exact version
of ceph-common until v12.2.3.

In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved
from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum
encounters package conflicts between the older ceph-test and newer
ceph-base.

Once all users have upgraded to Ceph >= 12.2.3, this will no longer be
relevant.
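
A minimal sketch of the ordering idea, assuming a yum-based task (the task name and `ceph_version` variable are illustrative, not the actual change):

```
# Hypothetical sketch: make sure ceph-test is upgraded before the other
# ceph packages so yum can resolve the files that moved into ceph-base.
- name: install/upgrade ceph-test first
  yum:
    name: "ceph-test-{{ ceph_version }}"
    state: present
  when: ansible_os_family == 'RedHat'
```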

(cherry picked from commit 3752cc6f38dbf476845e975e6448225c0e103ad6)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoDeploying without managed monitors failed v3.0.29
Attila Fazekas [Wed, 4 Apr 2018 13:30:55 +0000 (15:30 +0200)]
Deploying without managed monitors failed

Tripleo deployment failed when the monitors are not managed
by tripleo itself, with:
    FAILED! => {"msg": "list object has no element 0"}

The failing play item was introduced by
f46217b69ae18317cb0c1cc3e391a0bca5767eb6.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327

Signed-off-by: Attila Fazekas <afazekas@redhat.com>
(cherry picked from commit ecd3563c2128553d4145a2f9c940ff31458c33b4)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-iscsi: fix certificates generation and distribution
Sébastien Han [Tue, 3 Apr 2018 13:20:06 +0000 (15:20 +0200)]
ceph-iscsi: fix certificates generation and distribution

Prior to this patch, the certificates were being generated on a single
node only (because of the run_once: true), so they were not
distributed to all the gateway nodes.

This required a second ansible run to work. This patch fixes the
creation and distribution of the keys on all the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f3caee84605e17f1fdfa4add634f0bf2c2cd510e)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodo not delegate facts on client nodes
Guillaume Abrioux [Wed, 21 Mar 2018 18:01:51 +0000 (19:01 +0100)]
do not delegate facts on client nodes

This commit is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1550977

We iterate over all nodes on each node and delegate the facts gathering.
This consumes a lot of memory when there is a large number of nodes in
the inventory.
That way of gathering is not necessary for client nodes, so we can simply
gather local facts for them.
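
A hedged sketch of the delegated gathering being restricted (the `clients` group name is illustrative):

```
# Delegate fact gathering only for non-client nodes; clients just
# gather their own local facts with a plain setup run.
- name: gather and delegate facts
  setup:
  delegate_to: "{{ item }}"
  delegate_facts: true
  with_items: "{{ groups['all'] | difference(groups.get('clients', [])) }}"
  run_once: true
```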

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b73be254d249a23ac2eb2f86c4412ef296352a9)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-mds: delete duplicate tasks which cause multimds container deployments to fail.
Randy J. Martinez [Thu, 29 Mar 2018 00:15:19 +0000 (19:15 -0500)]
ceph-mds: delete duplicate tasks which cause multimds container deployments to fail.

This update resolves the error ['cephfs' is undefined] in multimds container deployments.
See roles/ceph-mon/tasks/create_mds_filesystems.yml: the same last two tasks are present there, and they actually need to happen in that role since "{{ cephfs }}" gets defined in
roles/ceph-mon/defaults/main.yml, not in roles/ceph-mds/defaults/main.yml.

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
(cherry picked from commit ca572a11f1eb7ded5583c8d8b810a42db61cd98f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocleanup osd.conf.j2 in ceph-osd
Ning Yao [Fri, 23 Mar 2018 15:48:16 +0000 (23:48 +0800)]
cleanup osd.conf.j2 in ceph-osd

The osd crush location is set by ceph_crush in the library;
osd.conf.j2 is not used any more.

Signed-off-by: Ning Yao <yaoning@unitedstack.com>
(cherry picked from commit 691ddf534989b4d27dc41997630b3307436835ea)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-osd note that some scenarios use ceph-disk vs. ceph-volume
Alfredo Deza [Wed, 28 Mar 2018 20:40:04 +0000 (16:40 -0400)]
ceph-osd note that some scenarios use ceph-disk vs. ceph-volume

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 3fcf966803e35d7ba30e7c1b0ba78db94c664594)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-defaults: set is_atomic variable
Andrew Schoen [Tue, 20 Mar 2018 19:13:28 +0000 (14:13 -0500)]
ceph-defaults: set is_atomic variable

This variable is needed for containerized clusters and is required by
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample, so if ceph-docker-common is used outside of
that playbook it needs to be set in another way. Moving the creation of
the variable inside this role means playbooks don't need to worry
about setting it.
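
A sketch of one way to derive the fact, assuming Atomic hosts expose /run/ostree-booted (the task names are illustrative):

```
# Atomic hosts carry /run/ostree-booted; use its presence as the fact.
- name: check if the host is an Atomic host
  stat:
    path: /run/ostree-booted
  register: stat_ostree

- name: set fact is_atomic
  set_fact:
    is_atomic: "{{ stat_ostree.stat.exists }}"
```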

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 6cffbd5409353fc1ce05b3a4a6246d6ef244e731)

7 years agoFix config_template to consistently order sections
Andy McCrae [Fri, 16 Mar 2018 15:24:53 +0000 (15:24 +0000)]
Fix config_template to consistently order sections

In ec042219e64a321fa67fce0384af76eeb238c645 we added OrderedDict and
sorted to be able to preserve order for config_template k,v pairs inside
a section.

This patch adds a similar ordering for the sections themselves, which
could still change order and initiate handler restarts.

OrderedDict isn't needed because we use .items() to return a list that
can then be sorted().

(cherry picked from commit fe4ba9d1353abb49775d5541060a55919978f45f)

7 years agocommon: run updatedb task on debian systems only v3.0.28
Sébastien Han [Thu, 1 Mar 2018 16:33:33 +0000 (17:33 +0100)]
common: run updatedb task on debian systems only

The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit cb0f598965d0619dd4f44a8f991af539b67c6f38)

7 years agorgw: add cluster name option to the handler
Sébastien Han [Thu, 1 Mar 2018 15:50:06 +0000 (16:50 +0100)]
rgw: add cluster name option to the handler

If the cluster name is different than 'ceph', the command will fail so
we need to pass the cluster name.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7f19df81964c669f649d9f6eb5104022b421eea3)

7 years agoci: add copy_admin_key test to container scenario
Sébastien Han [Thu, 1 Mar 2018 15:47:37 +0000 (16:47 +0100)]
ci: add copy_admin_key test to container scenario

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fd94840a6ef130c6e142e9b5c5138bb11c621d37)

7 years agorgw: ability to copy ceph admin key on containerized
Sébastien Han [Thu, 1 Mar 2018 15:47:22 +0000 (16:47 +0100)]
rgw: ability to copy ceph admin key on containerized

If we now set copy_admin_key while running a containerized scenario, the
ceph admin key will be copied to the node.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9c85280602142fa1fb60c6f15c6d0c9e8c62d401)

7 years agorgw: run the handler on a mon host
Sébastien Han [Thu, 1 Mar 2018 15:46:01 +0000 (16:46 +0100)]
rgw: run the handler on a mon host

In case the admin key wasn't copied over to the node, this command would
fail. So it's safer to run it from a monitor directly.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 67f46d8ec362b7b8aacb91e009e528b5e62d48ac)

7 years agotests: make CI jobs use 'ansible.cfg'
Guillaume Abrioux [Mon, 26 Feb 2018 13:35:36 +0000 (14:35 +0100)]
tests: make CI jobs use 'ansible.cfg'

The jobs launched by the CI were not using 'ansible.cfg'.
That file contains some parameters that should prevent the SSH failures
we have been seeing in the CI so far.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1e283bf69be8b9efbc1a7a873d91212ad57c7351)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoclient: use `ceph_uid` fact to set uid/gid on admin key v3.0.27
Guillaume Abrioux [Fri, 16 Feb 2018 08:04:23 +0000 (09:04 +0100)]
client: use `ceph_uid` fact to set uid/gid on admin key

That task is failing on containerized deployments because `ceph:ceph`
doesn't exist there.
The idea here is to use `{{ ceph_uid }}` to set the ownership of
the admin keyring when containerized_deployment is set.
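
A hedged sketch of the ownership switch (path and variable usage are illustrative):

```
# Use the numeric uid for containerized deployments, where the
# ceph:ceph user/group does not exist; fall back to 'ceph' otherwise.
- name: set ownership on the admin keyring
  file:
    path: "/etc/ceph/{{ cluster }}.client.admin.keyring"
    owner: "{{ ceph_uid if containerized_deployment else 'ceph' }}"
    group: "{{ ceph_uid if containerized_deployment else 'ceph' }}"
```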

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540578
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d35bc9bde6502ffa81f3c77679cf3f418cd62ca)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomds: fix ansible_service_mgr typo
Grant Slater [Sun, 25 Feb 2018 01:44:07 +0000 (01:44 +0000)]
mds: fix ansible_service_mgr typo

This commit fixes a typo introduced by 4671b9e74e657988137f6723ef12e38c66d9cd40

(cherry picked from commit 1e1b26ca4d6f4ede84756003b9ffad851530e956)

7 years agoMake rule_name optional when defining items in openstack_pools
Giulio Fidente [Thu, 22 Feb 2018 18:57:47 +0000 (19:57 +0100)]
Make rule_name optional when defining items in openstack_pools

Previously it was necessary to provide a value (possibly an
empty string) for the "rule_name" key of each item in
openstack_pools. This change makes it optional, defaulting to an
empty string when not given.
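
A minimal sketch of the pattern, assuming a pool-creation task of this shape (the exact task differs):

```
# `default('')` makes the rule_name key optional per item.
- name: create openstack pool(s)
  command: >
    ceph --cluster {{ cluster }} osd pool create {{ item.name }}
    {{ item.pg_num }} {{ item.rule_name | default('') }}
  with_items: "{{ openstack_pools | unique }}"
```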

(cherry picked from commit a83e1aeea39b9c7ae2757b166f3def7d4f67f161)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotests: change ceph_docker_image_tag for 2nd run
Guillaume Abrioux [Fri, 23 Feb 2018 10:23:13 +0000 (11:23 +0100)]
tests: change ceph_docker_image_tag for 2nd run

The ceph-ansible upstream CI runs several tests, including an
'idempotency/handlers' test. It means the playbook is run a first time
and then a second time with another container image version to ensure the
handlers run properly and the containers are well restarted.
This can cause issues.
For instance, in the specific case which drove me to submit this commit,
I hit the case where the `latest` image ships ceph 12.2.3 while the
`stable-3.0` image (used for the second run) ships ceph 12.2.2.

The goal of this test is not to verify we can upgrade from a specific
version to another but to ensure the handlers are working, so even though
this is a valid failure it should be caught by a test dedicated to that
use case.

For the upstream CI we just need a container image with a different id:
the same content in the container image but a different image id in the
registry, since the test relies on the image id to decide whether the
container should be restarted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a8986459f2ed7e077390162d0df431a3321a478)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: add tripleo scenario testing
Guillaume Abrioux [Fri, 16 Feb 2018 12:53:52 +0000 (13:53 +0100)]
ci: add tripleo scenario testing

This should help to see earlier any failure in a tripleo deployment scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 707458c979f17632d97c205e29524cadc9dec5b3)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRestart services if handler called
Andy McCrae [Wed, 20 Dec 2017 03:49:16 +0000 (13:49 +1000)]
Restart services if handler called

This patch fixes an issue where, if hosts have different service lists,
restarts will be prevented for services whose changes happen later in
the run.

For example, hostA in the mons and rgws groups would initiate a config
change and restart of services on all mons and rgws hosts, even though
a separate hostB (which is only in the rgws group) has not had its
configuration changed yet. Additionally, when the second host has its
configuration changed as part of the ceph-rgw role, it will not initiate
a restart since its inventory name != the first host's.

To fix this we should run the restart once (using run_once: True)
as long as the host has called the handler. This ensures that even
if only 1 host has called the handler, a restart is initiated on all
hosts that have called the handler, as sketched below.

Additionally, we add a var that is set when the handler runs; this
ensures that only hosts that have called the handler get restarted.

Includes a minor fix to remove the unneeded "inventory_hostname in
play_hosts" when: clause. It is no longer required since the handlers
were changed: the host calling the handler will already be in
play_hosts.
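
A hedged sketch of the pattern (handler names, flag name, group name and script path are illustrative):

```
# The handler records that this host needs a restart...
- name: set rgw handler flag
  set_fact:
    rgw_handler_called: true
  listen: restart ceph rgws

# ...and a single run_once task restarts every host that set the flag.
- name: restart ceph rgw daemon(s)
  command: /usr/bin/env bash /tmp/restart_rgw_daemon.sh
  when: hostvars[item]['rgw_handler_called'] | default(False) | bool
  with_items: "{{ groups['rgws'] }}"
  delegate_to: "{{ item }}"
  run_once: true
```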

(cherry picked from commit 59a4335a5639c9be12ee8a23805aaa14882b077e)
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1548357
7 years agoAdjust /etc/updatedb.conf to not parse /var/lib/ceph
Andy McCrae [Mon, 19 Feb 2018 18:13:21 +0000 (18:13 +0000)]
Adjust /etc/updatedb.conf to not parse /var/lib/ceph

Using `updatedb -e` doesn't make a permanent change; it only runs
updatedb once while excluding the passed path.

To make this change permanent we should update the
/etc/updatedb.conf file to add /var/lib/ceph to the prune list.
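
One possible sketch of the edit, assuming a `replace`-module approach (the regexp is illustrative, not the actual task):

```
# Append /var/lib/ceph to PRUNEPATHS only when it is not already there.
- name: exclude /var/lib/ceph from updatedb(8)
  replace:
    path: /etc/updatedb.conf
    regexp: '^(PRUNEPATHS\s*=\s*")(?!.*/var/lib/ceph)(.*)$'
    replace: '\1/var/lib/ceph \2'
```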

(cherry picked from commit 2779d2a850265d01b62b9d8b4db7c2b4ce8b8fec)

7 years agoupdate: look for short and fqdn in ceph_health_raw v3.0.26
Guillaume Abrioux [Fri, 16 Feb 2018 12:45:26 +0000 (13:45 +0100)]
update: look for short and fqdn in ceph_health_raw

Depending on the hostname configuration, the task waiting for mons to be
in quorum might fail.
The idea here is to look for both the shortname and the fqdn in
`ceph_health_raw` instead of just `ansible_hostname`.
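
A hedged sketch of the quorum wait, assuming the task registers `ceph_health_raw` as JSON (`mon_host` and the retry variables are illustrative):

```
# Accept either name form in the quorum list.
- name: waiting for the monitor(s) to form the quorum...
  command: ceph --cluster {{ cluster }} -s --format json
  register: ceph_health_raw
  until: >
    hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or
    hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
  retries: "{{ health_mon_check_retries | default(5) }}"
  delay: "{{ health_mon_check_delay | default(10) }}"
```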

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c04e67347c284c2c127f09b201e8a293c5192e1f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainer: osd remove run_once v3.0.25
Sébastien Han [Wed, 14 Feb 2018 00:44:18 +0000 (01:44 +0100)]
container: osd remove run_once

When used along with delegate_to, run_once does not behave well. Using
`| last` instead always brings the desired result.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c816a9282c8f778f18249827397901e04c040019)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodocker-common: fix container restart on new image
Sébastien Han [Thu, 8 Feb 2018 16:35:05 +0000 (17:35 +0100)]
docker-common: fix container restart on new image

We now look for any existing containers; if there are any, we compare
their running image with the latest pulled container image.
For OSDs, we iterate over the list of running OSDs; this handles the
case where the first OSD of the list has been updated (runs the new
image) but the others have not.
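
A hedged sketch of the comparison (container and variable names are illustrative; `{% raw %}` keeps the Go template out of Jinja's hands):

```
# Image id of each running ceph-osd container.
- name: get the image id of the running ceph-osd containers
  shell: docker inspect --format {% raw %}'{{ .Image }}'{% endraw %} ceph-osd-{{ item }}
  register: osd_running_image_ids
  with_items: "{{ osd_ids.stdout_lines }}"

# Id of the image that was just pulled; a mismatch means "restart".
- name: get the id of the latest pulled image
  command: docker images -q {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
  register: latest_image_id
```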

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d47d02a5eb20067b5ae997ab18aeebe40b27cff0)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefault: remove duplicate code
Sébastien Han [Tue, 13 Feb 2018 08:37:14 +0000 (09:37 +0100)]
default: remove duplicate code

This is already defined in ceph-defaults.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ebc195487c2a2c8764594403b388c3d4624443fe)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotest: add test for containers resources changes
Sébastien Han [Fri, 9 Feb 2018 17:15:25 +0000 (18:15 +0100)]
test: add test for containers resources changes

We change the ceph_mon_docker_memory_limit on the second run; this
should trigger a restart of services.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7d690878df4e34f2003996697e8f623b49282578)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotest: add test for restart on new container image
Sébastien Han [Fri, 9 Feb 2018 17:11:07 +0000 (18:11 +0100)]
test: add test for restart on new container image

Since we have a task to test the handlers, we can use a new container
image to validate that services restart when the image changes.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 79864a8936e8c25ac66bba3cee48d7721453a6af)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoSet application for OpenStack pools
Andy McCrae [Fri, 9 Feb 2018 14:12:35 +0000 (14:12 +0000)]
Set application for OpenStack pools

Since Luminous we need to set the application tag for each pool,
otherwise a CEPH_WARNING is generated when the pools are in use.

We should assign the OpenStack pools to their default, which would be
"rbd". When updating to Luminous this happens automatically for the
vms, images, backups and volumes pools, but for new deploys it does
not.
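
A minimal sketch of the tagging, using the Luminous `osd pool application enable` command (the task shape is illustrative):

```
# Tag each OpenStack pool with the rbd application to silence the warning.
- name: assign rbd application to openstack pool(s)
  command: ceph --cluster {{ cluster }} osd pool application enable {{ item.name }} rbd
  with_items: "{{ openstack_pools | unique }}"
```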

7 years agoinfra: do not include host_vars/* in take-over-existing-cluster.yml
Andrew Schoen [Fri, 9 Feb 2018 20:02:07 +0000 (14:02 -0600)]
infra: do not include host_vars/* in take-over-existing-cluster.yml

These are better collected by ansible automatically. This would also
fail if the host_var file didn't exist.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 7c7017ebe66c70b1f3e06ee71466f30beb4eb2b0)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorolling update: fix undefined jewel_minor_update failure
Andrew Schoen [Mon, 12 Feb 2018 20:52:27 +0000 (14:52 -0600)]
rolling update: fix undefined jewel_minor_update failure

Variables set at the play level with ``vars`` do
not carry over into the next play in the playbook.

The var jewel_minor_update was set in a previous play but
used in this one and was failing because it was not defined.
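
A hedged illustration of the scoping rule (plays and tasks are made up for the example):

```
# `vars` is scoped to its play; the second play must redeclare the
# variable or receive it via -e on the command line.
- hosts: mons
  vars:
    jewel_minor_update: false
  tasks:
    - debug: msg="{{ jewel_minor_update }}"   # defined here

- hosts: osds
  tasks:
    - debug: msg="{{ jewel_minor_update }}"   # undefined without -e
```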

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 699c777e680655be12f53cabed626b28623f8160)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainers: bump memory limit
Sébastien Han [Mon, 8 Jan 2018 15:41:42 +0000 (16:41 +0100)]
containers: bump memory limit

A default value of 4GB is more appropriate for the MDS, and 3GB for the OSD.
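
A sketch of the bumped defaults in group_vars style (variable names follow the ceph-ansible naming convention; treat them as illustrative):

```
# Hedged sketch of the new defaults.
ceph_osd_docker_memory_limit: 3g
ceph_mds_docker_memory_limit: 4g
```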

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1531607
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 97f520bc7488b8e09d4057783049c8975fbc336e)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoinfra: fix take-over-existing-cluster.yml playbook
Caleb Boylan [Fri, 3 Nov 2017 16:54:54 +0000 (09:54 -0700)]
infra: fix take-over-existing-cluster.yml playbook

The ansible inventory could have more than just ceph-ansible hosts, so
we shouldn't use "hosts: all". Also, only grab one file when getting
the ceph cluster name instead of failing when there is more than one
file in /etc/ceph, and fix the location of the ceph.conf template.

(cherry picked from commit 41d10a2f6496c216eaad87112a0794e51204c578)

7 years agoosd: fix osd restart when dmcrypt
Guillaume Abrioux [Thu, 8 Feb 2018 12:27:45 +0000 (13:27 +0100)]
osd: fix osd restart when dmcrypt

This commit fixes a bug that occurs especially for dmcrypt scenarios.

There is an issue where the 'disk_list' container can't reach the ceph
cluster because it's not launched with `--net=host`.

If this container can't reach the cluster, it will hang on this step
(when trying to retrieve the dm-crypt key):

```
+common_functions.sh:448: open_encrypted_part(): ceph --cluster abc12 --name \
client.osd-lockbox.9138767f-7445-49e0-baad-35e19adca8bb --keyring \
/var/lib/ceph/osd-lockbox/9138767f-7445-49e0-baad-35e19adca8bb/keyring \
config-key get dm-crypt/osd/9138767f-7445-49e0-baad-35e19adca8bb/luks
+common_functions.sh:452: open_encrypted_part(): base64 -d
+common_functions.sh:452: open_encrypted_part(): cryptsetup --key-file \
-luksOpen /dev/sdb1 9138767f-7445-49e0-baad-35e19adca8bb
```

It means the `ceph-run-osd.sh` script won't be able to start the
`osd_disk_activate` process in ceph-container because it won't have
filled the `$DOCKER_ENV` environment variable properly.

Adding `--net=host` to the 'disk_list' container fixes this issue.
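
A hedged sketch of the helper invocation with the fix applied (image variables follow the usual ceph-ansible names; the exact task differs):

```
# The disk_list helper must share the host network to reach the mons.
- name: get ceph osd disk list
  shell: >
    docker run --rm --net=host --privileged=true
    -v /dev:/dev -v /etc/ceph:/etc/ceph
    --entrypoint=ceph-disk
    {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
    list
  register: disk_list
```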

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e537779bb3cf73c569ce6c29ab8b20169cc5ffae)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefault: define 'osd_scenario' variable
Sébastien Han [Thu, 8 Feb 2018 13:51:15 +0000 (14:51 +0100)]
default: define 'osd_scenario' variable

osd_scenario does not exist in the ceph-default role, so if we try to
play ceph-default on an OSD node the playbook will fail with an
undefined variable error.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 22f843e3d4e7fa32f8cd74eaf36772445ed20c0d)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoCheck for docker sockets named after both _hostname or _fqdn v3.0.24
Giulio Fidente [Fri, 2 Feb 2018 08:45:07 +0000 (09:45 +0100)]
Check for docker sockets named after both _hostname or _fqdn

While `hostname -f` always returns a hostname including its
domain part and `hostname -s` one without it, the output when
no argument is given may or may not include the domain part
depending on how the system is configured; the socket name might
then not match the instance name.
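
A minimal sketch of checking both name forms (container name pattern is illustrative):

```
# Probe for a mon container named after either hostname form.
- name: check for a mon container named after the shortname or the fqdn
  shell: docker ps -q --filter name=ceph-mon-{{ item }}
  register: ceph_mon_container_stat
  with_items:
    - "{{ ansible_hostname }}"
    - "{{ ansible_fqdn }}"
```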

(cherry picked from commit bdcc52b96dc1f9c99ce490117170f644623d4846)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: Fixed crush_rule_config for containerised deployment.
Greg Charot [Fri, 2 Feb 2018 14:12:18 +0000 (15:12 +0100)]
mon: Fixed crush_rule_config for containerised deployment.

It was called too early: the container was not yet started, so the
commands failed. The section has been moved after the include of
docker/main.yml.

Signed-off-by: Greg Charot <gcharot@redhat.com>
(cherry picked from commit a6d1922a2e70c36036ff130dc6b6b942101379ba)

7 years agoConvert interface names to underscores for facts
Major Hayden [Mon, 11 Dec 2017 15:56:56 +0000 (09:56 -0600)]
Convert interface names to underscores for facts

If a deployer uses an interface name with a dash/hyphen in it, such
as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2
template fails to find the right facts. It looks for
'ansible_br-storage' but only 'ansible_br_storage' exists.

This patch converts the interface name to underscores when the
template does the fact lookup.
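
A hedged sketch of the lookup inside ceph.conf.j2 (`host` and the fact path are illustrative):

```
{# Convert dashes to underscores before building the fact name. #}
{% set interface = 'ansible_' + (monitor_interface | replace('-', '_')) %}
mon addr = {{ hostvars[host][interface]['ipv4']['address'] }}
```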

(cherry picked from commit 5676fa23b169e0ca3af7d4f9b804bbe90d1cccc6)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agopurge-docker: fix ceph-osd-zap container name v3.0.23
Guillaume Abrioux [Fri, 2 Feb 2018 10:55:18 +0000 (11:55 +0100)]
purge-docker: fix ceph-osd-zap container name

the `zap ceph osd disks` task should iterate over `resolved_parent_device`
instead of `combined_devices_list`: the former contains only the base
device name (vs. the full path name in `combined_devices_list`).

this fixes the issue where docker complains about the container name
because of illegal characters such as `/`:
```
"/usr/bin/docker-current: Error response from daemon: Invalid container
name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-]
are allowed.","See '/usr/bin/docker-current run --help'."
""
```

having the basename of the device path is enough for the container
name.
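
A hedged sketch of the zap task after the fix (flags and env names are illustrative):

```
# resolved_parent_device holds base names (e.g. sdb), so the container
# name stays legal and /dev/ is prepended only for the device path.
- name: zap ceph osd disks
  shell: >
    docker run --rm --privileged=true
    --name ceph-osd-zap-{{ ansible_hostname }}-{{ item }}
    -v /dev:/dev
    -e OSD_DEVICE=/dev/{{ item }}
    {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
    zap_device
  with_items: "{{ resolved_parent_device }}"
```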

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3b2f6c34e42eae4033a209d819620211dc68c34b)

7 years agoceph-osd: respect nvme partitions when device is a disk.
Konstantin Shalygin [Tue, 28 Nov 2017 14:27:09 +0000 (21:27 +0700)]
ceph-osd: respect nvme partitions when device is a disk.

(cherry picked from commit d7dadc3e7b9d2e218d85784df72e4cd008ecb1ee)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agosyntax: change local_action syntax v3.0.22
Guillaume Abrioux [Wed, 31 Jan 2018 08:23:28 +0000 (09:23 +0100)]
syntax: change local_action syntax

Use a nicer syntax for `local_action` tasks.
We used to have oneliners like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```

The usual syntax:
```
    local_action:
      module: wait_for
      port: 22
      host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
      state: started
      delay: 10
      timeout: 500
```
is nicer and keeps consistency across the whole playbook.

This also fixes a potential issue with missing quotation:

```
Traceback (most recent call last):
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
    main()
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
    rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
  File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
  File "/usr/lib64/python2.7/shlex.py", line 279, in split
    return list(lex)
  File "/usr/lib64/python2.7/shlex.py", line 269, in next
    token = self.get_token()
  File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
    raise ValueError, "No closing quotation"
ValueError: No closing quotation
```

Writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because Ansible complains about missing quotes; this fix solves that issue.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit deaf273b25601991fc16712cc03820207125554f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocommon: do not use `shell` module when it is not needed
Guillaume Abrioux [Wed, 31 Jan 2018 08:31:11 +0000 (09:31 +0100)]
common: do not use `shell` module when it is not needed

There is no need here to use `shell` instead of `command`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dd0c98c5a2e9e26bca60e00564ea2018984545f6)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoconfig: remove any spaces in public_network or cluster_network
Sébastien Han [Tue, 30 Jan 2018 13:39:58 +0000 (14:39 +0100)]
config: remove any spaces in public_network or cluster_network

With two public networks configured, we found that with
"NETWORK_ADDR_1, NETWORK_ADDR_2" the install process consistently became
broken, trying to find the docker registry on the second network and not
finding the mon container.

Without spaces, "NETWORK_ADDR_1,NETWORK_ADDR_2", the install succeeds;
the containerized install is thus rather picky about the formatting of
this line.
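
A hedged sketch of a template-side guard (filter usage is illustrative, not necessarily the actual fix):

```
public network = {{ public_network | regex_replace('\s+', '') }}
cluster network = {{ cluster_network | regex_replace('\s+', '') }}
```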

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6f9dd26caab18c4e4e98a78bc834f2fa5c255bc7)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agopurge: fix resolve parent device task
Guillaume Abrioux [Tue, 30 Jan 2018 16:27:53 +0000 (17:27 +0100)]
purge: fix resolve parent device task

This is a typo caused by a leftover.
It was previously written like this:
`shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")`
and has been rewritten to:
`shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")`
because we append the '/dev/' later, in the next task.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f372a4232e830856399a25e55c2ce239ac086614)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoDo not search osd ids if ceph-volume
Sébastien Han [Mon, 29 Jan 2018 13:28:23 +0000 (14:28 +0100)]
Do not search osd ids if ceph-volume

Description of problem: The 'get osd id' task goes through all 10 retries (and their respective timeouts) to make sure that the number of OSDs in the osd directory matches the number of devices.

This always happens, regardless of whether the setup and deployment are correct.

Version-Release number of selected component (if applicable): surely the latest, but any ceph-ansible version that contains ceph-volume support is affected.

How reproducible: 100%

Steps to Reproduce:
1. Use ceph-volume (LVM) to deploy OSDs
2. Avoid using anything in the 'devices' section
3. Deploy the cluster

Actual results:
TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] **********************************************************************************************************************************************
task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6
FAILED - RETRYING: get osd id (10 retries left).
FAILED - RETRYING: get osd id (9 retries left).
FAILED - RETRYING: get osd id (8 retries left).
FAILED - RETRYING: get osd id (7 retries left).
FAILED - RETRYING: get osd id (6 retries left).
FAILED - RETRYING: get osd id (5 retries left).
FAILED - RETRYING: get osd id (4 retries left).
FAILED - RETRYING: get osd id (3 retries left).
FAILED - RETRYING: get osd id (2 retries left).
FAILED - RETRYING: get osd id (1 retries left).
ok: [osd0] => {
    "attempts": 10,
    "changed": false,
    "cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'",
    "delta": "0:00:00.002717",
    "end": "2018-01-21 18:10:31.237933",
    "failed": true,
    "failed_when_result": false,
    "rc": 0,
    "start": "2018-01-21 18:10:31.235216"
}

STDOUT:

0
1
2

Expected results:
There aren't any (or just a few) timeouts while the OSDs are found

Additional info:
This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found.

Basically this line:

    until: osd_id.stdout_lines|length == devices|unique|length

Means in this 2 OSD case it is trying to ensure the following incorrect condition:

    until: 2 == 0
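
A hedged sketch of the guard (the real condition may differ; `osd_scenario != 'lvm'` stands in for "not ceph-volume"):

```
# Don't tie the retry loop to `devices` when ceph-volume (lvm) is used,
# since `devices` may legitimately be empty in that scenario.
- name: get osd id
  shell: ls /var/lib/ceph/osd/ | sed 's/.*-//'
  register: osd_id
  until: osd_id.stdout_lines | length == devices | unique | length
  retries: 10
  delay: 10
  when: osd_scenario != 'lvm'
```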

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103
(cherry picked from commit 5132cc3de4780fdfb4fdeab7535c3bc50151aa6b)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoAdd default for radosgw_keystone_ssl
Andy McCrae [Sat, 27 Jan 2018 19:40:09 +0000 (19:40 +0000)]
Add default for radosgw_keystone_ssl

This should default to False: the default for Keystone is not to use PKI
keys and, additionally, anybody using this setting had to have set it
manually before.

Fixes: #2111
(cherry picked from commit 481173f20377b09d781ee6bc2d5b26c9d8637519)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRevert "monitor_interface: document need to use monitor_address when using IPv6"
Guillaume Abrioux [Wed, 24 Jan 2018 13:06:47 +0000 (14:06 +0100)]
Revert "monitor_interface: document need to use monitor_address when using IPv6"

This reverts commit 10b91661ceef7992354032030c7c2673a90d40f4.

It also reverts the same comment added in
1359869497a44df0c3b4157f41453b84326b58e7.

(cherry picked from commit f1232b33fd7a8da53aa2e1ad2b11ee16109633b3)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoconfig: add host-specific ceph_conf_overrides evaluation and generation.
Eduard Egorov [Thu, 9 Nov 2017 11:49:00 +0000 (11:49 +0000)]
config: add host-specific ceph_conf_overrides evaluation and generation.

This allows us to use host-specific variables in the ceph_conf_overrides variable. For example, it fixes the usage of such variables (e.g. 'nss db path' containing {{ ansible_hostname }}) in ceph_conf_overrides for the rados gateway configuration (see profiles/rgw-keystone-v3) - issue #2157.
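
A hedged example of such an override, rendered with each host's own facts (section and option are illustrative):

```
ceph_conf_overrides:
  "client.rgw.{{ ansible_hostname }}":
    nss db path: "/var/lib/ceph/radosgw/ceph-rgw.{{ ansible_hostname }}/nss"
```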

Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
(cherry picked from commit 93e9f3723bb4bcf8004bbcea3213d72d11588899)

7 years agoceph-common: Don't check for ceph_stable_release for distro packages v3.0.21
Markos Chandras [Fri, 13 Oct 2017 09:18:27 +0000 (10:18 +0100)]
ceph-common: Don't check for ceph_stable_release for distro packages

When we consume the distribution packages, we don't have a choice of
which version to install, so we shouldn't require that variable to be
set. Distributions normally provide only one version of Ceph in the
official repositories, so we get whatever they provide.

Signed-off-by: Markos Chandras <mchandras@suse.de>
(cherry picked from commit dd6ee72547a4eca22c8c9b8691b910c2cfa821d3)

7 years agoupgrade: skip luminous tasks for jewel minor update v3.0.20
Guillaume Abrioux [Thu, 25 Jan 2018 15:57:45 +0000 (16:57 +0100)]
upgrade: skip luminous tasks for jewel minor update

These tasks are needed only when upgrading to luminous.
They are not needed in a Jewel minor upgrade and, in any case, they fail
because the `ceph versions` command doesn't exist there.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382
(cherry picked from commit c7ec12d49ca3c3f936f4c7a34ef15c042ab0f699)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agodefaults: avoid getting stuck (ceph --connect-timeout)
Guillaume Abrioux [Wed, 24 Jan 2018 17:49:41 +0000 (18:49 +0100)]
defaults: avoid getting stuck (ceph --connect-timeout)

Sometimes the playbook gets stuck: even with the `--connect-timeout=`
option, the connection to the existing ceph cluster never times out.

As a workaround, wrapping the call with the `timeout` command provided
by coreutils makes it actually time out when we can't connect to the
cluster.
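
A minimal sketch of the wrapper (timeout value and task name are illustrative):

```
# coreutils' timeout guarantees the command returns even if the
# client-side --connect-timeout never fires.
- name: check if a ceph cluster is already running
  command: timeout 5 ceph --connect-timeout 3 --cluster {{ cluster }} fsid
  register: ceph_current_fsid
  failed_when: false
  changed_when: false
```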

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ec16cbdb1af9069de09d4a2e2e88739c2c303350)

7 years agoansible: set ssh retry option to 5
Guillaume Abrioux [Tue, 23 Jan 2018 13:38:35 +0000 (14:38 +0100)]
ansible: set ssh retry option to 5

We noticed that ceph-ansible can sometimes fail with the error:

`Failed to connect to the host via ssh:`

It can occur after the task `restart firewalld` has been played.

Setting `retries` to 5 should prevent unexpected ssh failures.
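
The corresponding ansible.cfg entry would look like this (a sketch of the setting, not the full file):

```
[ssh_connection]
retries = 5
```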

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5bf564255626890973b7cc4e9622763471e561ea)

7 years agoosds: change default value for `dedicated_devices` v3.0.19
Guillaume Abrioux [Mon, 22 Jan 2018 13:28:15 +0000 (14:28 +0100)]
osds: change default value for `dedicated_devices`

This is to keep backward compatibility with stable-2.2 and satisfy the
check "verify dedicated devices have been provided" in
`check_mandatory_vars.yml`. This check is looking for
`dedicated_devices`, so we need to default its value to
`raw_journal_devices` when `raw_multi_journal` is set to `True`.
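
A hedged sketch of the default wiring in roles/ceph-osd/defaults/main.yml (the exact expression may differ):

```
# Fall back to the stable-2.2 variable so the mandatory-vars check passes.
dedicated_devices: "{{ raw_journal_devices | default([]) }}"
```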

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1536098
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9306a1789c95e5abb77260dde4d9cc3df900959f)

7 years agoosd: fix a typo in roles/ceph-osd/defaults/main.yml
Guillaume Abrioux [Tue, 7 Nov 2017 08:48:29 +0000 (09:48 +0100)]
osd: fix a typo in roles/ceph-osd/defaults/main.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 39b584e540570ef79af98c3c23fdce90f02a701c)

7 years agopurge-container: use lsblk to resolve the parent device v3.0.18
Guillaume Abrioux [Wed, 17 Jan 2018 08:08:16 +0000 (09:08 +0100)]
purge-container: use lsblk to resolve the parent device

Using `lsblk` to resolve the parent device is better than just removing
the last character when passing it to the zap container.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 55298fa80cf542c3d9c0275f085b89fb0e6d61f2)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agopurge-container: remove awk usage in favor of blkid
Guillaume Abrioux [Wed, 17 Jan 2018 08:06:43 +0000 (09:06 +0100)]
purge-container: remove awk usage in favor of blkid

Avoid using `awk` to get the different devices from the partlabel.
Using `blkid` is more readable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 58eb045d2fac02337ed47ead1cab9b4cc484a092)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: skip devices marked as '/dev/dead'
Sébastien Han [Tue, 19 Dec 2017 17:54:19 +0000 (18:54 +0100)]
osd: skip devices marked as '/dev/dead'

In a non-collocated scenario, if a drive is faulty we can't really
remove it from the 'devices' list without messing up, or having to
re-arrange, the order of 'dedicated_devices'. We want to keep this
device list ordered. This prevents the activation from failing on a
device that we know is faulty but can't yet remove without messing up
the mapping between dedicated_devices and devices.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6db4aea453b6371345b2a1db96ab449b34870235)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorolling update: add mgr exception for jewel minor updates
Sébastien Han [Wed, 17 Jan 2018 14:18:11 +0000 (15:18 +0100)]
rolling update: add mgr exception for jewel minor updates

When updating from one minor Jewel version to another, the playbook will
fail on the task "fail if no mgr host is present in the inventory".
This can now be worked around by running Ansible with:

-e jewel_minor_update=true

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8af745947695ff7dc543754db802ec57c3238adf)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorgw: disable legacy unit
Sébastien Han [Thu, 18 Jan 2018 09:06:34 +0000 (10:06 +0100)]
rgw: disable legacy unit

Some systems that were deployed with old tools can leave units named
"ceph-radosgw@radosgw.gateway.service". As a consequence, they will
prevent the new unit from starting.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1509584
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f88795e8433f92ddc049d3e0d87e7757448e5005)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agontp: followup cleanup
Sébastien Han [Tue, 16 Jan 2018 17:24:32 +0000 (18:24 +0100)]
ntp: followup cleanup

Checking whether ntp is present is not needed anymore; these tasks are
no longer used, so let's remove them.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocommon/docker-common: always start ntp
Sébastien Han [Tue, 16 Jan 2018 16:43:54 +0000 (17:43 +0100)]
common/docker-common: always start ntp

There is no need to start ntp only if the package was present. If
the package is not present, we install it AND then enable + run
the service.

The original fix is part of this commit:
https://github.com/ceph/ceph-ansible/commit/849786967ac4c6235e624243019f0b54bf3340a4
However, this is a feature addition so it cannot be backported. Hence
this commit.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: test on ansible 2.4.2 v3.0.17
Sébastien Han [Thu, 21 Dec 2017 18:57:01 +0000 (19:57 +0100)]
ci: test on ansible 2.4.2

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7ba25b20dcb199f81666b34cae6c1b95c30b1033)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRevert "tests: set CEPH_STABLE_RELEASE in ceph-build"
Guillaume Abrioux [Wed, 6 Dec 2017 14:18:42 +0000 (15:18 +0100)]
Revert "tests: set CEPH_STABLE_RELEASE in ceph-build"

This reverts commit 7a1d7d92ff4d6f38be9f11f4c26909b361b58f99.

(cherry picked from commit 73a20e9b50f9212f4e610ae021b23c8e010e9991)

7 years agohandlers: avoid duplicate handler
Guillaume Abrioux [Mon, 8 Jan 2018 09:00:25 +0000 (10:00 +0100)]
handlers: avoid duplicate handler

Having handlers in both the ceph-defaults and ceph-docker-common roles
can make the playbook restart services twice. Handlers can be triggered
a first time because of a change in ceph.conf and a second time because
a new image has been pulled.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b29a42cba6a4059b2c0035572d570c0812f48d16)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainer: trigger handlers on systemd file change
Guillaume Abrioux [Mon, 8 Jan 2018 14:00:32 +0000 (15:00 +0100)]
container: trigger handlers on systemd file change

When a systemd unit file is changed we should trigger handlers to
restart the services.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 70401f955b4ff9d6d922c113b833dbd8b8ce27a8)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainer: restart container when there is a new image
Sébastien Han [Fri, 15 Dec 2017 18:43:23 +0000 (19:43 +0100)]
container: restart container when there is a new image

There wasn't any good choice to implement this.
We had several options and none of them were ideal, since handlers
cannot be triggered cross-roles.
We could have achieved that by doing:

* option 1 was to add a dependency in the meta of the ceph-docker-common
role. We had that long ago and we decided to stop, so everything is
managed via site.yml

* option 2 was to import files from another role. This is messy and we
don't do that anywhere in the current code base; we will keep it that way.

There is option 3, where we pull the image from the ceph-config role.
This is not suitable either, since the docker command won't be available
unless you run an Atomic distro. It would also mean pulling twice: a
first time in ceph-config, a second time in ceph-docker-common.

The only option I came up with was to duplicate a bit of the ceph-config
handlers code.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8a19a83354cd8a4f9a729b3864850ec69be6d5da)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoDocker image pull retry
Joe Talerico [Tue, 17 Oct 2017 19:09:03 +0000 (15:09 -0400)]
Docker image pull retry

This change sets a default timeout of 300s for the image pull. If the
image pull times out (300s), we will retry 3 times by default.

fixes 1954
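
A hedged sketch of the bounded pull (variable names and defaults are illustrative):

```
# Bound each pull attempt with timeout and retry a few times.
- name: pull the ceph container image
  command: >
    timeout {{ docker_pull_timeout | default('300s') }}
    docker pull {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
  register: docker_image_pull
  retries: "{{ docker_pull_retry | default(3) }}"
  delay: 10
  until: docker_image_pull.rc == 0
```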

(cherry picked from commit ab587642885f1f518fe14ee7f1c7fc8cbbbf29f0)

7 years agodefaults: rename check_socket files for containers
Guillaume Abrioux [Wed, 10 Jan 2018 08:08:01 +0000 (09:08 +0100)]
defaults: rename check_socket files for containers

In a containerized deployment, we are not looking for a socket but for a
running container.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit acfbebe67e06d64a72a855b0c4d5fd2ee8bce03a)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainers: fix bug when looking for existing cluster
Guillaume Abrioux [Wed, 10 Jan 2018 09:18:27 +0000 (10:18 +0100)]
containers: fix bug when looking for existing cluster

In a containerized deployment, `docker_exec_cmd` is not set before the
task that tries to retrieve the current fsid is played; this means the
playbook considers there is no existing fsid and tries to generate a new
one.

Typical error:

```
ok: [mon0 -> mon0] => {
    "changed": false,
    "cmd": [
        "ceph",
        "--connect-timeout",
        "3",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:00:00.179909",
    "end": "2018-01-09 10:36:58.759846",
    "failed": false,
    "failed_when_result": false,
    "rc": 1,
    "start": "2018-01-09 10:36:58.579937"
}

STDERR:

Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',)
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 900f447c82c722539c6eed74c98bf1988a001b3d)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainer: change the way we force no logs inside the container
Sébastien Han [Tue, 9 Jan 2018 13:34:09 +0000 (14:34 +0100)]
container: change the way we force no logs inside the container

Previously we were using ceph_conf_overrides; however, this doesn't play
nice with software like TripleO that uses ceph_conf_overrides inside its
own code. For now, and since this is the only occurrence, we can
ensure there are no logs through the ceph conf template.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1532619
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c2e04623a54007674ec60647a9e5ddd2da4f991b)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: use crush rules for non-container too
Sébastien Han [Tue, 9 Jan 2018 12:54:50 +0000 (13:54 +0100)]
mon: use crush rules for non-container too

There is no reason why we can't use crush rules when deploying
containers, so the include is moved into main.yml so it can be called.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f0787e64da45fdbefb2ff1376a0705fadf6a502d)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotest: set UPDATE_CEPH_DOCKER_IMAGE_TAG for jewel tests
Andrew Schoen [Fri, 5 Jan 2018 19:47:10 +0000 (13:47 -0600)]
test: set UPDATE_CEPH_DOCKER_IMAGE_TAG for jewel tests

We want to be explicit here and update to luminous, not
the 'latest' tag.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit a8509fbc9c0328670224f608abea17d8e64257ab)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoswitch-to-containers: do not fail when stopping the nfs-ganesha service
Andrew Schoen [Fri, 5 Jan 2018 18:42:16 +0000 (12:42 -0600)]
switch-to-containers: do not fail when stopping the nfs-ganesha service

If we're working with a jewel cluster then this service will not exist.

This is mainly a problem in CI testing because our tests are set up to
work with both jewel and luminous, meaning that even though we want to
test jewel we still have an nfs-ganesha host in the test, causing these
tasks to run.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit b613321c210155f390d4ddb7dcda8dc685a6e9ea)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoswitch-to-containers: do not fail when stopping the ceph-mgr daemon
Andrew Schoen [Fri, 5 Jan 2018 18:37:36 +0000 (12:37 -0600)]
switch-to-containers: do not fail when stopping the ceph-mgr daemon

If we are working with a jewel cluster, ceph-mgr does not exist
and this makes the playbook fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 0b4b60e3c9cabbbda2883feb40a6f80763c66b50)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorolling_update: do not fail the playbook if nfs-ganesha is not present
Andrew Schoen [Fri, 5 Jan 2018 16:06:53 +0000 (10:06 -0600)]
rolling_update: do not fail the playbook if nfs-ganesha is not present

The rolling update playbook was attempting to stop the
nfs-ganesha service on nodes where jewel is still installed.
The nfs-ganesha service did not exist in jewel, so the task failed.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 997edea271b713b29f896ebb87dc6df29a60488b)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agopurge-cluster: clean some code v3.0.16
Guillaume Abrioux [Wed, 13 Dec 2017 14:23:47 +0000 (15:23 +0100)]
purge-cluster: clean some code

Avoid using regexp to match device

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c5b7b37105e0933f2f2c69441854e889fe932399)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: fix check gpt
Guillaume Abrioux [Tue, 19 Dec 2017 09:55:02 +0000 (10:55 +0100)]
osd: fix check gpt

The gpt label creation doesn't work even with the parted module.
This commit fixes it by using the parted command
instead.
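
A minimal sketch of the command-based approach (task name and device list are illustrative):

```
# Use the parted CLI directly to create the label.
- name: create gpt disk label on osd devices
  command: parted --script {{ item }} mklabel gpt
  with_items: "{{ devices }}"
```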

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 895949d6c463c227da3dd7250c2ae228ee269872)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agopurge-cluster: wipe disk using dd
Guillaume Abrioux [Wed, 13 Dec 2017 14:24:33 +0000 (15:24 +0100)]
purge-cluster: wipe disk using dd

The `bluestore_purge_osd_non_container` scenario is failing because old
osd_uuid information is kept on the devices, causing `ceph-disk activate`
to fail when trying to redeploy a new cluster after a purge.

Typical error seen:

```
2017-12-13 14:29:48.021288 7f6620651d00 -1
bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label
bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid
770080e2-20db-450f-bc17-81b55f167982 does not match our fsid
f33efff0-2f07-4203-ad8d-8a0844d6bda0
```
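
A hedged sketch of the wipe (block size and count are illustrative):

```
# Zero the first megabytes so no stale bluestore label (fsid/osd_uuid)
# survives the purge.
- name: wipe the beginning of each osd device
  command: dd if=/dev/zero of={{ item }} bs=1M count=200 oflag=direct
  with_items: "{{ devices }}"
```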

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eeedefdf0207f04e67af490e03d895324ab609a1)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: always run ceph-create-keys
Sébastien Han [Wed, 20 Dec 2017 14:29:02 +0000 (15:29 +0100)]
mon: always run ceph-create-keys

ceph-create-keys is idempotent, so it's not an issue to run it each time
we play ansible. This also fixes issues where the 'creates' arg skips the
task and no keys get generated on newer versions, e.g. during an upgrade.
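
A minimal sketch of the task after the change (`monitor_name` is illustrative):

```
# Always run it; idempotency makes the 'creates' guard unnecessary
# (and harmful after upgrades).
- name: run ceph-create-keys on the monitor
  command: ceph-create-keys --cluster {{ cluster }} -i {{ monitor_name }}
  changed_when: false
```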

Closes: https://github.com/ceph/ceph-ansible/issues/2228
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 0b55abe3d0fc6db6c93d963545781c05a31503bb)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorgw: disable legacy rgw service unit
Sébastien Han [Thu, 21 Dec 2017 09:19:22 +0000 (10:19 +0100)]
rgw: disable legacy rgw service unit

When upgrading from OSP11 to OSP12 containers, ceph-ansible attempts to
disable the RGW service provided by the overcloud image. The task
attempts to stop/disable ceph-rgw@{{ ansible-hostname }} and
ceph-radosgw@{{ ansible-hostname }}.service. The actual service name is
ceph-radosgw@radosgw.$name.
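
A hedged sketch of stopping/disabling the legacy unit names (the unit list is illustrative):

```
# Clean up any legacy unit names that may exist on old deployments.
- name: disable legacy radosgw service units
  systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
  with_items:
    - "ceph-rgw@{{ ansible_hostname }}"
    - "ceph-radosgw@{{ ansible_hostname }}.service"
  failed_when: false
```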

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525209
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ad54e19262f3d523ad57ee39e64d6927b0c21dea)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agofix jewel scenarios on container
Sébastien Han [Wed, 20 Dec 2017 12:39:33 +0000 (13:39 +0100)]
fix jewel scenarios on container

When deploying Jewel from master we still need to enable this code since
the container image has such a check. The check still exists because
ceph-disk is not able to create a GPT label on a drive that does not
have one.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 39f2bfd5d58bae3fef2dd4fca0b2bab2e67ba21f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agosite-docker: ability to disable fact sharing
Sébastien Han [Tue, 19 Dec 2017 14:10:05 +0000 (15:10 +0100)]
site-docker: ability to disable fact sharing

When deploying with Ansible at large scale, the delegate_facts method
consumes a lot of memory on the host that is running Ansible. This can
cause various issues like memory exhaustion on that machine.
You can now run Ansible with "-e delegate_facts_host=False" to disable
the fact sharing.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c315f81dfe440945aaa90265cd3294fdea549942)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorolling_update: do not require root to answer question
Sébastien Han [Fri, 15 Dec 2017 16:39:32 +0000 (17:39 +0100)]
rolling_update: do not require root to answer question

There is no need to ask for root on the local action: it prompts for a
password if the current user is not in sudoers, which is unnecessary
anyway.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 200785832f3b56dd8c5766ec0b503c5d77b4a984)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: best effort if no device is found during activation
Sébastien Han [Mon, 18 Dec 2017 15:43:37 +0000 (16:43 +0100)]
osd: best effort if no device is found during activation

We have a scenario where we switch from non-containerized to containerized
deployments. This means we don't know anything about the ceph partitions
associated with an OSD. Normally, in a containerized context, we have files
containing the preparation sequence, and from these files we can get the
capabilities of each OSD. As a last resort we use a ceph-disk call inside
a dummy bash container to discover the ceph journal on the current OSD.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525612
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bbc79765f3e8b93b707b0f25f94e975c1bd85c66)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agonfs: fix package install for debian/suse systems
Sébastien Han [Tue, 19 Dec 2017 10:17:04 +0000 (11:17 +0100)]
nfs: fix package install for debian/suse systems

This resolves the following error:
E: There were unauthenticated packages and -y was used without
--allow-unauthenticated

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit dfbef8361d3ac03788aa1f93b23907bc9595a730)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRename fact docker_version to ceph_docker_version
Christian Berendt [Tue, 12 Dec 2017 10:06:15 +0000 (11:06 +0100)]
Rename fact docker_version to ceph_docker_version

The name docker_version is very generic and is also used by other
roles. As a result, there may be name conflicts. To avoid this a
ceph_ prefix should be used for this fact. Since it is an internal
fact, renaming is not a problem.

(cherry picked from commit 50a848dc408a35c02b934bfe1511cd8aaee259be)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefaults: fix CI issue with ceph_uid fact
Guillaume Abrioux [Mon, 11 Dec 2017 17:48:13 +0000 (18:48 +0100)]
defaults: fix CI issue with ceph_uid fact

The CI complains because of the `ceph_uid` fact, which doesn't exist since
the docker image tag used in the CI doesn't match this condition.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6a9b5c9632a39d290ebf707a21e98f17b064f198)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocommon: move restapi template to config
Sébastien Han [Fri, 20 Oct 2017 09:14:13 +0000 (11:14 +0200)]
common: move restapi template to config

Closes: github.com/ceph/ceph-ansible/issues/1981
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ba5c6e66f03314d1b7263225e75f0f56c438db3b)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoroles: ceph-mgr: Install the ceph-mgr package on SUSE
Markos Chandras [Thu, 14 Dec 2017 18:13:09 +0000 (18:13 +0000)]
roles: ceph-mgr: Install the ceph-mgr package on SUSE

The ceph-mgr package name is identical to the RedHat one, so the SUSE
family is added to the existing task.

(cherry picked from commit 162b7d2b23b72adabdae32275962409e19ba4e0b)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoclient: don't make `osd_pool_default_pg_num` mandatory
Guillaume Abrioux [Tue, 12 Dec 2017 10:28:36 +0000 (11:28 +0100)]
client: don't make `osd_pool_default_pg_num` mandatory

Making `osd_pool_default_pg_num` mandatory is a bit aggressive and is
irrelevant when you just want to create user keyrings.

Closes: #2241
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a24fd1cfd9a2f5a5daa9bee1f533cd2da0cc8fe2)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoclient: don't try to generate keys
Guillaume Abrioux [Tue, 12 Dec 2017 10:25:26 +0000 (11:25 +0100)]
client: don't try to generate keys

The entrypoint to generate user keyrings is `ceph-authtool`; therefore it
can expand the `$(ceph-authtool --gen-print-key)` inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyrings are properly filled when
`user_config: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab1dd3027a4b9932e58f28b86ab46979eb1f1682)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodocker: add missing condition for selinux tasks
Guillaume Abrioux [Tue, 12 Dec 2017 13:55:02 +0000 (14:55 +0100)]
docker: add missing condition for selinux tasks

On the `client` and `mds` roles, the playbook tries to set selinux even
on non-RHEL-based distributions.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 26afe46e1333df8bec554feb3f57ab8c60390655)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefault: look for the right return code on socket stat in-use
Sébastien Han [Thu, 14 Dec 2017 10:31:28 +0000 (11:31 +0100)]
default: look for the right return code on socket stat in-use

As reported in https://github.com/ceph/ceph-ansible/issues/2254, the
check with fuser is not ideal: if fuser is not available the return code
is 127. Here we want to make sure that we are looking for the correct
return code, i.e. 1.
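
A hedged sketch of the distinction (variable names are illustrative):

```
# rc 0: socket in use; rc 1: not in use; rc 127: fuser itself missing.
- name: check if the ceph mon socket is in use
  command: fuser --silent {{ mon_socket_stat.stat.path }}
  register: mon_socket
  failed_when: false

- name: remove the stale socket
  file:
    path: "{{ mon_socket_stat.stat.path }}"
    state: absent
  when: mon_socket.rc == 1
```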

Closes: https://github.com/ceph/ceph-ansible/issues/2254
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7eaf444328c8c381c673883913cf71b8ebe9d064)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoMerge pull request #2243 from ceph/2226-bkp
Guillaume Abrioux [Mon, 11 Dec 2017 09:45:52 +0000 (10:45 +0100)]
Merge pull request #2243 from ceph/2226-bkp

[skip ci] backport of 2226

7 years agoMerge pull request #2237 from ceph/2211-bkp
Guillaume Abrioux [Mon, 11 Dec 2017 09:45:33 +0000 (10:45 +0100)]
Merge pull request #2237 from ceph/2211-bkp

Set tighter permissions on keyrings when containerized

7 years agoMerge pull request #2231 from ceph/doc_update-bkp
Guillaume Abrioux [Mon, 11 Dec 2017 09:45:12 +0000 (10:45 +0100)]
Merge pull request #2231 from ceph/doc_update-bkp

fix the ansible version for the stable-3.0 branch

7 years agoMerge pull request #2222 from ceph/2221-bkp
Guillaume Abrioux [Mon, 11 Dec 2017 09:44:43 +0000 (10:44 +0100)]
Merge pull request #2222 from ceph/2221-bkp

[skip ci] backport of 2221

7 years agoMerge pull request #2220 from squidboylan/bkp-2215
Guillaume Abrioux [Mon, 11 Dec 2017 09:44:09 +0000 (10:44 +0100)]
Merge pull request #2220 from squidboylan/bkp-2215

Backport of 2215

7 years agoMerge pull request #2218 from ceph/2202-bkp
Guillaume Abrioux [Mon, 11 Dec 2017 09:43:44 +0000 (10:43 +0100)]
Merge pull request #2218 from ceph/2202-bkp

[skip ci] backport of 2202

7 years agoMerge pull request #2217 from ceph/2214-bkp
Guillaume Abrioux [Mon, 11 Dec 2017 09:43:23 +0000 (10:43 +0100)]
Merge pull request #2217 from ceph/2214-bkp

[skip ci] backport of 2214