Sébastien Han [Thu, 8 Feb 2018 16:35:05 +0000 (17:35 +0100)]
docker-common: fix container restart on new image
We now look for any existing containers; if any are found, we compare
their running image with the latest pulled container image.
For OSDs, we iterate over the list of running OSDs; this handles the
case where the first OSD in the list has been updated (runs the new
image) but the others have not.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d47d02a5eb20067b5ae997ab18aeebe40b27cff0) Signed-off-by: Sébastien Han <seb@redhat.com>
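For illustration, a minimal sketch of the kind of image comparison described above, shown for a mon container. The task layout, the systemd unit name and the `ceph_docker_*` variables are placeholders following common ceph-ansible naming, not the exact code of the role:
```
- name: get the image id used by the running ceph-mon container
  command: docker inspect --format '{% raw %}{{ .Image }}{% endraw %}' ceph-mon-{{ ansible_hostname }}
  register: running_image_id
  changed_when: false

- name: get the id of the most recently pulled ceph image
  command: docker inspect --format '{% raw %}{{ .Id }}{% endraw %}' {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
  register: pulled_image_id
  changed_when: false

- name: restart the container when the image has changed
  systemd:
    name: ceph-mon@{{ ansible_hostname }}
    state: restarted
  when: running_image_id.stdout != pulled_image_id.stdout
```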
Sébastien Han [Tue, 13 Feb 2018 08:37:14 +0000 (09:37 +0100)]
default: remove duplicate code
This is already defined in ceph-defaults.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ebc195487c2a2c8764594403b388c3d4624443fe) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 9 Feb 2018 17:15:25 +0000 (18:15 +0100)]
test: add test for containers resources changes
We change `ceph_mon_docker_memory_limit` on the second run; this should
trigger a restart of the services.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7d690878df4e34f2003996697e8f623b49282578) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 9 Feb 2018 17:11:07 +0000 (18:11 +0100)]
test: add test for restart on new container image
Since we have a task to test the handlers, we can use a new container
image to validate that services restart on a new container image.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 79864a8936e8c25ac66bba3cee48d7721453a6af) Signed-off-by: Sébastien Han <seb@redhat.com>
Andy McCrae [Fri, 9 Feb 2018 14:12:35 +0000 (14:12 +0000)]
Set application for OpenStack pools
Since Luminous we need to set the application tag for each pool,
otherwise a CEPH_WARNING is generated when the pools are in use.
We should assign the OpenStack pools to their default, which would be
"rbd". When updating to Luminous this happens automatically for the
vms, images, backups and volumes pools, but for new deploys this is not
the case.
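A hedged sketch of what that assignment can look like as a task; the pool names and the `docker_exec_cmd` prefix are illustrative:
```
- name: assign the rbd application to the openstack pools
  command: "{{ docker_exec_cmd | default('') }} ceph osd pool application enable {{ item }} rbd"
  with_items:
    - images
    - volumes
    - vms
    - backups
  changed_when: false
```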
Andrew Schoen [Fri, 9 Feb 2018 20:02:07 +0000 (14:02 -0600)]
infra: do not include host_vars/* in take-over-existing-cluster.yml
These are better collected by Ansible automatically. This would also
fail if the host_vars file didn't exist.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 7c7017ebe66c70b1f3e06ee71466f30beb4eb2b0) Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 699c777e680655be12f53cabed626b28623f8160) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 8 Jan 2018 15:41:42 +0000 (16:41 +0100)]
containers: bump memory limit
A default value of 4GB for the MDS and 3GB for the OSD is more appropriate.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1531607 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 97f520bc7488b8e09d4057783049c8975fbc336e) Signed-off-by: Sébastien Han <seb@redhat.com>
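In group_vars this amounts to something like the following; the variable names follow the usual `*_docker_memory_limit` pattern and are an assumption here:
```
ceph_mds_docker_memory_limit: 4g
ceph_osd_docker_memory_limit: 3g
```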
The Ansible inventory could have more than just ceph-ansible hosts, so
we shouldn't use "hosts: all". Also, only grab one file when getting
the ceph cluster name instead of failing when there is more than one
file in /etc/ceph, and fix the location of the ceph.conf template.
It means the `ceph-run-osd.sh` script won't be able to start the
`osd_disk_activate` process in ceph-container because it won't have
filled the `$DOCKER_ENV` environment variable properly.
Adding `--net=host` to the 'disk_list' container fixes this issue.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e537779bb3cf73c569ce6c29ab8b20169cc5ffae) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 8 Feb 2018 13:51:15 +0000 (14:51 +0100)]
default: define 'osd_scenario' variable
osd_scenario does not exist in the ceph-default role, so if we try to
play ceph-default on an OSD node, the playbook will fail with an
undefined variable.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 22f843e3d4e7fa32f8cd74eaf36772445ed20c0d) Signed-off-by: Sébastien Han <seb@redhat.com>
Giulio Fidente [Fri, 2 Feb 2018 08:45:07 +0000 (09:45 +0100)]
Check for docker sockets named after both _hostname or _fqdn
While `hostname -f` will always return a hostname including its
domain part and `hostname -s` one without the domain part, the behavior
when no argument is given can include or not include the domain part
depending on how the system is configured; the socket name might then
not match the instance name.
Major Hayden [Mon, 11 Dec 2017 15:56:56 +0000 (09:56 -0600)]
Convert interface names to underscores for facts
If a deployer uses an interface name with a dash/hyphen in it, such
as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2
template fails to find the right facts. It looks for
'ansible_br-storage' but only 'ansible_br_storage' exists.
This patch converts the interface name to underscores when the
template does the fact lookup.
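A minimal sketch of the underscore conversion at fact-lookup time; the debug task and the br-storage value are only for illustration:
```
- debug:
    msg: "{{ hostvars[inventory_hostname]['ansible_' + (monitor_interface | replace('-', '_'))]['ipv4']['address'] }}"
  vars:
    monitor_interface: br-storage
```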
the `zap ceph osd disks` task should iterate over `resolved_parent_device`,
which contains only the base device name, instead of
`combined_devices_list`, which contains the full path name.
this fixes the issue where docker complains about the container name
because of illegal characters such as `/`:
```
"/usr/bin/docker-current: Error response from daemon: Invalid container
name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-]
are allowed.","See '/usr/bin/docker-current run --help'."
""
```
having the basename of the device path is enough for the container
name.
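A rough sketch of a zap task that keeps the container name legal by using only the device basename; the image variables and the `zap_device` entrypoint argument are assumptions here:
```
- name: zap ceph osd disks
  command: >
    docker run --rm --privileged=true -v /dev/:/dev/
    --name ceph-osd-zap-{{ ansible_hostname }}-{{ item | basename }}
    -e OSD_DEVICE=/dev/{{ item | basename }}
    {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
    zap_device
  with_items: "{{ resolved_parent_device }}"
```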
Use a nicer syntax for `local_action` tasks.
We used to have oneliners like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```
The usual syntax:
```
local_action:
  module: wait_for
  port: 22
  host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
  state: started
  delay: 10
  timeout: 500
```
is nicer and helps keep consistency across the whole
playbook.
This also fixes a potential issue with missing quotation:
```
Traceback (most recent call last):
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
    main()
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
    rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
  File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
  File "/usr/lib64/python2.7/shlex.py", line 279, in split
    return list(lex)
  File "/usr/lib64/python2.7/shlex.py", line 269, in next
    token = self.get_token()
  File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
    raise ValueError, "No closing quotation"
ValueError: No closing quotation
```
writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because Ansible complains about missing quotes; this fix solves that issue.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit deaf273b25601991fc16712cc03820207125554f) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 30 Jan 2018 13:39:58 +0000 (14:39 +0100)]
config: remove any spaces in public_network or cluster_network
With two public networks configured with a space, as in
"NETWORK_ADDR_1, NETWORK_ADDR_2", we found that the install process
consistently broke: it tried to find the docker registry on the second
network and did not find the mon container.
Without the space, "NETWORK_ADDR_1,NETWORK_ADDR_2", the install
succeeds; the containerized install is simply more particular about the
formatting of this line.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6f9dd26caab18c4e4e98a78bc834f2fa5c255bc7) Signed-off-by: Sébastien Han <seb@redhat.com>
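Stripping whitespace before the value is used is the kind of normalization described; a sketch, assuming the networks are plain comma-separated strings:
```
- name: strip whitespace from the configured networks
  set_fact:
    public_network: "{{ public_network | regex_replace('\\s+', '') }}"
    cluster_network: "{{ cluster_network | regex_replace('\\s+', '') }}"
```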
This is a typo caused by a leftover.
It was previously written like this:
`shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")`
and has been rewritten to:
`shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")`
because we append the '/dev/' later, in the next task.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f372a4232e830856399a25e55c2ce239ac086614) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 29 Jan 2018 13:28:23 +0000 (14:28 +0100)]
Do not search osd ids if ceph-volume
Description of problem: The 'get osd id' task goes through all 10 retries (and their respective timeouts) to make sure that the number of OSDs in the osd directory matches the number of devices.
This always happens, regardless of whether the setup and deployment are correct.
Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected.
How reproducible: 100%
Steps to Reproduce:
1. Use ceph-volume (LVM) to deploy OSDs
2. Avoid using anything in the 'devices' section
3. Deploy the cluster
Actual results:
TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] **********************************************************************************************************************************************
task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6
FAILED - RETRYING: get osd id (10 retries left).
FAILED - RETRYING: get osd id (9 retries left).
FAILED - RETRYING: get osd id (8 retries left).
FAILED - RETRYING: get osd id (7 retries left).
FAILED - RETRYING: get osd id (6 retries left).
FAILED - RETRYING: get osd id (5 retries left).
FAILED - RETRYING: get osd id (4 retries left).
FAILED - RETRYING: get osd id (3 retries left).
FAILED - RETRYING: get osd id (2 retries left).
FAILED - RETRYING: get osd id (1 retries left).
ok: [osd0] => {
"attempts": 10,
"changed": false,
"cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'",
"delta": "0:00:00.002717",
"end": "2018-01-21 18:10:31.237933",
"failed": true,
"failed_when_result": false,
"rc": 0,
"start": "2018-01-21 18:10:31.235216"
}
STDOUT:
0
1
2
Expected results:
There aren't any (or just a few) timeouts while the OSDs are found
Additional info:
This is happening because the check compares the number of "devices" defined for ceph-disk (in this case it would be 0) against the number of OSDs found.
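A sketch of the kind of guard described, reusing the quoted task; the `osd_scenario` condition and the retry bound are assumptions, and the exact logic in the role may differ:
```
- name: get osd id
  shell: ls /var/lib/ceph/osd/ | sed 's/.*-//'
  register: osd_id
  until: osd_id.stdout_lines | length == devices | default([]) | length
  retries: 10
  delay: 10
  changed_when: false
  when: osd_scenario != 'lvm'
```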
Andy McCrae [Sat, 27 Jan 2018 19:40:09 +0000 (19:40 +0000)]
Add default for radosgw_keystone_ssl
This should default to False. The default for Keystone is not to use PKI
keys; additionally, anybody using this setting had to have been
setting it manually before.
Eduard Egorov [Thu, 9 Nov 2017 11:49:00 +0000 (11:49 +0000)]
config: add host-specific ceph_conf_overrides evaluation and generation.
This allows us to use host-specific variables in the ceph_conf_overrides variable. For example, this fixes the usage of such variables (e.g. 'nss db path' containing {{ ansible_hostname }}) in ceph_conf_overrides for the rados gateway configuration (see profiles/rgw-keystone-v3) - issue #2157.
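For example (a hedged sketch, the section name and option values are illustrative), host-specific facts can now be used inside the overrides:
```
ceph_conf_overrides:
  "client.rgw.{{ ansible_hostname }}":
    rgw keystone api version: 3
    nss db path: "/var/lib/ceph/radosgw/ceph-rgw.{{ ansible_hostname }}/nss"
```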
Markos Chandras [Fri, 13 Oct 2017 09:18:27 +0000 (10:18 +0100)]
ceph-common: Don't check for ceph_stable_release for distro packages
When we consume the distribution packages, we don't have a choice of
which version to install, so we shouldn't require that variable to be
set. Distributions normally provide only one version of Ceph in the
official repositories, so we get whatever they provide.
upgrade: skip luminous tasks for jewel minor update
These tasks are needed only when upgrading to luminous.
They are not needed for a Jewel minor update and, in any case, they fail
because the `ceph versions` command doesn't exist.
osds: change default value for `dedicated_devices`
This is to keep backward compatibility with stable-2.2 and satisfy the
check "verify dedicated devices have been provided" in
`check_mandatory_vars.yml`. This check is looking for
`dedicated_devices`, so we need to default its value to
`raw_journal_devices` when `raw_multi_journal` is set to `True`.
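A sketch of the described default; the exact expression used in the role may differ:
```
dedicated_devices: "{{ raw_journal_devices if raw_multi_journal | default(false) | bool else [] }}"
```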
Sébastien Han [Tue, 19 Dec 2017 17:54:19 +0000 (18:54 +0100)]
osd: skip devices marked as '/dev/dead'
In a non-collocated scenario, if a drive is faulty we can't really
remove it from the list of 'devices' without messing up or having to
re-arrange the order of the 'dedicated_devices'. We want to keep this
device list ordered. This will prevent the activation from failing on a
device that we know is failing but that we can't yet remove without
messing up the mapping between 'dedicated_devices' and 'devices'.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6db4aea453b6371345b2a1db96ab449b34870235) Signed-off-by: Sébastien Han <seb@redhat.com>
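In practice this means keeping a placeholder entry in the inventory so the positional mapping stays intact; a hedged example with made-up device names:
```
devices:
  - /dev/sdb
  - /dev/dead      # faulty drive kept in place so the mapping below stays aligned
  - /dev/sdd
dedicated_devices:
  - /dev/nvme0n1
  - /dev/nvme0n1
  - /dev/nvme0n1
```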
Sébastien Han [Wed, 17 Jan 2018 14:18:11 +0000 (15:18 +0100)]
rolling update: add mgr exception for jewel minor updates
When updating from one minor Jewel version to another, the playbook will
fail on the task "fail if no mgr host is present in the inventory".
This can now be worked around by running Ansible with
-e jewel_minor_update=true
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8af745947695ff7dc543754db802ec57c3238adf) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 18 Jan 2018 09:06:34 +0000 (10:06 +0100)]
rgw: disable legacy unit
Some systems that were deployed with old tools can leave units named
"ceph-radosgw@radosgw.gateway.service". As a consequence, they will
prevent the new unit from starting.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1509584 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f88795e8433f92ddc049d3e0d87e7757448e5005) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 16 Jan 2018 16:43:54 +0000 (17:43 +0100)]
common/docker-common: always start ntp
There is no need to start ntp only if the package was already present.
If the package is not present, we install it AND then enable and run
the service.
The original fix is part of this commit:
https://github.com/ceph/ceph-ansible/commit/849786967ac4c6235e624243019f0b54bf3340a4
However, this is a feature addition so it cannot be backported. Hence
this commit.
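A minimal sketch of the install-then-start flow; package and service names vary by distribution, `ntp` is assumed here:
```
- name: install ntp
  package:
    name: ntp
    state: present

- name: enable and start ntp
  service:
    name: ntp
    state: started
    enabled: true
```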
Sébastien Han [Thu, 21 Dec 2017 18:57:01 +0000 (19:57 +0100)]
ci: test on ansible 2.4.2
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7ba25b20dcb199f81666b34cae6c1b95c30b1033) Signed-off-by: Sébastien Han <seb@redhat.com>
Having handlers in both the ceph-defaults and ceph-docker-common roles
can make the playbook restart services twice. Handlers can be triggered
a first time because of a change in ceph.conf and a second time because
a new image has been pulled.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b29a42cba6a4059b2c0035572d570c0812f48d16) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 15 Dec 2017 18:43:23 +0000 (19:43 +0100)]
container: restart container when there is a new image
There was no really good way to implement this.
We had several options and none of them were ideal, since handlers can
not be triggered across roles.
We could have achieved this by doing:
* option 1 was to add a dependency in the meta of the ceph-docker-common
role. We had that long ago and we decided to stop, so everything is
managed via site.yml
* option 2 was to import files from another role. This is messy and we
don't do that anywhere in the current code base, and we intend to keep
it that way.
* option 3 was to pull the image from the ceph-config role. This is not
suitable either, since the docker command won't be available unless you
run an Atomic distro. This would also mean that you're trying to pull
twice: a first time in ceph-config, a second time in ceph-docker-common.
The only option I came up with was to duplicate a bit of the ceph-config
handlers code.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8a19a83354cd8a4f9a729b3864850ec69be6d5da) Signed-off-by: Sébastien Han <seb@redhat.com>
containers: fix bug when looking for existing cluster
In containerized deployments, `docker_exec_cmd` is not set before the
task that tries to retrieve the current fsid is played, which means it
considers there is no existing fsid and tries to generate a new one.
Sébastien Han [Tue, 9 Jan 2018 13:34:09 +0000 (14:34 +0100)]
container: change the way we force no logs inside the container
Previously we were using ceph_conf_overrides; however, this doesn't play
nice with software like TripleO that uses ceph_conf_overrides inside its
own code. For now, and since this is the only occurrence of this, we can
ensure no logs through the ceph.conf template.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1532619 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c2e04623a54007674ec60647a9e5ddd2da4f991b) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 9 Jan 2018 12:54:50 +0000 (13:54 +0100)]
mon: use crush rules for non-container too
There is no reason why we can't use crush rules when deploying
containers, so move the include into main.yml so it can be called.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f0787e64da45fdbefb2ff1376a0705fadf6a502d) Signed-off-by: Sébastien Han <seb@redhat.com>
Andrew Schoen [Fri, 5 Jan 2018 19:47:10 +0000 (13:47 -0600)]
test: set UPDATE_CEPH_DOCKER_IMAGE_TAG for jewel tests
We want to be explicit here and update to luminous and not
the 'latest' tag.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit a8509fbc9c0328670224f608abea17d8e64257ab) Signed-off-by: Sébastien Han <seb@redhat.com>
Andrew Schoen [Fri, 5 Jan 2018 18:42:16 +0000 (12:42 -0600)]
switch-to-containers: do not fail when stopping the nfs-ganesha service
If we're working with a jewel cluster then this service will not exist.
This is mainly a problem with CI testing because our tests are set up to
work with both jewel and luminous, meaning that even though we want to
test jewel we still have an nfs-ganesha host in the test, causing these
tasks to run.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit b613321c210155f390d4ddb7dcda8dc685a6e9ea) Signed-off-by: Sébastien Han <seb@redhat.com>
Andrew Schoen [Fri, 5 Jan 2018 18:37:36 +0000 (12:37 -0600)]
switch-to-containers: do not fail when stopping the ceph-mgr daemon
If we are working with a jewel cluster, ceph-mgr does not exist
and this makes the playbook fail.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 0b4b60e3c9cabbbda2883feb40a6f80763c66b50) Signed-off-by: Sébastien Han <seb@redhat.com>
Andrew Schoen [Fri, 5 Jan 2018 16:06:53 +0000 (10:06 -0600)]
rolling_update: do not fail the playbook if nfs-ganesha is not present
The rolling update playbook was attempting to stop the
nfs-ganesha service on nodes where jewel is still installed.
The nfs-ganesha service did not exist in jewel so the task fails.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 997edea271b713b29f896ebb87dc6df29a60488b) Signed-off-by: Sébastien Han <seb@redhat.com>
The `bluestore_purge_osd_non_container` scenario is failing because it
keeps old osd_uuid information on devices and causes `ceph-disk activate`
to fail when trying to redeploy a new cluster after a purge.
Typical error seen:
```
2017-12-13 14:29:48.021288 7f6620651d00 -1
bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label
bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid 770080e2-20db-450f-bc17-81b55f167982 does not match our fsid f33efff0-2f07-4203-ad8d-8a0844d6bda0
```
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eeedefdf0207f04e67af490e03d895324ab609a1) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 20 Dec 2017 14:29:02 +0000 (15:29 +0100)]
mon: always run ceph-create-keys
ceph-create-keys is idempotent, so it's not an issue to run it each time
we play ansible. This also fixes issues where the 'creates' arg skips the
task and no keys get generated on newer versions, e.g. during an upgrade.
Closes: https://github.com/ceph/ceph-ansible/issues/2228 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 0b55abe3d0fc6db6c93d963545781c05a31503bb) Signed-off-by: Sébastien Han <seb@redhat.com>
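A sketch of what dropping the 'creates' guard looks like; `monitor_name` stands in for the mon id and is an assumption:
```
# idempotent, so it is safe to run on every play instead of guarding it with 'creates'
- name: create ceph keys
  command: ceph-create-keys --cluster {{ cluster }} -i {{ monitor_name }}
  changed_when: false
```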
Sébastien Han [Thu, 21 Dec 2017 09:19:22 +0000 (10:19 +0100)]
rgw: disable legacy rgw service unit
When upgrading from OSP11 to OSP12 container, ceph-ansible attempts to
disable the RGW service provided by the overcloud image. The task
attempts to stop/disable ceph-rgw@{{ ansible-hostname }} and
ceph-radosgw@{{ ansible-hostname }}.service. The actual service name is
ceph-radosgw@radosgw.$name
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525209 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ad54e19262f3d523ad57ee39e64d6927b0c21dea) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 20 Dec 2017 12:39:33 +0000 (13:39 +0100)]
fix jewel scenarios on container
When deploying Jewel from master we still need to enable this code since
the container image has such a check. This check still exists because
ceph-disk is not able to create a GPT label on a drive that does not
have one.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 39f2bfd5d58bae3fef2dd4fca0b2bab2e67ba21f) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 19 Dec 2017 14:10:05 +0000 (15:10 +0100)]
site-docker: ability to disable fact sharing
When deploying with Ansible at large scale, the delegate_facts method
consumes a lot of memory on the host that is running Ansible. This can
cause various issues like memory exhaustion on that machine.
You can now run Ansible with "-e delegate_facts_host=False" to disable
the fact sharing.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c315f81dfe440945aaa90265cd3294fdea549942) Signed-off-by: Sébastien Han <seb@redhat.com>
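A hedged sketch of how such a toggle can gate the fact delegation in the playbook:
```
- name: gather and delegate facts
  setup:
  delegate_to: "{{ item }}"
  delegate_facts: true
  with_items: "{{ groups['all'] }}"
  run_once: true
  when: delegate_facts_host | default(true) | bool
```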
Sébastien Han [Fri, 15 Dec 2017 16:39:32 +0000 (17:39 +0100)]
rolling_update: do not require root to answer question
There is no need to ask for root on the local action. This will prompt
for a password if the current user is not part of sudoers. That's
unnecessary anyway.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 200785832f3b56dd8c5766ec0b503c5d77b4a984) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 18 Dec 2017 15:43:37 +0000 (16:43 +0100)]
osd: best effort if no device is found during activation
We have a scenario where we switch from non-containerized to containers.
This means we don't know anything about the ceph partitions associated
with an OSD. Normally in a containerized context we have files containing the
preparation sequence. From these files we can get the capabilities of
each OSD. As a last resort we use a ceph-disk call inside a dummy bash
container to discover the ceph journal on the current osd.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525612 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bbc79765f3e8b93b707b0f25f94e975c1bd85c66) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 19 Dec 2017 10:17:04 +0000 (11:17 +0100)]
nfs: fix package install for debian/suse systems
This resolves the following error:
E: There were unauthenticated packages and -y was used without
--allow-unauthenticated
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit dfbef8361d3ac03788aa1f93b23907bc9595a730) Signed-off-by: Sébastien Han <seb@redhat.com>
The name docker_version is very generic and is also used by other
roles. As a result, there may be name conflicts. To avoid this, a
ceph_ prefix should be used for this fact. Since it is an internal
fact, renaming is not a problem.
Sébastien Han [Fri, 20 Oct 2017 09:14:13 +0000 (11:14 +0200)]
common: move restapi template to config
Closes: github.com/ceph/ceph-ansible/issues/1981 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ba5c6e66f03314d1b7263225e75f0f56c438db3b) Signed-off-by: Sébastien Han <seb@redhat.com>
The entrypoint to generate users' keyrings is `ceph-authtool`; therefore,
it can't expand the `$(ceph-authtool --gen-print-key)` inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyrings are properly filled when
`user_config: true`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab1dd3027a4b9932e58f28b86ab46979eb1f1682) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 14 Dec 2017 10:31:28 +0000 (11:31 +0100)]
default: look for the right return code on socket stat in-use
As reported in https://github.com/ceph/ceph-ansible/issues/2254, the
check with fuser is not ideal. If fuser is not available the return code
is 127. Here we want to make sure that we are looking for the correct
return code, i.e. 1.
Closes: https://github.com/ceph/ceph-ansible/issues/2254 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7eaf444328c8c381c673883913cf71b8ebe9d064) Signed-off-by: Sébastien Han <seb@redhat.com>
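A sketch of acting only on the specific return code; the socket variable name is illustrative:
```
- name: check if the ceph socket is in use
  shell: fuser --silent {{ socket_path }}
  register: socket_in_use
  failed_when: false
  changed_when: false

# rc 0 = in use, rc 1 = not in use, rc 127 = fuser not installed; only act on rc 1
- name: remove the stale socket
  file:
    path: "{{ socket_path }}"
    state: absent
  when: socket_in_use.rc == 1
```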
this task hangs because `{{ inventory_hostname }}` doesn't resolve to an
actual IP address.
Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']`
should fix this because it will reach the node with its actual IP
address.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aaaf980140832de694ef0ffe3282dabbf0b90081) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 22 Nov 2017 16:11:50 +0000 (17:11 +0100)]
common: install ceph-common on all the machines
Since some daemons now install their own packages the task checking the
ceph version fails on Debian systems. So the 'ceph-common' package must
be installed on all the machines.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bb7b29a9fcc33e7316bbe7dad3dc3cd5395ef8ab) Signed-off-by: Sébastien Han <seb@redhat.com>
John Fulton [Thu, 16 Nov 2017 16:29:59 +0000 (11:29 -0500)]
Make openstack_keys param support no acls list
A recent change [1] required that the openstack_keys
param always contain an acls list. However, it's
possible it might not contain that list. Thus, this
patch sets a default for that list to be empty if it
is not in the structure as defined by the user.
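A hedged sketch of what the structure can look like with the acls list defaulting to empty when omitted; the key material and other fields of the real entries are left out here:
```
openstack_keys:
  - name: client.glance
    acls: []            # explicitly empty; tasks can also rely on item.acls | default([])
  - name: client.cinder
    acls:
      - "u:nova:r"
```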