Sébastien Han [Fri, 18 May 2018 12:43:57 +0000 (14:43 +0200)]
defaults: restart_osd_daemon unit spaces
Extra space in systemctl list-units can cause restart_osd_daemon.sh to
fail
It looks like, when more services are enabled on the node, the space
between "loaded" and "active" in the output grows wider than the single
space given in the command [1].
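A minimal sketch of a whitespace-tolerant check, assuming the script filters `systemctl list-units` output with grep (the task wrapper and unit glob here are illustrative):
```
- name: verify the osd units are loaded and active, whatever the column padding
  shell: systemctl list-units 'ceph-osd@*' | grep -E "loaded +active"
  changed_when: false
```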
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2f43e9dab5f077276162069f449978ea97c2e9c0) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
There is some leftover data on devices when purging OSDs because of an invalid
device list construction.
Typical error:
```
changed: [osd3] => (item=/dev/sda sda1) => {
"changed": true,
"cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600",
"delta": "0:00:00.015188",
"end": "2018-05-16 12:41:40.408597",
"item": "/dev/sda sda1",
"rc": 0,
"start": "2018-05-16 12:41:40.393409"
}
Error: Could not stat device /dev/sda sda1 - No such file or directory.
```
The devices list in the task `resolve parent device` isn't built
properly because the command used to resolve the parent device doesn't
return the expected output.
For instance, it results in a devices list like:
`['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']`
where we expect to have:
`['/dev/sda', '/dev/sdb', '/dev/sdc']`
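A minimal sketch of a fix, assuming the parent device is resolved with lsblk; `--nodeps` keeps lsblk from also printing the children, so only one name comes back per device:
```
- name: resolve parent device
  shell: lsblk --nodeps -no pkname "{{ item }}"
  register: resolved_parent_device
  with_items: "{{ devices }}"
  changed_when: false
```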
Sébastien Han [Wed, 16 May 2018 15:37:10 +0000 (17:37 +0200)]
switch: disable ceph-disk units
During the transition from jewel non-container to container, old ceph
units are disabled. ceph-disk units can still remain in some cases and
will appear as 'loaded failed'. This is not a problem, although operators
might not like to see these units failing. That's why we remove them if
we find them.
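A minimal sketch, assuming the leftover units are discovered by name and then disabled; the awk pattern is illustrative:
```
- name: list leftover ceph-disk units
  shell: systemctl list-units --all | awk '/ceph-disk@/ {print $1}'
  register: ceph_disk_units
  changed_when: false

- name: disable leftover ceph-disk units
  systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
  with_items: "{{ ceph_disk_units.stdout_lines }}"
  failed_when: false
```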
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 49a47124859e6577fb99e6dd680c5244ccd6f38f) Signed-off-by: Sébastien Han <seb@redhat.com>
take-over: fix bug when trying to override variable
A customer has been facing an issue when trying to override
`monitor_interface` in the inventory host file.
In their use case, all nodes had the same `monitor_interface` name
except one. Therefore, they tried to override this variable for that
node in the inventory host file, but the take-over-existing-cluster
playbook failed when generating the new ceph.conf file because of an
undefined variable.
Typical error:
```
fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"}
```
Including variables like this, `include_vars: group_vars/all.yml`, prevents
us from overriding anything in the inventory host file because it
overwrites everything you would have defined in the inventory.
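A minimal sketch of the problematic pattern; `include_vars` applies at a higher precedence than inventory host vars, so a per-host `monitor_interface` set in the inventory is silently overwritten:
```
- name: load cluster-wide defaults (clobbers inventory host vars)
  include_vars: group_vars/all.yml
```
Letting Ansible auto-load group_vars/ alongside the inventory keeps per-host overrides intact.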
Sébastien Han [Wed, 16 May 2018 14:02:41 +0000 (16:02 +0200)]
rolling_update: move osd flag section
During a minor update from one jewel version to a higher one (10.2.9 to
10.2.10, for example), osd flags don't get applied because they were set
in the mgr section, which is skipped on jewel since that daemon does not
exist.
Moving the set flag section after all the mons have been updated solves
that problem.
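A minimal sketch, assuming the flags are set from the first mon once all mons are updated; `cluster` and `mon_group_name` follow the playbook's variables, and the flag list is illustrative:
```
- name: set osd flags
  command: ceph --cluster {{ cluster }} osd set {{ item }}
  delegate_to: "{{ groups[mon_group_name][0] }}"
  with_items:
    - noout
    - noscrub
    - nodeep-scrub
```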
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071 Co-authored-by: Tomas Petr <tpetr@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d80a871a078a175d0775e91df00baf625dc39725)
Sébastien Han [Mon, 14 May 2018 07:21:48 +0000 (09:21 +0200)]
iscsi: add python-rtslib repository
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c7c11b774f54078b32b652481145699dbbd79ff) Signed-off-by: Sébastien Han <seb@redhat.com>
Trying to mask the target unit when `/etc/systemd/system/target.service`
doesn't exist seems to be a bug.
There is no need to mask a unit file which doesn't exist.
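A minimal sketch of the guard, assuming a stat check gates the masking:
```
- name: check whether the target unit file exists
  stat:
    path: /etc/systemd/system/target.service
  register: target_unit_file

- name: mask the target unit only when the unit file exists
  systemd:
    name: target
    masked: yes
  when: target_unit_file.stat.exists
```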
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a145caf947aec64467150a007b7aafe57abe2891) Signed-off-by: Sébastien Han <seb@redhat.com>
Gregory Meno [Wed, 9 May 2018 18:17:26 +0000 (11:17 -0700)]
adds missing state needed to upgrade nfs-ganesha
The tasks for os_family Red Hat were missing this state.
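A minimal sketch, assuming the missing piece was an explicit package state so the upgrade pulls the new build; the exact task is not shown in this log:
```
- name: install nfs-ganesha
  yum:
    name: nfs-ganesha
    state: latest
  when: ansible_os_family == 'RedHat'
```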
fixes: bz1575859 Signed-off-by: Gregory Meno <gmeno@redhat.com>
(cherry picked from commit 26f6a650425517216fb57c08e1a8bda39ddcf2b5) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 8 May 2018 14:11:14 +0000 (07:11 -0700)]
common: enable Tools repo for rhcs clients
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574458 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 07ca91b5cb7e213545687b8a62c421ebf8dd741d) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 3 May 2018 14:54:53 +0000 (16:54 +0200)]
common: copy iso files if rolling_update
If we are in the middle of an update, we want the new package
version to be installed, so the task that copies the repo files should
not be skipped.
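A minimal sketch of the condition, assuming the copy task was previously skipped once the repo was configured; the variable names and file names are illustrative:
```
- name: copy ceph iso repo file
  copy:
    src: ceph.repo
    dest: /etc/yum.repos.d/ceph.repo
  when: not ceph_repo_present | default(False) or rolling_update | default(False)
```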
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572032 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4a186237e6fdc98f779c2e25985da4325b3b16cd) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 4 Apr 2018 14:23:54 +0000 (16:23 +0200)]
add .vscode/ to gitignore
I personally dev on vscode and I have some preferences to save when it
comes to running the python unit tests. So ignoring this directory is
actually useful.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3c4319ca4b5355d69b2925e916420f86d29ee524) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 20 Apr 2018 09:13:51 +0000 (11:13 +0200)]
shrink-osd: ability to shrink NVMe drives
Now, if the service name contains nvme, we know we need to remove the last
2 characters instead of 1.
If nvme, then osd_to_kill_disks is nvme0n1 and we need nvme0.
If ssd or hdd, then osd_to_kill_disks is sda1 and we need sda.
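A minimal sketch of the truncation, assuming Jinja2 string slicing on each disk name; the accumulator fact is illustrative:
```
- name: resolve the parent device of each partition to shrink
  set_fact:
    resolved_devices: "{{ (resolved_devices | default([])) + [item[:-2] if 'nvme' in item else item[:-1]] }}"
  with_items: "{{ osd_to_kill_disks }}"
```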
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1561456 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 66c1ea8cd561fce6cfe5cdd1ecaa13411c824e3a) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 5 Apr 2018 08:28:51 +0000 (10:28 +0200)]
tox: use container latest tag for upgrades
Currently tag-build-master-luminous-ubuntu-16.04 is not used anymore.
Also, 'latest' now points to CentOS, so we need to make that switch here
too.
We now have latest tags for each stable release, so let's use them and
point tox at them to deploy the right version.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 14eff6b571eb760e8afcdfefc063f1af06342809) Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-defaults: fix ceph_uid fact on container deployments
Red Hat is now using the tags [3, latest] for the rhceph/rhceph-3-rhel7 image.
Because of this, the ceph_uid conditional passes for Debian
when 'ceph_docker_image_tag: latest' is set on RH deployments.
I've added an additional task to check for the rhceph image specifically,
and also updated the RH family task for the ceph/daemon [centos|fedora] tags.
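A minimal sketch of the checks, assuming the uid follows the ceph container convention (167 on Red Hat based images, 64045 on Debian based ones):
```
- name: set ceph_uid for red hat based container images
  set_fact:
    ceph_uid: 167
  when: "'rhceph' in ceph_docker_image or 'centos' in ceph_docker_image_tag or 'fedora' in ceph_docker_image_tag"

- name: set ceph_uid for debian based container images
  set_fact:
    ceph_uid: 64045
  when: "'ubuntu' in ceph_docker_image_tag"
```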
Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
(cherry picked from commit 127a643fd0ce4d66a5243b789ab0905e54e9d960) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 17 Apr 2018 13:59:52 +0000 (15:59 +0200)]
rhcs: re-add apt-pining
When installing rhcs on Debian systems, the Red Hat repos must have the
highest priority so we avoid package conflicts and install the rhcs
version.
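A minimal sketch of the pinning, assuming an apt preferences file raises the Red Hat origin above everything else; the origin host and priority value are illustrative:
```
- name: add apt pinning for rhcs packages
  copy:
    dest: /etc/apt/preferences.d/rhcs
    content: |
      Package: *
      Pin: origin {{ ceph_rhcs_repo_host | default('rhcs.example.com') }}
      Pin-Priority: 1001
```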
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1565850 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a98885a71ec63ff129d7001301a0323bfaadad8a) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 12 Apr 2018 10:15:35 +0000 (12:15 +0200)]
common: add tools repo for iscsi gw
To install iscsi gw packages we need to enable the tools repo.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1547849 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 37117071ebb7ab3cf68b607b6760077a2b46a00d) Signed-off-by: Sébastien Han <seb@redhat.com>
Ali Maredia [Mon, 2 Apr 2018 17:47:31 +0000 (13:47 -0400)]
nfs: ensure the nfs-server service is stopped
NFS-ganesha cannot start if the nfs-server service
is running. This commit stops nfs-server, in case it
is running, on a (debian, redhat, suse) node before
the nfs-ganesha service starts up.
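A minimal sketch, assuming the conflicting unit is stopped first; the unit name differs on Debian, and failed_when guards hosts without the unit:
```
- name: ensure nfs-server is stopped
  service:
    name: "{{ 'nfs-kernel-server' if ansible_os_family == 'Debian' else 'nfs-server' }}"
    state: stopped
  failed_when: false
```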
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 01c58695fc344d65876b3acaea4f915f896401ac) Signed-off-by: Sébastien Han <seb@redhat.com>
Backward compatibility with `ceph_mon_docker_interface` and
`ceph_mon_docker_subnet` was not working since there was no lookup on
`monitor_interface` and `public_network`.
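A minimal sketch of the fallback, assuming the legacy variables seed the new ones when they are defined:
```
- name: maintain backward compatibility with ceph_mon_docker_interface
  set_fact:
    monitor_interface: "{{ ceph_mon_docker_interface }}"
  when: ceph_mon_docker_interface is defined

- name: maintain backward compatibility with ceph_mon_docker_subnet
  set_fact:
    public_network: "{{ ceph_mon_docker_subnet }}"
  when: ceph_mon_docker_subnet is defined
```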
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 66c4118dcd0c8e7a7081bce5c8d6ba7752b959fd) Signed-off-by: Sébastien Han <seb@redhat.com>
Ken Dreyer [Thu, 5 Apr 2018 19:40:15 +0000 (13:40 -0600)]
common: upgrade/install ceph-test RPM first
Prior to this change, if a user had ceph-test-12.2.1 installed, and
upgraded to ceph v12.2.3 or newer, the RPM upgrade process would
fail.
The problem is that the ceph-test RPM did not depend on an exact version
of ceph-common until v12.2.3.
In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved
from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum
encounters package conflicts between the older ceph-test and newer
ceph-base.
When all users have upgraded beyond Ceph < 12.2.3, this is no longer
relevant.
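A minimal sketch, assuming ceph-test is brought up to date before the rest of the ceph packages so yum never resolves an old ceph-test against a newer ceph-base:
```
- name: upgrade ceph-test before the other ceph packages
  yum:
    name: ceph-test
    state: latest
```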
Signed-off-by: Attila Fazekas <afazekas@redhat.com>
(cherry picked from commit ecd3563c2128553d4145a2f9c940ff31458c33b4) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 3 Apr 2018 13:20:06 +0000 (15:20 +0200)]
ceph-iscsi: fix certificates generation and distribution
Prior to this patch, the certificates were being generated on a single
node only (because of run_once: true), and thus were not
distributed to all the gateway nodes.
This required a second ansible run to work. This patch fixes the
creation and the keys' distribution across all the nodes.
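A minimal sketch of the generate-once/distribute-everywhere flow; file names, paths, and the openssl subject are illustrative:
```
- name: generate the crt/key pair once
  command: >
    openssl req -newkey rsa:2048 -nodes -x509 -days 365
    -keyout /etc/ceph/iscsi-gateway.key -out /etc/ceph/iscsi-gateway.crt
    -subj "/CN=iscsi-gw"
  args:
    creates: /etc/ceph/iscsi-gateway.crt
  run_once: true

- name: fetch the pair to the controller
  fetch:
    src: "/etc/ceph/{{ item }}"
    dest: "{{ fetch_directory }}/"
    flat: yes
  with_items: [iscsi-gateway.key, iscsi-gateway.crt]
  run_once: true

- name: push the pair to every gateway
  copy:
    src: "{{ fetch_directory }}/{{ item }}"
    dest: "/etc/ceph/{{ item }}"
  with_items: [iscsi-gateway.key, iscsi-gateway.crt]
```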
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f3caee84605e17f1fdfa4add634f0bf2c2cd510e) Signed-off-by: Sébastien Han <seb@redhat.com>
This commit is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1550977
We iterate over all nodes on each node and delegate the facts gathering.
This consumes a lot of memory when there is a large number of nodes in the
inventory.
That way of gathering is not necessary for client nodes, so we can simply
gather local facts for them.
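A minimal sketch, assuming clients are excluded from the delegated gathering and only gather their own facts; the group names are illustrative:
```
- name: gather and delegate facts for all cluster nodes
  setup:
  delegate_to: "{{ item }}"
  delegate_facts: true
  with_items: "{{ groups['all'] | difference(groups.get('clients', [])) }}"
  run_once: true

- name: gather local facts only on client nodes
  setup:
  when: inventory_hostname in groups.get('clients', [])
```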
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b73be254d249a23ac2eb2f86c4412ef296352a9) Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-mds: delete duplicate tasks which cause multimds container deployments to fail.
This update resolves the error ['cephfs' is undefined] in multimds container deployments.
See roles/ceph-mon/tasks/create_mds_filesystems.yml: the same last two tasks are present there, and they actually need to happen in that role, since "{{ cephfs }}" gets defined in
roles/ceph-mon/defaults/main.yml and not in roles/ceph-mds/defaults/main.yml.
Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
(cherry picked from commit ca572a11f1eb7ded5583c8d8b810a42db61cd98f) Signed-off-by: Sébastien Han <seb@redhat.com>
Ning Yao [Fri, 23 Mar 2018 15:48:16 +0000 (23:48 +0800)]
cleanup osd.conf.j2 in ceph-osd
osd crush location is set by ceph_crush in the library,
osd.conf.j2 is not used any more.
Signed-off-by: Ning Yao <yaoning@unitedstack.com>
(cherry picked from commit 691ddf534989b4d27dc41997630b3307436835ea) Signed-off-by: Sébastien Han <seb@redhat.com>
Andrew Schoen [Tue, 20 Mar 2018 19:13:28 +0000 (14:13 -0500)]
ceph-defaults: set is_atomic variable
This variable is needed for containerized clusters and is required for
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample though so if ceph-docker-common is used outside
of that playbook it needs to be set in another way. Moving the creation of
the variable inside this role means playbooks don't need to worry
about setting it.
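A minimal sketch, assuming the fact is derived from the ostree marker file the same way site-docker.yml.sample does:
```
- name: check if this is an atomic host
  stat:
    path: /run/ostree-booted
  register: stat_ostree

- name: set_fact is_atomic
  set_fact:
    is_atomic: "{{ stat_ostree.stat.exists }}"
```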
The jobs launched by the CI are not using 'ansible.cfg'.
It contains some parameters that should avoid the SSH failures we have
been used to seeing in the CI so far.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1e283bf69be8b9efbc1a7a873d91212ad57c7351) Signed-off-by: Sébastien Han <seb@redhat.com>
client: use `ceph_uid` fact to set uid/gid on admin key
That task fails on containerized deployments because `ceph:ceph`
doesn't exist.
The idea here is to use the `{{ ceph_uid }}` fact to set the ownership of
the admin keyring when containerized_deployment is set.
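A minimal sketch, assuming the usual keyring path; the numeric uid works where the `ceph` user does not exist on the host:
```
- name: set admin keyring ownership from ceph_uid
  file:
    path: "/etc/ceph/{{ cluster }}.client.admin.keyring"
    owner: "{{ ceph_uid }}"
    group: "{{ ceph_uid }}"
  when: containerized_deployment
```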
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540578 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d35bc9bde6502ffa81f3c77679cf3f418cd62ca) Signed-off-by: Sébastien Han <seb@redhat.com>
Giulio Fidente [Thu, 22 Feb 2018 18:57:47 +0000 (19:57 +0100)]
Make rule_name optional when defining items in openstack_pools
Previously it was necessary to provide a value (possibly an
empty string) for the "rule_name" key for each item in
openstack_pools. This change makes it optional and defaults to an
empty string when not given.
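A minimal sketch of the consuming task, assuming `default('')` supplies the empty string when the key is omitted:
```
- name: create openstack pool(s)
  command: >
    ceph --cluster {{ cluster }} osd pool create
    {{ item.name }} {{ item.pg_num }} {{ item.rule_name | default('') }}
  with_items: "{{ openstack_pools }}"
  changed_when: false
```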
The ceph-ansible upstream CI runs several tests, including an
'idempotency/handlers' test. It means the playbook is run a first time
and then a second time with another container image version to ensure the
handlers run properly and the containers are well restarted.
This can cause issues.
For instance, in the specific case which drove me to submit this commit,
I hit the case where the `latest` image ships ceph 12.2.3 while `stable-3.0`
(the image used for the second run) ships ceph 12.2.2.
The goal of this test is not to verify that we can upgrade from a specific
version to another but to ensure the handlers are working, so even though
this is a valid failure it should be caught by a test dedicated to that
use case.
For the upstream CI we just need a container image with a different id:
the same content in the container image but a different image id in the
registry, since the test relies on the image id to decide whether the
container should be restarted.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a8986459f2ed7e077390162d0df431a3321a478) Signed-off-by: Sébastien Han <seb@redhat.com>
Andy McCrae [Wed, 20 Dec 2017 03:49:16 +0000 (13:49 +1000)]
Restart services if handler called
This patch fixes an issue where if hosts have different service lists,
it will prevent restarting changes on services that run later on.
For example, hostA in the mons and rgws group would initiate a config
change and restart of services on all mons and rgws hosts, even though
a separate hostB (which is only in the rgws group) has not had its
configuration changed yet. Additionally, when the second host has its
configuration changed as part of the ceph-rgw role, it will not initiate
a restart since its inventory name != the first host's.
To fix this we should run the restart once (using run_once: True)
as long as the host has called the handler. This will ensure that even
if only 1 host has called the handler it will initiate a restart on all
hosts that have called the handler.
Additionally, we add a var that is set when the handler runs, this will
ensure that only hosts that have called the handler get restarted.
Includes minor fix to remove unrequired "inventory_hostname in
play_hosts" when: clause. This is no longer required since the handlers
were changed. The host calling the handler will be in play_hosts
already.
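A minimal sketch of the pattern, assuming a per-host flag plus a run_once fan-out; the handler names, flag variable, and script path are illustrative:
```
- name: set _restart_rgw_called flag
  set_fact:
    _restart_rgw_called: true
  listen: restart ceph rgws

- name: restart rgw on every host that called the handler
  command: /usr/share/ceph-ansible/restart_rgw_daemon.sh
  run_once: true
  delegate_to: "{{ item }}"
  with_items: "{{ play_hosts }}"
  when: hostvars[item]['_restart_rgw_called'] | default(False)
  listen: restart ceph rgws
```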
update: look for short and fqdn in ceph_health_raw
According to the hostname configuration, the task waiting for the mons to
be in quorum might fail.
The idea here is to look for both the shortname and the fqdn in
`ceph_health_raw` instead of just `ansible_hostname`.
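A minimal sketch of the check, assuming quorum membership is read from the JSON cluster status:
```
- name: waiting for the monitor to join the quorum...
  command: ceph --cluster {{ cluster }} -s --format json
  register: ceph_health_raw
  until: >
    ansible_hostname in (ceph_health_raw.stdout | from_json)['quorum_names'] or
    ansible_fqdn in (ceph_health_raw.stdout | from_json)['quorum_names']
  retries: 10
  delay: 10
  changed_when: false
```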
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c04e67347c284c2c127f09b201e8a293c5192e1f) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 14 Feb 2018 00:44:18 +0000 (01:44 +0100)]
container: osd remove run_once
When used along with delegation, run_once does not behave well. Thus,
using `| last` always brings the desired result.
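A minimal sketch of the `| last` pattern, with an illustrative command; the task fires exactly once, on a deterministic host, without run_once:
```
- name: run the step exactly once, deterministically
  command: ceph --cluster {{ cluster }} osd stat
  when: inventory_hostname == play_hosts | last
  changed_when: false
```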
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c816a9282c8f778f18249827397901e04c040019) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 8 Feb 2018 16:35:05 +0000 (17:35 +0100)]
docker-common: fix container restart on new image
We now look for any existing containers; if there are any, we compare
their running image with the latest pulled container image.
For OSDs, we iterate over the list of running OSDs; this handles the
case where the first OSD of the list has been updated (runs the new
image) while the others have not.
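A minimal sketch of the comparison input, assuming the running image id is read per OSD container and later compared with the freshly pulled image id; the item list is illustrative:
```
- name: get the image id the osd container is running
  shell: docker inspect --format {% raw %}'{{ .Image }}'{% endraw %} ceph-osd-{{ item }}
  with_items: "{{ osd_unit_names.stdout_lines | default([]) }}"
  register: running_image_ids
  changed_when: false
```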
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d47d02a5eb20067b5ae997ab18aeebe40b27cff0) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 13 Feb 2018 08:37:14 +0000 (09:37 +0100)]
default: remove duplicate code
This is already defined in ceph-defaults.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ebc195487c2a2c8764594403b388c3d4624443fe) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 9 Feb 2018 17:15:25 +0000 (18:15 +0100)]
test: add test for containers resources changes
We change the ceph_mon_docker_memory_limit on the second run, this
should trigger a restart of services.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7d690878df4e34f2003996697e8f623b49282578) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 9 Feb 2018 17:11:07 +0000 (18:11 +0100)]
test: add test for restart on new container image
Since we have a task to test the handlers we can test a new container to
validate the service restart on a new container image.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 79864a8936e8c25ac66bba3cee48d7721453a6af) Signed-off-by: Sébastien Han <seb@redhat.com>
Andy McCrae [Fri, 9 Feb 2018 14:12:35 +0000 (14:12 +0000)]
Set application for OpenStack pools
Since Luminous we need to set the application tag for each pool,
otherwise a CEPH_WARNING is generated when the pools are in use.
We should assign the OpenStack pools to their default, which would be
"rbd". When updating to Luminous this happens automatically for the
vms, images, backups and volumes pools, but for new deploys it does
not.
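A minimal sketch, assuming the Luminous `osd pool application enable` command:
```
- name: assign rbd application to openstack pool(s)
  command: ceph --cluster {{ cluster }} osd pool application enable {{ item.name }} rbd
  with_items: "{{ openstack_pools }}"
  changed_when: false
```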
Andrew Schoen [Fri, 9 Feb 2018 20:02:07 +0000 (14:02 -0600)]
infra: do not include host_vars/* in take-over-existing-cluster.yml
These are better collected by ansible automatically. This would also
fail if the host_var file didn't exist.
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 7c7017ebe66c70b1f3e06ee71466f30beb4eb2b0) Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 699c777e680655be12f53cabed626b28623f8160) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 8 Jan 2018 15:41:42 +0000 (16:41 +0100)]
containers: bump memory limit
A default value of 4GB for MDS and 3GB for OSD is more appropriate.
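A minimal sketch of the defaults, assuming these variables feed the docker --memory flag:
```
ceph_osd_docker_memory_limit: 3g
ceph_mds_docker_memory_limit: 4g
```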
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1531607 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 97f520bc7488b8e09d4057783049c8975fbc336e) Signed-off-by: Sébastien Han <seb@redhat.com>
The ansible inventory could contain more than just ceph-ansible hosts, so
we shouldn't use "hosts: all". Also, only grab one file when getting
the ceph cluster name instead of failing when there is more than one
file in /etc/ceph. Also fix the location of the ceph.conf template.
It means the `ceph-run-osd.sh` script won't be able to start the
`osd_disk_activate` process in ceph-container because it won't have
filled the `$DOCKER_ENV` environment variable properly.
Adding `--net=host` to the 'disk_list' container fixes this issue.
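A minimal sketch, assuming the helper container is invoked directly; the flags other than --net=host are illustrative:
```
- name: run the disk_list container on the host network
  command: >
    docker run --rm --privileged --net=host -v /dev:/dev
    {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
    disk_list
```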
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e537779bb3cf73c569ce6c29ab8b20169cc5ffae) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 8 Feb 2018 13:51:15 +0000 (14:51 +0100)]
default: define 'osd_scenario' variable
osd_scenario does not exist in the ceph-default role, so if we try to
play ceph-default on an OSD node the playbook will fail with an
undefined variable.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 22f843e3d4e7fa32f8cd74eaf36772445ed20c0d) Signed-off-by: Sébastien Han <seb@redhat.com>
Giulio Fidente [Fri, 2 Feb 2018 08:45:07 +0000 (09:45 +0100)]
Check for docker sockets named after both _hostname or _fqdn
While hostname -f will always return a hostname including its
domain part, and -s one without the domain part, the behavior when
no arguments are given can include or omit the domain part
depending on how the system is configured; the socket name might
then not match the instance name.
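A minimal sketch of the check, assuming the socket path convention /var/run/ceph/<cluster>-mon.<name>.asok:
```
- name: look for an admin socket named after either hostname form
  shell: stat /var/run/ceph/{{ cluster }}-mon.{{ item }}.asok
  with_items:
    - "{{ ansible_hostname }}"
    - "{{ ansible_fqdn }}"
  register: mon_socket
  failed_when: false
  changed_when: false
```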
Major Hayden [Mon, 11 Dec 2017 15:56:56 +0000 (09:56 -0600)]
Convert interface names to underscores for facts
If a deployer uses an interface name with a dash/hyphen in it, such
as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2
template fails to find the right facts. It looks for
'ansible_br-storage' but only 'ansible_br_storage' exists.
This patch converts the interface name to underscores when the
template does the fact lookup.
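A minimal sketch of the lookup, assuming the fact name is rebuilt with `replace('-', '_')`; the fact variable is illustrative:
```
- name: resolve the address behind a dashed interface name
  set_fact:
    _monitor_address: "{{ hostvars[inventory_hostname]['ansible_' + (monitor_interface | replace('-', '_'))]['ipv4']['address'] }}"
```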
The `zap ceph osd disks` task should iterate over `resolved_parent_device`
instead of `combined_devices_list`, since the former contains only the base
device name (vs. the full path name in `combined_devices_list`).
This fixes the issue where docker complains about the container name because
of illegal characters such as `/`:
```
"/usr/bin/docker-current: Error response from daemon: Invalid container
name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-]
are allowed.","See '/usr/bin/docker-current run --help'."
""
```
Having the basename of the device path is enough for the container
name.
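A minimal sketch, assuming `resolved_parent_device` holds base names like sdb so the container name stays legal; the image entrypoint argument is illustrative:
```
- name: zap osd disk(s) in a container
  command: >
    docker run --rm --privileged -v /dev:/dev
    --name ceph-osd-zap-{{ ansible_hostname }}-{{ item }}
    {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
    zap_device
  with_items: "{{ resolved_parent_device }}"
```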
Use a nicer syntax for `local_action` tasks.
We used to have oneliners like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```
The usual syntax:
```
local_action:
  module: wait_for
  port: 22
  host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
  state: started
  delay: 10
  timeout: 500
```
is nicer and keeps the whole playbook consistent.
This also fixes a potential issue with missing quotation:
```
Traceback (most recent call last):
File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
main()
File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
File "/usr/lib64/python2.7/shlex.py", line 279, in split
return list(lex) File "/usr/lib64/python2.7/shlex.py", line 269, in next
token = self.get_token()
File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
raw = self.read_token()
File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
raise ValueError, "No closing quotation"
ValueError: No closing quotation
```
Writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because it complains about missing quotes; this fix solves the issue.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit deaf273b25601991fc16712cc03820207125554f) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 30 Jan 2018 13:39:58 +0000 (14:39 +0100)]
config: remove any spaces in public_network or cluster_network
With two public networks configured, we found that with
"NETWORK_ADDR_1, NETWORK_ADDR_2" the install process consistently broke,
trying to find the docker registry on the second network and not
finding the mon container,
but without spaces,
"NETWORK_ADDR_1,NETWORK_ADDR_2", the install succeeds.
So the containerized install is more particular about the formatting of this line.
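A minimal sketch of the sanitization, assuming regex_replace strips whitespace before the value reaches the template:
```
- name: strip any spaces from public_network
  set_fact:
    public_network: '{{ public_network | regex_replace("\s+", "") }}'
```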
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6f9dd26caab18c4e4e98a78bc834f2fa5c255bc7) Signed-off-by: Sébastien Han <seb@redhat.com>
This is a typo caused by a leftover.
It was previously written like this :
`shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")`
and has been rewritten to :
`shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")`
because we append the '/dev/' prefix in the next task.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137 Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f372a4232e830856399a25e55c2ce239ac086614) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 29 Jan 2018 13:28:23 +0000 (14:28 +0100)]
Do not search osd ids if ceph-volume
Description of problem: The 'get osd id' task goes through all 10 retries (and their respective timeouts) to make sure that the number of OSDs in the osd directory matches the number of devices.
This happens always, regardless of whether the setup and deployment are correct.
Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected.
How reproducible: 100%
Steps to Reproduce:
1. Use ceph-volume (LVM) to deploy OSDs
2. Avoid using anything in the 'devices' section
3. Deploy the cluster
Actual results:
```
TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] **********************************************************************************************************************************************
task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6
FAILED - RETRYING: get osd id (10 retries left).
FAILED - RETRYING: get osd id (9 retries left).
FAILED - RETRYING: get osd id (8 retries left).
FAILED - RETRYING: get osd id (7 retries left).
FAILED - RETRYING: get osd id (6 retries left).
FAILED - RETRYING: get osd id (5 retries left).
FAILED - RETRYING: get osd id (4 retries left).
FAILED - RETRYING: get osd id (3 retries left).
FAILED - RETRYING: get osd id (2 retries left).
FAILED - RETRYING: get osd id (1 retries left).
ok: [osd0] => {
"attempts": 10,
"changed": false,
"cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'",
"delta": "0:00:00.002717",
"end": "2018-01-21 18:10:31.237933",
"failed": true,
"failed_when_result": false,
"rc": 0,
"start": "2018-01-21 18:10:31.235216"
}
STDOUT:
0
1
2
```
Expected results:
There aren't any (or just a few) timeouts while the OSDs are found
Additional info:
This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found.
Andy McCrae [Sat, 27 Jan 2018 19:40:09 +0000 (19:40 +0000)]
Add default for radosgw_keystone_ssl
This should default to False. The default for Keystone is not to use PKI
keys; additionally, anybody using this setting had to have been setting
it manually before.
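A minimal sketch of the default as described:
```
radosgw_keystone_ssl: false
```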
Eduard Egorov [Thu, 9 Nov 2017 11:49:00 +0000 (11:49 +0000)]
config: add host-specific ceph_conf_overrides evaluation and generation.
This allows us to use host-specific variables in the ceph_conf_overrides variable. For example, this fixes the usage of such variables (e.g. 'nss db path' having {{ ansible_hostname }} inside) in ceph_conf_overrides for the rados gateway configuration (see profiles/rgw-keystone-v3) - issue #2157.
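A minimal sketch of such an override, assuming the 'nss db path' example from the issue; the section name and path are illustrative:
```
ceph_conf_overrides:
  "client.rgw.{{ ansible_hostname }}":
    nss db path: "/var/lib/ceph/radosgw/{{ ansible_hostname }}/nss"
```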
Markos Chandras [Fri, 13 Oct 2017 09:18:27 +0000 (10:18 +0100)]
ceph-common: Don't check for ceph_stable_release for distro packages
When we consume the distribution packages, we don't get a choice of
which version to install, so we shouldn't require that variable to be
set. Distributions normally provide only one version of Ceph in the
official repositories, so we get whatever they provide.
upgrade: skip luminous tasks for jewel minor update
These tasks are needed only when upgrading to luminous.
They are not needed in Jewel minor upgrade and by the way, they fail because
`ceph versions` command doesn't exist.
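A minimal sketch of the skip, assuming the same jewel_minor_update flag used by the rolling update playbook gates the Luminous-only tasks:
```
- name: check the ceph versions across the cluster
  command: ceph versions
  when: not jewel_minor_update | default(False)
  changed_when: false
```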
osds: change default value for `dedicated_devices`
This is to keep backward compatibility with stable-2.2 and satisfy the
check "verify dedicated devices have been provided" in
`check_mandatory_vars.yml`. This check is looking for
`dedicated_devices`, so we need to default its value to
`raw_journal_devices` when `raw_multi_journal` is set to `True`.
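A minimal sketch of the default, assuming the legacy variable seeds the new one (the raw_multi_journal condition is omitted here for brevity):
```
dedicated_devices: "{{ raw_journal_devices | default([]) }}"
```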
Sébastien Han [Tue, 19 Dec 2017 17:54:19 +0000 (18:54 +0100)]
osd: skip devices marked as '/dev/dead'
On a non-collocated scenario, if a drive is faulty we can't really
remove it from the list of 'devices' without messing up, or having to
re-arrange, the order of the 'dedicated_devices'. We want to keep this
device list ordered. This will prevent the activation from failing on a
device that we know is faulty but that we can't remove yet, so as not to
mess up the dedicated_devices mapping with devices.
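A minimal sketch, assuming the sentinel name '/dev/dead' keeps the list positions aligned with dedicated_devices while skipping the faulty drive:
```
- name: activate osd(s), skipping drives marked as dead
  command: ceph-disk activate {{ item }}
  with_items: "{{ devices }}"
  when: item != '/dev/dead'
```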
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6db4aea453b6371345b2a1db96ab449b34870235) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 17 Jan 2018 14:18:11 +0000 (15:18 +0100)]
rolling update: add mgr exception for jewel minor updates
When updating from one minor Jewel version to another, the playbook
fails on the task "fail if no mgr host is present in the inventory".
This can now be worked around by running Ansible with
-e jewel_minor_update=true
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8af745947695ff7dc543754db802ec57c3238adf) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 18 Jan 2018 09:06:34 +0000 (10:06 +0100)]
rgw: disable legacy unit
Some systems that were deployed with old tools can leave units named
"ceph-radosgw@radosgw.gateway.service". As a consequence, they will
prevent the new unit from starting.
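A minimal sketch, assuming the legacy unit is stopped and disabled by its literal name; failed_when guards hosts that never had it:
```
- name: disable the legacy radosgw unit
  systemd:
    name: ceph-radosgw@radosgw.gateway.service
    state: stopped
    enabled: no
  failed_when: false
```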
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1509584 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f88795e8433f92ddc049d3e0d87e7757448e5005) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 16 Jan 2018 16:43:54 +0000 (17:43 +0100)]
common/docker-common: always start ntp
There is no need to start ntp only if the package was already present. If
the package is not present, we install it AND then enable + run
the service.
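A minimal sketch, assuming the install and the service start are unconditional; the service name differs per family:
```
- name: install ntp
  package:
    name: ntp
    state: present

- name: enable and start the ntp service
  service:
    name: "{{ 'ntpd' if ansible_os_family == 'RedHat' else 'ntp' }}"
    state: started
    enabled: yes
```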
The original fix is part of this commit:
https://github.com/ceph/ceph-ansible/commit/849786967ac4c6235e624243019f0b54bf3340a4
However, this is a feature addition so it cannot be backported. Hence
this commit.
Sébastien Han [Thu, 21 Dec 2017 18:57:01 +0000 (19:57 +0100)]
ci: test on ansible 2.4.2
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7ba25b20dcb199f81666b34cae6c1b95c30b1033) Signed-off-by: Sébastien Han <seb@redhat.com>
Having handlers in both the ceph-defaults and ceph-docker-common roles can
make the playbook restart services twice: handlers can be triggered a first
time because of a change in ceph.conf and a second time because a new
image has been pulled.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b29a42cba6a4059b2c0035572d570c0812f48d16) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 15 Dec 2017 18:43:23 +0000 (19:43 +0100)]
container: restart container when there is a new image
This wasn't a good thing to have to implement.
We had several options and none of them were ideal, since handlers can
not be triggered cross-roles.
We could have achieved that by doing:
* option 1 was to add a dependency in the meta of the ceph-docker-common
role. We had that long ago and we decided to stop, so everything is
managed via site.yml
* option 2 was to import files from another role. This is messy and we
don't do that anywhere in the current code base, and we will keep it that way.
There is option 3 where we pull the image from the ceph-config role.
This is not suitable either, since the docker command won't be available
unless you run an Atomic distro. It would also mean pulling twice: a
first time in ceph-config and a second time in ceph-docker-common.
The only option I came up with was to duplicate a bit of the ceph-config
handlers code.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8a19a83354cd8a4f9a729b3864850ec69be6d5da) Signed-off-by: Sébastien Han <seb@redhat.com>