git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

commit | commitdiff | tree

Guillaume Abrioux [Thu, 11 Apr 2019 08:01:15 +0000 (10:01 +0200)]

osd: remove variable osd_scenario

As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d35e9eeed283a7c4d5cc2f61184b5ca8c55e2b2)

commit | commitdiff | tree

Guillaume Abrioux [Wed, 10 Apr 2019 11:33:57 +0000 (13:33 +0200)]

osd: remove legacy file

ceph_disk_cli_options_facts.yml is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4d5637fd8a3d76489da823f73401609b756afb6c)

commit | commitdiff | tree

Sébastien Han [Fri, 12 Oct 2018 16:32:40 +0000 (18:32 +0200)]

validate: only check device when they are devices

We only validate the devices that are passed if there is a list of
devices to validate.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2888c0825fe11176d3cd06fe3568fe340d6d6538)

commit | commitdiff | tree

Sébastien Han [Thu, 11 Oct 2018 16:01:10 +0000 (18:01 +0200)]

plugin: validate.py do not check osd_scenario

osd_scenario now defaults to lvm and should not be changed. So we don't
need to test it.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 72211d4a247f7485dc24be05ba0d19aa9b6cb52d)

commit | commitdiff | tree

Sébastien Han [Thu, 11 Oct 2018 15:59:31 +0000 (17:59 +0200)]

plugin: validate lint

Make python linter happy.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e2467272f86bbba064ef3928b14f927a6b0fa3d4)

commit | commitdiff | tree

Sébastien Han [Wed, 10 Oct 2018 19:38:27 +0000 (15:38 -0400)]

doc: update osd scenario

This commit adds documentation for the lvm scenario and the deprecation
of the collocated and non-collocated scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f8e1542423e82d622adc037bec8bf1d19a4a5510)

commit | commitdiff | tree

Sébastien Han [Wed, 10 Oct 2018 19:17:38 +0000 (15:17 -0400)]

osd: default osd_scenario to lvm

osd_scenario has become obsolete and defaults to lvm. With lvm there is
no such thing as collocated and non-collocated.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 52df15895b2522b0bdfae4174aae8e5bafe4381b)

commit | commitdiff | tree

Sébastien Han [Wed, 10 Oct 2018 19:16:43 +0000 (15:16 -0400)]

validate: print a message for old scenarios

ceph-disk is not supported anymore, so all the newly created OSDs will
be configured using ceph-volume.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9ea1e494076c819cc623fca234ece79fd38883ba)

commit | commitdiff | tree

Sébastien Han [Tue, 2 Oct 2018 21:54:57 +0000 (23:54 +0200)]

osd: remove ceph-disk support

We no longer support preparing OSDs with ceph-disk; only ceph-volume is
supported. However, starting existing OSDs is still supported.
So if, say, you change a config option, the handlers will be able to
restart all the OSDs via their respective systemd unit files.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2a5aa062eae90b154e98c3c5f6d6a427c28bf97)

commit | commitdiff | tree

Dimitri Savineau [Tue, 9 Apr 2019 16:20:35 +0000 (12:20 -0400)]

tests: Add debug to ceph-override.json

It's useful to have debug logging enabled in order to give
developers more information.
Also reindent the json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872628607e37741c760aa31b88229f3da)

commit | commitdiff | tree

Dimitri Savineau [Tue, 9 Apr 2019 16:18:43 +0000 (12:18 -0400)]

tests/functional: use ceph-override.json symlink

We don't need to have multiple ceph-override.json copies. We
already have a symlink to all_daemons/ceph-override.json, so
we can do it for all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18d748a5133216a2339f1989a0e0b27b)

commit | commitdiff | tree

Dimitri Savineau [Thu, 4 Apr 2019 13:33:05 +0000 (09:33 -0400)]

ceph-mds: Set application pool to cephfs

We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something other than the default
value, it will break the application pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2efb7f02b9e6f6888449dbaeba0e2435606ca43)

commit | commitdiff | tree

Guillaume Abrioux [Thu, 11 Apr 2019 07:16:28 +0000 (09:16 +0200)]

update: fix undefined error when no mgr group is declared

If the mgr group isn't defined in the inventory, that task will fail with
an undefined error.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c1e4529b0e44329f561fc46ec96bd081cdd6d38c)

commit | commitdiff | tree

Guillaume Abrioux [Wed, 10 Apr 2019 15:16:21 +0000 (17:16 +0200)]

osds: allow passing devices by path

ceph-volume didn't work when the devices were passed by path.
Since it now supports it, let's allow this feature in ceph-ansible.

Closes: #3812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7e0adca7a45fe841e40eed774a15a6cc97e638e4)

commit | commitdiff | tree

Dimitri Savineau [Tue, 26 Feb 2019 14:16:37 +0000 (09:16 -0500)]

rgw: change default frontend on nautilus

As discussed in ceph/ceph#26599, beast is now the default frontend
for rados gateway with nautilus release.
Add rgw_thread_pool_size variable with 512 as default value and keep
backward compatibility with num_threads option when using civetweb.
Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size
change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d17b1b48b6d4259f88445a0752e1c13b4522ced0)

commit | commitdiff | tree

Guillaume Abrioux [Tue, 9 Apr 2019 15:38:01 +0000 (17:38 +0200)]

mon: remove useless delegate_to

Let's use a condition to run this task only on the first mon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 631e5d31446e514a00428c345d0db9d4c9c9cdab)

commit | commitdiff | tree

Matthew Vernon [Wed, 27 Mar 2019 13:34:47 +0000 (13:34 +0000)]

UCA: Uncomment UCA variables in defaults, fix consequent breakage

The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", so try and install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a3dcc12683b55ae13d95bca6f15cd32)

commit | commitdiff | tree

Dimitri Savineau [Mon, 1 Apr 2019 16:12:52 +0000 (12:12 -0400)]

container-common: Enable docker on boot for ubuntu

docker daemon is automatically started during package installation
but the service isn't enabled on boot.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 37816570c6204ccc37279f7120309ad31cf3f5cb)

commit | commitdiff | tree

Dimitri Savineau [Fri, 15 Mar 2019 14:18:48 +0000 (10:18 -0400)]

rolling_update: Remove ceph aliases

ceph aliases have been introduced in stable-3.2 during the ceph
deployment. On master this has been removed but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster missing from
d9e7835

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 57b4e76d11e0e3bb58c299ae5d37617ea06e2c17)

commit | commitdiff | tree

Rishabh Dave [Tue, 12 Feb 2019 03:15:44 +0000 (08:45 +0530)]

allow adding a MDS to already deployed cluster

Add a tox scenario that adds a new MDS node as a part of an already
deployed Ceph cluster and deploys MDS there.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit c0dfa9b61a36194006b55105bf30079172d26f5e)

commit | commitdiff | tree

Dimitri Savineau [Fri, 5 Apr 2019 19:04:45 +0000 (15:04 -0400)]

ceph-facts: use last ipv6 address for mon/rgw

When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's a VIP on that network, the first filter could return the
wrong value.
This seems to affect only IPv6 setups because the VIP addresses are
added to the ansible facts at the beginning of the list. With IPv4 it is
the opposite: they are added at the end.
This causes the mon/rgw processes to bind on the VIP address.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd4b0ec7eb4502eb246b84c1fefa213ae04df151)
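
As a rough illustration of the first-versus-last selection described in this commit message, the sketch below filters a list of addresses by CIDR with Python's standard `ipaddress` module. The address list, its ordering, and the helper name `addresses_in_block` are assumptions made for illustration; the role itself works on Ansible facts and filters, not on this code.

```python
import ipaddress


def addresses_in_block(addresses, cidr):
    """Return the addresses that belong to the given CIDR, in input order."""
    network = ipaddress.ip_network(cidr)
    return [a for a in addresses if ipaddress.ip_address(a) in network]


# Hypothetical facts for an IPv6 host: the VIP is listed first,
# the host's own address last, and a link-local address is outside the block.
facts = ["2001:db8::100", "fe80::1", "2001:db8::10"]

matches = addresses_in_block(facts, "2001:db8::/64")
first_match = matches[0]   # would pick the VIP in this ordering
last_match = matches[-1]   # would pick the host's own address
```

With this assumed ordering, taking the first match selects the VIP, while taking the last match selects the node's own address.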

commit | commitdiff | tree

François Lafont [Sat, 6 Apr 2019 09:44:03 +0000 (11:44 +0200)]

ceph-rgw: Fix bad paths which depend on the clustername

The path of the RGW environment file (in the /var/lib/ceph/radosgw/
directory) depends on the Ceph clustername. It was not taken into
account in the Ansible role `ceph-rgw`.

Signed-off-by: flaf <francois.lafont.1978@gmail.com>
(cherry picked from commit 4c3e77d8690a7be4fb89f7292c51f8644faaeafa)

commit | commitdiff | tree

Guillaume Abrioux [Mon, 8 Apr 2019 11:56:01 +0000 (13:56 +0200)]

mgr: manage mgr modules when mgr and mon are collocated

When mgrs are implicitly collocated on monitors (no mgrs in mgrs group),
that include was skipped because of this condition:

`inventory_hostname == groups[mgr_group_name][0]`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cbfdbab177e3aed4132be3b1f175c85b412dbe97)

commit | commitdiff | tree

Guillaume Abrioux [Mon, 8 Apr 2019 11:34:59 +0000 (13:34 +0200)]

mgr: wait for all mgr to be available

Before managing mgr modules, we must ensure all mgrs are available,
otherwise we can hit a failure like the following:

```
stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
```

It happens because not all mgrs are available yet when trying to manage
mgr modules.

Closes: #3100
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f596cc1711428ee801fc04be2a94b36a5e342a00)

commit | commitdiff | tree

Guillaume Abrioux [Mon, 8 Apr 2019 13:09:47 +0000 (15:09 +0200)]

tests: fix update job

Jenkins sets `CEPH_ANSIBLE_BRANCH` to `stable-4.0`, which makes all
nightly jobs fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Ali Maredia [Thu, 31 Jan 2019 20:43:21 +0000 (20:43 +0000)]

rgw multisite: add more than 1 rgw to the master or secondary zone

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 37f46a8c5de9585c2639cc4741ee8f62bc2c854b)

commit | commitdiff | tree

fpantano [Wed, 3 Apr 2019 16:35:10 +0000 (18:35 +0200)]

Check ceph_health_raw.stdout value as string during mon bootstrap

According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting for the cluster quorum or
when the container bootstrap has not finished.
It avoids the failure of the command execution when it doesn't
receive a valid json object to decode (because the cluster is too
slow to bootstrap compared to ceph-ansible task execution).

Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit afbb90e4acb6e0bacddf52bb512bef74b013fa68)
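
The idea of checking the raw output as a string before decoding it can be sketched in Python as follows. This is only an illustration under assumptions: `parse_health`, `wait_for_health`, and the sample outputs are hypothetical names and data, and the playbook implements the check with Ansible retry/delay logic rather than with this code.

```python
import json
import time


def parse_health(stdout):
    """Return the decoded JSON object, or None if stdout is not one yet."""
    if not isinstance(stdout, str) or not stdout.strip():
        return None
    try:
        parsed = json.loads(stdout)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    return parsed if isinstance(parsed, dict) else None


def wait_for_health(fetch, retries, delay):
    """Call fetch() up to `retries` times until it yields a JSON object."""
    for attempt in range(retries):
        health = parse_health(fetch())
        if health is not None:
            return health
        if attempt < retries - 1:
            time.sleep(delay)
    return None


# Hypothetical command outputs: empty while bootstrapping, then valid JSON.
outputs = iter(["", "not json", '{"status": "HEALTH_OK"}'])
result = wait_for_health(lambda: next(outputs), retries=5, delay=0)
```

An empty or malformed output is treated as "not ready yet" and triggers another attempt instead of a decoding failure.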

commit | commitdiff | tree

Dimitri Savineau [Tue, 2 Apr 2019 14:39:42 +0000 (10:39 -0400)]

radosgw: Raise cpu limit to 8

In containerized deployment the default radosgw quota is too low
for production environment.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05fe46933a1437501b0d8a5edb4ca2056)

commit | commitdiff | tree

Guillaume Abrioux [Thu, 4 Apr 2019 02:09:12 +0000 (04:09 +0200)]

tests: add back testinfra testing

136bfe0 removed testinfra testing on all scenarios except all_daemons.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d106c2c58d354e10335ca017fd8df4c427e38a6)

commit | commitdiff | tree

Guillaume Abrioux [Thu, 4 Apr 2019 02:01:01 +0000 (04:01 +0200)]

tests: pin pytest-xdist to 1.27.0

It looks like newer versions of pytest-xdist require pytest>=4.4.0.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211cc00b2cae14b018722f437c0091a2ef)

commit | commitdiff | tree

Guillaume Abrioux [Wed, 3 Apr 2019 12:45:21 +0000 (14:45 +0200)]

tests: fix update scenario (stable-4.0)

perform upgrade from luminous to nautilus.

All dev* references are not needed anymore in stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 2 Apr 2019 08:43:01 +0000 (10:43 +0200)]

purge: fix lvm-batch purge osd

`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.

These tasks should be run only if these variables are defined; otherwise
they end up with undefined variable errors.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 01807383132c2897e331dcc665f062f8be0feeb8)

commit | commitdiff | tree

Guillaume Abrioux [Tue, 2 Apr 2019 15:31:25 +0000 (17:31 +0200)]

tests: switch rhel-container-podman to nautilus

in stable-4.0 this should be set to nautilus.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Mon, 1 Apr 2019 20:02:28 +0000 (16:02 -0400)]

ceph-volume: Add PYTHONIOENCODING env variable

Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses
stdout encoding based on LC_CTYPE and PYTHONIOENCODING environment
variables.
Those variables aren't set when using ansible.
Currently that ceph commit breaks non-containerized deployments on Ubuntu.

TASK [use ceph-volume to create bluestore osds] ********************
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - create
  - --bluestore
  - --data
  - /dev/sdb
  rc: 1
  stderr: |-
    Traceback (most recent call last):
    (...)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in
    position 132: ordinal not in range(128)

Note that the task is failing on ansible side due to the stdout
decoding but the osd creation is successful.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7e5e4229b794c354d9cc7a5951f00356ea3418c4)
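
The decoding failure quoted in this commit message can be reproduced in isolation. The sketch below is a minimal illustration, assuming some output that contains a non-ASCII character; the sample string is made up and is not actual ceph-volume output.

```python
# U+2018 and U+2019 are curly quotes; in UTF-8 the first byte is 0xe2.
data = "\u2018ok\u2019".encode("utf-8")

try:
    data.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    # Same failure class as in the traceback quoted in the commit message.
    ascii_error = exc.reason

# Decoding with an explicit UTF-8 codec succeeds.
decoded = data.decode("utf-8")
```

Setting `PYTHONIOENCODING` to a UTF-8 value in the task environment makes the Python process use that encoding for its standard streams instead of falling back to ASCII.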

commit | commitdiff | tree

Guillaume Abrioux [Mon, 1 Apr 2019 15:22:50 +0000 (17:22 +0200)]

tests: test idempotency only on all_daemons job

there's no need to test this on all scenarios.
testing idempotency on all_daemons should be enough and allow us to save
precious resources for the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 136bfe096c5e97c5c983d02882919d4af2af48a6)

commit | commitdiff | tree

Dimitri Savineau [Fri, 29 Mar 2019 15:23:58 +0000 (11:23 -0400)]

tox: Set nautilus as default release

On stable-4.0 branch we don't want to use dev setup but stable
release (nautilus).
Also update the container image tag to reflect this change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 5 Mar 2019 07:44:25 +0000 (08:44 +0100)]

remove all NBSPs on master branch

Similar to #3658

Since there are too many changes between master and stable branches, let's
commit directly in each branch instead of trying to backport this
commit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Wed, 27 Mar 2019 18:11:20 +0000 (14:11 -0400)]

container: Add python3-docker on Ubuntu bionic

When installing python-minimal on Ubuntu bionic, this will add the
/usr/bin/python symlink to the default python interpreter.
On bionic, this isn't python2 but python3.

$ /usr/bin/python --version
Python 3.6.7

The python docker library is only installed for python2, which causes
issues when running the purge-docker-cluster playbook. This playbook
uses the ansible docker modules, which require the python bindings to be
installed on the remote host.
Without the bindings, the docker module reports a python error.

msg: Failed to import docker or docker-py - No module named 'docker'.
Try `pip install docker` or `pip install docker-py` (Python 2.6)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Tue, 26 Mar 2019 18:57:54 +0000 (14:57 -0400)]

tests/functional: Use the ansible reboot module

Ansible 2.7 introduces the reboot module so we don't need to use the
shell/reboot + wait_for tasks.

https://docs.ansible.com/ansible/latest/modules/reboot_module.html

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Tue, 26 Mar 2019 19:22:41 +0000 (15:22 -0400)]

tox: Fix container purge jobs

On containerized CI jobs the playbook executed is purge-cluster.yml
but it should be set to purge-docker-cluster.yml

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Tue, 12 Mar 2019 15:22:03 +0000 (11:22 -0400)]

rolling_update: Update systemd unit regex for nvme

The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
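
To show why a device-name pattern can miss NVMe devices, here is a hedged Python sketch. Both patterns and the `ceph-osd@<device>.service` unit naming are assumptions chosen for illustration; they are not the regex used in the rolling_update playbook.

```python
import re

# Hypothetical "before" pattern: numeric OSD ids or letter-only device names.
old_pattern = re.compile(r"ceph-osd@(\d+|[a-z]+)\.service")

# Hypothetical "after" pattern: also accepts names of the form nvmeXnY.
new_pattern = re.compile(r"ceph-osd@(\d+|[a-z]+|nvme\d+n\d+)\.service")

units = [
    "ceph-osd@3.service",
    "ceph-osd@sdb.service",
    "ceph-osd@nvme0n1.service",
]

old_matches = [u for u in units if old_pattern.fullmatch(u)]
new_matches = [u for u in units if new_pattern.fullmatch(u)]
```

The letter-only alternative stops at the first digit in `nvme0n1`, so the NVMe unit is only matched once a dedicated alternative is added.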

commit | commitdiff | tree

Dimitri Savineau [Fri, 22 Mar 2019 19:51:35 +0000 (15:51 -0400)]

travis: Remove galaxy lint rules repository

The galaxy-lint-rules github repository isn't used anymore and has
been archived.
All the rules are now part of the ansible-lint project.

https://github.com/ansible/galaxy-lint-rules
https://github.com/ansible/ansible-lint

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Fri, 22 Mar 2019 19:03:15 +0000 (15:03 -0400)]

Add uca to ceph_repository choices validation

Ubuntu cloud archive is configurable via ceph_repository variable but
the uca choice isn't accepted.
This commit fixes this issue and also validates the associated uca
repository variables.

Resolves: #3739

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 25 Mar 2019 14:10:23 +0000 (15:10 +0100)]

rgw: fix a typo

ee2d52d33df2a311cdf0ff62abd353fccb3affbc introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 25 Mar 2019 14:08:22 +0000 (15:08 +0100)]

rgw: cleanup legacy task

this task was here for backward compatibility.
It's time to remove it in the next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 25 Mar 2019 14:04:50 +0000 (15:04 +0100)]

rgw: add a retry on pool related tasks

sometimes those tasks might fail because of a timeout.
I've been facing this several times in the CI, adding this retry might
help and won't hurt in any case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 25 Mar 2019 13:50:09 +0000 (14:50 +0100)]

update: followup on edfdc49

all rgw instances should be stopped according to the multiple rgw
instances support added in rolling_update.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 25 Mar 2019 08:48:48 +0000 (09:48 +0100)]

update: add containerized deployment upgrade support (L->N)

Add a couple of fixes so that containerized deployments can be
upgraded from luminous/mimic to nautilus.

- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run `create_mds_filesystems.yml` unnecessarily when performing an
upgrade.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 21 Mar 2019 12:28:33 +0000 (13:28 +0100)]

update: add missing hosts in facts gathering

iscsigws were missing.
The 'complete upgrade' couldn't complete because rolling_update was set
to False for iscsigw nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 21 Mar 2019 08:00:02 +0000 (09:00 +0100)]

update: remove rbdmirror legacy task

This task is no longer needed for next release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 18:11:32 +0000 (19:11 +0100)]

update: show all daemons version at the end

Let's display all daemons version at the end of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 17:34:47 +0000 (18:34 +0100)]

facts: retrieve fsid during rolling_update playbook

Otherwise it generates a new cluster fsid and makes the upgrade fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 16:53:22 +0000 (17:53 +0100)]

mon: fetch initial keyring even when running rolling_update

otherwise, the task to copy mgr keyring fails during the rolling_update.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 12:42:00 +0000 (13:42 +0100)]

tests: split tox configuration into multiple pieces

This file is becoming too big, let's isolate the update related code in
a dedicated tox configuration file.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 12:25:26 +0000 (13:25 +0100)]

update: enable new nautilus-only functionality

once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 12:22:46 +0000 (13:22 +0100)]

update: enable msgr2 protocol

This commit enables the msgr2 protocol when the cluster is fully upgraded
to nautilus.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 10:44:11 +0000 (11:44 +0100)]

update: ensure mgrs are upgraded after ALL monitors

As of 1c760904b0bd1b6b0f49d6ac19d87d79f185c18f, ceph-ansible implicitly
bootstraps managers on monitors.
mgrs must be upgraded only after all monitors, therefore, this commit
refactors the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.

This commit also ensures we handle the case where we split managers on
dedicated nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 10:38:28 +0000 (11:38 +0100)]

update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present

This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors as soon
as the first iteration because of the task right after which creates and sends
the keyrings on all monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 10:35:36 +0000 (11:35 +0100)]

update: mask systemd service units during upgrade

This prevents the packaging from restarting services before we do need
to restart them in the rolling update sequence.
We want to handle services restart at rolling_update playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 08:46:21 +0000 (09:46 +0100)]

update: set osd flags only once

There is no need to set osd flags (noout, norebalance) each time we
upgrade a mon.

This commit moves up those tasks (before stopping the mon) so we don't need
to delegate them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 08:31:12 +0000 (09:31 +0100)]

update: fix tasks waiting for the node to join the quorum

We actually want to ensure the node being upgraded is joining the quorum
instead of the monitor picked up earlier.

Indeed, the `mon_host` is used only in `delegate_to:` so we can still run ceph
commands while the monitor being upgraded is stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 08:25:49 +0000 (09:25 +0100)]

update: remove an old parameter in ceph_key module call

The `containerized` parameter in the ceph_key module doesn't exist anymore.
This was making the module fail, but the failure was hidden because of the
`ignore_errors: True`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 08:14:17 +0000 (09:14 +0100)]

ceph_key: `lookup_ceph_initial_entities` shouldn't fail on update

As of nautilus, the initial keyrings list has changed, it means when
upgrading from Luminous or Mimic, it is expected there's a mismatch
between what is found on the cluster and the expected initial keyring
list hardcoded in ceph_key module. We shouldn't fail when upgrading to
nautilus.

str_to_bool() taken from ceph-volume.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Alfredo Deza <adeza@redhat.com>
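
For readers unfamiliar with this kind of helper, a string-to-boolean conversion can be sketched as below. This is an illustrative version written for this log, not the code from ceph-volume or the ceph_key module; the accepted spellings are assumptions.

```python
def str_to_bool(value):
    """Convert common textual spellings of a boolean to True or False."""
    if isinstance(value, bool):
        return value
    normalized = str(value).strip().lower()
    if normalized in ("true", "yes", "1"):
        return True
    if normalized in ("false", "no", "0"):
        return False
    raise ValueError("cannot interpret %r as a boolean" % (value,))


enabled = str_to_bool("True")
disabled = str_to_bool(" no ")
```

A helper like this is useful when a flag arrives as text, for example through an environment variable, and must not be treated as truthy merely because the string is non-empty.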

commit | commitdiff | tree

Guillaume Abrioux [Wed, 20 Mar 2019 06:46:23 +0000 (07:46 +0100)]

handlers: do not trigger handlers on rolling_update

The rolling_update playbook already takes care of stopping/starting services
during the sequence. There's no need to trigger potentially unwanted
service restarts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Wed, 20 Mar 2019 19:30:46 +0000 (15:30 -0400)]

ceph-osd: Ensure lvm2 is installed

When using osd_scenario lvm, we never check if the lvm2 package is
present on the host.
When using containerized deployment and docker on CentOS/RedHat this
package will be automatically installed as a dependency but not for
Ubuntu distribution.
OSDs deployed via ceph-volume require the lvmetad.socket to be active
and running.

Resolves: #3728

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Bruceforce [Tue, 19 Mar 2019 17:23:56 +0000 (18:23 +0100)]

ceph_crush: fix rstrip for python 3

Removing bytes literals since rstrip only supports type String or None.

Please backport to stable-3.2

Signed-off-by: Bruceforce <markus.greis@gmx.de>
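
The Python 3 behaviour behind this fix can be shown with a short sketch. The sample value is made up; only the `str.rstrip` behaviour is the point.

```python
raw = b"default\n"          # hypothetical command output as bytes
text = raw.decode("utf-8")  # str in Python 3

# Stripping with a str argument works on a str.
cleaned = text.rstrip("\n")

# Passing a bytes literal to str.rstrip raises TypeError on Python 3.
try:
    text.rstrip(b"\n")
    bytes_arg_error = None
except TypeError as exc:
    bytes_arg_error = type(exc).__name__
```

On Python 2 a bytes literal and a str literal were the same type, which is why the mismatch only shows up after moving to Python 3.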

commit | commitdiff | tree

Phuong Nguyen [Wed, 6 Mar 2019 02:38:50 +0000 (13:38 +1100)]

Remove trailing forward slash in ceph_docker_registry variable from group_vars/rhcs.yml.sample file.

Also fixed rhcs_edits.txt for variable ceph_docker_registry.

Moved namespace to ceph_docker_image variable.

Signed-off-by: Phuong Nguyen <pnguyen@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Sat, 9 Mar 2019 07:55:12 +0000 (08:55 +0100)]

osd: backward compatibility with old disk_list.sh location

Since all files in container image have moved to `/opt/ceph-container`
this check must look for new AND the old path so it's backward
compatible. Otherwise it could end up templating an inconsistent
`ceph-osd-run.sh`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Thu, 14 Mar 2019 20:22:01 +0000 (16:22 -0400)]

ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet

When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.

TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}

With this patch we will fail before the ceph deployment with an
explicit failure message.

Resolves: rhbz#1673687

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Fri, 15 Mar 2019 15:30:15 +0000 (11:30 -0400)]

ceph-common: Install yum plugin priorities

When using community repository we need to set the priority on the
ceph repositories because we could have some conflict with EPEL
packages.
In order to set the priority on the ceph repositories, we need to
install the yum-plugin-priorities package.

http://docs.ceph.com/docs/master/install/get-packages/#rpm-packages

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 14 Mar 2019 21:03:16 +0000 (22:03 +0100)]

Revert "site.yml: run ceph-validate before facts/defaults roles"

This commit wasn't making any sense and should have never got merged.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Rishabh Dave [Mon, 11 Mar 2019 10:49:32 +0000 (16:19 +0530)]

don't append path components while calling os.path.join()

This creates confusion about whether directory/file names are being
formed by appending strings or path components are being appended.
Since the latter should never be done manually, get rid of the statements
creating confusion.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

commit | commitdiff | tree

Rishabh Dave [Mon, 11 Mar 2019 10:22:42 +0000 (15:52 +0530)]

don't use os.path.join() on a single path component

Signed-off-by: Rishabh Dave <ridave@redhat.com>

commit | commitdiff | tree

Rishabh Dave [Mon, 11 Mar 2019 10:20:08 +0000 (15:50 +0530)]

use os.path.join() correctly

os.path.join adds the separator (i.e. '/') between the provided path
components only if needed. Providing a single path component doesn't
lead to any checks.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
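
The separator handling described in this commit message can be checked with a few calls. The sketch uses `posixpath.join`, which is what `os.path.join` resolves to on POSIX systems, so the results are the same on any platform; the paths are arbitrary examples.

```python
import posixpath

# The separator is added only where it is missing.
joined_plain = posixpath.join("/var/lib/ceph", "osd")
joined_trailing = posixpath.join("/var/lib/ceph/", "osd")

# A single component is returned unchanged, so the call is a no-op.
single = posixpath.join("/var/lib/ceph")

# Concatenating by hand inside join() also reduces to the single-component case.
manual = posixpath.join("/var/lib/ceph" + "/osd")
```

Passing each component as its own argument lets the function decide about separators, which is the usage the commit moves towards.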

commit | commitdiff | tree

wumingqiao [Fri, 8 Mar 2019 06:56:55 +0000 (14:56 +0800)]

ceph-mgr: run mgr_modules.yml only on the first mgr host

the task will be delegated to mons[0] for all mgr hosts, so we can just run it on the first host and have the same effect.

Signed-off-by: wumingqiao <wumingqiao@beyondcent.com>

commit | commitdiff | tree

Dimitri Savineau [Thu, 7 Mar 2019 22:14:12 +0000 (17:14 -0500)]

Set the default crush rule in ceph.conf

Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.

This implies two behaviors:

1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to
unnecessary daemon restarts.

2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).

This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if it exists) from the current ceph
configuration.
The default configuration is -1 (ceph default).
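
The "read if it exists, otherwise fall back to -1" logic can be sketched as follows. This is only an illustration: it assumes the value lives in an ini-style `[global]` section under the key `osd pool default crush rule`, and the function name `default_crush_rule` is hypothetical. The patch itself may read the value differently.

```python
import configparser


def default_crush_rule(conf_text):
    """Return the configured default crush rule id, or -1 when it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback covers both a missing section and a missing option.
    return parser.getint("global", "osd pool default crush rule", fallback=-1)


with_rule = "[global]\nfsid = abc\nosd pool default crush rule = 2\n"
without_rule = "[global]\nfsid = abc\n"

print(default_crush_rule(with_rule))
print(default_crush_rule(without_rule))
```

Keeping the fallback at -1 means a fresh deployment renders the same value as the ceph default.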

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Mon, 11 Mar 2019 14:44:47 +0000 (10:44 -0400)]

ceph-osd: Install numactl package when needed

With 3e32dce we can run OSD containers with numactl support.
When using numactl command in a containerized deployment we need to
be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Sat, 9 Mar 2019 08:33:46 +0000 (09:33 +0100)]

samples: resync sample files

I suspect `./generate_group_vars_sample.sh` wasn't used in
b8d580b3f48c69ba9882df773c4d144b73d01c95 because it introduced a typo in
`group_vars/all.yml.sample` and `group_vars/clients.yml.sample`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Sat, 9 Mar 2019 08:24:46 +0000 (09:24 +0100)]

osd: support numactl options on OSD activate

This commit adds numactl support when activating OSD containers.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Fri, 8 Mar 2019 20:15:39 +0000 (15:15 -0500)]

add-osd.yml: Add become flag for ceph-validate

The check_devices task fails if the ceph-validate role isn't executed
as a privileged user (Permission denied).

failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error:
Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb",
"msg": "Error while getting device information with parted script:
'/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Thu, 7 Mar 2019 17:31:39 +0000 (12:31 -0500)]

systemd/service: Set docker.service conditionally

We don't need to set After=docker.service when the container_binary
variable isn't set to docker.
It doesn't break anything currently but it could be confusing when
using podman.
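
A minimal sketch of the conditional dependency, written in Python rather than in the jinja template the role actually uses. The function name and the unit description are hypothetical.

```python
def render_unit_section(container_binary):
    """Build a [Unit] section, adding the docker ordering only for docker."""
    lines = ["[Unit]", "Description=Ceph daemon (illustrative)"]
    if container_binary == "docker":
        # Only order after docker.service when docker is the runtime in use.
        lines.append("After=docker.service")
    return "\n".join(lines)


print(render_unit_section("docker"))
print(render_unit_section("podman"))
```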

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Tue, 5 Mar 2019 22:29:40 +0000 (17:29 -0500)]

common: Use rhsm_repository module for RHCS

Instead of using subscription-manager with command module we can use
the rhsm_repository ansible module.
This module already uses the repos list feature to determine whether a
repository is enabled or not. That makes the module idempotent, so
we don't need changed_when: false anymore.
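
The idempotency argument can be sketched as below. This is not the rhsm_repository implementation; `ensure_repos_enabled` and the repository names are hypothetical and only show why a "check the current state first" module can report `changed` truthfully.

```python
def ensure_repos_enabled(currently_enabled, wanted):
    """Return the resulting repo set and whether anything had to change."""
    missing = [repo for repo in wanted if repo not in currently_enabled]
    result = sorted(set(currently_enabled) | set(wanted))
    # changed is True only when at least one repository had to be enabled.
    return {"changed": bool(missing), "enabled": result}


first_run = ensure_repos_enabled(["base"], ["base", "ceph-tools"])
second_run = ensure_repos_enabled(first_run["enabled"], ["base", "ceph-tools"])

print(first_run["changed"], second_run["changed"])
```

A second run with the same input reports no change, which is the property that makes overriding `changed_when` unnecessary.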

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Wed, 6 Mar 2019 19:18:52 +0000 (14:18 -0500)]

ceph_key: Use client name to build key path

Because the client name is part of the client key path we can reuse
the user variable to build this path.
Also remove a duplicate user variable declaration.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Mon, 4 Mar 2019 15:29:09 +0000 (10:29 -0500)]

travis: Add python 2.7

Because we're still using Linux distributions with python 2.7 (like
CentOS/RHEL 7), it could be useful to run travis tests against python
2.7 even though its support will end in 2020.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Mon, 4 Mar 2019 16:10:24 +0000 (11:10 -0500)]

common: Add noarch to community repository

The ceph stable community repository only enables the basearch
packages URL.
Add the noarch URL because, starting with the nautilus release, some
packages are published there and are useful for mgr or grafana.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Tue, 5 Mar 2019 19:17:11 +0000 (14:17 -0500)]

Force osd pool min_size value to integer

After b8d580b and e9e5d5a we could have either item.min_size or
osd_pool_default_min_size stored as a string instead of an int, causing
the condition to evaluate to true when it should be false.
As a result, the task could try to set the pool min_size value to
0, which leads to:

Error EINVAL: pool min_size must be between 1 and 1
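
The type mismatch can be illustrated with plain Python, which compares values the same way jinja does here. `needs_min_size_update` is a hypothetical stand-in for the task condition, not the condition used in the role.

```python
def needs_min_size_update(min_size):
    """Hypothetical check: only set min_size when a non-zero value is requested."""
    return min_size != 0


# Templated variables often arrive as strings, and "0" never equals the int 0.
as_string = needs_min_size_update("0")

# Coercing to int first makes the comparison behave as intended.
as_int = needs_min_size_update(int("0"))

print(as_string, as_int)  # True False
```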

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Tue, 5 Mar 2019 14:28:14 +0000 (09:28 -0500)]

Add CONTAINER_IMAGE env var to ceph daemons

Ceph daemons will set the CONTAINER_IMAGE environment variable value
in the daemon metadata.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 5 Mar 2019 12:46:18 +0000 (13:46 +0100)]

fix pool min_size customization

b8d580b3f48c69ba9882df773c4d144b73d01c95 introduced a bug when
`min_size` isn't set (default to 0).

Typical error:

```
Error EINVAL: pool min_size must be between 1 and 1
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Radu Toader [Thu, 28 Feb 2019 20:58:31 +0000 (22:58 +0200)]

Customize pools min_size

Signed-off-by: Radu Toader <radu.m.toader@gmail.com>

commit | commitdiff | tree

Radu Toader [Thu, 28 Feb 2019 16:46:29 +0000 (18:46 +0200)]

When creating pool, read pool.application and make the call to ceph osd pool enable application

Signed-off-by: Radu Toader <radu.m.toader@gmail.com>

commit | commitdiff | tree

Kevin Coakley [Fri, 1 Mar 2019 18:53:03 +0000 (10:53 -0800)]

Updated 7 ansible-lint issues in the ceph-mon, ceph-osd, and ceph-rgw roles

The following lint issues have been resolved:

[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml:2

[305] Use shell only when shell functionality is required
/home/travis/build/ceph/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:47

[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:2

[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:7

[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:14

[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:19

[301] Commands should not change things if nothing needs doing
/home/travis/build/ceph/ceph-ansible/roles/ceph-rgw/tasks/multisite/destroy.yml:24

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 4 Mar 2019 16:45:51 +0000 (17:45 +0100)]

nfs: fix systemd template service for ubuntu

`mkdir` is located in `/bin` on Ubuntu.
Let's use some jinja to support Ubuntu.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 4 Mar 2019 16:35:37 +0000 (17:35 +0100)]

tests: add symlink for ubuntu hosts inventory

Otherwise, a bunch of jobs will fail as follows:

```
[WARNING]: Unable to parse /home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-ubuntu-container-stable-3.2-bluestore_lvm_osds/tests/functional/bs-lvm-osds/container/hosts-ubuntu as an inventory source
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 4 Mar 2019 09:01:07 +0000 (10:01 +0100)]

tests: pin testinfra version

As of testinfra 2.0.0, the binary name is `py.test`.

But let's pin the version to 1.19.0.
Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit.
Since we don't have the bandwidth ATM for this, it's better to simply
keep testing with testinfra 1.19.0.

Note that I've replaced all `testinfra` occurrences by `py.test` anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 4 Mar 2019 08:23:00 +0000 (09:23 +0100)]

add-osd: gather facts in second part of playbook

Otherwise, it will end up with an error like the following:

```
FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"}
```

because facts won't have been gathered.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 1 Mar 2019 17:23:39 +0000 (18:23 +0100)]

purge: fix rbd-mirror group name

the default is rbdmirrors in ceph-defaults

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 1 Mar 2019 15:47:36 +0000 (16:47 +0100)]

purge: fix rbd mirror purge

as of b70d54ac809a92cd88e39e3efa7ed3fee864a866 the service launched isn't
ceph-rbd-mirror@admin.service.

it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 1 Mar 2019 13:45:48 +0000 (14:45 +0100)]

purge: do not remove /var/lib/apt/lists/*

Removing the content of this directory seems a bit aggressive and causes a
redeployment to fail after a purge on Debian-based distributions.

Typical error:
```
fatal: [mon0]: FAILED! => changed=false
  attempts: 3
  msg: No package matching 'ceph' is available
```

The following task will consider the cache is still valid, so apt
doesn't refresh it:
```
- name: update apt cache if cache_valid_time has expired
  apt:
    update_cache: yes
    cache_valid_time: 3600
  register: result
  until: result is succeeded
```

Since the task installing ceph packages has `update_cache: no`, it
fails:

```
- name: install ceph for debian
  apt:
    name: "{{ debian_ceph_pkgs | unique }}"
    update_cache: no
    state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}"
    default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}"
  register: result
  until: result is succeeded
```
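
The interaction between the two tasks above can be sketched as follows. This is a simplified model, not the apt module's code: it assumes the freshness check relies on a timestamp that survives the purge, and both function names are hypothetical.

```python
import time


def apt_cache_is_fresh(stamp_mtime, cache_valid_time, now=None):
    """Return True when the last recorded update is recent enough to skip a refresh."""
    now = time.time() if now is None else now
    return now - stamp_mtime < cache_valid_time


def packages_resolvable(lists_present, refreshed):
    """Package lookups need index files, either kept on disk or just re-downloaded."""
    return lists_present or refreshed


# The purge removed the index files ten minutes ago, but the timestamp survived.
now = 1_000_000.0
fresh = apt_cache_is_fresh(stamp_mtime=now - 600, cache_valid_time=3600, now=now)
refreshed = not fresh  # the refresh is skipped because the cache looks fresh

print(packages_resolvable(lists_present=False, refreshed=refreshed))  # False
```

With the index files gone and no refresh triggered, the install task has nothing to resolve 'ceph' against.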

/tmp/* isn't specific to ceph either, so we shouldn't remove everything
in this directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 1 Mar 2019 11:27:00 +0000 (12:27 +0100)]

purge: fix purge of lvm devices

Using the `shell` module seems to be the only way to make this task work
on both RHEL-based and Debian-based distributions.

On Ubuntu, using the `command` Ansible module fails as follows
(regardless of whether `sudo` is used):
```
ok: [osd1] => changed=false
  cmd: command -v ceph-volume
  failed_when_result: false
  msg: '[Errno 2] No such file or directory: ''command'': ''command'''
  rc: 2
```
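
The same failure mode can be reproduced outside Ansible. In this sketch, `command` is treated as a shell builtin, so executing it directly only works on systems that also ship a standalone `command` executable. The helper names are hypothetical.

```python
import shlex
import subprocess


def find_with_shell(binary):
    """Resolve a binary through `command -v`, run inside a shell."""
    result = subprocess.run(
        "command -v " + shlex.quote(binary),
        shell=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def find_without_shell(binary):
    """Same lookup without a shell; fails where `command` is only a builtin."""
    try:
        result = subprocess.run(
            ["command", "-v", binary], capture_output=True, text=True
        )
    except FileNotFoundError:
        # No standalone `command` executable on this system.
        return None
    return result.stdout.strip() if result.returncode == 0 else None


print(find_with_shell("sh"))
print(find_without_shell("sh"))
```

The second call returns `None` on systems without a standalone `command` binary, which matches the `[Errno 2]` error shown above.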

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Dimitri Savineau [Fri, 1 Mar 2019 15:41:55 +0000 (10:41 -0500)]

lint: Fix spaces before and after variables

ansible-lint reports:

[206] Variables should have spaces after {{ and before }}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
