b2f2426 didn't use the generate_group_vars_sample.sh script, so the
content of the group_vars samples currently differs from the
ceph-defaults/defaults directory.
Kyle Bader [Thu, 21 Mar 2019 18:54:34 +0000 (11:54 -0700)]
rgw: add cpuset support
1/ The OSD already supports cpusets for containerized deployments
through the ceph_osd_docker_cpuset_cpus variable. This adds similar
support to the RGW service for containerized deployments via a new
variable named ceph_rgw_docker_cpuset_cpus. Like the OSD, there are times
when pinning distinct cores has advantages over relying on the kernel's
CFS scheduler. ceph_rgw_docker_cpuset_cpus accepts a comma-delimited set
of CPU ids.
2/ Add support for the --cpuset-mems option to restrict the cgroup's memory
allocations to a particular NUMA node, which should typically correspond
to the CPU ids of that NUMA node provided with --cpuset-cpus. To ensure
the correct CPU ids are used one can run `numactl --hardware` to list the
nodes and which CPU ids correspond to each.
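For illustration, a hedged group_vars sketch of what such pinning could look like; ceph_rgw_docker_cpuset_cpus comes from this change, while the ceph_rgw_docker_cpuset_mems name is an assumption mirroring docker's --cpuset-mems option:
```
# Hypothetical group_vars/rgws.yml sketch: pin the containerized RGW
# to the cores of NUMA node 0 as reported by `numactl --hardware`.
ceph_rgw_docker_cpuset_cpus: "0,2,4,6"
# Assumed variable name, mirroring docker's --cpuset-mems option.
ceph_rgw_docker_cpuset_mems: "0"
```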
Until now it was not possible to install a specific container package
because it was hardcoded.
This patch allows overriding the container package name (docker.io
vs docker-ce) and refactors the package installation. The override can
be achieved via the container_package_name variable.
Instead of using one task per distribution we can set the package and
service names in vars, which allows a unified package task.
Also refactor the debian_prerequisites tasks because their content
was outdated.
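As a minimal sketch, the override could look like this in group_vars (the value is illustrative):
```
# Force docker-ce instead of the distribution's docker.io package.
container_package_name: docker-ce
```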
Andrew Schoen [Thu, 28 Mar 2019 19:34:48 +0000 (14:34 -0500)]
rolling_update: set num_osds to the number of running osds
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.
We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.
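A rough sketch of how the count could be gathered without ceph-volume; the actual task in rolling_update may differ, and the systemd unit pattern here is an assumption:
```
# Hypothetical sketch: count running ceph-osd@* units instead of
# asking ceph-volume, which can't see ceph-disk osds on nautilus.
- name: get number of currently running osds
  shell: systemctl list-units --state running 'ceph-osd@*' | grep -c 'ceph-osd@' || true
  register: running_osds
  changed_when: false
- name: set_fact num_osds
  set_fact:
    num_osds: "{{ running_osds.stdout | int }}"
```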
Andrew Schoen [Thu, 28 Mar 2019 19:02:54 +0000 (14:02 -0500)]
ceph-osd: do not run lvm batch tasks during update
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.
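Conceptually, the fix is just a guard on the batch task; a hedged sketch (task shape illustrative, rolling_update being the flag the playbook sets):
```
# Illustrative only: skip osd creation while a rolling update runs.
- name: use ceph-volume lvm batch to create osds
  command: ceph-volume lvm batch --bluestore {{ devices | join(' ') }}
  when: not rolling_update | default(false) | bool
```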
Andrew Schoen [Wed, 27 Mar 2019 19:36:51 +0000 (14:36 -0500)]
tests: adds the migrate_ceph_disk_to_ceph_volume scenario
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.
Andrew Schoen [Tue, 19 Mar 2019 20:08:32 +0000 (15:08 -0500)]
rolling_update: migrate ceph-disk osds to ceph-volume
When upgrading to nautilus run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 28c47e4d1bb22b0c89f67b315a732b624f650f4b)
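A hedged sketch of the migration step (task layout illustrative; the two commands are the ones named above):
```
# Sketch: take over running ceph-disk osds with ceph-volume during
# the nautilus upgrade.
- name: scan ceph-disk osds
  command: ceph-volume simple scan
- name: activate all scanned osds with ceph-volume
  command: ceph-volume simple activate --all
```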
```
failed: [mon0 -> mon2] (item=noout) => changed=true
cmd:
- ceph
- --cluster
- ceph
- osd
- set
- noout
delta: '0:00:00.293756'
end: '2019-04-17 06:31:57.552386'
item: noout
msg: non-zero return code
rc: 1
start: '2019-04-17 06:31:57.258630'
stderr: |-
Traceback (most recent call last):
File "/bin/ceph", line 1222, in <module>
retval = main()
File "/bin/ceph", line 1146, in main
sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
cmd['sig'] = parse_funcsig(cmd['sig'])
File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
raise JsonFormat(s)
ceph_argparse.JsonFormat: unknown type CephBool
stderr_lines:
- 'Traceback (most recent call last):'
- ' File "/bin/ceph", line 1222, in <module>'
- ' retval = main()'
- ' File "/bin/ceph", line 1146, in main'
- ' sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
- ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
- ' cmd[''sig''] = parse_funcsig(cmd[''sig''])'
- ' File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
- ' raise JsonFormat(s)'
- 'ceph_argparse.JsonFormat: unknown type CephBool'
stdout: ''
stdout_lines: <omitted>
```
Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.
The library directory that contains the custom ceph modules is present
in the ceph-ansible root directory.
All igw_* modules are already present there, so we don't need the ones
in roles/ceph-iscsi-gw/library.
Also remove the associated spec file.
Because 5c98e361df5241fbfa5bd0a2ae1317219b7e1244 could be seen as a
non-backward-compatible change, this commit reverts it and brings back
package dependency installation support.
Let's just modify the default value instead.
Rishabh Dave [Thu, 8 Nov 2018 13:47:51 +0000 (08:47 -0500)]
allow adding a monitor to a deployed cluster
Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.
mon: check if an initial monitor keyring already exists
When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring, which will result in a mismatch between them. The
new monitor will be unable to join the quorum in the end.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit edf1ee20739289c7a62588b563a64d631be3f79b)
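A minimal sketch of the reuse check, assuming the key is fetched from an existing monitor (task shape and lookup illustrative):
```
# Hypothetical sketch: reuse the existing mon. key instead of
# generating a new one when a monitor is added to a live cluster.
- name: get the existing initial monitor keyring
  command: ceph --cluster {{ cluster }} auth get-key mon.
  register: initial_mon_key
  delegate_to: "{{ groups['mons'][0] }}"
  changed_when: false
```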
When using the purge-cluster playbook with nautilus, the
python-ceph-argparse package is still installed on the host, preventing
reinstallation of a ceph cluster with a different version (like luminous
or mimic).
e6bfb84 introduced a regression in the switch from non-containerized
to containerized deployment.
We need to stop all previous OSD services. We just don't need the
ceph-disk pattern in the regex.
Sébastien Han [Tue, 2 Oct 2018 21:54:57 +0000 (23:54 +0200)]
osd: remove ceph-disk support
We don't support preparing OSDs with ceph-disk; only ceph-volume is
supported. However, the OSD start operation is still supported: if you
change a config option, the handlers will be able to restart all the
OSDs via their respective systemd unit files.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2a5aa062eae90b154e98c3c5f6d6a427c28bf97)
We don't need multiple ceph-override.json copies. We already have a
symlink to all_daemons/ceph-override.json, so we can do the same for all
scenarios.
We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something other than the default
value, it will break the application pool task.
Dimitri Savineau [Tue, 26 Feb 2019 14:16:37 +0000 (09:16 -0500)]
rgw: change default frontend on nautilus
As discussed in ceph/ceph#26599, beast is now the default frontend
for rados gateway with the nautilus release.
Add rgw_thread_pool_size variable with 512 as default value and keep
backward compatibility with num_threads option when using civetweb.
Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size
change.
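A sketch of the resulting defaults; treat the radosgw_frontend_type name as an assumption about the ceph-defaults variable:
```
# beast becomes the default frontend on nautilus; civetweb remains
# available and keeps num_threads in sync for backward compatibility.
radosgw_frontend_type: beast
rgw_thread_pool_size: 512
```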
Matthew Vernon [Wed, 27 Mar 2019 13:34:47 +0000 (13:34 +0000)]
UCA: Uncomment UCA variables in defaults, fix consequent breakage
The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.
```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined
The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: add ubuntu cloud archive repository
^ here
```
Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", and so try to install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.
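For reference, a hedged sketch of a working UCA configuration after the fix (repo URL and release values illustrative; variable names are the ones discussed above):
```
# Select the Ubuntu Cloud Archive repository.
ceph_origin: repository
ceph_repository: uca
ceph_stable_repo_uca: "http://ubuntu-cloud.archive.canonical.com/ubuntu"
ceph_stable_release_uca: "bionic-updates/queens"
```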
Dimitri Savineau [Fri, 15 Mar 2019 14:18:48 +0000 (10:18 -0400)]
rolling_update: Remove ceph aliases
ceph aliases were introduced in stable-3.2 during the ceph
deployment. On master they have been removed, but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster that was missing from d9e7835.
When using the monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address, we take the first ip address from the
ansible facts that falls within that cidr.
When there's a VIP on that network, the first filter could return the
wrong value.
This seems to affect only IPv6 setups because the VIP addresses are
added at the beginning of the ansible facts list; with IPv4 it's the
opposite (at the end).
This causes the mon/rgw processes to bind on the VIP address.
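Schematically, the selection looks like the following hedged sketch; `first` is what can pick up the VIP when it sits at the head of the facts list:
```
# Illustrative derivation of the mon address from a cidr block.
monitor_address_block: "fd00:100::/64"
# address: "{{ ansible_all_ipv6_addresses | ipaddr(monitor_address_block) | first }}"
```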
François Lafont [Sat, 6 Apr 2019 09:44:03 +0000 (11:44 +0200)]
ceph-rgw: Fix bad paths which depend on the clustername
The path of the RGW environment file (in the /var/lib/ceph/radosgw/
directory) depends on the Ceph clustername. It was not taken into
account in the Ansible role `ceph-rgw`.
Check ceph_health_raw.stdout value as string during mon bootstrap
According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting for the cluster quorum or
when the container bootstrap has not finished.
It avoids failing the command execution when it doesn't
receive a valid json object to decode (because the cluster is too
slow to bootstrap compared to ceph-ansible task execution).
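A hedged sketch of the string-based check (command form and retry counts illustrative):
```
# Match on the raw stdout instead of json-decoding output that may
# not be valid yet while the cluster is still bootstrapping.
- name: waiting for the containerized monitor to join the quorum...
  command: "{{ docker_exec_cmd }} ceph --cluster {{ cluster }} -s --format json"
  register: ceph_health_raw
  until: "'quorum' in ceph_health_raw.stdout"
  retries: 5
  delay: 10
```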
In containerized deployments the default radosgw quota is too low
for production environments.
This is causing performance degradation compared to bare-metal.
Since https://github.com/ceph/ceph/commit/77912c0 ceph-volume uses
stdout encoding based on the LC_CTYPE and PYTHONIOENCODING environment
variables.
Those variables aren't set when using ansible, which currently breaks
non-containerized deployments on Ubuntu.
```
TASK [use ceph-volume to create bluestore osds] ********************
cmd:
- ceph-volume
- --cluster
- ceph
- lvm
- create
- --bluestore
- --data
- /dev/sdb
rc: 1
stderr: |-
Traceback (most recent call last):
(...)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in
position 132: ordinal not in range(128)
```
Note that the task is failing on the ansible side due to the stdout
decoding, but the osd creation itself is successful.
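A sketch of the fix this implies: export the encoding for the ceph-volume call (the exact environment values are an assumption):
```
# Force a utf-8 capable stdout encoding for ceph-volume under ansible.
- name: use ceph-volume to create bluestore osds
  command: ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  environment:
    PYTHONIOENCODING: utf-8
```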
Dimitri Savineau [Wed, 27 Mar 2019 18:11:20 +0000 (14:11 -0400)]
container: Add python3-docker on Ubuntu bionic
When installing python-minimal on Ubuntu bionic, this will add the
/usr/bin/python symlink to the default python interpreter.
On bionic, this isn't python2 but python3.
```
$ /usr/bin/python --version
Python 3.6.7
```
The python docker library is only installed for python2, which causes
issues when running the purge-docker-cluster playbook. This playbook
uses the ansible docker modules and requires the python bindings to be
installed on the remote host.
Without the bindings we see the python error reported by the docker
module:
```
msg: Failed to import docker or docker-py - No module named 'docker'.
Try `pip install docker` or `pip install docker-py` (Python 2.6)
```
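A minimal sketch of the resulting task (conditions illustrative):
```
# Install the python3 docker bindings so the ansible docker modules
# work with bionic's /usr/bin/python -> python3.
- name: install python3-docker on ubuntu bionic
  apt:
    name: python3-docker
    state: present
  when:
    - ansible_distribution == 'Ubuntu'
    - ansible_distribution_release == 'bionic'
```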
Dimitri Savineau [Fri, 22 Mar 2019 19:03:15 +0000 (15:03 -0400)]
Add uca to ceph_repository choices validation
Ubuntu cloud archive is configurable via the ceph_repository variable,
but the uca choice isn't accepted.
This commit fixes this issue and also validates the associated uca
repository variables.
Sometimes those tasks might fail because of a timeout.
I've been facing this several times in the CI; adding this retry might
help and won't hurt in any case.
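The generic shape of the retry (task name, command, and counts illustrative):
```
# Re-run a flaky task a few times before declaring failure.
- name: task that sometimes times out in the CI
  command: /usr/bin/some-idempotent-command
  register: result
  until: result.rc == 0
  retries: 5
  delay: 10
```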
update: add containerized deployment upgrade support (L->N)
Add a couple of fixes to support upgrading containerized deployments
from luminous/mimic to nautilus.
- pass CEPH_CONTAINER_IMAGE and CEPH_CONTAINER_BINARY environment
variable to the ceph_key module,
- fix the docker exec command in 'waiting for the containerized monitor
to join the quorum' task according to the `delegate_to` parameter,
- override `docker_exec_cmd` in `ceph-facts` with `mon_host` when
rolling_update is `True`,
- do not run unnecessarily `create_mds_filesystems.yml` when performing an
upgrade.
Once the cluster is upgraded to nautilus, we can complete the process by
disallowing pre-nautilus OSDs and enabling all new nautilus-only functionality.
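A sketch of that final step as a task; the ceph command is the standard one for this, while the delegation details are illustrative:
```
# Disallow pre-nautilus osds and enable nautilus-only functionality.
- name: complete the upgrade to nautilus
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} osd require-osd-release nautilus"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
```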
update: ensure mgrs are upgraded after ALL monitors
As of 1c760904b0bd1b6b0f49d6ac19d87d79f185c18f, ceph-ansible implicitly
bootstraps managers on monitors.
mgrs must be upgraded only after all monitors; therefore, this commit
refactors the way mgrs are upgraded to be sure we don't upgrade a mgr
during the monitors upgrade.
This commit also ensures we handle the case where managers are deployed
on dedicated nodes.
update: ensure /var/lib/ceph/bootstrap-rbd-mirror is present
This directory is created by ceph-config node by node.
In the upgrade context we need it to be created on ALL monitors from the
first iteration, because the task right after creates and sends the
keyrings to all monitors.
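A hedged sketch of pre-creating the directory on every monitor (ownership values illustrative):
```
# Ensure the bootstrap-rbd-mirror directory exists on ALL monitors
# before the keyrings are created and pushed.
- name: ensure /var/lib/ceph/bootstrap-rbd-mirror is present
  file:
    path: /var/lib/ceph/bootstrap-rbd-mirror
    state: directory
    owner: ceph
    group: ceph
  delegate_to: "{{ item }}"
  with_items: "{{ groups[mon_group_name] }}"
```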
This prevents the packaging from restarting services before we need to
restart them in the rolling update sequence.
We want to handle service restarts in the rolling_update playbook.