git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

ceph-infra: move dashboard into a dedicated file

Instead of using multiple dashboard_enabled condition in the
configure_firewall file we could just have the condition once
and include the dedicated tasks list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f4c261ef9023d006cabfac28b45e7820bb132ceb)

ceph-infra: open dashboard port on monitor

When there's no mgr group defined in the ansible inventory then the
mgrs are deployed implicitly on the mons nodes.
If the dashboard is enabled then we need to open the dashboard port on
the node that is running the ceph mgr process (mgr or mon).
The current code only allow to open that port on the mgr nodes when they
are present explicitly in the inventory but not implicitly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4535985188dcc656ff4da60318dc07b44eabf3a6)

dashboard: use fqdn in external url

Force fqdn to be used in external url for prometheus and alertmanager.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 498bc45859f9a7ac4b3ac419e21852164f8a762e)

tests: use ceph iscsi stable repository

The ceph iscsi repository was still set to dev (shaman) instead of
using the stable ceph-iscsi repository.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

purge-iscsi-gateways: remove node from dashboard

When using the ceph dashboard with iscsi gateways nodes we also need to
remove the nodes from the ceph dashboard list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 931a842f21e5eb847ad371640307b7c0fef198bd)

filestore-to-bluestore: umount partitions before zapping them

When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them, otherwise error like "Device is
busy" will show up.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8056514134512f20a4b02028fb051f075ad7a145)

shrink-mds: do not play ceph-facts entirely

We only need to set `container_binary`.
Let's use `tasks_from` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0ae0a9ce2812796d943085c6622f4188f16d6231)

shrink-mds: use fact from delegated node

The command is delegated on the first monitor so we must use the fact
`container_binary` from this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 77b39d235b9b713b7e814296164db27b4d428ae0)

facts: use correct python interpreter

that task is delegated on the first mon so we should always use the
`discovered_interpreter_python` from that node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5adb735c78767545993192c67cf12b9e03f42138)

shrink-mds: fix filesystem removal task

This commit deletes the filesystem when no more MDS is present after
shrinking operation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 38278a6bb5eb2ec4186984f2006094fe7e36dd79)

shrink-mds: ensure max_mds is always honored

This commit prevent from shrinking an mds node when max_mds wouldn't be
honored after that operation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfe5a04bfcb4aae6b7842621dfacab90bdcc7c3)

ceph_volume: support filestore to bluestore migration

This commit adds the filestore to bluestore migration support in
ceph_volume module.

We must append to the executed command only the relevant options
according to what is passed in `osd_objectostore`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aabba3baab50fe4fb86535cd8838edc4af87d917)

filestore-to-bluestore: ensure all dm are closed

This commit adds a task to ensure device mappers are well closed when
lvm batch scenario is used.
Otherwise, OSDs can't be redeployed given that devices that are rejected
by ceph-volume because they are locked.

Adding a condition `devices | default([]) | length > 0` to remove these
dm only when using lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8e6ef818a287e8bf139420142493843077ea3851)

filestore-to-bluestore: force OSDs to be marked down

Otherwise, sometimes it can take a while for an OSD to be seen as down
and causes the `ceph osd purge` command to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51d601193ee2050a002dafd29005019e26e2a804)

filestore-to-bluestore: do not use --destroy

Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e3305e6bb655aa64f936687097cbcb6fc62f43cb)

ceph_volume: add destroy option support

The zap action from ceph_volume module always implies `--destroy`.
This commit adds the destroy option support so we can ask ceph-volume to
not use `--destroy` when zapping a device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0dcacdbed09ef78350c03112021cae885ff47ba9)

filestore-to-bluestore: add non containerized support

This commit adds the non containerized context support to the
filestore-to-bluestore.yml infrastructure playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4833b85e0428349f99c68a94f5f90867799ae224)

tests: add filestore_to_bluestore job

This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 40de34fb5e42cffa43e59afe1cacc4d5675287b0)

ansible.cfg: do not enforce PreferredAuthentications

There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.

Fixes: #4826
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d682412e2aa5eeb411cc0dff9a3ffef4b4aa8683)

defaults: change default value for dashboard_admin_password

A recent change in ceph/ceph prevent from having username in the
password:

`Error EINVAL: Password cannot contain username.`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0756fa467d2bdbe6b758b07b89eabc0a8b00e715)

update: restart iscsigws daemons after upgrade

In containerized context, containers aren't stopped early in the
sequence.
It means they aren't restarted after the upgrade because the task is
just checking the daemon status is started (eg: `state: started`).

This commit also removes the task which ensure services are started
because it's already done in the role ceph-iscsigw.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c7708eb4582f65717599d2d7bee7a7a5885ecc2b)

upgrade: add dashboard deployment

when upgrading from RHCS 3, dashboard has obviously never been deployed
and it forces us to deploy it later manually.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 451c5ca93499fc0520b60729a170f0aa6f6132db)

defaults: add a comment

This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.

Fixes: #4828
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a234338eff4b4e35e1c0504ae2ee005e4ca76cf7)

dashboard: run node_export as privileged container

Typical error:

```
type=AVC msg=audit(1575367499.582:3210): avc: denied { search } for pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```

node_exporter needs to be run as privileged to avoid avc denied error
since it gathers lot of information on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d245eb7e7d9453af04e141ed0abd3fbdef1e563c)

ceph-defaults: exclude md devices from discovery

The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 014f51c2a42e4922b43da07b97c4b810ede32200)

facts: avoid duplicated element in devices list

When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of `ceph-facts` role. It end up with duplicate
instances of a same device in the list.

Using `unique` filter when building the list fixes this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 23b1f43897db0a03ef94f51e83ed3c562c4584d0)

purge-cluster: add podman support

The podman support was added to the purge-container-cluster playbook but
containers are always used for the dashboard even on non containerized
deployment.
This commits adds the podman support on purging the dashboard resources
in the purge-cluster playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 89f6cc54a2f6310541eb243390a696cb914b0c7a)

tests: reduce max_mds from 3 to 2

Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
        insufficient standby MDS daemons available'
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4a6d19dae296969954e5101e9bd53443fddde03d)

purge: rename playbook (container)

Since we now support podman, let's rename the playbook so it's more
generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7bc7e3669d26fb41d096bdf50e2301ee782a428b)

add-{mon,osd}: run raw install python tasks

If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.

Closes: #4368
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 34b03d1873f6a5fba8baddbf61b08b50a224e555)

ceph-grafana: remove ipv6 brakets on wait_for

The wait_for ansible module doesn't support the backets on IPv6 address
so need to remove them.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 55adc10be31f313ac97b3b3c3c9ea7f17181922d)

ceph-defaults: pin prometheus container tags

In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.

$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
  build user:       root@67fee13ed48f
  build date:       20191023-14:38:12
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
  build user:       root@70b79a3f29b6
  build date:       20191023-14:57:30
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
  build user:       root@12da054778a3
  build date:       20191023-14:39:36
  go version:       go1.11.13

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3e29b8d5ffcfe4d0c74f44d6d974bc770e21e828)

switch_to_containers: fix umount ceph partitions

When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.

We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0

Also we should not fail on the ceph directory listing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 39cfe0aa65ddd96458ba9d0a031d801efbb0d394)

purge: do not try to stop docker when binary is podman

If the container binary is podman, we shouldn't try to stop docker here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18476a1a6ff23550ca3b36864771f7afbfaf373)

facts: isolate container_binary facts

in order to be able to call container_binary without having to run the
whole ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe5ffe589e3cf19d1b03d73f3e2867a870c86192)

purge: remove docker_* task

All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.

This commit also removes all docker_image task and remove all container
images in the final cleanup play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d23383a820ed23e675af28e80b161e17d17fe40d)

dashboard: use fqdn url for active alert

When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.

This commit changes the template in order to use the fqdn.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a8d76d72d71c8440003ac1ac78fd187f7b7a65e6)

dashboard: only print dashboard url of the grafana-server node

This commit makes the ceph-dashboard role only printing ceph-dashboard
URL of the nodes present in grafana-server group

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cc0c1ce30154118670d7e9b19e9e092b22dbfc2c)

docker2podman: import ceph-handler role

This is needed to avoid following error:

```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a43a8721050aa858fec30da7ab8845e2ae845659)

docker2podman: do not hardcode group name

let's use `client_group_name` instead of hardcoding the name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7fe0d55efff1b3ec9571ba3f6881d68cfa66ffed)

docker2podman: import ceph-defaults in first play

We must import this role in the first play otherwise the first call to
`client_group_name`fails.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6526a25ab52c12841403191539297ff8adc9cacd)

tests: revert vagrant_variable file name detection

This commit reverts the following change:

https://github.com/ceph/ceph-ansible/pull/4510/commits/fcf181342a70b78a355d1c985699028012326b5f#diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14

this is causing CI failures so this commit is intended to unlock the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5353ab8a23aee92ebab8146eeeeffcfcb25c0865)

Fixes failure of cephfs configuration using --limit
Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running with site-docker.yml
This commit addresses both of those tasks

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 72c43cc5d9052455f37414f7cd4fba1e37f99ef0)

container: add always tag on gather fact tasks

If we execute the site-container.yml playbook with specific tags (like
ceph_update_config) then we need to be sure to gather the facts otherwise
we will see error like:

The task includes an option with an undefined variable. The error was:
'ansible_hostname' is undefined

This commit also adds missing 'gather_facts: false' to mons plays.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d7fd769b6ddcb63086cc414e00cce31433d56673)

Evades validation of ceph_repository_type in containerized scenario
This will prevent failure of site-docker.yml with configs in doc.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a1f1626c3e57e64bdcd8d37ae600c21f3ea2a24)

ceph_key: restore file mode after a key is fetched

when `import_key` is enabled, if the key already exists, it will only be
fetched using ceph cli, if the mode specified in the `ceph_key` task is
different from what is applied by the ceph cli, the mode isn't restored because
we don't call `module.set_fs_attributes_if_different()` before
`module.exit_json(**result)`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b717b5f736448903c69882392e0691fba60893aa)

tests: add coverage on purge playbook

This commit adds a playbook to be played before we run purge playbook,
it first creates an rbd image then map an rbd device on client0 so the
purge playbook will try to unmap it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db77fbda15bf9f79f8122559b01b6625005ae29c)

purge: use sysfs to unmap rbd devices

in containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea but for large cluster with hundreds
of client nodes, that would require to pull the image of each of them
just to unmap the rbd devices.

Let's use the sysfs method in order to avoid any issue related to ceph
version that is shipped on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3cfcc7a105156dfde65b23e9d8662cd848537094)

mergify: remove mergify config on stable-4.0

This commit removes the mergify config on stable-4.0

At the moment there is no need to have a mergify config on this branch
given that we don't use it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-osd: fix fs.aio-max-nr sysctl condition

[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
This string output "(osd_objectstore == 'bluestore')" is always true
because item.enable condition only matches non empty string. So the
sysctl value was applyied for both filestore and bluestore backend.

[2] added the bool filter to the condition but the filter always returns
false on string and the sysctl wasn't applyed at all.

This commit fixes the enable key value by evaluating the value instead
of using the string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ece46d33be566994d6ce799fdc4547299b352429)

tests/requirements: bump testinfra and pytest

The ansible ssh connections are now using the ssh backend instead of
paramiko starting testinfra 3.1 and persistent connections too.
pytest 4.6 is the latest release to be supported by python 2.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 02df2ab5ea37ab7d9cd42b1a4d324515cb503677)

ceph-defaults: pin grafana container tag to 5.2.4

The latest grafana container tag is using grafana 6.x release which could
cause issue with the ceph dashboard integration.
Considering that the grafana container in RHCS 3 is based on 5.x then we
should use the same version.

$ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v
Version 5.2.4 (commit: unknown-dev)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2037fb87b63ef0c6f9f70ae43af3a0fc82ff7d1a)

ceph-osd: Remove ulimit nofile on container start

Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f79d5018e6999362fd025e9c04c9b3f)

Set grafana-server user and password in ceph-dashboard role

This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error (HTTPError: 401 Client Error:
Unauthorized").

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 41b8c17356fa1273761c3d864f959fbcb11813e7)

ceph-mon: use --admin-daemon to set default crush rule

Signed-off-by: Mihai Plasoianu <m.plasoianu@vertical.de>
(cherry picked from commit d3f67d63aebd31133310a39b4e42a16d064397da)

update: add default values when setting fact

This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e9823f319ba8deb2f653c51868eda4566ebff609)

rolling_update: remove default filter on mds group

There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.

Currently this generates warnings like:

[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2ca79fcc99bcff6f73478f11e67ba7edb178b029)

rolling_update: fix active mds host value

The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returns under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.

Othewise we could see error like:

The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1f2352c7974f0839b5e74cb23849e943a1131c6)

ipaddrs_in_ranges: fix python indent

pycodestyle returns:

E111 indentation is not a multiple of four

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8a0a13f67a56bd3c1eb1078892456cdb692b0fcb)

move library/plugins tests files under tests dir

To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.

[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'

Closes: #4656
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6ce4fde82074c82e762fd7c56f2b5901f0705d09)

rolling_update: fix reset mon_host variable

mon_host should use the inventory hostname and not the node hostname.
Fix creates an issue when the inventory and node hostname are different.

Closes: #4670
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 650bc0c3f0598feb0a6d9b0f7688b773836819a2)

add-mon: add missing become flag

Without the become flag set to true, we can't executed the roles
successfully.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 77b212833e22cb6584ad020aa9bd0e477c50b461)

update: use right node when creating active mds group

This must be consistent with what is used in `name` parameter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d06057ebd2f5730dea67e687ad596c98f1e5fad8)

update: avoid skipping single mds deployment upgrade

otherwise a single MDS would never be updated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d8ab11d2f8e08c28731f783d592507dd198b5742)

update: skip mds deactivation when no mds in inventory

Let's skip this part of the code if there's no mds node in the
inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ec906c3af2d188de23cc354ecb9ddcfc0af9d90)

add-{mon,osd}: add ceph-container-engine role

The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't install resulting in a failure
on the added nodes.

fatal: [xxxxx]: FAILED! => changed=false
  cmd: docker --version
  msg: '[Errno 2] No such file or directory'
  rc: 2

Closes: #4634
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bfb1d6be12541bb0fff4db07a6389fc9ea24ac51)

defaults: add user/pass auth registry variables

Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b33c476f16a17abaedc5f4a31e37fbed121b968e)

tests: use osd ids instead of device name in ooo_collocation

on master, it doesn't make sense anymore to use device name, we should
use osd id instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b5a61fe2e3b4a509bb80536b634543c5e7a44d07)

tests: fix keyring creation in ooo_collocation

This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 384161edcd10730a1354641418d74acb4d3f82bc)

tests: update container tag for ooo_collocation

It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3c2840da03ac7bd131f60c9550e6e890b2abeffd)

dashboard: add ceph iscsi management

When deploying with ceph-iscsi nodes and dashboard enabled, we need to
add the ceph iscsi gateway endpoints to the dashboard configuration and
add the mgr ip address in the trusted list in the iscsi gateway
configuration file.

Closes: #4638
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764173
https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-iscsi-management

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d050391cbbe8a56d1cf44e744a02e4aa3f0583e5)

ceph-iscsi: add ceph-iscsi stable repositories

This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2cb937193f96e9dc94c3d178fd2ea4cdd5bf895)

rhcs: set ceph_iscsi_config_dev to false

We don't have to use ceph_iscsi_config_dev (default true) on RHCS
because all iscsi packages are already included in the RHCS
repositories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 82495eaf976023e66c7b701ae4ba00519b8ece52)

Revert "iscsigw: install python-requests"

We don't need this since [1]. Also this was only working for python2 and
not supporting python3.

[1] https://github.com/ceph/ceph-iscsi/commit/00f198a

This reverts commit 167737dd3de02057403fb458c50d22cf94a85b95.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd8d47da98e2a2a13bc7297d79260de8a0c2af9a)

container/dashboard: run the registry auth task

When deploying with packages then the ceph-container-common role isn't
executed so the registry authentication task is ignored.

Closes: #4636
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9ad000618ff7be9967bbdae2b1c5cfc0793efe26)

travis: fail on ansible-lint errors

If ansible-lint reports an error then it's skipped. We should fail in
this case.

This patch also fixes the pipefail lint in the rbd mirror role

[306] Shells that use pipes should set the pipefail option

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3969470fca995f7297d2e8516ea3bf9b287da1f8)

lint: fix error [303,602,701,702]

[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'

This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.

[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f57f642698608974e9d5121b1d41c9c)

validate: fix credentials validation

This task is failing when `ceph_docker_registry_auth` is enabled and
`ceph_docker_registry_username` is undefined with an ansible error
instead of the expected message.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit da4215e9c055682f1f91552cc40e4fa77a75fc7a)

update: add missing quotes

Add missing quote in order to keep consistency.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d72ff8e5e671f7db9ad39b123686acde1b45aa1)

playbooks: show cluster status after dashboard

We should show the ceph cluster status as the last task of the playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 47ce9c0d225ff0d39cf1f1bb982ca94531f476d1)

Move the dashboard playbook in the main directory

The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.

Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8426856262d2f33b841b0f756e9a8661c328e0be)

tests: fix the size on the second data LV

The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd33ba6b3a3daf73ecd011e87ab41c0a0)

tests: add multimds coverage

This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 25b98b2ce322a6fa04c3709597a341b7cd1a0c3d)

upgrade: fix standby_mdss group creation

This commit fixes the standby_mdss group creation by using `{{ item }}`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4fc8cc8787e13cb6a45db51581c558a587a6f27)

tests: reduce handler mon and osd delay

We don't need to have high handler delay in the CI so reducing to
10 seconds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 04ec1ad3cc0d4b60458a8b14dcc0b640e80f9eef)

common: do not override ceph_release when using custom repo

Otherwise it fails like following:

```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e9504c939a1daddceef3806f7165844952a6618)

nfs: remove unnecessary set_fact in main.yml

this task is a leftover and no longer needed.
It even causes bug when collocating nfs with mon.

Closes: #4609
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b63bd1307388c10aa8965febc680f3bc8f3d08a6)

iscsi-gw: Fix rtslib installation

When using python3 the name of the rtslib rpm is python3-rtslib. The
packages that use rtslib already have code that detects the python
version and distro deps, so drop it from the ceph iscsi gw task list and
let the ceph-iscsi rpm dependency handle it.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit ba141298d708082f0b4256136d4a96c63407c473)

rbd-mirror: fail if the peer is not added

Due the 'failed_when: false' statement present in the peer task then
the playbook continues to ran even if the peer task was failing (like
incorrect remote peer format.

"stderr": "rbd: invalid spec 'admin@cluster1'"

This patch adds a task to list the peer present and add the peer only if
it's not already added. With this we don't need the failed_when statement
anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0b1e9c0737ca84c2e4a34f827cf91e1a11007b16)

update: follow new recommandation to upgrade mds cluster

Refact the mds cluster upgrade code in order to follow the documented
recommandation.
See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71cebf80a623388c64dfcb190133eb5f54a524f9)

ceph-iscsi: notify rbd target services

When the iscsi gateway or the ceph configuration file change then we
need to notify the rbd target api/gw services to be restarted.
This patch also merges the rbd-target-api and rbd-target-gw handler
into a single file and listen.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bc701860d541b9f752d0f8cfcbed43979aafb70e)

Execute common roles once on all nodes

The common roles don't need to be executed again on each group plays
(like mons, osds, etc..).
We only need to execute them during the first play. That wat, we will
apply the changes on all nodes in parallel instead of doing it once per
group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 68a3dac7cd492cabc1626873018a40a50efe3ada)

playbook: remove duplicate facts

The is_atomic and container_binary facts are already defined in the
ceph-facts role so we don't need to have dedicated tasks for that
before the ceph-facts role exectution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 643b50bd4fe4459a1263562cf0e4ed81123390ba)

mgr: do not copy all keyrings on all mgr

There is no need to loop over all mgr nodes to set this fact, it's even
breaking deployments because it tries to copy all mgr keyring on all
mgr.

Closes: #4602
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cb8023172541d09979cc0c224c56aba43674d892)

ceph-handler: group listen topics and condition

We are using multiple listen topics with the handlers. That means that
we are notifying 4 tasks for each handler.
Instead we can group the listen on an include_tasks and based on the
group condition.

Before:

NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0

After:

NOTIFIED HANDLER ceph-handler : mons handler for mon0
NOTIFIED HANDLER ceph-handler : osds handler for mon0
NOTIFIED HANDLER ceph-handler : mdss handler for mon0
NOTIFIED HANDLER ceph-handler : rgws handler for mon0
NOTIFIED HANDLER ceph-handler : mgrs handler for mon0
NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fe9c5b8c686e838a7a16623b5296e9a919fa54de)

handler: followup on #4519

This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccc11cfc933f88cec22ffca477cf8431fe024e09)

handlers: refact osd handler

This commit merges the two restart tasks into a single one, this way
it's one task less to notify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 411bd07d54fc3f585296b68f2fd04484328399b5)

Remove validate action and notario dependency

The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.

This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f978d969ba4ceb643ccce134d5ffb5ffcadf12c)

dashboard: disable facts gathering

This is already done in the main playbooks but absent in the dashboard
playbook.
The facts are already gathered during the first play of the main
playbooks so we don't need to doing twice.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5ae7304acec7cde23f90fdc7d7ae407dc3c27adb)

mgr: improve mgr keyring creation

Delegating on remote node isn't necessary here since we are already
iterating over the right nodes.

Closes: #4518
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 161170524d282d2bfb2fff44886a6451f8b74ecd)