git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

ceph-osd: Remove ulimit nofile on container start

Even if this improves ceph-disk/ceph-volume performances then it also
impact the ceph-osd process.
The ceph-osd process shouldn't use 1024:4096 value for the max open
files.
Removing the ulimit option from the container engine and doing this kind
of change on the container side [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9a996aef7f79d5018e6999362fd025e9c04c9b3f)

Set grafana-server user and password in ceph-dashboard role

This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error (HTTPError: 401 Client Error:
Unauthorized").

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 41b8c17356fa1273761c3d864f959fbcb11813e7)

ceph-mon: use --admin-daemon to set default crush rule

Signed-off-by: Mihai Plasoianu <m.plasoianu@vertical.de>
(cherry picked from commit d3f67d63aebd31133310a39b4e42a16d064397da)

update: add default values when setting fact

This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e9823f319ba8deb2f653c51868eda4566ebff609)

rolling_update: remove default filter on mds group

There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.

Currently this generates warnings like:

[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2ca79fcc99bcff6f73478f11e67ba7edb178b029)

rolling_update: fix active mds host value

The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returns under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.

Othewise we could see error like:

The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1f2352c7974f0839b5e74cb23849e943a1131c6)

ipaddrs_in_ranges: fix python indent

pycodestyle returns:

E111 indentation is not a multiple of four

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8a0a13f67a56bd3c1eb1078892456cdb692b0fcb)

move library/plugins tests files under tests dir

To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.

[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'

Closes: #4656
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6ce4fde82074c82e762fd7c56f2b5901f0705d09)

rolling_update: fix reset mon_host variable

mon_host should use the inventory hostname and not the node hostname.
Fix creates an issue when the inventory and node hostname are different.

Closes: #4670
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 650bc0c3f0598feb0a6d9b0f7688b773836819a2)

add-mon: add missing become flag

Without the become flag set to true, we can't executed the roles
successfully.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 77b212833e22cb6584ad020aa9bd0e477c50b461)

update: use right node when creating active mds group

This must be consistent with what is used in `name` parameter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d06057ebd2f5730dea67e687ad596c98f1e5fad8)

update: avoid skipping single mds deployment upgrade

otherwise a single MDS would never be updated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d8ab11d2f8e08c28731f783d592507dd198b5742)

update: skip mds deactivation when no mds in inventory

Let's skip this part of the code if there's no mds node in the
inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ec906c3af2d188de23cc354ecb9ddcfc0af9d90)

add-{mon,osd}: add ceph-container-engine role

The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't install resulting in a failure
on the added nodes.

fatal: [xxxxx]: FAILED! => changed=false
  cmd: docker --version
  msg: '[Errno 2] No such file or directory'
  rc: 2

Closes: #4634
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bfb1d6be12541bb0fff4db07a6389fc9ea24ac51)

defaults: add user/pass auth registry variables

Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b33c476f16a17abaedc5f4a31e37fbed121b968e)

tests: use osd ids instead of device name in ooo_collocation

on master, it doesn't make sense anymore to use device name, we should
use osd id instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b5a61fe2e3b4a509bb80536b634543c5e7a44d07)

tests: fix keyring creation in ooo_collocation

This commit removes the backslash in allow command parameter, this was
needed before the ceph_key module integration.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 384161edcd10730a1354641418d74acb4d3f82bc)

tests: update container tag for ooo_collocation

It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
Also disable the dashboard deployment and switch to bluestore backend.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3c2840da03ac7bd131f60c9550e6e890b2abeffd)

dashboard: add ceph iscsi management

When deploying with ceph-iscsi nodes and dashboard enabled, we need to
add the ceph iscsi gateway endpoints to the dashboard configuration and
add the mgr ip address in the trusted list in the iscsi gateway
configuration file.

Closes: #4638
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764173
https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-iscsi-management

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d050391cbbe8a56d1cf44e744a02e4aa3f0583e5)

ceph-iscsi: add ceph-iscsi stable repositories

This commit adds the support of the ceph-iscsi stable repository when
use ceph_repository community instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2cb937193f96e9dc94c3d178fd2ea4cdd5bf895)

rhcs: set ceph_iscsi_config_dev to false

We don't have to use ceph_iscsi_config_dev (default true) on RHCS
because all iscsi packages are already included in the RHCS
repositories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 82495eaf976023e66c7b701ae4ba00519b8ece52)

Revert "iscsigw: install python-requests"

We don't need this since [1]. Also this was only working for python2 and
not supporting python3.

[1] https://github.com/ceph/ceph-iscsi/commit/00f198a

This reverts commit 167737dd3de02057403fb458c50d22cf94a85b95.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd8d47da98e2a2a13bc7297d79260de8a0c2af9a)

container/dashboard: run the registry auth task

When deploying with packages then the ceph-container-common role isn't
executed so the registry authentication task is ignored.

Closes: #4636
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9ad000618ff7be9967bbdae2b1c5cfc0793efe26)

travis: fail on ansible-lint errors

If ansible-lint reports an error then it's skipped. We should fail in
this case.

This patch also fixes the pipefail lint in the rbd mirror role

[306] Shells that use pipes should set the pipefail option

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3969470fca995f7297d2e8516ea3bf9b287da1f8)

lint: fix error [303,602,701,702]

[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'

This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.

[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f57f642698608974e9d5121b1d41c9c)

validate: fix credentials validation

This task is failing when `ceph_docker_registry_auth` is enabled and
`ceph_docker_registry_username` is undefined with an ansible error
instead of the expected message.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit da4215e9c055682f1f91552cc40e4fa77a75fc7a)

update: add missing quotes

Add missing quote in order to keep consistency.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d72ff8e5e671f7db9ad39b123686acde1b45aa1)

playbooks: show cluster status after dashboard

We should show the ceph cluster status as the last task of the playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 47ce9c0d225ff0d39cf1f1bb982ca94531f476d1)

Move the dashboard playbook in the main directory

The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.

Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8426856262d2f33b841b0f756e9a8661c328e0be)

tests: fix the size on the second data LV

The commit replaces the pv/vg/lv commands used with the ansible command
module by the lvg and lvol modules.
This also fixes the size of the second data LV because we were only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd33ba6b3a3daf73ecd011e87ab41c0a0)

tests: add multimds coverage

This commit makes the all_daemons scenario deploying 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 25b98b2ce322a6fa04c3709597a341b7cd1a0c3d)

upgrade: fix standby_mdss group creation

This commit fixes the standby_mdss group creation by using `{{ item }}`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4fc8cc8787e13cb6a45db51581c558a587a6f27)

tests: reduce handler mon and osd delay

We don't need to have high handler delay in the CI so reducing to
10 seconds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 04ec1ad3cc0d4b60458a8b14dcc0b640e80f9eef)

common: do not override ceph_release when using custom repo

Otherwise it fails like following:

```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e9504c939a1daddceef3806f7165844952a6618)

nfs: remove unnecessary set_fact in main.yml

this task is a leftover and no longer needed.
It even causes bug when collocating nfs with mon.

Closes: #4609
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b63bd1307388c10aa8965febc680f3bc8f3d08a6)

iscsi-gw: Fix rtslib installation

When using python3 the name of the rtslib rpm is python3-rtslib. The
packages that use rtslib already have code that detects the python
version and distro deps, so drop it from the ceph iscsi gw task list and
let the ceph-iscsi rpm dependency handle it.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit ba141298d708082f0b4256136d4a96c63407c473)

rbd-mirror: fail if the peer is not added

Due the 'failed_when: false' statement present in the peer task then
the playbook continues to ran even if the peer task was failing (like
incorrect remote peer format.

"stderr": "rbd: invalid spec 'admin@cluster1'"

This patch adds a task to list the peer present and add the peer only if
it's not already added. With this we don't need the failed_when statement
anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0b1e9c0737ca84c2e4a34f827cf91e1a11007b16)

update: follow new recommandation to upgrade mds cluster

Refact the mds cluster upgrade code in order to follow the documented
recommandation.
See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71cebf80a623388c64dfcb190133eb5f54a524f9)

ceph-iscsi: notify rbd target services

When the iscsi gateway or the ceph configuration file change then we
need to notify the rbd target api/gw services to be restarted.
This patch also merges the rbd-target-api and rbd-target-gw handler
into a single file and listen.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bc701860d541b9f752d0f8cfcbed43979aafb70e)

Execute common roles once on all nodes

The common roles don't need to be executed again on each group plays
(like mons, osds, etc..).
We only need to execute them during the first play. That wat, we will
apply the changes on all nodes in parallel instead of doing it once per
group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 68a3dac7cd492cabc1626873018a40a50efe3ada)

playbook: remove duplicate facts

The is_atomic and container_binary facts are already defined in the
ceph-facts role so we don't need to have dedicated tasks for that
before the ceph-facts role exectution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 643b50bd4fe4459a1263562cf0e4ed81123390ba)

mgr: do not copy all keyrings on all mgr

There is no need to loop over all mgr nodes to set this fact, it's even
breaking deployments because it tries to copy all mgr keyring on all
mgr.

Closes: #4602
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cb8023172541d09979cc0c224c56aba43674d892)

ceph-handler: group listen topics and condition

We are using multiple listen topics with the handlers. That means that
we are notifying 4 tasks for each handler.
Instead we can group the listen on an include_tasks and based on the
group condition.

Before:

NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0

After:

NOTIFIED HANDLER ceph-handler : mons handler for mon0
NOTIFIED HANDLER ceph-handler : osds handler for mon0
NOTIFIED HANDLER ceph-handler : mdss handler for mon0
NOTIFIED HANDLER ceph-handler : rgws handler for mon0
NOTIFIED HANDLER ceph-handler : mgrs handler for mon0
NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fe9c5b8c686e838a7a16623b5296e9a919fa54de)

handler: followup on #4519

This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccc11cfc933f88cec22ffca477cf8431fe024e09)

handlers: refact osd handler

This commit merges the two restart tasks into a single one, this way
it's one task less to notify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 411bd07d54fc3f585296b68f2fd04484328399b5)

Remove validate action and notario dependency

The current ceph-validate role is using both validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validation fails then a python stack trace is reported to the
ansible task. This output isn't understandable by users.

This patch removes the validate action and the notario depencendy. The
validation is now done with only fail ansible module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f978d969ba4ceb643ccce134d5ffb5ffcadf12c)

dashboard: disable facts gathering

This is already done in the main playbooks but absent in the dashboard
playbook.
The facts are already gathered during the first play of the main
playbooks so we don't need to doing twice.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5ae7304acec7cde23f90fdc7d7ae407dc3c27adb)

mgr: improve mgr keyring creation

Delegating on remote node isn't necessary here since we are already
iterating over the right nodes.

Closes: #4518
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 161170524d282d2bfb2fff44886a6451f8b74ecd)

validate: prevent from installing OSD on same disk as the OS

This commit adds a validation task to prevent from installing an OSD on
the same disk as the OS.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623580
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80e2d00b16629c4815019e8bc58c4539bd109710)

common: do not reset `container_exec_cmd`

This commit removes some legacy tasks.

These tasks aren't needed, they cause the playbook to fail when
collocating daemons.

Closes: #4553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 273413186a0042e66612a2d68b8aaaf17b50b5e1)

dashboard: if no host is available, let's just skip these plays.

If there is no host available, let's just skip these plays.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1759917
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b245bd0071038cde44fdc54993df65e2ed754e7)

dashboard: update layouts before the restart

If the mgr dashboard doesn't restart fast enough then the inject
dashboard task will fail with a HTTP error 400.

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 914, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/dashboard/module.py", line 450, in handle_command
    push_local_dashboards()
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 132, in push_local_dashboards
    retry()
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 89, in call
    result = self.func(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 127, in push
    grafana.push_dashboard(body)
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 54, in push_dashboard
    response.raise_for_status()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 834, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 400 Client Error: Bad Request

Instead we can trigger this task before the module restart.

Closes: #4565
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f6ff240b7d5a6863478cafc0aa78e2177a09ac3)

tests: update tox due to pipeline removal

This commit reflects the recent changes in ceph/ceph-build#1406

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bcaf8cedeec0f06eb8641b0038569a5cd3a3e7be)

switch_to_containers: umount osd lockbox partition

When switching from a baremetal deployment to a containerized deployment
we only umount the OSD data partition.
If the OSD is encrypted (dmcrypt: true) then there's an additional
partition (part number 5) used for the lockbox and mount in the
/var/lib/ceph/osd-lockbox/ directory.
Because this partition isn't umount then the containerized OSD aren't
able to start. The partition is still mount by the system and can't be
remount from the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 19edf707a50c2e86110b2ba0231091b6bd355bd1)

nfs: stop nfs server service in all context

This commit moves this task in order to stop the nfs server service
regardless the deployment type desired (containerized or non
containerized).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6c6a512a720de0268e2d099926413e2816c65174)

nfs: stop nfs server service

The syntax here wasn't working, this refact fixes this task.
Also, removing the `ignore_errors: true` which was hidding the failure.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47034effe0bb7de14442b0ba884ff4abe793b4b7)

ceph-dashboard: remove rgw api host,port,scheme

We don't need to have dedicated variables for the RGW integration into
the Ceph Dashboard and need to be manually filled.
Instead we can use the current values from the RGW nodes by using the
IP and port from the first RGW instance of the first RGW node via the
radosgw_address and radosgw_frontend_port variables.
We don't need to specify all RGW nodes, this will be done automatically
with one node.
The RGW api scheme is using the radosgw_frontend_ssl_certificate variable
to determine if the value is http or https. This variable is also reuse
as a condition for the ssl verify task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b9e93ad7a60772e953c3d88346bf94db4131dcf6)

switch_to_containers: do not re-set `ceph_uid`

This commit refacts the way we set `ceph_uid` fact in `ceph-facts` and
removes all `set_fact` tasks for `ceph_uid` in switch-to-containers playbook
to avoid duplicated code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fa9b42e98e32d2b3eff9605db044e37421d8b938)

switch_to_containers: optimize ownership change

As per https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164

using `find` command should be faster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit c5d0c90bb7d8382fde2f07820c2d8547c8a3603e)

ceph-dashboard: Improve https configuration

This patch moves the https dashboard configuration into a dedicated
block to avoid the multiple occurence of the dashboard_protocol
condition.
It also fixes the dashboard certificate and key variables handling in
the condition introduced by ab54fe2. Those variables aren't boolean but
strings so we can test them via the length filter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 249764047b9e85d3a858949872c1a1790b426044)

update: import ceph-defaults role in first play

Typical error:

```
fatal: [mon0]: FAILED! =>
msg: |-
The conditional check 'not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])' failed. The error was: error while evaluating conditional (not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])): 'client_group_name' is undefined
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8138d4193c965a5c990c6c101ef7a9bf2e4080f7)

site.yml: remove raw installation of python2-dnf

these dependencies aren't needed anymore on recent releases of Fedora.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7fdf8b62bed75e65f895d99fdc52bcae784c0c48)

main: exclude client nodes from facts gathering when delegate_facts_host

This commit excludes client nodes from facts gathering, they are not
needed and can speed up this task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 865d2eac9ba81bdb9ecbd841e4b73608648dfae2)

handler: followup on #4519

This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccc11cfc933f88cec22ffca477cf8431fe024e09)

playbook: add missing tags

Add missing tag on ceph-handler role call.
Otherwise, we can't use `--tags='ceph_update_config'` for updating the
ceph configuration file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f59dad620d43a740466a26f7fb8eba1ffc5ba0af)

tests: fix rgw multisite vagrant variables

The secondary vagrant variables didn't have the grafana vm variable
set which create an vagrant error.

There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.

This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise we can see warnings
from ansible and some tasks are failing.

[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information

It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.

Finally changing the IP addresses due to the Vagrant refact done in the
commit 778c51a

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 010158ff847bb59920f6a5bbf383a1cb7056c0cf)

ceph-dashboard: add cluster parameter to ceph cmd

The ceph dashboard tasks didn't use the cluster option if the cluster
name isn't the default value.

Closes: #4529
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dd526cfe4ecceffcdce13b29b2c09ff19a8bd1b0)

dashboard: remove useless block section

The block section were used with the dashboard_enabled condition when
the code was included in the main playbooks.
Because this condition isn't present in the dashboard playbook anymore
we can remove the block section.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf47594b47a2e50465bcd748ccc3a31df0b8eee4)

Vagrantfile: support more than 9 nodes per daemon type

because of the current ip address assignation, it's not possible to
deploy more than 9 nodes per daemon type.
This commit refact a bit and allows us to get around this limitation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 778c51a0ff7a8c66c464f4828a0f87dd290e1c3e)

ceph-handler: don't restart all OSDs with limit

When using the ansible --limit option on one or few OSD nodes and if the
handler is triggered then we will restart the OSD service on all OSDs
nodes instead of the hosts limited by the limit value.
Even if the play is limited by the --limit value we are using all OSD
nodes from the OSD group.

with_items: '{{ groups[osd_group_name] }}'

Instead we should iterate only on the nodes present in both OSD group and
limit list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0346871fb5c46fb1fedfb24ffe5a8c02108c244e)

ceph-facts: fix _radosgw_address with block

e695efc introduced a regression in the _radosgw_address fact when using
the radosgw_address_block variable.
There's no item there because we don't use the items lookup. This is
only used for _monitor_address with monitor_address_block.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1758099
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 780cf36a596cce1ba786f7f04cfbf86fa7fd9621)

common: improve keyrings generation

There is no need to get n * number of nodes the different keyrings.
Adding a `run_once: true` here avoid running a ceph command too many
times which could be impacting large cluster deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9bad239d779b59ca906abdecde8f905fe79098cc)

ceph-facts: use --admin-daemon to get fsid

During the rolling_update scenario, the fsid value is retrieve from the
current ceph cluster configuration via the ceph daemon config command.
This command tries first to resolve the admin socket path via the
ceph-conf command.
Unfortunately this command won't work if you have a duplicate key in the
ceph configuration even if it only produces a warning. As a result the
task will fail.

Can't get admin socket path: unable to get conf option admin_socket for
mon.xxx: warning: line 13: 'osd_memory_target' in section 'osd' redefined

Instead of using ceph daemon we can use the --admin-daemon option
because we already know what the socket admin path value based on the
ceph cluster and mon hostname values.

Closes: #4492
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ec3b687dc4d2153390fcb848e3c839244f644182)

validate: fix gpt header check

Check for gpt header when osd scenario is lvm or lvm batch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 272d16e101984de41bba6fcbe6134bf39e547341)

rbdmirror: rename a file

rename this file to be more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed8616aa66b718274c5196f70b5a3223fcaf216b)

rgw: refact tasks directory layout

This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e08194dd677bd3312240d46765e40b3f4aa6fe33)

rbdmirror: refact tasks directory layout

This commit moves containerized deployment related files to `./tasks/`
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c69816c6b755295e33a9b96d64627de1683b0a46)

iscsigw: refact tasks directory layout

This commit moves containerized deployment related files to `./tasks/
directory. This is needed to make `docker-to-podman.yml` working since
we use `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4636f3f7e29267398495dc37db2fa511dae6596b)

upgrade: add an infra playbook to migrate systemd units to podman

this commit adds a new playbook to force systemd units for containers to
use podman instead of docker.
This is needed in the rhel8 upgrade context so after the base OS is upgraded
containers can be started using podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2017dcda276455e9fee42de1d593439cb94c0e9)

container: isolate systemd tasks

This commit isolates the systemd unit files generation for containers into
separate yml files in order to be able importing each corresponding roles
without playing all tasks.
This is needed so we can run ceph-ansible to render systemd unit files
so they call podman instead of docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bd641674691bbc529711e39583e5d4bf999d84d2)

ceph-facts: update external grafana fact filter

e695efc hasn't been updated with the changes introduced in 9bb11c7 so
the ips_in_ranges filter isn't used for an external grafana instance.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 20b1a464ec373b671bbdd49d17001f0a7fdc7036)

ceph-defaults: Change the default prometheus port

The old default prometheus port 9090 clashes with cockpit in rhel 8. The
9090 port is reserved for web service administration of machines. We
should change the default to something that does not clash with other
ports used in rhel 8, at least by default. The port 9092 seems like a
good choice in my testing.

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit b96c6da83239a585d3e51301cd81112779c99928)

rhcs_edits: Fix ose container versions

For some reason, the floating tags were changed from v4.1 to just 4.1
for these images when switching ti registry.redhat.io. We should fix
the locations.

We are also changing the downstream grafana image to the one we used for
rhcs 3. The ose grafana image lacks the support for a lot of features
that we need (e.g. vonage and piechart grafana plugins, grafana-cli
binary and others).

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit f067e53c6efc602a09ca6415cae05ec825bdc9de)

tests: remove debug log verbosity

This was added for debugging purpose.
It's generating very large log output, let's remove this now.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 01f6dd52b315c5d42ca614a82e18d758855ba896)

Revert "ceph-common: install only necesarry ceph-* packages on debian"

This reverts commit 58b27ef0b3bbd64d8a66da24d702f3ff761fe6ec.
This is breaking debian based OS deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e4444d29e028feff7c2ea04d4f1a187e78f10147)

update: reset mon_host after mons upgrade

after all mon are upgraded, let's reset mon_host which is used in the
rest of the playbook for setting `container_exec_cmd` so we are sure to
use the right value.

Typical error:

```
failed: [mds0 -> mon0] (item={u'path': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'name': u'client.bootstrap-mds', u'copy_key': True}) => changed=true
  ansible_loop_var: item
  cmd:
  - docker
  - exec
  - ceph-mon-mon2
  - ceph
  - --cluster
  - ceph
  - auth
  - get
  - client.bootstrap-mds
  delta: '0:00:00.016294'
  end: '2019-09-27 13:54:58.828835'
  item:
    copy_key: true
    name: client.bootstrap-mds
    path: /var/lib/ceph/bootstrap-mds/ceph.keyring
  msg: non-zero return code
  rc: 1
  start: '2019-09-27 13:54:58.812541'
  stderr: 'Error response from daemon: No such container: ceph-mon-mon2'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d84160a170d9d134dc5b7ca246004fbe8a14b7af)

install python-xml on SUSE/openSUSE only if python2 is installed

raw_install_python.yml: on SUSE/openSUSE, install python-xml package only
if python2 is installed already

Background:
On SLES 15.x / openSUSE Leap 15.x, the python2 package `python-base` provides
/usr/bin/python, while python3 only provides /usr/bin/python3.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit a1811ca097415990211248648dc3d9480f6841eb)

move python-xml to raw_install_python.yml

The package python-xml is needed for ansible's zypper module to interact with
the zypper package management tool.

roles/ceph-defaults/defaults/main.yml:
Remove python-xml from variable suse_package_dependencies to only
install python-xml on SUSE/openSUSE if python is not found.
raw_install_python.yml already contains all the logic needed to check
if there is a valid python installation, so this is better suited there.

openSUSE Leap 15.x / SLES 15.x do no longer have /usr/bin/python,
only /usr/bin/python3, which already contains the xml module, so
nothing needs to be installed in that case.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 5cf22e9b312bb26b3144c329e6e597a0905b274a)

Replace ipaddr() with ips_in_ranges()

This change implements a filter_plugin that is used in the
ceph-facts, ceph-validate roles and infrastucture-playbooks.
The new filter plugin will return a list of all IP address
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.

ceph.conf already support a comma separated list of CIDRs
for the public_network and cluster_network options.

Changes: [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma separated list
of cidrs.

With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.

[1] commit: d67230b2a26b40651c1c1dbee68a92b0e851f3d5
[2] commit: 20e4852888ecc76d8d0fa194a438fa2a90e1cde3

Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283
Closes: #4333
Please backport to stable-4.0

Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit e695efcaf79909e2237197fd473117930e8d83e5)

ceph-nfs: Allow to configure SecType value

Depending on the infrastruture (w/o kerberos auth) then the SecType
value could be different.
Currently this value is hardcoded in the NFS Ganesha template. Instead
we can use a variable.
The default value is still the same to avoid breaking the backward
compatibility.

Closes: #4459
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ca77d7bd317da75404ef4ee7143c7412d6ae63ee)

ceph-dashboard: Add prometheus api host

The set-prometheus-api-host ceph dashboard subcommand was missing in
ceph-dashboard role. Only grafana and alermanager were present.
This commit also remove the trailing slash at the end of the host/url
values.

Closes: #4453
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 74ab59c4f33d534cfbca4055c1f494a670be40e2)

ceph-common: install only necesarry ceph-* packages on debian

Currently, ceph package only an meta-package that do not contain
actual software, but simply depend on other packages. It's been few
release since debian stretch (official), ubuntu bionic (official),
ubuntu uca repository and upstream debian-jewel.
As we only support nautilus and higher release for master branch,
I propose to drop ceph package and use ceph-base instead for repository
model other than rhcs so debian ceph install will be more minimalis.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
(cherry picked from commit 58b27ef0b3bbd64d8a66da24d702f3ff761fe6ec)

dashboard: add grafana dashboard support on Debian based OS

download grafana dashboard files from github when running on Debian based OS

Signed-off-by: liuxu <liuxu623@gmail.com>
(cherry picked from commit 195f70897ca18faee18d63f006605af392572e8e)

rolling_update.yml: force ceph-volume scan on osds

The rolling_update.yml playbook fails when scanning ceph-disk osds while
deploying nautilus. The --force flag is required to scan existing osds
and rewrite their json metadata.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
(cherry picked from commit 7cc9f93680d84503943d60b2bb950dd68a2259ed)

Inject ceph grafana dashboard layouts

This change just adds the task to inject from the
ceph dashboard mgr module the required layouts
to show all the cluster metrics on the grafana
instance.
Since we're now able to push grafana layouts through
the ceph mgr module command, the dashboards configuration
template is no longer needed on containerized environments.
This commit also fixes the Vagrantfile IP static assigment
in the grafana section because it generates an issue (it's
the same of the mgr instance).
Finally, considering some deployments that use an external
grafana server instance, we reworked the 'grafana_server_addr'
assignment to address these requirements.

Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 9bb11c7b2a17db56cfcd7284d2190af36e17bba6)

iscsigw: install python-requests

Typical error at rbd-target-api startup:

```
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: Traceback (most recent call last):
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/bin/rbd-target-api", line 39, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: from gwcli.utils import (APIRequest, valid_gateway, valid_client,
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/lib/python2.7/site-packages/gwcli/utils.py", line 1, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: import requests
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: ImportError: No module named requests
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 167737dd3de02057403fb458c50d22cf94a85b95)

tests: pin jinja2 version

ensure we get the latest jinja2 version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 006df148d00a28de01bdbb1b8988039ef82ba0ac)

tests: set copy_admin_key at group_vars level

setting it at extra vars level prevent from setting it per node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5bb6a4da4267b987aec6e20a8d09b18eebc2c693)

global: remove fetch_directory dependency

This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab370b6ad823e551cfc324fd9c264633a34b72b5)

infrastructure-playbooks: add filestore-to-bluestore.yml

This playbook helps to migrate all osds on a node from filestore to
bluestore backend.
Note that *ALL* osd on the specified osd nodes will be shrinked and
redeployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f9ccdaa8ab9ce299c300dedf239eb91f6aa44f0)