git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

tests: fix keyring creation in ooo_collocation

This commit removes the backslash in the allow command parameter; it was
only needed before the ceph_key module integration.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 384161edcd10730a1354641418d74acb4d3f82bc)

tests: update container tag for ooo_collocation

It doesn't make sense to test the old 3.0.x container images with
nautilus+ ceph releases.
This also disables the dashboard deployment and switches to the bluestore
backend.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3c2840da03ac7bd131f60c9550e6e890b2abeffd)

dashboard: add ceph iscsi management

When deploying with ceph-iscsi nodes and the dashboard enabled, we need to
add the ceph iscsi gateway endpoints to the dashboard configuration and
add the mgr ip address to the trusted list in the iscsi gateway
configuration file.

Closes: #4638
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764173
https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-iscsi-management

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d050391cbbe8a56d1cf44e744a02e4aa3f0583e5)

ceph-iscsi: add ceph-iscsi stable repositories

This commit adds support for the ceph-iscsi stable repository when
ceph_repository is set to community, instead of always using the devel
repositories.
We still use the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2cb937193f96e9dc94c3d178fd2ea4cdd5bf895)

rhcs: set ceph_iscsi_config_dev to false

We don't have to use ceph_iscsi_config_dev (default true) on RHCS
because all iscsi packages are already included in the RHCS
repositories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 82495eaf976023e66c7b701ae4ba00519b8ece52)

Revert "iscsigw: install python-requests"

We no longer need this since [1]. Also, this only worked with python2 and
did not support python3.

[1] https://github.com/ceph/ceph-iscsi/commit/00f198a

This reverts commit 167737dd3de02057403fb458c50d22cf94a85b95.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fd8d47da98e2a2a13bc7297d79260de8a0c2af9a)

container/dashboard: run the registry auth task

When deploying with packages, the ceph-container-common role isn't
executed, so the registry authentication task never runs.

Closes: #4636
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9ad000618ff7be9967bbdae2b1c5cfc0793efe26)

travis: fail on ansible-lint errors

If ansible-lint reports an error, the error is currently ignored. We
should fail in this case.

This patch also fixes the pipefail lint error in the rbd mirror role:

[306] Shells that use pipes should set the pipefail option

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3969470fca995f7297d2e8516ea3bf9b287da1f8)

lint: fix errors [303,602,701,702]

[303] mktemp used in place of tempfile module
[602] Don't compare to empty string
[701] No 'galaxy_info' found
[702] Use 'galaxy_tags' rather than 'categories'

This patch also changes the ansible log_path value via the
ANSIBLE_LOG_PATH environment variable in the travis configuration to
avoid warnings.

[WARNING]: log file at /home/travis/ansible/ansible.log is not writeable
and we cannot create it, aborting

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f7fd0b6d4f57f642698608974e9d5121b1d41c9c)

validate: fix credentials validation

When `ceph_docker_registry_auth` is enabled and
`ceph_docker_registry_username` is undefined, this task fails with an
ansible error instead of the expected message.
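A check like this has to tolerate the undefined variable; otherwise the variable lookup itself errors out before the intended message is reached (in Ansible this is usually handled with an `is defined` test or a `| default(...)` filter). A minimal Python analogue, assuming the variables arrive as a plain dict; `registry_auth_error` and its message text are hypothetical:

```python
from typing import Mapping, Optional


def registry_auth_error(ansible_vars: Mapping[str, object]) -> Optional[str]:
    """Return a readable validation message, or None if nothing is wrong."""
    if not ansible_vars.get("ceph_docker_registry_auth", False):
        return None  # registry auth disabled: nothing to validate
    # .get() plays the role of `is defined` / `| default('')`: a missing
    # username yields a clear message instead of an undefined-variable error.
    if not ansible_vars.get("ceph_docker_registry_username"):
        return ("ceph_docker_registry_username must be set "
                "when ceph_docker_registry_auth is enabled")
    return None


print(registry_auth_error({"ceph_docker_registry_auth": True}))
```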

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit da4215e9c055682f1f91552cc40e4fa77a75fc7a)

update: add missing quotes

Add missing quotes for consistency.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d72ff8e5e671f7db9ad39b123686acde1b45aa1)

playbooks: show cluster status after dashboard

We should show the ceph cluster status as the last task of the playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 47ce9c0d225ff0d39cf1f1bb982ca94531f476d1)

Move the dashboard playbook into the main directory

The [group|host]_vars directories are ignored for the dashboard playbook
when the inventory file directory doesn't contain those directories.

Closes: #4601
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1761612
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8426856262d2f33b841b0f756e9a8661c328e0be)

tests: fix the size on the second data LV

This commit replaces the pv/vg/lv commands run via the ansible command
module with the lvg and lvol modules.
It also fixes the size of the second data LV, which was only using
50% of the remaining space instead of 100%.

With a 50G device, the result was:
  - data-lv1 was 25G
  - data-lv2 was 12.5G
Instead of:
  - data-lv1 was 25G
  - data-lv2 was 25G
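The sizes above follow from percentage-of-free-space arithmetic: once data-lv1 takes half of the VG, asking for 50% of the remaining free space yields only a quarter of the device. A quick Python check of those numbers, assuming a 50G VG and that data-lv1 is sized at 50% of the VG (both taken from the example above; `lv_sizes` is a hypothetical helper):

```python
from typing import Tuple


def lv_sizes(vg_size_gb: float, lv1_pct_of_vg: float,
             lv2_pct_of_free: float) -> Tuple[float, float]:
    """Return (data-lv1, data-lv2) sizes in GB.

    data-lv1 takes a percentage of the whole VG; data-lv2 then takes a
    percentage of the space still free after data-lv1 is created.
    """
    lv1 = vg_size_gb * lv1_pct_of_vg / 100
    free_after_lv1 = vg_size_gb - lv1
    lv2 = free_after_lv1 * lv2_pct_of_free / 100
    return lv1, lv2


print(lv_sizes(50, 50, 50))   # (25.0, 12.5): the old result
print(lv_sizes(50, 50, 100))  # (25.0, 25.0): the fixed result
```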

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2c03c6fcd33ba6b3a3daf73ecd011e87ab41c0a0)

tests: add multimds coverage

This commit makes the all_daemons scenario deploy 3 mds in order to
cover the multimds case.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 25b98b2ce322a6fa04c3709597a341b7cd1a0c3d)

upgrade: fix standby_mdss group creation

This commit fixes the standby_mdss group creation by using `{{ item }}`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c4fc8cc8787e13cb6a45db51581c558a587a6f27)

tests: reduce handler mon and osd delay

We don't need high handler delays in the CI, so reduce them to
10 seconds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 04ec1ad3cc0d4b60458a8b14dcc0b640e80f9eef)

common: do not override ceph_release when using custom repo

Otherwise it fails as follows:

```
TASK [ceph-mds : allow multimds] **************************************************************************************************************************************************
Monday 22 July 2019 16:37:38 +0800 (0:00:03.269) 0:13:25.651 ***********
fatal: [rhel7u6clone1]: FAILED! => {"msg": "The conditional check 'ceph_release_num[ceph_release] == ceph_release_num.luminous' failed. The error was: error while evaluating conditional (ceph_release_num[ceph_release] == ceph_release_num.luminous): 'dict object' has no attribute u'dummy'\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mds/tasks/create_mds_filesystems.yml': line 43, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: allow multimds\n ^ here\n"}
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4e9504c939a1daddceef3806f7165844952a6618)

nfs: remove unnecessary set_fact in main.yml

This task is a leftover and is no longer needed.
It even causes a bug when collocating nfs with mon.

Closes: #4609
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b63bd1307388c10aa8965febc680f3bc8f3d08a6)

iscsi-gw: Fix rtslib installation

When using python3 the name of the rtslib rpm is python3-rtslib. The
packages that use rtslib already have code that detects the python
version and distro deps, so drop it from the ceph iscsi gw task list and
let the ceph-iscsi rpm dependency handle it.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1760930

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit ba141298d708082f0b4256136d4a96c63407c473)

rbd-mirror: fail if the peer is not added

Due to the 'failed_when: false' statement in the peer task, the
playbook continues to run even if the peer task fails (for example,
with an incorrect remote peer format):

"stderr": "rbd: invalid spec 'admin@cluster1'"

This patch adds a task that lists the existing peers and adds the peer
only if it isn't already present. With this, we no longer need the
failed_when statement.
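The list-then-add pattern keeps reruns idempotent without masking real errors. A minimal Python sketch of the decision, assuming the existing peers have already been parsed into `client-name@cluster-name` spec strings (parsing the rbd output is left out on purpose; `peer_add_needed` and the sample specs are hypothetical):

```python
from typing import Iterable


def peer_add_needed(existing_peers: Iterable[str], wanted_peer: str) -> bool:
    """Return True only when wanted_peer is not configured yet.

    Gating the add task on this check keeps reruns idempotent, so the task
    no longer needs `failed_when: false` and a genuine failure (such as an
    invalid spec) stops the play.
    """
    return wanted_peer not in set(existing_peers)


existing = ["client.admin@cluster1"]  # illustrative, already parsed
print(peer_add_needed(existing, "client.admin@cluster1"))  # False: skip the add
print(peer_add_needed(existing, "client.admin@cluster2"))  # True: run the add
```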

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0b1e9c0737ca84c2e4a34f827cf91e1a11007b16)

update: follow new recommendation to upgrade mds cluster

Refactor the mds cluster upgrade code in order to follow the documented
recommendation.
See: https://github.com/ceph/ceph/blob/master/doc/cephfs/upgrading.rst

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1569689
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 71cebf80a623388c64dfcb190133eb5f54a524f9)

ceph-iscsi: notify rbd target services

When the iscsi gateway or the ceph configuration file changes, we
need to notify the rbd target api/gw services so they get restarted.
This patch also merges the rbd-target-api and rbd-target-gw handlers
into a single file with a single listen topic.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bc701860d541b9f752d0f8cfcbed43979aafb70e)

Execute common roles once on all nodes

The common roles don't need to be executed again in each group play
(mons, osds, etc.).
We only need to execute them during the first play. That way, the
changes are applied on all nodes in parallel instead of once per
group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 68a3dac7cd492cabc1626873018a40a50efe3ada)

playbook: remove duplicate facts

The is_atomic and container_binary facts are already defined in the
ceph-facts role, so we don't need dedicated tasks for them before the
ceph-facts role execution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 643b50bd4fe4459a1263562cf0e4ed81123390ba)

mgr: do not copy all keyrings on all mgr

There is no need to loop over all mgr nodes to set this fact; it even
breaks deployments because it tries to copy every mgr keyring onto every
mgr.

Closes: #4602
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cb8023172541d09979cc0c224c56aba43674d892)

ceph-handler: group listen topics and condition

We are using multiple listen topics with the handlers, which means that
we notify 4 tasks for each handler.
Instead, we can group the listen topics on an include_tasks, gated by
the group condition.

Before:

NOTIFIED HANDLER ceph-handler : set _mon_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mon restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mon daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mon_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy osd restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph osds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _osd_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mds restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mds daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mds_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rgw restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rgw daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rgw_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy mgr restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph mgr daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _mgr_handler_called after restart for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called before restart for mon0
NOTIFIED HANDLER ceph-handler : copy rbd mirror restart script for mon0
NOTIFIED HANDLER ceph-handler : restart ceph rbd mirror daemon(s) for mon0
NOTIFIED HANDLER ceph-handler : set _rbdmirror_handler_called after restart for mon0

After:

NOTIFIED HANDLER ceph-handler : mons handler for mon0
NOTIFIED HANDLER ceph-handler : osds handler for mon0
NOTIFIED HANDLER ceph-handler : mdss handler for mon0
NOTIFIED HANDLER ceph-handler : rgws handler for mon0
NOTIFIED HANDLER ceph-handler : mgrs handler for mon0
NOTIFIED HANDLER ceph-handler : rbdmirrors handler for mon0

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fe9c5b8c686e838a7a16623b5296e9a919fa54de)

handler: followup on #4519

This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccc11cfc933f88cec22ffca477cf8431fe024e09)

handlers: refact osd handler

This commit merges the two restart tasks into a single one; this way
there is one less task to notify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 411bd07d54fc3f585296b68f2fd04484328399b5)

Remove validate action and notario dependency

The current ceph-validate role uses both the validate action and fail
module tasks to validate the ceph configuration.
The validate action is based on the notario python library. When one of
the notario validations fails, a python stack trace is reported to the
ansible task. This output isn't understandable by users.

This patch removes the validate action and the notario dependency. The
validation is now done only with the ansible fail module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654790
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0f978d969ba4ceb643ccce134d5ffb5ffcadf12c)

dashboard: disable facts gathering

This is already done in the main playbooks but missing from the dashboard
playbook.
The facts are already gathered during the first play of the main
playbooks, so we don't need to gather them twice.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5ae7304acec7cde23f90fdc7d7ae407dc3c27adb)

mgr: improve mgr keyring creation

Delegating to the remote node isn't necessary here since we are already
iterating over the right nodes.

Closes: #4518
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 161170524d282d2bfb2fff44886a6451f8b74ecd)

validate: prevent installing an OSD on the same disk as the OS

This commit adds a validation task that prevents installing an OSD on
the same disk as the OS.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623580
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 80e2d00b16629c4815019e8bc58c4539bd109710)

common: do not reset `container_exec_cmd`

This commit removes some legacy tasks.

These tasks aren't needed and cause the playbook to fail when
collocating daemons.

Closes: #4553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 273413186a0042e66612a2d68b8aaaf17b50b5e1)

dashboard: if no host is available, let's just skip these plays.

If there is no host available, let's just skip these plays.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1759917
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b245bd0071038cde44fdc54993df65e2ed754e7)

dashboard: update layouts before the restart

If the mgr dashboard doesn't restart fast enough, the inject
dashboard task fails with an HTTP 400 error.

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 914, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/dashboard/module.py", line 450, in handle_command
    push_local_dashboards()
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 132, in push_local_dashboards
    retry()
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 89, in call
    result = self.func(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 127, in push
    grafana.push_dashboard(body)
  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 54, in push_dashboard
    response.raise_for_status()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 834, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 400 Client Error: Bad Request

Instead we can trigger this task before the module restart.

Closes: #4565
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f6ff240b7d5a6863478cafc0aa78e2177a09ac3)

tests: update tox due to pipeline removal

This commit reflects the recent changes in ceph/ceph-build#1406

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bcaf8cedeec0f06eb8641b0038569a5cd3a3e7be)

switch_to_containers: umount osd lockbox partition

When switching from a baremetal deployment to a containerized deployment
we only unmount the OSD data partition.
If the OSD is encrypted (dmcrypt: true), there's an additional
partition (part number 5) used for the lockbox and mounted in the
/var/lib/ceph/osd-lockbox/ directory.
Because this partition isn't unmounted, the containerized OSDs aren't
able to start: the partition is still mounted by the system and can't be
remounted from the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 19edf707a50c2e86110b2ba0231091b6bd355bd1)

nfs: stop nfs server service in all contexts

This commit moves this task in order to stop the nfs server service
regardless of the deployment type (containerized or non
containerized).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6c6a512a720de0268e2d099926413e2816c65174)

nfs: stop nfs server service

The syntax here wasn't working; this refactor fixes the task.
It also removes the `ignore_errors: true`, which was hiding the failure.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47034effe0bb7de14442b0ba884ff4abe793b4b7)

ceph-dashboard: remove rgw api host,port,scheme

We don't need dedicated, manually filled variables for the RGW
integration into the Ceph Dashboard.
Instead we can use the current values from the RGW nodes: the IP and
port of the first RGW instance of the first RGW node, via the
radosgw_address and radosgw_frontend_port variables.
We don't need to specify all RGW nodes; this is handled automatically
with one node.
The RGW api scheme uses the radosgw_frontend_ssl_certificate variable
to determine whether the value is http or https. This variable is also
reused as a condition for the ssl verify task.
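Taken together, the RGW endpoint settings can be derived entirely from existing RGW variables. A Python sketch of that derivation, assuming an empty `radosgw_frontend_ssl_certificate` means plain HTTP as described above; `rgw_api_settings` and the sample address/port are hypothetical:

```python
from typing import Dict


def rgw_api_settings(radosgw_address: str, radosgw_frontend_port: int,
                     radosgw_frontend_ssl_certificate: str) -> Dict[str, str]:
    """Derive the dashboard RGW api host, port and scheme from RGW variables."""
    # An empty certificate path means the RGW frontend is plain HTTP.
    use_ssl = len(radosgw_frontend_ssl_certificate) > 0
    return {
        "host": radosgw_address,
        "port": str(radosgw_frontend_port),
        "scheme": "https" if use_ssl else "http",
    }


print(rgw_api_settings("192.0.2.10", 8080, ""))
# {'host': '192.0.2.10', 'port': '8080', 'scheme': 'http'}
```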

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b9e93ad7a60772e953c3d88346bf94db4131dcf6)

switch_to_containers: do not re-set `ceph_uid`

This commit refactors the way we set the `ceph_uid` fact in `ceph-facts`
and removes all `set_fact` tasks for `ceph_uid` in the
switch-to-containers playbook to avoid duplicated code.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fa9b42e98e32d2b3eff9605db044e37421d8b938)

switch_to_containers: optimize ownership change

As suggested in https://github.com/ceph/ceph-ansible/pull/4323#issuecomment-538420164,
using the `find` command should be faster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1757400
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-Authored-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit c5d0c90bb7d8382fde2f07820c2d8547c8a3603e)

ceph-dashboard: Improve https configuration

This patch moves the https dashboard configuration into a dedicated
block to avoid multiple occurrences of the dashboard_protocol
condition.
It also fixes the handling of the dashboard certificate and key
variables in the condition introduced by ab54fe2. Those variables are
strings, not booleans, so we test them with the length filter.
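Why `| bool` is the wrong test for a path-like string: Ansible's `bool` filter treats only a handful of keyword strings (such as "yes", "true", "1") as true, so a configured certificate path evaluates to false. The Python sketch below is a simplified approximation of that filter written for illustration (not Ansible's source), shown next to a length-based check; the certificate path is hypothetical:

```python
def ansible_like_bool(value) -> bool:
    """Simplified approximation of Ansible's `bool` filter (not its source)."""
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    if isinstance(value, str):
        value = value.lower()
    # Only a few keyword values count as true; everything else is false.
    return value in ("yes", "on", "1", "true", 1)


def is_set(value: str) -> bool:
    """Length-based test: any non-empty string counts as configured."""
    return len(value) > 0


crt = "/etc/ceph/dashboard.crt"  # hypothetical certificate path
print(ansible_like_bool(crt))  # False, even though a path is configured
print(is_set(crt))             # True
print(is_set(""))              # False: variable left at an empty default
```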

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 249764047b9e85d3a858949872c1a1790b426044)

update: import ceph-defaults role in first play

Typical error:

```
fatal: [mon0]: FAILED! =>
msg: |-
The conditional check 'not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])' failed. The error was: error while evaluating conditional (not delegate_facts_host | bool or inventory_hostname in groups.get(client_group_name, [])): 'client_group_name' is undefined
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8138d4193c965a5c990c6c101ef7a9bf2e4080f7)

site.yml: remove raw installation of python2-dnf

These dependencies aren't needed anymore on recent releases of Fedora.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7fdf8b62bed75e65f895d99fdc52bcae784c0c48)

main: exclude client nodes from facts gathering when delegate_facts_host

This commit excludes client nodes from facts gathering: their facts
aren't needed, and skipping them speeds up this task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 865d2eac9ba81bdb9ecbd841e4b73608648dfae2)

handler: followup on #4519

This commit adds some missing `| bool` filters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccc11cfc933f88cec22ffca477cf8431fe024e09)

playbook: add missing tags

Add the missing tag to the ceph-handler role call.
Otherwise, we can't use `--tags='ceph_update_config'` for updating the
ceph configuration file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f59dad620d43a740466a26f7fb8eba1ffc5ba0af)

tests: fix rgw multisite vagrant variables

The secondary vagrant variables didn't set the grafana vm variable,
which caused a Vagrant error:

There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
an invalid or undefined variable.

This patch also changes the ssh-extra-args parameter to ssh-common-args
to get the same values for ssh/sftp/scp. Otherwise ansible emits
warnings and some tasks fail:

[WARNING]: sftp transfer mechanism failed on [mon0]. Use ANSIBLE_DEBUG=1
to see detailed information

It also updates the ssh-common-args value for the rgw-multisite scenario
to reflect the ANSIBLE_SSH_ARGS environment variable value.

Finally, it changes the IP addresses to match the Vagrant refactor done
in commit 778c51a.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 010158ff847bb59920f6a5bbf383a1cb7056c0cf)

ceph-dashboard: add cluster parameter to ceph cmd

The ceph dashboard tasks didn't pass the cluster option when the cluster
name isn't the default value.

Closes: #4529
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dd526cfe4ecceffcdce13b29b2c09ff19a8bd1b0)

dashboard: remove useless block section

The block section was used with the dashboard_enabled condition when
the code was included in the main playbooks.
Because this condition is no longer present in the dashboard playbook,
we can remove the block section.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cf47594b47a2e50465bcd748ccc3a31df0b8eee4)

Vagrantfile: support more than 9 nodes per daemon type

Because of the current IP address assignment, it's not possible to
deploy more than 9 nodes per daemon type.
This commit refactors it slightly to get around this limitation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 778c51a0ff7a8c66c464f4828a0f87dd290e1c3e)

ceph-handler: don't restart all OSDs with limit

When using the ansible --limit option on one or a few OSD nodes, if the
handler is triggered we restart the OSD service on all OSD nodes
instead of only the hosts selected by the limit value.
Even if the play is limited by the --limit value, we iterate over all
nodes of the OSD group:

with_items: '{{ groups[osd_group_name] }}'

Instead, we should iterate only over the nodes present in both the OSD
group and the limit list.
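In Ansible this kind of restriction is commonly expressed with the `intersect` filter against a play-hosts variable such as `ansible_play_batch`; whether the commit uses exactly that expression isn't shown here. The underlying set logic can be sketched in Python as an order-preserving intersection, where `osd_group` and `play_hosts` are hypothetical stand-ins for `groups[osd_group_name]` and the hosts left after `--limit`:

```python
from typing import List


def hosts_to_restart(osd_group: List[str], play_hosts: List[str]) -> List[str]:
    """Keep only OSD hosts that are also part of the (possibly --limit'ed) play.

    Order follows the OSD group, so restarts still happen in inventory order.
    """
    allowed = set(play_hosts)
    return [host for host in osd_group if host in allowed]


osd_group = ["osd0", "osd1", "osd2", "osd3"]
play_hosts = ["osd1", "osd3"]  # e.g. after --limit osd1,osd3
print(hosts_to_restart(osd_group, play_hosts))  # ['osd1', 'osd3']
```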

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0346871fb5c46fb1fedfb24ffe5a8c02108c244e)

ceph-facts: fix _radosgw_address with block

e695efc introduced a regression in the _radosgw_address fact when using
the radosgw_address_block variable.
There's no item there because we don't use the items lookup. This is
only used for _monitor_address with monitor_address_block.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1758099
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 780cf36a596cce1ba786f7f04cfbf86fa7fd9621)

common: improve keyrings generation

There is no need to fetch the different keyrings once per node
(n * number of nodes). Adding `run_once: true` here avoids running a
ceph command too many times, which could impact large cluster
deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9bad239d779b59ca906abdecde8f905fe79098cc)

ceph-facts: use --admin-daemon to get fsid

During the rolling_update scenario, the fsid value is retrieved from the
current ceph cluster configuration via the ceph daemon config command.
This command first tries to resolve the admin socket path via the
ceph-conf command.
Unfortunately, ceph-conf fails if there is a duplicate key in the ceph
configuration, even though such a key otherwise only produces a warning.
As a result, the task fails.

Can't get admin socket path: unable to get conf option admin_socket for
mon.xxx: warning: line 13: 'osd_memory_target' in section 'osd' redefined

Instead of using ceph daemon, we can use the --admin-daemon option
because we already know the admin socket path from the ceph cluster
name and the mon hostname.
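Ceph's daemon default for `admin_socket` is `$run_dir/$cluster-$name.asok`, with `run_dir` defaulting to `/var/run/ceph`, so a monitor's socket path can be built without calling ceph-conf. A Python sketch, assuming those defaults are not overridden and that the mon id equals the hostname (both assumptions; the helper names are hypothetical):

```python
import shlex
from typing import List


def mon_admin_socket(cluster: str, mon_hostname: str,
                     run_dir: str = "/var/run/ceph") -> str:
    """Build the mon admin socket path from $run_dir/$cluster-$name.asok."""
    # $name for a monitor is "mon.<id>"; here <id> is assumed to be the hostname.
    return f"{run_dir}/{cluster}-mon.{mon_hostname}.asok"


def fsid_command(cluster: str, mon_hostname: str) -> List[str]:
    """Query the fsid through the socket directly, skipping the ceph-conf lookup."""
    return ["ceph", "--admin-daemon", mon_admin_socket(cluster, mon_hostname),
            "config", "get", "fsid"]


print(shlex.join(fsid_command("ceph", "mon0")))
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon0.asok config get fsid
```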

Closes: #4492
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ec3b687dc4d2153390fcb848e3c839244f644182)

validate: fix gpt header check

Check for a gpt header when the osd scenario is lvm or lvm batch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 272d16e101984de41bba6fcbe6134bf39e547341)

rbdmirror: rename a file

Rename this file to something more generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed8616aa66b718274c5196f70b5a3223fcaf216b)

rgw: refact tasks directory layout

This commit moves containerized deployment related files to the
`./tasks/` directory. This is needed to make `docker-to-podman.yml` work
since we use the `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e08194dd677bd3312240d46765e40b3f4aa6fe33)

rbdmirror: refact tasks directory layout

This commit moves containerized deployment related files to the
`./tasks/` directory. This is needed to make `docker-to-podman.yml` work
since we use the `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c69816c6b755295e33a9b96d64627de1683b0a46)

iscsigw: refact tasks directory layout

This commit moves containerized deployment related files to the
`./tasks/` directory. This is needed to make `docker-to-podman.yml` work
since we use the `tasks_from:` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4636f3f7e29267398495dc37db2fa511dae6596b)

upgrade: add an infra playbook to migrate systemd units to podman

This commit adds a new playbook to force systemd units for containers to
use podman instead of docker.
This is needed in the rhel8 upgrade context so that, after the base OS
is upgraded, containers can be started using podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2017dcda276455e9fee42de1d593439cb94c0e9)

container: isolate systemd tasks

This commit isolates the systemd unit file generation for containers into
separate yml files so that each corresponding role can be imported
without running all of its tasks.
This is needed so we can run ceph-ansible to render systemd unit files
that call podman instead of docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bd641674691bbc529711e39583e5d4bf999d84d2)

ceph-facts: update external grafana fact filter

e695efc hasn't been updated with the changes introduced in 9bb11c7 so
the ips_in_ranges filter isn't used for an external grafana instance.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 20b1a464ec373b671bbdd49d17001f0a7fdc7036)

ceph-defaults: Change the default prometheus port

The old default prometheus port 9090 clashes with cockpit in rhel 8. The
9090 port is reserved for web service administration of machines. We
should change the default to something that does not clash with other
ports used in rhel 8, at least by default. The port 9092 seems like a
good choice in my testing.

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit b96c6da83239a585d3e51301cd81112779c99928)

rhcs_edits: Fix ose container versions

For some reason, the floating tags were changed from v4.1 to just 4.1
for these images when switching to registry.redhat.io. We should fix
the locations.

We are also changing the downstream grafana image to the one we used for
rhcs 3. The ose grafana image lacks the support for a lot of features
that we need (e.g. vonage and piechart grafana plugins, grafana-cli
binary and others).

Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit f067e53c6efc602a09ca6415cae05ec825bdc9de)

tests: remove debug log verbosity

This was added for debugging purposes.
It's generating very large log output, let's remove this now.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 01f6dd52b315c5d42ca614a82e18d758855ba896)

Revert "ceph-common: install only necesarry ceph-* packages on debian"

This reverts commit 58b27ef0b3bbd64d8a66da24d702f3ff761fe6ec.
This is breaking debian based OS deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e4444d29e028feff7c2ea04d4f1a187e78f10147)

update: reset mon_host after mons upgrade

After all mons are upgraded, reset mon_host, which is used in the
rest of the playbook to set `container_exec_cmd`, so we are sure to
use the right value.

Typical error:

```
failed: [mds0 -> mon0] (item={u'path': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'name': u'client.bootstrap-mds', u'copy_key': True}) => changed=true
  ansible_loop_var: item
  cmd:
  - docker
  - exec
  - ceph-mon-mon2
  - ceph
  - --cluster
  - ceph
  - auth
  - get
  - client.bootstrap-mds
  delta: '0:00:00.016294'
  end: '2019-09-27 13:54:58.828835'
  item:
    copy_key: true
    name: client.bootstrap-mds
    path: /var/lib/ceph/bootstrap-mds/ceph.keyring
  msg: non-zero return code
  rc: 1
  start: '2019-09-27 13:54:58.812541'
  stderr: 'Error response from daemon: No such container: ceph-mon-mon2'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d84160a170d9d134dc5b7ca246004fbe8a14b7af)

install python-xml on SUSE/openSUSE only if python2 is installed

raw_install_python.yml: on SUSE/openSUSE, install python-xml package only
if python2 is installed already

Background:
On SLES 15.x / openSUSE Leap 15.x, the python2 package `python-base` provides
/usr/bin/python, while python3 only provides /usr/bin/python3.
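The condition above reduces to checking for the python2 interpreter path. A Python sketch of that decision, assuming the check is a plain existence test on `/usr/bin/python` (`needs_python_xml` is hypothetical; the actual logic lives in raw_install_python.yml):

```python
import os
from typing import Callable


def needs_python_xml(is_suse: bool,
                     path_exists: Callable[[str], bool] = os.path.exists) -> bool:
    """Decide whether python-xml should be installed.

    On SLES/openSUSE Leap 15.x, /usr/bin/python is provided by the python2
    `python-base` package, while python3 only ships /usr/bin/python3 (whose
    standard library already includes the xml module).
    """
    return is_suse and path_exists("/usr/bin/python")


# Simulate a python3-only SUSE host without touching the real filesystem.
print(needs_python_xml(True, lambda path: False))  # False
```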

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit a1811ca097415990211248648dc3d9480f6841eb)

move python-xml to raw_install_python.yml

The package python-xml is needed for ansible's zypper module to interact with
the zypper package management tool.

roles/ceph-defaults/defaults/main.yml:
Remove python-xml from variable suse_package_dependencies to only
install python-xml on SUSE/openSUSE if python is not found.
raw_install_python.yml already contains all the logic needed to check
if there is a valid python installation, so this is better suited there.

openSUSE Leap 15.x / SLES 15.x no longer have /usr/bin/python,
only /usr/bin/python3, which already contains the xml module, so
nothing needs to be installed in that case.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
(cherry picked from commit 5cf22e9b312bb26b3144c329e6e597a0905b274a)

Replace ipaddr() with ips_in_ranges()

This change implements a filter_plugin that is used in the
ceph-facts and ceph-validate roles and in infrastructure-playbooks.
The new filter plugin returns a list of all IP addresses
that reside in any one of the given IP ranges. The new filter
replaces the use of the ipaddr filter.

ceph.conf already supports a comma-separated list of CIDRs
for the public_network and cluster_network options.

Changes [1] and [2] introduced a regression in ceph-ansible
where public_network can no longer be a comma-separated list
of CIDRs.

With this change a comma separated list of subnet CIDRs can
also be used for monitor_address_block and radosgw_address_block.
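A minimal sketch of such a filter, assuming Python's standard
`ipaddress` module and the usual Ansible `FilterModule` layout; it
illustrates the behaviour described above rather than reproducing the
plugin shipped in this commit.

```python
import ipaddress
from typing import Iterable, List


def ips_in_ranges(ip_addresses: Iterable[str], ip_ranges: Iterable[str]) -> List[str]:
    """Return the addresses that fall inside at least one of the given CIDRs."""
    # strict=False accepts CIDRs written with host bits set, e.g. 192.168.1.5/24.
    networks = [ipaddress.ip_network(r.strip(), strict=False) for r in ip_ranges]
    matches = []
    for ip in ip_addresses:
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never "in" an IPv6 network (and vice versa): False.
        if any(addr in net for net in networks):
            matches.append(ip)
    return matches


class FilterModule(object):
    """Expose the function to Ansible as the `ips_in_ranges` Jinja2 filter."""

    def filters(self):
        return {"ips_in_ranges": ips_in_ranges}


if __name__ == "__main__":
    public_network = "192.168.1.0/24,10.0.0.0/8"
    host_ips = ["192.168.1.10", "172.16.0.5", "10.1.2.3"]
    print(ips_in_ranges(host_ips, public_network.split(",")))
    # ['192.168.1.10', '10.1.2.3']
```

In a template this would read roughly as
`ansible_all_ipv4_addresses | ips_in_ranges(public_network.split(','))`.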

[1] commit: d67230b2a26b40651c1c1dbee68a92b0e851f3d5
[2] commit: 20e4852888ecc76d8d0fa194a438fa2a90e1cde3

Related-To: https://bugs.launchpad.net/tripleo/+bug/1840030
Related-To: https://bugzilla.redhat.com/show_bug.cgi?id=1740283
Closes: #4333
Please backport to stable-4.0

Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit e695efcaf79909e2237197fd473117930e8d83e5)

ceph-nfs: Allow to configure SecType value

Depending on the infrastructure (with or without Kerberos
authentication), the SecType value can differ.
Currently this value is hardcoded in the NFS Ganesha template. Instead
we can use a variable.
The default value is unchanged to preserve backward
compatibility.

Closes: #4459
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ca77d7bd317da75404ef4ee7143c7412d6ae63ee)

ceph-dashboard: Add prometheus api host

The set-prometheus-api-host ceph dashboard subcommand was missing in
the ceph-dashboard role. Only grafana and alertmanager were present.
This commit also removes the trailing slash at the end of the host/url
values.
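A minimal Python sketch of the resulting commands, for illustration:
the URLs are placeholders and the helper is hypothetical; it only shows
the three `ceph dashboard` setters and the trailing-slash stripping.

```python
from typing import List


def dashboard_api_commands(grafana_url: str, alertmanager_host: str,
                           prometheus_host: str) -> List[List[str]]:
    """Build the ceph dashboard commands, dropping any trailing '/' from each value."""
    return [
        ["ceph", "dashboard", "set-grafana-api-url", grafana_url.rstrip("/")],
        ["ceph", "dashboard", "set-alertmanager-api-host", alertmanager_host.rstrip("/")],
        ["ceph", "dashboard", "set-prometheus-api-host", prometheus_host.rstrip("/")],
    ]


if __name__ == "__main__":
    # Placeholder endpoints; ports follow the usual grafana/alertmanager/prometheus defaults.
    for cmd in dashboard_api_commands("https://grafana0:3000/",
                                      "http://mgr0:9093/",
                                      "http://mgr0:9090/"):
        print(" ".join(cmd))
```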

Closes: #4453
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 74ab59c4f33d534cfbca4055c1f494a670be40e2)

ceph-common: install only necessary ceph-* packages on debian

Currently, the ceph package is only a meta-package that does not contain
actual software but simply depends on other packages. This has been the
case for a few releases in Debian stretch (official), Ubuntu bionic
(official), the Ubuntu UCA repository and upstream debian-jewel.
As we only support nautilus and later releases on the master branch,
I propose to drop the ceph package and use ceph-base instead for
repository models other than rhcs, so the Debian ceph install will be
more minimal.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
(cherry picked from commit 58b27ef0b3bbd64d8a66da24d702f3ff761fe6ec)

dashboard: add grafana dashboard support on Debian based OS

Download grafana dashboard files from GitHub when running on a Debian-based OS.

Signed-off-by: liuxu <liuxu623@gmail.com>
(cherry picked from commit 195f70897ca18faee18d63f006605af392572e8e)

rolling_update.yml: force ceph-volume scan on osds

The rolling_update.yml playbook fails when scanning ceph-disk osds while
deploying nautilus. The --force flag is required to scan existing osds
and rewrite their json metadata.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
(cherry picked from commit 7cc9f93680d84503943d60b2bb950dd68a2259ed)

Inject ceph grafana dashboard layouts

This change adds a task that injects, from the ceph dashboard
mgr module, the layouts required to show all the cluster
metrics on the grafana instance.
Since we're now able to push grafana layouts through
the ceph mgr module command, the dashboards configuration
template is no longer needed on containerized environments.
This commit also fixes the static IP assignment in the
grafana section of the Vagrantfile, because it caused an
issue (it was the same as the mgr instance's IP).
Finally, to support deployments that use an external
grafana server instance, we reworked the 'grafana_server_addr'
assignment.

Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 9bb11c7b2a17db56cfcd7284d2190af36e17bba6)

iscsigw: install python-requests

Typical error at rbd-target-api startup:

```
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: Traceback (most recent call last):
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/bin/rbd-target-api", line 39, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: from gwcli.utils import (APIRequest, valid_gateway, valid_client,
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/lib/python2.7/site-packages/gwcli/utils.py", line 1, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: import requests
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: ImportError: No module named requests
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 167737dd3de02057403fb458c50d22cf94a85b95)

tests: pin jinja2 version

ensure we get the latest jinja2 version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 006df148d00a28de01bdbb1b8988039ef82ba0ac)

tests: set copy_admin_key at group_vars level

Setting it at the extra vars level prevents setting it per node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5bb6a4da4267b987aec6e20a8d09b18eebc2c693)

global: remove fetch_directory dependency

This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ab370b6ad823e551cfc324fd9c264633a34b72b5)

infrastructure-playbooks: add filestore-to-bluestore.yml

This playbook helps to migrate all osds on a node from filestore to
bluestore backend.
Note that *ALL* OSDs on the specified OSD nodes will be shrunk and
redeployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3f9ccdaa8ab9ce299c300dedf239eb91f6aa44f0)

osd: add wal_devices option support to ceph_volume module

This commit adds the `wal_devices` option support to the
ceph_volume module.
Passing a list of devices in `bluestore_wal_devices` makes ceph-volume
create one VG on these devices, used to create the block.wal partitions.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09e04a91973304fda9ec006776f4a9b9f2bc93b9)

osd: update doc text in defaults/main.yml

This commit removes ceph-disk references.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 70f1b37097b24edadad49158713fccb94b57c43e)

osd: add block_db_devices option support to ceph_volume module

This commit adds the `block_db_devices` option support to the
ceph_volume module.
Passing a list of devices in `dedicated_devices` makes ceph-volume
create one VG on these devices, used to create the block.db partitions
for the data devices.
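A rough sketch of the `ceph-volume lvm batch` command line these options
lead to, assuming the `--db-devices` and `--wal-devices` batch flags of
the ceph-volume release in use; the helper and device paths are
hypothetical and this is not the ceph_volume module's code.

```python
from typing import List, Optional


def build_batch_cmd(data_devices: List[str],
                    db_devices: Optional[List[str]] = None,
                    wal_devices: Optional[List[str]] = None,
                    cluster: str = "ceph") -> List[str]:
    """Assemble a bluestore `ceph-volume lvm batch` invocation (hypothetical helper)."""
    cmd = ["ceph-volume", "--cluster", cluster, "lvm", "batch", "--bluestore", "--yes"]
    cmd.extend(data_devices)
    if db_devices:
        # ceph-volume groups these into one VG and creates block.db volumes from it.
        cmd.append("--db-devices")
        cmd.extend(db_devices)
    if wal_devices:
        # Same idea for block.wal (bluestore_wal_devices in ceph-ansible).
        cmd.append("--wal-devices")
        cmd.extend(wal_devices)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_batch_cmd(["/dev/sdb", "/dev/sdc"],
                                   db_devices=["/dev/nvme0n1"],
                                   wal_devices=["/dev/nvme1n1"])))
```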

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b836eaa47fc0a84d5c5d79dfadd9dab96eb6472)

lv-create: fix a typo

This commit fixes a typo.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c785ad3637aa06f0457ae7fbf435907da4c5d929)

shrink-rgw.yml: fix confirmation play's name

The confirmation play's name should confirm removing rgw instead of
monitor.

Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>
(cherry picked from commit 9fa98d79fde5f886e43fb27a9c8139e8271c0095)

group_vars: remove useless dashboard files

The only useful ansible group for the grafana/prometheus stack is
grafana-server, so none of those files is actually needed.
The default values for all dashboard roles are present in the
ceph-defaults role, so they are also present in group_vars/all.yml.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ec56a95013efef59c591310cc16f3037cdaba255)

validate: check ceph_docker_registry_* length

This commit adds a condition to check whether these variables are empty.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b97ac921bcd913e2daf0fa106d98ff1d00743c9)

container: Allow to use registry authentication

The registry.redhat.io registry requires authentication, so before pulling
the RHCS 4 container images from the registry we need to do the login
step.
This is done via the new ceph_docker_registry_auth variable. The
default value is false, but it is true for the RHCS setup.
When set to true, you need to provide the username and password
for the registry via the associated variables.
This patch also updates the ceph_docker_registry value for RHCS setup.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1748911
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9f4a99fb244a705b5f04a9e8ec911d425a4bd23f)

rhel8: add default python bin path

On RHEL 8 systems we should check the /usr/libexec/platform-python path
instead of installing python36 package.

[DEPRECATION WARNING]: Distribution redhat 8.0 on host xxxxx should use
/usr/libexec/platform-python, but is using /usr/bin/python for backward
compatibility with prior Ansible releases. A future Ansible release will
default to using the discovered platform python for this host.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f90696c36e0a0cce57e61a66057b628267fdc3ed)

shrink-mon: search mon in the quorum_names list

If we look for the mon hostname in the whole ceph status output, there
are some scenarios where the check still matches after the monitor has
been removed.
If we collocate some services (mons, mgrs, etc..), the hostname of
the monitor to shrink will still be present in the ceph status (for
example in the mgrs section).
Instead we should check the hostname only in the mon part of the output.
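A minimal sketch of the difference, assuming the JSON form of
`ceph status` (`ceph -s -f json`), whose top-level `quorum_names` key
lists the monitors in quorum; the helper name and sample output are
hypothetical and heavily abbreviated.

```python
import json


def mon_in_quorum(ceph_status_json: str, mon_hostname: str) -> bool:
    """Check the monitor name only against quorum_names, not the whole output."""
    status = json.loads(ceph_status_json)
    return mon_hostname in status.get("quorum_names", [])


if __name__ == "__main__":
    # mon2 was shrunk but still runs a collocated mgr, so its name appears elsewhere.
    sample = json.dumps({
        "quorum_names": ["mon0", "mon1"],
        "mgrmap": {"active_name": "mon2"},
    })
    print("mon2" in sample)               # naive substring check: True
    print(mon_in_quorum(sample, "mon2"))  # quorum check: False
```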

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 734c0dc3106a14a0bacd952d1fd91e3d8856bac6)

ceph-handler: Fix osd restart condition

In containerized deployments, the restart OSD handler couldn't be
triggered in most ansible executions.
This is due to the usage of run_once + a condition on the inventory
hostname and the last filter.
The run_once is evaluated first, so ansible picks a single node in the
osd group to execute the restart task. But if this node isn't the
last one in the osd group, the task is ignored. The task is therefore
more likely to be ignored than executed.
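A toy model of why the handler rarely fired, assuming Ansible's
documented run_once behaviour (the task runs on the first host of the
current batch) and a condition of the form
`inventory_hostname == groups[osd_group_name] | last`; the helper and
host names are hypothetical.

```python
from typing import List


def restart_task_runs(batch_hosts: List[str], osd_group: List[str]) -> bool:
    """Return True if the run_once restart task would actually execute."""
    # run_once: Ansible picks the first host of the current batch.
    selected = batch_hosts[0]
    # The condition only passes on the last host of the osd group.
    return selected == osd_group[-1]


if __name__ == "__main__":
    osds = ["osd0", "osd1", "osd2"]
    print(restart_task_runs(osds, osds))      # False: osd0 is picked, osd2 is last
    print(restart_task_runs(["osd2"], osds))  # True only when osd2 leads the batch
```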

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5b1c15653fcb4772f0839f3a57f7e36ba1b86f49)

rbd-mirror: Allow to copy the admin keyring

The ceph-rbd-mirror role allows copying the admin keyring via the
copy_admin_key variable, but there's actually no task in that role
doing the job.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1f505628dd5e62226ceee975679f1629788771f9)

rbd-mirror: Use the rbd mirror client keyring

The admin keyring isn't present by default on the rbd mirror nodes so
the rbd commands related to the mirroring configuration will fail.
Instead we can use the rbd mirror client keyring.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a3d36df02564d2301a0b51e5d63ea5ffb8ae6968)

tox-update: set the ansible.cfg path before update

During an upgrade, we install the platform with the stable-3.2
branch, but the ansible configuration still uses the file from the
current branch, which could have some differences.
Instead we can override the ANSIBLE_CONFIG environment variable for
the stable-3.2 commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a8740026ad3f1503e45add78630cb63bbb413cfc)

Support comma-delimited subnets in firewall

ceph.conf supports a comma-separated list of
subnet CIDRs for the public_network and the
cluster_network options. ceph-ansible should support
setting up the firewall for this configuration.
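A minimal sketch of the splitting step, for illustration: the helper is
hypothetical and uses Python's `ipaddress` module to validate each
entry; each resulting CIDR would then get its own firewall rule, for
example by looping the firewalld module's `source` option over the list.

```python
import ipaddress
from typing import List


def firewall_sources(network_option: str) -> List[str]:
    """Split a comma-separated public_network/cluster_network value into CIDRs."""
    sources = []
    for item in network_option.split(","):
        item = item.strip()
        if item:
            # Normalise and validate; raises ValueError on a malformed subnet.
            sources.append(str(ipaddress.ip_network(item, strict=False)))
    return sources


if __name__ == "__main__":
    print(firewall_sources("192.168.1.0/24, 10.0.0.0/8"))
    # ['192.168.1.0/24', '10.0.0.0/8']
```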

Closes: #4425
Related: #4333
https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings

Signed-off-by: Harald Jensås <hjensas@redhat.com>
(cherry picked from commit d94229204d84fc27c5997d273dff577af0ab1684)

Look for additional names when checking ceph-nfs container status

Ganesha cannot be operated active/active; in deployments
where it is managed by pacemaker, the container name can be
different from the default.

This change uses "ceph_nfs_service_suffix" where it was previously
missing, to ensure tasks work with customized names.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit d2a2bd7c423182b60460bffa6b3d6a28c7d12227)