git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
3 years agoUse upstream config_template collection
Dmitriy Rabotyagov [Thu, 13 Jan 2022 15:57:50 +0000 (17:57 +0200)]
Use upstream config_template collection

To reduce the need for internal maintenance of the module
and to join forces on plugin development,
it is proposed to switch to the upstream version of the
config_template module.

As it is shipped as a collection, its installation for end users
is trivial and aligns with the general approach of shipping extra modules.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
3 years agoFix rich version for ansible-lint
Dmitriy Rabotyagov [Thu, 13 Jan 2022 16:17:14 +0000 (18:17 +0200)]
Fix rich version for ansible-lint

ansible-lint prior to v5.3.1 has an issue with rich version >=11.0.0.
To be able to cherry-pick the fix to stable branches, we pin the rich version.

This should be reverted with the ansible-lint version bump.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
3 years agocephadm: set allow_overwrite at bootstrap step
yasinlachiny [Fri, 17 Dec 2021 23:02:32 +0000 (02:32 +0330)]
cephadm: set allow_overwrite at bootstrap step

Signed-off-by: yasinlachiny <yasin.lachiny@gmail.com>
3 years agocephadm-adopt: use named args in rgw export creation
Guillaume Abrioux [Thu, 6 Jan 2022 13:33:42 +0000 (14:33 +0100)]
cephadm-adopt: use named args in rgw export creation

In order to avoid breaking changes, let's use named arguments
instead of the positional argument syntax in the command line
used to create the rgw export.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2037691
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoceph-handler: Fix check mode
Benoît Knecht [Thu, 30 Dec 2021 14:08:08 +0000 (15:08 +0100)]
ceph-handler: Fix check mode

When running in check mode with one or more Ceph daemons that need to be
restarted, the `tmpdirpath.path` variable that several handlers rely on is
undefined, leading to fatal errors.

This commit ensures the tasks that require `tmpdirpath.path` are skipped when
it's undefined.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
3 years agotests: temporarily disable nfs-ganesha
Guillaume Abrioux [Wed, 5 Jan 2022 08:52:35 +0000 (09:52 +0100)]
tests: temporarily disable nfs-ganesha

This commit [1] seems to have broken a selinux policy preventing nfs-ganesha from
starting properly.

Since we can't address the issue in ceph-ansible, let's temporarily disable nfs-ganesha testing.

[1] https://github.com/nfs-ganesha/nfs-ganesha/commit/dae2da63d58ae6bfe9ee813b5a59bc40102d7b8d

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocommon: remove legacy repositories
Guillaume Abrioux [Wed, 15 Dec 2021 12:25:49 +0000 (13:25 +0100)]
common: remove legacy repositories

As of rhceph-5, those repositories no longer exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2032790
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoupdate: speed up client play
Guillaume Abrioux [Tue, 9 Nov 2021 14:35:12 +0000 (15:35 +0100)]
update: speed up client play

wip

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocontainer: align systemd units with rpm
Guillaume Abrioux [Wed, 8 Dec 2021 16:37:14 +0000 (17:37 +0100)]
container: align systemd units with rpm

Update the `After=` and `Wants=` parameters in container systemd units
to align them with the systemd units that come
from the packaging.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocephadm-adopt: ensure /etc/ceph is present on monitoring node
Guillaume Abrioux [Tue, 7 Dec 2021 20:11:50 +0000 (21:11 +0100)]
cephadm-adopt: ensure /etc/ceph is present on monitoring node

When deploying the monitoring stack on a dedicated node, the directory
`/etc/ceph` has never been created. Therefore, the play for adopting the
monitoring stack fails because it can't write the minimal config file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2029697
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agomake grafana network a configurable option
Danny Webb [Tue, 23 Nov 2021 16:28:02 +0000 (16:28 +0000)]
make grafana network a configurable option

Signed-off-by: Danny Webb <danny.webb@thehutgroup.com>
3 years agopurge: remove ceph directories on client nodes
Guillaume Abrioux [Mon, 22 Nov 2021 08:22:45 +0000 (09:22 +0100)]
purge: remove ceph directories on client nodes

Otherwise any ceph directories are left over on client nodes
after the purge.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2024815
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agovalidate: support obs repository
Guillaume Abrioux [Wed, 1 Dec 2021 07:44:28 +0000 (08:44 +0100)]
validate: support obs repository

Otherwise, installation on SUSE fails.

Fixes: #6996
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoroles/ceph-rgw: Support CRUSH device class
Benoît Knecht [Tue, 26 Oct 2021 14:00:05 +0000 (16:00 +0200)]
roles/ceph-rgw: Support CRUSH device class

The pools created by `ceph-rgw` (listed in `rgw_create_pools`) now support a
`ec_crush_device_class` option to specify which device class the EC pool should
use.

It defaults to being omitted, which means the pool will use OSDs from any
device class.
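
As a rough illustration of an option that is left out unless set, here is a minimal Python sketch that builds a `ceph osd erasure-code-profile set` command line. The helper name `ec_profile_command` and the profile values are hypothetical; this is not the module's actual code.

```python
from typing import List, Optional


def ec_profile_command(name: str, k: int, m: int,
                       crush_device_class: Optional[str] = None) -> List[str]:
    """Build a `ceph osd erasure-code-profile set` command line (illustrative)."""
    cmd = ["ceph", "osd", "erasure-code-profile", "set", name,
           f"k={k}", f"m={m}"]
    # Only pass the device class when one was requested; leaving it out
    # lets the profile use OSDs from any device class.
    if crush_device_class:
        cmd.append(f"crush-device-class={crush_device_class}")
    return cmd


# Hypothetical profile name and sizes, for illustration only.
print(ec_profile_command("rgw-ec", 2, 1))
print(ec_profile_command("rgw-ec", 2, 1, crush_device_class="ssd"))
```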

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
3 years agolibrary/ceph_ec_profile.py: Support CRUSH device class
Benoît Knecht [Tue, 26 Oct 2021 13:49:33 +0000 (15:49 +0200)]
library/ceph_ec_profile.py: Support CRUSH device class

The `crush_device_class` option of the `ceph_ec_profile` module was documented
but not implemented.

This commit adds it and ensures its value is updated on the corresponding EC
profile.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
3 years agocephadm-adopt: bindmount /var/lib/ceph with 'ro'
Guillaume Abrioux [Tue, 30 Nov 2021 09:00:20 +0000 (10:00 +0100)]
cephadm-adopt: bindmount /var/lib/ceph with 'ro'

When collocating osds with iscsigw daemons, cephadm bindmounts the
following:

```
-v /var/lib/ceph/6126c064-6a9e-4092-8a64-977930df0843/iscsi.rbd.ceph-ameenasuhani-4fs3bq-node5.vomtqb/configfs:/sys/kernel/config
```

This prevents the cephadm-adopt playbook from running a container that bindmounts `/var/lib/ceph:/var/lib/ceph:z`.

Since 'ro' is enough in this playbook, let's replace the ':z' option on
this bindmount with ':ro'.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027411
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoceph_volume: support overriding bind-mounts
Guillaume Abrioux [Tue, 30 Nov 2021 08:52:59 +0000 (09:52 +0100)]
ceph_volume: support overriding bind-mounts

This makes it possible to call `podman run` with custom bind-mounts.

cephadm-adopt.yml playbook needs it for a very specific use case:

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027411
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoadopt: fix ceph_origin and ceph_repository defaults
Guillaume Abrioux [Mon, 29 Nov 2021 09:48:23 +0000 (10:48 +0100)]
adopt: fix ceph_origin and ceph_repository defaults

This is overriding those variables because the precedence at the 'block
var' level is greater than the group_vars/host_vars.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2026861
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agovalidate: fix bug when using vault
Guillaume Abrioux [Wed, 10 Nov 2021 13:32:26 +0000 (14:32 +0100)]
validate: fix bug when using vault

Since a variable encrypted with vault is no longer a string but an
encrypted object, we can't use the `| length` filter on it directly; we have
to convert it to a string first.

Fixes: #6991
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocephadm-adopt: remove logrotate configuration
Dimitri Savineau [Thu, 28 Oct 2021 21:15:49 +0000 (17:15 -0400)]
cephadm-adopt: remove logrotate configuration

cephadm uses its own logrotate configuration file so ceph-ansible needs
to remove that custom file during the cephadm-adopt playbook.

Closes: #6944
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
3 years agoupdate: move a set_fact
Guillaume Abrioux [Thu, 28 Oct 2021 21:40:18 +0000 (23:40 +0200)]
update: move a set_fact

The ceph-facts role makes decisions based on the `rolling_update` fact, so
the set_fact must run before we run this role.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoupdate: support --limit on monitor nodes
Guillaume Abrioux [Thu, 28 Oct 2021 14:17:24 +0000 (16:17 +0200)]
update: support --limit on monitor nodes

Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:

```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocephadm: support adding hosts with ipv6
Guillaume Abrioux [Thu, 28 Oct 2021 12:12:46 +0000 (14:12 +0200)]
cephadm: support adding hosts with ipv6

The current implementation doesn't support adding hosts when using ipv6
addresses.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocephadm: use public_network when adding hosts
Guillaume Abrioux [Thu, 28 Oct 2021 12:10:26 +0000 (14:10 +0200)]
cephadm: use public_network when adding hosts

When adding a host, `ansible_facts['default_ipv4']['address']` might
not be on the desired network; we shouldn't enforce the subnet of the
default route.
Let's use the public_network instead.
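
The idea of picking a host address from a given network rather than from the default route can be sketched in Python with the standard `ipaddress` module. The function name and the sample addresses are assumptions for illustration; the playbook itself does this with Ansible facts and filters.

```python
import ipaddress
from typing import Iterable, Optional


def address_in_network(addresses: Iterable[str], network: str) -> Optional[str]:
    """Return the first address that belongs to `network`, or None."""
    net = ipaddress.ip_network(network, strict=False)
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        # Compare like with like: an IPv4 address is never in an IPv6 network.
        if ip.version == net.version and ip in net:
            return addr
    return None


# Hypothetical host addresses; the first one is on the default route.
host_addresses = ["10.0.0.5", "192.168.1.10", "fd00::10"]
print(address_in_network(host_addresses, "192.168.1.0/24"))
print(address_in_network(host_addresses, "fd00::/64"))
```

The same lookup works for IPv4 and IPv6 networks, since `ipaddress` handles both families.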

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2006415
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoRevert "cephadm: use public_network when adding host"
Guillaume Abrioux [Thu, 28 Oct 2021 11:43:57 +0000 (13:43 +0200)]
Revert "cephadm: use public_network when adding host"

This reverts commit 7a12b854c47c37dbff21ce36af5bc5adc4eda68b.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocephadm: use public_network when adding host
Guillaume Abrioux [Tue, 26 Oct 2021 07:04:21 +0000 (09:04 +0200)]
cephadm: use public_network when adding host

When adding a host, `ansible_facts['default_ipv4']['address']` might
not be on the desired network; we shouldn't enforce the subnet of the
default route.
Let's use the public_network instead.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2006415
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoadopt: fix rbd mirror adoption
Guillaume Abrioux [Tue, 12 Oct 2021 14:01:20 +0000 (16:01 +0200)]
adopt: fix rbd mirror adoption

The rbd mirroring is broken because cephadm doesn't bindmount /etc/ceph anymore.
It means the keyrings and ceph config file aren't available after the
migration.
The idea here is to remove the current rbd mirror peer and add it back
to the mon config store so we aren't bound to the /etc/ceph directory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1967440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoadopt: use mgr/nfs volume
Guillaume Abrioux [Thu, 14 Oct 2021 22:44:02 +0000 (00:44 +0200)]
adopt: use mgr/nfs volume

use the mgr 'nfs' module to recreate nfs exports.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1954971
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agorolling_update: modify default health_osd_check_*
Guillaume Abrioux [Mon, 25 Oct 2021 12:28:41 +0000 (14:28 +0200)]
rolling_update: modify default health_osd_check_*

let's do more retries with a shorter delay.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agorolling_update: fix pre and post osd upgrade play
Guillaume Abrioux [Mon, 25 Oct 2021 11:43:25 +0000 (13:43 +0200)]
rolling_update: fix pre and post osd upgrade play

When using --limit osds, the plays before and after the osd upgrade are
skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`.
Using `hosts: "{{ osds_group_name | default('osds') }}"` with
`delegate_to` to the first monitor addresses this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agotests: followup on pr6951
Guillaume Abrioux [Fri, 22 Oct 2021 02:37:53 +0000 (04:37 +0200)]
tests: followup on pr6951

destroy VMs at the end of the testing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoupdate: support upgrading a subset of nodes
Guillaume Abrioux [Wed, 20 Oct 2021 08:01:05 +0000 (10:01 +0200)]
update: support upgrading a subset of nodes

It can be useful in a large cluster deployment to split the upgrade and
only upgrade a group of nodes at a time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agotests: add new scenario subset_update
Guillaume Abrioux [Wed, 20 Oct 2021 07:59:48 +0000 (09:59 +0200)]
tests: add new scenario subset_update

new scenario in order to test the subset upgrade approach using tags.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoshrink-osd: fix regression because of a wrong regex
Per Abildgaard Toft [Wed, 20 Oct 2021 07:45:16 +0000 (09:45 +0200)]
shrink-osd: fix regression because of a wrong regex

968891f4498da9625acfdd34bfb01fe445d1eef2 introduced a regression.
The regex is wrong because it doesn't allow shrinking osds with an id
greater than 9.
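
To illustrate this kind of bug, here is a small Python sketch with two hypothetical patterns: one that only accepts a single digit and one that accepts any non-negative integer. The commit does not show the actual regex, so both patterns and the comma-separated input format are assumptions.

```python
import re

# Hypothetical patterns: the first rejects ids above 9, the second does not.
SINGLE_DIGIT = re.compile(r"\d")
ANY_NUMBER = re.compile(r"\d+")


def valid_osd_ids(raw: str, pattern: "re.Pattern[str]" = ANY_NUMBER) -> bool:
    """Check that every comma-separated id fully matches the pattern."""
    return all(pattern.fullmatch(osd_id) is not None for osd_id in raw.split(","))


print(valid_osd_ids("1,10,23"))                # accepted by the multi-digit pattern
print(valid_osd_ids("1,10,23", SINGLE_DIGIT))  # rejected: 10 and 23 have two digits
```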

Fixes: #6950
Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>
3 years agocephadm: set ssh configs at bootstrap step
Seena Fallah [Sat, 9 Oct 2021 22:52:08 +0000 (02:22 +0330)]
cephadm: set ssh configs at bootstrap step

Add support for ssh_user and ssh_config to the cephadm bootstrap plugin

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
3 years agoshrink-osd: check osd id format
Guillaume Abrioux [Tue, 12 Oct 2021 15:55:40 +0000 (17:55 +0200)]
shrink-osd: check osd id format

This adds a check early in order to ensure the format of osd ids passed
is correct.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2005734
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocephadm: install cephadm from repository
Seena Fallah [Wed, 15 Sep 2021 12:53:04 +0000 (17:23 +0430)]
cephadm: install cephadm from repository

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
3 years agocephadm-adopt: configure repository for cephadm installation
Seena Fallah [Thu, 5 Aug 2021 15:48:38 +0000 (20:18 +0430)]
cephadm-adopt: configure repository for cephadm installation

Configure the repository for cephadm installation and use package install in both containerized and non-containerized deployments

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
3 years agoceph-validate: export validate repository vars as a task
Seena Fallah [Thu, 5 Aug 2021 15:47:10 +0000 (20:17 +0430)]
ceph-validate: export validate repository vars as a task

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
3 years agoceph-common: export repository configuration to a single task
Seena Fallah [Thu, 5 Aug 2021 15:46:04 +0000 (20:16 +0430)]
ceph-common: export repository configuration to a single task

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
3 years agocephadm: use cephadm_ssh_user for ssh user
Seena Fallah [Wed, 15 Sep 2021 13:02:05 +0000 (17:32 +0430)]
cephadm: use cephadm_ssh_user for ssh user

Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
3 years agoAdd ceph_nfs_adopt tag to the cephadm-adopt playbook
Francesco Pantano [Thu, 30 Sep 2021 07:34:37 +0000 (09:34 +0200)]
Add ceph_nfs_adopt tag to the cephadm-adopt playbook

There are existing OpenStack scenarios where nfs is still not managed
by cephadm. For this reason, it is sometimes useful to skip the nfs part of
the adoption playbook and leave this daemon unmanaged.
The purpose of this patch is to provide a tag that lets OpenStack
operators skip this playbook section.

Closes: https://bugzilla.redhat.com/2009212
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
3 years agocephadm: add admin label on mon nodes
Guillaume Abrioux [Fri, 1 Oct 2021 12:41:23 +0000 (14:41 +0200)]
cephadm: add admin label on mon nodes

This is needed if you want a copy of the admin keyring on the admin
nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agotests: remove all references to ceph_stable_release
Guillaume Abrioux [Wed, 29 Sep 2021 14:25:42 +0000 (16:25 +0200)]
tests: remove all references to ceph_stable_release

this is legacy and not needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoceph-defaults: set ceph_stable_release default to the stable branch release
Seena Fallah [Tue, 21 Sep 2021 07:54:13 +0000 (12:24 +0430)]
ceph-defaults: set ceph_stable_release default to the stable branch release

ceph_stable_release is a legacy from the time when a single branch of ceph-ansible supported more than one release of ceph

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
3 years agotests: set rgw_instances in collect-logs.yml
Guillaume Abrioux [Thu, 30 Sep 2021 09:32:12 +0000 (11:32 +0200)]
tests: set rgw_instances in collect-logs.yml

in order to gather rgw logs, we need rgw_instances to be set.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agotests: update collect-logs.yml playbook
Guillaume Abrioux [Thu, 30 Sep 2021 06:23:42 +0000 (08:23 +0200)]
tests: update collect-logs.yml playbook

- change `ceph -s` output to json-pretty.
- gather rgw logs
- add `health detail` command

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agotests: move collect-logs.yml to ceph-ansible repo
Guillaume Abrioux [Wed, 29 Sep 2021 12:29:58 +0000 (14:29 +0200)]
tests: move collect-logs.yml to ceph-ansible repo

related ceph-build PR: ceph/ceph-build#1914

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agodashboard: allow disabling of unused features
Alex Lambert [Tue, 21 Sep 2021 09:14:43 +0000 (10:14 +0100)]
dashboard: allow disabling of unused features

Unconfigured dashboard features can lead to empty tabs in the dashboard
containing no meaningful content. Allow users to disable dashboard features
they know will not be used.

A list of features to be disabled allows the user to define a streamlined
dashboard as standard across deployments. Defaults to disabling no features,
ensuring that users are sure they do not need the dashboard feature before
disabling it.
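
A minimal Python sketch of how such a list could be turned into commands, assuming the dashboard CLI form `ceph dashboard feature disable <name>`. The function name and the sample feature names are illustrative assumptions, not the role's actual task.

```python
from typing import List, Sequence


def feature_disable_commands(disabled_features: Sequence[str] = ()) -> List[List[str]]:
    """Build one disable command per listed feature; an empty list disables nothing."""
    return [["ceph", "dashboard", "feature", "disable", feature]
            for feature in disabled_features]


# Default: nothing is disabled, so no command is generated.
print(feature_disable_commands())
# Hypothetical selection of unused features.
print(feature_disable_commands(["iscsi", "nfs"]))
```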

Signed-off-by: Alex Lambert <lamberta@microsoft.com>
3 years agodashboard: retry setting rgw-credentials
Guillaume Abrioux [Wed, 29 Sep 2021 06:34:09 +0000 (08:34 +0200)]
dashboard: retry setting rgw-credentials

for some reason, this task can fail in the CI.
Adding a retry can help to avoid this failure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agotests: add osd node in collocation
Guillaume Abrioux [Tue, 28 Sep 2021 20:24:43 +0000 (22:24 +0200)]
tests: add osd node in collocation

we update the pool size from 1 to 2 in idempotency test
but only 1 node is available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agocephadm-adopt: add no_log: true
Guillaume Abrioux [Tue, 21 Sep 2021 08:41:53 +0000 (10:41 +0200)]
cephadm-adopt: add no_log: true

Let's add a `no_log: true` on the `cephadm registry-login` task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
3 years agoadopt: stop iscsi services in the first place
Guillaume Abrioux [Fri, 24 Sep 2021 12:45:11 +0000 (14:45 +0200)]
adopt: stop iscsi services in the first place

If old containers are still running, it can make tcmu-runner process
unable to open devices and there's nothing else to do than restarting
the container.

Also, as per discussion with iscsi experts, iscsi should be migrated before
OSDs. (the client should be closed before the server)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000412
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: auth_allow_insecure_global_id_reclaim false
Dimitri Savineau [Tue, 10 Aug 2021 15:41:50 +0000 (11:41 -0400)]
tests: auth_allow_insecure_global_id_reclaim false

Otherwise the clients won't be able to reconnect after the reboot in the
all_daemons and collocation jobs.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests: fix container-cephadm job
Guillaume Abrioux [Thu, 16 Sep 2021 14:53:33 +0000 (16:53 +0200)]
tests: fix container-cephadm job

add missing variable `containerized_deployment` in group_vars

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agocommon: install ceph-volume package
Guillaume Abrioux [Thu, 16 Sep 2021 12:02:17 +0000 (14:02 +0200)]
common: install ceph-volume package

After pacific release, ceph-volume has its own package.
ceph-ansible has to explicitly install it on osd nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agocephadm-adopt: set cephadm registry login info
Daniel Pivonka [Thu, 9 Sep 2021 21:14:10 +0000 (17:14 -0400)]
cephadm-adopt: set cephadm registry login info

registry login info needs to be stored in the cluster for cephadm and future hosts

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2000103
Signed-off-by: Daniel Pivonka <dpivonka@redhat.com>
4 years agoRevert "tests: rename grafana to monitoring"
Guillaume Abrioux [Thu, 9 Sep 2021 14:01:47 +0000 (16:01 +0200)]
Revert "tests: rename grafana to monitoring"

This reverts commit a36586a7777dc34cb977f81dc2d5bdfa2bebd4b6.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: rename grafana to monitoring
Dimitri Savineau [Mon, 9 Aug 2021 17:38:26 +0000 (13:38 -0400)]
tests: rename grafana to monitoring

Since the grafana-server group has been renamed to monitoring, change
the associated tests accordingly.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agopurge: add remove_docker tag
Seena Fallah [Mon, 16 Aug 2021 20:37:40 +0000 (01:07 +0430)]
purge: add remove_docker tag

This can help to skip docker removal tasks

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
4 years agopurge: add container_binary needed for zap osds
Seena Fallah [Mon, 16 Aug 2021 20:08:47 +0000 (00:38 +0430)]
purge: add container_binary needed for zap osds

`container_binary` isn't set anymore in the purge osd play because of a
regression introduced by 60aa70a.
The CI didn't catch it because the play purging node-exporter sets this
variable for all nodes before we run the purge osd play.

This commit fixes this regression.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
4 years agoceph-defaults: set quay.io as the default registry
Dimitri Savineau [Fri, 27 Aug 2021 16:01:27 +0000 (12:01 -0400)]
ceph-defaults: set quay.io as the default registry

Because the ceph container images are now only pushed to the quay.io
registry, this updates the default registry value.
The docker.io registry can still be used but doesn't receive updated
container images.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agopurge-dashboard: remove cid files
Dimitri Savineau [Tue, 7 Sep 2021 16:13:37 +0000 (12:13 -0400)]
purge-dashboard: remove cid files

This adds the service cid file cleanup as supported in the classic purge
playbook since b9dd253

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests/rgw: use json format output for user info
Dimitri Savineau [Thu, 26 Aug 2021 20:45:07 +0000 (16:45 -0400)]
tests/rgw: use json format output for user info

If the radosgw user already exists, we need the output in json
format because we load it with json.loads().
Otherwise we get a pytest failure like:

```console
self = <json.decoder.JSONDecoder object at 0x7fa2f00a5fd0>, s = '', idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.

        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.

        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
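
The failure above is what `json.loads()` raises on an empty string. A minimal sketch of a defensive parse follows; the helper name `parse_json_output` is hypothetical, and the commit itself fixes the problem by requesting json output rather than by catching the error.

```python
import json
from typing import Any, Optional


def parse_json_output(stdout: str) -> Optional[Any]:
    """Parse command output as JSON, returning None when it is not valid JSON."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError:
        # An empty or non-JSON string lands here, e.g. "Expecting value".
        return None


print(parse_json_output(""))                      # not JSON
print(parse_json_output('{"user_id": "test"}'))   # hypothetical user info
```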

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests/rgw: add timeout 5s to radosgw-admin command
Dimitri Savineau [Tue, 10 Aug 2021 15:57:01 +0000 (11:57 -0400)]
tests/rgw: add timeout 5s to radosgw-admin command

If the radosgw daemons aren't up and running correctly (for example, not
registered in the servicemap, or the OSDs are down), radosgw-admin will hang
forever.
Jenkins will kill the jobs after 3h but we don't want to wait until this global
timeout.
Adding the `timeout 5` command to the radosgw-admin commands (which is already
present on other ceph calls) allows the job to fail earlier.
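
The tests prefix the command with `timeout 5`. The same fail-early idea can be sketched in Python with the `timeout` argument of `subprocess.run`, which is a different mechanism from the one the commit uses. The helper name is hypothetical and a sleeping Python process stands in for a hanging radosgw-admin call.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(cmd: List[str], seconds: float) -> bool:
    """Return True if `cmd` exits with status 0 within the time limit."""
    try:
        subprocess.run(cmd, check=True, timeout=seconds, capture_output=True)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        # The child is killed on timeout, so the caller never waits forever.
        return False
    return True


# Stand-in for a command that hangs (assumption for the demo).
hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hanging, 0.5))
```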

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm-adopt: fix orch host add with FQDN
Dimitri Savineau [Thu, 26 Aug 2021 16:06:11 +0000 (12:06 -0400)]
cephadm-adopt: fix orch host add with FQDN

When a node is configured with an FQDN as its hostname, the
`ceph orch host add` command fails because the `ansible_hostname` fact used
by that command contains the short hostname, which doesn't match the current
hostname (FQDN).
Instead we can use the `ansible_nodename` fact.
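
The mismatch can be shown with a short Python sketch. The hostname `ceph-node1.example.com` and both helper names are made up for illustration; the sketch only models the comparison, not cephadm's actual check.

```python
def short_hostname(nodename: str) -> str:
    """Return the part before the first dot, like a short hostname."""
    return nodename.split(".", 1)[0]


def matches_current_hostname(candidate: str, current_hostname: str) -> bool:
    """Model a strict comparison between the name given and the host's own name."""
    return candidate == current_hostname


current = "ceph-node1.example.com"  # hypothetical host configured with an FQDN
print(matches_current_hostname(short_hostname(current), current))  # short name differs
print(matches_current_hostname(current, current))                  # full name matches
```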

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1997083
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocontainer: explicitly pull monitoring images
Dimitri Savineau [Thu, 19 Aug 2021 18:08:06 +0000 (14:08 -0400)]
container: explicitly pull monitoring images

We don't pull the monitoring container images (alertmanager, prometheus,
node-exporter and grafana) in a dedicated task like we're doing for the
ceph container image.
This means that the container image pull is done during the start of the
systemd service.
As a result, pulling the image behind a proxy doesn't work with podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoRevert "tests: use old build of ceph@master"
Dimitri Savineau [Thu, 19 Aug 2021 18:32:21 +0000 (14:32 -0400)]
Revert "tests: use old build of ceph@master"

This reverts commit 47a451426a8308a4ea80e0a1e4d867e9dd290fe5.

This build isn't available on shaman anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoiscsi: don't set default value for trusted_ip_list
Guillaume Abrioux [Wed, 18 Aug 2021 11:23:44 +0000 (13:23 +0200)]
iscsi: don't set default value for trusted_ip_list

`trusted_ip_list` restricts access to the iSCSI API.
It can be left empty if the API isn't going to be accessed from outside the
gateway node.

Even though this seems to be a limited use case, it's better to leave it
empty by default than having a meaningless default value.

We could make this variable mandatory but that would be a breaking
change. Let's just add logic in the template to set this
variable in the configuration file only if it was specified by users.
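
The "only write the option when it was specified" logic can be sketched in Python as follows. The real change lives in a template; the function name, the `[config]` section layout and the `api_port` value here are assumptions for illustration.

```python
from typing import Optional


def render_gateway_config(api_port: int, trusted_ip_list: Optional[str] = None) -> str:
    """Render a small config file, adding trusted_ip_list only when it is set."""
    lines = ["[config]", f"api_port = {api_port}"]
    # No meaningless default: the line is simply absent when the user gave nothing.
    if trusted_ip_list:
        lines.append(f"trusted_ip_list = {trusted_ip_list}")
    return "\n".join(lines) + "\n"


print(render_gateway_config(5000))
print(render_gateway_config(5000, "192.168.1.10,192.168.1.11"))
```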

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm-adopt: remove ceph-nfs.target
Dimitri Savineau [Wed, 18 Aug 2021 15:15:39 +0000 (11:15 -0400)]
cephadm-adopt: remove ceph-nfs.target

This systemd target doesn't exist at all.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocontainers: introduce target systemd unit
Guillaume Abrioux [Tue, 10 Aug 2021 13:21:19 +0000 (15:21 +0200)]
containers: introduce target systemd unit

This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoVagrantfile: fallback on 'vagrant_variables.yml.sample'
Guillaume Abrioux [Tue, 10 Aug 2021 14:11:37 +0000 (16:11 +0200)]
Vagrantfile: fallback on 'vagrant_variables.yml.sample'

When using a vagrant command from the root directory of the repo, it
throws an error if no 'vagrant_variables.yml' file is present.

```
Message: Errno::ENOENT: No such file or directory @ rb_sysopen - /home/guits/workspaces/ceph-ansible/vagrant_variables.yml
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-container-engine: allow override container_package_name and container_service_name
Seena Fallah [Thu, 5 Aug 2021 11:03:55 +0000 (15:33 +0430)]
ceph-container-engine: allow override container_package_name and container_service_name

Only include specific variables when they are undefined

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
4 years agocephadm-adopt: use cephadm_ssh_user for ssh user
Seena Fallah [Tue, 27 Jul 2021 17:44:38 +0000 (22:14 +0430)]
cephadm-adopt: use cephadm_ssh_user for ssh user

Use cephadm_ssh_user to set custom user (not root) for cephadm to ssh to the hosts

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
4 years agoroles: remove leftover from pr #4319
Guillaume Abrioux [Tue, 10 Aug 2021 13:34:50 +0000 (15:34 +0200)]
roles: remove leftover from pr #4319

pr #4319 introduced some useless `become: true` on systemd tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoupdate: gather facts only one time
Guillaume Abrioux [Tue, 17 Aug 2021 14:07:03 +0000 (16:07 +0200)]
update: gather facts only one time

this play doesn't need to gather facts from localhost

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-dashboard: fix object gateway integration
Dimitri Savineau [Tue, 17 Aug 2021 15:27:57 +0000 (11:27 -0400)]
ceph-dashboard: fix object gateway integration

Since [1] multiple ceph dashboard commands have been removed and this is
breaking the current ceph-ansible dashboard with RGW automation.
This removes the following dashboard rgw commands:

- ceph dashboard set-rgw-api-access-key
- ceph dashboard set-rgw-api-secret-key
- ceph dashboard set-rgw-api-host
- ceph dashboard set-rgw-api-port
- ceph dashboard set-rgw-api-scheme

Which are replaced by `ceph dashboard set-rgw-credentials`

The RGW user creation task is also removed.

Finally, this moves the delegate_to statement from the rgw tasks to the block
level.

[1] https://github.com/ceph/ceph/pull/42252

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-volume: hide OSD keyring during creation
Dimitri Savineau [Thu, 12 Aug 2021 15:08:27 +0000 (11:08 -0400)]
ceph-volume: hide OSD keyring during creation

When using ceph-volume lvm create/prepare/batch, the keyring of each
OSD created is displayed in the output.
Let's replace those with some '*' chars.
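
A minimal sketch of such masking in Python. It assumes keys appear as 40-character base64 strings that start with `AQ` and end with `==`; this pattern, the helper name and the sample output line are illustrative assumptions, not the module's actual implementation.

```python
import re

# Assumed key shape: "AQ" + 36 base64 characters + "==" (40 characters in total).
KEY_PATTERN = re.compile(r"AQ[A-Za-z0-9+/]{36}==")


def mask_keys(text: str) -> str:
    """Replace anything that looks like a keyring secret with '*' characters."""
    return KEY_PATTERN.sub("*" * 8, text)


# Dummy value with the assumed shape, not a real key.
fake_key = "AQ" + "A" * 36 + "=="
print(mask_keys(f"stdin: key = {fake_key}"))
```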

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests: use old build of ceph@master
Guillaume Abrioux [Thu, 12 Aug 2021 21:45:06 +0000 (23:45 +0200)]
tests: use old build of ceph@master

for unlocking the ci.
this is intended to be reverted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-mon: do not log monitor keyring
Dimitri Savineau [Wed, 11 Aug 2021 20:01:08 +0000 (16:01 -0400)]
ceph-mon: do not log monitor keyring

We don't want to display the keyring in the ansible log.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocommon: do not log keyring secret
Guillaume Abrioux [Mon, 9 Aug 2021 12:57:33 +0000 (14:57 +0200)]
common: do not log keyring secret

let's not display any keyring secret by default in ansible log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1980744
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-dashboard: fix TLS cert openssl generation
Dimitri Savineau [Mon, 9 Aug 2021 14:33:40 +0000 (10:33 -0400)]
ceph-dashboard: fix TLS cert openssl generation

With OpenSSL versions prior to 1.1.1 (like CentOS 7 with 1.0.2k), the -addext
option doesn't exist.
As a solution, this uses the default openssl.cnf configuration file as a
template and adds the subjectAltName in the v3_ca section. This temporary
openssl configuration file is removed after the TLS certificate creation.
This patch also moves the run_once statement to the block level.
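
The templating step can be sketched roughly as follows. This is a minimal illustration, not the tasks from this commit: the function name `add_subject_alt_name` is hypothetical, and it assumes the configuration file contains a `[ v3_ca ]` section header on its own line.

```python
import re


def add_subject_alt_name(cnf_text: str, san_value: str) -> str:
    """Return a copy of an openssl.cnf body with subjectAltName added to v3_ca."""
    result = []
    inserted = False
    for line in cnf_text.splitlines():
        result.append(line)
        # Section headers look like "[ v3_ca ]"; the inner spaces are optional.
        if not inserted and re.fullmatch(r"\[\s*v3_ca\s*\]", line.strip()):
            result.append(f"subjectAltName = {san_value}")
            inserted = True
    if not inserted:
        raise ValueError("no [ v3_ca ] section found")
    return "\n".join(result) + "\n"
```

The returned text would be written to a temporary file, handed to openssl through its `-config` option, and deleted once the certificate exists.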

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoFixes typo in rgw-add-users-buckets playbook
VasishtaShastry [Fri, 6 Aug 2021 10:40:19 +0000 (16:10 +0530)]
Fixes typo in rgw-add-users-buckets playbook

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
4 years agodashboard: subj_alt_names fact refactor
Guillaume Abrioux [Thu, 5 Aug 2021 13:00:49 +0000 (15:00 +0200)]
dashboard: subj_alt_names fact refactor

the current way the variable is built results in:

```
2021-08-03 04:18:23,020 - ceph.ceph - INFO - ok: [ceph-sangadi-4x-indpt6-node1-installer] => changed=false
  ansible_facts:
    subj_alt_names: |-
      subjectAltName=ceph-sangadi-4x-indpt6-node1-installer/subjectAltName=10.0.210.223/subjectAltName=ceph-sangadi-4x-indpt6-node1-installersubjectAltName=ceph-sangadi-4x-indpt6-node2/subjectAltName=10.0.210.252/subjectAltName=ceph-sangadi-4x-indpt6-node2/
```

which is incorrect.
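
For comparison, a well-formed value can be built as sketched below. This assumes the desired result is a single comma-separated list in OpenSSL's `DNS:`/`IP:` notation; the function name `build_subj_alt_names` is hypothetical and the real fact is built in Ansible, not Python.

```python
import ipaddress


def build_subj_alt_names(entries):
    """Join hostnames and IP addresses into one OpenSSL-style SAN value."""
    parts = []
    for entry in entries:
        try:
            # ip_address() raises ValueError for anything that is not an IP.
            ipaddress.ip_address(entry)
            parts.append(f"IP:{entry}")
        except ValueError:
            parts.append(f"DNS:{entry}")
    return ",".join(parts)
```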

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoadopt: import rgw ssl certificate into kv store
Guillaume Abrioux [Wed, 28 Jul 2021 19:50:15 +0000 (21:50 +0200)]
adopt: import rgw ssl certificate into kv store

Without this, when rgw is managed by cephadm, it fails to start because
the ssl certificate isn't present in the kv store.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987010
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1988404
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm-adopt: remove nfs pool and namespace
Dimitri Savineau [Wed, 4 Aug 2021 19:11:59 +0000 (15:11 -0400)]
cephadm-adopt: remove nfs pool and namespace

This has been removed from the code (orch apply name).
The default pool name is now .nfs

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoinfra: use dedicated variables for balancer status
Dimitri Savineau [Tue, 3 Aug 2021 15:58:49 +0000 (11:58 -0400)]
infra: use dedicated variables for balancer status

The balancer status is registered during the cephadm-adopt, rolling_update
and switch2container playbooks. But it is also used in the ceph-handler role
which is included in those playbooks too.
Even if the ceph-handler tasks are skipped for rolling_update and
switch2container, the balancer_status variable is overwritten with the
skipped task result.

play1:
  register: balancer_status
play2:
  register: balancer_status <-- skipped
play3:
  when: (balancer_status.stdout | from_json)['active'] | bool

This leads to an issue like:

The conditional check '(balancer_status.stdout | from_json)['active'] | bool'
failed. The error was: Unexpected templating type error occurred on
({% if (balancer_status.stdout | from_json)['active'] | bool %} True
{% else %} False {% endif %}): expected string or buffer.
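
The overwrite can be mimicked in a few lines of Python. This is only a toy model of the behaviour described above (Ansible registers a variable even when the task is skipped); `host_vars`, `run_task`, `balancer_active` and the dedicated variable name `balancer_status_update` are hypothetical names chosen for the illustration.

```python
import json

# Stand-in for Ansible's per-host variable store (illustrative only).
host_vars = {}


def run_task(register, skipped, stdout=""):
    """Mimic Ansible: the registered variable is written even when the task is skipped."""
    if skipped:
        host_vars[register] = {"changed": False, "skipped": True}
    else:
        host_vars[register] = {"changed": False, "stdout": stdout}


def balancer_active(register):
    """Return the balancer 'active' flag, or None when no stdout was recorded."""
    result = host_vars.get(register, {})
    if "stdout" not in result:
        return None
    return bool(json.loads(result["stdout"])["active"])


# Shared name: the skipped ceph-handler task clobbers the playbook's result.
run_task("balancer_status", skipped=False, stdout='{"active": true}')
run_task("balancer_status", skipped=True)
shared = balancer_active("balancer_status")

# Dedicated name (hypothetical): the playbook's result survives the skip.
run_task("balancer_status_update", skipped=False, stdout='{"active": true}')
run_task("balancer_status", skipped=True)
dedicated = balancer_active("balancer_status_update")
```

With the shared name the JSON output is gone by the time the condition is evaluated; with a dedicated name per playbook it is still there.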

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agopodman pids.max default value is 2048, docker's one is 4096 which are
Teoman ONAY [Tue, 3 Aug 2021 14:06:53 +0000 (16:06 +0200)]
podman pids.max default value is 2048, docker's one is 4096 which are
sufficient for the default value (512) of rgw thread pool size.
But if that value is raised close to the pids-limit value,
it leaves no room for the other processes to spawn and run within
the container, and the container crashes.

This commit sets pids-limit to unlimited regardless of the container engine.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041
Signed-off-by: Teoman ONAY <tonay@redhat.com>
4 years agoceph-defaults: remove radosgw_civetweb_ variables
Dimitri Savineau [Thu, 29 Jul 2021 15:42:03 +0000 (11:42 -0400)]
ceph-defaults: remove radosgw_civetweb_ variables

radosgw_civetweb_xxx variables are legacy variables and users should
have switched to radosgw_frontend_xxx variables instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoosds: use osd pool ls instead of osd dump command
Dimitri Savineau [Wed, 28 Jul 2021 18:54:15 +0000 (14:54 -0400)]
osds: use osd pool ls instead of osd dump command

The ceph osd pool ls detail command is a subset of the ceph osd dump
command.

$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740
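
A consumer that only needs the pool list can treat both outputs the same way. The sketch below assumes that the `osd dump` JSON is an object with a `pools` list, that the `osd pool ls detail` JSON is that list on its own, and that each entry carries a `pool_name` field; the function name `pool_names` is hypothetical.

```python
import json


def pool_names(raw_json: str):
    """Extract pool names from either command's JSON output."""
    data = json.loads(raw_json)
    # Assumption: osd dump wraps the pools in an object, pool ls detail does not.
    pools = data["pools"] if isinstance(data, dict) else data
    return [pool["pool_name"] for pool in pools]
```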

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: exit on user creation failure
Dimitri Savineau [Wed, 28 Jul 2021 16:27:00 +0000 (12:27 -0400)]
library: exit on user creation failure

When the ceph dashboard user creation fails then the issue is hidden
as we don't check the return code and don't print the error message
in the module output.

This ends up with a failure on the ceph dashboard set roles command saying
that the user doesn't exist.

By failing on the user creation, we will have an explicit explanation of
the issue (like a weak password).
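
The check amounts to something like the sketch below. It is a simplified stand-in, not the module code: `check_user_creation` is a hypothetical helper, and a real Ansible module would typically report the failure through `module.fail_json()` instead of raising.

```python
def check_user_creation(rc: int, stdout: str, stderr: str) -> str:
    """Stop at the first failing command and surface its error message."""
    if rc != 0:
        # Include stderr so the real cause (e.g. a weak password) is visible.
        raise RuntimeError(f"user creation failed (rc={rc}): {stderr.strip()}")
    return stdout
```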

Closes: #6197
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agorolling_update: get ceph version when mons exist
Dimitri Savineau [Thu, 29 Jul 2021 16:26:33 +0000 (12:26 -0400)]
rolling_update: get ceph version when mons exist

eec3878 introduced a regression for upgrade scenarios where there are no
monitor nodes at all (like ganesha standalone, external clients, etc.)

TASK [get the ceph release being deployed] ************************************
task path: infrastructure-playbooks/rolling_update.yml:121
Thursday 29 July 2021  15:55:29 +0000 (0:00:00.484)       0:00:15.802 *********
fatal: [client0]: FAILED! =>
  msg: '''dict object'' has no attribute ''mons'''

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoinfrastructure-playbooks: Get Ceph info in check mode
Benoît Knecht [Mon, 26 Jul 2021 15:10:19 +0000 (17:10 +0200)]
infrastructure-playbooks: Get Ceph info in check mode

In the `set osd flags` block, run the Ceph commands that gather information
from the cluster (and don't make any changes to it) even when running in check
mode.

This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
4 years agoceph-handler: Fix osd handler in check mode
Benoît Knecht [Mon, 26 Jul 2021 11:03:56 +0000 (13:03 +0200)]
ceph-handler: Fix osd handler in check mode

Run the Ceph commands that only gather information (without making any changes
to the cluster) when running Ansible in check mode.

This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
4 years agoceph-defaults: add missing grafana dashboards
Dimitri Savineau [Tue, 27 Jul 2021 14:30:30 +0000 (10:30 -0400)]
ceph-defaults: add missing grafana dashboards

The radosgw-sync-overview and rbd-details grafana dashboards were missing
from the list.

Closes: #6758
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoupdate: check the ceph release
Guillaume Abrioux [Mon, 26 Jul 2021 09:19:36 +0000 (11:19 +0200)]
update: check the ceph release

Check early which Ceph release is going to be deployed and fail if it
doesn't correspond to the ceph-ansible version being used.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoalertmanager: allow disable dashboard tls verify
Dimitri Savineau [Fri, 23 Jul 2021 14:27:55 +0000 (10:27 -0400)]
alertmanager: allow disable dashboard tls verify

When using self-signed/untrusted CA certificates, alertmanager displays
an error in logs. This commit should make those messages disappear.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agomultisite: use node fqdn for endpoints when https
Dimitri Savineau [Fri, 9 Jul 2021 21:24:09 +0000 (17:24 -0400)]
multisite: use node fqdn for endpoints when https

When the rgw_multisite_proto variable is set to https, we shouldn't use
the IP address in the zone endpoints list but the node FQDN, to match the
TLS certificate CN.
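
The selection rule can be sketched as below. This is an illustration only: `zone_endpoint` and its parameters are hypothetical names, the real logic lives in the role's templating, and the sketch assumes an IPv4 address (no bracket handling for IPv6).

```python
def zone_endpoint(proto: str, fqdn: str, ip_address: str, port: int) -> str:
    """Build one zone endpoint URL, using the FQDN whenever TLS is in play."""
    # With https the host part must match the certificate CN, so use the FQDN.
    host = fqdn if proto == "https" else ip_address
    return f"{proto}://{host}:{port}"
```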

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agopurge: support osd_auto_discovery
Guillaume Abrioux [Wed, 21 Jul 2021 21:16:59 +0000 (23:16 +0200)]
purge: support osd_auto_discovery

This adds a task that zaps by osd id so we can support the scenario
where osds were deployed with `osd_auto_discovery` set to true.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agopurge: merge playbooks
Guillaume Abrioux [Tue, 13 Jul 2021 16:48:42 +0000 (18:48 +0200)]
purge: merge playbooks

This refactor merges the two playbooks so we only have to maintain 1
playbook.
(Symlink the old purge-container-cluster.yml playbook for backward
 compatibility).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>