git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
5 years agoswitch_to_container: fix osd systemd regex
Dimitri Savineau [Thu, 4 Jun 2020 20:57:17 +0000 (16:57 -0400)]
switch_to_container: fix osd systemd regex

The systemd LOAD and ACTIVE fields could have more than one space between
both values.
This updates the systemd regex to match the way we use it in other
parts of the code.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
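
The whitespace issue described in this commit can be illustrated with a minimal Python sketch. The sample `systemctl` line and both patterns are assumptions made for the example; they are not the regex used in the playbook.

```python
import re

# Hypothetical `systemctl list-units` line; column padding varies with unit name length.
SAMPLE = "ceph-osd@3.service   loaded    active   running Ceph object storage daemon osd.3"

# A single literal space between LOAD and ACTIVE misses padded output.
strict = re.compile(r"ceph-osd@(\d+)\.service\s+loaded active")
# `\s+` tolerates one or more whitespace characters between the fields.
tolerant = re.compile(r"ceph-osd@(\d+)\.service\s+loaded\s+active")


def osd_id(line, pattern):
    """Return the OSD id captured by `pattern`, or None when the line does not match."""
    found = pattern.search(line)
    return found.group(1) if found else None
```

With the padded sample line, only the tolerant pattern finds the OSD id.
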
5 years agorgw multisite: add master zone endpoints to zonegroup
Ali Maredia [Fri, 5 Jun 2020 21:21:27 +0000 (21:21 +0000)]
rgw multisite: add master zone endpoints to zonegroup

We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228
Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years agomergify: remove merge on skip ci
Dimitri Savineau [Tue, 9 Jun 2020 13:23:04 +0000 (09:23 -0400)]
mergify: remove merge on skip ci

This rule will probably never be applied, and at the moment it is
creating a cancelled job in the CI status.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorgwloadbalancer undefined index variable
Ansible Deployment User [Tue, 26 May 2020 11:18:03 +0000 (13:18 +0200)]
rgwloadbalancer undefined index variable

The vrrp_instances variable is using a loop with index but the index_var
wasn't defined.
As a result, the fact task was failing on this undefined index variable.

The task includes an option with an undefined variable. The error was:
'index' is undefined

Closes: #5395
Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
5 years agoceph-nfs: add stable noarch repository
Dimitri Savineau [Fri, 15 May 2020 15:20:08 +0000 (11:20 -0400)]
ceph-nfs: add stable noarch repository

When using the stable nfs ganesha repository, we need to have both the
arch and noarch repositories enabled.
Currently the noarch repository is missing, which causes the
non-containerized deployment to fail.

Closes: #5375
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoswitch_to_container: refact wait for pg check
Guillaume Abrioux [Fri, 15 May 2020 08:58:40 +0000 (10:58 +0200)]
switch_to_container: refact wait for pg check

There is no need to make this check with several steps.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: report coverage status for unittests
Guillaume Abrioux [Tue, 12 May 2020 18:13:35 +0000 (20:13 +0200)]
tests: report coverage status for unittests

This commit adds pytest-cov usage in unittests

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_pool: add tests
Guillaume Abrioux [Tue, 12 May 2020 12:39:20 +0000 (14:39 +0200)]
ceph_pool: add tests

Add unit tests for ceph_pool module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_pool: support setting application at pool creation
Guillaume Abrioux [Tue, 12 May 2020 12:33:36 +0000 (14:33 +0200)]
ceph_pool: support setting application at pool creation

This commit adds the required changes in order to support
setting application pool at initial pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_pool: refact exec_commands()
Guillaume Abrioux [Tue, 12 May 2020 11:54:29 +0000 (13:54 +0200)]
ceph_pool: refact exec_commands()

We never run multiple ceph commands at a time, so there's no need for this design.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: update pools definitions
Guillaume Abrioux [Wed, 29 Apr 2020 23:23:20 +0000 (01:23 +0200)]
tests: update pools definitions

Setting attributes to an empty string is bad user input.
Also, remove the `rule_name` attribute when creating an erasure-coded pool
(this rule isn't intended for the erasure-coded pool type).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocommon: introduce ceph_pool module calls
Guillaume Abrioux [Tue, 28 Apr 2020 16:08:59 +0000 (18:08 +0200)]
common: introduce ceph_pool module calls

This commit calls the `ceph_pool` module for creating ceph pools
everywhere it's needed in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agolibrary: add ceph_pool module
Guillaume Abrioux [Sat, 4 Apr 2020 01:44:05 +0000 (03:44 +0200)]
library: add ceph_pool module

This commit adds a new module `ceph_pool`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocommon: fix target_size_ratio task enablement
Guillaume Abrioux [Thu, 14 May 2020 09:00:12 +0000 (11:00 +0200)]
common: fix target_size_ratio task enablement

The condition on this task is wrong, we have to check whether
`target_size_ratio` is set in the pool definition instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofacts: always set ceph_run_cmd and ceph_admin_command
Guillaume Abrioux [Thu, 14 May 2020 09:06:41 +0000 (11:06 +0200)]
facts: always set ceph_run_cmd and ceph_admin_command

Always set these facts on monitor nodes, regardless of what is passed to
`--limit`. Otherwise, the playbook will fail when using `--limit` on nodes
where these facts are used in a task delegated to a monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests/library: parametrize ceph_volume objecstore
Dimitri Savineau [Tue, 31 Mar 2020 20:51:55 +0000 (16:51 -0400)]
tests/library: parametrize ceph_volume objecstore

This adds the objectstore testing for both filestore and bluestore on
the ceph_volume module.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests/library: define container cmd once
Dimitri Savineau [Tue, 31 Mar 2020 20:31:40 +0000 (16:31 -0400)]
tests/library: define container cmd once

In a containerized deployment, the ceph_volume module always uses
the same container command prefix for all actions.
Instead of duplicating this code in all container tests we can define it
once.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: force using the more recent build
Guillaume Abrioux [Thu, 14 May 2020 14:24:59 +0000 (16:24 +0200)]
tests: force using the more recent build

We should use `latest-master-devel` for the switch_to_containers job.
Otherwise we might actually downgrade the ceph version when
the image used is older than the rpm initially used for installing ceph.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotest: set sitepackages=false in tox
Guillaume Abrioux [Wed, 13 May 2020 15:49:07 +0000 (17:49 +0200)]
test: set sitepackages=false in tox

Otherwise it might try to use the system installed version of ansible
when there's one available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodocker2podman: manage dashboard nodes
Dimitri Savineau [Thu, 16 Apr 2020 16:17:12 +0000 (12:17 -0400)]
docker2podman: manage dashboard nodes

The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not managed during the docker to podman migration.

This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.

This also pulls the dashboard container images from docker into podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodocker2podman: pull images from docker daemon
Dimitri Savineau [Thu, 16 Apr 2020 15:30:11 +0000 (11:30 -0400)]
docker2podman: pull images from docker daemon

The docker2podman playbook only installs the podman package and updates
the systemd units with the right container_binary value.

We never pull the container image, so if one service is restarted then
the container image will be pulled first before the service can start,
which could cause longer downtime.

To avoid downloading the container image from the internet again we can
just pull it from the local docker daemon.

The container_{binding,package,service}_name variables are removed
because they are only used in the ceph-container-engine role, which
isn't called in this playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: fix rbdmirror group name
Dimitri Savineau [Thu, 30 Apr 2020 20:06:55 +0000 (16:06 -0400)]
rolling_update: fix rbdmirror group name

The rbdmirror group name was using the wrong variable definition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodashboard: allow disabling grafana api ssl verify
Dimitri Savineau [Tue, 28 Apr 2020 17:31:01 +0000 (13:31 -0400)]
dashboard: allow disabling grafana api ssl verify

When using an untrusted TLS certificate (like self-signed) on grafana
then the grafana dashboards update subcommand will fail.
One solution could be to trust the TLS certificate.
The other one is to disable the TLS verification on the grafana API.

Closes: #5324
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-nfs: bind mount ganesha log directory
Dimitri Savineau [Mon, 4 May 2020 22:39:05 +0000 (18:39 -0400)]
ceph-nfs: bind mount ganesha log directory

The current ganesha log directory is only present in the container
and not bind mounted on the host.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-validate: Expand templates in rgw_create_pools
Benoît Knecht [Mon, 11 May 2020 14:21:55 +0000 (16:21 +0200)]
ceph-validate: Expand templates in rgw_create_pools

Same fix as `ceph-rgw` for `rgw_create_pools` pool names that contain Jinja
templates.

See #5348 for details.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years agoceph-rgw: Make sure pool name templates are expanded
Benoît Knecht [Mon, 11 May 2020 13:49:32 +0000 (15:49 +0200)]
ceph-rgw: Make sure pool name templates are expanded

It is common to set templated pool names in `rgw_create_pools`, e.g.

```yaml
rgw_create_pools:
  "{{ rgw_zone }}.rgw.buckets.index":
    pg_num: 16
    size: 3
    type: replicated
```

This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].

This commit replaces the use of `with_dict` with

```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```

which works as intended and expands the template in the pool name.

[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loops

Closes #5348

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
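
To show the data shape that the `dict2items` loop iterates over, here is a small Python stand-in. It only mimics the `key`/`value` structure of Ansible's filter and is not Ansible's implementation; the pool name is shown already expanded, assuming a hypothetical `rgw_zone` of `default`.

```python
def dict2items(mapping):
    """Mimic the list-of-dicts shape produced by Ansible's `dict2items` filter."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


# Pool name shown after template expansion (rgw_zone assumed to be "default").
rgw_create_pools = {
    "default.rgw.buckets.index": {"pg_num": 16, "size": 3, "type": "replicated"},
}

items = dict2items(rgw_create_pools)
```

Inside the task, each loop item then exposes the pool name as `item.key` and its settings as `item.value`.
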
5 years agodocs: minor fixes to README-MULTISITE.md
Ali Maredia [Mon, 27 Apr 2020 22:04:58 +0000 (18:04 -0400)]
docs: minor fixes to README-MULTISITE.md

Make all of the hosts start at 1 and not 0,
also make some minor changes in scenario 3 to
remove an inconsistency.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years agoceph-validate: Fix "fail on unsupported CentOS release"
Benoît Knecht [Fri, 8 May 2020 12:39:52 +0000 (14:39 +0200)]
ceph-validate: Fix "fail on unsupported CentOS release"

The `dashboard_enabled` condition used a `true` filter (which doesn't exist)
instead of the `bool` filter.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years agoceph-rgw: use match instead of equalto from jinja2
Dimitri Savineau [Wed, 6 May 2020 17:32:18 +0000 (13:32 -0400)]
ceph-rgw: use match instead of equalto from jinja2

The '==' jinja2 operator (or 'equalto') was introduced in jinja2
2.8.
On EL7, the jinja2 version is 2.7, so the operator isn't present, causing
templating errors like:

The error was: TemplateRuntimeError: no test named '=='

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1747206
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-nfs: fix internal ganesha deployment
Dimitri Savineau [Wed, 6 May 2020 13:31:34 +0000 (09:31 -0400)]
ceph-nfs: fix internal ganesha deployment

Since ea2b654d9 we're not running the rados command from the monitor
nodes but from the ganesha node. Unfortunately we don't have the
required keyring on that node to run the rados command as we don't
import the right keyring.
This commit restores the workflow for internal ganesha deployment like
before ea2b654d9 but keeps the rados commands from the ganesha node for
external deployment until we have a better design.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-nfs: fix keyring copy for external ganesha
Dimitri Savineau [Tue, 5 May 2020 14:46:14 +0000 (10:46 -0400)]
ceph-nfs: fix keyring copy for external ganesha

Fix the condition on the keyring copy task that prevents the ganesha
keyring from being created in the /var/lib/ceph directory.
Also ensure that the directory exists first.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agonfs: fix 2 typo
Guillaume Abrioux [Thu, 30 Apr 2020 14:21:14 +0000 (16:21 +0200)]
nfs: fix 2 typo

The condition is missing an index here, which makes the playbook fail.

Typical error:
```
The conditional check 'not item.get('skipped', False)' failed. The error was: error while evaluating conditional (not item.get('skipped', False)): 'list object' has no attribute 'get'",
```

Also, adds the missing '/keyring' on the `exec_cmd_nfs` fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831342
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-facts: fix IPv6 _radosgw_address interface
Dimitri Savineau [Mon, 27 Apr 2020 20:01:24 +0000 (16:01 -0400)]
ceph-facts: fix IPv6 _radosgw_address interface

When using radosgw_interface with an IPv6 setup, the _radosgw_address
fact doesn't use square brackets, unlike the radosgw_address and
radosgw_address_block configurations.

Closes: #5325
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
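
The bracket handling this commit refers to can be sketched in Python as follows. The helper name `bracket_ipv6` is hypothetical and this is not how the fact is computed in the role; it only illustrates wrapping IPv6 literals while leaving IPv4 addresses untouched.

```python
import ipaddress


def bracket_ipv6(address):
    """Wrap IPv6 literals in square brackets; return IPv4 addresses unchanged."""
    if ipaddress.ip_address(address).version == 6:
        return "[{}]".format(address)
    return address
```

Brackets are what allow an IPv6 address to be followed by `:port` without ambiguity.
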
5 years agoRefresh ceph dashboard user role
fmount [Fri, 10 Apr 2020 13:04:52 +0000 (15:04 +0200)]
Refresh ceph dashboard user role

This change allows the operator to refresh the
ceph dashboard admin role on multiple ceph-ansible
executions.
In the current state the role is set only when the
user is created, and there's no way to change it if
the user exists.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826002
Signed-off-by: fmount <fpantano@redhat.com>
5 years agoceph-dashboard: fix mgr dashboard IPv6 fact
Dimitri Savineau [Thu, 23 Apr 2020 18:34:39 +0000 (14:34 -0400)]
ceph-dashboard: fix mgr dashboard IPv6 fact

15ed9ee introduced a regression for the mgr dashboard daemon using
IPv6 since the mgr dashboard configuration doesn't support brackets.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1827299
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodocs: fix multisite docs add endpoints var in rgw_instances section
Ali Maredia [Thu, 23 Apr 2020 15:12:13 +0000 (11:12 -0400)]
docs: fix multisite docs add endpoints var in rgw_instances section

+ Mention of this variable was missing in the original version.

+ Minor revisions around the concept of secondary zone.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years agoReadd CentOS 7 with conditions
Dimitri Savineau [Thu, 2 Apr 2020 19:58:11 +0000 (15:58 -0400)]
Readd CentOS 7 with conditions

The CentOS 7 distribution can still be used for deploying ceph if
  - it's a containerized deployment
  - it's a non containerized deployment without the dashboard (due to
missing python3 libraries).

The ceph_stable_redhat_distro variable has been removed because we can
rely on the ansible_distribution_major_version fact instead.

The copr el8 repository configuration is only applied for CentOS 8.

The ceph-mgr-dashboard package is only installed when the
dashboard_enabled variable is set to true.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: add back nfs testing on master
Guillaume Abrioux [Thu, 26 Mar 2020 07:21:25 +0000 (08:21 +0100)]
tests: add back nfs testing on master

This commit adds back nfs testing on master branch (containerized
scenario only).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agomds: don't enable application pool on cephfs pools
Guillaume Abrioux [Tue, 21 Apr 2020 08:29:23 +0000 (10:29 +0200)]
mds: don't enable application pool on cephfs pools

This commit removes the task which enables the application on cephfs pools.

See: https://tracker.ceph.com/issues/43761

Fixes: #5278
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotypo: updating type check on rc
ianwatsonrh [Tue, 21 Apr 2020 13:14:46 +0000 (14:14 +0100)]
typo: updating type check on rc

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826884
Signed-off-by: ianwatsonrh <ianwatson@redhat.com>
5 years agodoc: add day-2 operations documentation
Guillaume Abrioux [Tue, 21 Apr 2020 07:50:27 +0000 (09:50 +0200)]
doc: add day-2 operations documentation

This commit is the first of a series describing all day-2 operations
that are possible via ceph-ansible using the set of playbooks provided in
the `infrastructure-playbooks` directory.

Fixes: #5061
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: fix py2 on skipped tasks
Dimitri Savineau [Mon, 20 Apr 2020 13:47:31 +0000 (09:47 -0400)]
filestore-to-bluestore: fix py2 on skipped tasks

When using variables from skipped tasks with the from_json filter on
python2, we need a default value; otherwise the skipped task will fail.

Unexpected templating type error occurred on
({{ (ceph_volume_lvm_list.stdout | from_json) }}): expected string or
buffer

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
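
The failure mode above can be reproduced outside of Ansible with a short Python sketch. The result dictionaries and the `parse_stdout` helper are hypothetical; the point is only that a skipped task has no `stdout`, so a default JSON string is needed before parsing.

```python
import json


def parse_stdout(result, default="{}"):
    """Parse a task result's stdout as JSON, using `default` when stdout is absent."""
    # A skipped task result carries no stdout; json.loads(None) raises TypeError.
    return json.loads(result.get("stdout") or default)


skipped = {"skipped": True}                     # hypothetical skipped-task result
ran = {"stdout": '{"0": [{"type": "block"}]}'}  # hypothetical command output
```
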
5 years agoUpdated use of deprecated filter
abaird-rh [Fri, 17 Apr 2020 16:34:32 +0000 (17:34 +0100)]
Updated use of deprecated filter

This was removed in Ansible 2.9.

[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|version_compare` use `result is version_compare`. This
feature will be removed in version 2.9. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.

Rename 'version_compare' to the function 'version'.

version_compare was renamed to version in Ansible 2.5.

Signed-off-by: abaird-rh <abaird@redhat.com>
5 years agolibrary/ceph_volume: look for error messages in stderr
Rishabh Dave [Tue, 7 Apr 2020 11:50:35 +0000 (17:20 +0530)]
library/ceph_volume: look for error messages in stderr

Error messages were moved from stdout to stderr here:
https://github.com/ceph/ceph/commit/b8d6dcbe9f803c96c0af68da54f1262e9b6a9e77#diff-20f7c578a4e69ec61a5869d706567a24R137.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1793542
Signed-off-by: Rishabh Dave <ridave@redhat.com>
5 years agodocs: Update and consolidate rgw multisite documentation
Ali Maredia [Thu, 16 Apr 2020 19:47:17 +0000 (19:47 +0000)]
docs: Update and consolidate rgw multisite documentation

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years agorolling_update: require_osd_release pacific
Dimitri Savineau [Thu, 16 Apr 2020 18:37:02 +0000 (14:37 -0400)]
rolling_update: require_osd_release pacific

Since [1] we need to set pacific for the required OSD release during the
upgrade.

[1] https://github.com/ceph/ceph/commit/cc99c3bc

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agomds: fix --limit run against mds nodes
Guillaume Abrioux [Thu, 9 Apr 2020 23:02:06 +0000 (01:02 +0200)]
mds: fix --limit run against mds nodes

This commit fixes --limit runs against mds nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agonfs: create empty rados index object for nfs standalone
Guillaume Abrioux [Fri, 10 Apr 2020 09:05:25 +0000 (11:05 +0200)]
nfs: create empty rados index object for nfs standalone

This commit creates an empty rados index object even when deploying
standalone nfs-ganesha.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822328
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-validate: update RHEL requirement for RHCS
Dimitri Savineau [Thu, 9 Apr 2020 18:00:52 +0000 (14:00 -0400)]
ceph-validate: update RHEL requirement for RHCS

We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minimal RHEL version supported by RHCS.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoosd: fix monitor_name error when scaling out OSDs
Guillaume Abrioux [Thu, 9 Apr 2020 12:48:53 +0000 (14:48 +0200)]
osd: fix monitor_name error when scaling out OSDs

This commit fixes a bug when trying to scale out osd nodes when
`crush_rule_config` is enabled.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822599
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoAllow user to specify grafana_server_fqdn
Paulo Matias [Tue, 17 Mar 2020 02:40:20 +0000 (23:40 -0300)]
Allow user to specify grafana_server_fqdn

This is needed to get a TLS certificate to validate correctly.

If unspecified, auto-detected grafana_server_addr is used.

Signed-off-by: Paulo Matias <matias@ufscar.br>
5 years agoPrometheus APIs are only available through plain http
Paulo Matias [Tue, 17 Mar 2020 02:39:58 +0000 (23:39 -0300)]
Prometheus APIs are only available through plain http

Trying to access these APIs through TLS produces "Could not reach
external API" errors in Ceph dashboard.

Signed-off-by: Paulo Matias <matias@ufscar.br>
5 years agoUse a tempfile directory to store restart scripts
Matthew Vernon [Thu, 28 Nov 2019 17:28:53 +0000 (17:28 +0000)]
Use a tempfile directory to store restart scripts

Make a tempfile directory and copy the restart scripts there (and then
execute them from there), rather than using insecure known filenames
in /tmp/

This is a partial fix for ceph/ceph-ansible#2937

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
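
The idea of replacing a known filename in /tmp/ with a private temporary directory can be sketched in Python. The function name, directory prefix, and script name below are hypothetical and are not the task names used in the playbook.

```python
import os
import stat
import tempfile


def stage_script(body, name="restart_osd.sh"):
    """Write `body` to a script inside a freshly created private directory."""
    # mkdtemp creates a directory with an unpredictable name, accessible only by its owner.
    workdir = tempfile.mkdtemp(prefix="ceph_ansible_")
    path = os.path.join(workdir, name)
    with open(path, "w") as handle:
        handle.write(body)
    os.chmod(path, stat.S_IRWXU)  # owner read/write/execute only
    return path


script_path = stage_script("#!/bin/sh\nexit 0\n")
```

Because the directory name is unpredictable, another local user cannot pre-create the script path the way they could with a fixed name under /tmp/.
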
5 years agoswitch-to-containers: set and unset osd flags
Guillaume Abrioux [Fri, 3 Apr 2020 13:36:23 +0000 (15:36 +0200)]
switch-to-containers: set and unset osd flags

The workflow in this playbook should be the same as in rolling_update:
we should first set noout and nodeep-scrub flags before migrating the
first osd and unset osd flags after the last osd is migrated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupdate: use tasks_from when including ceph-facts
Guillaume Abrioux [Fri, 3 Apr 2020 13:07:54 +0000 (15:07 +0200)]
update: use tasks_from when including ceph-facts

When setting/unsetting osd flags, we can use `tasks_from` when importing
the `ceph-facts` role to save some time, given that we only need this role
for setting `container_binary`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-mgr: add saml python lib for dashboard SSO
Dimitri Savineau [Fri, 3 Apr 2020 20:33:11 +0000 (16:33 -0400)]
ceph-mgr: add saml python lib for dashboard SSO

The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph_key: fetch key when needed
Guillaume Abrioux [Fri, 3 Apr 2020 16:23:00 +0000 (18:23 +0200)]
ceph_key: fetch key when needed

Fetch the key when it is present in the cluster but not on the node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_key: fix idempotency when no secret is passed
Guillaume Abrioux [Fri, 3 Apr 2020 08:24:32 +0000 (10:24 +0200)]
ceph_key: fix idempotency when no secret is passed

553584cbd0d014429e665f998776e8d198f72d2b introduced a regression when no
secret is passed, it overwrites the secret each time the task is run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotox: replace testinfra by pytest for add-mgrs
Dimitri Savineau [Thu, 2 Apr 2020 20:26:48 +0000 (16:26 -0400)]
tox: replace testinfra by pytest for add-mgrs

The add-mgrs scenario is still using the testinfra command instead of
pytest, so the test execution is failing.

ERROR: InvocationError for command could not find executable testinfra

This also adds the missing --ssh-config option to testinfra.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotox-docker2podman: update container image tag
Dimitri Savineau [Wed, 1 Apr 2020 18:51:49 +0000 (14:51 -0400)]
tox-docker2podman: update container image tag

The current docker to podman scenario is using the nautilus container
image tag instead of master.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agovagrant: force centos 8.1 libvirt image
Dimitri Savineau [Wed, 1 Apr 2020 19:46:20 +0000 (15:46 -0400)]
vagrant: force centos 8.1 libvirt image

The current centos/8 vagrant image (libvirt) is still using the
CentOS 8.0 release (1905) while the 8.1 release (1911) has been
available for a few months.
Using an updated CentOS 8 release fixes slow ceph-volume/lvm commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodocker2podman: call `container_options_facts.yml` on osd nodes
Guillaume Abrioux [Wed, 1 Apr 2020 12:20:05 +0000 (14:20 +0200)]
docker2podman: call `container_options_facts.yml` on osd nodes

We must call `container_options_facts.yml` from the `ceph-osd` role because
ceph-osd-run.sh.j2 needs variables set in this file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_key: remove 'update' state
Guillaume Abrioux [Tue, 17 Mar 2020 14:34:11 +0000 (15:34 +0100)]
ceph_key: remove 'update' state

With this change, the state `present` is enough to update a keyring.
If the keyring already exists, it will be updated if caps or secret
passed to the module are different.
If the keyring doesn't exist, it will be created.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
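
The behaviour of the `present` state described in this commit can be summarised as a small decision function. This is a simplified Python sketch, not the module's code: the `plan_action` name and the dictionary layout are assumptions, and it ignores the case where no secret is passed.

```python
def plan_action(existing, wanted):
    """Decide what state `present` should do for a keyring.

    `existing` is the keyring found in the cluster (None if missing);
    `wanted` holds the caps and secret passed to the module.
    """
    if existing is None:
        return "create"
    if existing["caps"] != wanted["caps"] or existing["secret"] != wanted["secret"]:
        return "update"
    return "unchanged"
```

Returning "unchanged" when nothing differs is what keeps repeated runs idempotent.
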
5 years agoosd: use default crush rule name when needed
Guillaume Abrioux [Fri, 27 Mar 2020 15:21:09 +0000 (16:21 +0100)]
osd: use default crush rule name when needed

When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: add more coverage in external_clients scenario
Guillaume Abrioux [Fri, 27 Mar 2020 16:56:26 +0000 (17:56 +0100)]
tests: add more coverage in external_clients scenario

Run create_users_keys.yml in external_clients scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoosd: support changing default rule even when osd_crush_location isn't defined
Guillaume Abrioux [Thu, 12 Mar 2020 11:14:01 +0000 (12:14 +0100)]
osd: support changing default rule even when osd_crush_location isn't defined

Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configures crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoremove *docker*.yml symlinks
Guillaume Abrioux [Tue, 31 Mar 2020 12:08:30 +0000 (14:08 +0200)]
remove *docker*.yml symlinks

This commit removes these two symlinks.
They were there for backward compatibility and were marked deprecated as
of stable-4.0.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge-container: get *all* osds id
Guillaume Abrioux [Tue, 31 Mar 2020 11:59:23 +0000 (13:59 +0200)]
purge-container: get *all* osds id

Adding `--all` to the `systemctl list-units` command in order to get
*all* osds id on the node (including stopped osds). Otherwise, it will
purge the cluster but there will be leftovers after that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocontainer: remove ulimit nofile parameter
Dimitri Savineau [Fri, 10 Jan 2020 20:35:58 +0000 (15:35 -0500)]
container: remove ulimit nofile parameter

Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph_volume: fix multiple db/wal/journal devices
Dimitri Savineau [Fri, 27 Mar 2020 21:16:41 +0000 (17:16 -0400)]
ceph_volume: fix multiple db/wal/journal devices

When using the lvm batch ceph-volume subcommand with dedicated devices
for filestore (journal) or bluestore (db/wal) then the list of devices
is converted to a string instead of being extended via an iterable.
This worked with only one dedicated device, but with more than one
the ceph_volume module fails.

TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - --osds-per-device
  - '4'
  - /dev/nvme2n1
  - /dev/nvme3n1
  - /dev/nvme4n1
  - /dev/nvme5n1
  - /dev/nvme6n1
  - --db-devices
  - /dev/nvme0n1 /dev/nvme1n1
  - --report
  - --format=json
  msg: non-zero return code
  rc: 2
  stderr: |2-
     stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
     stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
                                 [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
                                 [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
                                 [--no-auto] [--bluestore] [--filestore]
                                 [--report] [--yes] [--format {json,pretty}]
                                 [--dmcrypt]
                                 [--crush-device-class CRUSH_DEVICE_CLASS]
                                 [--no-systemd]
                                 [--osds-per-device OSDS_PER_DEVICE]
                                 [--block-db-size BLOCK_DB_SIZE]
                                 [--block-wal-size BLOCK_WAL_SIZE]
                                 [--journal-size JOURNAL_SIZE] [--prepare]
                                 [--osd-ids [OSD_IDS [OSD_IDS ...]]]
                                 [DEVICES [DEVICES ...]]
    ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1

So the dedicated device list is considered as a single string.

This commit also adds the journal_devices, block_db_devices and
wal_devices documentation to the ceph_volume module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
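
The difference between extending the argument list and appending a joined string can be shown with a short Python sketch. The `batch_cmd` helper is hypothetical and much simpler than the real module; only flags that appear in the failing command above are used.

```python
def batch_cmd(devices, db_devices):
    """Build a `ceph-volume lvm batch` argument list (simplified sketch)."""
    cmd = ["ceph-volume", "lvm", "batch", "--bluestore", "--yes"]
    cmd.extend(devices)
    if db_devices:
        cmd.append("--db-devices")
        # Each device must be its own argv entry; joining them with spaces
        # yields one bogus path such as "/dev/nvme0n1 /dev/nvme1n1".
        cmd.extend(db_devices)
    return cmd


cmd = batch_cmd(["/dev/nvme2n1"], ["/dev/nvme0n1", "/dev/nvme1n1"])
```

With `extend`, each dedicated device reaches ceph-volume as a separate argument, which avoids the "not a block device" error shown in the log.
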
5 years agorhcs: drop debian support
Dimitri Savineau [Thu, 26 Mar 2020 21:39:09 +0000 (17:39 -0400)]
rhcs: drop debian support

Support for debian with RHCS has been dropped starting RHCS 4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorhcs: update release to 5 for octopus
Dimitri Savineau [Thu, 26 Mar 2020 18:41:14 +0000 (14:41 -0400)]
rhcs: update release to 5 for octopus

RHCS 5 will be based on Ceph Octopus release and only supported on
RHEL 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodefaults: remove legacy comment
Guillaume Abrioux [Thu, 26 Mar 2020 06:38:36 +0000 (07:38 +0100)]
defaults: remove legacy comment

This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodefaults: change nfs_ganesha_stable_branch
Guillaume Abrioux [Wed, 25 Mar 2020 21:09:32 +0000 (22:09 +0100)]
defaults: change nfs_ganesha_stable_branch

In master, even though we are using the dev repo, the value here should be
closer to the last stable release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agomergify: Update with stable-5.0 branch
Dimitri Savineau [Tue, 24 Mar 2020 14:10:30 +0000 (10:10 -0400)]
mergify: Update with stable-5.0 branch

Add action to backport commits to stable-5.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: update ceph_stable_redhat_distro
Dimitri Savineau [Wed, 25 Mar 2020 18:35:55 +0000 (14:35 -0400)]
ceph-defaults: update ceph_stable_redhat_distro

Since octopus the ceph_stable_redhat_distro variable should be set to
el8 instead of el7.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agogithub: update issue report template
Guillaume Abrioux [Wed, 25 Mar 2020 12:11:16 +0000 (13:11 +0100)]
github: update issue report template

This commit updates the 'issue report template' to ask for full
ceph-ansible log.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: remove some legacy in tox.ini
Guillaume Abrioux [Tue, 24 Mar 2020 14:04:42 +0000 (15:04 +0100)]
tests: remove some legacy in tox.ini

This commit removes some leftovers in tox.ini.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-facts: fix rgw_instances_all fact
Dimitri Savineau [Tue, 24 Mar 2020 21:11:44 +0000 (17:11 -0400)]
ceph-facts: fix rgw_instances_all fact

The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.
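
As a rough Python sketch of the intended behaviour (the role itself builds
this fact in Ansible, and the `hostvars` layout and helper name below are
assumptions for illustration), the aggregated list has to be built from
every rgw host's own `rgw_instances`, not from the local one:

```python
def rgw_instances_all(hostvars, rgw_hosts):
    """Collect the rgw_instances of every rgw host, not just the local one."""
    instances = []
    for host in rgw_hosts:
        # Look up the per-host value; a host without instances adds nothing.
        instances.extend(hostvars[host].get("rgw_instances", []))
    return instances


# Hypothetical inventory data with one instance per node.
hostvars = {
    "rgw0": {"rgw_instances": [{"instance_name": "rgw0", "radosgw_frontend_port": 8080}]},
    "rgw1": {"rgw_instances": [{"instance_name": "rgw0", "radosgw_frontend_port": 8080}]},
}
all_instances = rgw_instances_all(hostvars, ["rgw0", "rgw1"])
print(len(all_instances))
```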

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodoc/tests: bump to ansible 2.9 on master
Guillaume Abrioux [Tue, 24 Mar 2020 14:48:58 +0000 (15:48 +0100)]
doc/tests: bump to ansible 2.9 on master

Add testing against ansible 2.9 on master branch.
This commit also updates the documentation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: update mgr dashboard socket listening test
Dimitri Savineau [Tue, 24 Mar 2020 18:28:51 +0000 (14:28 -0400)]
tests: update mgr dashboard socket listening test

Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.
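
A minimal sketch of what such a check could look like follows. The helper
name, the `public_address` argument and the default port 8443 are
assumptions for illustration; `host.socket(...).is_listening` is the
testinfra call for socket checks.

```python
def dashboard_socket_spec(public_address, port=8443):
    """Return a testinfra socket spec bound to one address, not all of them."""
    return f"tcp://{public_address}:{port}"


def check_dashboard_is_listening(host, public_address, port=8443):
    # `host` is the testinfra host fixture; this helper is only defined here
    # and is not called, since it needs a live target to run against.
    assert host.socket(dashboard_socket_spec(public_address, port)).is_listening


print(dashboard_socket_spec("192.168.1.10"))
```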

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: register mark in pytest configuration
Dimitri Savineau [Wed, 15 Jan 2020 17:48:10 +0000 (12:48 -0500)]
tests: register mark in pytest configuration

Unregistered marks generate warnings like:

PytestUnknownMarkWarning: Unknown pytest.mark.docker - is this a typo?
You can register custom marks to avoid this warning

https://docs.pytest.org/en/latest/mark.html
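
One way to register a mark is from a `conftest.py` hook, sketched below.
This is an illustration only: the commit may have used the `markers` option
of an ini file instead, and the mark description text is hypothetical.

```python
# conftest.py (sketch)
def pytest_configure(config):
    # Registering the mark stops pytest from emitting
    # PytestUnknownMarkWarning for @pytest.mark.docker.
    config.addinivalue_line(
        "markers",
        "docker: hypothetical description, e.g. test applies to containerized deployments",
    )
```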

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: add dashboard testinfra configuration
Dimitri Savineau [Fri, 12 Jul 2019 18:56:01 +0000 (14:56 -0400)]
tests: add dashboard testinfra configuration

This commit adds basic tests for grafana, prometheus, node-exporter and
ceph mgr dashboard services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoosd: add a default value for 'default' in crush_rules v6.0.0alpha1
Guillaume Abrioux [Tue, 24 Mar 2020 08:56:45 +0000 (09:56 +0100)]
osd: add a default value for 'default' in crush_rules

Let's default to `False` for the `default` attribute in `crush_rules`
variable.
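
The effect of such a default can be sketched in Python. The rule names and
the helper are hypothetical; the point is only that a rule without a
`default` key is treated as `default: False` instead of raising an error.

```python
# Hypothetical crush_rules value; the second rule omits the 'default' key.
crush_rules = [
    {"name": "HDD", "root": "default", "type": "host", "default": True},
    {"name": "SSD", "root": "default", "type": "host"},
]


def default_rule_names(rules):
    """Return the names of rules flagged as default, treating a missing key as False."""
    return [rule["name"] for rule in rules if rule.get("default", False)]


print(default_rule_names(crush_rules))
```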

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoAdd pacific release
Dimitri Savineau [Mon, 23 Mar 2020 18:22:46 +0000 (14:22 -0400)]
Add pacific release

Add the 16th ceph release: pacific.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agofacts: fix typo
Guillaume Abrioux [Mon, 23 Mar 2020 15:14:29 +0000 (16:14 +0100)]
facts: fix typo

This commit fixes a typo in some task titles

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agonfs: fix nfs with external ceph cluster support
Guillaume Abrioux [Thu, 19 Mar 2020 19:44:20 +0000 (20:44 +0100)]
nfs: fix nfs with external ceph cluster support

This commit refactors and fixes the nfs deployment with external ceph
cluster support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: allow to set read-only admin user
Dimitri Savineau [Wed, 18 Mar 2020 14:53:40 +0000 (10:53 -0400)]
dashboard: allow to set read-only admin user

This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
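
As a small sketch of the selection logic (the helper name is an assumption;
`administrator` and `read-only` are the ceph dashboard system roles this
maps to):

```python
def dashboard_admin_role(dashboard_admin_user_ro=False):
    """Pick the dashboard role for the admin user; False keeps the old behaviour."""
    return "read-only" if dashboard_admin_user_ro else "administrator"


print(dashboard_admin_role())
print(dashboard_admin_role(dashboard_admin_user_ro=True))
```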

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: add registry name on dashboard vars
Dimitri Savineau [Tue, 17 Mar 2020 00:45:03 +0000 (20:45 -0400)]
ceph-defaults: add registry name on dashboard vars

We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.
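
For illustration only, the change amounts to turning an unqualified image
name into a fully qualified one. The helper below is hypothetical (the
commit edits the default values directly) and uses a simple heuristic to
detect an existing registry prefix.

```python
def with_registry(image, registry="docker.io"):
    """Prefix an image name with a registry unless one is already present."""
    first, sep, _ = image.partition("/")
    # A first path component containing '.' or ':' (or equal to 'localhost')
    # is treated as a registry host, so the name is left unchanged.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return image
    return f"{registry}/{image}"


print(with_registry("grafana/grafana:5.4.3"))
```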

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: update grafana container tag
Dimitri Savineau [Mon, 16 Mar 2020 21:52:30 +0000 (17:52 -0400)]
ceph-defaults: update grafana container tag

Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-facts: Fix system_secret_key variable handling
petruha [Mon, 16 Mar 2020 16:35:20 +0000 (17:35 +0100)]
ceph-facts: Fix system_secret_key variable handling

This commit fixes the system_secret_key variable not being substituted with
the right value and always using the 'system_secret_key' string instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
5 years agorhcs_edits: Update grafana version
Boris Ranto [Mon, 16 Mar 2020 16:08:03 +0000 (17:08 +0100)]
rhcs_edits: Update grafana version

We are planning to release an updated grafana image for the ceph dashboard
in RHCS 4.1, so we need to update the rhcs edits to point to the new
image.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107
Signed-off-by: Boris Ranto <branto@redhat.com>
5 years agoconfig: remove legacy option in ceph.conf.j2
Guillaume Abrioux [Mon, 16 Mar 2020 08:55:20 +0000 (09:55 +0100)]
config: remove legacy option in ceph.conf.j2

This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agohandler: add rgw multi-instances support
Dimitri Savineau [Thu, 12 Mar 2020 16:06:55 +0000 (17:06 +0100)]
handler: add rgw multi-instances support

This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorgw: add multi-instances support when deploying multisite
Guillaume Abrioux [Mon, 9 Mar 2020 10:05:01 +0000 (11:05 +0100)]
rgw: add multi-instances support when deploying multisite

This commit adds multi-instance support when deploying rgw multisite.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-infra: open radosgw ports for multi instances
Dimitri Savineau [Wed, 11 Mar 2020 02:41:27 +0000 (22:41 -0400)]
ceph-infra: open radosgw ports for multi instances

When using the radosgw multi-instance configuration, the firewall rules
aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
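
A minimal Python sketch of that iteration follows. The role does this in an
Ansible task; the helper name and the sample data are assumptions, while
the `radosgw_frontend_port` key matches the rgw_instances entries used
elsewhere in ceph-ansible.

```python
def rgw_ports_to_open(rgw_instances):
    """Return the sorted, de-duplicated frontend ports of all instances."""
    return sorted({inst["radosgw_frontend_port"] for inst in rgw_instances})


# Hypothetical two-instance node.
rgw_instances = [
    {"instance_name": "rgw0", "radosgw_frontend_port": 8080},
    {"instance_name": "rgw1", "radosgw_frontend_port": 8081},
]
for port in rgw_ports_to_open(rgw_instances):
    print(f"{port}/tcp")
```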

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge-container: clean legacy code
Guillaume Abrioux [Thu, 12 Mar 2020 11:22:02 +0000 (12:22 +0100)]
purge-container: clean legacy code

This commit removes a register which isn't used in this playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupdate osd pool set size command
Dimitri Savineau [Wed, 11 Mar 2020 00:50:55 +0000 (20:50 -0400)]
update osd pool set size command

Since [1] we can't use an osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd
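
A sketch of the resulting command line, written in Python for illustration.
The helper name is hypothetical, and adding the flag only when the size is
1 is an assumption; the playbooks may pass it differently.

```python
def pool_set_size_cmd(pool, size):
    """Build the argv for `ceph osd pool set <pool> size <size>`."""
    cmd = ["ceph", "osd", "pool", "set", pool, "size", str(size)]
    if size == 1:
        # A single-replica pool also requires mon_allow_pool_size_one = true
        # in the ceph configuration; the flag alone is not sufficient.
        cmd.append("--yes-i-really-mean-it")
    return cmd


print(" ".join(pool_set_size_cmd("rbd", 1)))
```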

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorgw: fix a typo in create_realm_zonegroup_zone_lists
Guillaume Abrioux [Tue, 10 Mar 2020 13:07:24 +0000 (14:07 +0100)]
rgw: fix a typo in create_realm_zonegroup_zone_lists

This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoinfra: add retries/until on firewalld start task
Guillaume Abrioux [Mon, 9 Mar 2020 09:40:54 +0000 (10:40 +0100)]
infra: add retries/until on firewalld start task

This commit makes that task retry up to 5 times when starting the
firewalld service, to avoid failures like the following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```
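
The retry behaviour can be sketched generically in Python. The task itself
uses Ansible's retries/until keywords; the helper below, its 2 second delay
and the simulated failure are assumptions for illustration.

```python
import time


def retry(action, retries=5, delay=2.0, sleep=time.sleep):
    """Call `action` until it succeeds or `retries` attempts have failed."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except Exception as exc:  # transient error, e.g. a reset connection
            last_error = exc
            if attempt < retries:
                sleep(delay)
    raise last_error


# Simulated service start that fails twice before succeeding.
attempts = {"count": 0}


def start_firewalld():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("Failed to execute operation: Connection reset by peer")
    return "started"


# A no-op sleep keeps the demonstration instantaneous.
print(retry(start_firewalld, sleep=lambda seconds: None))
```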

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>