git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
4 years agoceph-osd: add prepare_osd tag to lvm-batch scenario
Matthew Vernon [Wed, 10 Mar 2021 16:36:52 +0000 (16:36 +0000)]
ceph-osd: add prepare_osd tag to lvm-batch scenario

Sometimes it's useful to be able to skip the OSD creation step when
running ceph-ansible (cf #1777). The lvm scenario has a prepare_osd
tag on the relevant play. This commit adds the same tag to the
lvm-batch scenario.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
4 years agoDocs: fix some typos
Matthew Vernon [Wed, 10 Mar 2021 16:53:45 +0000 (16:53 +0000)]
Docs: fix some typos

While working on the previous PR, I found a couple of typos in the
docs. This fixes those.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
4 years agoUse ansible_facts
Alex Schultz [Wed, 3 Mar 2021 14:43:50 +0000 (07:43 -0700)]
Use ansible_facts

It has come to our attention that using ansible_* vars that are
populated with INJECT_FACTS_AS_VARS=True is not very performant.  In
order to be able to support setting that to off, we need to update the
references to use ansible_facts[<thing>] instead of ansible_<thing>.
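
A minimal sketch of the rename described above. The helper name `use_ansible_facts` and the short list of fact names are assumptions made for illustration; a real conversion has to tell facts apart from connection variables such as `ansible_host`, which must stay as they are.

```python
import re

# Illustrative subset only (assumption); the real set of fact names is much longer.
FACT_NAMES = {"hostname", "distribution", "os_family", "default_ipv4"}

_ANSIBLE_VAR = re.compile(r"\bansible_([a-z0-9_]+)\b")


def use_ansible_facts(expression, fact_names=FACT_NAMES):
    """Rewrite ansible_<thing> to ansible_facts['<thing>'] for known facts."""
    def _rewrite(match):
        name = match.group(1)
        if name in fact_names:
            return "ansible_facts['{}']".format(name)
        # Anything not known to be a fact (e.g. ansible_host) is left untouched.
        return match.group(0)

    return _ANSIBLE_VAR.sub(_rewrite, expression)
```

Already converted expressions pass through unchanged, so the helper can be applied more than once.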

Related: ansible#73654
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1935406
Signed-off-by: Alex Schultz <aschultz@redhat.com>
4 years agotests: increase nb of rerun in pytest
Guillaume Abrioux [Wed, 3 Mar 2021 07:51:25 +0000 (08:51 +0100)]
tests: increase nb of rerun in pytest

In order to avoid false positives in the CI that I've been unable to
reproduce.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoFix typo and broken link for documenting RGW frontends
Matthew Vernon [Mon, 22 Feb 2021 14:26:10 +0000 (14:26 +0000)]
Fix typo and broken link for documenting RGW frontends

http://docs.ceph.com/docs/nautilus/radosgw/frontends/ 404s so replace
it with a working "latest" docs link, and correct the spelling of
"additional" while I'm at it.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
4 years agodashboard: add missing parameter in `ceph_cmd`
Guillaume Abrioux [Mon, 1 Mar 2021 14:22:22 +0000 (15:22 +0100)]
dashboard: add missing parameter in `ceph_cmd`

the `ceph_cmd` fact is missing the `--net=host` parameter.

Some tasks consuming this fact can fail as follows:

```
Error: error configuring network namespace for container b8ec913db1fb694ae683faf202680de7a59c714a004e533aba87e8503d29261f: Missing CNI default network
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1931365
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agodefaults: update rhcs dashboard images versions
Guillaume Abrioux [Wed, 17 Feb 2021 03:22:34 +0000 (04:22 +0100)]
defaults: update rhcs dashboard images versions

The current dashboard images deployed have a bad health index.
Updating to a newer version fixes this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925350
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agorequirements.txt: Move the six dependency into the general requirements
Florian Haas [Fri, 12 Feb 2021 08:29:00 +0000 (09:29 +0100)]
requirements.txt: Move the six dependency into the general requirements

config_template.py depends on six, which isn't listed in the default
requirements.txt. This previously frequently wasn't a problem, because
six used to be a standard package being installed into a venv, and
lots of other projects depended on it.

It also does get installed for unit and integration tests via
tests/requirements.txt, so any broken dependency on six wouldn't be
detected by tox runs.

However, as other projects and distributions have phased out Python
2.7 support the dependency on six becomes less common. Thus, as long
as ceph-ansible does require it for config_template.py, add it to the
base requirements.

Signed-off-by: Florian Haas <florian@citynetwork.eu>
4 years agolibrary: do not always add --yes in batch mode
Guillaume Abrioux [Tue, 9 Feb 2021 14:28:08 +0000 (15:28 +0100)]
library: do not always add --yes in batch mode

When asking `ceph-volume` to report only in `lvm batch` context, there's
a bug described in bz1896803 [1] when `--yes` is passed (which by the
way isn't necessary with `--report`).
This commit ensures `--yes` isn't passed to `ceph-volume` when `--report`
is used.
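
A minimal sketch of the rule, assuming a hypothetical helper `batch_cmd` that builds the command line as a list; it is not the module's actual code.

```python
def batch_cmd(devices, report=False):
    """Build a `ceph-volume lvm batch` command line (hypothetical helper)."""
    cmd = ["ceph-volume", "lvm", "batch", "--bluestore"]
    if report:
        # Report mode only prints what would be done, so --yes is not needed.
        cmd.extend(["--report", "--format=json"])
    else:
        # Real deployment: skip the interactive confirmation.
        cmd.append("--yes")
    cmd.extend(devices)
    return cmd
```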

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896803

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896803
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoAdd quincy release
Dimitri Savineau [Mon, 1 Feb 2021 22:39:07 +0000 (17:39 -0500)]
Add quincy release

Add the 17th ceph release: quincy.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agopurge: rm service-cid files
Guillaume Abrioux [Tue, 2 Feb 2021 20:22:50 +0000 (21:22 +0100)]
purge: rm service-cid files

This commit makes sure purge playbooks remove those files if for any reason they
have been left behind.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1920900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoswitch2container: do not serialize the ceph-crash migration
Guillaume Abrioux [Thu, 11 Feb 2021 15:28:31 +0000 (16:28 +0100)]
switch2container: do not serialize the ceph-crash migration

There's no need to slow down the playbook execution time by migrating
all the `ceph-crash` instances in a serial way. Let's remove the
`serial: 1` so the migration is achieved in a parallel way.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: increase `mon_max_pg_per_osd`
Guillaume Abrioux [Wed, 10 Feb 2021 14:49:38 +0000 (15:49 +0100)]
tests: increase `mon_max_pg_per_osd`

We aren't deploying enough OSD daemons, so it fails as follows:

```
  stderr: 'Error ERANGE: pool id 10 pg_num 256 size 2 would mean 1536 total pgs, which exceeds max 1500 (mon_max_pg_per_osd 250 * num_in_osds 6)'
```

Let's increase the value of `mon_max_pg_per_osd` in order to get around
this issue in the CI.
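
The arithmetic behind the error message can be checked with a short sketch. The function names are hypothetical; the numbers come from the message above.

```python
import math


def max_total_pgs(mon_max_pg_per_osd, num_in_osds):
    """Upper bound the monitor enforces on the total number of PGs."""
    return mon_max_pg_per_osd * num_in_osds


def min_pg_per_osd_limit(total_pgs, num_in_osds):
    """Smallest mon_max_pg_per_osd that would accept total_pgs."""
    return math.ceil(total_pgs / num_in_osds)


limit = max_total_pgs(250, 6)              # 1500, as reported in the error
too_many = 1536 > limit                    # True: the pool creation is refused
needed = min_pg_per_osd_limit(1536, 6)     # 256
```

With 6 OSDs "in", any value of at least 256 lets the 1536 PGs through.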

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agodoc: add a note about "latest" tags
Guillaume Abrioux [Thu, 11 Feb 2021 12:58:27 +0000 (13:58 +0100)]
doc: add a note about "latest" tags

See the change for details.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agomergify: add stable-6.0 backport configuration v7.0.0alpha1
Guillaume Abrioux [Wed, 10 Feb 2021 13:53:47 +0000 (14:53 +0100)]
mergify: add stable-6.0 backport configuration

This adds the stable-6.0 backport configuration in mergify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agocephadm-adopt: remove prometheus workaround v6.0.0
Dimitri Savineau [Thu, 21 Jan 2021 20:26:09 +0000 (15:26 -0500)]
cephadm-adopt: remove prometheus workaround

This was fixed by [1][2]

[1] https://tracker.ceph.com/issues/45120
[2] https://github.com/ceph/ceph/commit/252d4b30

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agodoc: update containerized deployment
Dimitri Savineau [Tue, 26 Jan 2021 19:03:27 +0000 (14:03 -0500)]
doc: update containerized deployment

This adds more documentation to the configuration and usage of
containerized deployment.

Closes: #6198
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agodoc: update the documentation
Guillaume Abrioux [Wed, 10 Feb 2021 12:47:21 +0000 (13:47 +0100)]
doc: update the documentation

- mention `stable-6.0` requirements.
- update some patterns.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agorolling_update: enforce ceph-container-engine
Dimitri Savineau [Wed, 3 Feb 2021 22:39:49 +0000 (17:39 -0500)]
rolling_update: enforce ceph-container-engine

When running the rolling_update.yml playbook and adding the dashboard
component at the same time, the requirements (like container packages)
aren't installed.
This could lead to a failure when using authentication on the
container registry because the playbook will try to log in to the registry
but podman/docker aren't installed yet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903504
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-common: enable rhcs tools repo for monitoring
Dimitri Savineau [Wed, 3 Feb 2021 15:28:24 +0000 (10:28 -0500)]
ceph-common: enable rhcs tools repo for monitoring

The monitoring node running grafana needs the rhcs tools repository
enabled in non containerized deployment to be able to install the
ceph-grafana-dashboards rpm package.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918650
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests: pin ansible-lint version
Guillaume Abrioux [Wed, 10 Feb 2021 06:41:43 +0000 (07:41 +0100)]
tests: pin ansible-lint version

This commit pins the ansible-lint version to 4.3.7 as ceph-ansible isn't
compatible with recent changes in 5.0.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: set `mon_max_pg_per_osd` in rgw_multisite
Guillaume Abrioux [Tue, 9 Feb 2021 14:50:43 +0000 (15:50 +0100)]
tests: set `mon_max_pg_per_osd` in rgw_multisite

Otherwise, the job fails when it tries to create a bucket with `s3cmd mb`
command because we have too many PGs per OSD.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agorgw: fix a typo in multisite
Guillaume Abrioux [Thu, 4 Feb 2021 16:45:05 +0000 (17:45 +0100)]
rgw: fix a typo in multisite

If `rgw_zonegroupmaster` is not defined at the rgw instance level in
`rgw_instances`, it will fall back to the wrong variable (`rgw_zonemaster`).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1925247
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agorolling_update: exclude clients from node-exporter
Dimitri Savineau [Wed, 3 Feb 2021 18:07:24 +0000 (13:07 -0500)]
rolling_update: exclude clients from node-exporter

Since b105549 we don't install node-exporter on client nodes so we should
also exclude the client node from the node-exporter upgrade.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agodocs: nautilus uses ansible 2.9
Dimitri Savineau [Thu, 28 Jan 2021 20:28:14 +0000 (15:28 -0500)]
docs: nautilus uses ansible 2.9

This updates the ansible release required to deploy nautilus with the
stable-4.0 branch.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agodashboard: update with the new monitoring group
Dimitri Savineau [Wed, 3 Feb 2021 17:59:14 +0000 (12:59 -0500)]
dashboard: update with the new monitoring group

Since eefe11d the grafana-server group has been renamed to monitoring
but the dashboard playbook wasn't updated.
This was still working due to the backward compatibility added in the
ceph-facts role.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agovagrant: remove centos/8 workaround
Dimitri Savineau [Thu, 4 Feb 2021 22:10:20 +0000 (17:10 -0500)]
vagrant: remove centos/8 workaround

The CentOS 8 vagrant box has finally been updated [1] with a recent
version (the latest one 2011 which means CentOS 8.3).
We don't need to download the vagrant libvirt box with a direct url
anymore from the CentOS infrastructure.

[1] https://app.vagrantup.com/centos/boxes/8

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoupdate: update ceph release pattern in complete upgrade play
Guillaume Abrioux [Fri, 5 Feb 2021 21:27:38 +0000 (22:27 +0100)]
update: update ceph release pattern in complete upgrade play

since master is now deploying quincy, we must update this.
Otherwise, it will fail as follows:

```
Error EPERM: require_osd_release cannot be lowered once it has been set
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agocommon: ensure shaman returns right repo
Guillaume Abrioux [Fri, 5 Feb 2021 19:41:21 +0000 (20:41 +0100)]
common: ensure shaman returns right repo

Due to recent changes in shaman, there's a chance it returns the wrong
repository from architecture point of view.
We can query shaman and ask for the correct architecture to get around
this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agorolling_update: pg check refactor
Guillaume Abrioux [Thu, 4 Feb 2021 15:24:03 +0000 (16:24 +0100)]
rolling_update: pg check refactor

There's no need to achieve this in two tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agovalidate: fix a typo
Guillaume Abrioux [Thu, 4 Feb 2021 16:05:19 +0000 (17:05 +0100)]
validate: fix a typo

fixes a typo

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: remove legacy
Guillaume Abrioux [Thu, 4 Feb 2021 15:42:10 +0000 (16:42 +0100)]
tests: remove legacy

remove a legacy leftover from the tox environment definition

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: follow up on 7c9063b
Guillaume Abrioux [Wed, 3 Feb 2021 08:33:14 +0000 (09:33 +0100)]
tests: follow up on 7c9063b

7c9063b1d2b1af22feb65e70cd8c4dd2de179fb9 broke some scenarios.
This commit fixes them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agolibrary: fix idempotency in ceph_mgr_module
Dimitri Savineau [Thu, 21 Jan 2021 02:02:17 +0000 (21:02 -0500)]
library: fix idempotency in ceph_mgr_module

The ceph mgr command output is printed on stderr instead of stdout, which
prevents setting the changed flag to false if the module is already enabled.
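
A minimal sketch of the idea, assuming the command reports an "already enabled" message (the exact wording is an assumption) and using a hypothetical helper name. Looking at both streams keeps the check valid wherever the message is printed.

```python
def module_changed(rc, stdout, stderr, marker="already enabled"):
    """Return True only if the command succeeded and actually changed something.

    `marker` is an assumed fragment of the message printed when the mgr
    module was already enabled.
    """
    if rc != 0:
        return False
    # The message may land on stderr rather than stdout, so check both.
    return marker not in stdout and marker not in stderr
```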

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm-adopt: use ceph_osd_flag module
Dimitri Savineau [Thu, 21 Jan 2021 17:12:17 +0000 (12:12 -0500)]
cephadm-adopt: use ceph_osd_flag module

There's no reason to not use the ceph_osd_flag module to set/unset osd
flags.
Also if there's no OSD nodes in the inventory then we don't need to
execute the set/unset play.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agopurge-cluster: use parted ansible module
Dimitri Savineau [Tue, 12 Jan 2021 20:47:42 +0000 (15:47 -0500)]
purge-cluster: use parted ansible module

Instead of doing some scripting via the shell module, we can use the
parted ansible module to check the boot flag on partitions.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary/cephadm_bootstrap: add registry support
Dimitri Savineau [Wed, 6 Jan 2021 19:06:48 +0000 (14:06 -0500)]
library/cephadm_bootstrap: add registry support

This adds the custom registry auth support when using a registry with
authentication.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-defaults: use https for download.ceph.com
Dimitri Savineau [Mon, 1 Feb 2021 16:47:10 +0000 (11:47 -0500)]
ceph-defaults: use https for download.ceph.com

There's no reason to still use http on download.ceph.com instead of
https.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests: use lvm batch on osd2 (all_daemons)
Guillaume Abrioux [Mon, 1 Feb 2021 19:32:37 +0000 (20:32 +0100)]
tests: use lvm batch on osd2 (all_daemons)

in order to test lvm batch in purge scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agopurge: zap and destroy db and wal devices for lvm batch
Guillaume Abrioux [Mon, 1 Feb 2021 15:51:07 +0000 (16:51 +0100)]
purge: zap and destroy db and wal devices for lvm batch

Those devices (db/wal) are never zapped in lvm batch deployment.
Iterating over `dedicated_devices` and `bluestore_wal_devices` fixes
this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1922926
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-facts: set rgw_instances_all fact once
Dimitri Savineau [Mon, 25 Jan 2021 19:40:00 +0000 (14:40 -0500)]
ceph-facts: set rgw_instances_all fact once

There's no need to set the rgw_instances_all fact for each node. We can
rely on run_once for that one.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: retrieve realm id for zone/zonegroup v6.0.0alpha7
Dimitri Savineau [Fri, 22 Jan 2021 17:45:32 +0000 (12:45 -0500)]
library: retrieve realm id for zone/zonegroup

When the zonegroup or the zone doesn't have a realm associated then
it's not possible to modify that resource.
This patch retrieves the current realm id and compares it to
the realm id of the realm passed as a parameter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm-adopt: use radosgw modules for idempotency
Dimitri Savineau [Thu, 21 Jan 2021 22:42:33 +0000 (17:42 -0500)]
cephadm-adopt: use radosgw modules for idempotency

When rerunning the cephadm-adopt.yml playbook the radosgw realm,
zonegroup and zone tasks will fail because the task isn't
idempotent.
Using the radosgw ansible modules solves that problem.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotox: test cephadm-adopt.yml playbook idempotency
Dimitri Savineau [Thu, 21 Jan 2021 16:27:10 +0000 (11:27 -0500)]
tox: test cephadm-adopt.yml playbook idempotency

Rerun the cephadm-adopt.yml playbook a second time for idempotency
purpose.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: make cephadm_adopt module idempotent
Dimitri Savineau [Thu, 21 Jan 2021 16:19:44 +0000 (11:19 -0500)]
library: make cephadm_adopt module idempotent

Rerunning the cephadm_adopt module on an already adopted daemon will
fail because the cephadm adopt command isn't idempotent.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm-adopt: make the playbook idempotent
Dimitri Savineau [Wed, 20 Jan 2021 22:39:44 +0000 (17:39 -0500)]
cephadm-adopt: make the playbook idempotent

If the cephadm-adopt.yml fails during the first execution and some
daemons have already been adopted by cephadm then we can't rerun
the playbook because the old container won't exist anymore.

Error: no container with name or ID ceph-mon-xxx found: no such container

If the daemons are adopted then the old systemd unit doesn't exist anymore
so any call to that unit with systemd will fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1918424
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-mon: add ExecStartPre docker stop to systemd
Dimitri Savineau [Wed, 13 Jan 2021 15:17:56 +0000 (10:17 -0500)]
ceph-mon: add ExecStartPre docker stop to systemd

We already do that in the other systemd templates (mgr, mds, etc.)
and it would prevent having to add workarounds in other orchestration tools.
This change is for containerized deployment only.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1882724
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agorgw: avoid useless call to ceph-rgw
Guillaume Abrioux [Wed, 27 Jan 2021 17:36:13 +0000 (18:36 +0100)]
rgw: avoid useless call to ceph-rgw

since `ceph-rgw` may be called from `ceph-handler` in some contexts we
should avoid rerunning it unnecessarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agofs2bs: remove a legacy fact
Guillaume Abrioux [Mon, 11 Jan 2021 15:55:40 +0000 (16:55 +0100)]
fs2bs: remove a legacy fact

since cf7345f143148a6be2d71954f829a8f7fe11ab22, we don't need to set
this fact anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agorgw: multisite refact
Guillaume Abrioux [Thu, 14 Jan 2021 16:52:39 +0000 (17:52 +0100)]
rgw: multisite refact

Add the possibility to deploy rgw multisite configuration with a mix of
secondary and primary zones on a same rgw node.
Before that, on a same node, all instances were either primary
zones *OR* secondary.

Now you can define a rgw instance as follows:

```
rgw_instances:
  - instance_name: 'rgw0'
    rgw_zonemaster: false
    rgw_zonesecondary: true
    rgw_zonegroupmaster: false
    rgw_realm: 'france'
    rgw_zonegroup: 'zonegroup-france'
    rgw_zone: paris-00
    radosgw_address: "{{ _radosgw_address }}"
    radosgw_frontend_port: 8080
    rgw_zone_user: jacques.chirac
    rgw_zone_user_display_name: "Jacques Chirac"
    system_access_key: P9Eb6S8XNyo4dtZZUUMy
    system_secret_key: qqHCUtfdNnpHq3PZRHW5un9l0bEBM812Uhow0XfB
    endpoint: http://192.168.101.12:8080
```

Basically it's now possible to define `rgw_zonemaster`,
`rgw_zonesecondary` and `rgw_zonegroupmaster` at the instance
level instead of the whole node level.

Also, this commit adds an option `deploy_secondary_zones` (default True)
which can be set to `False` in order to explicitly ask the playbook to
not deploy secondary zones when the corresponding endpoints are
not deployed yet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1915478
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agolibrary: fix bug in radosgw_zone.py
Guillaume Abrioux [Sun, 17 Jan 2021 19:46:31 +0000 (20:46 +0100)]
library: fix bug in radosgw_zone.py

If for some reason `get_zonegroup()` returns a failure, we must handle
and make the module exit properly instead of failing with the following
python trace:

```
Traceback (most recent call last):
  File "./AnsiballZ_radosgw_zone.py", line 247, in <module>
    _ansiballz_main()
  File "./AnsiballZ_radosgw_zone.py", line 234, in _ansiballz_main
    exitcode = debug(sys.argv[1], zipped_mod, ANSIBALLZ_PARAMS)
  File "./AnsiballZ_radosgw_zone.py", line 202, in debug
    runpy.run_module(mod_name='ansible.modules.radosgw_zone', init_globals=None, run_name='__main__', alter_sys=True)
  File "/usr/lib64/python3.6/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 467, in <module>
    main()
  File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 463, in main
    run_module()
  File "/home/vagrant/.ansible/tmp/ansible-tmp-1610728441.41-685133-218973990589597/debug_dir/ansible/modules/radosgw_zone.py", line 425, in run_module
    zonegroup = json.loads(_out)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

```
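
A minimal sketch of the guard, assuming a hypothetical helper `load_zonegroup` that receives the return code and output of the command. It only illustrates checking for failure before decoding, not the module's actual code.

```python
import json


def load_zonegroup(rc, out, err):
    """Return (zonegroup, error); exactly one of the two is None."""
    if rc != 0:
        # The command failed: do not try to decode its (likely empty) stdout.
        return None, err.strip() or "command failed with rc {}".format(rc)
    try:
        return json.loads(out), None
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return None, "invalid JSON output: {}".format(exc)
```

The caller can then exit cleanly with the error string instead of letting the exception propagate.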

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agolibrary: move `fatal()` into ca_common.py
Guillaume Abrioux [Sun, 17 Jan 2021 19:17:30 +0000 (20:17 +0100)]
library: move `fatal()` into ca_common.py

this function is defined in various modules, let's move it to
`ca_common.py`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agografana: update container tag to 6.7.4
Dimitri Savineau [Tue, 19 Jan 2021 19:24:22 +0000 (14:24 -0500)]
grafana: update container tag to 6.7.4

This updates the grafana container tag to 6.7.4.
The RHCS version is now based on the RHCS 5 container image which is
also based on 6.7.4.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-defaults: change default ceph container tag
Dimitri Savineau [Fri, 22 Jan 2021 15:01:10 +0000 (10:01 -0500)]
ceph-defaults: change default ceph container tag

The "latest" ceph container tag references the latest stable release
(octopus at the moment). "latest" is an alias on "latest-octopus".
On the devel branch we should use "latest-master" tag instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm-adopt: add grafana group conversion v6.0.0alpha6
Dimitri Savineau [Mon, 18 Jan 2021 17:15:04 +0000 (12:15 -0500)]
cephadm-adopt: add grafana group conversion

The grafana group conversion task wasn't present in the cephadm-adopt.yml
playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1917530
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agomon: fix cephx disabled deployment
Guillaume Abrioux [Wed, 13 Jan 2021 10:07:50 +0000 (11:07 +0100)]
mon: fix cephx disabled deployment

Due to missing condition on `cephx` variable, cephx disabled deployments
are broken.
This commit fixes this.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910151
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agomodule_utils: don't add newline to the data
Dimitri Savineau [Thu, 14 Jan 2021 02:11:39 +0000 (21:11 -0500)]
module_utils: don't add newline to the data

When executing a command via the run_command method and passing some
data with stdin then the default behavior is to append a newline.
This breaks the value of password used by our modules.
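
The effect can be sketched with a small stand-in function. `stdin_payload` is hypothetical and only mimics how Ansible's `run_command` treats its `data` argument depending on `binary_data`; it is not the code changed by this commit.

```python
def stdin_payload(data, binary_data=False):
    """Mimic what ends up on stdin for a given `data` value."""
    if not binary_data:
        # Default behavior: a trailing newline is appended to the data.
        data += "\n"
    return data


password = "s3cr3t"
default_payload = stdin_payload(password)                   # "s3cr3t\n"
exact_payload = stdin_payload(password, binary_data=True)   # "s3cr3t"
```

Only the second payload matches the password byte for byte.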

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests/library: remove duplicate parameter
Dimitri Savineau [Thu, 14 Jan 2021 02:32:33 +0000 (21:32 -0500)]
tests/library: remove duplicate parameter

Remove duplicate fake_params parameter as it's already defined later
as a dict (instead of an empty list).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agofs2bs: skip migration when a mix of fs and bs is detected v6.0.0alpha5
Guillaume Abrioux [Tue, 15 Dec 2020 16:49:32 +0000 (17:49 +0100)]
fs2bs: skip migration when a mix of fs and bs is detected

Since the default of `osd_objectstore` has changed as of 3.2, some
deployments might have a mix of filestore and bluestore OSDs on a same
node. In some specific cases, there's a possibility that a filestore OSD
shares a journal/db device with a bluestore OSD. We shouldn't try to
redeploy in this context because ceph-volume will complain (either
because you can't pass a partition to lvm batch or because of the gpt header).
The safest option is to skip the migration on the node when such a mix
is detected or force all osds including those already using bluestore
(option `force_filestore_to_bluestore=True` has to be passed as an extra var).
If all OSDs are using filestore, then they will be migrated to
bluestore.
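
The decision can be sketched as follows. The function name and the input shape (a mapping of OSD id to objectstore) are assumptions made for illustration.

```python
def osds_to_redeploy(osds, force_filestore_to_bluestore=False):
    """Return the sorted OSD ids to redeploy on one node.

    `osds` maps an OSD id to "filestore" or "bluestore" (assumed shape).
    """
    filestore = [osd for osd, store in osds.items() if store == "filestore"]
    if not filestore:
        return []                      # nothing to migrate
    if len(filestore) == len(osds):
        return sorted(osds)            # only filestore: migrate everything
    # Mixed node: skip, unless the operator forces a full redeploy.
    return sorted(osds) if force_filestore_to_bluestore else []
```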

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875777
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agovalidate: check virtual_ips variable
Guillaume Abrioux [Mon, 11 Jan 2021 09:06:08 +0000 (10:06 +0100)]
validate: check virtual_ips variable

This commit checks the length of `virtual_ips` doesn't exceed the length
of `groups[rgwloadbalancer_group_name]`.
It also ensures this variable is defined when
`groups[rgwloadbalancer_group_name]` contains at least one node.
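
Both checks can be sketched in a few lines. `validate_virtual_ips` is a hypothetical helper and the error strings are illustrative.

```python
def validate_virtual_ips(virtual_ips, loadbalancer_hosts):
    """Return a list of validation errors (empty when everything is fine)."""
    errors = []
    virtual_ips = virtual_ips or []
    if loadbalancer_hosts and not virtual_ips:
        errors.append("virtual_ips must be defined when the group is not empty")
    if len(virtual_ips) > len(loadbalancer_hosts):
        errors.append("virtual_ips has more entries than the group has hosts")
    return errors
```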

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-rgw-loadbalancer: Fix keepalived master selection
Benoît Knecht [Mon, 1 Jun 2020 15:09:18 +0000 (17:09 +0200)]
ceph-rgw-loadbalancer: Fix keepalived master selection

While 2ca33641 fixed a bug in the way the `keepalived.conf.j2` template matched
hostnames to set the VRRP `MASTER`/`BACKUP` states, it also introduced a
regression in the case where `virtual_ips` is a list of more than one IP
address.

The previous behavior would result in each host in the `rgwloadbalancers` group
being `MASTER` for one of the `virtual_ips`, but the new behavior caused the
first host to be `MASTER` for all the IP addresses in `virtual_ips`.

This commit restores the original behavior.
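
A minimal sketch of the intended distribution, assuming hosts and virtual IPs are paired by their position in each list (that pairing rule is an assumption, as is the helper name).

```python
def vrrp_states(hosts, virtual_ips):
    """Map each virtual IP to the VRRP state of every host."""
    return {
        vip: {
            # The host at the same index as the VIP is its MASTER.
            host: "MASTER" if host_index == vip_index else "BACKUP"
            for host_index, host in enumerate(hosts)
        }
        for vip_index, vip in enumerate(virtual_ips)
    }
```

With two hosts and two virtual IPs, each host is `MASTER` for exactly one address and `BACKUP` for the other.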

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
4 years agoswitch2container: fix mon quorum check
Guillaume Abrioux [Fri, 18 Dec 2020 09:33:44 +0000 (10:33 +0100)]
switch2container: fix mon quorum check

The current check makes no sense because it checks whether any monitor other
than the one being played (either a previous one already converted or a
next one that isn't converted yet) is present in the quorum.
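
A sketch of a check that looks at the monitor being converted, assuming the JSON output of `ceph quorum_status` with its `quorum_names` list; the helper name is hypothetical.

```python
import json


def mon_in_quorum(quorum_status_json, mon_name):
    """True if `mon_name` itself is listed in the quorum."""
    status = json.loads(quorum_status_json)
    return mon_name in status.get("quorum_names", [])
```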

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1909011
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoRevert "tests: temporarily use david's flavor"
Guillaume Abrioux [Fri, 8 Jan 2021 05:28:01 +0000 (06:28 +0100)]
Revert "tests: temporarily use david's flavor"

This reverts commit ed9f0641eee3da314a66f5ed7c2722ac973481d3.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-osd: replace sysctl command task by slurp
Dimitri Savineau [Fri, 8 Jan 2021 22:31:03 +0000 (17:31 -0500)]
ceph-osd: replace sysctl command task by slurp

Instead of using the command module to retrieve a sysctl value,
we can use the slurp module and read the value directly from /proc.
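
The two steps involved can be sketched as follows: mapping a sysctl name to its file under /proc/sys, and decoding the base64 `content` field that slurp returns. The helper names and the example sysctl are assumptions.

```python
import base64


def sysctl_proc_path(name):
    """Map a sysctl name such as "fs.aio-max-nr" to its /proc/sys path."""
    return "/proc/sys/" + name.replace(".", "/")


def decode_slurp(result):
    """Decode the base64 `content` field of a slurp result."""
    return base64.b64decode(result["content"]).decode().strip()
```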

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests: temporarily use david's flavor
Guillaume Abrioux [Thu, 7 Jan 2021 13:17:01 +0000 (14:17 +0100)]
tests: temporarily use david's flavor

master nfs ganesha builds are broken, let's use this flavor instead for
now.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agodashboard: configure passwords via stdin
Guillaume Abrioux [Thu, 7 Jan 2021 11:40:18 +0000 (12:40 +0100)]
dashboard: configure passwords via stdin

Due to recent changes in ceph, the few dashboard passwords
must be passed via `-i`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agolibrary: refact ceph_dashboard_user
Guillaume Abrioux [Wed, 6 Jan 2021 13:07:38 +0000 (14:07 +0100)]
library: refact ceph_dashboard_user

refact this module due to recent changes in ceph pacific.
The password must be passed with `-i` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agospec: add module_utils directory v6.0.0alpha4
Dimitri Savineau [Wed, 6 Jan 2021 19:22:04 +0000 (14:22 -0500)]
spec: add module_utils directory

Since d7fd468 the ansible modules are using the common code shared in
the module_utils directory but that one wasn't added to the spec file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1910214
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoPath for ceph config missing in crash template
Mike Currin [Thu, 24 Dec 2020 07:25:24 +0000 (09:25 +0200)]
Path for ceph config missing in crash template

The path where ceph.conf is located (/etc/ceph) is missing from the Docker container bind mounts, which causes errors.

Signed-off-by: Mike Currin <currin@gmail.com>
4 years agorgw: support switching from single-site to multisite
Guillaume Abrioux [Wed, 6 Jan 2021 09:37:12 +0000 (10:37 +0100)]
rgw: support switching from single-site to multisite

When collocating rgw with either a mon, mgr or osd, switching from
single site to a multisite rgw setup failed because of the handlers
triggered between the ansible play of the collocated daemon and the play
of the rgw. Since the multisite changes are not yet applied the handlers
fail.
The idea here is to ensure we run the multisite configuration from the
ceph-handler role before the restart happens, this way it won't complain
because of non existing multisite configuration.

(Note: this is also valid when simply changing a multisite configuration)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1888630
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agolibrary: remove containerized parameter from cv
Dimitri Savineau [Fri, 18 Dec 2020 15:25:54 +0000 (10:25 -0500)]
library: remove containerized parameter from cv

The ceph-volume module relies on environment variables to determine if
the command should be executed within a container or not.
The containerized parameter isn't used anymore and we can remove it.

Fixes: #6153
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: add no_log to {access,secret}_key params
Dimitri Savineau [Tue, 5 Jan 2021 22:24:35 +0000 (17:24 -0500)]
library: add no_log to {access,secret}_key params

This sets the no_log parameter on both the access and the secret
RGW key variables.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocephadm: remove loop on host add tasks
Dimitri Savineau [Wed, 9 Dec 2020 22:05:25 +0000 (17:05 -0500)]
cephadm: remove loop on host add tasks

Instead of iterating over the host list to add the node/label to the
host orchestrator configuration, we can do it in parallel.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: add missing `target_size_ratio` parameter support in ceph_pool module
Fabien Brachere [Wed, 16 Dec 2020 06:33:36 +0000 (07:33 +0100)]
library: add missing `target_size_ratio` parameter support in ceph_pool module

When creating a new pool, target_size_ratio was ignored by ansible module ceph_pool.py.
target_size_ratio is now used when pg_autoscale_mode is on.
Tests added to library tests.
This also adds its use in the ceph-rgw role.

Signed-off-by: Fabien Brachere <fabien.brachere@celeste.fr>
4 years agoceph-config: fix ceph-volume lvm batch report
Dimitri Savineau [Tue, 15 Dec 2020 18:52:43 +0000 (13:52 -0500)]
ceph-config: fix ceph-volume lvm batch report

Since the major ceph-volume lvm batch refactoring, the report value
is different.
Before the refactor, the report was a dict with the list of OSDs to be
created under the "osds" key.
After the refactor, the report is a list of dicts.
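
A small sketch of handling both report shapes, assuming the report has
already been parsed from JSON. The function name and the sample data are
hypothetical; only the two shapes come from the description above:

```python
def osds_from_report(report):
    """Return the list of OSD entries from either report shape."""
    if isinstance(report, dict):
        # Pre-refactor shape: {"osds": [...]}
        return report.get("osds", [])
    if isinstance(report, list):
        # Post-refactor shape: the report itself is the list of OSDs.
        return report
    raise TypeError("unexpected report type: %s" % type(report).__name__)


old_report = {"osds": [{"data": "/dev/sdb"}, {"data": "/dev/sdc"}]}
new_report = [{"data": "/dev/sdb"}, {"data": "/dev/sdc"}]
num_osds_old = len(osds_from_report(old_report))
num_osds_new = len(osds_from_report(new_report))
```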

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoRevert "mergify: add configuration for 4.2z1 branch"
Guillaume Abrioux [Tue, 15 Dec 2020 16:25:40 +0000 (17:25 +0100)]
Revert "mergify: add configuration for 4.2z1 branch"

This reverts commit fb7dced59869cc8cd5d0f7920f86ea5d836b5ec7.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agomergify: add configuration for 4.2z1 branch
Guillaume Abrioux [Tue, 15 Dec 2020 08:55:23 +0000 (09:55 +0100)]
mergify: add configuration for 4.2z1 branch

So we get backports against 4.2z1 branch (downstream related) automatically
created by mergify

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: force box removal
Guillaume Abrioux [Mon, 14 Dec 2020 09:03:33 +0000 (10:03 +0100)]
tests: force box removal

This avoids interactive mode for `vagrant box remove`.
The interactive prompt can appear when there are leftovers from a
previous deployment (VMs not destroyed as expected).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agotests: rgw_multisite playbook test refactor
Guillaume Abrioux [Fri, 11 Dec 2020 13:36:00 +0000 (14:36 +0100)]
tests: rgw_multisite playbook test refactor

Currently we create an object on the primary site but then try to read
that object from the same primary (master) site, which doesn't make
sense. We should read it from a secondary site instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agofix broken ceph-fetch-keys role
Karl-Heinz Preuß [Thu, 26 Nov 2020 09:48:49 +0000 (10:48 +0100)]
fix broken ceph-fetch-keys role

Set the fetch_directory variable in defaults/main.yml instead of using
the default jinja filter in tasks/main.yml.

Fixes: #6072
Signed-off-by: Karl-Heinz Preuß <karl-heinz.preuss@cms.hu-berlin.de>
4 years agoceph-osd: use global crush_device_class in lvm_volumes v6.0.0alpha3
Seena Fallah [Sat, 5 Dec 2020 21:55:46 +0000 (01:25 +0330)]
ceph-osd: use global crush_device_class in lvm_volumes

Use global crush_device_class variable if it's not set per OSD
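
The fallback can be pictured as follows. This is only a sketch of the
lookup order, not the role's task code; the helper name and the sample
volume entries are hypothetical:

```python
def effective_device_class(lvm_volume, global_class=None):
    """Prefer the per-OSD crush_device_class, else the global one."""
    return lvm_volume.get("crush_device_class") or global_class


lvm_volumes = [
    {"data": "lv1", "data_vg": "vg1", "crush_device_class": "nvme"},
    {"data": "lv2", "data_vg": "vg1"},  # no per-OSD class set
]
classes = [effective_device_class(v, global_class="ssd") for v in lvm_volumes]
```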

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
4 years agoRevert "config: Always use osd_memory_target if set"
Dimitri Savineau [Wed, 9 Dec 2020 19:02:45 +0000 (14:02 -0500)]
Revert "config: Always use osd_memory_target if set"

This reverts commit 4d1fdd2b05d55f8028fb5593d41fa61dbddd7095.

The reverted commit breaks backward compatibility with the previous
osd_memory_target calculation, and we could end up with a value lower than
the minimum allowed (896M), which causes some ceph commands to fail (like
ceph config assimilate-conf).
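
To illustrate the constraint, here is a hedged sketch of a guard against
values below that minimum. It assumes "896M" means 896 MiB; the constant
and function names are hypothetical and not part of ceph-ansible:

```python
# Assumption: the 896M minimum is expressed in MiB.
OSD_MEMORY_TARGET_MIN = 896 * 1024 ** 2


def check_osd_memory_target(value):
    """Reject an osd_memory_target (in bytes) below the allowed minimum."""
    if value < OSD_MEMORY_TARGET_MIN:
        raise ValueError(
            "osd_memory_target %d is below the minimum %d"
            % (value, OSD_MEMORY_TARGET_MIN)
        )
    return value
```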

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agomonitoring: use config_template module for config
Dimitri Savineau [Fri, 11 Dec 2020 18:07:04 +0000 (13:07 -0500)]
monitoring: use config_template module for config

The alertmanager, grafana and prometheus configuration files are
generated with the template module, which doesn't allow for using
config overrides.
Instead we could use the config_template action plugin and add a
new variable for overrides (one for each component).

With this patch, one should be able to add configuration to
alertmanager with the following:

---
alertmanager_conf_overrides:
  global:
    smtp_smarthost: 'localhost:25'
...
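
The effect of such an override can be pictured as a recursive dictionary
merge. The sketch below is an assumption about the general behaviour, not
the config_template plugin's code; the base configuration values are
hypothetical:

```python
def deep_merge(base, overrides):
    """Return a copy of base with overrides merged in recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


base_conf = {"global": {"resolve_timeout": "5m"}, "route": {"receiver": "default"}}
alertmanager_conf_overrides = {"global": {"smtp_smarthost": "localhost:25"}}
final_conf = deep_merge(base_conf, alertmanager_conf_overrides)
```

Keys that already exist under `global` are kept, and the override only adds
or replaces the keys it names.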

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902999
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-rgw: add cluster parameter on ceph_ec_profile
Dimitri Savineau [Fri, 11 Dec 2020 19:18:51 +0000 (14:18 -0500)]
ceph-rgw: add cluster parameter on ceph_ec_profile

81233dd introduced a regression with the ceph_ec_profile module call in
the ceph-rgw role due to the missing cluster module parameter.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-facts: fix grafana group conversion
Dimitri Savineau [Mon, 7 Dec 2020 17:11:54 +0000 (12:11 -0500)]
ceph-facts: fix grafana group conversion

The conversion fact task was only executed when the grafana_server_group_name
variable was explicitly set in the user configuration. If a user was using
the default value, the conversion wasn't executed.

This also restores the default grafana_server_group_name value, both for
users relying on the default and to avoid an undefined variable error.

Instead of hardcoding the "monitoring" group name, we can reuse the
monitoring_group_name variable.

There's no need to override the monitoring_group_name variable; it's either
using the default value or the one defined by the user.

Finally, this removes the delegate_to statement on the add_host task since
it's always executed on the ansible controller.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1903732
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests: remove pyyaml workaround on OSD nodes
Dimitri Savineau [Wed, 9 Dec 2020 16:08:11 +0000 (11:08 -0500)]
tests: remove pyyaml workaround on OSD nodes

Since [1] has been resolved, we no longer need to apply this workaround.

[1] https://tracker.ceph.com/issues/46759

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agopurge-container-cluster: always prune force
Dimitri Savineau [Wed, 9 Dec 2020 15:38:42 +0000 (10:38 -0500)]
purge-container-cluster: always prune force

Since podman 2.x, the podman container prune command asks for
confirmation unless it is forced.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agotests/vagrant: update box version to CentOS 8.3
Dimitri Savineau [Mon, 7 Dec 2020 20:48:38 +0000 (15:48 -0500)]
tests/vagrant: update box version to CentOS 8.3

This updates the CentOS libvirt box version to 8.3

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agorhcs: drop fetch_directory override
Dimitri Savineau [Wed, 2 Dec 2020 22:45:18 +0000 (17:45 -0500)]
rhcs: drop fetch_directory override

Since the fetch_directory variable has been dropped, we don't need
the override in the rhcs file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoceph-mon: No become during gen mon initial keyring
Jukka Nousiainen [Wed, 2 Dec 2020 09:07:25 +0000 (11:07 +0200)]
ceph-mon: No become during gen mon initial keyring

Since the backing generate_secret() just hands out urandom output,
running as privileged doesn't seem to be required. It's not
desirable to provide sudo in some Ansible runner environments.

Signed-off-by: Jukka Nousiainen <jukka.nousiainen@csc.fi>
4 years agolibrary: add cephadm_adopt module
Dimitri Savineau [Mon, 30 Nov 2020 19:32:54 +0000 (14:32 -0500)]
library: add cephadm_adopt module

This adds cephadm_adopt ansible module for replacing the command module
usage with the cephadm adopt command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agocommon: do not use pipefail when not needed
Guillaume Abrioux [Mon, 30 Nov 2020 16:08:18 +0000 (17:08 +0100)]
common: do not use pipefail when not needed

Let's discard the ansible lint error 306 and add a "# noqa 306" on tasks
where we don't need `set -o pipefail`

Fixes: #6090
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoconsume ceph_volume module when possible
Dimitri Savineau [Wed, 18 Nov 2020 22:20:45 +0000 (17:20 -0500)]
consume ceph_volume module when possible

We should always use the ceph_volume ansible module when possible.
This patch replaces the ceph-volume inventory and lvm {list,zap} commands
called via the command/shell modules with the corresponding calls to the
ceph_volume module.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: add ceph_crush_rule module
Dimitri Savineau [Mon, 9 Nov 2020 17:16:41 +0000 (12:16 -0500)]
library: add ceph_crush_rule module

This adds ceph_crush_rule ansible module for replacing the command
module usage with the ceph osd crush rule commands.
This module can manage both erasure and replicated crush rules.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoosd: add tag on 'wait for all osd to be up' task
Guillaume Abrioux [Thu, 26 Nov 2020 08:53:04 +0000 (09:53 +0100)]
osd: add tag on 'wait for all osd to be up' task

This allows skipping this task if really desired.
Use it carefully. Use it at your own risk.

Fixes: #6073
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-client: use group_by instead of add_host
Dimitri Savineau [Mon, 30 Nov 2020 17:15:48 +0000 (12:15 -0500)]
ceph-client: use group_by instead of add_host

Instead of iterating sequentially over all client nodes with a loop, we
can use the group_by ansible builtin.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: add cephadm_bootstrap module
Dimitri Savineau [Fri, 16 Oct 2020 00:42:00 +0000 (20:42 -0400)]
library: add cephadm_bootstrap module

This adds cephadm_bootstrap ansible module for replacing the command module
usage with the cephadm bootstrap command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agolibrary: add ceph_osd_flag module
Dimitri Savineau [Tue, 3 Nov 2020 21:44:58 +0000 (16:44 -0500)]
library: add ceph_osd_flag module

This adds ceph_osd_flag ansible module for replacing the command module
usage with the ceph osd set/unset commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
4 years agoiscsigw: remove `--cap-add=all` from `podman run` cmd
Guillaume Abrioux [Mon, 30 Nov 2020 13:55:16 +0000 (14:55 +0100)]
iscsigw: remove `--cap-add=all` from `podman run` cmd

As of podman `2.0.5`, `--cap-add` and `--privileged` are mutually
exclusive options.

```
Nov 30 13:56:30 magna089 podman[171677]: Error: invalid config provided: CapAdd and privileged are mutually exclusive options
```
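
A hedged sketch of a check that would catch this combination before the
command is run. The helper name is hypothetical and this is not code from
ceph-ansible or podman:

```python
def check_podman_run_args(args):
    """Raise if both --privileged and --cap-add appear in the argument list."""
    has_privileged = "--privileged" in args
    has_cap_add = any(
        arg == "--cap-add" or arg.startswith("--cap-add=") for arg in args
    )
    if has_privileged and has_cap_add:
        raise ValueError("--cap-add and --privileged are mutually exclusive")
    return args
```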

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902149
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agocontainer: remove `--ignore` from `podman rm` command
Guillaume Abrioux [Mon, 30 Nov 2020 13:52:47 +0000 (14:52 +0100)]
container: remove `--ignore` from `podman rm` command

As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>