git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
4 years ago group_vars: remove useless files
Dimitri Savineau [Tue, 14 Jan 2020 19:08:17 +0000 (14:08 -0500)]
group_vars: remove useless files

Delete legacy files that aren't used anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e790b0851d3d2a4b98d75d7a5ae4006aebd56e0e)

4 years ago common: drop `fetch_directory` feature
Guillaume Abrioux [Tue, 6 Oct 2020 05:53:06 +0000 (07:53 +0200)]
common: drop `fetch_directory` feature

This commit drops the `fetch_directory` feature.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1cc9666c09c616cc743b4f2f05d2b52cfd6c32cb)

4 years ago ceph-config: ceph.conf rendering refactor
Guillaume Abrioux [Mon, 5 Oct 2020 15:41:20 +0000 (17:41 +0200)]
ceph-config: ceph.conf rendering refactor

This commit cleans up the `main.yml` task file of `ceph-config`.
It drops the local ceph.conf generation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 900c0f44925ec0c6c1acb16433044ac40717e00e)

4 years ago tests: rgw_multisite playbook test refactor
Guillaume Abrioux [Fri, 11 Dec 2020 13:36:00 +0000 (14:36 +0100)]
tests: rgw_multisite playbook test refactor

Currently we create an object from the primary site but still try to
read that object from the master, which doesn't make sense. We should
try to read it from a secondary site.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e2ea403d5ef938a3ea12276004eaa71f9919a4a3)

4 years ago mergify: add mergify configuration
Guillaume Abrioux [Tue, 15 Dec 2020 09:03:33 +0000 (10:03 +0100)]
mergify: add mergify configuration

This adds a mergify configuration file on the `stable-4.0` branch so we
can get backports automatically created by mergify.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

4 years ago mon: refact initial keyring generation 4.2rc v4.0.41
Guillaume Abrioux [Tue, 24 Nov 2020 10:33:46 +0000 (11:33 +0100)]
mon: refact initial keyring generation

Adding a monitor is no longer possible because we generate a new mon
keyring each time the playbook is run.

Fixes: #5864
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 970c6a4ee6923588adb81d8c49185ff8e340d52e)

4 years ago ceph_key: set state as optional
Dimitri Savineau [Fri, 11 Sep 2020 13:34:05 +0000 (09:34 -0400)]
ceph_key: set state as optional

Most Ansible modules that take a `state` parameter default it to
`present` (when available) instead of making it a mandatory option.
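
As a rough illustration only (this is not the actual `ceph_key` source), an
optional `state` with a `present` default could be declared along these lines.
The `choices` list and the `resolve_state` helper are assumptions made for the
example.

```python
# Hypothetical argument spec fragment; the choices listed are assumptions.
ARGUMENT_SPEC = {
    "name": {"type": "str", "required": True},
    "state": {
        "type": "str",
        "required": False,
        "default": "present",
        "choices": ["present", "absent"],
    },
}


def resolve_state(params):
    """Return the requested state, falling back to the spec default."""
    state = params.get("state")
    if state is None:
        state = ARGUMENT_SPEC["state"]["default"]
    if state not in ARGUMENT_SPEC["state"]["choices"]:
        raise ValueError("unsupported state: %s" % state)
    return state
```

A task that omits `state` then behaves as if `state: present` had been written.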

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit abb4023d762305c368facd3fab5a5b7e3a839d66)

4 years ago ceph_key: support using different keyring
Guillaume Abrioux [Sat, 3 Oct 2020 04:56:06 +0000 (06:56 +0200)]
ceph_key: support using different keyring

Currently the `ceph_key` module doesn't support using a different
keyring than `client.admin`.
This commit adds the possibility to use a different keyring.

Usage:
```
      ceph_key:
        name: "client.rgw.myrgw-node.rgw123"
        cluster: "ceph"
        user: "client.bootstrap-rgw"
        user_key: /var/lib/ceph/bootstrap-rgw/ceph.keyring
        dest: "/var/lib/ceph/radosgw/ceph-rgw.myrgw-node.rgw123/keyring"
        caps:
          osd: 'allow rwx'
          mon: 'allow rw'
        import_key: False
        owner: "ceph"
        group: "ceph"
        mode: "0400"
```

Where:
`user` corresponds to `-n (--name)`
`user_key` corresponds to `-k (--keyring)`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 12e6260266dec04b4b2d25f3508aa7149fd16714)

4 years ago library: Fix new-style modules check mode
Benoît Knecht [Tue, 1 Sep 2020 11:06:57 +0000 (13:06 +0200)]
library: Fix new-style modules check mode

Running the `ceph_crush.py`, `ceph_key.py` or `ceph_volume.py` modules in check
mode resulted in the following error:

```
New-style module did not handle its own exit
```

This was due to the fact that they simply returned a `dict` in that case,
instead of calling `module.exit_json()`.
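
The difference can be sketched as follows. `FakeModule` is a stand-in written
for this example so that it runs without Ansible installed; it only mimics the
part of `AnsibleModule` that matters here, and `run_module` is a hypothetical
entry point, not code taken from the modules named above.

```python
import json
import sys


class FakeModule:
    """Stand-in for AnsibleModule (assumption: only what the sketch needs)."""

    def __init__(self, check_mode):
        self.check_mode = check_mode

    def exit_json(self, **kwargs):
        # Like the real exit_json(): report the result as JSON, then exit.
        print(json.dumps(kwargs))
        sys.exit(0)


def run_module(module):
    result = {"changed": False, "stdout": "", "stderr": "", "rc": 0}
    if module.check_mode:
        # Before the fix this was a plain `return result`, so the module
        # ended without ever reporting a result to Ansible.
        module.exit_json(**result)
    # ... the real work would happen here ...
    module.exit_json(changed=True)
```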

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 85dd4058145436e86a12ad9f015f5228189437d5)

4 years ago ceph_key: refact the code and minor fixes
Guillaume Abrioux [Tue, 4 Aug 2020 11:53:24 +0000 (13:53 +0200)]
ceph_key: refact the code and minor fixes

This commit refactors the code to remove a duplicate condition and
makes the `state: absent` code idempotent.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 13e2311cbe78b7b51930a3a1210629bf036a20c5)

4 years ago Revert "library: Fix new-style modules check mode"
Guillaume Abrioux [Fri, 27 Nov 2020 07:53:55 +0000 (08:53 +0100)]
Revert "library: Fix new-style modules check mode"

This reverts commit bff21234303b9086dd8a352b52f2f47c5f3d2251.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

4 years ago Revert "ceph_key: support using different keyring"
Guillaume Abrioux [Fri, 27 Nov 2020 07:53:48 +0000 (08:53 +0100)]
Revert "ceph_key: support using different keyring"

This reverts commit 74eb7cbecbeff1e96d261d69246d04d183783241.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

4 years ago iscsigw: remove `--cap-add=all` from `podman run` cmd
Guillaume Abrioux [Mon, 30 Nov 2020 13:55:16 +0000 (14:55 +0100)]
iscsigw: remove `--cap-add=all` from `podman run` cmd

As of podman `2.0.5`, `--cap-add` and `--privileged` are mutually
exclusive options.

```
Nov 30 13:56:30 magna089 podman[171677]: Error: invalid config provided: CapAdd and privileged are mutually exclusive options
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1902149
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d40dd764e004f9765e5d4e12507cdf3c707a3271)

4 years ago container: remove `--ignore` from `podman rm` command
Guillaume Abrioux [Mon, 30 Nov 2020 13:52:47 +0000 (14:52 +0100)]
container: remove `--ignore` from `podman rm` command

As of podman 2.0.5, `--ignore` param conflicts with `--storage`.
```
Nov 30 13:53:10 magna089 podman[164443]: Error: --storage conflicts with --volumes, --all, --latest, --ignore and --cidfile
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68b124ba89e0e4e7c4845b6dd1ce98be8e074d4)

4 years ago common: add a default value for ceph_directories_mode
Guillaume Abrioux [Tue, 21 Jan 2020 14:30:16 +0000 (15:30 +0100)]
common: add a default value for ceph_directories_mode

Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 483adb5d790ea74db01154b7dacbbf2bef030acc)

4 years ago osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present
Guillaume Abrioux [Tue, 17 Nov 2020 09:45:14 +0000 (10:45 +0100)]
osd: ensure /var/lib/ceph/osd/{cluster}-{id} is present

This commit ensures that `/var/lib/ceph/osd/{{ cluster }}-{{ osd_id }}` is
present before starting OSDs.

This is needed specifically when redeploying an OSD after an OS upgrade
failure.
Since the ceph data is still present on its devices, the node can be
redeployed. However, those directories aren't present since they are
initially created by ceph-volume. We could recreate them manually, but
for a better user experience we can ask ceph-ansible to recreate them.

NOTE:
this only works for OSDs that were deployed with ceph-volume.
ceph-disk deployed OSDs would have to get those directories recreated
manually.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898486
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 873fc8ec0ff12fa1d1b45c5400050f15d0417480)

4 years ago ceph-facts: fix read osd pool default crush fact v4.0.40
Dimitri Savineau [Wed, 18 Nov 2020 15:43:57 +0000 (10:43 -0500)]
ceph-facts: fix read osd pool default crush fact

We don't need to use `run_once` on that task when monitors are running;
otherwise the read task could be skipped and the set task would fail.

The conditional check 'crush_rule_variable.rc == 0' failed. The error
was: error while evaluating conditional (crush_rule_variable.rc == 0):
'dict object' has no attribute 'rc'

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1898856
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e150df789edb966549ba2a8f2415a844ce612d46)

4 years ago tests: use github workflow for pytest
Dimitri Savineau [Fri, 23 Oct 2020 14:24:50 +0000 (10:24 -0400)]
tests: use github workflow for pytest

Move the pytest testing from TravisCI to a GitHub workflow.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3e79f0322a703003ab1af51104b69a1d2162951e)

4 years ago tests: enforce pytest-rerunfailures version
Guillaume Abrioux [Wed, 1 Jul 2020 16:22:00 +0000 (18:22 +0200)]
tests: enforce pytest-rerunfailures version

This commit pins the installed pytest-rerunfailures version to <9.0.

This is to avoid the following error:

```
ERROR: pytest-rerunfailures 9.0 has requirement pytest>=5.0, but you'll have pytest 4.6.11 which is incompatible.
```

The latest version of pytest-rerunfailures isn't compatible with the
version of pytest we are using.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 19097026fbda71752a500119b0c99c1a9f8d523d)

4 years ago container: force rm --storage on ExecStartPre v4.0.39
Guillaume Abrioux [Thu, 12 Nov 2020 10:34:41 +0000 (11:34 +0100)]
container: force rm --storage on ExecStartPre

This is a workaround to avoid errors like the following:
```
Error: error creating container storage: the container name "ceph-mgr-magna022" is already in use by "4a5f674e113f837a0cc561dea5d2cd55d16ca159a647b7794ab06c4c276ef701"
```

This doesn't seem to be 100% reproducible, but it shows up after a
reboot. The only workaround we have come up with so far is to run
`podman rm --storage <container>` before starting it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1887716
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ba7824c55e4a5d6732208859293ac3f47bb54a2)

4 years ago switch2container: chown symlink in mon/mgr plays
Dimitri Savineau [Mon, 16 Nov 2020 15:31:11 +0000 (10:31 -0500)]
switch2container: chown symlink in mon/mgr plays

fa2bb3a only fixes the symlink owner/group issue in the OSD play. If the
OSDs are collocated with other services like MONs and MGRs then the
chown command will fail.

$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1896448
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 35ed9977aac9afbcad4f726a865891f0e84b4680)

4 years ago config: Always use osd_memory_target if set
Gaudenz Steinlin [Mon, 28 Oct 2019 09:41:26 +0000 (10:41 +0100)]
config: Always use osd_memory_target if set

The osd_memory_target variable was only used if it was higher than the
calculated value based on the number of OSDs. This is changed to always
use the value if it is set in the configuration. This allows this value
to be intentionally set lower so that it does not have to be changed
when more OSDs are added later.
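
A minimal sketch of the behaviour change, assuming the value is either unset
(`None`) or a number of bytes. Both function names are hypothetical, and the
real change lives in the role's templating rather than in Python.

```python
def pick_osd_memory_target_old(configured, calculated):
    """Previous behaviour: the configured value wins only if it is higher."""
    if configured is not None and configured > calculated:
        return configured
    return calculated


def pick_osd_memory_target(configured, calculated):
    """New behaviour: an explicitly configured value always wins."""
    if configured is not None:
        return configured
    return calculated
```

With a configured 2 GiB target and a calculated 4 GiB one, the old logic keeps
4 GiB while the new logic honours the lower 2 GiB setting.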

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 4d1fdd2b05d55f8028fb5593d41fa61dbddd7095)

4 years ago ceph-facts: Fix osd_pool_default_crush_rule fact
Benoît Knecht [Wed, 7 Oct 2020 07:44:29 +0000 (09:44 +0200)]
ceph-facts: Fix osd_pool_default_crush_rule fact

The `osd_pool_default_crush_rule` is set based on `crush_rule_variable`, which
is the output of a `grep` command.

However, two consecutive tasks can set that variable, and if the second task is
skipped, it still overwrites the `crush_rule_variable`, leading the
`osd_pool_default_crush_rule` to be set to `ceph_osd_pool_default_crush_rule`
instead of the output of the first task.

This commit ensures that the fact is set right after the `crush_rule_variable`
is assigned, before it can be overwritten.
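
The ordering problem can be modelled with a small simulation. This is a sketch
under assumptions: `run_task` and `set_fact_if_read` are hypothetical helpers
that imitate how a registered variable is still assigned when a task is
skipped, and the rule values `"2"` and `"-1"` are placeholders.

```python
def run_task(variables, register, skipped, stdout=""):
    """Mimic `register`: the variable is assigned even for a skipped task."""
    if skipped:
        variables[register] = {"skipped": True, "changed": False}
    else:
        variables[register] = {"rc": 0, "stdout": stdout}


def set_fact_if_read(variables, default_rule):
    """Set the fact from the last registered result, else keep or default it."""
    result = variables.get("crush_rule_variable", {})
    if result.get("rc") == 0:
        variables["osd_pool_default_crush_rule"] = result["stdout"]
    else:
        variables.setdefault("osd_pool_default_crush_rule", default_rule)


# Buggy ordering: both tasks run, then the fact is set once at the end.
buggy = {}
run_task(buggy, "crush_rule_variable", skipped=False, stdout="2")
run_task(buggy, "crush_rule_variable", skipped=True)
set_fact_if_read(buggy, default_rule="-1")

# Fixed ordering: the fact is set right after each task that may register it.
fixed = {}
run_task(fixed, "crush_rule_variable", skipped=False, stdout="2")
set_fact_if_read(fixed, default_rule="-1")
run_task(fixed, "crush_rule_variable", skipped=True)
set_fact_if_read(fixed, default_rule="-1")
```

In the buggy ordering the fact ends up as the default; in the fixed ordering it
keeps the value read by the first task.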

Closes #5912

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit c5f7343a2f696ab3bfef77e735eafdeae4e4883b)

4 years ago main: followup on pr 6012
Guillaume Abrioux [Thu, 12 Nov 2020 14:19:42 +0000 (15:19 +0100)]
main: followup on pr 6012

This tag can be set at the play level.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2fa17520c425117f87048a1d555c2e73c9e6cf6e)

4 years ago Add ceph_client tag to execute or skip the playbook
Francesco Pantano [Mon, 9 Nov 2020 16:25:17 +0000 (17:25 +0100)]
Add ceph_client tag to execute or skip the playbook

There are some use cases where there's a need to skip the execution
of the ceph-ansible client role even though the client section of the
inventory isn't empty.
This can happen in contexts where the services are colocated or when
an all-in-one deployment is performed.
The purpose of this change is adding a 'ceph_client' tag to avoid
altering the ceph-ansible execution flow but at the same time be able
to include or exclude a set of tasks using this tag.

Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit fafd5f871a81f5e8cdba6e531e499a9678b2dcad)

4 years ago switch2container: disable ceph-osd enabled-runtime v4.0.38
Dimitri Savineau [Mon, 19 Oct 2020 21:22:31 +0000 (17:22 -0400)]
switch2container: disable ceph-osd enabled-runtime

When the ceph OSDs are deployed via packages, the ceph-osd@.service
unit is configured as enabled-runtime.
This means that each ceph-osd service inherits that state.
The enabled-runtime systemd state doesn't survive a reboot.
For non-containerized deployments the OSDs still start after a
reboot because the ceph-volume@.service and/or ceph-osd.target
units do the job.

$ systemctl list-unit-files|egrep '^ceph-(volume|osd)'|column -t
ceph-osd@.service     enabled-runtime
ceph-volume@.service  enabled
ceph-osd.target       enabled

When switching to a containerized deployment we stop and disable
ceph-osd@XX.service, ceph-volume and ceph.target and then remove the
systemd unit files.
But the new systemd units for the containerized ceph-osd services still
inherit from the ceph-osd@.service unit file.

As a consequence, if an OSD host reboots after the playbook execution,
the ceph-osd services won't come back because they aren't enabled at
boot.

This patch also adds a reboot and testinfra run after running the switch
to container playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881288
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fa2bb3af86b48befd3901939d38eda20dff6f5e5)

4 years ago dashboard: change dashboard_grafana_api_no_ssl_verify default value 6010/head
Guillaume Abrioux [Tue, 3 Nov 2020 15:32:17 +0000 (16:32 +0100)]
dashboard: change dashboard_grafana_api_no_ssl_verify default value

This sets the `dashboard_grafana_api_no_ssl_verify` default value
according to the length of `dashboard_crt` and `dashboard_key`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5cadfea42e8dd31e019568cdfe1b0f3d64f5dcc4)

4 years ago dashboard: enable https by default
Guillaume Abrioux [Tue, 3 Nov 2020 12:49:59 +0000 (13:49 +0100)]
dashboard: enable https by default

see linked bz for details

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1889426
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 767d3c898e2d8f7dddb655fd98827d5da8b338e8)

4 years ago osd: Fix number of OSD calculation
Gaudenz Steinlin [Tue, 27 Aug 2019 13:15:35 +0000 (15:15 +0200)]
osd: Fix number of OSD calculation

If some OSDs are to be created and others already exist, the calculation
only counted the OSDs to be created. This changes the calculation to
take all OSDs into account.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 15044da03052fcb4a3c45f344f41e06b0d418e4d)

4 years ago rolling_update: use ceph health instead of ceph -s
Dimitri Savineau [Mon, 26 Oct 2020 23:35:06 +0000 (19:35 -0400)]
rolling_update: use ceph health instead of ceph -s

The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the cluster health, we're using the health structure in the
ceph status output.
To optimize this, we could use the ceph health command which contains
the same needed information.

$ ceph status -f json | wc -c
2001
$ ceph health -f json | wc -c
46
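
To illustrate why the smaller command is enough, here is a sketch that reads
the health status from both kinds of output. The two JSON strings are trimmed,
illustrative payloads written for this example (real output carries more keys
and differs between releases); only the `health` / `status` keys are relied on.

```python
import json

# Illustrative, heavily trimmed payloads (assumption: real output has more keys).
STATUS_JSON = (
    '{"fsid": "placeholder", '
    '"health": {"status": "HEALTH_OK", "checks": {}}, '
    '"osdmap": {}, "pgmap": {}}'
)
HEALTH_JSON = '{"status": "HEALTH_OK", "checks": {}}'


def health_from_status(raw):
    """Old approach: dig the health section out of `ceph status -f json`."""
    return json.loads(raw)["health"]["status"]


def health_from_health(raw):
    """New approach: read it directly from `ceph health -f json`."""
    return json.loads(raw)["status"]
```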

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit acddf4fb679f5f5251b3414793680042ee3be394)

4 years ago rgw/rbdmirror: use service dump instead of ceph -s
Dimitri Savineau [Mon, 26 Oct 2020 21:49:47 +0000 (17:49 -0400)]
rgw/rbdmirror: use service dump instead of ceph -s

The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the rgw/rbdmirror services status, we're only using the
servicemap structure in the ceph status output.
To optimize this, we could use the ceph service dump command which contains
the same needed information.
This command returns less information and is slightly faster than the ceph
status command.

$ ceph status -f json | wc -c
2001
$ ceph service dump -f json | wc -c
1105
$ time ceph status -f json > /dev/null

real 0m0.557s
user 0m0.516s
sys 0m0.040s
$ time ceph service dump -f json > /dev/null

real 0m0.454s
user 0m0.434s
sys 0m0.020s

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3f9081931f8a369b075060083cdb225e3477f99a)

4 years ago monitor: use quorum_status instead of ceph status
Dimitri Savineau [Mon, 26 Oct 2020 21:33:45 +0000 (17:33 -0400)]
monitor: use quorum_status instead of ceph status

The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the quorum status, we're only using the quorum_names
structure in the ceph status output.
To optimize this, we could use the ceph quorum_status command which contains
the same needed information.
This command returns less information.

$ ceph status -f json  | wc -c
2001
$ ceph quorum_status -f json  | wc -c
957
$ time ceph status -f json > /dev/null

real 0m0.577s
user 0m0.538s
sys 0m0.029s
$ time ceph quorum_status -f json > /dev/null

real 0m0.544s
user 0m0.527s
sys 0m0.016s
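
A quorum check only needs the `quorum_names` list, which can be sketched like
this. The payload is a trimmed, illustrative sample and the monitor names are
made up; `in_quorum` is a hypothetical helper, not a function from the
playbooks.

```python
import json

# Illustrative, trimmed `ceph quorum_status -f json` payload (assumption).
QUORUM_STATUS_JSON = (
    '{"election_epoch": 10, "quorum": [0, 1, 2], '
    '"quorum_names": ["mon0", "mon1", "mon2"]}'
)


def in_quorum(raw, mon_name):
    """Return True when the named monitor appears in quorum_names."""
    return mon_name in json.loads(raw).get("quorum_names", [])
```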

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 88f91d8c12169e08fc299dbd2fcaecc9d42dedca)

4 years ago osds: use pg stat command instead of ceph status
Dimitri Savineau [Mon, 26 Oct 2020 15:23:01 +0000 (11:23 -0400)]
osds: use pg stat command instead of ceph status

The ceph status command returns a lot of information stored in variables
and/or facts which could consume resources for nothing.
When checking the pgs state, we're using the pgmap structure in the ceph
status output.
To optimize this, we could use the ceph pg stat command which contains
the same needed information.
This command returns less information (only about pgs) and is slightly
faster than the ceph status command.

$ ceph status -f json | wc -c
2000
$ ceph pg stat -f json | wc -c
240
$ time ceph status -f json > /dev/null

real 0m0.529s
user 0m0.503s
sys 0m0.024s
$ time ceph pg stat -f json > /dev/null

real 0m0.426s
user 0m0.409s
sys 0m0.016s

The data returned by the ceph status is even bigger when using the
nautilus release.

$ ceph status -f json | wc -c
35005
$ ceph pg stat -f json | wc -c
240

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ee505885908ac2ae15bf201a638359faaf78d251)

4 years ago osds: use ceph osd stat instead of ceph status
wangxiaotong [Sat, 24 Oct 2020 13:59:17 +0000 (21:59 +0800)]
osds: use ceph osd stat instead of ceph status

Improve the way OSD creation is checked.
This replaces the ceph status command by the ceph osd stat command.
The osdmap structure isn't needed anymore.

$ ceph status -f json | wc -c
2001
$ ceph osd stat -f json | wc -c
132
$ time ceph status -f json > /dev/null

real    0m0.563s
user    0m0.526s
sys     0m0.036s
$ time ceph osd stat -f json > /dev/null

real 0m0.457s
user 0m0.411s
sys 0m0.045s

Signed-off-by: wangxiaotong <wangxiaotong@fiberhome.com>
(cherry picked from commit b9cb0f12e9e79600f1a974dd88ba1ed1d833211f)

4 years ago common: follow up on #5948 v4.0.37
Guillaume Abrioux [Mon, 2 Nov 2020 14:56:28 +0000 (15:56 +0100)]
common: follow up on #5948

In addition to f7e2b2c608eef4bbba47586f1e24d6ade1572758

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 371d854a5c03cfb30d27d5cdbaaad61f7f8d6c58)

4 years ago ceph-mon: Don't set monitor directory mode recursively
Benoît Knecht [Wed, 28 Oct 2020 15:09:58 +0000 (16:09 +0100)]
ceph-mon: Don't set monitor directory mode recursively

After rolling updates performed with
`infrastructure-playbooks/rolling_update.yml`, files located in
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` had mode 0755 (including
the keyring), making them world-readable.

This commit separates the task that configured permissions recursively on
`/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}` into two separate tasks:

1. Set the ownership and mode of the directory itself;
2. Recursively set ownership in the directory, but don't modify the mode.
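
The two steps above can be sketched in plain Python. This is an analogy to the
Ansible tasks, not their implementation: `set_mon_dir_permissions` is a
hypothetical helper, and the `0o755` directory mode is an assumed default.

```python
import os


def set_mon_dir_permissions(path, uid, gid, mode=0o755):
    """Set owner and mode on the directory, then only the owner on its contents."""
    # Step 1: ownership and mode of the directory itself.
    os.chown(path, uid, gid)
    os.chmod(path, mode)
    # Step 2: recursive ownership; file modes are deliberately left alone.
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            os.lchown(os.path.join(root, name), uid, gid)
```

A keyring created with mode 0600 inside the directory keeps that mode after the
call, which is the point of splitting the task.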

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 0d76826bbb7b0b9303583c31147ebad9e5c420f9)

4 years ago openstack: use ceph_keyring_permissions by default
Gaudenz Steinlin [Mon, 10 Aug 2020 09:52:56 +0000 (11:52 +0200)]
openstack: use ceph_keyring_permissions by default

Otherwise this task fails if no permission is set on the item.
Previously the code omitted the mode parameter if it was not set, but
this was lost with commit ab370b6ad823e551cfc324fd9c264633a34b72b5.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 79ff79c422e88e5ec848bec880ef01a87ceeb298)

4 years ago podman: force log driver to journald
Dimitri Savineau [Thu, 22 Oct 2020 14:59:15 +0000 (10:59 -0400)]
podman: force log driver to journald

Since we changed the podman configuration to use detach mode and the
systemd type to forking, the container logs aren't present in
journald anymore.
The default conmon log driver is k8s-file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16cd183b9cb827156ab83cba6c9b85d341d681be)

4 years ago ceph-handler: fix curl ipv6 command with rgw
Dimitri Savineau [Thu, 22 Oct 2020 19:05:12 +0000 (15:05 -0400)]
ceph-handler: fix curl ipv6 command with rgw

When using the curl command with a bracketed IPv6 address, we need
to use the -g option, otherwise the command fails.

$ curl http://[fdc2:328:750b:6983::6]:8080
curl: (3) [globbing] error: bad range specification after pos 9
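
A small sketch of how a caller might build the command, adding `-g` (curl's
`--globoff`) only for IPv6 literals. `rgw_curl_command` is a hypothetical
helper and the command is simplified compared to what the handler actually
runs.

```python
import ipaddress


def rgw_curl_command(address, port):
    """Build a curl argv; bracket IPv6 literals and disable URL globbing."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # Brackets are required around IPv6 literals in URLs, and -g stops
        # curl from treating them as a glob range.
        return ["curl", "-g", "http://[%s]:%d" % (address, port)]
    return ["curl", "http://%s:%d" % (address, port)]
```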

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cdb7b09cd7631eb1af0c70360c0f9959526bc795)

4 years ago iscsi: fix ownership on iscsi-gateway.cfg
Guillaume Abrioux [Wed, 21 Oct 2020 12:26:57 +0000 (14:26 +0200)]
iscsi: fix ownership on iscsi-gateway.cfg

This file is currently deployed with mode '0644', making it
readable by any user on the system.
Since it contains sensitive information it should be readable by the
owner only.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1890119
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a822f773002a010ebedddcc2c8cd8f5a03dc786a)

4 years ago ceph-osd: Fix check mode for start osds tasks
Benoît Knecht [Mon, 19 Oct 2020 09:39:06 +0000 (11:39 +0200)]
ceph-osd: Fix check mode for start osds tasks

Correctly set `osd_ids_non_container.stdout_lines` to an empty list if it's
undefined (i.e. in check mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8b0023cb77ce79ab4669783f16a5c295d54ce247)

4 years ago ceph-mon: Fix check mode for deploy monitor tasks
Benoît Knecht [Mon, 19 Oct 2020 09:23:59 +0000 (11:23 +0200)]
ceph-mon: Fix check mode for deploy monitor tasks

Skip the `get initial keyring when it already exists` task when both commands
whose `stdout` output it requires have been skipped (e.g. when running in check
mode).

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8f436ab5d80c924d8841215307c17e38a70fb4bd)

4 years ago crash: refact caps definition
Guillaume Abrioux [Mon, 19 Oct 2020 14:57:53 +0000 (16:57 +0200)]
crash: refact caps definition

There is no need to use `{{ }}` syntax here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a8bd947c7dbd62b8acb9ac4fc1a2aad08a06546f)

4 years ago ceph-volume: refresh lvm metadata cache
Guillaume Abrioux [Mon, 19 Oct 2020 08:22:21 +0000 (10:22 +0200)]
ceph-volume: refresh lvm metadata cache

When running rhel8 containers on a rhel7 host, after zapping an OSD
there's a discrepancy with the lvmetad cache that needs to be refreshed.
Otherwise, the host still sees the LV, which can confuse the user.
If the user tries to redeploy an OSD, it will fail because the LV isn't
present and needs to be recreated.

e.g.:

```
 stderr: lsblk: ceph-block-8/block-8: not a block device
 stderr: blkid: error: ceph-block-8/block-8: No such file or directory
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
usage: ceph-volume lvm prepare [-h] --data DATA [--data-size DATA_SIZE]
                               [--data-slots DATA_SLOTS] [--filestore]
                               [--journal JOURNAL]
                               [--journal-size JOURNAL_SIZE] [--bluestore]
                               [--block.db BLOCK_DB]
                               [--block.db-size BLOCK_DB_SIZE]
                               [--block.db-slots BLOCK_DB_SLOTS]
                               [--block.wal BLOCK_WAL]
                               [--block.wal-size BLOCK_WAL_SIZE]
                               [--block.wal-slots BLOCK_WAL_SLOTS]
                               [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
                               [--cluster-fsid CLUSTER_FSID]
                               [--crush-device-class CRUSH_DEVICE_CLASS]
                               [--dmcrypt] [--no-systemd]
ceph-volume lvm prepare: error: Unable to proceed with non-existing device: ceph-block-8/block-8
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886534
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bb106045ee10e08c157134b6e00ab846ce26e1f)

4 years ago ceph-crash: Only deploy key to targeted hosts
Gaudenz Steinlin [Mon, 10 Aug 2020 09:38:47 +0000 (11:38 +0200)]
ceph-crash: Only deploy key to targeted hosts

The current task installs the ceph-crash key to "most" hosts via
"delegate_to". This key is only used by the ceph-crash daemon and should
just be installed on all hosts targeted by this role. There is no need
for using a delegated task.

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
(cherry picked from commit 68cc93fb18d516a04e288418811787355fb0582e)

4 years ago flake8: run the workflow conditionally
Dimitri Savineau [Fri, 2 Oct 2020 16:14:36 +0000 (12:14 -0400)]
flake8: run the workflow conditionally

We don't need to run flake8 on ansible modules and their tests if we
don't have any modifications.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 00b7ee27df59fb0d5a537f6c0ad11c910695126d)

4 years ago ceph-osd: start osd after systemd overrides v4.0.36
Guillaume Abrioux [Wed, 14 Oct 2020 06:52:02 +0000 (08:52 +0200)]
ceph-osd: start osd after systemd overrides

The service should be started after the ceph-osd systemd overrides have
been added, otherwise the latter aren't taken into account.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860739
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 59d0f0199243de40bde714d1a9019b1715c57dbf)

4 years ago ceph-osd: don't start the OSD services twice
Dimitri Savineau [Wed, 14 Oct 2020 00:43:53 +0000 (20:43 -0400)]
ceph-osd: don't start the OSD services twice

Using the + operator on two lists doesn't filter out duplicate
items.
Currently each OSD is started (via systemd) twice.
Instead we could use the union filter.
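
The difference between concatenation and a union can be shown with plain
lists. The `union` function below is a hand-written approximation of what a
union filter does (ordered, duplicates removed), and the OSD ids are made-up
sample data.

```python
def union(first, second):
    """Ordered union of two lists: keep the first occurrence of each item."""
    seen = []
    for item in first + second:
        if item not in seen:
            seen.append(item)
    return seen


# Hypothetical OSD id lists coming from two different lookups.
osd_ids_a = ["0", "1"]
osd_ids_b = ["1", "2"]

concatenated = osd_ids_a + osd_ids_b   # OSD "1" appears twice
merged = union(osd_ids_a, osd_ids_b)   # each OSD appears once
```

Looping over `concatenated` would start OSD 1 twice, while looping over
`merged` starts every OSD exactly once.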

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4eaa65c36256189b352c88c1058e888550adbd0f)

4 years ago handler: refact check_socket_non_container
Guillaume Abrioux [Tue, 6 Oct 2020 12:58:46 +0000 (14:58 +0200)]
handler: refact check_socket_non_container

`stat --printf=%n` returns something like the following:

```
ok: [osd0] => changed=false
  cmd: |-
    stat --printf=%n /var/run/ceph/ceph-osd*.asok
  delta: '0:00:00.009388'
  end: '2020-10-06 06:18:28.109500'
  failed_when_result: false
  rc: 0
  start: '2020-10-06 06:18:28.100112'
  stderr: ''
  stderr_lines: <omitted>
  stdout: /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  stdout_lines: <omitted>
```

This makes the next task, "check if the ceph osd socket is in-use", run
grep like this:

```
ok: [osd0] => changed=false
  cmd:
  - grep
  - -q
  - /var/run/ceph/ceph-osd.2.asok/var/run/ceph/ceph-osd.5.asok
  - /proc/net/unix
```

which will obviously fail because this path never exists. This breaks
the OSD handler.

Let's use `find` module instead.
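
The failure mode can be reproduced without Ceph. The sketch below creates two
regular files as stand-ins for the `.asok` sockets (an assumption made so the
example runs anywhere) and compares a separator-less join of the names with a
per-file list, which is roughly what a `find`-style lookup returns.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as run_dir:
    # Regular files standing in for the OSD admin sockets.
    for osd_id in (2, 5):
        open(os.path.join(run_dir, "ceph-osd.%d.asok" % osd_id), "w").close()

    sockets = sorted(glob.glob(os.path.join(run_dir, "ceph-osd*.asok")))

    # What `stat --printf=%n` effectively produced: names with no separator.
    joined = "".join(sockets)
    joined_exists = os.path.exists(joined)

    # One entry per socket, each of which can be checked on its own.
    each_exists = [os.path.exists(path) for path in sockets]
```

The joined string is not a real path, so any check against it fails, while the
per-file list can be iterated safely.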

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 46d4d97da9c6078a6c5ff60a39db4b4072fb902b)

4 years ago library: Fix new-style modules check mode
Benoît Knecht [Tue, 1 Sep 2020 11:06:57 +0000 (13:06 +0200)]
library: Fix new-style modules check mode

Running the `ceph_crush.py`, `ceph_key.py` or `ceph_volume.py` modules in check
mode resulted in the following error:

```
New-style module did not handle its own exit
```

This was due to the fact that they simply returned a `dict` in that case,
instead of calling `module.exit_json()`.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 85dd4058145436e86a12ad9f015f5228189437d5)

4 years ago Fix Ansible check mode for site.yml.sample playbook
Benoît Knecht [Tue, 1 Sep 2020 09:24:59 +0000 (11:24 +0200)]
Fix Ansible check mode for site.yml.sample playbook

Make sure the `site.yml.sample` playbook can be run in check mode by skipping
tasks that try to read the output of commands that have been skipped.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 54ba38e35ea67c1c342b008be675103e120982d0)

4 years ago tests: change cephfs pool size
Guillaume Abrioux [Tue, 6 Oct 2020 08:55:37 +0000 (10:55 +0200)]
tests: change cephfs pool size

`all_daemons` scenario can't handle pools with `size: 3` because we have
1 osd node in root=HDD and two nodes in root=default.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5713ea5d51868855df368e285faff484342ef04)

4 years ago ceph_key: support using different keyring
Guillaume Abrioux [Sat, 3 Oct 2020 04:56:06 +0000 (06:56 +0200)]
ceph_key: support using different keyring

Currently the `ceph_key` module doesn't support using a different
keyring than `client.admin`.
This commit adds the possibility to use a different keyring.

Usage:
```
      ceph_key:
        name: "client.rgw.myrgw-node.rgw123"
        cluster: "ceph"
        user: "client.bootstrap-rgw"
        user_key: /var/lib/ceph/bootstrap-rgw/ceph.keyring
        dest: "/var/lib/ceph/radosgw/ceph-rgw.myrgw-node.rgw123/keyring"
        caps:
          osd: 'allow rwx'
          mon: 'allow rw'
        import_key: False
        owner: "ceph"
        group: "ceph"
        mode: "0400"
```

Where:
`user` corresponds to `-n (--name)`
`user_key` corresponds to `-k (--keyring)`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 12e6260266dec04b4b2d25f3508aa7149fd16714)
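
As a rough illustration of how `user` and `user_key` map onto the `ceph` command line, the sketch below builds an argument list. The helper name `build_ceph_cmd` is hypothetical and the way the real module assembles its command may differ; only the `-n`/`-k` mapping comes from the commit message.

```python
def build_ceph_cmd(cluster, user, user_key, args):
    """Build a `ceph` CLI call authenticating as `user` with `user_key`."""
    return [
        "ceph",
        "--cluster", cluster,
        "-n", user,       # --name: entity used to authenticate
        "-k", user_key,   # --keyring: keyring file holding that entity's key
    ] + list(args)


cmd = build_ceph_cmd(
    cluster="ceph",
    user="client.bootstrap-rgw",
    user_key="/var/lib/ceph/bootstrap-rgw/ceph.keyring",
    args=["auth", "get-or-create", "client.rgw.myrgw-node.rgw123",
          "osd", "allow rwx", "mon", "allow rw"],
)
```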

4 years agorgw: fix multi instances scaleout in baremetal
Guillaume Abrioux [Wed, 23 Sep 2020 15:47:20 +0000 (17:47 +0200)]
rgw: fix multi instances scaleout in baremetal

When rgw and osd are collocated, the current workflow prevents scaling
out the radosgw_num_instances parameter when rerunning the playbook in
baremetal deployments.

When ceph-osd notifies handlers, it means rgw handlers are triggered
too. The issue with this is that they are triggered before the role
ceph-rgw is run.
In the case a scaleout operation is expected on `radosgw_num_instances`
it causes an issue because keyrings haven't been created yet so the new
instances won't start.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881313
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a802fa2810e50e87f61e3a64c27f8826ba6aa250)

4 years agotests: reboot and test idempotency on collocation
Guillaume Abrioux [Wed, 23 Sep 2020 15:58:39 +0000 (17:58 +0200)]
tests: reboot and test idempotency on collocation

test reboot and idempotency on collocation scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f83f798206566b714adbc55e2543cbd9529897fa)

4 years agoflake8: fix pep8 syntax on tests/functional/tests/
Guillaume Abrioux [Sun, 4 Oct 2020 08:32:45 +0000 (10:32 +0200)]
flake8: fix pep8 syntax on tests/functional/tests/

tests/conftest.py and the tests present in tests/functional/tests/ were
missed in the previous commit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8596f1d52c27d22d0e8b7052c9b56c0c350e7fc7)

# Conflicts:
# .github/workflows/flake8.yml

4 years agoflake8: fix all tests/library/*.py files
Guillaume Abrioux [Thu, 1 Oct 2020 20:28:17 +0000 (22:28 +0200)]
flake8: fix all tests/library/*.py files

This commit modifies all *.py files in ./tests/library/ so flake8
passes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e49a5241f007083e7ec8618115738b3d3d8873cc)

4 years agotests: refact flake8 workflow
Guillaume Abrioux [Thu, 1 Oct 2020 19:59:53 +0000 (21:59 +0200)]
tests: refact flake8 workflow

drop ricardochaves/python-lint action and use `run` steps instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2d3432cad0d8256356a9b64bf9bb2827948f40c)

(cherry picked from commit 8f6e0b2a186dbb2a25bc7eec93d4dde3b94e3b15)

4 years agotests: add github workflows
Guillaume Abrioux [Mon, 7 Sep 2020 07:55:41 +0000 (09:55 +0200)]
tests: add github workflows

Add github workflow. Especially for flake8 for now.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1ee626a1b3c277c3b33e1c30ab65416a776c7b96)
(cherry picked from commit ed6ae6815d665069c9b6f7d790390dceb0f037c9)

4 years agolibrary: flake8 ceph-ansible modules
Wong Hoi Sing Edison [Sun, 6 Sep 2020 02:17:02 +0000 (10:17 +0800)]
library: flake8 ceph-ansible modules

This commit ensures all ceph-ansible modules pass flake8 properly.

Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 268a39ca0e8698dff9faec1b558d1d99006215aa)
(cherry picked from commit 32a2f04cbc288d79e9a228593be854469a0ecefb)

4 years agolibrary: remove legacy file
Guillaume Abrioux [Thu, 1 Oct 2020 09:25:19 +0000 (11:25 +0200)]
library: remove legacy file

This file is a leftover and should have been removed when we dropped the
validate module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8603cba9abb6adcc4c0bd8d8fab1a5772584a159)

4 years agofs2bs: support `osd_auto_discovery` scenario v4.0.35
Guillaume Abrioux [Wed, 23 Sep 2020 14:21:21 +0000 (16:21 +0200)]
fs2bs: support `osd_auto_discovery` scenario

This commit adds the `osd_auto_discovery` scenario support in the
filestore-to-bluestore playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1881523
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8b1eeef18a4ae953cf568e750c5cf1e3e96e0b78)

4 years agoceph-facts: add get default crush rule from running monitor
Seena Fallah [Sun, 27 Sep 2020 17:11:07 +0000 (20:41 +0330)]
ceph-facts: add get default crush rule from running monitor

When deploying a new monitor node to an existing cluster,
osd_pool_default_crush_rule should be taken from a running monitor because
the ceph-osd role won't be run and the new monitor would otherwise have a
different osd_pool_default_crush_rule from the other monitors.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ff9f4d138f988d908b5f5583e0c1fcf5dd72e36d)

4 years agorgw multisite: check connection for realm endpoint
Ali Maredia [Thu, 17 Sep 2020 04:19:45 +0000 (00:19 -0400)]
rgw multisite: check connection for realm endpoint

This commit adds connection checks before realm pulls.
Curls are performed on the endpoint being pulled, from
the mons and the rgws.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731158
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 902575369c73dfdb0e94d898bbabb9ebd5b12c93)

4 years agoceph-handler: set handler on xxx_stat result
Dimitri Savineau [Fri, 25 Sep 2020 18:27:33 +0000 (14:27 -0400)]
ceph-handler: set handler on xxx_stat result

In non containerized deployments we check if the service is running
via the presence of the socket file.
This is done via the xxx_socket_stat variable, which checks the socket
file in the /var/run/ceph/ directory.
In some scenarios, the socket file could still be present in
that directory but not be used by any process.
That's why we have the xxx_stat variable, which cleans up those leftovers.

The problem here is that we set the variable for the handlers status
(like handler_mon_status) based on xxx_socket_stat instead of xxx_stat.
That means we will trigger the handlers if there's an old socket file
present on the system without any process associated.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1866834
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 733596582d0788a52795bc40b1a5cd94ddef0446)

4 years agoceph-facts: check for mon socket in its own host
Seena Fallah [Sun, 27 Sep 2020 17:04:14 +0000 (20:34 +0330)]
ceph-facts: check for mon socket in its own host

Delegate to its own host after checking the mon socket to find out whether the mon socket is in use or not.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 69f7e353823fffd9dff505679f9c9dbeb4fd810d)

4 years agomds: support enabling pg autoscaler on rerun
Guillaume Abrioux [Mon, 28 Sep 2020 15:31:08 +0000 (17:31 +0200)]
mds: support enabling pg autoscaler on rerun

This commit adds the pg autoscaler enablement support on ceph-ansible
rerun.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1836431
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoceph-config: remove ceph_release from ceph.conf.j2
Dimitri Savineau [Fri, 25 Sep 2020 14:44:08 +0000 (10:44 -0400)]
ceph-config: remove ceph_release from ceph.conf.j2

We don't use ceph_release variable in the ceph.conf jinja template.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 62bd41f0d4e20d73793d8e8d6a8db722aac146c2)

4 years agoansible.cfg: remove cfg file in infrastructure-playbooks
Guillaume Abrioux [Thu, 24 Sep 2020 02:20:34 +0000 (04:20 +0200)]
ansible.cfg: remove cfg file in infrastructure-playbooks

There's no need to have a copy of this file in the infrastructure-playbooks
directory.
Playbooks in that directory can be run from the root dir of
ceph-ansible.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f906caa6dada4a88bfaca1ab9a42ed63358e8ae3)

4 years agoansible.cfg: set force_valid_group_names param
Guillaume Abrioux [Thu, 24 Sep 2020 01:51:56 +0000 (03:51 +0200)]
ansible.cfg: set force_valid_group_names param

As of 2.10, group names containing a dash are invalid.
However, setting this option makes it still possible to use a dash in
group names and prevents this warning from showing up.
It might need to be definitively addressed in a future ansible release.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1880476
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6938ed13021201cc1cfcd3f3087849fed3157a69)

4 years agolibrary/ceph_key: set no_log on secret
Dimitri Savineau [Wed, 23 Sep 2020 16:00:30 +0000 (12:00 -0400)]
library/ceph_key: set no_log on secret

We don't need to show this information during the module execution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a3f4e2b4d11d3185f4064be5ab2969f0df894ff2)

4 years agoRemove libjemalloc1 installation task
Dmitriy Rabotyagov [Wed, 23 Sep 2020 13:06:33 +0000 (16:06 +0300)]
Remove libjemalloc1 installation task

The libjemalloc1 package is required neither as a ganesha dependency nor
for the package build process, so this task can simply be dropped.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
(cherry picked from commit 297532ca411dbdc6ec96258875058b323008abfe)

5 years agoREADME-MULTISITE: Fix syntax issues from markdownlint
Benoît Knecht [Thu, 3 Sep 2020 07:01:16 +0000 (09:01 +0200)]
README-MULTISITE: Fix syntax issues from markdownlint

This commit makes the following changes:

- Remove trailing whitespace;
- Use consistent header levels;
- Fix code blocks;
- Remove hard tabs;
- Fix ordered lists;
- Fix bare URLs;
- Use markdown list of sections.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 2c244425ecef8752e434fb7920ab64d8534dd59c)

5 years agodocs: update URLs to point to the RTD links
Kefu Chai [Thu, 24 Sep 2020 16:46:30 +0000 (00:46 +0800)]
docs: update URLs to point to the RTD links

Fixes #5798
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit f3a78371d9e1336595f5ce8ae7932a5f97004bbe)

5 years agofacts: refact `ceph_uid` fact v4.0.34
Guillaume Abrioux [Wed, 8 Jul 2020 13:49:47 +0000 (15:49 +0200)]
facts: refact `ceph_uid` fact

There's no need to set this fact with a `set_fact`
We can achieve this in `ceph-defaults`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875058
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bcc673f66c22364766beb4b5ebb971bd3f517693)

5 years agoceph-facts: move facts to defaults value
Dimitri Savineau [Thu, 16 Jan 2020 14:38:08 +0000 (09:38 -0500)]
ceph-facts: move facts to defaults value

There's no need to define a variable via a fact if we can do it via a
default value. Using a fact could be interesting to override the
default value on some condition.

- ceph_uid could be set to 167 by default because it's only different on
non containerized deployment on Debian/Ubuntu.
- rbd_client_directory_{owner,group,mode} could be set to ceph,ceph,0770
by default instead of null as we are doing in the facts.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1875058
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7f997e623a7171fa6f00c43cd5b60f3882f8ed04)

5 years agocontainer: quote registry password
Dimitri Savineau [Fri, 18 Sep 2020 14:03:13 +0000 (10:03 -0400)]
container: quote registry password

When the registry password contains a quote, we get the following
error:

The error was: ValueError: No closing quotation

To fix this we need to use the quote filter.

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1880252

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6dcfdf17d43635fcd0dc658c199702945a1228dd)
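
The error quoted in the commit above is the message Python's `shlex` raises on an unbalanced quote. The sketch below reproduces it outside Ansible and shows how shell-quoting the value avoids it; the password and the `podman login` command line are illustrative assumptions, and `shlex.quote()` is used here as a standalone analogue of the `quote` filter.

```python
import shlex

password = "pa'ss"  # hypothetical password containing a single quote

# Unquoted: the lone quote leaves the command line unbalanced.
try:
    shlex.split("podman login -u user -p " + password)
    error = None
except ValueError as exc:
    error = str(exc)

# Quoted: shlex.quote() escapes the value so it survives as one argument.
safe_cmd = "podman login -u user -p " + shlex.quote(password)
argv = shlex.split(safe_cmd)
```

After quoting, the password comes back as a single, unmodified argument at the end of `argv`.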

5 years agofacts: fix 'set_fact rgw_instances with rgw multisite'
Guillaume Abrioux [Fri, 18 Sep 2020 07:09:57 +0000 (09:09 +0200)]
facts: fix 'set_fact rgw_instances with rgw multisite'

the current condition doesn't work, as soon as the first iteration is
done the condition makes next iterations skip since `rgw_instances` got
set with the first iteration.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859872
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff19c1d851ebf0bd7bd8a744367e8893dc6103a8)
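
The failure mode described above can be reproduced with a plain loop. This is only a sketch of the general pattern: the exact condition used in the task is not shown in the commit message, so the guard below, the item fields and the function names are all assumptions.

```python
# Hypothetical per-instance data; names and ports are illustrative only.
items = [{"name": "rgw0", "port": 8080}, {"name": "rgw1", "port": 8081}]


def collect_broken(items):
    facts = {}
    for item in items:
        # Bug: the guard is evaluated on every iteration, so it only
        # holds for the first item; later items are skipped.
        if "rgw_instances" not in facts:
            facts["rgw_instances"] = facts.get("rgw_instances", []) + [item]
    return facts["rgw_instances"]


def collect_fixed(items):
    facts = {}
    for item in items:
        # Fix: append on every iteration, starting from an empty default.
        facts["rgw_instances"] = facts.get("rgw_instances", []) + [item]
    return facts["rgw_instances"]
```

With two items, the guarded version ends up with one instance while the unguarded one collects both.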

5 years agoceph-infra: include iscsi nodes for logrotate
Dimitri Savineau [Thu, 17 Sep 2020 18:11:22 +0000 (14:11 -0400)]
ceph-infra: include iscsi nodes for logrotate

The iscsi nodes aren't included in the logrotate condition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 85643edfe382d34a77fabbd97a4d937b8b74d4e6)

5 years agoinfra: support log rotation for tcmu-runner v4.0.33
Guillaume Abrioux [Tue, 15 Sep 2020 07:48:31 +0000 (09:48 +0200)]
infra: support log rotation for tcmu-runner

This commit adds the log rotation support for tcmu-runner.

ceph-container related PR: ceph/ceph-container#1726

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1873915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f576c02ff7b15c207b77b3f206a3213184b89889)

5 years agocontainer: add optional http(s) proxy option
Dimitri Savineau [Tue, 15 Sep 2020 00:13:13 +0000 (20:13 -0400)]
container: add optional http(s) proxy option

When using a http(s) proxy with either docker or podman we can rely on
the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables.
But with ansible, even if those variables are defined in a source file,
they aren't loaded during the container pull/login tasks.
This implements the http(s) proxy support with docker/podman.
The two implementations are different:
  1/ docker doesn't rely on the environment variables with the CLI.
Those are needed by the docker daemon via systemd.
  2/ podman uses the environment variables so we need to add them to
the login/pull tasks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876692
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bda3581294c8f29eda598522c331a4c009243884)
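
For the podman case, passing the proxy settings explicitly to the pull/login command could look like the sketch below. The helper name `proxy_env`, its parameters and the proxy URL are hypothetical; the commit message only states that the three environment variables have to be added to the tasks.

```python
import os


def proxy_env(http_proxy=None, https_proxy=None, no_proxy=None):
    """Return a copy of the environment with the proxy variables that are set."""
    env = dict(os.environ)
    for name, value in (("HTTP_PROXY", http_proxy),
                        ("HTTPS_PROXY", https_proxy),
                        ("NO_PROXY", no_proxy)):
        if value:
            env[name] = value
    return env


env = proxy_env(http_proxy="http://proxy.example.com:3128",
                no_proxy="localhost,127.0.0.1")
# A pull could then be run with this environment, for example:
#   subprocess.run(["podman", "pull", image], env=env, check=True)
```

Variables left unset are simply not added, which keeps the proxy optional.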

5 years agoceph-prometheus: update pool stat counter
Dimitri Savineau [Tue, 15 Sep 2020 13:30:42 +0000 (09:30 -0400)]
ceph-prometheus: update pool stat counter

Since [1], the bytes_used pool counter in prometheus has been renamed
to stored.

Closes: #5781
[1] https://github.com/ceph/ceph/commit/71fe9149

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e54b924eaf05a7223ec7525657d14e8892ce8957)

5 years agoswitch2container: chown symlink for devices
Dimitri Savineau [Tue, 15 Sep 2020 13:59:06 +0000 (09:59 -0400)]
switch2container: chown symlink for devices

If the OSD directory is using symlinks for referencing devices (like
block, db, wal for bluestore and journal for filestore) then the chown
command could fail to change the owner:group on some systems.

$ ls -hl /var/lib/ceph/osd/ceph-0/
total 28K
lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6
-rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid
-rw------- 1 ceph ceph 37 Sep 15 01:53 fsid
-rw------- 1 ceph ceph 55 Sep 15 01:53 keyring
-rw------- 1 ceph ceph  6 Sep 15 01:53 ready
-rw------- 1 ceph ceph  3 Sep 15 02:00 require_osd_release
-rw------- 1 ceph ceph 10 Sep 15 01:53 type
-rw------- 1 ceph ceph  2 Sep 15 01:53 whoami
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
$ find /var/lib/ceph/osd/ceph-0 -not -user 167
/var/lib/ceph/osd/ceph-0/block

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da4280e243f50114e1ae6455a46360012feb8f3d)
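
The difference between changing the owner of a link target and of the link itself can be sketched in Python. The commit message does not show the actual fix, so treating the link itself (the equivalent of `chown -h`) is an assumption here; a dangling link stands in for a device that cannot be dereferenced.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as osd_dir:
    link = os.path.join(osd_dir, "block")
    # Dangling link standing in for a device the caller cannot dereference.
    os.symlink(os.path.join(osd_dir, "missing-device"), link)

    uid, gid = os.getuid(), os.getgid()
    try:
        os.chown(link, uid, gid)  # follows the link, like plain `chown`
        followed_ok = True
    except OSError:
        followed_ok = False

    # Acts on the link itself, like `chown -h`.
    os.chown(link, uid, gid, follow_symlinks=False)
    link_owner = os.lstat(link).st_uid
```

The first call fails because it has to resolve the link, while the second succeeds because it never touches the target.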

5 years agoswitch2container: remove deb systemd units
Dimitri Savineau [Tue, 15 Sep 2020 13:46:30 +0000 (09:46 -0400)]
switch2container: remove deb systemd units

When running the switch2container playbook on a Debian based system,
the systemd unit path isn't the same as on a Red Hat based system.
Because the systemd unit files aren't removed, the new container
systemd unit isn't taken into account.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c1af69a7e79a5909903490028f7ae13e519c98e0)

5 years agoansible: bump to ansible 2.9
Dimitri Savineau [Mon, 24 Aug 2020 19:50:16 +0000 (15:50 -0400)]
ansible: bump to ansible 2.9

Prior to this commit we were supporting both ansible 2.8 and 2.9.
Let's drop 2.8 now.

Closes: #5459
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1879178
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge: remove potential socket leftover
Guillaume Abrioux [Fri, 11 Sep 2020 15:30:33 +0000 (17:30 +0200)]
purge: remove potential socket leftover

This commit ensures we remove any socket left by ceph and the
`ceph-osd-run.sh` script.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861755
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5e91e0f3e24da0492b6f5dd2bc808215b5066ddc)

5 years agotests: do not run node_exporter test on clients
Guillaume Abrioux [Mon, 14 Sep 2020 13:14:24 +0000 (15:14 +0200)]
tests: do not run node_exporter test on clients

We need to skip these tests on client nodes since we don't deploy
node_exporter on them anymore

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5650a6d7d0a0e2b2fa0ceb080e7d582dc9ceb447)

5 years agonode-exporter: exclude client nodes
Dimitri Savineau [Fri, 11 Sep 2020 15:25:57 +0000 (11:25 -0400)]
node-exporter: exclude client nodes

We don't need to install node-exporter on client nodes because there
are no ceph services running on them.
This also makes sure we use the group name variables in the prometheus
service template instead of hardcoding the values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b105549ed858eb034d97f5fcad4890e17ee2ebfd)

5 years agoRevert "Make 'disable ssl for dashboard task' idempotent." v4.0.32
Guillaume Abrioux [Fri, 11 Sep 2020 08:23:08 +0000 (10:23 +0200)]
Revert "Make 'disable ssl for dashboard task' idempotent."

This reverts commit f607857f2a58b2ed14faf49f2b10d056a7f96b30.

> That commit [1] introduced a regression in the dashboard configuration
> because the ceph config get mgr xxxx command doesn't work with
> nautilus.
> In that release the get operation needs an entity.

> [1] f607857

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agofacts: refact and optimize memory consumption
Guillaume Abrioux [Mon, 17 Aug 2020 08:31:11 +0000 (10:31 +0200)]
facts: refact and optimize memory consumption

there's no need to run this task on all nodes.
This uses too much memory for nothing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0fe193d8ec48414447aa4a7d50b1a9859c71295)

5 years agoconfig: only add related rgw section
Guillaume Abrioux [Thu, 23 Jul 2020 19:12:46 +0000 (21:12 +0200)]
config: only add related rgw section

There's no need to add each rgw section on all rgw nodes.
With this commit, only the related rgw section is rendered.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a581a6e6007812cdad935e4f65909b4306046b2)

5 years agoceph-iscsi: remove python rtslib shaman repository
Dimitri Savineau [Tue, 7 Jan 2020 15:18:28 +0000 (10:18 -0500)]
ceph-iscsi: remove python rtslib shaman repository

The rtslib python library is now available in the distribution so we
shouldn't have to use the shaman repository

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 254ab54f8038c7af2f730dc5abc213490aa60b71)

5 years agoAdd CentOS 8 support for rpm deployment
Dimitri Savineau [Wed, 2 Sep 2020 15:52:34 +0000 (11:52 -0400)]
Add CentOS 8 support for rpm deployment

We were only supporting CentOS 8 for containerized deployment.
Since Nautilus 14.2.10 we now have el8 rpm packages so we should be
able to deploy a nautilus ceph cluster with el8.
Note that nfs-ganesha isn't supported because there are no el8 rpm
packages for nfs-ganesha V2.8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoEnable HAProxy backend checks for Ceph RGW
Niko Smeds [Thu, 5 Mar 2020 22:24:56 +0000 (14:24 -0800)]
Enable HAProxy backend checks for Ceph RGW

Add the `check` option to server definitions to enable basic HAProxy health
checks for Ceph RADOS gateway backends.

Currently traffic will be forwarded to unhealthy `radosgw.service` servers.
These changes resolve the issue.

Signed-off-by: Niko Smeds <nikosmeds@gmail.com>
(cherry picked from commit a951c1a3f0a34e086964f52b0bbf7a8d89481aad)

5 years agodashboard: refact admin user creation task
Guillaume Abrioux [Wed, 19 Aug 2020 21:33:51 +0000 (23:33 +0200)]
dashboard: refact admin user creation task

this commit splits this task in order to avoid using a `shell` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 54d3e9650f77466ae4207502e0a2da638d82954d)

5 years agoMake 'disable ssl for dashboard task' idempotent.
George Shuklin [Mon, 13 Jul 2020 10:40:17 +0000 (13:40 +0300)]
Make 'disable ssl for dashboard task' idempotent.

This should reduce the number of 'changed' tasks during convergence test.

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
(cherry picked from commit 73d4bb6bd6b560de7f2b3042bdc7d17c901e815a)

5 years agoComment out ceph_custom_key
Rafał Wądołowski [Thu, 20 Aug 2020 08:13:43 +0000 (10:13 +0200)]
Comment out ceph_custom_key

Since there is a check if ceph_custom_key is defined, there is no reason
to define it by default.

Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com>
(cherry picked from commit 55cd6e83e475ab9ad8d684b88da5325d869e9d1c)

5 years agoceph_custom_repo: define apt and rpm key for custom repo
Anthony Rusdi [Sun, 25 Aug 2019 18:47:32 +0000 (01:47 +0700)]
ceph_custom_repo: define apt and rpm key for custom repo

This commit also removes the notify on newly added debian repos,
forces update_cache to yes and defines sample ceph_custom_key vars.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
(cherry picked from commit 4c592066b7c1caaec700af347fc9edf2109c1659)

5 years agoceph-rgw: allow specifying crush rule on pool
Dimitri Savineau [Mon, 17 Aug 2020 17:55:47 +0000 (13:55 -0400)]
ceph-rgw: allow specifying crush rule on pool

We already support specifying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pools, not erasure. The rule
must also exist prior to the pool creation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cb8f0237e1fe7b890d20d47b5d023a6c618cbd4c)

5 years agocontainer: run engine/common roles on first client
Dimitri Savineau [Thu, 10 Sep 2020 15:27:37 +0000 (11:27 -0400)]
container: run engine/common roles on first client

We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all client nodes; having the
container image only on the first client node is enough.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 8ecbdc6ede7e26d053f87acde99986fddb0fe070)