git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
5 years agocontainer: run engine/common roles on first client
Dimitri Savineau [Thu, 10 Sep 2020 15:27:37 +0000 (11:27 -0400)]
container: run engine/common roles on first client

We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all client nodes; the container image is
only needed on the first client node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-facts: only get fsid when monitor are present
Dimitri Savineau [Thu, 10 Sep 2020 14:12:13 +0000 (10:12 -0400)]
ceph-facts: only get fsid when monitor are present

When running the rolling_update playbook with an inventory without
monitor nodes defined (like external scenario) then we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-rgw: use ceph_pool module
Dimitri Savineau [Wed, 9 Sep 2020 22:38:33 +0000 (18:38 -0400)]
ceph-rgw: use ceph_pool module

Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.

[1] bddcb439ce1b46735946e9fd5d147bc6604bcda3

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: use grafana from quay.io
Dimitri Savineau [Tue, 8 Sep 2020 14:36:20 +0000 (10:36 -0400)]
tests: use grafana from quay.io

This changes the grafana container image registry from docker.io to
quay.io to avoid rate limiting.
This also adds the missing container image values for docker2podman
and podman scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: clean legacy
Guillaume Abrioux [Tue, 8 Sep 2020 08:00:06 +0000 (10:00 +0200)]
tests: clean legacy

clean some legacies since quay.ceph.io migration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoFix hosts field in rolling_update playbook when mds are processed
Francesco Pantano [Tue, 8 Sep 2020 11:16:33 +0000 (13:16 +0200)]
Fix hosts field in rolling_update playbook when mds are processed

In the OSP context, during the rolling update the playbook fails
with the following error:

'''
ERROR! The field 'hosts' has an invalid value, which includes an
undefined variable. The error was: list object has no element 0
'''

This PR just changes the hosts field, providing a valid mons group
value.

Closes: https://bugzilla.redhat.com/1876803
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
5 years agoAdd --cluster option on ceph require-osd-release command
Francesco Pantano [Mon, 7 Sep 2020 12:02:06 +0000 (14:02 +0200)]
Add --cluster option on ceph require-osd-release command

On DCN environments, or when multiple ceph clusters are configured,
we need to specify the cluster name before running the command or
the rolling_update playbook will fail during minor updates.
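
As a rough illustration of the change, the helper below builds the command line with the cluster name passed explicitly. The function name `require_osd_release_cmd`, the cluster name `dcn0` and the release `octopus` are assumptions made for this sketch; they are not taken from the playbook.

```python
def require_osd_release_cmd(cluster, release):
    """Build the argv for `ceph osd require-osd-release` on a named cluster."""
    # --cluster is placed right after the binary so the CLI looks up the
    # configuration of that cluster rather than the default one named "ceph".
    return ["ceph", "--cluster", cluster, "osd", "require-osd-release", release]


# Hypothetical values, for illustration only.
cmd = require_osd_release_cmd("dcn0", "octopus")
print(" ".join(cmd))
```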

Closes: https://bugzilla.redhat.com/1876447
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
5 years agotests: disable nfs-ganesha testing
Guillaume Abrioux [Mon, 7 Sep 2020 07:02:03 +0000 (09:02 +0200)]
tests: disable nfs-ganesha testing

This commit disables nfs-ganesha testing on master for non-containerized
deployment because the dev repos are broken at the moment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: migrate to quay.ceph.io registry
Guillaume Abrioux [Fri, 4 Sep 2020 14:50:26 +0000 (16:50 +0200)]
tests: migrate to quay.ceph.io registry

in order to avoid docker.io rate limiting

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoFix typo shrink osd file name in day-2 docs
Dai Dang Van [Wed, 26 Aug 2020 10:02:34 +0000 (17:02 +0700)]
Fix typo shrink osd file name in day-2 docs

Signed-off-by: Dai Dang Van <daikk115@gmail.com>
5 years agotests: reenable ceph-iscsi testing
Dimitri Savineau [Thu, 27 Aug 2020 13:57:35 +0000 (09:57 -0400)]
tests: reenable ceph-iscsi testing

This re-adds the ceph-iscsi testing for both non containerized and
containerized deployment since the rados connection error on ceph
dev has been fixed [1].

[1] https://tracker.ceph.com/issues/47002

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoEnable HAProxy backend checks for Ceph RGW
Niko Smeds [Thu, 5 Mar 2020 22:24:56 +0000 (14:24 -0800)]
Enable HAProxy backend checks for Ceph RGW

Add the `check` option to server definitions to enable basic HAProxy health
checks for Ceph RADOS gateway backends.

Currently traffic will be forwarded to unhealthy `radosgw.service` servers.
These changes resolve the issue.

Signed-off-by: Niko Smeds <nikosmeds@gmail.com>
5 years agorolling_update: remove 'ignore_errors'
Guillaume Abrioux [Fri, 21 Aug 2020 08:51:22 +0000 (10:51 +0200)]
rolling_update: remove 'ignore_errors'

There's no need to use `ignore_errors: true` on these tasks.

Using a loop on the task stopping mon daemons allows us to avoid
duplicating this task, the `ignore_errors` isn't needed here because it
won't fail the playbook if one of the IDs doesn't exist (shortname vs. fqdn).

Using the right condition on the task starting the mgr daemon allows us
to avoid using an `ignore_errors: true` as well.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_key: refact the code and minor fixes
Guillaume Abrioux [Tue, 4 Aug 2020 11:53:24 +0000 (13:53 +0200)]
ceph_key: refact the code and minor fixes

This commit refactors the code to remove a duplicate condition and it
makes the `state: absent` code idempotent

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: add more coverage for test_ceph_key
Guillaume Abrioux [Wed, 15 Jul 2020 15:28:51 +0000 (17:28 +0200)]
tests: add more coverage for test_ceph_key

This commit adds more coverage regarding the testing of ceph_key module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: refact admin user creation task
Guillaume Abrioux [Wed, 19 Aug 2020 21:33:51 +0000 (23:33 +0200)]
dashboard: refact admin user creation task

this commit splits this task in order to avoid using a `shell` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofacts: refact and optimize memory consumption
Guillaume Abrioux [Mon, 17 Aug 2020 08:31:11 +0000 (10:31 +0200)]
facts: refact and optimize memory consumption

there's no need to run this task on all nodes.
This uses too much memory for nothing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: reenable nfs-ganesha testing
Dimitri Savineau [Mon, 6 Jul 2020 15:04:13 +0000 (11:04 -0400)]
tests: reenable nfs-ganesha testing

This re-adds the nfs-ganesha testing in non containerized deployment.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoMake 'disable ssl for dashboard task' idempotent.
George Shuklin [Mon, 13 Jul 2020 10:40:17 +0000 (13:40 +0300)]
Make 'disable ssl for dashboard task' idempotent.

This should reduce number of 'changed' tasks during convergence test.

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
5 years agoComment out ceph_custom_key
Rafał Wądołowski [Thu, 20 Aug 2020 08:13:43 +0000 (10:13 +0200)]
Comment out ceph_custom_key

Since there is a check if ceph_custom_key is defined, there is no reason
to define it by default.

Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com>
5 years agoiscsigw: add retry/until
Guillaume Abrioux [Wed, 19 Aug 2020 16:11:02 +0000 (18:11 +0200)]
iscsigw: add retry/until

In order to avoid failures that could be fixed by simply
retrying.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: move erasure pool testing in lvm_osds
Guillaume Abrioux [Tue, 11 Aug 2020 13:26:16 +0000 (15:26 +0200)]
tests: move erasure pool testing in lvm_osds

This commit moves the erasure pool creation testing from `all_daemons`
to `lvm_osds` so we can decrease the number of osd nodes we spawn and the
OVH Jenkins slaves are less overwhelmed when an `all_daemons` based
scenario is being tested.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoSet default permission for prometheus config files
John Fulton [Tue, 18 Aug 2020 14:41:42 +0000 (10:41 -0400)]
Set default permission for prometheus config files

Regardless of the outcome of Ansible 2.9.12 issue 71200
we can set a default permission for these files.

Closes: https://github.com/ceph/ceph-ansible/issues/5677
Signed-off-by: John Fulton <fulton@redhat.com>
5 years agoshrink-mds: use mds_to_kill_hostname instead
Guillaume Abrioux [Tue, 18 Aug 2020 18:35:17 +0000 (20:35 +0200)]
shrink-mds: use mds_to_kill_hostname instead

When using fqdn in inventory host file, this task will fail because the
mds is registered with its shortname.

It means we must use `mds_to_kill_hostname` in this task.
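
A minimal sketch of the shortname/fqdn mismatch, assuming the short name is simply the inventory name up to the first dot. The helper `short_hostname` and the example host names are hypothetical and are not part of the playbook.

```python
def short_hostname(inventory_hostname):
    """Return the part of a host name that precedes the first dot."""
    return inventory_hostname.split(".", 1)[0]


# The mds is registered under its short name, so a lookup by FQDN misses it.
registered_mds = {"mds0"}
inventory_name = "mds0.example.com"  # hypothetical FQDN from the inventory

print(inventory_name in registered_mds)
print(short_hostname(inventory_name) in registered_mds)
```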

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoinfra: only install logrotate on right nodes
Guillaume Abrioux [Thu, 13 Aug 2020 18:37:11 +0000 (20:37 +0200)]
infra: only install logrotate on right nodes

For instance, there is no need to install logrotate on client nodes.

This also ensures logrotate is installed only for containerized
deployments, since the packaging already has an explicit dependency on logrotate.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotravis: enforce ansible-lint 4.2.0
Guillaume Abrioux [Tue, 18 Aug 2020 13:37:08 +0000 (15:37 +0200)]
travis: enforce ansible-lint 4.2.0

Let's pin to 4.2.0

(because of ansible/ansible-lint/issues/966)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: remove hosts-ubuntu inventories
Guillaume Abrioux [Tue, 18 Aug 2020 07:48:54 +0000 (09:48 +0200)]
tests: remove hosts-ubuntu inventories

Since we've dropped ubuntu testing, we don't need these inventories
anymore. Let's remove this leftover.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: disable iscsigw testing (container)
Guillaume Abrioux [Tue, 18 Aug 2020 07:45:53 +0000 (09:45 +0200)]
tests: disable iscsigw testing (container)

Temporarily disable iscsigw testing for containerized deployments
because it's broken upstream on ceph@master.
non-containerized deployments use stable build for iscsigw to get around
this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-rgw: allow specifying crush rule on pool
Dimitri Savineau [Mon, 17 Aug 2020 17:55:47 +0000 (13:55 -0400)]
ceph-rgw: allow specifying crush rule on pool

We already support specifying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pools, not erasure pools. The rule
must also exist prior to the pool creation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocontainer: don't install the engine on all clients
Dimitri Savineau [Mon, 17 Aug 2020 18:56:17 +0000 (14:56 -0400)]
container: don't install the engine on all clients

We only need the container engine to be installed on the first client
node in order to execute the pools/keys operations. We already use the
same workflow with the ceph-container-common role, which pulls the ceph
container image.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorgw: allow rgws to be concurrently with or without multisite
Ali Maredia [Thu, 4 Jun 2020 21:00:16 +0000 (21:00 +0000)]
rgw: allow rgws to be concurrently with or without multisite

Allows rgws in a ceph cluster to be run with
multisite and without multisite at the same time.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years agopurge-cluster: use sysfs method for unmapping rbd devices
Guillaume Abrioux [Tue, 4 Aug 2020 15:29:41 +0000 (17:29 +0200)]
purge-cluster: use sysfs method for unmapping rbd devices

This way we keep consistency with purge-container-cluster.yml playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoinfra: add missing tag
Guillaume Abrioux [Thu, 13 Aug 2020 13:29:28 +0000 (15:29 +0200)]
infra: add missing tag

This commit adds the missing `with_pkg` tag on the logrotate
installation task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: test iscsigw against stable
Guillaume Abrioux [Wed, 12 Aug 2020 19:05:57 +0000 (21:05 +0200)]
tests: test iscsigw against stable

Since it is broken at the moment with dev repos, let's test against
stable builds so the CI is unlocked.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge: import ceph-defaults in purge osd play
Guillaume Abrioux [Thu, 6 Aug 2020 07:46:12 +0000 (09:46 +0200)]
purge: import ceph-defaults in purge osd play

Otherwise, `ceph_volume_debug` variable is undefined

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoinfra: add log rotation support (containers)
Guillaume Abrioux [Tue, 4 Aug 2020 23:47:04 +0000 (01:47 +0200)]
infra: add log rotation support (containers)

This commit adds the log rotation support via logrotate in containerized
deployments.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1848388
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocommon: don't enable debug log on ceph-volume calls by default
Guillaume Abrioux [Wed, 5 Aug 2020 16:02:48 +0000 (18:02 +0200)]
common: don't enable debug log on ceph-volume calls by default

ceph-volume can generate large logs at some point.

debug logs by definition should be enabled only when debugging.

Let's make it customizable with a variable which is set to `False` by
default.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agorgw: support 1+ rgw instance in `radosgw_frontend_port`
raul [Mon, 3 Aug 2020 10:58:50 +0000 (12:58 +0200)]
rgw: support 1+ rgw instance in `radosgw_frontend_port`

Change radosgw_frontend_port to take into account more than one RGW instance.
In its original form, `radosgw_frontend_port: radosgw_frontend_port | int`,
it configured port 8080 on all instances. With the following modification,
`radosgw_frontend_port: radosgw_frontend_port | int + item|int`, the port
is incremented by 1 for each instance.
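
The arithmetic can be sketched in Python as follows. This assumes `item` is a zero-based instance index, which the commit message does not state; the function name `frontend_ports` is made up for the example.

```python
def frontend_ports(base_port, instance_count):
    """Return one port per RGW instance, counting up from base_port."""
    # Equivalent in spirit to `radosgw_frontend_port | int + item | int`,
    # with item assumed to run over 0, 1, 2, ...
    return [int(base_port) + index for index in range(instance_count)]


print(frontend_ports(8080, 3))  # [8080, 8081, 8082]
```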

Co-authored-by: Daniel Parkes <dparkes@redhat.com>
Signed-off-by: raul <rmahique@redhat.com>
5 years agonfs: do not copy rgw keyring when `nfs_obj_gw` is true
Guillaume Abrioux [Fri, 7 Aug 2020 08:12:50 +0000 (10:12 +0200)]
nfs: do not copy rgw keyring when `nfs_obj_gw` is true

This keyring shouldn't be copied when `nfs_obj_gw` is `True` if the
cluster doesn't contain a rgw node, which can be the case given we are
using `nfs_obj_gw` instead of `nfs_file_gw` (cephfs vs. object). Otherwise
the deployment will fail trying to copy a key that doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotox: only wait 30sec for right jobs
Guillaume Abrioux [Thu, 6 Aug 2020 13:26:24 +0000 (15:26 +0200)]
tox: only wait 30sec for right jobs

There's no need to call `sleep 30` for jobs other than `all_daemons` and
`all_in_one`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge-cluster: check if rbdmap exists
Benoît Knecht [Fri, 31 Jul 2020 06:11:31 +0000 (08:11 +0200)]
purge-cluster: check if rbdmap exists

When running `infrastructure-playbooks/purge-cluster.yml` twice, it fails the
second time on the `ensure rbd devices are unmapped` task, because `rbdmap`
isn't installed anymore at that point.

This commit adds a check that ensures `rbdmap` is available, and skips the
`ensure rbd devices are unmapped` task if it isn't.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years agopytest: register ceph_crash mark
Dimitri Savineau [Wed, 5 Aug 2020 19:03:49 +0000 (15:03 -0400)]
pytest: register ceph_crash mark

Otherwise we see this pytest warning:

PytestUnknownMarkWarning: Unknown pytest.mark.ceph_crash - is this a typo?
You can register custom marks to avoid this warning - for details,
see https://docs.pytest.org/en/latest/mark.html
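
One common way to register such a mark is from `conftest.py`, sketched below. Whether this commit used `conftest.py` or an ini file is not stated here, and the description text after `ceph_crash:` is an assumption.

```python
# conftest.py
def pytest_configure(config):
    """Declare the custom mark so pytest stops flagging it as unknown."""
    # addinivalue_line appends one entry to the "markers" ini option.
    config.addinivalue_line(
        "markers", "ceph_crash: tests that target the ceph-crash daemon"
    )
```

With the mark registered, tests decorated with `@pytest.mark.ceph_crash` no longer raise `PytestUnknownMarkWarning`.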

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge-cluster: replace shell by command in a task
Guillaume Abrioux [Tue, 4 Aug 2020 15:14:29 +0000 (17:14 +0200)]
purge-cluster: replace shell by command in a task

There is no need to use `shell` here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-osd: various fixes
Benoît Knecht [Tue, 28 Jul 2020 11:47:26 +0000 (13:47 +0200)]
shrink-osd: various fixes

This handles missing /etc/ceph/osd, by ensuring we actually found files in
`/etc/ceph/osd` before trying to slurp their content.

This also adds a missing `| default(False)` to avoid the following error:

```
fatal: [ceph01]: FAILED! =>
  msg: |-
    The conditional check 'ceph_osd_data_json[item.2]['encrypted'] | bool' failed. The error was: error while evaluating conditional (ceph_osd_data_json[item.2]['encrypted'] | bool): 'dict object' has no attribute 'encrypted'
```
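
A rough Python analogue of the fix, assuming the OSD metadata is a plain dict loaded from JSON with real booleans. The helper name `is_encrypted` and the sample dicts are illustrative only.

```python
def is_encrypted(osd_data):
    """Return the 'encrypted' flag, treating a missing key as False."""
    # dict.get with a fallback plays the role of `| default(False)`:
    # no error is raised when the key is absent.
    return bool(osd_data.get("encrypted", False))


print(is_encrypted({"encrypted": True}))
print(is_encrypted({"type": "bluestore"}))
```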

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862416
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years agoRemove ceph-radosgw.target when switching to containerize daemons
Kevin Coakley [Mon, 3 Aug 2020 17:03:34 +0000 (10:03 -0700)]
Remove ceph-radosgw.target when switching to containerize daemons

The task "remove old systemd unit file" under "switching from
non-containerized to containerized ceph rgw" only removes
the ceph-radosgw@.service file. The task should also remove
the ceph-radosgw.target file, like the "remove old systemd unit
files" tasks for the mons, mgrs, osds, etc, in order to clean up
all of the unused systemd unit files.

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
5 years agotests: change subnet in lvm_osds container scenario
Guillaume Abrioux [Tue, 4 Aug 2020 09:02:23 +0000 (11:02 +0200)]
tests: change subnet in lvm_osds container scenario

This commit changes the subnets in container-lvm_osds scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoRevert "tests: add more coverage for test_ceph_key"
Guillaume Abrioux [Tue, 4 Aug 2020 09:25:50 +0000 (11:25 +0200)]
Revert "tests: add more coverage for test_ceph_key"

This reverts commit 1e46264bc19161cdf2d2f4529fc6b42f0077af83.

5 years agoRevert "ceph_key: refact the code and minor fixes"
Guillaume Abrioux [Tue, 4 Aug 2020 09:25:41 +0000 (11:25 +0200)]
Revert "ceph_key: refact the code and minor fixes"

This reverts commit 9a950b8f0fe0e60fe658a518f8f4cf066edddf73.

5 years agoceph_key: refact the code and minor fixes
Guillaume Abrioux [Thu, 16 Jul 2020 13:57:14 +0000 (15:57 +0200)]
ceph_key: refact the code and minor fixes

wip

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: add more coverage for test_ceph_key
Guillaume Abrioux [Wed, 15 Jul 2020 15:28:51 +0000 (17:28 +0200)]
tests: add more coverage for test_ceph_key

This commit adds more coverage regarding the testing of ceph_key module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoconfig: only add related rgw section
Guillaume Abrioux [Thu, 23 Jul 2020 19:12:46 +0000 (21:12 +0200)]
config: only add related rgw section

There's no need to add each rgw section on all rgw nodes.
With this commit, only the related rgw sections are rendered.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink_osd: remove osd data directory
Guillaume Abrioux [Wed, 22 Jul 2020 14:08:15 +0000 (16:08 +0200)]
shrink_osd: remove osd data directory

Otherwise it leaves an empty directory.
When shrinking and redeploying multiple OSDs you have no guarantee it
will reuse the same osd id.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotox: split shrink_osd scenario
Guillaume Abrioux [Wed, 22 Jul 2020 09:38:55 +0000 (11:38 +0200)]
tox: split shrink_osd scenario

Let's split this scenario with a dedicated tox ini file.

This is for testing in two ways:

1/ shrinking OSDs one by one
2/ shrinking multiple OSDs with a single call of the playbook

ceph-build related PR: ceph/ceph-build#1629

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: refact shrink_osd scenario
Guillaume Abrioux [Tue, 21 Jul 2020 07:27:10 +0000 (09:27 +0200)]
tests: refact shrink_osd scenario

This adds more coverage on the shrink_osd scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: allow remote TLS cert/key copy
Dimitri Savineau [Thu, 30 Jul 2020 16:04:18 +0000 (12:04 -0400)]
dashboard: allow remote TLS cert/key copy

When using TLS on the ceph dashboard or grafana services, we can provide
the TLS certificate and key.
Those files should be present on the ansible controller and they will be
copied to the right node(s).
In some situations, the TLS certificate and key could be already present
on the target node and not on the ansible controller.
For this scenario, we just need to copy the files locally (on each remote
host).

This patch adds the dashboard_tls_external variable (with default to
false) to allow users to achieve this scenario when configuring this
variable to true.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860815
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: restart mds after the upgrade
Dimitri Savineau [Wed, 29 Jul 2020 13:44:15 +0000 (09:44 -0400)]
rolling_update: restart mds after the upgrade

In addition to 155e2a2, the active mds daemon isn't stopped/started
correctly, unlike the other services, so that daemon doesn't come
back after the upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861688
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: install pyyaml on osd nodes
Dimitri Savineau [Wed, 29 Jul 2020 16:23:06 +0000 (12:23 -0400)]
tests: install pyyaml on osd nodes

Due to [1], ceph-volume now has a dependency on pyyaml but it's not
installed by default via the package dependency.
This patch only adds the required package on non containerized
deployment and as a temporary workaround for the CI.

[1] https://tracker.ceph.com/issues/46759

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: refact dashboard workflow
Dimitri Savineau [Fri, 24 Jul 2020 15:21:54 +0000 (11:21 -0400)]
rolling_update: refact dashboard workflow

The dashboard upgrade workflow should follow the same process as the ceph
upgrade, otherwise any systemd unit modification won't be applied to the
monitoring/dashboard stack.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: stop/start instead of restart
Dimitri Savineau [Tue, 21 Jul 2020 18:51:20 +0000 (14:51 -0400)]
rolling_update: stop/start instead of restart

During the daemon upgrade we're
  - stopping the service when it's not containerized
  - running the daemon role
  - starting the service when it's not containerized
  - restarting the service when it's containerized

This implementation has multiple issues.

1/ We don't use the same service workflow when using containers
or baremetal.

2/ The explicit daemon start isn't required since we're already
doing this in the daemon role.

3/ Any non-backward-compatible changes in the systemd unit template (for
containerized deployment) won't work due to the restart usage.

This patch refacts the rolling_update playbook by using the same service
stop task for both containerized and baremetal deployment at the start
of the upgrade play.
It removes the explicit service start task because it's already included
in the dedicated role.
The service restart tasks for containerized deployment are also
removed.

Finally, this adds the missing service stop task for ceph crash upgrade
workflow.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-facts: remove mds_name fact
Dimitri Savineau [Tue, 21 Jul 2020 19:27:59 +0000 (15:27 -0400)]
ceph-facts: remove mds_name fact

The mds_name fact always gets the ansible_hostname value so we don't
need to have a dedicated fact for this and use the ansible_hostname fact
instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-handler: remove iscsigws restart scripts
Dimitri Savineau [Tue, 21 Jul 2020 19:22:26 +0000 (15:22 -0400)]
ceph-handler: remove iscsigws restart scripts

The iscsigws restart scripts for tcmu-runner and rbd-target-{api,gw}
services only call the systemctl restart command.
We don't really need to copy a shell script to do it when we can use
the ansible service module instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopodman: always remove container on start
Dimitri Savineau [Tue, 21 Jul 2020 13:32:50 +0000 (09:32 -0400)]
podman: always remove container on start

In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotox: remove ubuntu references
Guillaume Abrioux [Wed, 22 Jul 2020 14:29:55 +0000 (16:29 +0200)]
tox: remove ubuntu references

since we've dropped ubuntu testing on PRs and nightlies, we don't need
these references anymore in tox files.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: lvm_setup.yml, add carriage return
Guillaume Abrioux [Wed, 22 Jul 2020 05:28:34 +0000 (07:28 +0200)]
tests: lvm_setup.yml, add carriage return

This commit adds crlf between each task.
It makes the playbook more readable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: (lvm_setup.yml), don't shrink lvol
Guillaume Abrioux [Tue, 21 Jul 2020 23:51:20 +0000 (01:51 +0200)]
tests: (lvm_setup.yml), don't shrink lvol

When rerunning lvm_setup.yml on an existing cluster with OSDs already
deployed, it fails as follows:

```
fatal: [osd0]: FAILED! => changed=false
  msg: Sorry, no shrinking of data-lv2 to 0 permitted.
```

because we are asking `lvol` module to create a volume on an empty VG
with size extents = `100%FREE`.

The default behavior of `lvol` is to shrink the volume if the LV's current
size is greater than the requested size.

Given the requested size is calculated like this:

`size_requested = size_percent * this_vg['free'] / 100`

in our case, it is similar to:

`size_requested = 100 * 0 / 100` which basically means `0`

So the current LV size is well greater than the requested size, which
leads the module to attempt to shrink it to 0, which obviously isn't
allowed.

Adding `shrink: false` to the module calls fixes this issue.
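
The reasoning above can be modelled with a few lines of Python. This is a simplified sketch of the decision, not the `lvol` module's actual code; the function names and the sample LV size are hypothetical.

```python
def requested_size(size_percent, vg_free):
    """Size requested for an NN%FREE volume, per the formula quoted above."""
    return size_percent * vg_free / 100


def plan_resize(current_size, size_percent, vg_free, shrink=True):
    """Return 'extend', 'shrink' or 'keep' for an already existing LV."""
    target = requested_size(size_percent, vg_free)
    if target > current_size:
        return "extend"
    if target < current_size and shrink:
        return "shrink"
    # Either the sizes match, or shrinking has been disabled.
    return "keep"


# The VG is already full (0 free), so 100%FREE resolves to 0.
print(plan_resize(current_size=2559, size_percent=100, vg_free=0))
print(plan_resize(current_size=2559, size_percent=100, vg_free=0, shrink=False))
```

The first call reports `shrink`, which is the failing case; with `shrink=False` the existing volume is left alone.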

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-handler: add missing condition on ceph-crash
Dimitri Savineau [Tue, 21 Jul 2020 19:14:30 +0000 (15:14 -0400)]
ceph-handler: add missing condition on ceph-crash

The ceph-crash tasks present in the ceph-handler role don't need to be
executed on all nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocrash: rm container in ExecPreStart even with docker
Guillaume Abrioux [Tue, 21 Jul 2020 18:27:28 +0000 (20:27 +0200)]
crash: rm container in ExecPreStart even with docker

We should ensure the container is removed in `ExecStartPre` even when
`{{ container_binary }}` is docker.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-crash: introduce new role ceph-crash
Guillaume Abrioux [Fri, 3 Jul 2020 08:21:49 +0000 (10:21 +0200)]
ceph-crash: introduce new role ceph-crash

This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodefaults: remove legacy
Guillaume Abrioux [Thu, 9 Jul 2020 13:03:48 +0000 (15:03 +0200)]
defaults: remove legacy

These variables aren't consumed anywhere other than in the ceph-nfs role so
there is no need to have them in `ceph-defaults`'s defaults

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocephadm: set the command as a fact
Dimitri Savineau [Mon, 20 Jul 2020 14:41:53 +0000 (10:41 -0400)]
cephadm: set the command as a fact

Set the cephadm cmd as a fact instead of rewriting the same command
over and over.
This also fixes an issue when using docker as container engine because
the --docker cephadm parameter should be used before the subcommand,
not after.
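
To illustrate the argument ordering, here is a small Python sketch that builds the command once and reuses it. The playbook itself stores the command as an Ansible fact; the helper name `cephadm_cmd` and the choice of the `ls` subcommand are assumptions for the example.

```python
def cephadm_cmd(container_binary, subcommand, *args):
    """Build a cephadm argv with global options ahead of the subcommand."""
    cmd = ["cephadm"]
    if container_binary == "docker":
        # --docker is a global option, so it has to come before the subcommand.
        cmd.append("--docker")
    cmd.append(subcommand)
    cmd.extend(args)
    return cmd


print(" ".join(cephadm_cmd("docker", "ls")))
print(" ".join(cephadm_cmd("podman", "ls")))
```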

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agofacts: fix broken facts when using --limit
Guillaume Abrioux [Mon, 13 Jul 2020 07:42:25 +0000 (09:42 +0200)]
facts: fix broken facts when using --limit

This commit fixes these tasks when --limit is used.

It makes sure the fact is set on right nodes even when the playbook is
run with `--limit`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-dashboard: copy TLS cert/key on monitor
Dimitri Savineau [Fri, 17 Jul 2020 14:38:02 +0000 (10:38 -0400)]
ceph-dashboard: copy TLS cert/key on monitor

The ceph-dashboard role is executed on the mgr nodes so the TLS cert/key
files are copied to those nodes.
But we import the cert/key files into the ceph
configuration on the monitor.

Closes: #5557
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm: add playbook
Dimitri Savineau [Fri, 10 Jul 2020 21:52:38 +0000 (17:52 -0400)]
cephadm: add playbook

This adds a new playbook for deploying ceph via cephadm.

This also adds a new dedicated tox file for CI purpose.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: delegate task for orch apply
Dimitri Savineau [Wed, 15 Jul 2020 22:25:57 +0000 (18:25 -0400)]
cephadm-adopt: delegate task for orch apply

This is a partial revert of b38019e because we don't want to execute
the whole play on the monitor otherwise if we have some empty group
like rgws or mdss then the orchestrator commands will still be
executed.
Instead we should keep the real target group name at play level and
delegate the orchestrator commands to the monitor. The whole play
will be skipped if the group is empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: inform users about cephadm
Dimitri Savineau [Wed, 15 Jul 2020 19:21:25 +0000 (15:21 -0400)]
cephadm-adopt: inform users about cephadm

Print a message at the end of the playbook to inform users that they
don't have to use ceph-ansible playbooks anymore as everything else
needs to be done via cephadm (day 2 operation).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: refresh the service/daemon list
Dimitri Savineau [Wed, 15 Jul 2020 19:15:06 +0000 (15:15 -0400)]
cephadm-adopt: refresh the service/daemon list

When reporting the orchestrator service/daemon list at the end of the
playbook, we can use the --refresh option otherwise we could have
an outdated output.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoRevert "cephadm-adopt: remove the cephadm script"
Dimitri Savineau [Wed, 15 Jul 2020 19:14:23 +0000 (15:14 -0400)]
Revert "cephadm-adopt: remove the cephadm script"

This reverts commit c3bbc6b13cee5e566b277f3146e9e6bc4cec2f52.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph_key: fix bug in 'info' feature
Guillaume Abrioux [Thu, 9 Jul 2020 14:24:15 +0000 (16:24 +0200)]
ceph_key: fix bug in 'info' feature

Fix 'info' feature from ceph_key.py module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocephadm-adopt: wait for monitor in quorum
Dimitri Savineau [Fri, 10 Jul 2020 21:41:32 +0000 (17:41 -0400)]
cephadm-adopt: wait for monitor in quorum

After adopting a monitor we need to wait for that monitor to rejoin
the quorum before moving to the next node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: add osd flags during adoption
Dimitri Savineau [Fri, 10 Jul 2020 19:24:24 +0000 (15:24 -0400)]
cephadm-adopt: add osd flags during adoption

Like the rolling_update or switch2container playbooks, we need to
set/unset some osd flags before and after the OSD daemons adoption.
This also adds a task for waiting for clean pgs at the end of an OSD node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: add iscsi support
Dimitri Savineau [Fri, 10 Jul 2020 18:59:06 +0000 (14:59 -0400)]
cephadm-adopt: add iscsi support

The iSCSI support has been added recently in cephadm.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: remove the cephadm script
Dimitri Savineau [Fri, 10 Jul 2020 18:45:51 +0000 (14:45 -0400)]
cephadm-adopt: remove the cephadm script

At the end of the process we don't need the cephadm script.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: show orchestrator status
Dimitri Savineau [Fri, 10 Jul 2020 18:13:15 +0000 (14:13 -0400)]
cephadm-adopt: show orchestrator status

At the end of the playbook we can show the orchestrator status like
we do with the ceph status in initial deployment.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: use placement parameter
Dimitri Savineau [Fri, 10 Jul 2020 14:42:02 +0000 (10:42 -0400)]
cephadm-adopt: use placement parameter

It's better to use the --placement parameter when using ceph orch apply
commands to avoid confusion in the parameters.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: use custom dashboard images
Dimitri Savineau [Thu, 9 Jul 2020 22:38:17 +0000 (18:38 -0400)]
cephadm-adopt: use custom dashboard images

cephadm uses default values for the dashboard container images, which
need to be customized by ansible for upstream or downstream purposes.
This feature wasn't present when cephadm-adopt.yml was designed.
Also set the container_image_base variable for upgrade purposes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: run orch apply from monitors
Dimitri Savineau [Thu, 9 Jul 2020 22:28:49 +0000 (18:28 -0400)]
cephadm-adopt: run orch apply from monitors

It looks like we can't run the ceph orch apply commands on nodes other
than monitors even if it used to work in the past.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: don't fail on systemd reset-failed
Dimitri Savineau [Thu, 9 Jul 2020 15:23:33 +0000 (11:23 -0400)]
cephadm-adopt: don't fail on systemd reset-failed

If the systemd service exits successfully then we don't need to reset
the failed state.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocephadm-adopt: copy client.admin keyring
Dimitri Savineau [Thu, 9 Jul 2020 15:19:41 +0000 (11:19 -0400)]
cephadm-adopt: copy client.admin keyring

The ceph config assimilate-conf command requires the client.admin
keyring which isn't present on all nodes most of the time.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotox: add cephadm_adopt scenario
Dimitri Savineau [Mon, 6 Jul 2020 18:27:50 +0000 (14:27 -0400)]
tox: add cephadm_adopt scenario

This adds an optional cephadm_adopt scenario which is based on
all_daemons.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoplay: followup on cc0d969
Guillaume Abrioux [Thu, 9 Jul 2020 11:53:28 +0000 (13:53 +0200)]
play: followup on cc0d969

Remove two other occurrences of the 'iscsigws' pattern in the main
playbook that were missed in cc0d9697c554e459af11965fc2710e42abef4e13

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agorgw: set container memory limit to 4g
Guillaume Abrioux [Thu, 9 Jul 2020 11:07:32 +0000 (13:07 +0200)]
rgw: set container memory limit to 4g

This commit changes the container memory limit for rgw daemons.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707488
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofacts: refact `ceph_uid` fact
Guillaume Abrioux [Wed, 8 Jul 2020 13:49:47 +0000 (15:49 +0200)]
facts: refact `ceph_uid` fact

There's no need to set this fact with a `set_fact`.
We can achieve this in `ceph-defaults`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_volume: fix regression
Guillaume Abrioux [Tue, 7 Jul 2020 23:04:10 +0000 (01:04 +0200)]
ceph_volume: fix regression

do not skip zapping if osd_fsid is passed

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: add docker hub authentication in jobs
Guillaume Abrioux [Tue, 7 Jul 2020 15:11:27 +0000 (17:11 +0200)]
tests: add docker hub authentication in jobs

This commit makes all jobs authenticate to docker hub in order to
avoid the rate limit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoplay: remove backward compatibility group name
Guillaume Abrioux [Wed, 8 Jul 2020 11:51:14 +0000 (13:51 +0200)]
play: remove backward compatibility group name

It's time to remove this old group name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-nfs: change ganesha devel source
Dimitri Savineau [Thu, 2 Jul 2020 19:23:09 +0000 (15:23 -0400)]
ceph-nfs: change ganesha devel source

The download.nfs-ganesha.org source for nfs-ganesha on CentOS isn't
available anymore.
Let's switch back to shaman since we have builds available now.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: remove nfs_ganesha_stable_branch variable
Dimitri Savineau [Mon, 6 Jul 2020 14:45:21 +0000 (10:45 -0400)]
tests: remove nfs_ganesha_stable_branch variable

We don't need to override this variable in the group_vars but use the
default value instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: update nfs-ganesha to V3.3-stable
Guillaume Abrioux [Sun, 5 Jul 2020 14:54:36 +0000 (16:54 +0200)]
tests: update nfs-ganesha to V3.3-stable

Not really needed in master; this commit is intended to be backported
to the octopus branch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodoc: add a note about deprecated branches
Guillaume Abrioux [Fri, 3 Jul 2020 05:14:57 +0000 (07:14 +0200)]
doc: add a note about deprecated branches

This commit adds a note about the `stable-3.0` and `stable-3.1`
branches, which are deprecated and not maintained anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodoc: add a note about containerized deployments
Guillaume Abrioux [Fri, 3 Jul 2020 04:58:49 +0000 (06:58 +0200)]
doc: add a note about containerized deployments

This commit updates the documentation to add a note about containerized
deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>