ceph-ansible.git log
5 years agoswitch-to-containers: set and unset osd flags
Guillaume Abrioux [Fri, 3 Apr 2020 13:36:23 +0000 (15:36 +0200)]
switch-to-containers: set and unset osd flags

The workflow in this playbook should be the same as in rolling_update:
we should first set the noout and nodeep-scrub flags before migrating the
first osd and unset those flags after the last osd is migrated.
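
A minimal sketch of the flag handling described above, assuming the usual
ceph-ansible variables (`container_exec_cmd`, `cluster`, `mon_group_name`);
the playbook's actual task names may differ:

```yaml
- name: set osd flags before migrating the first osd
  command: "{{ container_exec_cmd | default('') }} ceph --cluster {{ cluster }} osd set {{ item }}"
  with_items:
    - noout
    - nodeep-scrub
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
```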

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cfaa056e020615bb99eb9db1520a977e5ac3ef4)

5 years agoupdate: use tasks_from when including ceph-facts
Guillaume Abrioux [Fri, 3 Apr 2020 13:07:54 +0000 (15:07 +0200)]
update: use tasks_from when including ceph-facts

When setting/unsetting osd flags, we can use `tasks_from` when importing
the `ceph-facts` role to save some time, given that we only need this role
for setting `container_binary`.
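
A minimal sketch of the `tasks_from` usage; the `container_binary` tasks
file name is an assumption here, check the role for the exact file:

```yaml
- name: import only the container_binary task from ceph-facts
  import_role:
    name: ceph-facts
    tasks_from: container_binary
```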

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6df7887f87e990d61f0de132113d7589117de9c5)

5 years agoceph-mgr: add saml python lib for dashboard SSO
Dimitri Savineau [Fri, 3 Apr 2020 20:33:11 +0000 (16:33 -0400)]
ceph-mgr: add saml python lib for dashboard SSO

The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployments because
the saml python library isn't available in the standard repositories.
This package is present in the RHCS Tools repository so we also need to
enable that repository on the mgr nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1820233
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6617d90733d08b03779657c0129b7a7089eb4a90)

5 years agoceph_key: fetch key when needed
Guillaume Abrioux [Fri, 3 Apr 2020 16:23:00 +0000 (18:23 +0200)]
ceph_key: fetch key when needed

Fetch the key when it is present in the cluster but not on the node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ccfa249919b338197daec353cb5d4e535b3fb734)

5 years agoceph_key: fix idempotency when no secret is passed
Guillaume Abrioux [Fri, 3 Apr 2020 08:24:32 +0000 (10:24 +0200)]
ceph_key: fix idempotency when no secret is passed

553584cbd0d014429e665f998776e8d198f72d2b introduced a regression: when no
secret is passed, it overwrites the secret each time the task is run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 003defec0311af0f03da861f80d596852bdb9cf5)

5 years agotox: replace testinfra by pytest for add-mgrs
Dimitri Savineau [Thu, 2 Apr 2020 20:26:48 +0000 (16:26 -0400)]
tox: replace testinfra by pytest for add-mgrs

The add-mgrs scenario is still using the testinfra command instead of
pytest so the test execution is failing.

ERROR: InvocationError for command could not find executable testinfra

This also adds the missing --ssh-config option to testinfra.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 92f538f1af56c8f4cd28a4409e32397efbc77e52)

5 years agovagrant: force centos 8.1 libvirt image
Dimitri Savineau [Wed, 1 Apr 2020 19:46:20 +0000 (15:46 -0400)]
vagrant: force centos 8.1 libvirt image

The current centos/8 vagrant image (libvirt) is still using the
CentOS 8.0 release (1905) while the 8.1 release (1911) has been
available for a few months.
Using an updated CentOS 8 release fixes slow ceph-volume/lvm commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6264f6979e34d65f6c7f9478fed47472271a0769)

5 years agodocker2podman: call `container_options_facts.yml` on osd nodes
Guillaume Abrioux [Wed, 1 Apr 2020 12:20:05 +0000 (14:20 +0200)]
docker2podman: call `container_options_facts.yml` on osd nodes

We must include `container_options_facts.yml` from the `ceph-osd` role
because ceph-osd-run.sh.j2 needs variables set in this file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a4f54f6eeaac99f04faad82259129a906f4ccb9)

5 years agoceph_key: remove 'update' state
Guillaume Abrioux [Tue, 17 Mar 2020 14:34:11 +0000 (15:34 +0100)]
ceph_key: remove 'update' state

With this change, the state `present` is enough to update a keyring.
If the keyring already exists, it will be updated if the caps or secret
passed to the module are different.
If the keyring doesn't exist, it will be created.
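
A minimal usage sketch of the `ceph_key` module after this change; the key
name and caps are illustrative:

```yaml
- name: ensure client.test keyring exists and is up to date
  ceph_key:
    name: client.test
    caps:
      mon: "allow r"
      osd: "allow rwx"
    state: present
```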

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 553584cbd0d014429e665f998776e8d198f72d2b)

5 years agoosd: use default crush rule name when needed
Guillaume Abrioux [Fri, 27 Mar 2020 15:21:09 +0000 (16:21 +0100)]
osd: use default crush rule name when needed

When `rule_name` isn't set in `crush_rules`, the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.
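
A minimal sketch of how such a fact could be derived from the cluster;
the exact implementation in the role may differ:

```yaml
- name: dump crush rules
  command: ceph --cluster {{ cluster }} osd crush rule dump --format json
  register: crush_rules_dump
  changed_when: false

- name: set ceph_osd_pool_default_crush_rule_name
  set_fact:
    ceph_osd_pool_default_crush_rule_name: >-
      {{ (crush_rules_dump.stdout | from_json
          | selectattr('rule_id', 'equalto', osd_pool_default_crush_rule | int)
          | list | first)['rule_name'] }}
```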

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd32a79587f053a417a841a57b6c0192)

5 years agotests: add more coverage in external_clients scenario
Guillaume Abrioux [Fri, 27 Mar 2020 16:56:26 +0000 (17:56 +0100)]
tests: add more coverage in external_clients scenario

Run create_users_keys.yml in the external_clients scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c1c34b20157f91910ec9052dabcc543e30c2035)

5 years agoosd: support changing default rule even when osd_crush_location isn't defined
Guillaume Abrioux [Thu, 12 Mar 2020 11:14:01 +0000 (12:14 +0100)]
osd: support changing default rule even when osd_crush_location isn't defined

Creating crush rules even with no crush hierarchy configuration is a
valid scenario, so we shouldn't be bound to the first task result (which
configures the crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385ccb00a9edb9092a183c18e2637afd5d)

5 years agoremove *docker*.yml symlinks
Guillaume Abrioux [Tue, 31 Mar 2020 12:08:30 +0000 (14:08 +0200)]
remove *docker*.yml symlinks

This commit removes these two symlinks.
They were there for backward compatibility and were marked deprecated as
of stable-4.0.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9219991441ba4205234d904cc45f06fdd257e1ba)

5 years agopurge-container: get *all* osds id
Guillaume Abrioux [Tue, 31 Mar 2020 11:59:23 +0000 (13:59 +0200)]
purge-container: get *all* osds id

Adding `--all` to the `systemctl list-units` command in order to get
*all* osd ids on the node (including stopped osds). Otherwise, it will
purge the cluster but there will be leftovers after that.
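
A minimal sketch of the listing with `--all`, assuming the usual
ceph-osd systemd unit naming:

```yaml
- name: get all ceph-osd unit names, including stopped ones
  command: systemctl list-units --all "ceph-osd@*"
  register: osd_units
  changed_when: false
```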

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5e7962ccf6c9fa35f5611888826131c4e9e8f043)

5 years agonfs: update repo url
Guillaume Abrioux [Thu, 26 Mar 2020 07:25:05 +0000 (08:25 +0100)]
nfs: update repo url

As of V3.2 / octopus, the url has changed.
This commit updates the url in the playbook accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodefaults: change nfs_ganesha_stable_branch
Guillaume Abrioux [Wed, 25 Mar 2020 21:09:32 +0000 (22:09 +0100)]
defaults: change nfs_ganesha_stable_branch

In master, even though we are using the dev repo, the value here should be
closer to the last stable release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: bump nfs-ganesha version
Guillaume Abrioux [Thu, 26 Mar 2020 04:42:56 +0000 (05:42 +0100)]
tests: bump nfs-ganesha version

This commit changes the nfs-ganesha version for the all_daemons scenario
(V3.2).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocommon: add el8 repo configuration
Guillaume Abrioux [Wed, 25 Mar 2020 17:07:24 +0000 (18:07 +0100)]
common: add el8 repo configuration

This commit adds el8 repo configuration since we are still missing some
required packages in el8.

This is only intended to make CI pass on stable-5.0 and *must* be
reverted once these packages are available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: update testing
Guillaume Abrioux [Tue, 24 Mar 2020 15:11:20 +0000 (16:11 +0100)]
tests: update testing

This commit updates the testing so we test stable-5.0 against
ceph@octopus.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-defaults: update ceph_stable_redhat_distro
Dimitri Savineau [Wed, 25 Mar 2020 18:35:55 +0000 (14:35 -0400)]
ceph-defaults: update ceph_stable_redhat_distro

Since octopus the ceph_stable_redhat_distro variable should be set to
el8 instead of el7.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph_volume: fix multiple db/wal/journal devices
Dimitri Savineau [Fri, 27 Mar 2020 21:16:41 +0000 (17:16 -0400)]
ceph_volume: fix multiple db/wal/journal devices

When using the lvm batch ceph-volume subcommand with dedicated devices
for filestore (journal) or bluestore (db/wal) then the list of devices
is converted to a string instead of being extended via an iterable.
This was working with only one dedicated device but with more than one
the ceph_volume module fails.

TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - --osds-per-device
  - '4'
  - /dev/nvme2n1
  - /dev/nvme3n1
  - /dev/nvme4n1
  - /dev/nvme5n1
  - /dev/nvme6n1
  - --db-devices
  - /dev/nvme0n1 /dev/nvme1n1
  - --report
  - --format=json
  msg: non-zero return code
  rc: 2
  stderr: |2-
     stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
     stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
                                 [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
                                 [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
                                 [--no-auto] [--bluestore] [--filestore]
                                 [--report] [--yes] [--format {json,pretty}]
                                 [--dmcrypt]
                                 [--crush-device-class CRUSH_DEVICE_CLASS]
                                 [--no-systemd]
                                 [--osds-per-device OSDS_PER_DEVICE]
                                 [--block-db-size BLOCK_DB_SIZE]
                                 [--block-wal-size BLOCK_WAL_SIZE]
                                 [--journal-size JOURNAL_SIZE] [--prepare]
                                 [--osd-ids [OSD_IDS [OSD_IDS ...]]]
                                 [DEVICES [DEVICES ...]]
    ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1

So the dedicated device list is passed as a single string.

This commit also adds the journal_devices, block_db_devices and
wal_devices documentation to the ceph_volume module.
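
A minimal usage sketch of the `ceph_volume` module with multiple dedicated
db devices, the case that used to fail; device paths are illustrative:

```yaml
- name: use ceph-volume lvm batch to create bluestore osds
  ceph_volume:
    cluster: ceph
    objectstore: bluestore
    batch_devices:
      - /dev/nvme2n1
      - /dev/nvme3n1
    block_db_devices:
      - /dev/nvme0n1
      - /dev/nvme1n1
    osds_per_device: 4
    action: batch
```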

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 760b6cd7b0b671d63e2d6ea32ec336edd2f8cd32)

5 years agocontainer: remove ulimit nofile parameter
Dimitri Savineau [Fri, 10 Jan 2020 20:35:58 +0000 (15:35 -0500)]
container: remove ulimit nofile parameter

Since Ceph Octopus is python3 only we don't need to specify the max open
files anymore with the container engine.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 64701437de95850f52f38f10cc16f72db8000505)

5 years agorhcs: drop debian support
Dimitri Savineau [Thu, 26 Mar 2020 21:39:09 +0000 (17:39 -0400)]
rhcs: drop debian support

Support for debian with RHCS has been dropped starting with RHCS 4.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2dff5cf264e1b1632bf89583bff3a25)

5 years agorhcs: update release to 5 for octopus
Dimitri Savineau [Thu, 26 Mar 2020 18:41:14 +0000 (14:41 -0400)]
rhcs: update release to 5 for octopus

RHCS 5 will be based on Ceph Octopus release and only supported on
RHEL 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 90ad1108619dc830ba7f9cd8f5263e0409f80389)

5 years agodefaults: remove legacy comment
Guillaume Abrioux [Thu, 26 Mar 2020 06:38:36 +0000 (07:38 +0100)]
defaults: remove legacy comment

This is no longer true; let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a65e653540b5b5c7cb3a4f5d32b2540)

5 years agoceph-facts: fix rgw_instances_all fact
Dimitri Savineau [Tue, 24 Mar 2020 21:11:44 +0000 (17:11 -0400)]
ceph-facts: fix rgw_instances_all fact

The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.
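
A minimal sketch of aggregating the per-node lists via hostvars; variable
and group names follow ceph-ansible conventions but are illustrative:

```yaml
- name: set rgw_instances_all from every rgw node
  set_fact:
    rgw_instances_all: "{{ (rgw_instances_all | default([])) + (hostvars[item]['rgw_instances'] | default([])) }}"
  with_items: "{{ groups.get(rgw_group_name, []) }}"
  run_once: true
```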

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938c91d4f3a48cb7157aa9b9a00f21f8f)

5 years agoosd: add a default value for 'default' in crush_rules (tag: v6.0.0alpha1)
Guillaume Abrioux [Tue, 24 Mar 2020 08:56:45 +0000 (09:56 +0100)]
osd: add a default value for 'default' in crush_rules

Let's default to `False` for the `default` attribute in the `crush_rules`
variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797774
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoAdd pacific release
Dimitri Savineau [Mon, 23 Mar 2020 18:22:46 +0000 (14:22 -0400)]
Add pacific release

Add the 16th ceph release: pacific.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agofacts: fix typo
Guillaume Abrioux [Mon, 23 Mar 2020 15:14:29 +0000 (16:14 +0100)]
facts: fix typo

This commit fixes a typo in some task titles

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agonfs: fix nfs with external ceph cluster support
Guillaume Abrioux [Thu, 19 Mar 2020 19:44:20 +0000 (20:44 +0100)]
nfs: fix nfs with external ceph cluster support

This commit refactors and fixes the nfs deployment with external ceph
cluster support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: allow to set read-only admin user
Dimitri Savineau [Wed, 18 Mar 2020 14:53:40 +0000 (10:53 -0400)]
dashboard: allow to set read-only admin user

This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
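
A minimal sketch of the variable in `group_vars`; the variable name comes
straight from this change:

```yaml
dashboard_admin_user_ro: true
```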

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: add registry name on dashboard vars
Dimitri Savineau [Tue, 17 Mar 2020 00:45:03 +0000 (20:45 -0400)]
ceph-defaults: add registry name on dashboard vars

We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly to the default
dashboard container image name values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: update grafana container tag
Dimitri Savineau [Mon, 16 Mar 2020 21:52:30 +0000 (17:52 -0400)]
ceph-defaults: update grafana container tag

Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-facts: Fix system_secret_key variable handling
petruha [Mon, 16 Mar 2020 16:35:20 +0000 (17:35 +0100)]
ceph-facts: Fix system_secret_key variable handling

This commit fixes the system_secret_key variable not being substituted
with the right value; the literal string 'system_secret_key' was used
instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
5 years agorhcs_edits: Update grafana version
Boris Ranto [Mon, 16 Mar 2020 16:08:03 +0000 (17:08 +0100)]
rhcs_edits: Update grafana version

We are planning to release an updated grafana image for ceph dashboard in
RHCS 4.1. We need to update the rhcs edits to point to the new image
then.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107
Signed-off-by: Boris Ranto <branto@redhat.com>
5 years agoconfig: remove legacy option in ceph.conf.j2
Guillaume Abrioux [Mon, 16 Mar 2020 08:55:20 +0000 (09:55 +0100)]
config: remove legacy option in ceph.conf.j2

This option has been deprecated (as of 0.51).
In any case, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agohandler: add rgw multi-instances support
Dimitri Savineau [Thu, 12 Mar 2020 16:06:55 +0000 (17:06 +0100)]
handler: add rgw multi-instances support

This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorgw: add multi-instances support when deploying multisite
Guillaume Abrioux [Mon, 9 Mar 2020 10:05:01 +0000 (11:05 +0100)]
rgw: add multi-instances support when deploying multisite

This commit adds multi-instance support when deploying rgw multisite.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-infra: open radosgw ports for multi instances
Dimitri Savineau [Wed, 11 Mar 2020 02:41:27 +0000 (22:41 -0400)]
ceph-infra: open radosgw ports for multi instances

When using the radosgw multi instances configuration, the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
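
A minimal sketch of the iteration; the firewall zone variable is an
assumption, the actual task in ceph-infra may differ:

```yaml
- name: open radosgw ports for every instance
  firewalld:
    port: "{{ item.radosgw_frontend_port }}/tcp"
    zone: "{{ ceph_rgw_firewall_zone | default('public') }}"
    permanent: true
    immediate: true
    state: enabled
  with_items: "{{ rgw_instances }}"
```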

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge-container: clean legacy code
Guillaume Abrioux [Thu, 12 Mar 2020 11:22:02 +0000 (12:22 +0100)]
purge-container: clean legacy code

This commit removes a register which isn't used in this playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupdate osd pool set size command
Dimitri Savineau [Wed, 11 Mar 2020 00:50:55 +0000 (20:50 -0400)]
update osd pool set size command

Since [1] we can't use an osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.

[1] https://github.com/ceph/ceph/commit/21508bd
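
A minimal sketch of the resulting call, assuming mon_allow_pool_size_one
is already true in the ceph configuration:

```yaml
- name: set pool size to 1
  command: >
    ceph --cluster {{ cluster }} osd pool set {{ item.name }}
    size 1 --yes-i-really-mean-it
```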

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorgw: fix a typo in create_realm_zonegroup_zone_lists
Guillaume Abrioux [Tue, 10 Mar 2020 13:07:24 +0000 (14:07 +0100)]
rgw: fix a typo in create_realm_zonegroup_zone_lists

This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoinfra: add retries/until on firewalld start task
Guillaume Abrioux [Mon, 9 Mar 2020 09:40:54 +0000 (10:40 +0100)]
infra: add retries/until on firewalld start task

This commit makes that task retry 5 times to start the firewalld
service, to avoid failures like the following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```
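
A minimal sketch of the retry pattern; the retry count comes from the
message above, the delay value is illustrative:

```yaml
- name: start firewalld
  service:
    name: firewalld
    state: started
    enabled: true
  register: result
  retries: 5
  delay: 3
  until: result is succeeded
```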

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoopenstack_keys: use openstack_cinder_pool.name
Christian Berendt [Sat, 7 Mar 2020 23:11:45 +0000 (00:11 +0100)]
openstack_keys: use openstack_cinder_pool.name

Instead of the static string 'volumes', the openstack_cinder_pool.name
variable should be used, as with the other keys.

Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
5 years agotests/requirements: bump testinfra
Dimitri Savineau [Thu, 5 Mar 2020 14:52:56 +0000 (09:52 -0500)]
tests/requirements: bump testinfra

3.4 is the latest available testinfra release; python2 support is
dropped starting with 4.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodoc: Fixed a minor typo in the document
Nizamudeen [Fri, 6 Mar 2020 10:38:31 +0000 (16:08 +0530)]
doc: Fixed a minor typo in the document

In the Demo part of the document, both the Vagrant and bare-metal
sections were described as "Deployment from scratch on bare metal
machines". Changed "bare metal" to "vagrant" for the Vagrant section.

Signed-off-by: Nizamudeen <nia@redhat.com>
5 years agorgw: add retry/until on pools tasks
Guillaume Abrioux [Fri, 6 Mar 2020 07:06:37 +0000 (08:06 +0100)]
rgw: add retry/until on pools tasks

Sometimes these tasks can time out for some reason.
Adding these retries can help to avoid unexpected failures.
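
A minimal sketch of the retry/until pattern on a pool task; the pool
variable layout and retry values are illustrative:

```yaml
- name: create rgw pool
  command: >
    ceph --cluster {{ cluster }} osd pool create
    {{ item.key }} {{ item.value.pg_num }}
  register: rgw_pool_result
  retries: 5
  delay: 10
  until: rgw_pool_result is succeeded
  with_dict: "{{ rgw_create_pools }}"
```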

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: stop ceph-volume services
Dimitri Savineau [Thu, 5 Mar 2020 19:18:33 +0000 (14:18 -0500)]
filestore-to-bluestore: stop ceph-volume services

We only disable the ceph-osd services but not the ceph-volume lvm
services during the filestore to bluestore migration.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoclient: skip create_users_keys.yml when rolling_update
Guillaume Abrioux [Wed, 4 Mar 2020 15:33:46 +0000 (16:33 +0100)]
client: skip create_users_keys.yml when rolling_update

There's no need to run this part of the role when upgrading client
nodes. Let's skip it when rolling_update.yml is being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agorgw multisite: enable more than 1 realm per cluster
Ali Maredia [Fri, 4 Oct 2019 19:31:25 +0000 (19:31 +0000)]
rgw multisite: enable more than 1 realm per cluster

Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove the multisite destroy playbook
and add --cluster before radosgw-admin commands.

Remove the manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in each rgw's hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years agotests: add more osd nodes in all_daemons scenario
Guillaume Abrioux [Tue, 3 Mar 2020 18:01:27 +0000 (19:01 +0100)]
tests: add more osd nodes in all_daemons scenario

This commit adds more osd nodes in the all_daemons scenario in order to
test erasure pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: update ooo job
Guillaume Abrioux [Tue, 3 Mar 2020 14:06:40 +0000 (15:06 +0100)]
tests: update ooo job

This commit changes the value passed for the attribute 'rule_name' in
the openstack_pools definition. It doesn't make sense to pass an empty
string here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoosd: do not change pool size on erasure pool
Guillaume Abrioux [Tue, 3 Mar 2020 09:47:19 +0000 (10:47 +0100)]
osd: do not change pool size on erasure pool

This commit adds a condition in order to not try to customize the pool
size when its type is erasure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: add erasure pool creation test in CI
Guillaume Abrioux [Mon, 2 Mar 2020 16:05:18 +0000 (17:05 +0100)]
tests: add erasure pool creation test in CI

This commit makes the CI test erasure pool creation, following the
recent refactor of the OSD pool creation tasks in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: enable pg autoscaler on 1 pool
Guillaume Abrioux [Fri, 28 Feb 2020 17:38:38 +0000 (18:38 +0100)]
tests: enable pg autoscaler on 1 pool

This commit enables the pg autoscaler on 1 pool.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoosd: add pg autoscaler support
Guillaume Abrioux [Fri, 28 Feb 2020 15:03:15 +0000 (16:03 +0100)]
osd: add pg autoscaler support

This commit adds the pg autoscaler support.

The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:

```
test:
  name: "test"
  pg_num: "{{ osd_pool_default_pg_num }}"
  pgp_num: "{{ osd_pool_default_pg_num }}"
  rule_name: "replicated_rule"
  application: "rbd"
  type: 1
  erasure_profile: ""
  expected_num_objects: ""
  size: "{{ osd_pool_default_size }}"
  min_size: "{{ osd_pool_default_min_size }}"
  pg_autoscale_mode: False
  target_size_ratio: 0.1
```

when `pg_autoscale_mode` is `True` the user has to set a sensible value
in `target_size_ratio`.

Given that it's a new feature, it's still disabled by default.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoosd: refact osd pool creation
Guillaume Abrioux [Fri, 28 Feb 2020 10:41:00 +0000 (11:41 +0100)]
osd: refact osd pool creation

Currently, the command executed is wrong, eg:

```
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - volumes
  - '32'
  - '32'
  - replicated_rule
  - '1'
  delta: '0:00:01.625525'
  end: '2020-02-27 16:41:05.232705'
  item:
```

From documentation, the osd pool creation command is :

```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
     [crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure \
     [erasure-code-profile] [crush-rule-name] [expected_num_objects]
```

It means we pass '1' (from item.type) as the value for
`expected_num_objects` by default, which is very likely not what we want.

Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: add lvm batch filestore testing
Guillaume Abrioux [Tue, 3 Mar 2020 16:18:17 +0000 (17:18 +0100)]
tests: add lvm batch filestore testing

This commit adds an OSD node in the lvm-batch scenario in order to test
the filestore backend.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: increase journal_size value
Guillaume Abrioux [Tue, 3 Mar 2020 15:56:33 +0000 (16:56 +0100)]
tests: increase journal_size value

Looks like we are still seeing issue [1].
Let's increase this value to unlock the CI (however, it still needs to
be investigated).

Typical error (see [1] for further details) :
```
[root@osd2 ~]# ceph-volume --cluster ceph lvm batch --filestore --yes --journal-size '2048' /dev/sda /dev/sdb --journal-devices /dev/sdc
Running command: /sbin/vgcreate --force --yes ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78 /dev/sdc
 stdout: Physical volume "/dev/sdc" successfully created.
 stdout: Volume group "ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78" successfully created
--> Refusing to continue with configured size for journal
-->  RuntimeError: journal sizes must be larger than 2GB, detected: 1024.00 MB
```

[1] https://tracker.ceph.com/issues/41374

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agolibrary: fix bug in ceph_volume
Guillaume Abrioux [Tue, 3 Mar 2020 15:12:10 +0000 (16:12 +0100)]
library: fix bug in ceph_volume

This commit fixes a regression introduced by
0326d992c2b6b92a6d4b8ccf2a51d9343652da48.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-rbdmirror: fix presence after removal
Dimitri Savineau [Fri, 28 Feb 2020 16:37:14 +0000 (11:37 -0500)]
shrink-rbdmirror: fix presence after removal

We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoshrink-mgr: fix systemd condition
Dimitri Savineau [Wed, 26 Feb 2020 16:16:56 +0000 (11:16 -0500)]
shrink-mgr: fix systemd condition

This playbook was using the mds systemd condition.
Also a command task was using a pipeline, which is not allowed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotox: update shrink scenario configuration
Dimitri Savineau [Wed, 26 Feb 2020 16:03:32 +0000 (11:03 -0500)]
tox: update shrink scenario configuration

The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoshrink: don't use localhost node
Dimitri Savineau [Wed, 26 Feb 2020 15:20:05 +0000 (10:20 -0500)]
shrink: don't use localhost node

The ceph-facts are running on localhost so if this node is using a
different OS/release than the ceph node we can have a mismatch between
the docker/podman container binaries.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-validate: add key format validation
Dimitri Savineau [Fri, 28 Feb 2020 14:42:44 +0000 (09:42 -0500)]
ceph-validate: add key format validation

If the user manually provides the key value for a specific keyring then
there's no validation on the content, which could lead to unexpected
failures in the ceph_key module.

Closes: #5104
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge: stop rgw instances by iteration
Dimitri Savineau [Fri, 31 Jan 2020 15:42:10 +0000 (10:42 -0500)]
purge: stop rgw instances by iteration

It looks like the service module doesn't support wildcards anymore
for stopping/disabling multiple services.

fatal: [rgw0]: FAILED! => changed=false
  msg: 'This module does not currently support using glob patterns,
        found ''*'' in service name: ceph-radosgw@*'
...ignoring

Instead we should iterate over the rgw_instances list.
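
A minimal sketch of the iteration; the unit name layout follows the usual
ceph-ansible convention but is an assumption here:

```yaml
- name: stop ceph-radosgw instance units
  service:
    name: "ceph-radosgw@rgw.{{ ansible_hostname }}.{{ item.instance_name }}"
    state: stopped
    enabled: false
  with_items: "{{ rgw_instances }}"
```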

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-infra: install firewalld python bindings
Dimitri Savineau [Fri, 15 Nov 2019 15:37:27 +0000 (10:37 -0500)]
ceph-infra: install firewalld python bindings

When using the firewalld ansible module we need to be sure that the
python bindings are installed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-infra: split firewalld tasks
Dimitri Savineau [Fri, 15 Nov 2019 15:11:33 +0000 (10:11 -0500)]
ceph-infra: split firewalld tasks

Since ansible 2.9 the firewalld task can no longer use the service and
source parameters at the same time.
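
A minimal sketch of the split into two tasks; the zone and network
variables are illustrative:

```yaml
- name: open monitor service port
  firewalld:
    service: ceph-mon
    zone: "{{ ceph_mon_firewall_zone | default('public') }}"
    permanent: true
    immediate: true
    state: enabled

- name: allow the public network source
  firewalld:
    source: "{{ public_network }}"
    zone: "{{ ceph_mon_firewall_zone | default('public') }}"
    permanent: true
    immediate: true
    state: enabled
```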

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoAdd ansible 2.9 support
Dimitri Savineau [Fri, 1 Nov 2019 13:28:03 +0000 (09:28 -0400)]
Add ansible 2.9 support

This commit adds ansible 2.9 support in addition to 2.8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoosd: add journal option in ceph_volume call (batch)
Guillaume Abrioux [Fri, 28 Feb 2020 18:59:22 +0000 (19:59 +0100)]
osd: add journal option in ceph_volume call (batch)

This commit adds the journal option to the ceph_volume call when the
scenario is lvm batch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agorequirements: enforce ansible version requirement
Guillaume Abrioux [Thu, 27 Feb 2020 12:33:31 +0000 (13:33 +0100)]
requirements: enforce ansible version requirement

See https://github.com/advisories/GHSA-3m93-m4q6-mc6v

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agocommon: support OSDs with more than 2 digits
Guillaume Abrioux [Fri, 21 Feb 2020 09:22:32 +0000 (10:22 +0100)]
common: support OSDs with more than 2 digits

When running an environment with OSDs whose IDs have more than 2 digits,
some tasks don't match the systemd units and therefore the playbook can fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-osd: support shrinking ceph-disk prepared osds
Guillaume Abrioux [Wed, 19 Feb 2020 17:30:14 +0000 (18:30 +0100)]
shrink-osd: support shrinking ceph-disk prepared osds

This commit adds support for ceph-disk prepared osds.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-osd: don't run ceph-facts entirely
Guillaume Abrioux [Wed, 19 Feb 2020 12:51:49 +0000 (13:51 +0100)]
shrink-osd: don't run ceph-facts entirely

We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: reuse dedicated journal
Dimitri Savineau [Mon, 3 Feb 2020 20:03:17 +0000 (15:03 -0500)]
filestore-to-bluestore: reuse dedicated journal

If the filestore configuration was using a dedicated journal with either
a partition or a LV/VG then we need to reuse this for the bluestore DB.

When filestore is using a raw device then we shouldn't destroy
everything (data + journal) but only the data, otherwise the journal
partition won't exist anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodoc: update infra playbooks statements
Dimitri Savineau [Mon, 24 Feb 2020 18:58:38 +0000 (13:58 -0500)]
doc: update infra playbooks statements

We don't need to copy the infrastructure playbooks in the root
ceph-ansible directory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-rgw: increase connection timeout to 10
Dimitri Savineau [Thu, 20 Feb 2020 14:49:17 +0000 (09:49 -0500)]
ceph-rgw: increase connection timeout to 10

5s as a connection timeout could be low in some setups. Let's increase
it to 10s.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoConfigure ceph dashboard backend and dashboard_frontend_vip
Francesco Pantano [Wed, 12 Feb 2020 12:58:59 +0000 (13:58 +0100)]
Configure ceph dashboard backend and dashboard_frontend_vip

This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both the prometheus and alertmanager
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
5 years agoinfrastructure-playbooks: Run shrink-osd tasks on monitor
Benoît Knecht [Fri, 3 Jan 2020 09:38:20 +0000 (10:38 +0100)]
infrastructure-playbooks: Run shrink-osd tasks on monitor

Instead of running shrink-osd tasks on localhost and delegating most of
them to the first monitor, run all of them on the first monitor
directly.

This has the added advantage of becoming root on the monitor only, not
on localhost.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years agoceph-dashboard: update create/get rgw user tasks
Dimitri Savineau [Mon, 17 Feb 2020 20:46:54 +0000 (15:46 -0500)]
ceph-dashboard: update create/get rgw user tasks

Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separate tasks for the create and get operations,
but only for the multisite configuration, which isn't enough.
Instead we should do the get task first and, depending on the result,
execute the create.
This commit also adds missing run_once and delegate_to statements.

[1] https://github.com/ceph/ceph/commit/269e9b9
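
A minimal sketch of the get-then-create flow; the user id variable and
display name are illustrative:

```yaml
- name: get rgw dashboard user
  command: "radosgw-admin user info --uid={{ dashboard_rgw_api_user_id }}"
  register: rgw_dashboard_user
  failed_when: false
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: create rgw dashboard user
  command: >
    radosgw-admin user create --uid={{ dashboard_rgw_api_user_id }}
    --display-name='Ceph Dashboard'
  when: rgw_dashboard_user.rc != 0
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"
```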

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-rgw: allow SSL certificate content to supplied
Sam Choraria [Tue, 3 Dec 2019 12:23:13 +0000 (12:23 +0000)]
ceph-rgw: allow SSL certificate content to supplied

Allow SSL certificate & key contents to be written to the path
specified by radosgw_frontend_ssl_certificate. This permits a
certificate to be deployed and expired certificates to be renewed
through ceph-ansible.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
5 years agoceph-defaults: remove bootstrap_dirs_xxx vars
Dimitri Savineau [Wed, 12 Feb 2020 19:34:30 +0000 (14:34 -0500)]
ceph-defaults: remove bootstrap_dirs_xxx vars

Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't
used anymore in the code.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorgw: extend automatic rgw pool creation capability
Ali Maredia [Tue, 10 Sep 2019 22:01:48 +0000 (22:01 +0000)]
rgw: extend automatic rgw pool creation capability

Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-rgw-loadbalancer: Fix SSL newline issue
Florian Faltermeier [Wed, 18 Dec 2019 13:31:57 +0000 (14:31 +0100)]
ceph-rgw-loadbalancer: Fix SSL newline issue

The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This causes the "stats socket" and the "tune.ssl.default-dh-param"
parameters to be on the same line, resulting in haproxy failing to start.

[ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown
keyword 'tune.ssl.default-dh-param'. Registered
[ALERT] 351/140240 (21388) : Fatal errors found in configuration.

Fixes: #4869
Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
5 years agorgw: don't create user on secondary zones
Dimitri Savineau [Tue, 5 Nov 2019 16:32:06 +0000 (11:32 -0500)]
rgw: don't create user on secondary zones

The rgw user creation for the Ceph dashboard integration shouldn't be
created on secondary rgw zones.

Closes: #4707
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge-cluster: update package list to remove
Dimitri Savineau [Thu, 16 Jan 2020 21:58:35 +0000 (16:58 -0500)]
purge-cluster: update package list to remove

We only support python3 so we rename all ceph python packages.
Some ceph packages were missing from the list (ceph-mon, ceph-osd or
rbd-mirror) or don't exist anymore (ceph-fs-common, libcephfs1).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoRevert "vagrant: temp workaround for CentOS 8 cloud image"
Dimitri Savineau [Mon, 3 Feb 2020 16:15:33 +0000 (11:15 -0500)]
Revert "vagrant: temp workaround for CentOS 8 cloud image"

The CentOS 8 vagrant image download is now fixed.

This reverts commit a5385e104884a3692954e4691f3348847a35c7fa.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoThe _filtered_clients list should intersect with ansible_play_batch
John Fulton [Thu, 6 Feb 2020 02:23:54 +0000 (21:23 -0500)]
The _filtered_clients list should intersect with ansible_play_batch

Client configuration with --limit fails without this patch
because certain tasks are only done on the first host in the
_filtered_clients list and it's likely that first host will
not be included in what's specified with --limit. To fix this
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.
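
A minimal sketch of the intersection; the group name follows ceph-ansible
conventions:

```yaml
- name: build _filtered_clients from clients in the running play
  set_fact:
    _filtered_clients: "{{ groups[client_group_name] | intersect(ansible_play_batch) }}"
  run_once: true
```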

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781
Signed-off-by: John Fulton <fulton@redhat.com>
5 years agotests: don't install s3cmd on containerized setup
Dimitri Savineau [Fri, 7 Feb 2020 20:58:17 +0000 (15:58 -0500)]
tests: don't install s3cmd on containerized setup

The s3cmd package should only be installed on non-containerized
deployments.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-iscsi: don't use ceph_dev_xxx variables
Dimitri Savineau [Thu, 6 Feb 2020 21:31:39 +0000 (16:31 -0500)]
ceph-iscsi: don't use ceph_dev_xxx variables

Using ceph_dev_branch and ceph_dev_sha1 for configuring ceph-iscsi
repositories from shaman doesn't make sense because the ceph devel
branches and sha1 aren't compatible with ceph-iscsi devel.
Instead we could rely on the master branch and the latest sha1.
Currently it's not possible to use a custom ceph branch/sha1 value
with the iscsi setup, otherwise the repository setup will fail.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-nfs: fix ceph_nfs_ceph_user variable
Dimitri Savineau [Mon, 10 Feb 2020 16:06:48 +0000 (11:06 -0500)]
ceph-nfs: fix ceph_nfs_ceph_user variable

The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in the ceph-client role.
6a6785b introduced a confusion between both variable types in the ceph-nfs
role for external ceph with ganesha.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-nfs: add nfs-ganesha-rados-urls package
Dimitri Savineau [Thu, 6 Feb 2020 20:41:46 +0000 (15:41 -0500)]
ceph-nfs: add nfs-ganesha-rados-urls package

Since nfs-ganesha 2.8.3 the rados-urls library has been moved to a
dedicated package.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-{mon,osd}: move default crush variables
Dimitri Savineau [Mon, 10 Feb 2020 18:43:31 +0000 (13:43 -0500)]
ceph-{mon,osd}: move default crush variables

Since ed36a11 we moved the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep backward compatibility we kept the possibility to set the
crush variables on the mons side but we didn't move the default values.
As a result, when using crush_rule_config set to true and wanting to use
the default values for crush_rules, the crush rule ansible task
creation will fail.

"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"

This patch moves the default crush variables from ceph-mon to the ceph-osd
role but also uses those default values when nothing is defined on the
mons side.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-grafana: fix grafana_{crt,key} condition
Dimitri Savineau [Wed, 12 Feb 2020 15:38:25 +0000 (10:38 -0500)]
ceph-grafana: fix grafana_{crt,key} condition

The grafana_{crt,key} variables aren't booleans but strings. The default
value is an empty string so we should do the conditional on the string
length instead of the bool filter.
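
A minimal sketch of the condition; the task name and included file are
illustrative, not the role's actual layout:

```yaml
- name: configure grafana TLS
  include_tasks: configure_grafana_tls.yml
  when:
    - grafana_crt | length > 0
    - grafana_key | length > 0
```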

Closes: #5053
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-prometheus: add alertmanager HA config
Dimitri Savineau [Thu, 13 Feb 2020 20:56:23 +0000 (15:56 -0500)]
ceph-prometheus: add alertmanager HA config

When using multiple alertmanager nodes (via the grafana-server group)
we need to specify the other peers in the configuration.

https://prometheus.io/docs/alerting/alertmanager/#high-availability
https://github.com/prometheus/alertmanager#high-availability

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792225
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocontainers: add KillMode=none to systemd templates
Dimitri Savineau [Tue, 11 Feb 2020 15:09:51 +0000 (10:09 -0500)]
containers: add KillMode=none to systemd templates

Because we are relying on docker|podman for managing containers, we
don't need systemd to manage the processes (e.g. kill them).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodashboard: allow configuring multiple grafana host
Dimitri Savineau [Mon, 27 Jan 2020 19:47:00 +0000 (14:47 -0500)]
dashboard: allow configuring multiple grafana host

When using multiple grafana hosts we set the grafana and
prometheus URLs and push the dashboard layout to a single node.

grafana_server_addrs is the list of all grafana nodes and used during
the ceph-dashboard role (on mgr/mon nodes).
grafana_server_addr is the current grafana node used during the
ceph-grafana and ceph-prometheus role (on grafana-server nodes).

This way we don't duplicate the grafana_server_addr fact code between
external and collocated nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoswitch_to_containers: increase health check values
Guillaume Abrioux [Thu, 30 Jan 2020 10:33:38 +0000 (11:33 +0100)]
switch_to_containers: increase health check values

This commit increases the default values for the health check variables
consumed in the switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables into the `ceph-defaults` role so the user
can set different values if needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoRevert "rhcs: update container image name"
Dimitri Savineau [Wed, 5 Feb 2020 00:51:13 +0000 (19:51 -0500)]
Revert "rhcs: update container image name"

This wasn't necessary. The container image was fixed on the
Red Hat registry.

This reverts commit 3bd250c7422a6eaaffb2d8c6a4750b232e6b1c7e.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorhcs: update container image name
Dimitri Savineau [Tue, 4 Feb 2020 20:19:53 +0000 (15:19 -0500)]
rhcs: update container image name

The RHCS 4 container image is rhceph/rhceph-4

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797743
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>