]> git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
5 years agoceph_key: fix idempotency when no secret is passed
Guillaume Abrioux [Fri, 3 Apr 2020 08:24:32 +0000 (10:24 +0200)]
ceph_key: fix idempotency when no secret is passed

553584cbd0d014429e665f998776e8d198f72d2b introduced a regression when no
secret is passed, it overwrites the secret each time the task is run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 003defec0311af0f03da861f80d596852bdb9cf5)

5 years agotox: replace testinfra by pytest for add-mgrs
Dimitri Savineau [Thu, 2 Apr 2020 20:26:48 +0000 (16:26 -0400)]
tox: replace testinfra by pytest for add-mgrs

The add-mgrs scenario is still using the testinfra command instead of
pytest so the tests exectution are failling.

ERROR: InvocationError for command could not find executable testinfra

This also adds the missing --ssh-config option to testinfra.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 92f538f1af56c8f4cd28a4409e32397efbc77e52)

5 years agodocker2podman: call `container_options_facts.yml` on osd nodes
Guillaume Abrioux [Wed, 1 Apr 2020 12:20:05 +0000 (14:20 +0200)]
docker2podman: call `container_options_facts.yml` on osd nodes

We must call `ceph-osd` role from `container_options_facts.yml` because
ceph-osd-run.sh.j2 needs variables set in this file.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1819681
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a4f54f6eeaac99f04faad82259129a906f4ccb9)

5 years agoceph_key: remove 'update' state
Guillaume Abrioux [Tue, 17 Mar 2020 14:34:11 +0000 (15:34 +0100)]
ceph_key: remove 'update' state

With this change, the state `present` is enough to update a keyring.
If the keyring already exist, it will be updated if caps or secret
passed to the module are different.
If the keyring doen't exist, it will be created.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808367
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 553584cbd0d014429e665f998776e8d198f72d2b)

5 years agoosd: use default crush rule name when needed
Guillaume Abrioux [Fri, 27 Mar 2020 15:21:09 +0000 (16:21 +0100)]
osd: use default crush rule name when needed

When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1817586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bb9860dfd32a79587f053a417a841a57b6c0192)

5 years agotests: add more coverage in external_clients scenario
Guillaume Abrioux [Fri, 27 Mar 2020 16:56:26 +0000 (17:56 +0100)]
tests: add more coverage in external_clients scenario

Run create_users_keys.yml in external_clients scenario

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c1c34b20157f91910ec9052dabcc543e30c2035)

5 years agoosd: support changing default rule even when osd_crush_location isn't defined
Guillaume Abrioux [Thu, 12 Mar 2020 11:14:01 +0000 (12:14 +0100)]
osd: support changing default rule even when osd_crush_location isn't defined

Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816989
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b0476385ccb00a9edb9092a183c18e2637afd5d)

5 years agopurge-container: get *all* osds id
Guillaume Abrioux [Tue, 31 Mar 2020 11:59:23 +0000 (13:59 +0200)]
purge-container: get *all* osds id

Adding `--all` to the `systemctl list-units` command in order to get
*all* osds id on the node (including stoppped osds). Otherwise, it will
purge the cluster but there will be leftover after that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814542
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5e7962ccf6c9fa35f5611888826131c4e9e8f043)

5 years agoFixes for Makefile
Javier Pena [Fri, 8 Nov 2019 09:38:32 +0000 (10:38 +0100)]
Fixes for Makefile

- Set default mock configuration to epel-8-x86_64, to match the
  default dist value.
- Add support for alpha tags, like the recently added v5.0.0alpha1

Signed-off-by: Javier Pena <jpena@redhat.com>
(cherry picked from commit 19a43ff26134b1a59b4cd74cc27364ab754f3d03)

5 years agoAllow setting dist and mock configuration in Makefile
Javier Pena [Wed, 23 Oct 2019 14:45:41 +0000 (16:45 +0200)]
Allow setting dist and mock configuration in Makefile

Curently, the dist and mock configurations are hardcoded in the
Makefile to be el8 and epel-7-x86_64, respectively. This commit
allows the user to override those settings using the DIST and
MOCK_CONFIG environment variables, falling back to the current
defaults if not set.

This provides additional flexibility when building the RPM directly
from the repository.

Signed-off-by: Javier Peña <jpena@redhat.com>
(cherry picked from commit ed8341568bd4f25243bed7eecb3c0dfa7f3e007b)

5 years agoceph_volume: fix multiple db/wal devices
Dimitri Savineau [Fri, 27 Mar 2020 21:16:41 +0000 (17:16 -0400)]
ceph_volume: fix multiple db/wal devices

When using the lvm batch ceph-volume subcommand with dedicated devices
for bluestore (db/wal) then the list of devices is convert to a string
instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.

TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] **
fatal: [xxxxxx]: FAILED! => changed=true
  cmd:
  - ceph-volume
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - --osds-per-device
  - '4'
  - /dev/nvme2n1
  - /dev/nvme3n1
  - /dev/nvme4n1
  - /dev/nvme5n1
  - /dev/nvme6n1
  - --db-devices
  - /dev/nvme0n1 /dev/nvme1n1
  - --report
  - --format=json
  msg: non-zero return code
  rc: 2
  stderr: |2-
     stderr: lsblk: /dev/nvme0n1 /dev/nvme1n1: not a block device
     stderr: error: /dev/nvme0n1 /dev/nvme1n1: No such file or directory
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
                                 [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
                                 [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
                                 [--no-auto] [--bluestore] [--filestore]
                                 [--report] [--yes] [--format {json,pretty}]
                                 [--dmcrypt]
                                 [--crush-device-class CRUSH_DEVICE_CLASS]
                                 [--no-systemd]
                                 [--osds-per-device OSDS_PER_DEVICE]
                                 [--block-db-size BLOCK_DB_SIZE]
                                 [--block-wal-size BLOCK_WAL_SIZE]
                                 [--journal-size JOURNAL_SIZE] [--prepare]
                                 [--osd-ids [OSD_IDS [OSD_IDS ...]]]
                                 [DEVICES [DEVICES ...]]
    ceph-volume lvm batch: error: Unable to proceed with non-existing device: /dev/nvme0n1 /dev/nvme1n1

So the dedicated device list is considered as a single string.

This commit also adds the block_db_devices and wal_devices documentation
to the ceph_volume module.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1816713
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 760b6cd7b0b671d63e2d6ea32ec336edd2f8cd32)

5 years agoceph-defaults: update container tag for nautilus
Dimitri Savineau [Fri, 27 Mar 2020 19:53:07 +0000 (15:53 -0400)]
ceph-defaults: update container tag for nautilus

The latest Ceph stable release is now Octopus so the "latest" container
image tag is pointing to Octopus and not Nautilus anymore.
This commit updates the ceph_docker_image_tag with "latest-nautilus".

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorhcs: drop debian support
Dimitri Savineau [Thu, 26 Mar 2020 21:39:09 +0000 (17:39 -0400)]
rhcs: drop debian support

Support for debian with RHCS has been dropped starting RHCS 4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 4ac99223b2dff5cf264e1b1632bf89583bff3a25)

5 years agoAdd a release note
Dimitri Savineau [Thu, 26 Mar 2020 21:56:33 +0000 (17:56 -0400)]
Add a release note

This adds a release not for the Ceph Nautilus release used in the
stable-4.0 branch.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: update ganesha to 2.8
Dimitri Savineau [Thu, 26 Mar 2020 14:57:03 +0000 (10:57 -0400)]
ceph-defaults: update ganesha to 2.8

With Ceph Nautilus release we should use nfs-ganesha 2.8

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodefaults: remove legacy comment
Guillaume Abrioux [Thu, 26 Mar 2020 06:38:36 +0000 (07:38 +0100)]
defaults: remove legacy comment

This is no longer true, let's remove this comment given that this option
is not ignored in containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e551b5ba1a65e653540b5b5c7cb3a4f5d32b2540)

5 years agotests: add inventory host for 5.0 upgrade job
Guillaume Abrioux [Thu, 26 Mar 2020 10:20:47 +0000 (11:20 +0100)]
tests: add inventory host for 5.0 upgrade job

This inventory is intended to be used in the upgrade scenario in
stable-5.0 branch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-facts: fix rgw_instances_all fact
Dimitri Savineau [Tue, 24 Mar 2020 21:11:44 +0000 (17:11 -0400)]
ceph-facts: fix rgw_instances_all fact

The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0487d21938c91d4f3a48cb7157aa9b9a00f21f8f)

5 years agoceph-defaults: regenerate group_vars samples v4.0.17
Dimitri Savineau [Mon, 16 Dec 2019 20:19:35 +0000 (15:19 -0500)]
ceph-defaults: regenerate group_vars samples

In fc02fc9 the group_vars samples have been generated but only for
monitor_address variable not radosgw_address.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b46839bad0732486db8a8c698f5963716c3b2b5b)

5 years agonfs: fix nfs with external ceph cluster support
Guillaume Abrioux [Thu, 19 Mar 2020 19:44:20 +0000 (20:44 +0100)]
nfs: fix nfs with external ceph cluster support

This commit refact and fix the nfs deployment with external ceph cluster
support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cc28d9ec2669f14fa2e75627093aa65905af969f)

5 years agodashboard: allow to set read-only admin user
Dimitri Savineau [Wed, 18 Mar 2020 14:53:40 +0000 (10:53 -0400)]
dashboard: allow to set read-only admin user

This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fb69f6990ce0bf4c9cd4caf9ce7a29e15ab07cfd)

5 years agoceph-defaults: update grafana container tag
Dimitri Savineau [Mon, 16 Mar 2020 21:52:30 +0000 (17:52 -0400)]
ceph-defaults: update grafana container tag

Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b97a4d520158319a12c06ac169f645f733e47cd9)

5 years agorhcs_edits: Update grafana version
Boris Ranto [Mon, 16 Mar 2020 16:08:03 +0000 (17:08 +0100)]
rhcs_edits: Update grafana version

We are planning to release updated grafana image for ceph dashboard in
RHCS 4.1. We need to update the rhcs edut to point to the new image
then.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 8e8aa735e0193786a11e36c84e3dfeb5ec50e95f)

5 years agoceph-facts: Fix system_secret_key variable handling
petruha [Mon, 16 Mar 2020 16:35:20 +0000 (17:35 +0100)]
ceph-facts: Fix system_secret_key variable handling

This commit fixes the system_secret_key variable not substitued by the
right value and always using the 'system_secret_key' string instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
(cherry picked from commit 73b3fadb0ed4cd9eba62669c189fa76178431f07)

5 years agoconfig: remove legacy option in ceph.conf.j2
Guillaume Abrioux [Mon, 16 Mar 2020 08:55:20 +0000 (09:55 +0100)]
config: remove legacy option in ceph.conf.j2

This option has been deprecated (As of 0.51).
By the way, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 152c2caa9f8e2d250f5d11db7543c951b2ee5d4d)

5 years agodoc: update infra playbooks statements
Dimitri Savineau [Mon, 24 Feb 2020 18:58:38 +0000 (13:58 -0500)]
doc: update infra playbooks statements

We don't need to copy the infrastructure playbooks in the root
ceph-ansible directory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 195944b123c0dc6780f3e3ede070f2ad9235f54d)

5 years agohandler: add rgw multi-instances support v4.0.16
Dimitri Savineau [Thu, 12 Mar 2020 16:06:55 +0000 (17:06 +0100)]
handler: add rgw multi-instances support

This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 3626c688cfb655ef2ab31a68dd9aedc806ad9038)

5 years agorgw: add multi-instances support when deploying multisite
Guillaume Abrioux [Mon, 9 Mar 2020 10:05:01 +0000 (11:05 +0100)]
rgw: add multi-instances support when deploying multisite

This commit adds the multi-instances when deploying rgw multisite

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 60a2e28189758c239e543d360fef8e66e70e7bf7)

5 years agoceph-infra: open radosgw ports for multi instances
Dimitri Savineau [Wed, 11 Mar 2020 02:41:27 +0000 (22:41 -0400)]
ceph-infra: open radosgw ports for multi instances

When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e8bf0a0cf2fdd9d02e442b6778b8b3f76a1c9473)

5 years agorgw: fix a typo in create_realm_zonegroup_zone_lists
Guillaume Abrioux [Tue, 10 Mar 2020 13:07:24 +0000 (14:07 +0100)]
rgw: fix a typo in create_realm_zonegroup_zone_lists

This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3bbd6bb774de5281f117efd146557ca691aa69c)

5 years agoinfra: add retries/until on firewalld start task
Guillaume Abrioux [Mon, 9 Mar 2020 09:40:54 +0000 (10:40 +0100)]
infra: add retries/until on firewalld start task

This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3d943fe9f1edc2f62af5c147997ef8378c82d84)

5 years agofilestore-to-bluestore: stop ceph-volume services
Dimitri Savineau [Thu, 5 Mar 2020 19:18:33 +0000 (14:18 -0500)]
filestore-to-bluestore: stop ceph-volume services

We only disable the ceph-osd services but not the ceph-volume lvm
services during the filestore to bluestore migration.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 38a683e5bf18e0e8ae45ee004715cdb0fae1e09d)

5 years agofilestore-to-bluestore: reuse dedicated journal
Dimitri Savineau [Mon, 3 Feb 2020 20:03:17 +0000 (15:03 -0500)]
filestore-to-bluestore: reuse dedicated journal

If the filestore configuration was using a dedicated journal with either
a partition or a LV/VG then we need to reuse this for bluestore DB.

When filestore is using a raw devices then we shouldn't destroy
everything (data + journal) but only data otherwise the journal
partition won't exist anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 535da53d69bdb3702acfcc23167a5f6a872811ba)

5 years agotests/requirements: bump testinfra
Dimitri Savineau [Thu, 5 Mar 2020 14:52:56 +0000 (09:52 -0500)]
tests/requirements: bump testinfra

3.4 is the latest testinfra release available but python2 is dropped
starting 4.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ccec67aa6a5e1a36f9a22b784baf0cefe0b1d002)

5 years agorgw: add retry/until on pools tasks
Guillaume Abrioux [Fri, 6 Mar 2020 07:06:37 +0000 (08:06 +0100)]
rgw: add retry/until on pools tasks

Sometimes, these task can timeout for some reason.
Adding these retries can help to avoid unexcepted failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7a8a719e7568f7c3b9936230e506e37869007fd6)

5 years agotests: fix update scenario
Guillaume Abrioux [Wed, 4 Mar 2020 21:52:24 +0000 (22:52 +0100)]
tests: fix update scenario

Update this scenario due to recent changes.
e361c0ff94de34dd0d627ffe6e6dd616f41c26e3 added more nodes in order to
cover code erasure pool creation in the CI. Therefore, we must add more
nodes in the 3.2 deployment too.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoclient: skip create_users_keys.yml when rolling_update
Guillaume Abrioux [Wed, 4 Mar 2020 15:33:46 +0000 (16:33 +0100)]
client: skip create_users_keys.yml when rolling_update

There's no need to run this part of the role when upgrading clients
node. Let's skip it when rolling_update.yml is being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eac207091b0574e16084c62e71372b510f25de6b)

5 years agotests: add more osd nodes in all_daemons scenario
Guillaume Abrioux [Tue, 3 Mar 2020 18:01:27 +0000 (19:01 +0100)]
tests: add more osd nodes in all_daemons scenario

This commit adds more osd nodes in all_daemons scenario in order to test
erasure pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f0c6df94fabdfcd177603dfa72aadbbb5c6539a)

5 years agotests: update ooo job
Guillaume Abrioux [Tue, 3 Mar 2020 14:06:40 +0000 (15:06 +0100)]
tests: update ooo job

This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 248978596a1b444e1f49e5083a58195474925e0a)

5 years agoosd: do not change pool size on erasure pool
Guillaume Abrioux [Tue, 3 Mar 2020 09:47:19 +0000 (10:47 +0100)]
osd: do not change pool size on erasure pool

This commit adds condition in order to not try to customize pools size
when its type is erasure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e17c79b871600b5488148a32c994e888fff0919f)

5 years agotests: add erasure pool creation test in CI
Guillaume Abrioux [Mon, 2 Mar 2020 16:05:18 +0000 (17:05 +0100)]
tests: add erasure pool creation test in CI

This commit makes the CI testing an OSD pool erasure creation due to the
recent refact of the OSD pool creation tasks in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8cacba1f544216720f9b555bbf8265ff41f4176c)

5 years agotests: enable pg autoscaler on 1 pool
Guillaume Abrioux [Fri, 28 Feb 2020 17:38:38 +0000 (18:38 +0100)]
tests: enable pg autoscaler on 1 pool

This commit enables the pg autoscaler on 1 pool.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a3b797e059cdd69be74841c652cdcaaa9862a169)

5 years agoosd: add pg autoscaler support
Guillaume Abrioux [Fri, 28 Feb 2020 15:03:15 +0000 (16:03 +0100)]
osd: add pg autoscaler support

This commit adds the pg autoscaler support.

The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:

```
test:
  name: "test"
  pg_num: "{{ osd_pool_default_pg_num }}"
  pgp_num: "{{ osd_pool_default_pg_num }}"
  rule_name: "replicated_rule"
  application: "rbd"
  type: 1
  erasure_profile: ""
  expected_num_objects: ""
  size: "{{ osd_pool_default_size }}"
  min_size: "{{ osd_pool_default_min_size }}"
  pg_autoscale_mode: False
  target_size_ratio": 0.1
```

when `pg_autoscale_mode` is `True` user has to set a decent value in
`target_size_ratio`.

Given that it's a new feature, it's still disabled by default.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47adc2bb08b18845c34d90bd0dafb43298e6bee5)

5 years agoosd: refact osd pool creation
Guillaume Abrioux [Fri, 28 Feb 2020 10:41:00 +0000 (11:41 +0100)]
osd: refact osd pool creation

Currently, the command executed is wrong, eg:

```
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - volumes
  - '32'
  - '32'
  - replicated_rule
  - '1'
  delta: '0:00:01.625525'
  end: '2020-02-27 16:41:05.232705'
  item:
```

From documentation, the osd pool creation command is :

```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
     [crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure \
     [erasure-code-profile] [crush-rule-name] [expected_num_objects]
```

it means we pass '1' (from item.type) as value for
`expected_num_objects` by default which is very likely not what we want.

Also, this commit modifies the default value when no `rule_name` is set
to use the existing variable `osd_pool_default_crush_rule`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bf1f125d718c386264f19ce82cae5060ac166ce6)

5 years agorgw multisite: enable more than 1 realm per cluster
Ali Maredia [Fri, 4 Oct 2019 19:31:25 +0000 (19:31 +0000)]
rgw multisite: enable more than 1 realm per cluster

Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands

remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 71f55bd54d923f610e451e11828378c14f1ec1cc)

5 years agoshrink-rbdmirror: fix presence after removal
Dimitri Savineau [Fri, 28 Feb 2020 16:37:14 +0000 (11:37 -0500)]
shrink-rbdmirror: fix presence after removal

We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d1316ce77b796ce969c2fd36c975baca3ffe0bc9)

5 years agoshrink-mgr: fix systemd condition
Dimitri Savineau [Wed, 26 Feb 2020 16:16:56 +0000 (11:16 -0500)]
shrink-mgr: fix systemd condition

This playbook was using mds systemd condition.
Also a command task was using pipeline which is not allowed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a664159061d377d3dc03703dc3927a4d022a6b97)

5 years agotox: update shrink scenario configuration
Dimitri Savineau [Wed, 26 Feb 2020 16:03:32 +0000 (11:03 -0500)]
tox: update shrink scenario configuration

The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f4413f5ce620d736db895862c8023134736ec85)

5 years agoshrink: don't use localhost node
Dimitri Savineau [Wed, 26 Feb 2020 15:20:05 +0000 (10:20 -0500)]
shrink: don't use localhost node

The ceph-facts are running on localhost so if this node is using a
different OS/release that the ceph node we can have a mismatch between
docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 08ac2e30340e5ed6c69a74e538a109731c5e6652)

5 years agopurge: stop rgw instances by iteration
Dimitri Savineau [Fri, 31 Jan 2020 15:42:10 +0000 (10:42 -0500)]
purge: stop rgw instances by iteration

It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.

fatal: [rgw0]: FAILED! => changed=false
  msg: 'This module does not currently support using glob patterns,
        found ''*'' in service name: ceph-radosgw@*'
...ignoring

Instead we should iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9d3b49293d785b9b2b6bb902bce0ae8bce6b4e6e)

5 years agoceph-infra: install firewalld python bindings
Dimitri Savineau [Fri, 15 Nov 2019 15:37:27 +0000 (10:37 -0500)]
ceph-infra: install firewalld python bindings

When using the firewalld ansible module we need to be sure that the
python bindings are installed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 90b1fc8fe94a5349ca6f2e42f578636a877dc5d3)

5 years agoceph-infra: split firewalld tasks
Dimitri Savineau [Fri, 15 Nov 2019 15:11:33 +0000 (10:11 -0500)]
ceph-infra: split firewalld tasks

Since ansible 2.9 the firewalld task could not be used with service and
source in the same time anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 45fb9241c06ae46486c0fb9664decfaa87600acf)

5 years agoAdd ansible 2.9 support
Dimitri Savineau [Fri, 1 Nov 2019 13:28:03 +0000 (09:28 -0400)]
Add ansible 2.9 support

This commit adds ansible 2.9 support in addition of 2.8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit aefba82a2e2ebecd24eb743f687469d070246ca2)

5 years agoceph-validate: start with ansible version test
Dimitri Savineau [Fri, 6 Dec 2019 21:11:51 +0000 (16:11 -0500)]
ceph-validate: start with ansible version test

It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1a77dd7e914bc96614aae0014b39e3dbe5ff526a)

5 years agocommon: support OSDs with more than 2 digits
Guillaume Abrioux [Fri, 21 Feb 2020 09:22:32 +0000 (10:22 +0100)]
common: support OSDs with more than 2 digits

When running environment with OSDs having ID with more than 2 digits,
some tasks don't match the system units and therefore, playbook can fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a084a2a3470887bfaa3616acac1052664b52e12a)

5 years agorequirements: enforce ansible version requirement
Guillaume Abrioux [Thu, 27 Feb 2020 12:33:31 +0000 (13:33 +0100)]
requirements: enforce ansible version requirement

See https://github.com/advisories/GHSA-3m93-m4q6-mc6v

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a2d2e70ac26b5e3a9dbb3acbf5bbbc2a04defbc2)

5 years agoshrink-osd: support shrinking ceph-disk prepared osds
Guillaume Abrioux [Wed, 19 Feb 2020 17:30:14 +0000 (18:30 +0100)]
shrink-osd: support shrinking ceph-disk prepared osds

This commit adds the ceph-disk prepared osds support

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1de2bf99919af771dbd34f5cc768a1ba12fcffc8)

5 years agoshrink-osd: don't run ceph-facts entirely
Guillaume Abrioux [Wed, 19 Feb 2020 12:51:49 +0000 (13:51 +0100)]
shrink-osd: don't run ceph-facts entirely

We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 55970b18f1506ee581ef6f2f8438a023f0f01807)

5 years agoinfrastructure-playbooks: Run shrink-osd tasks on monitor
Benoît Knecht [Fri, 3 Jan 2020 09:38:20 +0000 (10:38 +0100)]
infrastructure-playbooks: Run shrink-osd tasks on monitor

Instead of running shring-osd tasks on localhost and delegating most of
them to the first monitor, run all of them on the first monitor
directly.

This has the added advantage of becoming root on the monitor only, not
on localhost.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 8b3df4e4180c90141b3fc8ca25573171eb45bb0a)

5 years agoConfigure ceph dashboard backend and dashboard_frontend_vip
Francesco Pantano [Wed, 12 Feb 2020 12:58:59 +0000 (13:58 +0100)]
Configure ceph dashboard backend and dashboard_frontend_vip

This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit 15ed9eebf15c631b33c9de54f361e3d8ae993b3d)

5 years agoceph-rgw: increase connection timeout to 10
Dimitri Savineau [Thu, 20 Feb 2020 14:49:17 +0000 (09:49 -0500)]
ceph-rgw: increase connection timeout to 10

5s as a connection timeout could be low in some setup. Let's increase
it to 10s.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 44e750ee5d2e3a7a45d02d76f9adba5895d56667)

5 years agocontainers: add KillMode=none to systemd templates
Dimitri Savineau [Tue, 11 Feb 2020 15:09:51 +0000 (10:09 -0500)]
containers: add KillMode=none to systemd templates

Because we are relying on docker|podman for managing containers then we
don't need systemd to manage the process (like kill).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5a03e0ee1c840e9632b21e1fd8f8d88c9b736d8b)

5 years agorgw: extend automatic rgw pool creation capability v4.0.15
Ali Maredia [Tue, 10 Sep 2019 22:01:48 +0000 (22:01 +0000)]
rgw: extend automatic rgw pool creation capability

Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1834c1e48de4627ac9b12f7d84691080c7fd8c7a)

5 years agoceph-rgw-loadbalancer: Fix SSL newline issue
Florian Faltermeier [Wed, 18 Dec 2019 13:31:57 +0000 (14:31 +0100)]
ceph-rgw-loadbalancer: Fix SSL newline issue

The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This cause the "stats socket" and the "tune.ssl.default-dh-param"
parameters to be on the same line resulting haproxy failing to start.

[ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown
keyword 'tune.ssl.default-dh-param'. Registered
[ALERT] 351/140240 (21388) : Fatal errors found in configuration.

Fixes: #4869
Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
(cherry picked from commit 9d081e2453285599b78ade93ff9617a28b3d1f02)

5 years agoceph-defaults: remove bootstrap_dirs_xxx vars
Dimitri Savineau [Wed, 12 Feb 2020 19:34:30 +0000 (14:34 -0500)]
ceph-defaults: remove bootstrap_dirs_xxx vars

Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't
used anymore in the code.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c644ea904108b302a40f3ced77cf5a6ccaee6fa4)

5 years agorgw: don't create user on secondary zones
Dimitri Savineau [Tue, 5 Nov 2019 16:32:06 +0000 (11:32 -0500)]
rgw: don't create user on secondary zones

The rgw user creation for the Ceph dashboard integration shouldn't be
created on secondary rgw zones.

Closes: #4707
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 16e12bf2bbf645d78063ba7d0b7a89b2348e56d1)

5 years agoceph-{mon,osd}: move default crush variables
Dimitri Savineau [Mon, 10 Feb 2020 18:43:31 +0000 (13:43 -0500)]
ceph-{mon,osd}: move default crush variables

Since ed36a11 we move the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep the backward compatibility we kept the possibility to set the
crush variables on the mons side but we didn't move the default values.
As a result, when using crush_rule_config set to true and wanted to use
the default values for crush_rules then the crush rule ansible task
creation will fail.

"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"

This patch move the default crush variables from ceph-mon to ceph-osd
role but also use those default values when nothing is defined on the
mons side.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fc6b337142efdc76c10340c076653d298e11c68)

5 years agoceph-grafana: fix grafana_{crt,key} condition
Dimitri Savineau [Wed, 12 Feb 2020 15:38:25 +0000 (10:38 -0500)]
ceph-grafana: fix grafana_{crt,key} condition

The grafana_{crt,key} aren't boolean variables but strings. The default
value is an empty string so we should do the conditional on the string
length instead of the bool filter

Closes: #5053
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 15bd4cd189d0c4009bcd9dd80b296492e336661e)

5 years agoceph-prometheus: add alertmanager HA config
Dimitri Savineau [Thu, 13 Feb 2020 20:56:23 +0000 (15:56 -0500)]
ceph-prometheus: add alertmanager HA config

When using multiple alertmanager nodes (via the grafana-server group)
then we need to specify the other peers in the configuration.

https://prometheus.io/docs/alerting/alertmanager/#high-availability
https://github.com/prometheus/alertmanager#high-availability

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792225
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b9d975385c2dceca3b06c18d4c37eadbe9f48c92)

5 years agoThe _filtered_clients list should intersect with ansible_play_batch
John Fulton [Thu, 6 Feb 2020 02:23:54 +0000 (21:23 -0500)]
The _filtered_clients list should intersect with ansible_play_batch

Client configuration with --limit fails without this patch
because certain tasks are only done to the first host in the
_filtered_clients list and it's likely that first host will
not be included in what's sepcified with --limit. To fix this
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit e4bf4857f556465c60f89d32d5f2a92d25d5c90f)

5 years agoceph-nfs: add nfs-ganesha-rados-urls package
Dimitri Savineau [Thu, 6 Feb 2020 20:41:46 +0000 (15:41 -0500)]
ceph-nfs: add nfs-ganesha-rados-urls package

Since nfs-ganesha 2.8.3 the rados-urls library has been move to a
dedicated package.
We don't have the same nfs-ganesha 2.8.x between the community and rhcs
repositories.

community: 2.8.1
rhcs: 2.8.3

As a workaround we will install that package only for rhcs setup.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 0a3e85e8cabf69a68329c749db277f9527cfc053)

5 years agoceph-nfs: fix ceph_nfs_ceph_user variable
Dimitri Savineau [Mon, 10 Feb 2020 16:06:48 +0000 (11:06 -0500)]
ceph-nfs: fix ceph_nfs_ceph_user variable

The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in ceph-client role.
6a6785b introduced a confusion between both variable type in the ceph-nfs
role for external ceph with ganesha.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 10951eeea8236e359d0c6f4b07c0355ffe800a11)

5 years agodashboard: allow configuring multiple grafana host
Dimitri Savineau [Mon, 27 Jan 2020 19:47:00 +0000 (14:47 -0500)]
dashboard: allow configuring multiple grafana host

When using multiple grafana hosts then we push set the grafana and
prometheus URL and push the dashboard layout to a single node.

grafana_server_addrs is the list of all grafana nodes and used during
the ceph-dashboard role (on mgr/mon nodes).
grafana_server_addr is the current grafana node used during the
ceph-grafana and ceph-prometheus role (on grafana-server nodes).

We don't have the grafana_server_addr fact duplication code between
external vs collocated nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c6e96699f71c634b3028d819bdc29aedd8b64ce9)

5 years agoswitch_to_containers: increase health check values
Guillaume Abrioux [Thu, 30 Jan 2020 10:33:38 +0000 (11:33 +0100)]
switch_to_containers: increase health check values

This commit increases the default values for the following variable
consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables in `ceph-defaults` role so the user can
set different values if needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3700aa5385a986460f49370b9bfcb762c414d54d)

5 years agopurge/update: remove backward compatibility legacy
Guillaume Abrioux [Tue, 26 Nov 2019 13:43:07 +0000 (14:43 +0100)]
purge/update: remove backward compatibility legacy

This was introduced in 3.1 and marked as deprecation
We can definitely drop it in stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0441812959c3283b2db9795eeddbaa022b35aff7)

5 years agoAdd option for HAproxy to act a SSL frontend termination point for loadbalanced RGW...
Stanley Lam [Thu, 21 Nov 2019 22:40:51 +0000 (14:40 -0800)]
Add option for HAproxy to act a SSL frontend termination point for loadbalanced RGW instances.

Signed-off-by: Stanley Lam <stanleylam_604@hotmail.com>
(cherry picked from commit ad7a5dad3f0b3f731107d3f7f1011dc129135e9a)

5 years agoswitch_to_containers: exclude clients nodes from facts gathering
Guillaume Abrioux [Mon, 9 Dec 2019 13:20:42 +0000 (14:20 +0100)]
switch_to_containers: exclude clients nodes from facts gathering

just like site.yml and rolling_update, let's exclude clients node from
the fact gathering.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 332c39376b45375a2c0566406b49896ae316a293)

5 years agoceph-handler: Use /proc/net/unix for rgw socket
Dimitri Savineau [Tue, 6 Aug 2019 15:41:02 +0000 (11:41 -0400)]
ceph-handler: Use /proc/net/unix for rgw socket

If for some reason, there's an old rgw socket file present in the
/var/run/ceph/ directory then the test command could fail with

test: xxxxxxxxx.asok: binary operator expected

$ ls -hl /var/run/ceph/
total 0
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok

We can check the radosgw socket in /proc/net/unix to avoid using wildcard
in the socket name.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 60cbfdc2a60520e4cd14ee6f54acdc511092b77e)

5 years agofilestore-to-bluestore: skip bluestore osd nodes
Dimitri Savineau [Thu, 23 Jan 2020 21:58:14 +0000 (16:58 -0500)]
filestore-to-bluestore: skip bluestore osd nodes

If the OSD node is already using bluestore OSDs then we should skip
all the remaining tasks to avoid purging OSD for nothing.
Instead we warn the user.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 83c5a1d7a831910f4baa11ab95f9354b233a796a)

5 years agoceph-container-engine: lvm2 on OSD nodes only
Dimitri Savineau [Wed, 29 Jan 2020 03:31:04 +0000 (22:31 -0500)]
ceph-container-engine: lvm2 on OSD nodes only

Since de8f2a9 the lvm2 package installation has been moved from ceph-osd
role to ceph-container-engine role.
But the scope wasn't limited to the OSD nodes only.
This commit fixes this behaviour.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit fa8aa8c86470505a10ae2ce0573bd37958226624)

5 years agoupdate: remove legacy tasks
Guillaume Abrioux [Tue, 28 Jan 2020 09:29:03 +0000 (10:29 +0100)]
update: remove legacy tasks

These tasks should have been removed with backport #4756

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793564
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-common: rhcs 4 repositories for rhel 7
Dimitri Savineau [Fri, 31 Jan 2020 13:59:21 +0000 (08:59 -0500)]
ceph-common: rhcs 4 repositories for rhel 7

RHCS 4 is available for both RHEL 7 and 8 so we should also enable the
cdn repositories for that distribution.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796853
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9b40a959b9c42abb8b98ec0a0e458203b6331314)

5 years agoiscsi: Fix crashes during rolling update
Mike Christie [Tue, 28 Jan 2020 22:31:55 +0000 (16:31 -0600)]
iscsi: Fix crashes during rolling update

During a rolling update we will run the ceph iscsigw tasks that start
the daemons then run the configure_iscsi.yml tasks which can create
iscsi objects like targets, disks, clients, etc. The problem is that
once the daemons are started they will accept confifguration requests,
or may want to update the system themself. Those operations can then
conflict with the configure_iscsi.yml tasks that setup objects and we
can end up in crashes due to the kernel being in a unsupported state.

This could also happen during creation, but is less likely due to no
objects being setup yet, so there are no watchers or users accessing the
gws yet. The fix in this patch works for both update and initial setup.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 77f3b5d51b84a6338847c5f6a93f22a3a6a683d2)

5 years agopurge: fix purge cluster failed
wujie1993 [Sun, 5 Jan 2020 07:31:46 +0000 (15:31 +0800)]
purge: fix purge cluster failed

Fix purge cluster failed when local container images does not exist.

Purge node-exporter and grafana-server only when dashboard_enabled is set to True.

Signed-off-by: wujie1993 qq594jj@gmail.com
(cherry picked from commit d8b0b3cbd94655a36bbaa828410977eba6b1aa21)

5 years agoconfig: fix external client scenario
Guillaume Abrioux [Fri, 31 Jan 2020 10:51:54 +0000 (11:51 +0100)]
config: fix external client scenario

When no monitor group is present in the inventory, this task fails.
This affects only non-containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e7bc0794054008ac2d6771f0d29d275493319665)

5 years agotests: add external_clients scenario
Guillaume Abrioux [Thu, 30 Jan 2020 12:00:47 +0000 (13:00 +0100)]
tests: add external_clients scenario

This commit adds a new 'external ceph clients' scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 641729357e5c8bc4dbf90b5c4f7b20e1d3d51f7d)

5 years agoceph-defaults: remove rgw from ceph_conf_overrides
Dimitri Savineau [Tue, 28 Jan 2020 15:27:34 +0000 (10:27 -0500)]
ceph-defaults: remove rgw from ceph_conf_overrides

The [rgw] section in the ceph.conf file or via the ceph_conf_overrides
variable doesn't exist and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] sections.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2f07b8513158d3fc36c5b0d29386f46dd28b5efa)

5 years agodashboard: add quotes when passing password to the CLI
Guillaume Abrioux [Tue, 28 Jan 2020 14:32:27 +0000 (15:32 +0100)]
dashboard: add quotes when passing password to the CLI

Otherwise, if the variables contains a '$' it will be interpreted as a BASH
variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8c3759f8ce04a4c2df673290197ef12fe98d992a)

5 years agotests: set dashboard|grafana_admin_password
Guillaume Abrioux [Tue, 28 Jan 2020 13:04:45 +0000 (14:04 +0100)]
tests: set dashboard|grafana_admin_password

Set these 2 variables in all test scenarios where `dashboard_enabled` is
`True`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c040199c8f277b33bd1447f2cbce71861055711f)

5 years agovalidate: fail if dashboard|grafana_admin_password aren't set
Guillaume Abrioux [Tue, 28 Jan 2020 12:55:54 +0000 (13:55 +0100)]
validate: fail if dashboard|grafana_admin_password aren't set

This commit adds a task to make sure user set a custom password for
`grafana_admin_password` and `dashboard_admin_password` variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795509
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 99328545de07d94c4a2bdd67c6ac8bc9280f23c5)

5 years agoceph-facts: fix _container_exec_cmd fact value v4.0.14
Dimitri Savineau [Wed, 29 Jan 2020 02:34:24 +0000 (21:34 -0500)]
ceph-facts: fix _container_exec_cmd fact value

When using different name between the inventory_hostname and the
ansible_hostname then the _container_exec_cmd fact will get a wrong
value based on the inventory_hostname instead of the ansible_hostname.
This happens when the ceph cluster is already running (update/upgrade).

Later the container exec commands will fail because the container name
is wrong.

We should always set the _container_exec_cmd based on the
ansible_hostname fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795792
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1fcafffdad43476d1d99766a57b4087cfe43718b)

5 years agotox: set extras vars for filestore-to-bluestore
Dimitri Savineau [Mon, 27 Jan 2020 16:31:47 +0000 (11:31 -0500)]
tox: set extras vars for filestore-to-bluestore

The ansible extra variables aren't set with the ansible-playbook
command running the filestore-to-bluestore playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a27290bf98470236d2a1362224fb5555884259c9)

5 years agofilestore-to-bluestore: fix undefine osd_fsid_list
Dimitri Savineau [Mon, 27 Jan 2020 14:36:56 +0000 (09:36 -0500)]
filestore-to-bluestore: fix undefine osd_fsid_list

If the playbook is used on a host running bluestore OSDs then the
osd_fsid_list won't be filled because the bluestore OSDs are reported
with 'type: block' via ceph-volume lvm list command but we are looking
for 'type: data' (filestore).

TASK [zap ceph-volume prepared OSDs] *********
fatal: [xxxxx]: FAILED! =>
  msg: '''osd_fsid_list'' is undefined

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cd76054f76fa4ce618335a3693c6c0f95d9209e6)

5 years agotests: add 'all_in_one' scenario v4.0.13
Guillaume Abrioux [Mon, 27 Jan 2020 14:49:30 +0000 (15:49 +0100)]
tests: add 'all_in_one' scenario

Add new scenario 'all_in_one' in order to catch more collocated related
issues.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3e7dbb4b16dc7de3b46c18db4c00e7f2c2a50453)

5 years agofix calls to `container_exec_cmd` in ceph-osd role
Guillaume Abrioux [Mon, 27 Jan 2020 12:31:29 +0000 (13:31 +0100)]
fix calls to `container_exec_cmd` in ceph-osd role

We must call `container_exec_cmd` from the right monitor node otherwise
the value of the fact might mistmatch between the delegated node and the
node being played.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f919f8971891c780ecb46ac233da27cc14ae2b9)

5 years agofilestore-to-bluestore: don't fail when with no PV
Dimitri Savineau [Fri, 24 Jan 2020 16:50:34 +0000 (11:50 -0500)]
filestore-to-bluestore: don't fail when with no PV

When the PV is already removed from the devices then we should not fail
to avoid errors like:

stderr: No PV found on device /dev/sdb.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a9c23005455a57f2fe1e5356a6ab24f47f1eaa2f)

5 years agohandler: read container_exec_cmd value from first mon v4.0.12
Guillaume Abrioux [Thu, 23 Jan 2020 14:51:17 +0000 (15:51 +0100)]
handler: read container_exec_cmd value from first mon

Given that we delegate to the first monitor, we must read the value of
`container_exec_cmd` from this node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eb9112d8fbbd33e89a365029feaaed0459c9b86a)

5 years agoceph-facts: Fix for 'running_mon is undefined' error, so that
Vytenis Sabaliauskas [Thu, 23 Jan 2020 08:58:18 +0000 (10:58 +0200)]
ceph-facts: Fix for 'running_mon is undefined' error, so that
fact 'running_mon' is set once 'grep' successfully exits with 'rc == 0'

Signed-off-by: Vytenis Sabaliauskas <vytenis.sabaliauskas@protonmail.com>
(cherry picked from commit ed1eaa1f38022dfea37d8352d81fc0aa6058fa23)

5 years agosite-container: don't skip ceph-container-common
Dimitri Savineau [Wed, 22 Jan 2020 19:45:38 +0000 (14:45 -0500)]
site-container: don't skip ceph-container-common

On HCI environment the OSD and Client nodes are collocated. Because we
aren't running the ceph-container-common role on the client nodes except
the first one (for keyring purpose) then the ceph-role execution fails
due to undefined variables.

Closes: #4970
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794195
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 671b1aba3c7ca9eca8c630bc4ce13d6dc2e185c5)

5 years agorolling_update: support upgrading 3.x + ceph-metrics on a dedicated node v4.0.11
Guillaume Abrioux [Wed, 22 Jan 2020 14:00:01 +0000 (15:00 +0100)]
rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node

When upgrading from RHCS 3.x where ceph-metrics was deployed on a
dedicated node to RHCS 4.0, it fails like following:

```
fatal: [magna005]: FAILED! => changed=false
  gid: 0
  group: root
  mode: '0755'
  msg: 'chown failed: failed to look up user ceph'
  owner: root
  path: /etc/ceph
  secontext: unconfined_u:object_r:etc_t:s0
  size: 4096
  state: directory
  uid: 0
```

because we are trying to run `ceph-config` on this node, it doesn't make
sense so we should simply run this play on all groups except
`[grafana-server]`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793885
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5812fe45b328951e0ecc249e3b83f03e7a0a4ce)