The workflow in this playbook should be the same as in rolling_update:
we should first set the noout and nodeep-scrub flags before migrating the
first osd and unset those flags after the last osd is migrated.
When setting/unsetting the osd flags, we can use `tasks_from` when importing
the `ceph-facts` role to save some time, given that we only need this role
for setting `container_binary`.
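A minimal sketch of that import, assuming ceph-facts keeps this logic in a dedicated container_binary.yml task file:
```
- name: import ceph-facts only for the container_binary fact
  import_role:
    name: ceph-facts
    tasks_from: container_binary
```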
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployments because
the saml python library isn't available in the usual community repositories.
This package is present in the RHCS Tools repository, so we also need to
enable it on the mgr nodes.
The current centos/8 vagrant image (libvirt) still uses the
CentOS 8.0 release (1905) while the 8.1 release (1911) has been
available for a few months.
Using an updated CentOS 8 release fixes slow ceph-volume/lvm commands.
With this change, the state `present` is enough to update a keyring.
If the keyring already exists, it will be updated if the caps or secret
passed to the module are different.
If the keyring doesn't exist, it will be created.
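A hedged sketch of such a task (the client name, caps and the exact ceph_key parameters shown here are assumptions for illustration):
```
- name: create or update a client keyring
  ceph_key:
    name: client.myapp          # hypothetical client
    state: present
    caps:
      mon: "allow r"
      osd: "allow rw pool=mypool"
```
Running the same task again with different caps or a different secret updates the keyring in place.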
When `rule_name` isn't set in `crush_rules`, the osd pool creation will
fail.
This commit adds a new fact, `ceph_osd_pool_default_crush_rule_name`, with
the default crush rule name.
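A hedged illustration (the pool keys below follow the usual ceph-ansible pool format and are assumptions):
```
pools:
  - name: mypool      # hypothetical pool
    pg_num: 32
    # rule_name omitted: the pool is now created with the rule named by
    # the ceph_osd_pool_default_crush_rule_name fact instead of failing
```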
osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario, so we shouldn't depend on the result of the first task
(which configures the crush hierarchy) to be able to add new crush rules.
Adding `--all` to the `systemctl list-units` command in order to get
*all* osd ids on the node (including stopped osds). Otherwise, the
cluster is purged but leftovers remain afterwards.
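A minimal sketch of the kind of lookup this enables (the grep pattern is only illustrative):
```
- name: list all osd unit ids on the node, including stopped ones
  shell: systemctl list-units --all | grep -oE 'ceph-osd@[0-9]+' | sort -u
  register: osd_units
  changed_when: false
```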
Dimitri Savineau [Fri, 27 Mar 2020 21:16:41 +0000 (17:16 -0400)]
ceph_volume: fix multiple db/wal/journal devices
When using the lvm batch ceph-volume subcommand with dedicated devices
for filestore (journal) or bluestore (db/wal), the list of devices
is converted to a string instead of being extended via an iterable.
This was working with only one dedicated device, but with more than one
the ceph_volume module fails.
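For context, a hedged example of the group_vars layout that now works, with several data devices sharing more than one dedicated journal/db device:
```
devices:
  - /dev/sda
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
dedicated_devices:
  - /dev/nvme0n1
  - /dev/nvme1n1
```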
Dimitri Savineau [Tue, 24 Mar 2020 21:11:44 +0000 (17:11 -0400)]
ceph-facts: fix rgw_instances_all fact
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact always uses the local rgw_instances variable, so this
won't work with multiple nodes.
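A hedged sketch of a fact built from every rgw node's hostvars instead of the local variable:
```
- name: build the list of all radosgw instances across rgw nodes
  set_fact:
    rgw_instances_all: "{{ rgw_instances_all | default([]) + (hostvars[item]['rgw_instances'] | default([])) }}"
  with_items: "{{ groups[rgw_group_name] }}"
```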
Dimitri Savineau [Tue, 24 Mar 2020 18:28:51 +0000 (14:28 -0400)]
tests: update mgr dashboard socket listening test
Since 15ed9ee the ceph-mgr daemon binds to the IP address on the public
network instead of binding to all addresses.
This commit updates the testinfra code to reflect that change.
Dimitri Savineau [Wed, 18 Mar 2020 14:53:40 +0000 (10:53 -0400)]
dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
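For example, in group_vars:
```
dashboard_admin_user: admin
dashboard_admin_user_ro: true
```
With the default of false, existing deployments keep a full-access admin user.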
Dimitri Savineau [Tue, 17 Mar 2020 00:45:03 +0000 (20:45 -0400)]
ceph-defaults: add registry name on dashboard vars
We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.
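Hedged examples of the registry-qualified values (the tags are illustrative, except grafana's, which matches the 5.4.3 bump in the next entry):
```
grafana_container_image: docker.io/grafana/grafana:5.4.3
prometheus_container_image: docker.io/prom/prometheus:v2.7.2
alertmanager_container_image: docker.io/prom/alertmanager:v0.16.2
node_exporter_container_image: docker.io/prom/node-exporter:v0.17.0
```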
Dimitri Savineau [Mon, 16 Mar 2020 21:52:30 +0000 (17:52 -0400)]
ceph-defaults: update grafana container tag
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
Dimitri Savineau [Wed, 11 Mar 2020 02:41:27 +0000 (22:41 -0400)]
ceph-infra: open radosgw ports for multi instances
When using the radosgw multi-instance configuration, the firewall
rules aren't adapted to that setup.
We only open the port defined by the radosgw_frontend_port variable,
so only the first radosgw instance port gets opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
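A hedged sketch of that iteration (the firewalld zone variable and the rgw_instances item keys are assumptions):
```
- name: open the frontend port of every radosgw instance
  firewalld:
    port: "{{ item.radosgw_frontend_port }}/tcp"
    zone: "{{ ceph_rgw_firewall_zone }}"
    permanent: true
    immediate: true
    state: enabled
  with_items: "{{ rgw_instances }}"
```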
Dimitri Savineau [Wed, 11 Mar 2020 00:50:55 +0000 (20:50 -0400)]
update osd pool set size command
Since [1] we can't use an osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size CLI.
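A hedged sketch of the resulting commands (the task names and the pools loop are illustrative):
```
- name: allow pools with a single replica
  command: "ceph --cluster {{ cluster }} config set global mon_allow_pool_size_one true"
  changed_when: false

- name: set the pool size, explicitly confirming size 1
  command: >
    ceph --cluster {{ cluster }} osd pool set {{ item.name }}
    size {{ item.size }} --yes-i-really-mean-it
  with_items: "{{ pools }}"
  changed_when: false
```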
This commit makes that task retry 5 times when starting the firewalld
service, to avoid failures like the following (a hedged sketch of the
retried task follows the log below):
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
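A hedged sketch of the retried task (the delay value is an assumption):
```
- name: start firewalld
  service:
    name: firewalld
    state: started
    enabled: true
  register: start_firewalld
  retries: 5
  delay: 3
  until: start_firewalld is succeeded
```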
Nizamudeen [Fri, 6 Mar 2020 10:38:31 +0000 (16:08 +0530)]
doc: Fixed a minor typo in the document
In the Demo part of the document, the description for both the Vagrant and
bare metal sections read "Deployment from scratch on bare metal machines".
Changed "bare metal" to "vagrant" for the Vagrant section.
Ali Maredia [Fri, 4 Oct 2019 19:31:25 +0000 (19:31 +0000)]
rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove the multisite destroy playbook
and add --cluster before radosgw-admin commands.
Remove the manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in each rgw's hostvars.
This commit changes the value passed for the attribute 'rule_name' in the
openstack_pools definition. It doesn't make sense to pass an empty string
as the value here.
Dimitri Savineau [Fri, 28 Feb 2020 16:37:14 +0000 (11:37 -0500)]
shrink-rbdmirror: fix presence after removal
We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.
Dimitri Savineau [Wed, 26 Feb 2020 16:03:32 +0000 (11:03 -0500)]
tox: update shrink scenario configuration
The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.
Dimitri Savineau [Wed, 26 Feb 2020 15:20:05 +0000 (10:20 -0500)]
shrink: don't use localhost node
The ceph-facts role runs on localhost, so if this node uses a
different OS/release than the ceph nodes we can have a mismatch between
the docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.
Dimitri Savineau [Fri, 28 Feb 2020 14:42:44 +0000 (09:42 -0500)]
ceph-validate: add key format validation
If the user manually provides the key value for a specific keyring then
there's no validation on the content, which could lead to unexpected
failures in the ceph_key module.
Dimitri Savineau [Fri, 31 Jan 2020 15:42:10 +0000 (10:42 -0500)]
purge: stop rgw instances by iteration
It looks like the service module doesn't support wildcards anymore
for stopping/disabling multiple services:
fatal: [rgw0]: FAILED! => changed=false
msg: 'This module does not currently support using glob patterns,
found ''*'' in service name: ceph-radosgw@*'
...ignoring
Instead we should iterate over the rgw_instances list.
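A hedged sketch of that per-instance loop (the ceph-radosgw@rgw.<hostname>.<instance> unit naming is an assumption here):
```
- name: stop and disable every radosgw instance
  service:
    name: "ceph-radosgw@rgw.{{ ansible_hostname }}.{{ item.instance_name }}"
    state: stopped
    enabled: false
  failed_when: false
  with_items: "{{ rgw_instances }}"
```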
We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.
If the filestore configuration was using a dedicated journal with either
a partition or a LV/VG then we need to reuse this for bluestore DB.
When filestore is using raw devices, we shouldn't destroy
everything (data + journal) but only the data; otherwise the journal
partition won't exist anymore.
Configure ceph dashboard backend and dashboard_frontend_vip
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen only on the mgr-related
subnet (and not on '*'). For the same reason the proper
server address is added in both the prometheus and alertmanager
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.
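A hedged sketch of how the backend address can be pinned per mgr (the dashboard_server_addr variable is an assumption; mgr/dashboard/<name>/server_addr is the config key used by the dashboard module):
```
- name: bind the dashboard backend to the mgr address on the public network
  command: >
    ceph --cluster {{ cluster }} config set
    mgr mgr/dashboard/{{ ansible_hostname }}/server_addr {{ dashboard_server_addr }}
  changed_when: false
```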
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
Dimitri Savineau [Mon, 17 Feb 2020 20:46:54 +0000 (15:46 -0500)]
ceph-dashboard: update create/get rgw user tasks
Since [1], if an rgw user already exists then the radosgw-admin user create
command returns an error instead of modifying the current user.
We already had separate tasks for the create and get operations, but
only for the multisite configuration, and that isn't enough.
Instead we should run the get task first and, depending on the result,
execute the create.
This commit also adds the missing run_once and delegate_to statements.
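A hedged sketch of the get-then-create ordering (the user id variable and display name are illustrative):
```
- name: get the dashboard rgw user
  command: "radosgw-admin user info --uid={{ dashboard_rgw_api_user_id }}"
  register: rgw_dashboard_user
  failed_when: false
  changed_when: false
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: create the dashboard rgw user when it doesn't exist yet
  command: >
    radosgw-admin user create --uid={{ dashboard_rgw_api_user_id }}
    --display-name='ceph dashboard' --system
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"
  when: rgw_dashboard_user.rc != 0
```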
Sam Choraria [Tue, 3 Dec 2019 12:23:13 +0000 (12:23 +0000)]
ceph-rgw: allow SSL certificate content to be supplied
Allow SSL certificate & key contents to be written to the path
specified by radosgw_frontend_ssl_certificate. This permits a
certificate to be deployed and expired certificates to be renewed
through ceph-ansible.
Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This causes the "stats socket" and the "tune.ssl.default-dh-param"
parameters to end up on the same line, resulting in haproxy failing to start.
Dimitri Savineau [Thu, 16 Jan 2020 21:58:35 +0000 (16:58 -0500)]
purge-cluster: update package list to remove
We only support python3, so rename all the ceph python packages.
Some ceph packages were missing from the list (ceph-mon, ceph-osd or
rbd-mirror) and others don't exist anymore (ceph-fs-common, libcephfs1).
John Fulton [Thu, 6 Feb 2020 02:23:54 +0000 (21:23 -0500)]
The _filtered_clients list should intersect with ansible_play_batch
Client configuration with --limit fails without this patch
because certain tasks are only run on the first host in the
_filtered_clients list, and it's likely that first host will
not be included in what's specified with --limit. To fix this,
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.
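A hedged sketch of that intersection (client_group_name is the usual ceph-ansible group variable):
```
- name: build _filtered_clients from clients that are part of the running play
  set_fact:
    _filtered_clients: "{{ groups.get(client_group_name, []) | intersect(ansible_play_batch) }}"
```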
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781
Signed-off-by: John Fulton <fulton@redhat.com>
Using ceph_dev_branch and ceph_dev_sha1 for configuring ceph-iscsi
repositories from shaman doesn't make sense because the ceph devel
branches and sha1 aren't compatible with ceph-iscsi devel.
Instead we could rely on the master branch and the latest sha1.
Currently it's not possible to use a custom ceph branch/sha1 value
with the iscsi setup, otherwise the repository setup will fail.
Dimitri Savineau [Mon, 10 Feb 2020 16:06:48 +0000 (11:06 -0500)]
ceph-nfs: fix ceph_nfs_ceph_user variable
The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in the ceph-client role. 6a6785b introduced confusion between the two
variable types in the ceph-nfs role for external ceph with ganesha.
Dimitri Savineau [Mon, 10 Feb 2020 18:43:31 +0000 (13:43 -0500)]
ceph-{mon,osd}: move default crush variables
Since ed36a11 we moved the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep backward compatibility we kept the possibility to set the
crush variables on the mons side, but we didn't move the default values.
As a result, when crush_rule_config is set to true and the default values
for crush_rules are expected, the crush rule ansible creation task fails:
"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"
This patch moves the default crush variables from the ceph-mon to the
ceph-osd role but also uses those default values when nothing is defined
on the mons side.
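For reference, a hedged example of the defaults involved (the rule keys follow the usual crush_rules format and are assumptions):
```
crush_rule_config: true
crush_rules:
  - name: replicated_rule
    root: default
    type: host
    default: true
```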
Dimitri Savineau [Wed, 12 Feb 2020 15:38:25 +0000 (10:38 -0500)]
ceph-grafana: fix grafana_{crt,key} condition
The grafana_{crt,key} variables aren't booleans but strings. The default
value is an empty string, so we should base the conditional on the string
length instead of the bool filter.
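A minimal sketch of the corrected condition:
```
when:
  - grafana_crt | length > 0
  - grafana_key | length > 0
```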