]> git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
5 years agoceph-facts: Fix for 'running_mon is undefined' error, so that
Vytenis Sabaliauskas [Thu, 23 Jan 2020 08:58:18 +0000 (10:58 +0200)]
ceph-facts: Fix for 'running_mon is undefined' error, so that
fact 'running_mon' is set once 'grep' successfully exits with 'rc == 0'

Signed-off-by: Vytenis Sabaliauskas <vytenis.sabaliauskas@protonmail.com>
5 years agosite-container: don't skip ceph-container-common
Dimitri Savineau [Wed, 22 Jan 2020 19:45:38 +0000 (14:45 -0500)]
site-container: don't skip ceph-container-common

On HCI environment the OSD and Client nodes are collocated. Because we
aren't running the ceph-container-common role on the client nodes except
the first one (for keyring purpose) then the ceph-role execution fails
due to undefined variables.

Closes: #4970
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794195
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: support upgrading 3.x + ceph-metrics on a dedicated node
Guillaume Abrioux [Wed, 22 Jan 2020 14:00:01 +0000 (15:00 +0100)]
rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node

When upgrading from RHCS 3.x where ceph-metrics was deployed on a
dedicated node to RHCS 4.0, it fails like following:

```
fatal: [magna005]: FAILED! => changed=false
  gid: 0
  group: root
  mode: '0755'
  msg: 'chown failed: failed to look up user ceph'
  owner: root
  path: /etc/ceph
  secontext: unconfined_u:object_r:etc_t:s0
  size: 4096
  state: directory
  uid: 0
```

because we are trying to run `ceph-config` on this node, it doesn't make
sense so we should simply run this play on all groups except
`[grafana-server]`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793885
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: fix osd_auto_discovery
Dimitri Savineau [Tue, 21 Jan 2020 21:37:10 +0000 (16:37 -0500)]
filestore-to-bluestore: fix osd_auto_discovery

When osd_auto_discovery is set then we need to refresh the
ansible_devices fact between after the filestore OSD purge
otherwise the devices fact won't be populated.
Also remove the gpt header on ceph_disk_osds_devices because
the devices is empty at this point for osd_auto_discovery.
Adding the bool filter when needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocommon: add a default value for ceph_directories_mode
Guillaume Abrioux [Tue, 21 Jan 2020 14:30:16 +0000 (15:30 +0100)]
common: add a default value for ceph_directories_mode

Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: --destroy with raw devices
Dimitri Savineau [Mon, 20 Jan 2020 21:40:58 +0000 (16:40 -0500)]
filestore-to-bluestore: --destroy with raw devices

We still need --destroy when using a raw device otherwise we won't be
able to recreate the lvm stack on that device with bluestore.

Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
 stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  /dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-osd: set container objectstore env variables
Dimitri Savineau [Mon, 20 Jan 2020 16:24:08 +0000 (11:24 -0500)]
ceph-osd: set container objectstore env variables

Because we need to manage legacy ceph-disk based OSD with ceph-volume
then we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk so we should also
do it with ceph-volume.
Note that this won't have any impact for ceph-volume lvm based OSD.

Rename docker_env_args fact to container_env_args and move the container
condition on the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-rgw: Fix customize pool size "when" condition
Benoît Knecht [Mon, 20 Jan 2020 10:36:27 +0000 (11:36 +0100)]
ceph-rgw: Fix customize pool size "when" condition

In 3c31b19ab39f297635c84edb9e8a5de6c2da7707, I fixed the `customize pool
size` task by replacing `item.size` with `item.value.size`. However, I
missed the same issue in the `when` condition.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years agohandler: fix call to container_exec_cmd in handler_osds
Guillaume Abrioux [Fri, 17 Jan 2020 14:50:40 +0000 (15:50 +0100)]
handler: fix call to container_exec_cmd in handler_osds

When unsetting the noup flag, we must call container_exec_cmd from the
delegated node (first mon member)
Also, adding a `run_once: true` because this task needs to be run only 1
time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoFix undefined running_mon
Dmitriy Rabotyagov [Thu, 16 Jan 2020 18:23:58 +0000 (20:23 +0200)]
Fix undefined running_mon

Since commit [1] running_mon introduced, it can be not defined
which results in fatal error [2]. This patch defines default value which
was used before patch [1]

Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>
[1] https://github.com/ceph/ceph-ansible/commit/8dcbcecd713b0cd7769d3b4d04ef5c2f15881377
[2] https://zuul.opendev.org/t/openstack/build/c82a73aeabd64fd583694ed04b947731/log/job-output.txt#14011

5 years agoFix application for openstack_cephfs pools
Dmitriy Rabotyagov [Thu, 16 Jan 2020 17:41:06 +0000 (19:41 +0200)]
Fix application for openstack_cephfs pools

RBD is invalid application for cephfs pools, so it was change to cephfs.

Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>
5 years agoceph-facts: move facts to defaults value
Dimitri Savineau [Thu, 16 Jan 2020 14:38:08 +0000 (09:38 -0500)]
ceph-facts: move facts to defaults value

There's no need to define a variable via a fact if we can do it via a
default value. Using a fact could be interesseting to override the
default value on some condition.

- ceph_uid could be set to 167 by default because it's only different on
non containerized deployment on Debian/Ubuntu.
- rbd_client_directory_{owner,group,mode} could be set to ceph,ceph,0770
by default install of null as we are doing in the facts.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agogroup_vars: remove useless files
Dimitri Savineau [Tue, 14 Jan 2020 19:08:17 +0000 (14:08 -0500)]
group_vars: remove useless files

Delete legacy files that aren't used anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocontainers: use --cpus instead --cpu-quota
Guillaume Abrioux [Fri, 10 Jan 2020 09:55:06 +0000 (10:55 +0100)]
containers: use --cpus instead --cpu-quota

When using docker 1.13.1, the current condition:

```
{% if (container_binary == 'docker' and ceph_docker_version.split('.')[0] is version_compare('13', '>=')) or container_binary == 'podman' -%}
```

is wrong because it compares the first digit (1) whereas it should
compare the second one.
It means we always use `--cpu-quota` although documentation recommend
using `--cpus` when docker version is 1.13.1 or higher.

From the doc:
> --cpu-quota=<value> Impose a CPU CFS quota on the container. The number of
> microseconds per --cpu-period that the container is limited to before
> throttled. As such acting as the effective ceiling.
> If you use Docker 1.13 or higher, use --cpus instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoremove container_exec_cmd_mgr fact
Guillaume Abrioux [Wed, 15 Jan 2020 13:39:16 +0000 (14:39 +0100)]
remove container_exec_cmd_mgr fact

Iterating over all monitors in order to delegate a `
{{ container_binary }}` fails when collocating mgrs with mons, because
ceph-facts reset `container_exec_cmd` to point to the first member of
the monitor group.

The idea is to force `container_exec_cmd` to be reset in ceph-mgr.
This commit also removes the `container_exec_cmd_mgr` fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1791282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotox: use vagrant_up.sh instead of vagrant up
Dimitri Savineau [Wed, 15 Jan 2020 16:02:37 +0000 (11:02 -0500)]
tox: use vagrant_up.sh instead of vagrant up

We should use the same vagrant wrapper everywhere instead of the vagrant
command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agovagrant: temp workaround for CentOS 8 cloud image
Dimitri Savineau [Wed, 15 Jan 2020 15:35:28 +0000 (10:35 -0500)]
vagrant: temp workaround for CentOS 8 cloud image

The CentOS cloud infrastructure storing the vagrant CentOS 8 image
changed the directory path and remove the old 8.0 image so the vagrant
box add centos/8 fails returning a 404 http error.
As a workaround we can pull the image from CentOS instead of letting
vagrant doing the resolution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodrop use_fqdn variables
Dimitri Savineau [Tue, 14 Jan 2020 16:54:36 +0000 (11:54 -0500)]
drop use_fqdn variables

This has been deprecated in the previous releases. Let's drop it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotravis: drop python2 support
Dimitri Savineau [Tue, 14 Jan 2020 16:48:50 +0000 (11:48 -0500)]
travis: drop python2 support

Since python2 is EOL we can drop it from travis CI matrix.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoshrink-mds: fix condition on fs deletion
Guillaume Abrioux [Wed, 15 Jan 2020 06:17:08 +0000 (07:17 +0100)]
shrink-mds: fix condition on fs deletion

the new ceph status registered in `ceph_status` will report `fsmap.up` =
0 when it's the last mds given that it's done after we shrink the mds,
it means the condition is wrong. Also adding a condition so we don't try
to delete the fs if a standby node is going to rejoin the cluster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-iscsi: don't use bracket with trusted_ip_list
Dimitri Savineau [Tue, 14 Jan 2020 15:07:56 +0000 (10:07 -0500)]
ceph-iscsi: don't use bracket with trusted_ip_list

The trusted_ip_list parameter for the rbd-target-api service doesn't
support ipv6 address with bracket.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoosd: use _devices fact in lvm batch scenario
Guillaume Abrioux [Tue, 14 Jan 2020 08:42:43 +0000 (09:42 +0100)]
osd: use _devices fact in lvm batch scenario

since fd1718f3796312e29cd5fd64fcc46826741303d2, we must use `_devices`
when deploying with lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupdate: remove legacy
Guillaume Abrioux [Mon, 13 Jan 2020 15:39:31 +0000 (16:39 +0100)]
update: remove legacy

This task is a code duplicate, probably a legacy, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofacts: fix osp/ceph external use case
Guillaume Abrioux [Mon, 13 Jan 2020 14:30:13 +0000 (15:30 +0100)]
facts: fix osp/ceph external use case

d6da508a9b6829d2d0633c7200efdffce14f403f broke the osp/ceph external use case.

We must skip these tasks when no monitor is present in the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790508
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-facts: move grafana fact to dedicated file
Dimitri Savineau [Mon, 13 Jan 2020 15:24:52 +0000 (10:24 -0500)]
ceph-facts: move grafana fact to dedicated file

We don't need to executed the grafana fact everytime but only during
the dashboard deployment.
Especially for ceph-grafana, ceph-prometheus and ceph-dashboard roles.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790303
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoosd: ensure osd ids collected are well restarted
Guillaume Abrioux [Mon, 13 Jan 2020 15:31:00 +0000 (16:31 +0100)]
osd: ensure osd ids collected are well restarted

This commit refact the condition in the loop of that task so all
potential osd ids found are well started.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790212
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoosd: do not run openstack_config during upgrade
Guillaume Abrioux [Fri, 8 Nov 2019 15:21:54 +0000 (16:21 +0100)]
osd: do not run openstack_config during upgrade

There is no need to run this part of the playbook when upgrading the
cluter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: use main playbook for add_osds job
Guillaume Abrioux [Fri, 8 Nov 2019 08:53:58 +0000 (09:53 +0100)]
tests: use main playbook for add_osds job

This commit replaces the playbook used for add_osds job given
accordingly to the add-osd.yml playbook removal

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoosd: support scaling up using --limit
Guillaume Abrioux [Tue, 9 Jul 2019 13:40:47 +0000 (15:40 +0200)]
osd: support scaling up using --limit

This commit lets add-osd.yml in place but mark the deprecation of the
playbook.
Scaling up OSDs is now possible using --limit

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests/setup: update mount options on EL 8
Dimitri Savineau [Fri, 10 Jan 2020 20:30:58 +0000 (15:30 -0500)]
tests/setup: update mount options on EL 8

The nobarrier mount flag doesn't exist anymoer on XFS in the EL 8
kernel. That's why the task wasn't working on those systems.
We can still use the other options instead of skipping the task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-validate: fail on CentOS 7
Dimitri Savineau [Fri, 10 Jan 2020 16:25:54 +0000 (11:25 -0500)]
ceph-validate: fail on CentOS 7

The Ceph Octopus release is only supported on CentOS 8

Closes: #4918
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: add a docker2podman scenario
Guillaume Abrioux [Fri, 10 Jan 2020 13:31:42 +0000 (14:31 +0100)]
tests: add a docker2podman scenario

This commit adds a new scenario in order to test docker-to-podman.yml
migration playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodocker2podman: use set_fact to override variables
Guillaume Abrioux [Fri, 10 Jan 2020 13:30:35 +0000 (14:30 +0100)]
docker2podman: use set_fact to override variables

play vars have lower precedence than role vars and `set_fact`.
We must use a `set_fact` to reset these variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodocker2podman: force systemd to reload config
Guillaume Abrioux [Fri, 10 Jan 2020 13:29:50 +0000 (14:29 +0100)]
docker2podman: force systemd to reload config

This is needed after a change is made in systemd unit files.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodocker2podman: install podman
Guillaume Abrioux [Fri, 10 Jan 2020 10:17:27 +0000 (11:17 +0100)]
docker2podman: install podman

This commit adds a package installation task in order to install podman
during the docker-to-podman.yml migration playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge-iscsi-gateways: don't run all ceph-facts
Dimitri Savineau [Fri, 10 Jan 2020 14:31:26 +0000 (09:31 -0500)]
purge-iscsi-gateways: don't run all ceph-facts

We only need to have the container_binary fact. Because we're not
gathering the facts from all nodes then the purge fails trying to get
one of the grafana fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoconfig: exclude ceph-disk prepared osds in lvm batch report
Guillaume Abrioux [Thu, 9 Jan 2020 18:31:57 +0000 (19:31 +0100)]
config: exclude ceph-disk prepared osds in lvm batch report

We must exclude the devices already used and prepared by ceph-disk when
doing the lvm batch report. Otherwise it fails because ceph-volume
complains about GPT header.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786682
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agorolling_update: run registry auth before upgrading
Dimitri Savineau [Thu, 9 Jan 2020 19:57:08 +0000 (14:57 -0500)]
rolling_update: run registry auth before upgrading

There's some tasks using the new container image during the rolling
upgrade playbook that needs to execute the registry login first otherwise
the nodes won't be able to pull the container image.

Unable to find image 'xxx.io/foo/bar:latest' locally
Trying to pull repository xxx.io/foo/bar ...
/usr/bin/docker-current: Get https://xxx.io/v2/foo/bar/manifests/latest:
unauthorized

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoshrink-rgw: refact global workflow
Dimitri Savineau [Thu, 9 Jan 2020 16:48:13 +0000 (11:48 -0500)]
shrink-rgw: refact global workflow

Instead of running the ceph roles against localhost we should do it
on the first mon.
The ansible and inventory hostname of the rgw nodes could be different.
Ensure that the rgw instance to remove is present in the cluster.
Fix rgw service and directory path.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agomon: support replacing a mon
Guillaume Abrioux [Thu, 9 Jan 2020 15:46:34 +0000 (16:46 +0100)]
mon: support replacing a mon

We must pick up a mon which actually exists in ceph-facts in order to
detect if a cluster is running. Otherwise, it will state no cluster is
already running which will end up deploying a new monitor isolated in a
new quorum.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agohandler: fix bug
Guillaume Abrioux [Thu, 19 Dec 2019 10:29:41 +0000 (11:29 +0100)]
handler: fix bug

411bd07d54fc3f585296b68f2fd04484328399b5 introduced a bug in handlers

using `handler_*_status` instead of `hostvars[item]['handler_*_status']`
causes handlers to be triggered in anycase even though
`handler_*_status` was set to `False` on a specific node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-rgw: Fix custom pool size setting
Benoît Knecht [Mon, 30 Dec 2019 09:53:20 +0000 (10:53 +0100)]
ceph-rgw: Fix custom pool size setting

RadosGW pools can be created by setting

```yaml
rgw_create_pools:
  .rgw.root:
    pg_num: 512
    size: 2
```

for instance. However, doing so would create pools of size
`osd_pool_default_size` regardless of the `size` value. This was due to
the fact that the Ansible task used

```
{{ item.size | default(osd_pool_default_size) }}
```

as the pool size value, but `item.size` is always undefined; the
correct variable is `item.value.size`.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years agoceph-iscsi: manage ipv6 in trusted_ip_list
Dimitri Savineau [Tue, 7 Jan 2020 20:01:48 +0000 (15:01 -0500)]
ceph-iscsi: manage ipv6 in trusted_ip_list

Only the ipv4 addresses from the nodes running the dashboard mgr module
were added to the trusted_ip_list configuration file on the iscsigws
nodes.
This also add the iscsi gateways with ipv6 configuration to the ceph
dashboard.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787531
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoshrink-mds: do not play ceph-facts entirely
Guillaume Abrioux [Wed, 8 Jan 2020 15:10:17 +0000 (16:10 +0100)]
shrink-mds: do not play ceph-facts entirely

We only need to set `container_binary`.
Let's use `tasks_from` option.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-mds: use fact from delegated node
Guillaume Abrioux [Wed, 8 Jan 2020 14:02:24 +0000 (15:02 +0100)]
shrink-mds: use fact from delegated node

The command is delegated on the first monitor so we must use the fact
`container_binary` from this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofacts: use correct python interpreter
Guillaume Abrioux [Wed, 8 Jan 2020 13:14:41 +0000 (14:14 +0100)]
facts: use correct python interpreter

that task is delegated on the first mon so we should always use the
`discovered_interpreter_python` from that node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-mds: fix filesystem removal task
Guillaume Abrioux [Fri, 3 Jan 2020 15:02:48 +0000 (16:02 +0100)]
shrink-mds: fix filesystem removal task

This commit deletes the filesystem when no more MDS is present after
shrinking operation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1787543
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-mds: ensure max_mds is always honored
Guillaume Abrioux [Fri, 3 Jan 2020 14:56:43 +0000 (15:56 +0100)]
shrink-mds: ensure max_mds is always honored

This commit prevent from shrinking an mds node when max_mds wouldn't be
honored after that operation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: use fqdn in external url
Guillaume Abrioux [Thu, 2 Jan 2020 17:09:38 +0000 (18:09 +0100)]
dashboard: use fqdn in external url

Force fqdn to be used in external url for prometheus and alertmanager.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoRevert "nfs: do not run privileged nfs container"
Guillaume Abrioux [Wed, 18 Dec 2019 15:14:21 +0000 (16:14 +0100)]
Revert "nfs: do not run privileged nfs container"

This reverts commit d06158e9d9ab4a706ca72a4940e7acb5fc25697d.

Otherwise ganesha consumers can't dynamically update exports using dbus.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784562
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge-iscsi-gateways: remove node from dashboard
Dimitri Savineau [Mon, 6 Jan 2020 20:22:51 +0000 (15:22 -0500)]
purge-iscsi-gateways: remove node from dashboard

When using the ceph dashboard with iscsi gateways nodes we also need to
remove the nodes from the ceph dashboard list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786686
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph_volume: support filestore to bluestore migration
Guillaume Abrioux [Tue, 7 Jan 2020 15:29:48 +0000 (16:29 +0100)]
ceph_volume: support filestore to bluestore migration

This commit adds the filestore to bluestore migration support in
ceph_volume module.

We must append to the executed command only the relevant options
according to what is passed in `osd_objectostore`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge-container-cluster: prune exited containers 4884/head
Dimitri Savineau [Tue, 7 Jan 2020 20:30:16 +0000 (15:30 -0500)]
purge-container-cluster: prune exited containers

Remove all stopped/exited containers.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-iscsi: remove python rtslib shaman repository
Dimitri Savineau [Tue, 7 Jan 2020 15:18:28 +0000 (10:18 -0500)]
ceph-iscsi: remove python rtslib shaman repository

The rtslib python library is now available in the distribution so we
shouldn't have to use the shaman repository

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: disable nfs testing
Guillaume Abrioux [Tue, 7 Jan 2020 14:40:50 +0000 (15:40 +0100)]
tests: disable nfs testing

nfs-ganesha makes the CI failing because of issue related to SELinux.

See:
- https://bugzilla.redhat.com/show_bug.cgi?id=1788563
- https://github.com/nfs-ganesha/nfs-ganesha/issues/527

Until we can get this fixed, let's disable nfs-ganesha testing
temporarily.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: upgrade from octopus to octopus
Guillaume Abrioux [Mon, 6 Jan 2020 16:31:46 +0000 (17:31 +0100)]
tests: upgrade from octopus to octopus

on master we can't test upgrade from stable-4.0/CentOS 7 to
master/CentOS 8.

This commit refact the upgrade so we test upgrade from master/CentOS 8
to master/CentOS 8 (octopus to octopus)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests/functional: change docker to podman
Dimitri Savineau [Mon, 6 Jan 2020 16:14:22 +0000 (11:14 -0500)]
tests/functional: change docker to podman

Some docker commands were hardcoded in tests playbooks and some
conditions were not taking care of the containerized_deployment
variable but only the atomic fact.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-nfs: add ganesha_t type to selinux
Dimitri Savineau [Mon, 6 Jan 2020 14:09:42 +0000 (09:09 -0500)]
ceph-nfs: add ganesha_t type to selinux

Since RHEL 8.1 we need to add the ganesha_t type to the permissive
SELinux list.
Otherwise the nfs-ganesha service won't start.
This was done on RHEL 7 previously and part of the nfs-ganesha-selinux
package on RHEL 8.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786110
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocontainer: move lvm2 package installation
Dimitri Savineau [Thu, 2 Jan 2020 20:50:24 +0000 (15:50 -0500)]
container: move lvm2 package installation

Before this patch, the lvm2 package installation was done during the
ceph-osd role.
However we were running ceph-volume command in the ceph-config role
before ceph-osd. If lvm2 wasn't installed then the ceph-volume command
fails:

error checking path "/run/lock/lvm": stat /run/lock/lvm: no such file or
directory

This wasn't visible before because lvm2 was automatically installed as
docker dependency but it's not the same for podman on CentOS 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-nfs: change ganesha CentOS repository
Dimitri Savineau [Thu, 19 Dec 2019 17:03:57 +0000 (12:03 -0500)]
ceph-nfs: change ganesha CentOS repository

Since we don't have nfs-ganesha builds available on CentOS 8 at the
moment on shaman then we can use the alternative repository at [1]

[1] https://download.nfs-ganesha.org/3/LATEST/CentOS

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocommon: add centos8 support
Guillaume Abrioux [Mon, 18 Nov 2019 20:05:16 +0000 (15:05 -0500)]
common: add centos8 support

Ceph octopus only supports CentOS 8.

This commit adds CentOS 8 support:
  - update vagrant image in tox configurations.
  - add CentOS 8 repository for el8 dependencies.
  - CentOS 8 container engine is podman (same than RHEL 8).
  - don't use the epel mirror on sepia because it's epel7 only.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-rgw-loadbalancer: Modify keepalived master selection
Stanley Lam [Tue, 3 Dec 2019 18:51:22 +0000 (10:51 -0800)]
ceph-rgw-loadbalancer: Modify keepalived master selection

Currently the keepalived template only works when system hostnames exactly match the Ansible inventory name. If these are different, all generated templates become BACKUP without a MASTER assigned. Using the inventory_hostname in the template file resolves this issue.

Signed-off-by: Stanley Lam stanleylam_604@hotmail.com
5 years agofilestore-to-bluestore: umount partitions before zapping them
Guillaume Abrioux [Wed, 18 Dec 2019 14:48:32 +0000 (15:48 +0100)]
filestore-to-bluestore: umount partitions before zapping them

When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them, otherwise error like "Device is
busy" will show up.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-infra: replace hardcoded grafana group name
Dimitri Savineau [Mon, 16 Dec 2019 16:03:21 +0000 (11:03 -0500)]
ceph-infra: replace hardcoded grafana group name

The grafana-server group name was hardcoded for the grafana/prometheus
firewalld tasks condition.
We should we the associated variable : grafana_server_group_name

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-infra: move dashboard into a dedicated file
Dimitri Savineau [Mon, 16 Dec 2019 16:00:35 +0000 (11:00 -0500)]
ceph-infra: move dashboard into a dedicated file

Instead of using multiple dashboard_enabled condition in the
configure_firewall file we could just have the condition once
and include the dedicated tasks list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-infra: open dashboard port on monitor
Dimitri Savineau [Mon, 16 Dec 2019 15:48:26 +0000 (10:48 -0500)]
ceph-infra: open dashboard port on monitor

When there's no mgr group defined in the ansible inventory then the
mgrs are deployed implicitly on the mons nodes.
If the dashboard is enabled then we need to open the dashboard port on
the node that is running the ceph mgr process (mgr or mon).
The current code only allow to open that port on the mgr nodes when they
are present explicitly in the inventory but not implicitly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: regenerate group_vars samples
Dimitri Savineau [Mon, 16 Dec 2019 20:19:35 +0000 (15:19 -0500)]
ceph-defaults: regenerate group_vars samples

In fc02fc9 the group_vars samples have been generated but only for
monitor_address variable not radosgw_address.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: exclude rbd devices from discovery
Dimitri Savineau [Mon, 16 Dec 2019 20:12:47 +0000 (15:12 -0500)]
ceph-defaults: exclude rbd devices from discovery

The RBD devices aren't excluded from the devices list in the LVM auto
discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodefaults: change monitor|radosgw_address default values
Guillaume Abrioux [Mon, 9 Dec 2019 17:23:15 +0000 (18:23 +0100)]
defaults: change monitor|radosgw_address default values

To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon binding on all
interfaces.

Fixes: #4827
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: ensure all dm are closed
Guillaume Abrioux [Tue, 10 Dec 2019 22:04:57 +0000 (23:04 +0100)]
filestore-to-bluestore: ensure all dm are closed

This commit adds a task to ensure device mappers are well closed when
lvm batch scenario is used.
Otherwise, OSDs can't be redeployed given that devices that are rejected
by ceph-volume because they are locked.

Adding a condition `devices | default([]) | length > 0` to remove these
dm only when using lvm batch scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: force OSDs to be marked down
Guillaume Abrioux [Tue, 10 Dec 2019 22:03:40 +0000 (23:03 +0100)]
filestore-to-bluestore: force OSDs to be marked down

Otherwise, sometimes it can take a while for an OSD to be seen as down
and causes the `ceph osd purge` command to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: do not use --destroy
Guillaume Abrioux [Tue, 10 Dec 2019 14:59:50 +0000 (15:59 +0100)]
filestore-to-bluestore: do not use --destroy

Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph_volume: add destroy option support
Guillaume Abrioux [Tue, 10 Dec 2019 14:57:42 +0000 (15:57 +0100)]
ceph_volume: add destroy option support

The zap action from ceph_volume module always implies `--destroy`.
This commit adds the destroy option support so we can ask ceph-volume to
not use `--destroy` when zapping a device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofilestore-to-bluestore: add non containerized support
Guillaume Abrioux [Tue, 10 Dec 2019 10:07:30 +0000 (11:07 +0100)]
filestore-to-bluestore: add non containerized support

This commit adds the non containerized context support to the
filestore-to-bluestore.yml infrastructure playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agotests: add filestore_to_bluestore job
Guillaume Abrioux [Tue, 10 Dec 2019 13:37:47 +0000 (14:37 +0100)]
tests: add filestore_to_bluestore job

This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoAdd comment on auto-SSL cert generation
Philip Brown [Mon, 9 Dec 2019 18:20:16 +0000 (18:20 +0000)]
Add comment on auto-SSL cert generation

Fixes: #4830
Signed-off-by: Philip Brown <phil@bolthole.com>
5 years agoceph-facts: set use_new_ceph_iscsi on iscsi nodes
Dimitri Savineau [Tue, 10 Dec 2019 21:35:34 +0000 (16:35 -0500)]
ceph-facts: set use_new_ceph_iscsi on iscsi nodes

We don't need to set the use_new_ceph_iscsi fact on other nodes than
those present in the iscsigws group.
Also remove the duplicate iscsi_gw_group_name condition already present
on the include_task.
Finally validate the ansible distribution as the first task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodefaults: fix a typo
Guillaume Abrioux [Tue, 10 Dec 2019 14:24:39 +0000 (15:24 +0100)]
defaults: fix a typo

s/above/below

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoansible.cfg: do not enforce PreferredAuthentications
Guillaume Abrioux [Mon, 9 Dec 2019 16:10:11 +0000 (17:10 +0100)]
ansible.cfg: do not enforce PreferredAuthentications

There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.

Fixes: #4826
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodefaults: add a comment
Guillaume Abrioux [Mon, 9 Dec 2019 17:31:52 +0000 (18:31 +0100)]
defaults: add a comment

This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.

Fixes: #4828
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoshrink-osd: support fqdn in inventory
Guillaume Abrioux [Mon, 9 Dec 2019 14:52:26 +0000 (15:52 +0100)]
shrink-osd: support fqdn in inventory

When using fqdn in inventory, that playbook fails because of some tasks
using the result of ceph osd tree (which returns shortname) to get
some datas in hostvars[].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoswitch_to_containers: exclude clients nodes from facts gathering
Guillaume Abrioux [Mon, 9 Dec 2019 13:20:42 +0000 (14:20 +0100)]
switch_to_containers: exclude clients nodes from facts gathering

just like site.yml and rolling_update, let's exclude clients node from
the fact gathering.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: run node_export as privileged container
Guillaume Abrioux [Tue, 3 Dec 2019 13:39:53 +0000 (14:39 +0100)]
dashboard: run node_export as privileged container

Typical error:

```
type=AVC msg=audit(1575367499.582:3210): avc:  denied  { search } for  pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```

node_exporter needs to be run as privileged to avoid avc denied error
since it gathers lot of information on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-validate: start with ansible version test
Dimitri Savineau [Fri, 6 Dec 2019 21:11:51 +0000 (16:11 -0500)]
ceph-validate: start with ansible version test

It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-facts: move ntp/chrony facts to ceph-infra
Dimitri Savineau [Wed, 4 Dec 2019 22:14:54 +0000 (17:14 -0500)]
ceph-facts: move ntp/chrony facts to ceph-infra

The ntp/chrony facts are only used in the ceph-infra role so we don't
really need to set them in the ceph-facts roles.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agodefaults: change default value for dashboard_admin_password
Guillaume Abrioux [Thu, 5 Dec 2019 14:21:41 +0000 (15:21 +0100)]
defaults: change default value for dashboard_admin_password

A recent change in ceph/ceph prevent from having username in the
password:

`Error EINVAL: Password cannot contain username.`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupdate: restart iscsigws daemons after upgrade
Guillaume Abrioux [Thu, 5 Dec 2019 10:06:06 +0000 (11:06 +0100)]
update: restart iscsigws daemons after upgrade

In containerized context, containers aren't stopped early in the
sequence.
It means they aren't restarted after the upgrade because the task is
just checking the daemon status is started (eg: `state: started`).

This commit also removes the task which ensure services are started
because it's already done in the role ceph-iscsigw.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupgrade: add dashboard deployment
Guillaume Abrioux [Wed, 4 Dec 2019 16:17:36 +0000 (17:17 +0100)]
upgrade: add dashboard deployment

when upgrading from RHCS 3, dashboard has obviously never been deployed
and it forces us to deploy it later manually.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-defaults: exclude md devices from discovery
Dimitri Savineau [Wed, 4 Dec 2019 17:32:49 +0000 (12:32 -0500)]
ceph-defaults: exclude md devices from discovery

The md devices (RAID software) aren't excluded from the devices list in
the auto discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge-cluster: add podman support
Dimitri Savineau [Wed, 4 Dec 2019 15:10:08 +0000 (10:10 -0500)]
purge-cluster: add podman support

The podman support was added to the purge-container-cluster playbook but
containers are always used for the dashboard even on non containerized
deployment.
This commits adds the podman support on purging the dashboard resources
in the purge-cluster playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: reduce max_mds from 3 to 2
Dimitri Savineau [Wed, 4 Dec 2019 17:12:05 +0000 (12:12 -0500)]
tests: reduce max_mds from 3 to 2

Having max_mds value equals to the number of mds nodes generates a
warning in the ceph cluster status:

cluster:
id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a'
health: HEALTH_WARN'
        insufficient standby MDS daemons available'
(...)
services:
  mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}'

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agopurge: fix symlink to purge-container-cluster
Guillaume Abrioux [Wed, 4 Dec 2019 08:34:39 +0000 (09:34 +0100)]
purge: fix symlink to purge-container-cluster

ceph/ceph-ansible#4805 introduced a symlink to
purge-container-cluster.yml playbook which is broken.

This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge: rename playbook (container)
Guillaume Abrioux [Tue, 3 Dec 2019 14:48:59 +0000 (15:48 +0100)]
purge: rename playbook (container)

Since we now support podman, let's rename the playbook so it's more
generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: use fqdn url for active alert
Guillaume Abrioux [Mon, 2 Dec 2019 13:31:41 +0000 (14:31 +0100)]
dashboard: use fqdn url for active alert

When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.

This commit changes the template in order to use the fqdn.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge: do not try to stop docker when binary is podman
Guillaume Abrioux [Tue, 26 Nov 2019 15:18:28 +0000 (16:18 +0100)]
purge: do not try to stop docker when binary is podman

If the container binary is podman, we shouldn't try to stop docker here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agofacts: isolate container_binary facts
Guillaume Abrioux [Tue, 26 Nov 2019 15:10:17 +0000 (16:10 +0100)]
facts: isolate container_binary facts

in order to be able to call container_binary without having to run the
whole ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agopurge: remove docker_* task
Guillaume Abrioux [Tue, 26 Nov 2019 14:26:35 +0000 (15:26 +0100)]
purge: remove docker_* task

All containers are removed when systemd stops them.
There is no need to call this module in purge container playbook.

This commit also removes all docker_image task and remove all container
images in the final cleanup play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoAdd option for HAproxy to act a SSL frontend termination point for loadbalanced RGW...
Stanley Lam [Thu, 21 Nov 2019 22:40:51 +0000 (14:40 -0800)]
Add option for HAproxy to act a SSL frontend termination point for loadbalanced RGW instances.

Signed-off-by: Stanley Lam <stanleylam_604@hotmail.com>
5 years agodocker2podman: import ceph-handler role
Guillaume Abrioux [Mon, 2 Dec 2019 08:47:21 +0000 (09:47 +0100)]
docker2podman: import ceph-handler role

This is needed to avoid following error:

```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodocker2podman: do not hardcode group name
Guillaume Abrioux [Thu, 28 Nov 2019 14:12:59 +0000 (15:12 +0100)]
docker2podman: do not hardcode group name

let's use `client_group_name` instead of hardcoding the name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>