git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
5 years ago filestore-to-bluestore: umount partitions before zapping them
Guillaume Abrioux [Wed, 18 Dec 2019 14:48:32 +0000 (15:48 +0100)]
filestore-to-bluestore: umount partitions before zapping them

When an OSD is stopped, it leaves partitions mounted.
We must umount them before zapping them, otherwise an error like "Device
is busy" will show up.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-infra: replace hardcoded grafana group name
Dimitri Savineau [Mon, 16 Dec 2019 16:03:21 +0000 (11:03 -0500)]
ceph-infra: replace hardcoded grafana group name

The grafana-server group name was hardcoded for the grafana/prometheus
firewalld tasks condition.
We should use the associated variable: grafana_server_group_name

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-infra: move dashboard into a dedicated file
Dimitri Savineau [Mon, 16 Dec 2019 16:00:35 +0000 (11:00 -0500)]
ceph-infra: move dashboard into a dedicated file

Instead of repeating the dashboard_enabled condition in the
configure_firewall file, we can apply the condition once and include
the dedicated tasks list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-infra: open dashboard port on monitor
Dimitri Savineau [Mon, 16 Dec 2019 15:48:26 +0000 (10:48 -0500)]
ceph-infra: open dashboard port on monitor

When there's no mgr group defined in the ansible inventory then the
mgrs are deployed implicitly on the mons nodes.
If the dashboard is enabled then we need to open the dashboard port on
the node that is running the ceph mgr process (mgr or mon).
The current code only opens that port on the mgr nodes when they are
present explicitly in the inventory, not when they are deployed
implicitly.
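
The host selection described above can be sketched in Python. This is only
an illustration of the rule, not ceph-ansible code: the function name
`dashboard_port_hosts` and the inventory layout (a dict of group name to
host list) are assumptions.

```python
def dashboard_port_hosts(inventory):
    """Return the hosts that run a ceph mgr and need the dashboard port open.

    `inventory` is a hypothetical mapping of group name -> list of hosts.
    """
    mgrs = inventory.get("mgrs", [])
    # With no explicit mgr group, the mgrs are colocated on the mon nodes.
    return list(mgrs) if mgrs else list(inventory.get("mons", []))
```

With an explicit mgr group only those hosts are returned; with none, the
mon hosts are returned instead.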

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783520
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-defaults: regenerate group_vars samples
Dimitri Savineau [Mon, 16 Dec 2019 20:19:35 +0000 (15:19 -0500)]
ceph-defaults: regenerate group_vars samples

In fc02fc9 the group_vars samples were regenerated, but only for the
monitor_address variable, not for radosgw_address.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-defaults: exclude rbd devices from discovery
Dimitri Savineau [Mon, 16 Dec 2019 20:12:47 +0000 (15:12 -0500)]
ceph-defaults: exclude rbd devices from discovery

The RBD devices aren't excluded from the devices list in the LVM auto
discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783908
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago defaults: change monitor|radosgw_address default values
Guillaume Abrioux [Mon, 9 Dec 2019 17:23:15 +0000 (18:23 +0100)]
defaults: change monitor|radosgw_address default values

To avoid confusion, let's change the default value from `0.0.0.0` to
`x.x.x.x`.
Users might think setting `0.0.0.0` will make the daemon bind to all
interfaces.

Fixes: #4827
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: ensure all dm are closed
Guillaume Abrioux [Tue, 10 Dec 2019 22:04:57 +0000 (23:04 +0100)]
filestore-to-bluestore: ensure all dm are closed

This commit adds a task to ensure device mappers are properly closed when
the lvm batch scenario is used.
Otherwise, OSDs can't be redeployed because ceph-volume rejects the
devices as they are locked.

The condition `devices | default([]) | length > 0` is added so these dm
are removed only when using the lvm batch scenario.
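
As a rough Python reading of that condition (a sketch only; `host_vars` is
a hypothetical dict standing in for the host's variables):

```python
def uses_lvm_batch(host_vars):
    """Mirror `devices | default([]) | length > 0` for a plain dict of host variables."""
    # A missing `devices` key falls back to an empty list, like default([]).
    return len(host_vars.get("devices", [])) > 0
```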

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: force OSDs to be marked down
Guillaume Abrioux [Tue, 10 Dec 2019 22:03:40 +0000 (23:03 +0100)]
filestore-to-bluestore: force OSDs to be marked down

Otherwise, it can sometimes take a while for an OSD to be seen as down,
which causes the `ceph osd purge` command to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: do not use --destroy
Guillaume Abrioux [Tue, 10 Dec 2019 14:59:50 +0000 (15:59 +0100)]
filestore-to-bluestore: do not use --destroy

Do not use `--destroy` when zapping a device.
Otherwise, it destroys VGs while they are still needed to redeploy the
OSDs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph_volume: add destroy option support
Guillaume Abrioux [Tue, 10 Dec 2019 14:57:42 +0000 (15:57 +0100)]
ceph_volume: add destroy option support

The zap action from ceph_volume module always implies `--destroy`.
This commit adds the destroy option support so we can ask ceph-volume to
not use `--destroy` when zapping a device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: add non containerized support
Guillaume Abrioux [Tue, 10 Dec 2019 10:07:30 +0000 (11:07 +0100)]
filestore-to-bluestore: add non containerized support

This commit adds the non containerized context support to the
filestore-to-bluestore.yml infrastructure playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: add filestore_to_bluestore job
Guillaume Abrioux [Tue, 10 Dec 2019 13:37:47 +0000 (14:37 +0100)]
tests: add filestore_to_bluestore job

This commit adds a new job in order to test the
filestore-to-bluestore.yml infrastructure playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago Add comment on auto-SSL cert generation
Philip Brown [Mon, 9 Dec 2019 18:20:16 +0000 (18:20 +0000)]
Add comment on auto-SSL cert generation

Fixes: #4830
Signed-off-by: Philip Brown <phil@bolthole.com>
5 years ago ceph-facts: set use_new_ceph_iscsi on iscsi nodes
Dimitri Savineau [Tue, 10 Dec 2019 21:35:34 +0000 (16:35 -0500)]
ceph-facts: set use_new_ceph_iscsi on iscsi nodes

We don't need to set the use_new_ceph_iscsi fact on nodes other than
those present in the iscsigws group.
Also remove the duplicate iscsi_gw_group_name condition already present
on the include_task.
Finally, validate the ansible distribution as the first task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago defaults: fix a typo
Guillaume Abrioux [Tue, 10 Dec 2019 14:24:39 +0000 (15:24 +0100)]
defaults: fix a typo

s/above/below

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ansible.cfg: do not enforce PreferredAuthentications
Guillaume Abrioux [Mon, 9 Dec 2019 16:10:11 +0000 (17:10 +0100)]
ansible.cfg: do not enforce PreferredAuthentications

There's no need to enforce PreferredAuthentications by default.
Users can still choose to override the ansible.cfg with any additional
parameter like this one to fit their infrastructure.

Fixes: #4826
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago defaults: add a comment
Guillaume Abrioux [Mon, 9 Dec 2019 17:31:52 +0000 (18:31 +0100)]
defaults: add a comment

This commit isolates and adds an explicit comment about variables not
intended to be modified by the user.

Fixes: #4828
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago shrink-osd: support fqdn in inventory
Guillaume Abrioux [Mon, 9 Dec 2019 14:52:26 +0000 (15:52 +0100)]
shrink-osd: support fqdn in inventory

When using FQDNs in the inventory, that playbook fails because some tasks
use the result of `ceph osd tree` (which returns short names) to look up
data in hostvars[].
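
One way to bridge the two naming schemes, shown as a hedged Python sketch
(not necessarily how this commit does it; `inventory_name_for` and its
arguments are hypothetical names):

```python
def inventory_name_for(osd_tree_host, inventory_hosts):
    """Find the inventory entry whose short name matches a `ceph osd tree` host name."""
    for name in inventory_hosts:
        # Compare either the full name or the part before the first dot.
        if name == osd_tree_host or name.split(".", 1)[0] == osd_tree_host:
            return name
    return None
```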

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779021
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago switch_to_containers: exclude clients nodes from facts gathering
Guillaume Abrioux [Mon, 9 Dec 2019 13:20:42 +0000 (14:20 +0100)]
switch_to_containers: exclude clients nodes from facts gathering

Just like site.yml and rolling_update, let's exclude client nodes from
fact gathering.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago dashboard: run node_export as privileged container
Guillaume Abrioux [Tue, 3 Dec 2019 13:39:53 +0000 (14:39 +0100)]
dashboard: run node_export as privileged container

Typical error:

```
type=AVC msg=audit(1575367499.582:3210): avc:  denied  { search } for  pid=26680 comm="node_exporter" name="1" dev="proc" ino=11528 scontext=system_u:system_r:container_t:s0:c100,c1014 tcontext=system_u:system_r:init_t:s0 tclass=dir permissive=0
```

node_exporter needs to run as privileged to avoid AVC denied errors
since it gathers a lot of information on the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762168
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-validate: start with ansible version test
Dimitri Savineau [Fri, 6 Dec 2019 21:11:51 +0000 (16:11 -0500)]
ceph-validate: start with ansible version test

It doesn't make sense to start validating the configuration if the
ansible version isn't the right one.
This commit moves the check_system task to be the first task in the
ceph-validate role.
The ansible version test tasks are moved to the top of this file.
The iscsi kernel tests are also moved from check_system to the
check_iscsi file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-facts: move ntp/chrony facts to ceph-infra
Dimitri Savineau [Wed, 4 Dec 2019 22:14:54 +0000 (17:14 -0500)]
ceph-facts: move ntp/chrony facts to ceph-infra

The ntp/chrony facts are only used in the ceph-infra role so we don't
really need to set them in the ceph-facts roles.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago defaults: change default value for dashboard_admin_password
Guillaume Abrioux [Thu, 5 Dec 2019 14:21:41 +0000 (15:21 +0100)]
defaults: change default value for dashboard_admin_password

A recent change in ceph/ceph prevents the password from containing the
username:

`Error EINVAL: Password cannot contain username.`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago update: restart iscsigws daemons after upgrade
Guillaume Abrioux [Thu, 5 Dec 2019 10:06:06 +0000 (11:06 +0100)]
update: restart iscsigws daemons after upgrade

In a containerized context, containers aren't stopped early in the
sequence.
This means they aren't restarted after the upgrade because the task only
checks that the daemon is started (e.g. `state: started`).

This commit also removes the task which ensures services are started
because that is already done in the ceph-iscsigw role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago upgrade: add dashboard deployment
Guillaume Abrioux [Wed, 4 Dec 2019 16:17:36 +0000 (17:17 +0100)]
upgrade: add dashboard deployment

When upgrading from RHCS 3, the dashboard has never been deployed, which
forces us to deploy it manually afterwards.
This commit adds the dashboard deployment as part of the upgrade to
RHCS 4.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1779092
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-defaults: exclude md devices from discovery
Dimitri Savineau [Wed, 4 Dec 2019 17:32:49 +0000 (12:32 -0500)]
ceph-defaults: exclude md devices from discovery

The md devices (software RAID) aren't excluded from the devices list in
the auto discovery scenario.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764601
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago purge-cluster: add podman support
Dimitri Savineau [Wed, 4 Dec 2019 15:10:08 +0000 (10:10 -0500)]
purge-cluster: add podman support

Podman support was added to the purge-container-cluster playbook, but
containers are always used for the dashboard, even on non containerized
deployments.
This commit adds podman support for purging the dashboard resources
in the purge-cluster playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tests: reduce max_mds from 3 to 2
Dimitri Savineau [Wed, 4 Dec 2019 17:12:05 +0000 (12:12 -0500)]
tests: reduce max_mds from 3 to 2

Setting max_mds equal to the number of mds nodes generates a
warning in the ceph cluster status:

  cluster:
    id:     6d3e49a4-ab4d-4e03-a7d6-58913b8ec00a
    health: HEALTH_WARN
            insufficient standby MDS daemons available
  (...)
  services:
    mds:     cephfs:3 {0=mds1=up:active,1=mds0=up:active,2=mds2=up:active}

Let's use 2 active and 1 standby mds.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago purge: fix symlink to purge-container-cluster
Guillaume Abrioux [Wed, 4 Dec 2019 08:34:39 +0000 (09:34 +0100)]
purge: fix symlink to purge-container-cluster

ceph/ceph-ansible#4805 introduced a symlink to
purge-container-cluster.yml playbook which is broken.

This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago purge: rename playbook (container)
Guillaume Abrioux [Tue, 3 Dec 2019 14:48:59 +0000 (15:48 +0100)]
purge: rename playbook (container)

Since we now support podman, let's rename the playbook so it's more
generic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago dashboard: use fqdn url for active alert
Guillaume Abrioux [Mon, 2 Dec 2019 13:31:41 +0000 (14:31 +0100)]
dashboard: use fqdn url for active alert

When using the shortname, the URL for active alert launches with short
hostname and fails to connect to the server.

This commit changes the template in order to use the fqdn.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1765485
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago purge: do not try to stop docker when binary is podman
Guillaume Abrioux [Tue, 26 Nov 2019 15:18:28 +0000 (16:18 +0100)]
purge: do not try to stop docker when binary is podman

If the container binary is podman, we shouldn't try to stop docker here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago facts: isolate container_binary facts
Guillaume Abrioux [Tue, 26 Nov 2019 15:10:17 +0000 (16:10 +0100)]
facts: isolate container_binary facts

in order to be able to call container_binary without having to run the
whole ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago purge: remove docker_* task
Guillaume Abrioux [Tue, 26 Nov 2019 14:26:35 +0000 (15:26 +0100)]
purge: remove docker_* task

All containers are removed when systemd stops them.
There is no need to call this module in the purge container playbook.

This commit also removes all docker_image tasks and removes all container
images in the final cleanup play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1776736
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago Add option for HAproxy to act a SSL frontend termination point for loadbalanced RGW...
Stanley Lam [Thu, 21 Nov 2019 22:40:51 +0000 (14:40 -0800)]
Add option for HAproxy to act a SSL frontend termination point for loadbalanced RGW instances.

Signed-off-by: Stanley Lam <stanleylam_604@hotmail.com>
5 years ago docker2podman: import ceph-handler role
Guillaume Abrioux [Mon, 2 Dec 2019 08:47:21 +0000 (09:47 +0100)]
docker2podman: import ceph-handler role

This is needed to avoid the following error:

```
ERROR! The requested handler 'restart ceph mons' was not found in either the main handlers list nor in the listening handlers list
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago docker2podman: do not hardcode group name
Guillaume Abrioux [Thu, 28 Nov 2019 14:12:59 +0000 (15:12 +0100)]
docker2podman: do not hardcode group name

Let's use `client_group_name` instead of hardcoding the name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago docker2podman: import ceph-defaults in first play
Guillaume Abrioux [Thu, 28 Nov 2019 13:01:13 +0000 (14:01 +0100)]
docker2podman: import ceph-defaults in first play

We must import this role in the first play, otherwise the first call to
`client_group_name` fails.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1777829
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago switch_to_containers: fix umount ceph partitions
Dimitri Savineau [Wed, 27 Nov 2019 16:27:09 +0000 (11:27 -0500)]
switch_to_containers: fix umount ceph partitions

When a container is already running on a non containerized node then the
umount ceph partition task is skipped.
This is due to the container ps command which always returns 0 even if
the filter matches nothing.

We should run the umount task when:
1/ the container command is failing (not installed) : rc != 0
2/ the container command reports running ceph-osd containers : rc == 0

Also we should not fail on the ceph directory listing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1616159
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-osd: wait for all osds once
Dimitri Savineau [Wed, 27 Nov 2019 14:29:06 +0000 (09:29 -0500)]
ceph-osd: wait for all osds once

cf8c6a3 moved the 'wait for all osds' task from openstack_config to the
main tasks list.
But the openstack_config code was executed only on the last OSD node.
We don't need to do this check on every OSD node, so we need to set
run_once to true on that task.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago facts: avoid duplicated element in devices list
Guillaume Abrioux [Wed, 20 Nov 2019 10:02:49 +0000 (11:02 +0100)]
facts: avoid duplicated element in devices list

When using `osd_auto_discovery`, `devices` is built multiple times due
to multiple runs of the `ceph-facts` role. It ends up with duplicate
instances of the same device in the list.

Using the `unique` filter when building the list fixes this issue.
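
The effect can be reproduced outside Ansible with a small Python sketch.
The helper below is an illustrative stand-in for the `unique` filter, and
the device paths are made-up sample values:

```python
def unique(items):
    """Return items with duplicates removed, keeping first-seen order."""
    return list(dict.fromkeys(items))

devices = ["/dev/sdb", "/dev/sdc"]
# Simulate a second run of the fact-gathering step appending the same devices.
devices = unique(devices + ["/dev/sdb", "/dev/sdc", "/dev/sdd"])
```

However many times the list is rebuilt, each device appears once.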

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago dashboard: only print dashboard url of the grafana-server node
Guillaume Abrioux [Tue, 26 Nov 2019 09:59:29 +0000 (10:59 +0100)]
dashboard: only print dashboard url of the grafana-server node

This commit makes the ceph-dashboard role print the ceph-dashboard URL
only for the nodes present in the grafana-server group.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1762163
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago purge/update: remove backward compatibility legacy
Guillaume Abrioux [Tue, 26 Nov 2019 13:43:07 +0000 (14:43 +0100)]
purge/update: remove backward compatibility legacy

This was introduced in 3.1 and marked as deprecated.
We can definitely drop it in stable-4.0.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: fix cluster health status
Dimitri Savineau [Tue, 26 Nov 2019 19:09:48 +0000 (14:09 -0500)]
tests: fix cluster health status

The current ceph cluster health is in warning state:

health: HEALTH_WARN
        13 pool(s) have no replicas configured
        2 pool(s) have non-power-of-two pg_num

Because we're using only 1 replica, we need to disable the redundancy
check.
The pool pg_num should be a power of two (like 16).
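
A quick way to check a candidate pg_num, as a minimal Python sketch (the
function name is illustrative):

```python
def is_power_of_two(n):
    """True when n is a positive power of two (1, 2, 4, 8, 16, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to 0.
    return n > 0 and (n & (n - 1)) == 0
```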

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago Revert "tox-podman: use centos 8 vagrant image"
Guillaume Abrioux [Wed, 27 Nov 2019 14:21:15 +0000 (15:21 +0100)]
Revert "tox-podman: use centos 8 vagrant image"

This reverts commit 19e9a06ab1429769a0513c54c12bf07698d2178f.

5 years ago ceph-osd: wait for all osd before crush rules
Dimitri Savineau [Tue, 26 Nov 2019 16:09:11 +0000 (11:09 -0500)]
ceph-osd: wait for all osd before crush rules

When creating crush rules with the device class parameter, we need to be
sure that all OSDs are up and running because the device class list is
populated with this information.
This is now enabled for all scenarios, not only openstack_config.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-grafana: remove ipv6 brakets on wait_for
Dimitri Savineau [Mon, 25 Nov 2019 20:58:27 +0000 (15:58 -0500)]
ceph-grafana: remove ipv6 brakets on wait_for

The wait_for ansible module doesn't support brackets around an IPv6
address, so we need to remove them.
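
The transformation itself is small. A Python sketch of it follows (the
helper name is hypothetical, and the commit may implement this
differently, for example with a template filter):

```python
def strip_ipv6_brackets(address):
    """Remove one pair of surrounding square brackets, if present."""
    if address.startswith("[") and address.endswith("]"):
        return address[1:-1]
    # IPv4 addresses and bare IPv6 addresses pass through unchanged.
    return address
```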

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769710
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tests: revert vagrant_variable file name detection
Guillaume Abrioux [Mon, 25 Nov 2019 09:03:08 +0000 (10:03 +0100)]
tests: revert vagrant_variable file name detection

This commit reverts the following change:

https://github.com/ceph/ceph-ansible/pull/4510/commits/fcf181342a70b78a355d1c985699028012326b5f#diff-23b6f443c01ea2efcb4f36eedfea9089R7-R14

This is causing CI failures, so this commit is intended to unblock the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago travis: add python 3.7 and 3.8
Dimitri Savineau [Fri, 22 Nov 2019 20:17:35 +0000 (15:17 -0500)]
travis: add python 3.7 and 3.8

Add both python 3.7 and 3.8 in the travis matrix testing.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago nfs: remove legacy file
Guillaume Abrioux [Thu, 21 Nov 2019 15:39:42 +0000 (16:39 +0100)]
nfs: remove legacy file

This file is provided by the packaging (nfs-ganesha) so there's no need
to maintain it in ceph-ansible.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago nfs: do not run privileged nfs container
Guillaume Abrioux [Thu, 21 Nov 2019 15:28:42 +0000 (16:28 +0100)]
nfs: do not run privileged nfs container

At the moment, we bind-mount the dbus socket from the host, which
requires running the container with --privileged.
Since we now run a dedicated dbus daemon inside the same container, we
can stop running privileged nfs-ganesha containers.

Related ceph-container PR : ceph/ceph-container#1517

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1725254
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago update: only run post osd upgrade play on 1 mon
Guillaume Abrioux [Mon, 18 Nov 2019 17:12:00 +0000 (18:12 +0100)]
update: only run post osd upgrade play on 1 mon

There is no need to run these tasks n times from each monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago update: use flags noout and nodeep-scrub only
Guillaume Abrioux [Mon, 18 Nov 2019 16:59:56 +0000 (17:59 +0100)]
update: use flags noout and nodeep-scrub only

1. set noout and nodeep-scrub flags,
2. upgrade each OSD node, one by one, wait for active+clean pgs
3. after all osd nodes are upgraded, unset flags

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rachana Patel <racpatel@redhat.com>
5 years ago tox-podman: use centos 8 vagrant image
Dimitri Savineau [Mon, 18 Nov 2019 20:05:16 +0000 (15:05 -0500)]
tox-podman: use centos 8 vagrant image

Switch the podman scenario from atomic centos 7 to centos 8 (not atomic)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago Fixes failure of cephfs configuration using --limit
VasishtaShastry [Mon, 18 Nov 2019 09:49:17 +0000 (15:19 +0530)]
Fixes failure of cephfs configuration using --limit

Configuration of cephfs with an existing cluster using --limit used to fail
at different tasks while running site-docker.yml.
This commit addresses both of those tasks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1773489
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
5 years ago container: add always tag on gather fact tasks
Dimitri Savineau [Thu, 14 Nov 2019 14:29:29 +0000 (09:29 -0500)]
container: add always tag on gather fact tasks

If we execute the site-container.yml playbook with specific tags (like
ceph_update_config) then we need to be sure to gather the facts, otherwise
we will see an error like:

The task includes an option with an undefined variable. The error was:
'ansible_hostname' is undefined

This commit also adds the missing 'gather_facts: false' to the mons plays.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1754432
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-osd: add device class to crush rules
Dimitri Savineau [Thu, 31 Oct 2019 20:24:12 +0000 (16:24 -0400)]
ceph-osd: add device class to crush rules

This adds device class support to crush rules when using the class key
in the rule dict via the create-replicated sub command.
If the class key isn't specified then we use the create-simple sub
command for backward compatibility.
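
The sub command choice can be sketched in Python as below. Only the
`class` key comes from the description above; the `name`, `root` and
`type` keys, the argument order, and the function name are assumptions
made for illustration.

```python
def crush_rule_command(rule):
    """Build the `ceph osd crush rule ...` argument list for a rule dict."""
    base = ["ceph", "osd", "crush", "rule"]
    if rule.get("class"):
        # Device class given: pass it as the last create-replicated argument.
        return base + ["create-replicated", rule["name"], rule["root"],
                       rule["type"], rule["class"]]
    # No class: fall back to create-simple for backward compatibility.
    return base + ["create-simple", rule["name"], rule["root"], rule["type"]]
```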

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1636508
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago move crush rule creation from mon to osd role
Dimitri Savineau [Thu, 31 Oct 2019 20:17:33 +0000 (16:17 -0400)]
move crush rule creation from mon to osd role

If we want to create crush rules with the create-replicated sub command
and a device class, then the OSDs must be created before the crush
rules, otherwise the device classes won't exist.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-defaults: pin prometheus container tags
Dimitri Savineau [Mon, 4 Nov 2019 15:10:26 +0000 (10:10 -0500)]
ceph-defaults: pin prometheus container tags

In addition to the grafana container tag change, we need to do the same
for the prometheus container stack based on the release present in the
OSE 4.1 container image.

$ docker run --rm openshift4/ose-prometheus-node-exporter:v4.1 --version
node_exporter, version 0.17.0
  build user:       root@67fee13ed48f
  build date:       20191023-14:38:12
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus-alertmanager:4.1 --version
alertmanager, version 0.16.2
  build user:       root@70b79a3f29b6
  build date:       20191023-14:57:30
  go version:       go1.11.13
$ docker run --rm openshift4/ose-prometheus:4.1 --version
prometheus, version 2.7.2
  build user:       root@12da054778a3
  build date:       20191023-14:39:36
  go version:       go1.11.13

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago Evades validation of ceph_repository_type in containerized scenario
VasishtaShastry [Thu, 7 Nov 2019 12:00:21 +0000 (17:30 +0530)]
Evades validation of ceph_repository_type in containerized scenario

This prevents site-docker.yml from failing with the configs given in the
documentation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1769760
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
Co-Authored-By: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph_key: restore file mode after a key is fetched
Guillaume Abrioux [Thu, 14 Nov 2019 09:30:34 +0000 (10:30 +0100)]
ceph_key: restore file mode after a key is fetched

When `import_key` is enabled and the key already exists, it is only
fetched using the ceph cli. If the mode specified in the `ceph_key` task is
different from the one applied by the ceph cli, the mode isn't restored
because we don't call `module.set_fs_attributes_if_different()` before
`module.exit_json(**result)`.
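
The missing step amounts to "compare the mode, fix it if it differs". The
sketch below shows that idea with the Python standard library only; it is
not the module's code, and `restore_mode` and the temporary file standing
in for a keyring are assumptions.

```python
import os
import stat
import tempfile

def restore_mode(path, wanted_mode):
    """Set path's permission bits to wanted_mode if they differ; return True if changed."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == wanted_mode:
        return False
    os.chmod(path, wanted_mode)
    return True

# A temporary file stands in for a fetched keyring.
with tempfile.NamedTemporaryFile(delete=False) as f:
    keyring = f.name
try:
    os.chmod(keyring, 0o644)                 # mode left behind by the fetch
    first = restore_mode(keyring, 0o600)     # differs, so the mode is fixed
    second = restore_mode(keyring, 0o600)    # already correct, nothing to do
    final_mode = stat.S_IMODE(os.stat(keyring).st_mode)
finally:
    os.remove(keyring)
```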

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1734513
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: add time command in vagrant_up.sh
Guillaume Abrioux [Thu, 17 Oct 2019 13:37:31 +0000 (15:37 +0200)]
tests: add time command in vagrant_up.sh

monitor how long it takes to get all VMs up and running

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: remove legacy in tox-update.ini
Guillaume Abrioux [Fri, 8 Nov 2019 13:01:41 +0000 (14:01 +0100)]
tests: remove legacy in tox-update.ini

This variable isn't used in tox-update.ini so this commit removes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: upgrade from nautilus to octopus in master
Guillaume Abrioux [Wed, 10 Apr 2019 12:17:36 +0000 (14:17 +0200)]
tests: upgrade from nautilus to octopus in master

test upgrades from nautilus to octopus instead of mimic to octopus.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago update: reset flags before and after each osd node upgrade
Guillaume Abrioux [Tue, 18 Jun 2019 08:08:48 +0000 (10:08 +0200)]
update: reset flags before and after each osd node upgrade

Even with the osd flags `noout` and `norebalance` set, the PG states can
change at some point depending on the amount of data written in the
meantime. This means the check for PG states will fail.

This commit changes the way we set those flags:
we set them before an OSD node upgrade and unset them before the PG
state check so the PGs can recover.

Fixes: #3961
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago Fixes for Makefile
Javier Pena [Fri, 8 Nov 2019 09:38:32 +0000 (10:38 +0100)]
Fixes for Makefile

- Set default mock configuration to epel-8-x86_64, to match the
  default dist value.
- Add support for alpha tags, like the recently added v5.0.0alpha1

Signed-off-by: Javier Pena <jpena@redhat.com>
5 years ago tests: add coverage on purge playbook
Guillaume Abrioux [Thu, 7 Nov 2019 12:39:25 +0000 (13:39 +0100)]
tests: add coverage on purge playbook

This commit adds a playbook to be played before we run the purge playbook.
It first creates an rbd image, then maps an rbd device on client0 so the
purge playbook will try to unmap it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago purge: use sysfs to unmap rbd devices
Guillaume Abrioux [Mon, 4 Nov 2019 14:59:39 +0000 (15:59 +0100)]
purge: use sysfs to unmap rbd devices

In a containerized context, using the binary provided in atomic os won't
work because it's an old version provided by ceph-common based on
10.2.5.
Using a container could be an idea, but for a large cluster with hundreds
of client nodes, that would require pulling the image on each of them
just to unmap the rbd devices.

Let's use the sysfs method in order to avoid any issue related to the ceph
version that is shipped on the host.
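
For illustration, the sysfs approach could look like the Python sketch
below. The `/sys/bus/rbd` layout (a `devices` directory and a
`remove_single_major` file) is an assumption about the kernel rbd
interface, not taken from this commit, and the demo runs against a
temporary directory rather than the real sysfs.

```python
import os
import tempfile

def unmap_rbd_devices(sysfs_rbd="/sys/bus/rbd"):
    """Ask the kernel to unmap every rbd device listed under sysfs_rbd/devices."""
    devices_dir = os.path.join(sysfs_rbd, "devices")
    ids = sorted(os.listdir(devices_dir)) if os.path.isdir(devices_dir) else []
    for dev_id in ids:
        # Writing a device id to the remove file requests the unmap.
        with open(os.path.join(sysfs_rbd, "remove_single_major"), "w") as f:
            f.write(dev_id)
    return ids

# Dry run against a fake tree with two mapped devices (ids 0 and 1).
fake = tempfile.mkdtemp()
os.makedirs(os.path.join(fake, "devices", "0"))
os.makedirs(os.path.join(fake, "devices", "1"))
unmapped = unmap_rbd_devices(fake)
```

No ceph binary or container image is needed on the client for this, which
is the point made above.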

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766064
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-validate: add rbdmirror validation
Dimitri Savineau [Tue, 5 Nov 2019 16:53:22 +0000 (11:53 -0500)]
ceph-validate: add rbdmirror validation

When ceph_rbd_mirror_configure is set to true we need to ensure that
the required variables aren't empty.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1760553
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-handler: Use /proc/net/unix for rgw socket
Dimitri Savineau [Tue, 6 Aug 2019 15:41:02 +0000 (11:41 -0400)]
ceph-handler: Use /proc/net/unix for rgw socket

If for some reason, there's an old rgw socket file present in the
/var/run/ceph/ directory then the test command could fail with

test: xxxxxxxxx.asok: binary operator expected

$ ls -hl /var/run/ceph/
total 0
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94153614631472.asok
srwxr-xr-x. ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok

We can check the radosgw socket in /proc/net/unix to avoid using a
wildcard in the socket name.
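
A hedged Python sketch of that lookup: it scans text in the
/proc/net/unix format for socket paths with a given prefix. The prefix
default and the sample rows (inode numbers, the systemd path) are made up
for illustration; the handler itself may use a different check.

```python
def rgw_sockets(proc_net_unix_text, prefix="/var/run/ceph/ceph-client.rgw."):
    """Return the socket paths in /proc/net/unix-style text that start with prefix."""
    paths = []
    for line in proc_net_unix_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # The path is the optional 8th column; unnamed sockets have none.
        if len(fields) >= 8 and fields[7].startswith(prefix):
            paths.append(fields[7])
    return paths

sample = """Num       RefCount Protocol Flags    Type St Inode Path
0000000000000000: 00000002 00000000 00010000 0001 01 25000 /var/run/ceph/ceph-client.rgw.rgw0.rgw0.68.94240997655088.asok
0000000000000000: 00000002 00000000 00010000 0001 01 18000 /run/systemd/private
0000000000000000: 00000003 00000000 00000000 0001 03 19000
"""
live = rgw_sockets(sample)
```

Only sockets that are actually bound appear in this listing, so a stale
.asok file left on disk is never matched.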

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago add-{mon,osd}: run raw install python tasks
Dimitri Savineau [Mon, 4 Nov 2019 14:04:48 +0000 (09:04 -0500)]
add-{mon,osd}: run raw install python tasks

If the new mon/osd node doesn't have python installed then we need to
execute the tasks from raw_install_python.yml.

Closes: #4368
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-osd: fix fs.aio-max-nr sysctl condition
Dimitri Savineau [Wed, 6 Nov 2019 15:15:53 +0000 (10:15 -0500)]
ceph-osd: fix fs.aio-max-nr sysctl condition

[1] introduced a regression on the fs.aio-max-nr sysctl value condition.
The enable key isn't a boolean but a string because the expression isn't
evaluated.
The string "(osd_objectstore == 'bluestore')" is always true
because the item.enable condition matches any non-empty string. So the
sysctl value was applied for both the filestore and bluestore backends.

[2] added the bool filter to the condition, but the filter always returns
false on such a string, so the sysctl wasn't applied at all.

This commit fixes the enable key value by evaluating the expression instead
of using the string.

[1] https://github.com/ceph/ceph-ansible/commit/08a2b58
[2] https://github.com/ceph/ceph-ansible/commit/ab54fe2
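
Both failure modes can be reproduced in plain Python. In this sketch
`to_bool` is only a rough approximation of a string-to-boolean filter
(an assumption, not Ansible's implementation), and the variable names are
illustrative:

```python
def to_bool(value):
    """Rough stand-in for a string-to-boolean filter: only a few literal spellings are true."""
    if isinstance(value, bool):
        return value
    return str(value).lower() in ("yes", "on", "1", "true")

osd_objectstore = "filestore"
unevaluated = "(osd_objectstore == 'bluestore')"   # the expression kept as a string
evaluated = osd_objectstore == "bluestore"         # the expression actually evaluated

always_true = bool(unevaluated)      # regression [1]: any non-empty string is truthy
always_false = to_bool(unevaluated)  # regression [2]: not a recognised true spelling
```

Only `evaluated` tracks the objectstore in use, which is what the fix
relies on.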

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tests/requirements: bump testinfra and pytest (tag: v5.0.0alpha1)
Dimitri Savineau [Fri, 1 Nov 2019 14:25:36 +0000 (10:25 -0400)]
tests/requirements: bump testinfra and pytest

Starting with testinfra 3.1, the ansible ssh connections use the ssh
backend instead of paramiko, and use persistent connections too.
pytest 4.6 is the last release to support python 2.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-defaults: pin grafana container tag to 5.2.4
Dimitri Savineau [Thu, 31 Oct 2019 15:37:20 +0000 (11:37 -0400)]
ceph-defaults: pin grafana container tag to 5.2.4

The latest grafana container tag is using the grafana 6.x release, which
could cause issues with the ceph dashboard integration.
Considering that the grafana container in RHCS 3 is based on 5.x then we
should use the same version.

$ docker run --rm rhceph/rhceph-3-dashboard-rhel7:3 -v
Version 5.2.4 (commit: unknown-dev)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-osd: Remove ulimit nofile on container start
Dimitri Savineau [Wed, 30 Oct 2019 15:45:44 +0000 (11:45 -0400)]
ceph-osd: Remove ulimit nofile on container start

Even though this improves ceph-disk/ceph-volume performance, it also
impacts the ceph-osd process.
The ceph-osd process shouldn't use the 1024:4096 value for the max open
files.
Remove the ulimit option from the container engine and make this kind
of change on the container side instead [1].

[1] https://github.com/ceph/ceph-container/pull/1497

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1702285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoSet grafana-server user and password in ceph-dashboard role
fmount [Thu, 31 Oct 2019 09:49:22 +0000 (10:49 +0100)]
Set grafana-server user and password in ceph-dashboard role

This change adds two tasks to set grafana-api user and password
that are required to inject dashboard layouts to the external
grafana instance.
Without these two parameters the ceph-ansible playbook fails
showing an authorization error ("HTTPError: 401 Client Error:
Unauthorized").

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1767365
Signed-off-by: fmount <fpantano@redhat.com>
5 years agoAllow setting dist and mock configuration in Makefile
Javier Pena [Wed, 23 Oct 2019 14:45:41 +0000 (16:45 +0200)]
Allow setting dist and mock configuration in Makefile

Currently, the dist and mock configurations are hardcoded in the
Makefile to be el8 and epel-7-x86_64, respectively. This commit
allows the user to override those settings using the DIST and
MOCK_CONFIG environment variables, falling back to the current
defaults if not set.
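
The fallback behaviour can be sketched in Python as follows. This only
models the described semantics (use the environment variable when set,
else the current default); it is not the Makefile change itself, and
build_settings is a hypothetical name.

```python
import os

# Defaults as stated in the commit message.
DEFAULT_DIST = "el8"
DEFAULT_MOCK_CONFIG = "epel-7-x86_64"


def build_settings(environ=None):
    """Return the dist and mock config, preferring environment overrides."""
    env = os.environ if environ is None else environ
    return {
        "dist": env.get("DIST", DEFAULT_DIST),
        "mock_config": env.get("MOCK_CONFIG", DEFAULT_MOCK_CONFIG),
    }


print(build_settings({}))               # both defaults
print(build_settings({"DIST": "el7"}))  # DIST overridden, mock default kept
```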

This provides additional flexibility when building the RPM directly
from the repository.

Signed-off-by: Javier Peña <jpena@redhat.com>
5 years agoceph-mon: use --admin-daemon to set default crush rule
Mihai Plasoianu [Mon, 28 Oct 2019 15:30:39 +0000 (16:30 +0100)]
ceph-mon: use --admin-daemon to set default crush rule

Signed-off-by: Mihai Plasoianu <m.plasoianu@vertical.de>
5 years agonfs: support specific keys for rgw nfs user
Radu Toader [Tue, 29 Oct 2019 07:56:00 +0000 (09:56 +0200)]
nfs: support specific keys for rgw nfs user

This brings the possibility to modify the rgw nfs user to use specific
keys when those are defined.

Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
5 years agoupdate: add default values when setting fact
Guillaume Abrioux [Tue, 29 Oct 2019 17:01:50 +0000 (18:01 +0100)]
update: add default values when setting fact

This commit adds a default value in the `with_dict` because when using
python 2.7, if a task using a `with_dict` has a condition, it is
evaluated anyway whereas in python 3 it isn't.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1766499
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoceph-nfs: add nfs-ganesha-rados-grace explicitly
Dimitri Savineau [Mon, 28 Oct 2019 17:54:16 +0000 (13:54 -0400)]
ceph-nfs: add nfs-ganesha-rados-grace explicitly

Since nfs-ganesha V3.0-rc4 and [1] we need to explicitly install the
nfs-ganesha-rados-grace package.

[1] https://github.com/nfs-ganesha/nfs-ganesha/commit/0fea990

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: remove default filter on mds group
Dimitri Savineau [Fri, 25 Oct 2019 21:03:46 +0000 (17:03 -0400)]
rolling_update: remove default filter on mds group

There's no need to use the default filter on active/standby groups
because if the group doesn't exist then the play is just skipped.

Currently this generates warnings like:

[WARNING]: Could not match supplied host pattern, ignoring: |
[WARNING]: Could not match supplied host pattern, ignoring: default([])

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: fix active mds host value
Dimitri Savineau [Fri, 25 Oct 2019 20:47:50 +0000 (16:47 -0400)]
rolling_update: fix active mds host value

The active mds host should be based on the inventory hostname and not on
the ansible hostname.
The value returned under the mdsmap structure is based on the OS hostname
so we need to find the right node in the inventory with this value when
doing operation on inventory nodes.
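
A minimal Python sketch of that lookup, assuming hostvars is a mapping
keyed by inventory hostname whose entries carry an ansible_hostname fact.
The helper name and the sample inventory are hypothetical, not the
playbook's code.

```python
def inventory_name_for(os_hostname, hostvars):
    """Return the inventory hostname whose OS hostname matches, or None."""
    for inventory_hostname, facts in hostvars.items():
        if facts.get("ansible_hostname") == os_hostname:
            return inventory_hostname
    return None


# Hypothetical inventory: the inventory name differs from the OS hostname.
hostvars = {
    "mds0.example": {"ansible_hostname": "foobar"},
    "mds1.example": {"ansible_hostname": "barfoo"},
}

# "foobar" is the name as reported in the mdsmap.
print(inventory_name_for("foobar", hostvars))
print("foobar" in hostvars)
```

The second check shows that the OS hostname is not itself a key of
hostvars, so operations on inventory nodes should go through the
inventory hostname found by the lookup.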

Otherwise we could see an error like:

The task includes an option with an undefined variable. The error was:
"hostvars[foobar]" is undefined

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoipaddrs_in_ranges: fix python indent
Dimitri Savineau [Fri, 25 Oct 2019 19:52:27 +0000 (15:52 -0400)]
ipaddrs_in_ranges: fix python indent

pycodestyle returns:

 E111 indentation is not a multiple of four

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agomove library/plugins tests files under tests dir
Dimitri Savineau [Fri, 25 Oct 2019 19:47:05 +0000 (15:47 -0400)]
move library/plugins tests files under tests dir

To avoid unnecessary ansible warnings during playbook execution we can
move the library and plugins test files under a different directory.

[WARNING]: Skipping plugin (plugins/filter/test_ipaddrs_in_ranges.py) as
it seems to be invalid:
cannot import name 'ipaddrs_in_ranges'

Closes: #4656
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoadd-mon: add missing become flag
Dimitri Savineau [Fri, 25 Oct 2019 15:09:32 +0000 (11:09 -0400)]
add-mon: add missing become flag

Without the become flag set to true, we can't execute the roles
successfully.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorolling_update: fix reset mon_host variable
Dimitri Savineau [Fri, 25 Oct 2019 17:36:07 +0000 (13:36 -0400)]
rolling_update: fix reset mon_host variable

mon_host should use the inventory hostname and not the node hostname.
Using the node hostname creates an issue when the inventory and node
hostnames are different.

Closes: #4670
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoadd-{mon,osd}: add ceph-container-engine role
Dimitri Savineau [Thu, 24 Oct 2019 14:02:10 +0000 (10:02 -0400)]
add-{mon,osd}: add ceph-container-engine role

The ceph-container-engine role is missing from both playbooks so the
container engine (docker, podman) isn't installed, resulting in a failure
on the added nodes.

fatal: [xxxxx]: FAILED! => changed=false
  cmd: docker --version
  msg: '[Errno 2] No such file or directory'
  rc: 2

Closes: #4634
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoupdate: use right node when creating active mds group
Guillaume Abrioux [Thu, 24 Oct 2019 07:41:06 +0000 (09:41 +0200)]
update: use right node when creating active mds group

This must be consistent with what is used in `name` parameter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodefaults: add user/pass auth registry variables
Dimitri Savineau [Thu, 24 Oct 2019 15:07:20 +0000 (11:07 -0400)]
defaults: add user/pass auth registry variables

Add ceph_docker_registry_username and ceph_docker_registry_password
variables in ceph-defaults role so they will be present in the group_vars
samples but commented.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1763139
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agomon: call mon_status from asok
Guillaume Abrioux [Thu, 24 Oct 2019 09:08:27 +0000 (11:08 +0200)]
mon: call mon_status from asok

since c09b82a80a392ccd0da7677c7b424ce5cd3fa5d6 in ceph/ceph we must call
mon_status from asok instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupdate: avoid skipping single mds deployment upgrade
Guillaume Abrioux [Wed, 23 Oct 2019 17:39:15 +0000 (19:39 +0200)]
update: avoid skipping single mds deployment upgrade

otherwise a single MDS would never be updated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agoupdate: skip mds deactivation when no mds in inventory
Guillaume Abrioux [Wed, 23 Oct 2019 13:48:32 +0000 (15:48 +0200)]
update: skip mds deactivation when no mds in inventory

Let's skip this part of the code if there's no mds node in the
inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years agodashboard: add ceph iscsi management
Dimitri Savineau [Mon, 21 Oct 2019 19:45:19 +0000 (15:45 -0400)]
dashboard: add ceph iscsi management

When deploying with ceph-iscsi nodes and dashboard enabled, we need to
add the ceph iscsi gateway endpoints to the dashboard configuration and
add the mgr ip address in the trusted list in the iscsi gateway
configuration file.

Closes: #4638
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1764173
https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-iscsi-management

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoceph-iscsi: add ceph-iscsi stable repositories
Dimitri Savineau [Mon, 21 Oct 2019 14:32:55 +0000 (10:32 -0400)]
ceph-iscsi: add ceph-iscsi stable repositories

This commit adds support for the ceph-iscsi stable repository when
using the community ceph_repository, instead of always using the devel
repositories.
We're still using the devel repositories for rtslib and tcmu-runner in
both cases (dev and community).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agorhcs: set ceph_iscsi_config_dev to false
Dimitri Savineau [Mon, 21 Oct 2019 13:58:40 +0000 (09:58 -0400)]
rhcs: set ceph_iscsi_config_dev to false

We don't have to use ceph_iscsi_config_dev (default true) on RHCS
because all iscsi packages are already included in the RHCS
repositories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agoRevert "iscsigw: install python-requests"
Dimitri Savineau [Mon, 21 Oct 2019 13:37:37 +0000 (09:37 -0400)]
Revert "iscsigw: install python-requests"

We don't need this since [1]. Also this was only working for python2 and
not supporting python3.

[1] https://github.com/ceph/ceph-iscsi/commit/00f198a

This reverts commit 167737dd3de02057403fb458c50d22cf94a85b95.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agocontainer/dashboard: run the registry auth task
Dimitri Savineau [Tue, 22 Oct 2019 17:58:50 +0000 (13:58 -0400)]
container/dashboard: run the registry auth task

When deploying with packages then the ceph-container-common role isn't
executed so the registry authentication task is ignored.

Closes: #4636
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years agotests: use osd ids instead of device name in ooo_collocation
Guillaume Abrioux [Tue, 22 Oct 2019 11:27:20 +0000 (13:27 +0200)]
tests: use osd ids instead of device name in ooo_collocation

on master, it doesn't make sense anymore to use the device name; we
should use the osd id instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>