5 years ago nfs: fix nfs with external ceph cluster support
Guillaume Abrioux [Thu, 19 Mar 2020 19:44:20 +0000 (20:44 +0100)]
nfs: fix nfs with external ceph cluster support

This commit refactors and fixes the nfs deployment with external ceph
cluster support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1814942
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago dashboard: allow to set read-only admin user
Dimitri Savineau [Wed, 18 Mar 2020 14:53:40 +0000 (10:53 -0400)]
dashboard: allow to set read-only admin user

This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1810176
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
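A minimal sketch of how this could look in group_vars/all.yml; the
dashboard_admin_user_ro variable comes from this change, the user name is
just the default admin user:

```
dashboard_admin_user: admin
dashboard_admin_user_ro: true
```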
5 years ago ceph-defaults: add registry name on dashboard vars
Dimitri Savineau [Tue, 17 Mar 2020 00:45:03 +0000 (20:45 -0400)]
ceph-defaults: add registry name on dashboard vars

We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
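As a hedged illustration of the change above, the default image values
would now carry the registry explicitly (variable names follow the usual
ceph-defaults naming; the tags are only illustrative):

```
grafana_container_image: "docker.io/grafana/grafana:5.4.3"
prometheus_container_image: "docker.io/prom/prometheus:v2.7.2"
alertmanager_container_image: "docker.io/prom/alertmanager:v0.16.2"
node_exporter_container_image: "docker.io/prom/node-exporter:v0.17.0"
```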
5 years ago ceph-defaults: update grafana container tag
Dimitri Savineau [Mon, 16 Mar 2020 21:52:30 +0000 (17:52 -0400)]
ceph-defaults: update grafana container tag

Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.

[1] registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-facts: Fix system_secret_key variable handling
petruha [Mon, 16 Mar 2020 16:35:20 +0000 (17:35 +0100)]
ceph-facts: Fix system_secret_key variable handling

This commit fixes the system_secret_key variable not being substituted
with the right value; the literal 'system_secret_key' string was always
used instead.

$ egrep 'system_(access|secret)_key' group_vars/all.yml
system_access_key: foofoofoofoofoofoofo
system_secret_key: barbarbarbarbarbarbarbarbarbarbarbarbarb

$ ansible-playbook -vv -i hosts site.yml.sample -e rgw_multisite=true
(...)
  - hostname: storage0
    endpoint: http://192.168.100.42:8080
    instance_name: rgw0
    radosgw_address: 192.168.50.3
    radosgw_frontend_port: 8085
    rgw_realm: canada
    rgw_zone: montreal
    rgw_zone_user: justin.trudeau
    rgw_zone_user_display_name: Justin Trudeau
    rgw_zonegroup: quebec
    system_access_key: foofoofoofoofoofoofo
    system_secret_key: system_secret_key

Fixes https://github.com/ceph/ceph-ansible/issues/5150

Signed-off-by: petruha <5363545+p37ruh4@users.noreply.github.com>
5 years ago rhcs_edits: Update grafana version
Boris Ranto [Mon, 16 Mar 2020 16:08:03 +0000 (17:08 +0100)]
rhcs_edits: Update grafana version

We are planning to release an updated grafana image for the ceph
dashboard in RHCS 4.1. We therefore need to update the rhcs_edits file
to point to the new image.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786107
Signed-off-by: Boris Ranto <branto@redhat.com>
5 years ago config: remove legacy option in ceph.conf.j2
Guillaume Abrioux [Mon, 16 Mar 2020 08:55:20 +0000 (09:55 +0100)]
config: remove legacy option in ceph.conf.j2

This option has been deprecated (as of 0.51).
In any case, ceph-ansible already sets the
auth_{service,client,cluster}_required variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1623586
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago handler: add rgw multi-instances support
Dimitri Savineau [Thu, 12 Mar 2020 16:06:55 +0000 (17:06 +0100)]
handler: add rgw multi-instances support

This commit adds the rgw multi-instances support in ceph-handler
(restart_rgw_daemons.sh.j2)

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago rgw: add multi-instances support when deploying multisite
Guillaume Abrioux [Mon, 9 Mar 2020 10:05:01 +0000 (11:05 +0100)]
rgw: add multi-instances support when deploying multisite

This commit adds multi-instances support when deploying rgw multisite.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-infra: open radosgw ports for multi instances
Dimitri Savineau [Wed, 11 Mar 2020 02:41:27 +0000 (22:41 -0400)]
ceph-infra: open radosgw ports for multi instances

When using the radosgw multi-instances configuration, the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable,
so only the first radosgw instance's port will be opened in the
firewall configuration.
We should instead iterate over the rgw_instances list, as sketched below.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
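A minimal sketch of that iteration, assuming each rgw_instances entry
carries a radosgw_frontend_port key (as in the multisite examples shown
elsewhere in this log); the zone is just a placeholder:

```
- name: open radosgw ports for all instances
  firewalld:
    port: "{{ item.radosgw_frontend_port }}/tcp"
    zone: public
    permanent: true
    immediate: true
    state: enabled
  loop: "{{ rgw_instances }}"
```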
5 years ago purge-container: clean legacy code
Guillaume Abrioux [Thu, 12 Mar 2020 11:22:02 +0000 (12:22 +0100)]
purge-container: clean legacy code

This commit removes a register which isn't used in this playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago update osd pool set size command
Dimitri Savineau [Wed, 11 Mar 2020 00:50:55 +0000 (20:50 -0400)]
update osd pool set size command

Since [1] we can't use an osd pool without replicas (size: 1) by
default. We now need to set the mon_allow_pool_size_one flag to true in
the ceph configuration and add the --yes-i-really-mean-it flag to the
'osd pool set size' CLI, as sketched below.

[1] https://github.com/ceph/ceph/commit/21508bd

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
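A hedged sketch of the two steps as Ansible command tasks; the pool name
'volumes' is only a placeholder:

```
- name: allow pools with a single replica
  command: ceph --cluster ceph config set global mon_allow_pool_size_one true

- name: set the pool size to 1
  command: ceph --cluster ceph osd pool set volumes size 1 --yes-i-really-mean-it
```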
5 years ago rgw: fix a typo in create_realm_zonegroup_zone_lists
Guillaume Abrioux [Tue, 10 Mar 2020 13:07:24 +0000 (14:07 +0100)]
rgw: fix a typo in create_realm_zonegroup_zone_lists

This commit fixes a typo.

`s/realms/secondary_realms`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago infra: add retries/until on firewalld start task
Guillaume Abrioux [Mon, 9 Mar 2020 09:40:54 +0000 (10:40 +0100)]
infra: add retries/until on firewalld start task

This commit makes that task retry 5 times when starting the firewalld
service, to avoid failures like the following:

```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020  08:58:48 +0000 (0:00:00.963)       0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
  msg: |-
    Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
    Failed to execute operation: Connection reset by peer
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
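A minimal sketch of the retry pattern on the start task; the delay value
is only an assumption:

```
- name: start firewalld
  service:
    name: firewalld
    state: started
    enabled: true
  register: start_firewalld
  retries: 5
  delay: 3
  until: start_firewalld is succeeded
```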
5 years ago openstack_keys: use openstack_cinder_pool.name
Christian Berendt [Sat, 7 Mar 2020 23:11:45 +0000 (00:11 +0100)]
openstack_keys: use openstack_cinder_pool.name

Instead of the static string 'volumes', the openstack_cinder_pool.name
variable should be used, as is done for the other keys (see the sketch
below).

Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
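A hedged sketch of what the cinder key could look like with that change;
the surrounding openstack_cinder_key structure and the other pool
variables are assumptions based on the usual ceph-ansible defaults:

```
openstack_cinder_key:
  name: client.cinder
  caps:
    mon: "profile rbd"
    osd: "profile rbd pool={{ openstack_cinder_pool.name }}, profile rbd pool={{ openstack_nova_pool.name }}, profile rbd pool={{ openstack_glance_pool.name }}"
  mode: "0600"
```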
5 years ago tests/requirements: bump testinfra
Dimitri Savineau [Thu, 5 Mar 2020 14:52:56 +0000 (09:52 -0500)]
tests/requirements: bump testinfra

3.4 is the latest testinfra release available; note that python2
support is dropped starting with 4.0.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago doc: Fixed a minor typo in the document
Nizamudeen [Fri, 6 Mar 2020 10:38:31 +0000 (16:08 +0530)]
doc: Fixed a minor typo in the document

In the Demo part of the document, the description for both the Vagrant
and Bare metal sections read "Deployment from scratch on bare metal
machines". Changed "bare metal" to "vagrant" for the Vagrant section.

Signed-off-by: Nizamudeen <nia@redhat.com>
5 years ago rgw: add retry/until on pools tasks
Guillaume Abrioux [Fri, 6 Mar 2020 07:06:37 +0000 (08:06 +0100)]
rgw: add retry/until on pools tasks

Sometimes these tasks can time out for some reason.
Adding retries can help to avoid unexpected failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: stop ceph-volume services
Dimitri Savineau [Thu, 5 Mar 2020 19:18:33 +0000 (14:18 -0500)]
filestore-to-bluestore: stop ceph-volume services

We only disable the ceph-osd services but not the ceph-volume lvm
services during the filestore to bluestore migration.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago client: skip create_users_keys.yml when rolling_update
Guillaume Abrioux [Wed, 4 Mar 2020 15:33:46 +0000 (16:33 +0100)]
client: skip create_users_keys.yml when rolling_update

There's no need to run this part of the role when upgrading client
nodes. Let's skip it when rolling_update.yml is being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago rgw multisite: enable more than 1 realm per cluster
Ali Maredia [Fri, 4 Oct 2019 19:31:25 +0000 (19:31 +0000)]
rgw multisite: enable more than 1 realm per cluster

Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.

The rgw hosts now need to be grouped into zones
and realms in the inventory.

.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.

Also remove the multisite destroy playbook
and add --cluster before radosgw-admin commands.

Remove the manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgw's hostvars.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years ago tests: add more osd nodes in all_daemons scenario
Guillaume Abrioux [Tue, 3 Mar 2020 18:01:27 +0000 (19:01 +0100)]
tests: add more osd nodes in all_daemons scenario

This commit adds more osd nodes in all_daemons scenario in order to test
erasure pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: update ooo job
Guillaume Abrioux [Tue, 3 Mar 2020 14:06:40 +0000 (15:06 +0100)]
tests: update ooo job

This commit changes the value passed for the attribute 'rule_name' in
the openstack_pools definition. It doesn't make sense to pass an empty
string here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago osd: do not change pool size on erasure pool
Guillaume Abrioux [Tue, 3 Mar 2020 09:47:19 +0000 (10:47 +0100)]
osd: do not change pool size on erasure pool

This commit adds a condition so that we don't try to customize the
pool size when the pool type is erasure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: add erasure pool creation test in CI
Guillaume Abrioux [Mon, 2 Mar 2020 16:05:18 +0000 (17:05 +0100)]
tests: add erasure pool creation test in CI

This commit makes the CI test erasure pool creation, following the
recent refactor of the OSD pool creation tasks in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: enable pg autoscaler on 1 pool
Guillaume Abrioux [Fri, 28 Feb 2020 17:38:38 +0000 (18:38 +0100)]
tests: enable pg autoscaler on 1 pool

This commit enables the pg autoscaler on 1 pool.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago osd: add pg autoscaler support
Guillaume Abrioux [Fri, 28 Feb 2020 15:03:15 +0000 (16:03 +0100)]
osd: add pg autoscaler support

This commit adds the pg autoscaler support.

The structure for pool definition has now two additional attributes
`pg_autoscale_mode` and `target_size_ratio`, eg:

```
test:
  name: "test"
  pg_num: "{{ osd_pool_default_pg_num }}"
  pgp_num: "{{ osd_pool_default_pg_num }}"
  rule_name: "replicated_rule"
  application: "rbd"
  type: 1
  erasure_profile: ""
  expected_num_objects: ""
  size: "{{ osd_pool_default_size }}"
  min_size: "{{ osd_pool_default_min_size }}"
  pg_autoscale_mode: False
  target_size_ratio: 0.1
```

When `pg_autoscale_mode` is `True`, the user has to set a decent value
in `target_size_ratio`.

Given that it's a new feature, it's still disabled by default.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1782253
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago osd: refact osd pool creation
Guillaume Abrioux [Fri, 28 Feb 2020 10:41:00 +0000 (11:41 +0100)]
osd: refact osd pool creation

Currently, the command executed is wrong, eg:

```
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --cluster
  - ceph
  - osd
  - pool
  - create
  - volumes
  - '32'
  - '32'
  - replicated_rule
  - '1'
  delta: '0:00:01.625525'
  end: '2020-02-27 16:41:05.232705'
  item:
```

From documentation, the osd pool creation command is :

```
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
     [crush-rule-name] [expected-num-objects]
ceph osd pool create {pool-name} {pg-num}  {pgp-num}   erasure \
     [erasure-code-profile] [crush-rule-name] [expected_num_objects]
```

It means we pass '1' (from item.type) as the value for
`expected_num_objects` by default, which is very likely not what we want.

Also, this commit modifies the default value used when no `rule_name`
is set so that the existing variable `osd_pool_default_crush_rule` is
used.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1808495
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: add lvm batch filestore testing
Guillaume Abrioux [Tue, 3 Mar 2020 16:18:17 +0000 (17:18 +0100)]
tests: add lvm batch filestore testing

This commit adds an OSD node in lvm-batch scenario in order to test
filestore backend.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: increase journal_size value
Guillaume Abrioux [Tue, 3 Mar 2020 15:56:33 +0000 (16:56 +0100)]
tests: increase journal_size value

Looks like we are still seeing issue [1].
Let's increase this value to unlock the CI (however, it still needs to
be investigated).

Typical error (see [1] for further details) :
```
[root@osd2 ~]# ceph-volume --cluster ceph lvm batch --filestore --yes --journal-size '2048' /dev/sda /dev/sdb --journal-devices /dev/sdc
Running command: /sbin/vgcreate --force --yes ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78 /dev/sdc
 stdout: Physical volume "/dev/sdc" successfully created.
 stdout: Volume group "ceph-journals-817ef90b-77ac-4f52-b8a9-30893849fb78" successfully created
--> Refusing to continue with configured size for journal
-->  RuntimeError: journal sizes must be larger than 2GB, detected: 1024.00 MB
```

[1] https://tracker.ceph.com/issues/41374

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago library: fix bug in ceph_volume
Guillaume Abrioux [Tue, 3 Mar 2020 15:12:10 +0000 (16:12 +0100)]
library: fix bug in ceph_volume

This commit fixes a regression introduced by
0326d992c2b6b92a6d4b8ccf2a51d9343652da48.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago shrink-rbdmirror: fix presence after removal
Dimitri Savineau [Fri, 28 Feb 2020 16:37:14 +0000 (11:37 -0500)]
shrink-rbdmirror: fix presence after removal

We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago shrink-mgr: fix systemd condition
Dimitri Savineau [Wed, 26 Feb 2020 16:16:56 +0000 (11:16 -0500)]
shrink-mgr: fix systemd condition

This playbook was using the mds systemd condition.
Also, a command task was using a pipeline, which is not allowed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tox: update shrink scenario configuration
Dimitri Savineau [Wed, 26 Feb 2020 16:03:32 +0000 (11:03 -0500)]
tox: update shrink scenario configuration

The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago shrink: don't use localhost node
Dimitri Savineau [Wed, 26 Feb 2020 15:20:05 +0000 (10:20 -0500)]
shrink: don't use localhost node

The ceph-facts role runs on localhost, so if this node is using a
different OS/release than the ceph nodes we can have a mismatch in the
docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-validate: add key format validation
Dimitri Savineau [Fri, 28 Feb 2020 14:42:44 +0000 (09:42 -0500)]
ceph-validate: add key format validation

If the user manually provides the key value for a specific keyring,
there's no validation on the content, which could lead to unexpected
failures in the ceph_key module.

Closes: #5104
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago purge: stop rgw instances by iteration
Dimitri Savineau [Fri, 31 Jan 2020 15:42:10 +0000 (10:42 -0500)]
purge: stop rgw instances by iteration

It looks like the service module doesn't support wildcards anymore
for stopping/disabling multiple services.

fatal: [rgw0]: FAILED! => changed=false
  msg: 'This module does not currently support using glob patterns,
        found ''*'' in service name: ceph-radosgw@*'
...ignoring

Instead we should iterate over the rgw_instances list.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
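A hedged sketch of that iteration; the systemd unit name pattern is an
assumption based on the ceph-radosgw@* glob shown in the error above:

```
- name: stop ceph radosgw instances
  service:
    name: "ceph-radosgw@rgw.{{ ansible_hostname }}.{{ item.instance_name }}"
    state: stopped
    enabled: false
  loop: "{{ rgw_instances }}"
  ignore_errors: true
```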
5 years ago ceph-infra: install firewalld python bindings
Dimitri Savineau [Fri, 15 Nov 2019 15:37:27 +0000 (10:37 -0500)]
ceph-infra: install firewalld python bindings

When using the firewalld ansible module we need to be sure that the
python bindings are installed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-infra: split firewalld tasks
Dimitri Savineau [Fri, 15 Nov 2019 15:11:33 +0000 (10:11 -0500)]
ceph-infra: split firewalld tasks

Since ansible 2.9 the firewalld task can no longer use service and
source at the same time, so the task is split in two, as sketched below.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
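A minimal sketch of the split, assuming the ceph-mon firewalld service
and the usual public_network variable:

```
- name: open ceph-mon service port
  firewalld:
    service: ceph-mon
    zone: public
    permanent: true
    immediate: true
    state: enabled

- name: allow traffic from the public network
  firewalld:
    source: "{{ public_network }}"
    zone: public
    permanent: true
    immediate: true
    state: enabled
```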
5 years ago Add ansible 2.9 support
Dimitri Savineau [Fri, 1 Nov 2019 13:28:03 +0000 (09:28 -0400)]
Add ansible 2.9 support

This commit adds ansible 2.9 support in addition to 2.8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago osd: add journal option in ceph_volume call (batch)
Guillaume Abrioux [Fri, 28 Feb 2020 18:59:22 +0000 (19:59 +0100)]
osd: add journal option in ceph_volume call (batch)

This commit adds the journal option to the ceph_volume call when the
scenario is lvm batch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago requirements: enforce ansible version requirement
Guillaume Abrioux [Thu, 27 Feb 2020 12:33:31 +0000 (13:33 +0100)]
requirements: enforce ansible version requirement

See https://github.com/advisories/GHSA-3m93-m4q6-mc6v

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago common: support OSDs with more than 2 digits
Guillaume Abrioux [Fri, 21 Feb 2020 09:22:32 +0000 (10:22 +0100)]
common: support OSDs with more than 2 digits

When running an environment with OSDs whose IDs have more than 2 digits,
some tasks don't match the systemd units and therefore the playbook can
fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1805643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago shrink-osd: support shrinking ceph-disk prepared osds
Guillaume Abrioux [Wed, 19 Feb 2020 17:30:14 +0000 (18:30 +0100)]
shrink-osd: support shrinking ceph-disk prepared osds

This commit adds support for shrinking ceph-disk prepared OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796453
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago shrink-osd: don't run ceph-facts entirely
Guillaume Abrioux [Wed, 19 Feb 2020 12:51:49 +0000 (13:51 +0100)]
shrink-osd: don't run ceph-facts entirely

We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: reuse dedicated journal
Dimitri Savineau [Mon, 3 Feb 2020 20:03:17 +0000 (15:03 -0500)]
filestore-to-bluestore: reuse dedicated journal

If the filestore configuration was using a dedicated journal with
either a partition or an LV/VG, then we need to reuse it for the
bluestore DB.

When filestore is using a raw device, we shouldn't destroy everything
(data + journal) but only the data, otherwise the journal partition
won't exist anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790479
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago doc: update infra playbooks statements
Dimitri Savineau [Mon, 24 Feb 2020 18:58:38 +0000 (13:58 -0500)]
doc: update infra playbooks statements

We don't need to copy the infrastructure playbooks in the root
ceph-ansible directory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-rgw: increase connection timeout to 10
Dimitri Savineau [Thu, 20 Feb 2020 14:49:17 +0000 (09:49 -0500)]
ceph-rgw: increase connection timeout to 10

5s as a connection timeout could be too low in some setups. Let's
increase it to 10s.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago Configure ceph dashboard backend and dashboard_frontend_vip
Francesco Pantano [Wed, 12 Feb 2020 12:58:59 +0000 (13:58 +0100)]
Configure ceph dashboard backend and dashboard_frontend_vip

This change introduces a new set of tasks to configure the
ceph dashboard backend so it listens only on the mgr-related
subnet (and not on '*'). For the same reason the proper
server address is added in both the prometheus and alertmanager
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
5 years ago infrastructure-playbooks: Run shrink-osd tasks on monitor
Benoît Knecht [Fri, 3 Jan 2020 09:38:20 +0000 (10:38 +0100)]
infrastructure-playbooks: Run shrink-osd tasks on monitor

Instead of running shrink-osd tasks on localhost and delegating most of
them to the first monitor, run all of them on the first monitor
directly.

This has the added advantage of becoming root on the monitor only, not
on localhost.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years ago ceph-dashboard: update create/get rgw user tasks
Dimitri Savineau [Mon, 17 Feb 2020 20:46:54 +0000 (15:46 -0500)]
ceph-dashboard: update create/get rgw user tasks

Since [1], if an rgw user already exists then the radosgw-admin user
create command will return an error instead of modifying the current
user.
We were already doing separate tasks for the create and get operations,
but only for the multisite configuration, and that's not enough.
Instead we should do the get task first and, depending on the result,
execute the create (see the sketch below).
This commit also adds missing run_once and delegate_to statements.

[1] https://github.com/ceph/ceph/commit/269e9b9

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
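A hedged sketch of the get-then-create pattern; the user id variable and
the container_exec_cmd wrapper are illustrative assumptions:

```
- name: check if the dashboard rgw user exists
  command: "{{ container_exec_cmd }} radosgw-admin user info --uid={{ dashboard_rgw_api_user_id }}"
  register: rgw_dashboard_user
  failed_when: false
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: create the dashboard rgw user
  command: "{{ container_exec_cmd }} radosgw-admin user create --uid={{ dashboard_rgw_api_user_id }} --display-name='Ceph Dashboard' --system"
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"
  when: rgw_dashboard_user.rc != 0
```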
5 years ago ceph-rgw: allow SSL certificate content to be supplied
Sam Choraria [Tue, 3 Dec 2019 12:23:13 +0000 (12:23 +0000)]
ceph-rgw: allow SSL certificate content to be supplied

Allow SSL certificate & key contents to be written to the path
specified by radosgw_frontend_ssl_certificate. This permits a
certificate to be deployed and expired certificates to be renewed
through ceph-ansible.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
5 years ago ceph-defaults: remove bootstrap_dirs_xxx vars
Dimitri Savineau [Wed, 12 Feb 2020 19:34:30 +0000 (14:34 -0500)]
ceph-defaults: remove bootstrap_dirs_xxx vars

Both bootstrap_dirs_owner and bootstrap_dirs_group variables aren't
used anymore in the code.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago rgw: extend automatic rgw pool creation capability
Ali Maredia [Tue, 10 Sep 2019 22:01:48 +0000 (22:01 +0000)]
rgw: extend automatic rgw pool creation capability

Add support for erasure code pools.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1731148
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-rgw-loadbalancer: Fix SSL newline issue
Florian Faltermeier [Wed, 18 Dec 2019 13:31:57 +0000 (14:31 +0100)]
ceph-rgw-loadbalancer: Fix SSL newline issue

The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This causes the "stats socket" and the "tune.ssl.default-dh-param"
parameters to end up on the same line, resulting in haproxy failing to
start.

[ALERT] 351/140240 (21388) : parsing [xxxxx] : 'stats socket' : unknown
keyword 'tune.ssl.default-dh-param'. Registered
[ALERT] 351/140240 (21388) : Fatal errors found in configuration.

Fixes: #4869
Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
5 years ago rgw: don't create user on secondary zones
Dimitri Savineau [Tue, 5 Nov 2019 16:32:06 +0000 (11:32 -0500)]
rgw: don't create user on secondary zones

The rgw user creation for the Ceph dashboard integration shouldn't be
created on secondary rgw zones.

Closes: #4707
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794351
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago purge-cluster: update package list to remove
Dimitri Savineau [Thu, 16 Jan 2020 21:58:35 +0000 (16:58 -0500)]
purge-cluster: update package list to remove

We only support python3, so rename all the ceph python packages.
Some ceph packages were missing from the list (ceph-mon, ceph-osd or
rbd-mirror) or didn't exist anymore (ceph-fs-common, libcephfs1).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago Revert "vagrant: temp workaround for CentOS 8 cloud image"
Dimitri Savineau [Mon, 3 Feb 2020 16:15:33 +0000 (11:15 -0500)]
Revert "vagrant: temp workaround for CentOS 8 cloud image"

The CentOS 8 vagrant image download is now fixed.

This reverts commit a5385e104884a3692954e4691f3348847a35c7fa.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago The _filtered_clients list should intersect with ansible_play_batch
John Fulton [Thu, 6 Feb 2020 02:23:54 +0000 (21:23 -0500)]
The _filtered_clients list should intersect with ansible_play_batch

Client configuration with --limit fails without this patch
because certain tasks are only done on the first host in the
_filtered_clients list and it's likely that this first host will
not be included in what's specified with --limit. To fix this,
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781
Signed-off-by: John Fulton <fulton@redhat.com>
5 years ago tests: don't install s3cmd on containerized setup
Dimitri Savineau [Fri, 7 Feb 2020 20:58:17 +0000 (15:58 -0500)]
tests: don't install s3cmd on containerized setup

The s3cmd package should only be installed on non containerized
deployment.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-iscsi: don't use ceph_dev_xxx variables
Dimitri Savineau [Thu, 6 Feb 2020 21:31:39 +0000 (16:31 -0500)]
ceph-iscsi: don't use ceph_dev_xxx variables

Using ceph_dev_branch and ceph_dev_sha1 for configuring ceph-iscsi
repositories from shaman doesn't make sense because the ceph devel
branches and sha1 aren't compatible with ceph-iscsi devel.
Instead we could rely on the master branch and the latest sha1.
Currently it's not possible to use a custom ceph branch/sha1 value
with the iscsi setup, otherwise the repository setup will fail.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-nfs: fix ceph_nfs_ceph_user variable
Dimitri Savineau [Mon, 10 Feb 2020 16:06:48 +0000 (11:06 -0500)]
ceph-nfs: fix ceph_nfs_ceph_user variable

The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in the ceph-client role.
6a6785b introduced confusion between both variable types in the ceph-nfs
role for external ceph with ganesha.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1801319
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-nfs: add nfs-ganesha-rados-urls package
Dimitri Savineau [Thu, 6 Feb 2020 20:41:46 +0000 (15:41 -0500)]
ceph-nfs: add nfs-ganesha-rados-urls package

Since nfs-ganesha 2.8.3 the rados-urls library has been moved to a
dedicated package.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-{mon,osd}: move default crush variables
Dimitri Savineau [Mon, 10 Feb 2020 18:43:31 +0000 (13:43 -0500)]
ceph-{mon,osd}: move default crush variables

Since ed36a11 we moved the crush rules creation code from the ceph-mon
role to the ceph-osd role.
To keep backward compatibility we kept the possibility to set the
crush variables on the mons side, but we didn't move the default values.
As a result, when crush_rule_config is set to true and the default
values for crush_rules are expected, the crush rule creation task
fails.

"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"

This patch moves the default crush variables from the ceph-mon role to
the ceph-osd role, and also uses those default values when nothing is
defined on the mons side.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798864
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-grafana: fix grafana_{crt,key} condition
Dimitri Savineau [Wed, 12 Feb 2020 15:38:25 +0000 (10:38 -0500)]
ceph-grafana: fix grafana_{crt,key} condition

The grafana_{crt,key} variables aren't booleans but strings. The
default value is an empty string, so we should base the conditional on
the string length instead of the bool filter, as sketched below.

Closes: #5053
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
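A minimal sketch of that condition; the copy task and destination path
are just placeholders:

```
- name: copy the dashboard TLS certificate
  copy:
    src: "{{ grafana_crt }}"
    dest: /etc/grafana/ceph-dashboard.crt
    mode: "0640"
  when: grafana_crt | length > 0
```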
5 years ago ceph-prometheus: add alertmanager HA config
Dimitri Savineau [Thu, 13 Feb 2020 20:56:23 +0000 (15:56 -0500)]
ceph-prometheus: add alertmanager HA config

When using multiple alertmanager nodes (via the grafana-server group),
we need to specify the other peers in the configuration.

https://prometheus.io/docs/alerting/alertmanager/#high-availability
https://github.com/prometheus/alertmanager#high-availability

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792225
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
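A hedged sketch of how the peer list could be built for each
alertmanager instance; the variable name is hypothetical, 9094 is the
usual alertmanager cluster port:

```
# one --cluster.peer flag per other node in the grafana-server group
alertmanager_cluster_peers: >-
  {% for host in groups['grafana-server'] if host != inventory_hostname %}
  --cluster.peer={{ hostvars[host]['ansible_default_ipv4']['address'] }}:9094
  {% endfor %}
```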
5 years ago containers: add KillMode=none to systemd templates
Dimitri Savineau [Tue, 11 Feb 2020 15:09:51 +0000 (10:09 -0500)]
containers: add KillMode=none to systemd templates

Because we rely on docker/podman for managing containers, we don't
need systemd to manage the process (e.g. killing it).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
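As a hedged illustration, the [Service] section of the container unit
templates would gain a line like the following (the rest of the unit is
omitted):

```
[Service]
# let the container engine, not systemd, handle process termination
KillMode=none
```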
5 years ago dashboard: allow configuring multiple grafana hosts
Dimitri Savineau [Mon, 27 Jan 2020 19:47:00 +0000 (14:47 -0500)]
dashboard: allow configuring multiple grafana hosts

When using multiple grafana hosts, we set the grafana and prometheus
URLs and push the dashboard layout to a single node.

grafana_server_addrs is the list of all grafana nodes and used during
the ceph-dashboard role (on mgr/mon nodes).
grafana_server_addr is the current grafana node used during the
ceph-grafana and ceph-prometheus role (on grafana-server nodes).

We no longer duplicate the grafana_server_addr fact-setting code
between external and collocated nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1784011
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago switch_to_containers: increase health check values
Guillaume Abrioux [Thu, 30 Jan 2020 10:33:38 +0000 (11:33 +0100)]
switch_to_containers: increase health check values

This commit increases the default values for the health check variables
consumed in the switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables into the `ceph-defaults` role so the user
can set different values if needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1783223
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago Revert "rhcs: update container image name"
Dimitri Savineau [Wed, 5 Feb 2020 00:51:13 +0000 (19:51 -0500)]
Revert "rhcs: update container image name"

This wasn't necessary. The container image was fixed on the
Red Hat registry.

This reverts commit 3bd250c7422a6eaaffb2d8c6a4750b232e6b1c7e.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago rhcs: update container image name
Dimitri Savineau [Tue, 4 Feb 2020 20:19:53 +0000 (15:19 -0500)]
rhcs: update container image name

The RHCS 4 container image is rhceph/rhceph-4

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1797743
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tests: remove legacy `osd_scenario` variable
Guillaume Abrioux [Mon, 3 Feb 2020 08:22:00 +0000 (09:22 +0100)]
tests: remove legacy `osd_scenario` variable

As of stable-4.0 most of these references aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-facts: set devices osd_auto_discovery on OSDs
Dimitri Savineau [Thu, 16 Jan 2020 19:03:09 +0000 (14:03 -0500)]
ceph-facts: set devices osd_auto_discovery on OSDs

We only need to set the devices fact with osd_auto_discovery on OSD
nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-facts: remove is_podman fact
Dimitri Savineau [Mon, 27 Jan 2020 20:29:25 +0000 (15:29 -0500)]
ceph-facts: remove is_podman fact

This was used before the CentOS 8 requirement when using CentOS 7
atomic which has both docker and podman installed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago purge: fix purge cluster failed
wujie1993 [Sun, 5 Jan 2020 07:31:46 +0000 (15:31 +0800)]
purge: fix purge cluster failed

Fix purge cluster failure when the local container images do not exist.

Purge node-exporter and grafana-server only when dashboard_enabled is set to True.

Signed-off-by: wujie1993 qq594jj@gmail.com
5 years ago iscsi: Fix crashes during rolling update
Mike Christie [Tue, 28 Jan 2020 22:31:55 +0000 (16:31 -0600)]
iscsi: Fix crashes during rolling update

During a rolling update we will run the ceph iscsigw tasks that start
the daemons, then run the configure_iscsi.yml tasks which can create
iscsi objects like targets, disks, clients, etc. The problem is that
once the daemons are started they will accept configuration requests,
or may want to update the system themselves. Those operations can then
conflict with the configure_iscsi.yml tasks that set up objects, and we
can end up with crashes due to the kernel being in an unsupported state.

This could also happen during creation, but is less likely because no
objects are set up yet, so there are no watchers or users accessing the
gws yet. The fix in this patch works for both update and initial setup.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1795806

Signed-off-by: Mike Christie <mchristi@redhat.com>
5 years ago ceph-common: rhcs 4 repositories for rhel 7
Dimitri Savineau [Fri, 31 Jan 2020 13:59:21 +0000 (08:59 -0500)]
ceph-common: rhcs 4 repositories for rhel 7

RHCS 4 is available for both RHEL 7 and 8 so we should also enable the
cdn repositories for that distribution.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1796853
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago config: fix external client scenario
Guillaume Abrioux [Fri, 31 Jan 2020 10:51:54 +0000 (11:51 +0100)]
config: fix external client scenario

When no monitor group is present in the inventory, this task fails.
This affects only non-containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago tests: add external_clients scenario
Guillaume Abrioux [Thu, 30 Jan 2020 12:00:47 +0000 (13:00 +0100)]
tests: add external_clients scenario

This commit adds a new 'external ceph clients' scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-container-engine: lvm2 on OSD nodes only
Dimitri Savineau [Wed, 29 Jan 2020 03:31:04 +0000 (22:31 -0500)]
ceph-container-engine: lvm2 on OSD nodes only

Since de8f2a9 the lvm2 package installation has been moved from the
ceph-osd role to the ceph-container-engine role, but the scope wasn't
limited to the OSD nodes only.
This commit fixes that behaviour.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-defaults: remove rgw from ceph_conf_overrides
Dimitri Savineau [Tue, 28 Jan 2020 15:27:34 +0000 (10:27 -0500)]
ceph-defaults: remove rgw from ceph_conf_overrides

The [rgw] section, whether in the ceph.conf file or via the
ceph_conf_overrides variable, isn't a valid section and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] sections.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794552
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago dashboard: add quotes when passing password to the CLI
Guillaume Abrioux [Tue, 28 Jan 2020 14:32:27 +0000 (15:32 +0100)]
dashboard: add quotes when passing password to the CLI

Otherwise, if the variable contains a '$', it will be interpreted as a
BASH variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
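A hedged sketch of the quoting; the exact dashboard CLI subcommand is an
assumption, the point is the single quotes around the password:

```
- name: set the dashboard admin password
  command: >
    {{ container_exec_cmd }} ceph dashboard ac-user-set-password
    {{ dashboard_admin_user }} '{{ dashboard_admin_password }}'
  run_once: true
```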
5 years ago tests: set dashboard|grafana_admin_password
Guillaume Abrioux [Tue, 28 Jan 2020 13:04:45 +0000 (14:04 +0100)]
tests: set dashboard|grafana_admin_password

Set these 2 variables in all test scenarios where `dashboard_enabled` is
`True`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago validate: fail if dashboard|grafana_admin_password aren't set
Guillaume Abrioux [Tue, 28 Jan 2020 12:55:54 +0000 (13:55 +0100)]
validate: fail if dashboard|grafana_admin_password aren't set

This commit adds a task to make sure the user sets a custom password
for the `grafana_admin_password` and `dashboard_admin_password` variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795509
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
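A minimal sketch of such a validation task; the message wording and the
exact condition layout are illustrative:

```
- name: fail if the dashboard or grafana admin password is not set
  fail:
    msg: "dashboard_admin_password and grafana_admin_password must be set"
  when:
    - dashboard_enabled | bool
    - dashboard_admin_password is undefined or grafana_admin_password is undefined
```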
5 years ago ceph-facts: fix _container_exec_cmd fact value
Dimitri Savineau [Wed, 29 Jan 2020 02:34:24 +0000 (21:34 -0500)]
ceph-facts: fix _container_exec_cmd fact value

When the inventory_hostname and the ansible_hostname differ, the
_container_exec_cmd fact will get a wrong value based on the
inventory_hostname instead of the ansible_hostname.
This happens when the ceph cluster is already running (update/upgrade).

Later the container exec commands will fail because the container name
is wrong.

We should always set the _container_exec_cmd based on the
ansible_hostname fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1795792
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tox: set extras vars for filestore-to-bluestore
Dimitri Savineau [Mon, 27 Jan 2020 16:31:47 +0000 (11:31 -0500)]
tox: set extras vars for filestore-to-bluestore

The ansible extra variables aren't set with the ansible-playbook
command running the filestore-to-bluestore playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago filestore-to-bluestore: fix undefined osd_fsid_list
Dimitri Savineau [Mon, 27 Jan 2020 14:36:56 +0000 (09:36 -0500)]
filestore-to-bluestore: fix undefined osd_fsid_list

If the playbook is used on a host running bluestore OSDs then the
osd_fsid_list won't be filled because the bluestore OSDs are reported
with 'type: block' via ceph-volume lvm list command but we are looking
for 'type: data' (filestore).

TASK [zap ceph-volume prepared OSDs] *********
fatal: [xxxxx]: FAILED! =>
  msg: '''osd_fsid_list'' is undefined

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tests: add 'all_in_one' scenario
Guillaume Abrioux [Mon, 27 Jan 2020 14:49:30 +0000 (15:49 +0100)]
tests: add 'all_in_one' scenario

Add new scenario 'all_in_one' in order to catch more collocated related
issues.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago fix calls to `container_exec_cmd` in ceph-osd role
Guillaume Abrioux [Mon, 27 Jan 2020 12:31:29 +0000 (13:31 +0100)]
fix calls to `container_exec_cmd` in ceph-osd role

We must call `container_exec_cmd` from the right monitor node, otherwise
the value of the fact might mismatch between the delegated node and the
node being played.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794900
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: skip bluestore osd nodes
Dimitri Savineau [Thu, 23 Jan 2020 21:58:14 +0000 (16:58 -0500)]
filestore-to-bluestore: skip bluestore osd nodes

If the OSD node is already using bluestore OSDs then we should skip
all the remaining tasks to avoid purging OSD for nothing.
Instead we warn the user.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago filestore-to-bluestore: don't fail when there is no PV
Dimitri Savineau [Fri, 24 Jan 2020 16:50:34 +0000 (11:50 -0500)]
filestore-to-bluestore: don't fail when there is no PV

When the PV is already removed from the device, we should not fail, to
avoid errors like:

stderr: No PV found on device /dev/sdb.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago Ensure that ganesha log directory exists
Dmitriy Rabotyagov [Mon, 20 Jan 2020 11:44:23 +0000 (13:44 +0200)]
Ensure that ganesha log directory exists

Some ganesha packages do not create the ganesha log directory, while
it is expected to exist when changing its permissions.
Additionally, it doesn't make much sense to do that as a separate task,
so the directory is created, with the correct permissions, along with
the rest of the required directories.

Signed-off-by: Dmitriy Rabotyagov <drabotyagov@vexxhost.com>
5 years ago handler: read container_exec_cmd value from first mon
Guillaume Abrioux [Thu, 23 Jan 2020 14:51:17 +0000 (15:51 +0100)]
handler: read container_exec_cmd value from first mon

Given that we delegate to the first monitor, we must read the value of
`container_exec_cmd` from this node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792320
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-facts: Fix for 'running_mon is undefined' error, so that
Vytenis Sabaliauskas [Thu, 23 Jan 2020 08:58:18 +0000 (10:58 +0200)]
ceph-facts: Fix for 'running_mon is undefined' error, so that
fact 'running_mon' is set once 'grep' successfully exits with 'rc == 0'

Signed-off-by: Vytenis Sabaliauskas <vytenis.sabaliauskas@protonmail.com>
5 years ago site-container: don't skip ceph-container-common
Dimitri Savineau [Wed, 22 Jan 2020 19:45:38 +0000 (14:45 -0500)]
site-container: don't skip ceph-container-common

In an HCI environment the OSD and client nodes are collocated. Because
we aren't running the ceph-container-common role on the client nodes,
except the first one (for keyring purposes), the ceph role execution
fails due to undefined variables.

Closes: #4970
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1794195
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node
Guillaume Abrioux [Wed, 22 Jan 2020 14:00:01 +0000 (15:00 +0100)]
rolling_update: support upgrading 3.x + ceph-metrics on a dedicated node

When upgrading from RHCS 3.x, where ceph-metrics was deployed on a
dedicated node, to RHCS 4.0, it fails like the following:

```
fatal: [magna005]: FAILED! => changed=false
  gid: 0
  group: root
  mode: '0755'
  msg: 'chown failed: failed to look up user ceph'
  owner: root
  path: /etc/ceph
  secontext: unconfined_u:object_r:etc_t:s0
  size: 4096
  state: directory
  uid: 0
```

Because we are trying to run `ceph-config` on this node, which doesn't
make sense, we should simply run this play on all groups except
`[grafana-server]`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1793885
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: fix osd_auto_discovery
Dimitri Savineau [Tue, 21 Jan 2020 21:37:10 +0000 (16:37 -0500)]
filestore-to-bluestore: fix osd_auto_discovery

When osd_auto_discovery is set, we need to refresh the
ansible_devices fact after the filestore OSD purge,
otherwise the devices fact won't be populated (see the sketch below).
Also remove the gpt header on ceph_disk_osds_devices because
the devices fact is empty at this point for osd_auto_discovery.
Add the bool filter where needed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
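A minimal sketch of such a fact refresh using the setup module filter:

```
- name: refresh the ansible_devices fact after the purge
  setup:
    filter: ansible_devices
  when: osd_auto_discovery | bool
```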
5 years ago common: add a default value for ceph_directories_mode
Guillaume Abrioux [Tue, 21 Jan 2020 14:30:16 +0000 (15:30 +0100)]
common: add a default value for ceph_directories_mode

Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
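As a hedged illustration, the ceph-defaults entry could look like this;
the 0755 value is only an assumption based on the directory modes shown
elsewhere in this log:

```
ceph_directories_mode: "0755"
```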
5 years ago filestore-to-bluestore: --destroy with raw devices
Dimitri Savineau [Mon, 20 Jan 2020 21:40:58 +0000 (16:40 -0500)]
filestore-to-bluestore: --destroy with raw devices

We still need --destroy when using a raw device otherwise we won't be
able to recreate the lvm stack on that device with bluestore.

Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
 stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
  /dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792227
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-osd: set container objectstore env variables
Dimitri Savineau [Mon, 20 Jan 2020 16:24:08 +0000 (11:24 -0500)]
ceph-osd: set container objectstore env variables

Because we need to manage legacy ceph-disk based OSDs with ceph-volume,
we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk, so we should also
do it with ceph-volume.
Note that this won't have any impact on ceph-volume lvm based OSDs.

Rename the docker_env_args fact to container_env_args and move the
container condition to the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792122
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>