git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
6 years agodashboard: add grafana dashboard support on Debian based OS
liuxu [Thu, 26 Sep 2019 12:47:01 +0000 (20:47 +0800)]
dashboard: add grafana dashboard support on Debian based OS

Download the grafana dashboard files from GitHub when running on a
Debian based OS.

Signed-off-by: liuxu <liuxu623@gmail.com>
6 years agoInject ceph grafana dashboard layouts
fmount [Tue, 10 Sep 2019 13:20:48 +0000 (15:20 +0200)]
Inject ceph grafana dashboard layouts

This change adds a task that injects, through the ceph dashboard mgr
module, the layouts required to show all the cluster metrics on the
grafana instance.
Since we're now able to push grafana layouts through the ceph mgr
module command, the dashboards configuration template is no longer
needed on containerized environments.
This commit also fixes the static IP assignment in the grafana section
of the Vagrantfile, which caused an issue because it was the same
address as the mgr instance.
Finally, to support deployments that use an external grafana server
instance, the 'grafana_server_addr' assignment has been reworked.

Signed-off-by: fmount <fpantano@redhat.com>
6 years agorolling_update.yml: force ceph-volume scan on osds
Sam Choraria [Tue, 17 Sep 2019 16:23:02 +0000 (17:23 +0100)]
rolling_update.yml: force ceph-volume scan on osds

The rolling_update.yml playbook fails when scanning ceph-disk osds while
deploying nautilus. The --force flag is required to scan existing osds
and rewrite their json metadata.

Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
6 years agoiscsigw: install python-requests 4450/head
Guillaume Abrioux [Wed, 25 Sep 2019 12:20:48 +0000 (14:20 +0200)]
iscsigw: install python-requests

Typical error at rbd-target-api startup:

```
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: Traceback (most recent call last):
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/bin/rbd-target-api", line 39, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: from gwcli.utils import (APIRequest, valid_gateway, valid_client,
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: File "/usr/lib/python2.7/site-packages/gwcli/utils.py", line 1, in <module>
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: import requests
Sep 25 12:12:29 iscsi-gw0 rbd-target-api[9959]: ImportError: No module named requests
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: pin jinja2 version
Guillaume Abrioux [Wed, 25 Sep 2019 11:50:30 +0000 (13:50 +0200)]
tests: pin jinja2 version

ensure we get the latest jinja2 version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: set copy_admin_key at group_vars level
Guillaume Abrioux [Tue, 24 Sep 2019 17:13:31 +0000 (19:13 +0200)]
tests: set copy_admin_key at group_vars level

Setting it at the extra vars level prevents setting it per node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoglobal: remove fetch_directory dependency
Guillaume Abrioux [Mon, 23 Sep 2019 11:30:05 +0000 (13:30 +0200)]
global: remove fetch_directory dependency

This commit drops the fetch_directory dependency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622688
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoinfrastructure-playbooks: add filestore-to-bluestore.yml
Guillaume Abrioux [Mon, 19 Aug 2019 13:07:10 +0000 (15:07 +0200)]
infrastructure-playbooks: add filestore-to-bluestore.yml

This playbook helps to migrate all OSDs on a node from the filestore to
the bluestore backend.
Note that *ALL* OSDs on the specified osd nodes will be shrunk and
redeployed.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1729267
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: add wal_devices option support to ceph_volume module
Guillaume Abrioux [Fri, 23 Aug 2019 07:02:12 +0000 (09:02 +0200)]
osd: add wal_devices option support to ceph_volume module

This commit adds the `wal_devices` option support to the
ceph_volume module.
Passing a list of devices in `bluestore_wal_devices` makes ceph-volume
create one VG from these devices to hold the block.wal partitions.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: update doc text in defaults/main.yml
Guillaume Abrioux [Wed, 21 Aug 2019 09:10:29 +0000 (11:10 +0200)]
osd: update doc text in defaults/main.yml

This commit removes ceph-disk references.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: add block_db_devices option support to ceph_volume module
Guillaume Abrioux [Tue, 20 Aug 2019 13:57:45 +0000 (15:57 +0200)]
osd: add block_db_devices option support to ceph_volume module

This commit adds the `block_db_devices` option support to the
ceph_volume module.
Passing a list of devices in `dedicated_devices` makes ceph-volume
create one VG from these devices to hold the block.db partitions for the
data devices.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agolv-create: fix a typo
Guillaume Abrioux [Tue, 20 Aug 2019 12:00:52 +0000 (14:00 +0200)]
lv-create: fix a typo

This commit fixes a typo.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoshrink-rgw.yml: fix confirmation play's name
Mehdy [Sun, 22 Sep 2019 07:10:25 +0000 (10:40 +0330)]
shrink-rgw.yml: fix confirmation play's name

the confirmation play's name should confirm removing rgw instead of
monitor

Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>
6 years agogroup_vars: remove useless dashboard files
Dimitri Savineau [Tue, 3 Sep 2019 18:16:34 +0000 (14:16 -0400)]
group_vars: remove useless dashboard files

The only useful ansible group for the grafana/prometheus stack is
grafana-server, so none of those files are actually needed.
The default values for all dashboard roles are present in the
ceph-defaults role, so they are also present in group_vars/all.yml.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agovalidate: check ceph_docker_registry_* length
Guillaume Abrioux [Wed, 18 Sep 2019 12:41:46 +0000 (14:41 +0200)]
validate: check ceph_docker_registry_* length

This commit adds a condition to check whether these variables are empty.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agocontainer: Allow to use registry authentication
Dimitri Savineau [Tue, 10 Sep 2019 19:33:44 +0000 (15:33 -0400)]
container: Allow to use registry authentication

The registry.redhat.io registry requires authentication, so we need to
log in before pulling the RHCS 4 container images from it.
This is done via the new ceph_docker_registry_auth variable. The
default value is false but true for RHCS setup.
When set to true, you need to provide the username and password
for the registry via the associated variables.
This patch also updates the ceph_docker_registry value for RHCS setup.
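
The rule described above can be sketched in Python. This is only an
illustration, not the validation code shipped in the repository:
`ceph_docker_registry_auth` comes from this commit, while the
`ceph_docker_registry_username` and `ceph_docker_registry_password`
names and the `check_registry_auth` helper are assumptions.

```python
def check_registry_auth(config):
    """Return a list of error strings for a dict of group_vars-like settings."""
    errors = []
    # Authentication is disabled by default, as stated in the commit message.
    if not config.get("ceph_docker_registry_auth", False):
        return errors
    # Hypothetical names for the credential variables.
    for key in ("ceph_docker_registry_username", "ceph_docker_registry_password"):
        value = config.get(key)
        if value is None or len(str(value)) == 0:
            errors.append(f"{key} must be set when ceph_docker_registry_auth is true")
    return errors
```

An empty result means the settings are consistent; otherwise each missing
credential is reported once.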

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1748911
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agorhel8: add default python bin path
Dimitri Savineau [Wed, 11 Sep 2019 15:44:30 +0000 (11:44 -0400)]
rhel8: add default python bin path

On RHEL 8 systems we should check the /usr/libexec/platform-python path
instead of installing the python36 package.

[DEPRECATION WARNING]: Distribution redhat 8.0 on host xxxxx should use
/usr/libexec/platform-python, but is using /usr/bin/python for backward
compatibility with prior Ansible releases. A future Ansible release will
default to using the discovered platform python for this host.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoshrink-mon: search mon in the quorum_names list
Dimitri Savineau [Thu, 12 Sep 2019 15:51:37 +0000 (11:51 -0400)]
shrink-mon: search mon in the quorum_names list

If we look for the mon hostname in the whole ceph status output, there
are scenarios where the match is a false positive.
If some services are collocated (mons, mgrs, etc.), the hostname of the
monitor to shrink is still present in the ceph status output (in the
mgrs section, for example).
Instead we should check the hostname only in the mon part of the output.
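
A minimal Python sketch of the difference, assuming the status is
available as JSON with a top-level `quorum_names` list. The sample
document below is made up and heavily trimmed, and `mon_in_quorum` is a
hypothetical helper, not a task from the playbook.

```python
import json


def mon_in_quorum(status_json, mon_hostname):
    """Check the hostname against the quorum_names list only."""
    status = json.loads(status_json)
    return mon_hostname in status.get("quorum_names", [])


# Made-up status: mon0 has left the quorum but still hosts the active mgr.
sample = json.dumps({
    "quorum_names": ["mon1", "mon2"],
    "mgrmap": {"active_name": "mon0"},
})

# Searching the whole output matches because of the mgr entry.
naive = "mon0" in sample
# Restricting the search to quorum_names gives the expected answer.
strict = mon_in_quorum(sample, "mon0")
```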

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: do not rely on pg_num to validate rgw_tuning_pools
Guillaume Abrioux [Wed, 18 Sep 2019 10:59:31 +0000 (12:59 +0200)]
tests: do not rely on pg_num to validate rgw_tuning_pools

Since the pg_autoscaler was recently enabled in ceph, this check
should only validate that the requested pools have been created.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-handler: Fix osd restart condition
Dimitri Savineau [Mon, 9 Sep 2019 15:23:47 +0000 (11:23 -0400)]
ceph-handler: Fix osd restart condition

In containerized deployments, the restart OSD handler couldn't be
triggered in most ansible executions.
This is due to the use of run_once combined with a condition on the
inventory hostname and the last filter.
run_once is evaluated first, so ansible picks one node in the osd
group to execute the restart task. But if this node isn't the last
one in the osd group, the task is skipped. The task is therefore more
likely to be skipped than executed.
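
The interaction can be modelled with a few lines of Python. This is a
simplified model of the behaviour described above, not ansible code:
`restart_would_run` and the host names are hypothetical.

```python
def restart_would_run(osd_hosts, picked_host):
    """Model run_once plus a guard that only accepts the last host of the group."""
    return picked_host == osd_hosts[-1]


hosts = ["osd0", "osd1", "osd2"]
# Outcome of the guard for each host that run_once could pick.
outcomes = {host: restart_would_run(hosts, host) for host in hosts}
```

With three OSD nodes, only one possible pick lets the restart task run.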

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agorbd-mirror: Allow to copy the admin keyring
Dimitri Savineau [Mon, 9 Sep 2019 18:33:55 +0000 (14:33 -0400)]
rbd-mirror: Allow to copy the admin keyring

The ceph-rbd-mirror role is supposed to copy the admin keyring when the
copy_admin_key variable is set, but there's actually no task in that role
doing the job.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agorbd-mirror: Use the rbd mirror client keyring
Dimitri Savineau [Mon, 9 Sep 2019 18:18:49 +0000 (14:18 -0400)]
rbd-mirror: Use the rbd mirror client keyring

The admin keyring isn't present by default on the rbd mirror nodes, so
the rbd commands related to the mirroring configuration will fail.
Instead we can use the rbd mirror client keyring.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotox-update: set the ansible.cfg path before update
Dimitri Savineau [Tue, 10 Sep 2019 16:23:19 +0000 (12:23 -0400)]
tox-update: set the ansible.cfg path before update

During an upgrade we first install the platform with the stable-3.2
branch. But the ansible configuration still uses the file from the
current branch, which could have some differences.
Instead we can override the ANSIBLE_CONFIG environment variable for
the stable-3.2 commands.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoLook for additional names when checking ceph-nfs container status
Giulio Fidente [Mon, 9 Sep 2019 17:07:02 +0000 (19:07 +0200)]
Look for additional names when checking ceph-nfs container status

Ganesha cannot be operated active/active. In deployments where it is
managed by pacemaker, the container name can be different from the
default.

This change uses "ceph_nfs_service_suffix" where previously
missing to ensure tasks will work with customized names.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1750005
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
6 years agoSupport comma-delimited subnets in firewall
Harald Jensås [Fri, 6 Sep 2019 14:24:30 +0000 (16:24 +0200)]
Support comma-delimited subnets in firewall

ceph.conf supports a comma-separated list of
subnet CIDRs for the public_network and the
cluster network. ceph-ansible should support
setting up the firewall for this configuration.
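
As an illustration, a comma-delimited value can be turned into one
entry per subnet as follows. This is a sketch in Python using the
standard `ipaddress` module, not the firewall tasks themselves;
`split_subnets` and the example addresses are hypothetical.

```python
import ipaddress


def split_subnets(value):
    """Split a comma-delimited string into validated network objects."""
    return [
        ipaddress.ip_network(item.strip())
        for item in value.split(",")
        if item.strip()
    ]


# Each resulting network would get its own firewall rule.
networks = split_subnets("192.168.1.0/24, 10.0.0.0/8")
```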

Closes: #4425
Related: #4333
https://docs.ceph.com/docs/nautilus/rados/configuration/network-config-ref/#network-config-settings

Signed-off-by: Harald Jensås <hjensas@redhat.com>
6 years agotox: Fix incorrect ANSIBLE_CONFIG value
Dimitri Savineau [Mon, 9 Sep 2019 15:11:43 +0000 (11:11 -0400)]
tox: Fix incorrect ANSIBLE_CONFIG value

The ANSIBLE_CONFIG value wasn't set correctly for two scenarios. This
environment variable doesn't use '-F'.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agorbd-mirror: configure pool and peer
Dimitri Savineau [Wed, 4 Sep 2019 18:35:20 +0000 (14:35 -0400)]
rbd-mirror: configure pool and peer

The rbd mirror configuration was only available for non-containerized
deployments and was also incomplete.
We now enable mirroring on the pool and add the remote peer in both
scenarios.

The default mirroring mode is set to 'pool' but can be configured via
the ceph_rbd_mirror_mode variable.

This commit also fixes an issue on the rbd mirror command if the ceph
cluster name isn't using the default value (ceph) due to a missing
--cluster parameter to the command.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1665877
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agorhcs: Pin downstream containers
Boris Ranto [Wed, 4 Sep 2019 19:38:50 +0000 (21:38 +0200)]
rhcs: Pin downstream containers

We should pin the versions of the downstream containers for the
dashboard instead of using upstream containers.

Signed-off-by: Boris Ranto <branto@redhat.com>
6 years agoFix discovered_interpreter_python variable
fmount [Wed, 4 Sep 2019 07:56:10 +0000 (09:56 +0200)]
Fix discovered_interpreter_python variable

This change fixes the discovered_interpreter_python variable
name that was "discovered_python_interpreter" and caused a
failure in OSP deployments.

Signed-off-by: fmount <fpantano@redhat.com>
6 years agolint: fix error [201,206]
Dimitri Savineau [Thu, 29 Aug 2019 18:11:46 +0000 (14:11 -0400)]
lint: fix error [201,206]

 [201] Trailing whitespace
 [206] Variables should have spaces before and after: {{ var_name }}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-common: remove ceph_stable repo on dev
Dimitri Savineau [Tue, 9 Apr 2019 13:44:27 +0000 (09:44 -0400)]
ceph-common: remove ceph_stable repo on dev

When upgrading from stable to devel release with redhat community
packages, the rpm packages are not updated due to priority introduced
via a7b1e35 (starting nautilus).
We need to remove the ceph stable repositories when configuring the
dev repositories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoAdd octopus release
Dimitri Savineau [Thu, 4 Apr 2019 16:52:49 +0000 (12:52 -0400)]
Add octopus release

Add the 15th ceph release: octopus.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoAdd http_addr option to grafana config
fmount [Fri, 23 Aug 2019 08:00:30 +0000 (10:00 +0200)]
Add http_addr option to grafana config

We have no reason to make the grafana container
listen on *:<port>, so this change adds the
http_addr option to the grafana config file
and adds the related option to the wait_for
tasks.
Since grafana_server_addr should exist, we
shouldn't rely on the _current_monitor_addr
default in the prometheus/grafana templates.
This change also removes this default value,
which is no longer necessary.

Signed-off-by: fmount <fpantano@redhat.com>
6 years agoceph_custom_repo: define apt and rpm key for custom repo
Anthony Rusdi [Sun, 25 Aug 2019 18:47:32 +0000 (01:47 +0700)]
ceph_custom_repo: define apt and rpm key for custom repo

This commit also removes the notify on the newly added debian repo,
forces update_cache to yes and defines sample ceph_custom_key vars.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
6 years agoopenSUSE OBS repo using ceph_stable_release
Johannes Kastl [Wed, 21 Aug 2019 19:45:57 +0000 (21:45 +0200)]
openSUSE OBS repo using ceph_stable_release

Instead of hardcoding `luminous`, use the `ceph_stable_release` variable
to point to the correct repository.

This is now uncommented in roles/ceph-defaults/defaults/main.yml to be
available, as it is only used if ceph_repository is set to 'obs'.

group_vars/*.sample files have been regenerated using the
./generate_group_vars_sample.sh script.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agofix openSUSE OBS repo creation
Johannes Kastl [Thu, 22 Aug 2019 20:12:51 +0000 (22:12 +0200)]
fix openSUSE OBS repo creation

roles/ceph-common/tasks/installs/suse_obs_repository.yml:
ansible's zypper_repository module has no 'uri' parameter; the
parameter is called 'repo' instead.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-infra: open ceph iscsi/prometheus port
Nick Erdmann [Tue, 27 Aug 2019 09:25:02 +0000 (11:25 +0200)]
ceph-infra: open ceph iscsi/prometheus port

Signed-off-by: Nick Erdmann <n@nirf.de>
6 years agotests: use a single grafana node on podman
Dimitri Savineau [Wed, 28 Aug 2019 14:59:30 +0000 (10:59 -0400)]
tests: use a single grafana node on podman

We don't use multiple grafana nodes in the other scenarios for the
moment, and I don't think this is supposed to work.
We often see grafana failures in that scenario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: change container image tag for switch_to_containers
Guillaume Abrioux [Fri, 23 Aug 2019 07:29:19 +0000 (09:29 +0200)]
tests: change container image tag for switch_to_containers

Test the switch_to_containers job against the latest ceph@master
ceph-container image tag available.
This ensures that the ceph release deployed in the first step (non
containerized deployment) isn't newer than the tag used for the
containerized migration (which would mean we try to downgrade the
version).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoset discovered_python_interpreter if ansible_python_interpreter is defined
Johannes Kastl [Thu, 22 Aug 2019 15:46:05 +0000 (17:46 +0200)]
set discovered_python_interpreter if ansible_python_interpreter is defined

If the user has set the `ansible_python_interpreter`, ansible will not try to
discover python, so `discovered_python_interpreter` will not be set.

Solution: Set `discovered_python_interpreter` to `ansible_python_interpreter`
if `ansible_python_interpreter` is defined

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-mon: Bind mount the ca-trust directory
Dimitri Savineau [Mon, 26 Aug 2019 14:47:05 +0000 (10:47 -0400)]
ceph-mon: Bind mount the ca-trust directory

On containerized deployments, the mon container sometimes needs
access to the radosgw endpoint (via the radosgw-admin command). When
using TLS on the radosgw with self-signed certificates, the mon
container needs access to the CA certificate.
The CA certificate needs to be added on the host, and the directory is
then bind mounted into the container.

Resolves: #4358

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-client: Use profile rbd in keyring caps
Dimitri Savineau [Mon, 26 Aug 2019 19:35:19 +0000 (15:35 -0400)]
ceph-client: Use profile rbd in keyring caps

Like the OpenStack keyrings, we can use the profile rbd for the clients
keyring (both mon and osd).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoRevert "osd: add 'osd blacklist' cap for osp keyrings"
Dimitri Savineau [Mon, 26 Aug 2019 19:04:41 +0000 (15:04 -0400)]
Revert "osd: add 'osd blacklist' cap for osp keyrings"

This reverts commit 2d955757ee9324a018374f628664e2e15dcb7903.

"osd blacklist" isn't an osd cap; it should be used with mon caps.
Also, the correct cap for this is: 'allow command "osd blacklist"'.
The current change is breaking the openstack and clients keyrings.
By using the profile rbd (which is already used) we already rely on the
ability to blacklist dead clients.

Resolves: #4385

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoglobal: add newline at end of file
Guillaume Abrioux [Thu, 22 Aug 2019 18:29:40 +0000 (20:29 +0200)]
global: add newline at end of file

This commit re-adds a newline at the end of files where it's missing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoglobal: make directories mode parameterizable
Artur Fijalkowski [Wed, 1 Aug 2018 12:37:40 +0000 (14:37 +0200)]
global: make directories mode parameterizable

This commit makes it possible to parameterize the modes of the ceph
directories. The mode of ceph related directories is no longer
hardcoded to 0755 and can be customized with the `ceph_directories_mode`
variable.
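
The idea can be sketched in Python: the mode becomes a parameter with
0755 as its default instead of a literal in every task. The
`ensure_directory` helper and the temporary path are hypothetical; only
the `ceph_directories_mode` name and the 0755 value come from this
commit.

```python
import os
import stat
import tempfile


def ensure_directory(path, mode="0755"):
    """Create a directory and apply an octal mode given as a string."""
    os.makedirs(path, exist_ok=True)
    # chmod is used so the result does not depend on the process umask.
    os.chmod(path, int(mode, 8))


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "ceph")
    # Stand-in for a user-provided ceph_directories_mode value.
    ensure_directory(target, mode="0750")
    applied = stat.S_IMODE(os.stat(target).st_mode)
```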

Closes: #2920
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agorgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf
guihecheng [Wed, 23 Jan 2019 01:36:25 +0000 (09:36 +0800)]
rgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf

since the following commit:
  commit 1ac94c048ff1d1385de2892d0ecef7879ec563e9
  rgw: add support for multiple rgw instances on a single host

we have multi-instance rgw support on a single host and
the config section name of the rgw changed from
[client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX]
where X is the sequence number: 0,1,2,...
So we should assign 'rgw_zone' item to the exact rgw instance
config section in ceph.conf
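
A small Python sketch of the naming scheme, assuming every instance on
the host belongs to the same zone. `rgw_section` and
`render_zone_config` are hypothetical helpers, not the ceph.conf
template used by the role.

```python
def rgw_section(hostname, instance_index):
    """Build the per-instance section name: client.rgw.<hostname>.rgwX."""
    return f"client.rgw.{hostname}.rgw{instance_index}"


def render_zone_config(hostname, instance_count, zone):
    """Emit one section per rgw instance, each carrying its own rgw_zone."""
    lines = []
    for index in range(instance_count):
        lines.append(f"[{rgw_section(hostname, index)}]")
        lines.append(f"rgw_zone = {zone}")
    return "\n".join(lines)
```

For a host named node1 with two instances, this yields the
[client.rgw.node1.rgw0] and [client.rgw.node1.rgw1] sections.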

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
6 years agolint: fix error [301], add `changed_when: false` when needed
Guillaume Abrioux [Wed, 31 Jul 2019 07:51:12 +0000 (09:51 +0200)]
lint: fix error [301], add `changed_when: false` when needed

This commit fixes the error [301]:

`[301] Commands should not change things if nothing needs doing`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agolint: fix error [306], add pipefail on shell command using pipe
Guillaume Abrioux [Wed, 31 Jul 2019 07:31:50 +0000 (09:31 +0200)]
lint: fix error [306], add pipefail on shell command using pipe

This commit fixes the error [306]:

`[306] Shells that use pipes should set the pipefail option`

using `/bin/bash` as executable because Debian/Ubuntu systems use `dash`
by default which doesn't have the `-o pipefail`. (See:
https://github.com/ansible/ansible-lint/issues/497#issue-424623501)
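
The effect of the option can be observed with a short Python script. It
is a sketch that assumes `/bin/bash` is installed; `pipeline_status` is
a hypothetical helper and `false | cat` is just a pipeline whose first
command fails.

```python
import subprocess


def pipeline_status(script, pipefail):
    """Run a shell pipeline under bash and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    result = subprocess.run(["/bin/bash", "-c", prefix + script])
    return result.returncode


# Without pipefail the status of the last command (cat) hides the failure.
without = pipeline_status("false | cat", pipefail=False)
# With pipefail the failing command makes the whole pipeline fail.
with_pipefail = pipeline_status("false | cat", pipefail=True)
```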

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoplugins/actions/validate.py: allow ceph_repository 'obs' on openSUSE
Johannes Kastl [Wed, 21 Aug 2019 19:54:03 +0000 (21:54 +0200)]
plugins/actions/validate.py: allow ceph_repository 'obs' on openSUSE

Allow the use of 'obs' as a valid value for ceph_repository, and validate that
- OS is openSUSE
- ceph_obs_repo is defined

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-validate: Refactor check for installation check on SUSE/openSUSE
Johannes Kastl [Wed, 21 Aug 2019 18:56:36 +0000 (20:56 +0200)]
ceph-validate: Refactor check for installation check on SUSE/openSUSE

Move the validation from roles/ceph-common/tasks/installs/install_on_suse.yml
to roles/ceph-validate/ and fix the syntax.

There are two valid combinations of `ceph_origin` and `ceph_repository` on
SUSE/openSUSE:
- ceph_origin == 'distro'
- ceph_origin == 'repository' and ceph_repository == 'obs'

The current when condition would fail even in the valid second combination,
as ceph_origin != distro would be true then
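
The two checks can be compared in Python. This is a sketch of the
logic described above, not the task in roles/ceph-validate/;
`suse_install_is_valid` and `old_check_fails` are hypothetical names.

```python
def suse_install_is_valid(ceph_origin, ceph_repository=None):
    """Accept only the two combinations listed in the commit message."""
    if ceph_origin == "distro":
        return True
    return ceph_origin == "repository" and ceph_repository == "obs"


def old_check_fails(ceph_origin, ceph_repository=None):
    """Previous condition as described: fail whenever the origin is not distro."""
    return ceph_origin != "distro"


# The valid 'repository' + 'obs' combination was rejected by the old check.
rejected_before = old_check_fails("repository", "obs")
accepted_now = suse_install_is_valid("repository", "obs")
```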

Fixes: #4362
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agofacts: fix a typo
Johannes Kastl [Thu, 22 Aug 2019 15:39:47 +0000 (17:39 +0200)]
facts: fix a typo

This commit fixes a typo in roles/ceph-facts/tasks/facts.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-config: Set changed_when to false on fact gathering statements
Kevin Coakley [Thu, 8 Aug 2019 22:32:38 +0000 (15:32 -0700)]
ceph-config: Set changed_when to false on fact gathering statements

The "run 'ceph-volume lvm batch --report' to see how many osds are to be
created" and "run 'ceph-volume lvm list' to see how many osds have already been
created" statements only register the lvm_batch_report and lvm_list variables.
Running those ceph-volume commands should never produce a change on the system.
Adding changed_when: false prevents irrelevant change messages from Ansible.

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
6 years agofix SUSE/openSUSE naming
Johannes Kastl [Wed, 21 Aug 2019 16:17:00 +0000 (18:17 +0200)]
fix SUSE/openSUSE naming

As SUSE 15.x and openSUSE Leap 15.x share the same base, make clear
that both are targeted by the respective tasks

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoroles/ceph-validate/tasks/check_system.yml: fail on unsupported SUSE versions
Johannes Kastl [Wed, 21 Aug 2019 19:36:38 +0000 (21:36 +0200)]
roles/ceph-validate/tasks/check_system.yml: fail on unsupported SUSE versions

Fail if SUSE distributions other than 15.x are found, similar to what we have
for openSUSE

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-osd: Add ulimit nofile on container start
Dimitri Savineau [Tue, 6 Aug 2019 15:52:59 +0000 (11:52 -0400)]
ceph-osd: Add ulimit nofile on container start

On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoSet proper ownership command performance improvement
Kevin Jones [Sat, 10 Aug 2019 19:44:32 +0000 (15:44 -0400)]
Set proper ownership command performance improvement

By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here.

On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed.

In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid

Added context note to all set proper ownership tasks

Signed-off-by: Kevin Jones <kevinjones@redhat.com>
6 years agoceph-nfs: fail on openSUSE Leap using distro packages
Johannes Kastl [Fri, 16 Aug 2019 09:53:16 +0000 (11:53 +0200)]
ceph-nfs: fail on openSUSE Leap using distro packages

roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap
using `ceph_origin = distro`, as the ganesha packages are not available from
the distribution repositories

Fixes: #4342
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoinstall ceph-mds packages on SUSE/openSUSE
Johannes Kastl [Wed, 14 Aug 2019 20:48:34 +0000 (22:48 +0200)]
install ceph-mds packages on SUSE/openSUSE

install packages on SUSE/openSUSE distributions, using the
same logic as on RedHat-based distributions

Fixes #4340

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agohandler: do not validate the server certificate against the CA
Guillaume Abrioux [Tue, 20 Aug 2019 09:47:48 +0000 (11:47 +0200)]
handler: do not validate the server certificate against the CA

Otherwise rgw handler ends up with an error when using https.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoremove duplicate task installing suse dependencies
Johannes Kastl [Tue, 20 Aug 2019 09:23:29 +0000 (11:23 +0200)]
remove duplicate task installing suse dependencies

roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that
installs the dependencies, as this is done later in install_suse_packages.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agovalidate: do not validate devices or lvm_volumes in osd_auto_discovery case
Guillaume Abrioux [Wed, 14 Aug 2019 12:20:58 +0000 (14:20 +0200)]
validate: do not validate devices or lvm_volumes in osd_auto_discovery case

we shouldn't validate these two variables when `osd_auto_discovery` is
set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: remove useless condition
Guillaume Abrioux [Mon, 19 Aug 2019 11:51:14 +0000 (13:51 +0200)]
osd: remove useless condition

Just like `ceph_osd_pool_default_size`, a pool size might change after an
initial deployment. Having this condition prevents customizing the
pool in that case.
This is not needed so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agomergify: disable automatic merging on master
Guillaume Abrioux [Tue, 6 Aug 2019 07:35:25 +0000 (09:35 +0200)]
mergify: disable automatic merging on master

Automatic merging by mergify has been failing for a while now.
Until we can figure out what's wrong, let's disable it on master for now
so we don't merge "failing" PRs although they passed all scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodoc: update backport section
Guillaume Abrioux [Wed, 14 Aug 2019 12:32:51 +0000 (14:32 +0200)]
doc: update backport section

Only maintainers can set labels on PRs, so let's clarify that point in
the doc which says something confusing at the moment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: tests switch_to_containers against octopus
Guillaume Abrioux [Wed, 14 Aug 2019 11:58:47 +0000 (13:58 +0200)]
tests: tests switch_to_containers against octopus

since we have container images for ceph@master, we shouldn't use
nautilus anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agocommon: replace shell module
Guillaume Abrioux [Wed, 14 Aug 2019 09:10:12 +0000 (11:10 +0200)]
common: replace shell module

there is no need to use `shell` in these tasks. Let's use `command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoshrink-mon: refact 'verify the monitor is out of the cluster' task
Guillaume Abrioux [Wed, 14 Aug 2019 09:04:30 +0000 (11:04 +0200)]
shrink-mon: refact 'verify the monitor is out of the cluster' task

use `from_json` filter instead of a `| python` so we can get rid of the
`shell` module usage here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: refact 'wait for all osd to be up' task
Guillaume Abrioux [Wed, 14 Aug 2019 08:47:40 +0000 (10:47 +0200)]
osd: refact 'wait for all osd to be up' task

Let's use `until` instead of testing in bash with a python one-liner.
Also, use `command` instead of `shell`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agocommon: use discovered_interpreter_python fact
Guillaume Abrioux [Wed, 14 Aug 2019 07:56:41 +0000 (09:56 +0200)]
common: use discovered_interpreter_python fact

in order to use the right binary name when using python cli in command
or shell module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: update test_mgr_is_up()
Guillaume Abrioux [Wed, 14 Aug 2019 07:08:24 +0000 (09:08 +0200)]
tests: update test_mgr_is_up()

the data structure has changed in octopus:

```
    "mgrmap": {
        "available": true,
        "modules": [
            "dashboard",
            "prometheus"
        ],
        "num_standbys": 0,
        "services": {
            "prometheus": "http://mgr0:9283/"
        }
    },
```
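
A check against this structure could look like the following Python
sketch. It is not the actual test_mgr_is_up() implementation:
`mgr_is_up` is a hypothetical helper and relying on the `available`
key is an assumption based on the sample above.

```python
import json


def mgr_is_up(status):
    """Report whether the mgrmap section marks a manager as available."""
    return bool(status.get("mgrmap", {}).get("available", False))


# Trimmed copy of the structure quoted in this commit message.
status = json.loads("""
{
    "mgrmap": {
        "available": true,
        "modules": ["dashboard", "prometheus"],
        "num_standbys": 0,
        "services": {"prometheus": "http://mgr0:9283/"}
    }
}
""")
up = mgr_is_up(status)
```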

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: update the check for 'all osd to be up'
Guillaume Abrioux [Wed, 14 Aug 2019 05:31:09 +0000 (07:31 +0200)]
osd: update the check for 'all osd to be up'

the data structure has changed in octopus.
eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`.
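
A lookup that tolerates both layouts might be sketched as follows in
Python. The octopus path comes from this commit; the older layout with
a second nested `osdmap` key is an assumption, and `num_osds` is a
hypothetical helper rather than the task's actual expression.

```python
def num_osds(status):
    """Read num_osds from a parsed status document."""
    osdmap = status["osdmap"]
    # Assumed pre-octopus layout: the values sit under a nested "osdmap" key.
    if "osdmap" in osdmap:
        osdmap = osdmap["osdmap"]
    return osdmap["num_osds"]
```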

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agorefact python installation
Guillaume Abrioux [Thu, 1 Aug 2019 07:37:34 +0000 (09:37 +0200)]
refact python installation

This commit refactors the python installation when python is not available.

In order to avoid generating errors, we check for each package manager
to detect which system we are running on.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: fix wrong paths for lv-create in tox.ini
Igor [Tue, 6 Aug 2019 08:57:26 +0000 (11:57 +0300)]
tests: fix wrong paths for lv-create in tox.ini
solution: change paths inside tox.ini file
Fixes: #4311
Signed-off-by: Bogomolov Igor <igor95n@gmail.com>
6 years agoRevert "tests: disable nfs-ganesha deployment"
Dimitri Savineau [Fri, 2 Aug 2019 15:48:32 +0000 (11:48 -0400)]
Revert "tests: disable nfs-ganesha deployment"

This reverts commit 83940e624bc4faf6bc7bb1a7637711556d6b3e8c.

Because the nfs-ganesha@master (2.9-dev) build has been fixed by [1],
we can test nfs-ganesha in the CI for master/octopus.

[1] https://github.com/ceph/ceph-build/pull/1346

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agomgr: refact 'wait for all mgr to be up' task
Guillaume Abrioux [Tue, 6 Aug 2019 09:11:15 +0000 (11:11 +0200)]
mgr: refact 'wait for all mgr to be up' task

There's no need to use `shell` module here.
Instead of using `| python -c`, let's use `from_json` filter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agomgr/dashboard: Fix grafana/prometheus url config
Dimitri Savineau [Fri, 2 Aug 2019 17:35:22 +0000 (13:35 -0400)]
mgr/dashboard: Fix grafana/prometheus url config

When configuring grafana/prometheus embedded in the mgr/dashboard, we need
to use the address of the grafana-server node and not the current
hostname because mgr/dashboard and grafana/prometheus could be present
on different hosts.
We should instead rely on the grafana_server_addr variable and remove
the dashboard_url.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: run dashboard role on mgr/mon nodes
Dimitri Savineau [Fri, 2 Aug 2019 15:24:03 +0000 (11:24 -0400)]
dashboard: run dashboard role on mgr/mon nodes

We don't need to execute the ceph-dashboard role on the nodes present
in the grafana-server group. This one is dedicated to the grafana and
prometheus stack.
The ceph-dashboard role needs to be executed where the ceph-mgr is running:
either on the dedicated mgr nodes or, if mgr and mon are collocated,
implicitly on the mon nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-dashboard: Add run_once on delegate tasks
Dimitri Savineau [Fri, 2 Aug 2019 14:58:11 +0000 (10:58 -0400)]
ceph-dashboard: Add run_once on delegate tasks

Because we need to execute commands from a monitor node (the first one
in the mons list) we are using delegate_to option.
If there are multiple nodes running the ceph-dashboard role then the
delegated task will be executed multiple times.
Also remove a mgr config-key option not present for nautilus+ releases.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoonly support openSUSE Leap 15.x, fail on 42.x
Johannes Kastl [Sat, 27 Jul 2019 14:09:26 +0000 (16:09 +0200)]
only support openSUSE Leap 15.x, fail on 42.x

openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to
'openSUSE Leap 15.x' to align with SLES15 development.
The previous logic did not correctly allow the current release, as 15.x matched
the 'less than 42.3' condition.

For now only support openSUSE Leap 15.x, and extend support once 16.x is
released (or whatever the exact version will be)
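
The version pitfall described above can be sketched in Python. `old_check` is only a reconstruction of the described behaviour (a lower bound of 42.3), not the removed Ansible condition, and both function names are hypothetical.

```python
SUPPORTED_LEAP_MAJOR = 15  # per this commit: only Leap 15.x is supported for now


def old_check(major: int, minor: int) -> bool:
    """Reconstruction of the flawed logic: accept only versions >= 42.3."""
    # 15.x compares as "less than 42.3", so the current release is rejected.
    return (major, minor) >= (42, 3)


def new_check(major: int) -> bool:
    """Accept only the Leap 15.x series; 42.x and anything else fails."""
    return major == SUPPORTED_LEAP_MAJOR


print(old_check(15, 1), new_check(15), new_check(42))
```

Matching on the major version alone sidesteps the numbering jump from 42.x back down to 15.x.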

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-infra: Apply firewall rules with container
Dimitri Savineau [Wed, 31 Jul 2019 18:02:41 +0000 (14:02 -0400)]
ceph-infra: Apply firewall rules with container

We don't have a reason to not apply firewall rules on the host when
using a containerized deployment.
The TripleO environments already manage the ceph firewall rules outside
ceph-ansible and set the configure_firewall variable to false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-grafana: Set grafana uid/gid on files
Dimitri Savineau [Tue, 30 Jul 2019 20:09:47 +0000 (16:09 -0400)]
ceph-grafana: Set grafana uid/gid on files

We don't need to create a grafana system user (in fact we don't even
set the right uid for this user) because we're using a container setup.
Instead we just need to be sure to set the owner/group to 472 (grafana
user/group from the container) like we do for ceph/167.
We don't need to set the user/group recursively on /etc/grafana
directory in a dedicated task.
Also, on Ubuntu systems the ceph-grafana-dashboards package isn't
available, so on non-containerized deployments the
/etc/grafana/dashboards/ceph-dashboard directory (which comes with
the package) won't be present and we need to be sure it exists.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: do not deploy on Debian based OS/non-containerized
Guillaume Abrioux [Wed, 31 Jul 2019 14:15:43 +0000 (16:15 +0200)]
dashboard: do not deploy on Debian based OS/non-containerized

in non-containerized deployment, we can't deploy dashboard on Debian
based distribution since the package `ceph-grafana-dashboards` isn't
available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocs: Correct weird wording
Theo Ouzhinski [Thu, 1 Aug 2019 00:25:54 +0000 (20:25 -0400)]
docs: Correct weird wording

for the Ceph master branch.

Signed-off-by: Theo Ouzhinski <touzhinski@gmail.com>
6 years agotests/shrink_rgw: Disable dashboard
Dimitri Savineau [Wed, 31 Jul 2019 18:17:48 +0000 (14:17 -0400)]
tests/shrink_rgw: Disable dashboard

The shrink_rgw scenario was merged just after the PR that enabled
the ceph dashboard by default.
So right now the shrink_rgw scenario doesn't have nodes in the grafana
group and fails.
We just need to set dashboard_enabled to false.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: add more memory in podman job
Guillaume Abrioux [Tue, 30 Jul 2019 11:04:15 +0000 (13:04 +0200)]
tests: add more memory in podman job

Typical error :

```
fatal: [mon1 -> mon0]: FAILED! => changed=true
  cmd:
  - podman
  - exec
  - ceph-mon-mon0
  - ceph
  - config
  - set
  - mgr
  - mgr/dashboard/ssl
  - 'false'
  delta: '0:00:00.644870'
  end: '2019-07-30 10:17:32.715639'
  msg: non-zero return code
  rc: 1
  start: '2019-07-30 10:17:32.070769'
  stderr: |-
    Traceback (most recent call last):
      File "/usr/bin/ceph", line 140, in <module>
        import rados
    ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
    Error: exit status 1
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Let's add more memory to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: deploy dashboard on mons
Guillaume Abrioux [Tue, 30 Jul 2019 09:25:59 +0000 (11:25 +0200)]
tests: deploy dashboard on mons

there are no dedicated nodes for mgr, let's use monitor nodes.
The mgr0 instance spawned isn't used, so if this node is part of the
inventory for this scenario, testinfra will complain because there's no
ceph.conf on this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: fix timeout usage on rgw user creation command
Guillaume Abrioux [Tue, 30 Jul 2019 08:16:23 +0000 (10:16 +0200)]
dashboard: fix timeout usage on rgw user creation command

For some reason, this makes the playbook fail as follows:

```
TASK [ceph-dashboard : create radosgw system user] ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106
Tuesday 30 July 2019  10:04:54 +0200 (0:00:01.910)       0:11:22.319 **********
FAILED - RETRYING: create radosgw system user (3 retries left).
FAILED - RETRYING: create radosgw system user (2 retries left).
FAILED - RETRYING: create radosgw system user (1 retries left).
fatal: [mgr0 -> mon0]: FAILED! => changed=true
  attempts: 3
  cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system
  delta: '0:00:20.021973'
  end: '2019-07-30 08:06:32.656066'
  msg: non-zero return code
  rc: 124
  start: '2019-07-30 08:06:12.634093'
  stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

using `timeout -f -s KILL` fixes this issue.

Also, there is no need to use `shell` module here, let's switch to
`command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests/functional: add a test for shrink-rgw.yml
Rishabh Dave [Wed, 26 Jun 2019 06:09:45 +0000 (11:39 +0530)]
tests/functional: add a test for shrink-rgw.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and RGW and then runs shrink-rgw.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoadd a playbook to remove rgw from a given node
Rishabh Dave [Wed, 26 Jun 2019 05:59:50 +0000 (11:29 +0530)]
add a playbook to remove rgw from a given node

Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that
can remove a RGW from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoosd: add 'osd blacklist' cap for osp keyrings
Guillaume Abrioux [Mon, 15 Jul 2019 07:57:06 +0000 (09:57 +0200)]
osd: add 'osd blacklist' cap for osp keyrings

This commit adds the `osd blacklist` cap to all OSP client keyrings.

Fixes: #2296
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-osd: check container engine rc for pools
Dimitri Savineau [Mon, 22 Jul 2019 20:58:40 +0000 (16:58 -0400)]
ceph-osd: check container engine rc for pools

When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could be either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.
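
The return-code handling described above can be sketched as follows. This is a hedged Python illustration, not the Ansible condition from the patch; the function name `pool_action` and the returned labels are hypothetical.

```python
POOL_EXISTS_RC = 0   # pool list command succeeded: the pool already exists
POOL_MISSING_RC = 2  # pool list command: the pool does not exist


def pool_action(rc: int) -> str:
    """Map the pool list return code to the next step."""
    if rc == POOL_EXISTS_RC:
        return "skip"
    if rc == POOL_MISSING_RC:
        return "create"
    # Any other code, such as 1 (docker) or 125 (podman), points at a
    # container engine problem, so fail here instead of in a later task.
    return "fail"


for rc in (0, 2, 1, 125):
    print(rc, pool_action(rc))
```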

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: test dashboard deployment with podman scenario
Guillaume Abrioux [Mon, 29 Jul 2019 12:08:40 +0000 (14:08 +0200)]
tests: test dashboard deployment with podman scenario

This commit adds a grafana-server section in order to test dashboard
deployment with podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agovalidate: add checks for grafana-server group definition
Guillaume Abrioux [Mon, 29 Jul 2019 08:03:48 +0000 (10:03 +0200)]
validate: add checks for grafana-server group definition

this commit adds two checks:
- check that the `[grafana-server]` group is defined
- check that the `[grafana-server]` contains at least one node.
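
The two checks listed above amount to something like this Python sketch. The `groups` mapping stands in for the Ansible inventory groups, and the function name and error strings are assumptions made for illustration.

```python
GRAFANA_GROUP = "grafana-server"


def validate_grafana_group(groups: dict) -> list:
    """Return a list of error messages; an empty list means both checks pass."""
    errors = []
    if GRAFANA_GROUP not in groups:
        # First check: the group must be defined in the inventory.
        errors.append("the [grafana-server] group is not defined")
    elif len(groups[GRAFANA_GROUP]) < 1:
        # Second check: the group must contain at least one node.
        errors.append("the [grafana-server] group must contain at least one node")
    return errors


print(validate_grafana_group({"mons": ["mon0"], "grafana-server": ["mon0"]}))
```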

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agomgr: fix a typo
Guillaume Abrioux [Fri, 26 Jul 2019 15:33:07 +0000 (17:33 +0200)]
mgr: fix a typo

this task isn't using the right container_exec_cmd, so it is delegating
to the wrong node.
Let's use the right fact to fix this command.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: remove cfg80211 module installation
Guillaume Abrioux [Fri, 26 Jul 2019 15:23:19 +0000 (17:23 +0200)]
dashboard: remove cfg80211 module installation

According to this comment [1], this seems to be needed to detect wifi
devices.

In node exporter we can see this:

```
--collector.wifi          Enable the wifi collector (default: disabled).
```

since it's disabled by default and we don't even change this in our
systemd templates for node-exporter, we can easily assume in the end
it's not needed. Therefore, let's remove this.

[1] https://github.com/ceph/ceph-ansible/commit/dbf81b6b5bd6d7e977706a93f9e75b38efe32305#diff-961545214e21efed3b84a9e178927a08L21-L23

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: use dedicated group only
Guillaume Abrioux [Thu, 25 Jul 2019 17:08:22 +0000 (19:08 +0200)]
dashboard: use dedicated group only

There's no need to add complexity by trying to fall back on another group.
Let's deploy dashboard on all nodes present in grafana-server group.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: move code into a dedicated playbook
Dimitri Savineau [Fri, 28 Jun 2019 14:39:38 +0000 (10:39 -0400)]
dashboard: move code into a dedicated playbook

Move dashboard, grafana/prometheus and node-exporter plays into a
dedicated playbook in the infrastructure-playbooks directory.
To avoid using the 'dashboard_enabled | bool' condition multiple times
in the main playbook, we can just import the dashboard playbook or
not.
This patch also allows using a single dashboard playbook for
both baremetal and container playbooks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: enable dashboard by default
Guillaume Abrioux [Thu, 25 Jul 2019 12:29:11 +0000 (14:29 +0200)]
dashboard: enable dashboard by default

This commit enables dashboard deployment by default.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoRemove NBSP characters
Dimitri Savineau [Thu, 18 Jul 2019 18:57:46 +0000 (14:57 -0400)]
Remove NBSP characters

Some NBSP are still present in the yaml files.
Adding a test in travis CI.
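
A check of this kind could be sketched in Python as below. This is an illustrative assumption, not the test added to the CI configuration; the function name `find_nbsp` is hypothetical.

```python
NBSP = "\u00a0"  # non-breaking space


def find_nbsp(text: str) -> list:
    """Return 1-indexed (line, column) pairs for every NBSP found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char == NBSP:
                hits.append((lineno, col))
    return hits


# The second line hides an NBSP between the colon and the value.
print(find_nbsp("a: 1\nb:\u00a02\n"))
```

A CI job would typically run such a scan over every yaml file and fail when the result is non-empty.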

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoinfra-playbooks: rewrite a condition for better readability
Rishabh Dave [Thu, 25 Jul 2019 11:32:32 +0000 (17:02 +0530)]
infra-playbooks: rewrite a condition for better readability

Use Ansible's built-in facility to check whether a command was executed
successfully rather than looking at its return value.

Signed-off-by: Rishabh Dave <ridave@redhat.com>