This playbook helps migrate all OSDs on a node from the filestore
backend to the bluestore backend.
Note that *ALL* OSDs on the specified OSD nodes will be shrunk and
redeployed.
osd: add wal_devices option support to ceph_volume module
This commit adds support for the `wal_devices` option to the
ceph_volume module.
Passing a devices list in `bluestore_wal_devices` makes ceph-volume
create one VG from these devices and use it for the block.wal
partitions.
osd: add block_db_devices option support to ceph_volume module
This commit adds support for the `block_db_devices` option to the
ceph_volume module.
Passing a devices list in `dedicated_devices` makes ceph-volume create
one VG from these devices and use it for the block.db partitions of the
data devices.
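A minimal group_vars sketch combining both options above (the device
paths are illustrative):
```
osd_objectstore: bluestore
devices:
  - /dev/sda
  - /dev/sdb
# one VG is created from these devices for the block.db partitions
dedicated_devices:
  - /dev/nvme0n1
# one VG is created from these devices for the block.wal partitions
bluestore_wal_devices:
  - /dev/nvme1n1
```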
The only Ansible group needed for the grafana/prometheus stack is
grafana-server, so none of those files is actually needed.
The default values for all dashboard roles are present in the
ceph-defaults role, so they are also present in group_vars/all.yml.
The registry.redhat.io registry requires authentication, so before
pulling the RHCS 4 container images from the registry we need to do the
login step.
This is done via the new ceph_docker_registry_auth variable. The
default value is false, but it is set to true for RHCS setups.
When set to true, you need to provide the username and password
for the registry via the associated variables.
This patch also updates the ceph_docker_registry value for RHCS setups.
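A minimal sketch of the variables involved, assuming an RHCS setup (the
credentials are placeholders):
```
ceph_docker_registry: registry.redhat.io
ceph_docker_registry_auth: true
ceph_docker_registry_username: myuser
ceph_docker_registry_password: mypassword
```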
On RHEL 8 systems we should check for the /usr/libexec/platform-python
path instead of installing the python36 package.
[DEPRECATION WARNING]: Distribution redhat 8.0 on host xxxxx should use
/usr/libexec/platform-python, but is using /usr/bin/python for backward
compatibility with prior Ansible releases. A future Ansible release will
default to using the discovered platform python for this host.
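One way to point Ansible at the platform python on RHEL 8 hosts is the
interpreter variable, e.g. in the inventory or group_vars (a minimal
sketch):
```
ansible_python_interpreter: /usr/libexec/platform-python
```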
If we look for the mon hostname anywhere in the ceph status output,
there are scenarios where the check wrongly succeeds.
If we collocate some services (mons, mgrs, etc.) then the hostname of
the monitor to shrink will still be present in the ceph status output
(in the mgr section, for instance).
Instead we should check the hostname only in the mon section of the
output.
In containerized deployments, the restart OSD handler couldn't be
triggered in most Ansible executions.
This is due to the combination of run_once and a condition comparing
the inventory hostname with the last filter.
run_once is evaluated first, so Ansible picks one node in the osd group
to execute the restart task. But if that node isn't the last one in the
osd group then the task is skipped, so the task is more likely to be
skipped than executed (see the sketch below).
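A rough sketch of the problematic pattern (the task name and script
path are illustrative):
```
# run_once picks one host from the osd group; the 'last' condition then
# skips the task unless that host happens to be the last one in the group.
- name: restart ceph osds daemon(s)
  command: /usr/bin/env bash /tmp/restart_osd_daemon.sh
  run_once: true
  when: inventory_hostname == groups[osd_group_name] | last
```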
The admin keyring isn't present by default on the rbd mirror nodes so
the rbd commands related to the mirroring configuration will fail.
Instead we can use the rbd mirror client keyring.
tox-update: set the ansible.cfg path before update
During an upgrade we install the platform with the stable-3.2 branch,
but the Ansible configuration still uses the file from the current
branch, which could have some differences.
Instead we can override the ANSIBLE_CONFIG environment variable for
the stable-3.2 commands.
Harald Jensås [Fri, 6 Sep 2019 14:24:30 +0000 (16:24 +0200)]
Support comma-delimited subnets in firewall
ceph.conf supports a comma-separated list of
subnet CIDRs for the public network and the
cluster network. ceph-ansible should support
setting up the firewall for this configuration.
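For instance, in group_vars (a sketch with illustrative subnets):
```
public_network: "192.168.42.0/24,192.168.43.0/24"
cluster_network: "192.168.44.0/24,192.168.45.0/24"
```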
The rbd mirror configuration was only available for non-containerized
deployments and was also incomplete.
We now enable the mirroring on the pool and add the remote peer in both
scenarios.
The default mirroring mode is set to 'pool' but can be configured via
the ceph_rbd_mirror_mode variable.
This commit also fixes an issue with the rbd mirror command when the
ceph cluster name isn't the default value (ceph), due to a missing
--cluster parameter on the command.
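A minimal sketch of the kind of command involved (the pool variable
name is illustrative):
```
- name: enable mirroring on the pool
  command: "rbd --cluster {{ cluster }} mirror pool enable {{ ceph_rbd_mirror_pool }} {{ ceph_rbd_mirror_mode }}"
  run_once: true
```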
fmount [Fri, 23 Aug 2019 08:00:30 +0000 (10:00 +0200)]
Add http_addr option to grafana config
We have no reason to make the grafana container
listen on *:<port>, so this change adds the
http_addr option to the grafana config file
and adds the related option to the wait_for
tasks.
Since grafana_server_addr should exist, we
shouldn't rely on the _current_monitor_addr
default in the prometheus/grafana templates.
This change also removes this default value,
which is not necessary anymore.
Johannes Kastl [Thu, 22 Aug 2019 20:12:51 +0000 (22:12 +0200)]
fix openSUSE OBS repo creation
roles/ceph-common/tasks/installs/suse_obs_repository.yml:
Ansible's zypper_repository module does not have a 'uri' parameter; the
parameter is called 'repo' instead.
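A minimal sketch of the corrected task (the repository name and URL
variable are illustrative):
```
- name: add openSUSE Leap OBS repository
  zypper_repository:
    name: "OBS:filesystems:ceph"
    repo: "{{ ceph_obs_repo }}"
    state: present
    auto_import_keys: true
```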
Dimitri Savineau [Wed, 28 Aug 2019 14:59:30 +0000 (10:59 -0400)]
tests: use a single grafana node on podman
We don't use multiple grafana nodes on the other scenarios for the
moment, and I don't think this is supposed to work.
We often see grafana failures on that scenario.
lint: fix error [306], add pipefail on shell command using pipe
This commit fixes the error [306]:
`[306] Shells that use pipes should set the pipefail option`
using `/bin/bash` as the executable because Debian/Ubuntu systems use
`dash` by default, which doesn't support `-o pipefail`. (See:
https://github.com/ansible/ansible-lint/issues/497#issue-424623501)
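A minimal sketch of the resulting pattern (the piped command itself is
illustrative):
```
- name: get the installed ceph version
  shell: |
    set -o pipefail
    ceph --version | awk '{print $3}'
  args:
    executable: /bin/bash
  register: installed_ceph_version
  changed_when: false
```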
Dimitri Savineau [Mon, 26 Aug 2019 14:47:05 +0000 (10:47 -0400)]
ceph-mon: Bind mount the ca-trust directory
On containerized deployments, the mon container sometimes needs to
access the radosgw endpoint (via the radosgw-admin command). When
using TLS on the radosgw with self-signed certificates we need
access to the CA certificate from the mon container.
The CA certificate needs to be added on the host and then the directory
is bind mounted into the container.
The "osd blacklist" isn't an osd caps but should be used with mon caps.
Also the correct caps for this is: 'allow command "osd blacklist"'.
The current change is breaking the openstack and clients keyrings.
By using the profile rbd (which is already used) we already rely on the
ability to blacklist dead client.
We have multi-instance rgw support on a single host and
the config section name of the rgw changed from
[client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX]
where X is the sequence number: 0,1,2,...
So we should assign the 'rgw_zone' item to the exact rgw instance
config section in ceph.conf.
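For example, the resulting ceph.conf sections could look like this (the
hostname and zone names are illustrative):
```
[client.rgw.rgw0.rgw0]
host = rgw0
rgw_zone = us-east

[client.rgw.rgw0.rgw1]
host = rgw0
rgw_zone = us-east
```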
This commit makes it possible to parametrize the Ceph directory modes.
It changes the hardcoded 0755 mode for Ceph-related directories into a
value customizable via the `ceph_directories_mode` variable.
Closes: #2920 Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 011270ca698ddf9602b8fe52d4e3b98f6b06155d)
On containerized deployments, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not for
the ceph-osd container startup.
Also update the hard limit value to 4096 to reflect the default
baremetal value.
Kevin Coakley [Thu, 8 Aug 2019 22:32:38 +0000 (15:32 -0700)]
ceph-config: Set changed_when to false on fact gathering statements
The "run 'ceph-volume lvm batch --report' to see how many osds are to be
created" and "run 'ceph-volume lvm list' to see how many osds have already been
created" statements only register the lvm_batch_report and lvm_list variables.
Running those ceph-volume commands should never produce a change on the system.
Adding changed_when: false prevents irrelevant change messages from Ansible.
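A minimal sketch of the pattern (the command is shortened for
illustration):
```
- name: run 'ceph-volume lvm batch --report' to see how many osds are to be created
  command: "ceph-volume --cluster {{ cluster }} lvm batch --report {{ devices | join(' ') }}"
  register: lvm_batch_report
  changed_when: false
```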
Kevin Jones [Sat, 10 Aug 2019 19:44:32 +0000 (15:44 -0400)]
Set proper ownership command performance improvement
By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here.
On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed.
In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid.
Added context note to all set proper ownership tasks
Johannes Kastl [Fri, 16 Aug 2019 09:53:16 +0000 (11:53 +0200)]
ceph-nfs: fail on openSUSE Leap using distro packages
roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap
using `ceph_origin = distro`, as the ganesha packages are not available from
the distribution repositories
roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that
installs the dependencies, as this is done later in install_suse_packages.yml
Johannes Kastl [Sat, 27 Jul 2019 14:09:26 +0000 (16:09 +0200)]
only support openSUSE Leap 15.x, fail on 42.x
openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to
'openSUSE Leap 15.x' to align with SLES15 development.
The previous logic did not correctly allow the current release, as 15.x matched
the 'less than 42.3' condition.
For now, only support openSUSE Leap 15.x, and extend support once 16.x
is released (or whatever the exact version will be).
Just like `ceph_osd_pool_default_size`, a pool size might change after
the initial deployment. Having this condition prevents customizing the
pool in that case.
The condition is not needed, so let's remove it.
When configuring the grafana/prometheus stack embedded in the
mgr/dashboard, we need to use the address of the grafana-server node
and not the current hostname, because mgr/dashboard and
grafana/prometheus could be present on different hosts.
We should instead rely on the grafana_server_addr variable and remove
the dashboard_url.
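A minimal sketch of pointing the dashboard at the grafana-server node
(the port is illustrative):
```
- name: set the grafana api url in the dashboard
  command: "ceph --cluster {{ cluster }} dashboard set-grafana-api-url http://{{ grafana_server_addr }}:3000"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
```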
We don't need to execute the ceph-dashboard role on the nodes present
in the grafana-server group. That group is dedicated to the grafana and
prometheus stack.
The ceph-dashboard role needs to be executed where the ceph-mgr is
running: either on the dedicated mgr nodes or, if mgr and mon are
collocated, implicitly on the mon nodes.
Because we need to execute commands from a monitor node (the first one
in the mons list) we use the delegate_to option.
If there are multiple nodes running the ceph-dashboard role then the
delegated task will be executed multiple times.
Also remove a mgr config-key option not present in nautilus+ releases.
There is no reason not to apply firewall rules on the host when
using a containerized deployment.
The TripleO environments already manage the ceph firewall rules outside
ceph-ansible and set the configure_firewall variable to false.
We don't need to create a grafana system user (in fact we don't even
set the right uid for this user) because we're using a container setup.
Instead we just need to make sure the owner/group is set to 472 (the
grafana user/group from the container), like we do for ceph/167.
We don't need to set the user/group recursively on the /etc/grafana
directory in a dedicated task.
Also on Ubuntu systems, the ceph-grafana-dashboards package isn't
available, so on non-containerized deployments the
/etc/grafana/dashboards/ceph-dashboard directory (which comes with the
package) won't be present, so we need to make sure it exists.
The shrink_rgw scenario was merged just after the PR enabling the
ceph dashboard by default.
So right now the shrink_rgw scenario doesn't have nodes in the grafana
group and fails.
We just need to set dashboard_enabled to false.
When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (i.e. if the pool doesn't exist). In that
case, the return code will be 2. That's why the next condition is
rc != 0 for the pool creation.
But in containerized deployments, the return code can be different if
there's a failure on the container engine command (like the container
not running). In that case, the return code can be either 1 (docker) or
125 (podman), so we should fail at this point and not in the later
tasks (see the sketch below).
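A rough sketch of the idea (the task and variable names are
illustrative):
```
- name: check if the openstack pools already exist
  command: "{{ container_exec_cmd }} ceph --cluster {{ cluster }} osd pool get {{ item.name }} size"
  register: openstack_pools_exist
  with_items: "{{ openstack_pools }}"
  changed_when: false
  failed_when: openstack_pools_exist.rc not in [0, 2]
```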
There are no dedicated nodes for the mgr, so let's use the monitor
nodes.
The spawned mgr0 instance isn't used, so if this node is part of the
inventory for this scenario, testinfra will complain because there's no
ceph.conf on this node.
The ceph nodes might not have the python six library installed, which
can lead to an error during the ceph_volume custom module execution:
ImportError: No module named six
The six library isn't needed in this module if we make sure that all
action variables passed to the build_ceph_volume_cmd function are
lists and not strings.
According to this comment [1], this seems to be needed to detect wifi
devices.
In node exporter we can see this:
```
--collector.wifi Enable the wifi collector (default: disabled).
```
Since it's disabled by default and we don't even change this in our
systemd templates for node-exporter, we can easily assume in the end
it's not needed. Therefore, let's remove this.
Dimitri Savineau [Fri, 28 Jun 2019 14:39:38 +0000 (10:39 -0400)]
dashboard: move code into a dedicated playbook
Move the dashboard, grafana/prometheus and node-exporter plays into a
dedicated playbook in the infrastructure-playbooks directory.
To avoid using the 'dashboard_enabled | bool' condition multiple times
in the main playbook we can just import the dashboard playbook or not
(see the sketch below).
This patch also allows using a single dashboard playbook for
both baremetal and container playbooks.
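A minimal sketch of what the main playbook can do (the playbook path is
illustrative):
```
- import_playbook: dashboard.yml
  when: dashboard_enabled | bool
```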
Avoid setting up provisioners in a fully containerized environment
This commit adds a when clause to avoid the setup of grafana
provisioners in a fully containerized scenario.
This is needed when the ceph-grafana-dashboards package is not
installed; in that case this task could result in a wrong grafana
configuration that makes the container crash.
The current port values for alertmanager, grafana, node-exporter and
prometheus are hardcoded in the roles, so it's not possible to change
the port binding of those services.
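A sketch of the kind of variables this makes configurable (the names
and defaults here are assumptions, not authoritative):
```
grafana_port: 3000
prometheus_port: 9092
alertmanager_port: 9093
node_exporter_port: 9100
```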
fbf4ed42aee8fa5fd18c4c289cbb80ffeda8f72e introduced a bug when the
container binary is podman.
podman doesn't support `ps -f` with a regular expression, so the
container id is never set in the restart script, causing the handler to
fail.
validate: fail if gpt header found on unprepared devices
ceph-volume will complain if gpt headers are found on devices.
This commit checks whether a gpt header is present on the devices
passed in the `devices` variable and fails early.
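A rough sketch of such a check, assuming parted output is used to
detect the partition table type (the task names are illustrative):
```
- name: read the partition table type of each device
  command: "parted --script {{ item }} print"
  register: parted_results
  with_items: "{{ devices }}"
  changed_when: false
  failed_when: false

- name: fail if a gpt header is found on a device
  fail:
    msg: "GPT header found on {{ item.item }}, please wipe the device first"
  with_items: "{{ parted_results.results }}"
  when: "'gpt' in item.stdout"
```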
Both the ntp and chrony daemons use a variable for the service name
because it can differ depending on the GNU/Linux distribution.
This was updated in 9d88d3199 for chrony, but only for the start part,
not for the handler.
This commit fixes this for both ntp and chrony.
The Prometheus port 9090 isn't open in the firewall configuration.
Also the dashboard task on the grafana node was not required because
it's already present on the mgr node.
validate: improve message printed in check_devices.yml
The message prints the whole content of the registered variable in the
playbook; this is not needed and makes the message pretty unclear and
unreadable.
```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```
Boris Ranto [Tue, 9 Jul 2019 20:32:38 +0000 (22:32 +0200)]
dashboard: Use upstream default port
We are currently using an incorrect dashboard default port. Upstream
uses 8443 instead of 8234 by default. This should get us closer to the
upstream project.
Some dashboard_rgw_api_* variables use the bool filter, but those
variables are strings with an empty string as the default value.
So we should test the variables against an empty string instead of a
bool (see the sketch below).
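A minimal sketch of the corrected condition (the task shown is
illustrative):
```
- name: set the rgw api host in the dashboard
  command: "ceph --cluster {{ cluster }} dashboard set-rgw-api-host {{ dashboard_rgw_api_host }}"
  when: dashboard_rgw_api_host | length > 0
```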
- Remove gateway_keyring from the configuration file because it's
not used in the ceph-iscsi 3.x release.
- Use config_template instead of the template module for the
iscsi-gateway configuration file, because the file is an ini file and
we might want to override more parameters than those present in
ceph-ansible.
- Because we can now set the pool name in the configuration, we should
use a variable for that. This is refactored with the iscsi_pool_*
variables, which are also used to configure the pool size.