ToprHarley [Mon, 18 Feb 2019 18:02:03 +0000 (19:02 +0100)]
Convert interface names to underscores
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540881 Signed-off-by: Tomas Petr <tpetr@redhat.com>
(cherry picked from commit 573adce7dd4f306c384b3308c8049ae49ef59716)
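A minimal sketch of the kind of conversion involved: ansible fact names use underscores, so any dashes in a configured interface name have to be converted before the hostvars lookup (the task and variable names below are illustrative, not the role's exact code).
```
- name: convert interface name to underscores
  set_fact:
    _monitor_interface: "{{ monitor_interface | replace('-', '_') }}"
```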
Kevin Coakley [Tue, 26 Feb 2019 17:30:31 +0000 (09:30 -0800)]
Set permissions on monitor directory to u=rwX,g=rX,o=rX recursive
Set directories to 755 and files to 644 in
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} recursively, instead of
setting both files and directories to 755 recursively. The ceph mon
process writes files to this path with permissions 644, so this update stops
ansible from changing the permissions in
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} every time ceph mon writes
a file, which improves idempotency.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683997 Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit d327681b99915578fc8b389fda69556966db905f)
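A minimal sketch of the permissions described above, assuming the file module is used (the exact task layout in the role may differ):
```
- name: set permissions on the monitor directory
  file:
    path: "/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }}"
    state: directory
    owner: ceph
    group: ceph
    mode: "u=rwX,g=rX,o=rX"  # X applies the execute bit to directories only
    recurse: true
```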
Dimitri Savineau [Wed, 27 Feb 2019 16:07:38 +0000 (11:07 -0500)]
mon: Add mds permissions to client.admin
The administrator keyring needs full capabilities on mds like mon,
osd and mgr.
Without this, the client.admin key won't be able to run commands
against mds (like `ceph tell mds.0 session ls`).
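Illustrative caps only, expressed with the ceph_key module; the role builds the keyring elsewhere, but the point is the added mds entry alongside mon, osd and mgr:
```
- name: create client.admin keyring (sketch)
  ceph_key:
    name: client.admin
    cluster: "{{ cluster }}"
    caps:
      mon: "allow *"
      osd: "allow *"
      mgr: "allow *"
      mds: "allow *"
```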
osd: make the 'wait for all osd to be up' task configurable
Introduce two new variables to make the 'wait for all osd to
be up' check configurable.
It's possible that for some deployments, OSDs can take longer to be seen
as UP and IN.
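A minimal sketch of what a tunable check could look like; the two variable names below are hypothetical, shown only to illustrate the mechanism:
```
- name: wait for all osd to be up
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} osd stat -f json"
  register: osd_stat
  retries: "{{ wait_for_all_osds_up_retries | default(60) }}"
  delay: "{{ wait_for_all_osds_up_delay | default(10) }}"
  until: (osd_stat.stdout | from_json)["num_osds"] == (osd_stat.stdout | from_json)["num_up_osds"]
```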
David Waiting [Mon, 10 Dec 2018 14:54:18 +0000 (09:54 -0500)]
ensure at least one osd is up
The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing.
The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0).
In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing.
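A sketch of the adjusted check: the same task as above, now also requiring at least one discovered OSD before the equality test can pass (field names follow `ceph osd stat -f json` output):
```
- name: wait for all osd to be up
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} osd stat -f json"
  register: osd_stat
  retries: 60
  delay: 10
  until:
    - (osd_stat.stdout | from_json)["num_osds"] | int > 0
    - (osd_stat.stdout | from_json)["num_osds"] == (osd_stat.stdout | from_json)["num_up_osds"]
```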
setup_ntp: call handler to disable ntpd if chronyd used
The task setup chronyd called the handler disable chronyd, which of
course defeats the purpose.
Changing the task to disable ntpd instead fixes the issue of chronyd
being disabled after it got enabled.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1673664 Fixes: #3582 Signed-off-by: Patrick C. F. Ernzer <pcfe@redhat.com>
(cherry picked from commit c605ff6a68720ab43b63086c3ac1d529a651f585)
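A sketch of the corrected notification, with illustrative task and handler names: the chronyd setup task should notify the handler that disables ntpd, not the one that disables chronyd.
```
- name: setup chronyd
  package:
    name: chrony
    state: present
  notify: disable ntpd
```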
Typical error:
```
fatal: [iscsi-gw0]: FAILED! =>
msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'''
```
`become: True` is not needed on the following task:
`copy crt file(s) to gateway nodes`, since it's already set in the main
playbook (site.yml/site-container.yml).
The problem is that the files get generated in the 'fetch_directory' as the
root user, because there is a 'delegate_to' and we run the playbook with
`become: True` (from the main playbook).
The idea here is to create the files as the ansible user so we can open them
later to copy them to the remote machine.
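A sketch of the idea, generating the file on the controller as the unprivileged ansible user so later copy tasks can read it from fetch_directory (the module and paths shown are illustrative, not the role's exact tasks):
```
- name: generate iscsi-gateway.key
  openssl_privatekey:
    path: "{{ fetch_directory }}/{{ fsid }}/iscsi-gateway.key"
  delegate_to: localhost
  become: false
```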
Leah Neukirchen [Thu, 7 Feb 2019 17:09:21 +0000 (18:09 +0100)]
Fix uses of default(omit) with string concatenation
When {{omit}} is concatenated with another string, it expands to something
like __omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d.
However, in these use-cases we need an empty string.
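Illustrative pattern only: `default(omit)` works when it is the whole module argument, but inside a concatenation it leaves the placeholder shown above, so `default('')` is what we want (the variable names are made up for the example):
```
- name: concatenation-safe default
  debug:
    msg: "db={{ item.db_vg | default('') }}/{{ item.db | default('') }}"
  with_items: "{{ lvm_volumes | default([]) }}"
```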
Sébastien Han [Mon, 26 Nov 2018 16:58:49 +0000 (17:58 +0100)]
osd: expose udev into the container
In order to be able to retrieve udev information, we must expose its
socket. As per https://github.com/ceph/ceph/pull/25201, ceph-volume will
start consuming udev output.
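A sketch of the extra bind mount, assuming it is injected through a docker run arguments variable (the variable name below is hypothetical):
```
ceph_osd_docker_extra_volumes: "-v /run/udev:/run/udev:ro"
```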
Noah Watkins [Tue, 6 Nov 2018 16:49:39 +0000 (08:49 -0800)]
Fixup shrink_osd[_container] scenario config
** configuration seems to be for filestore:
[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes
** Removing `radosgw_interface: eth1` to resolve:
The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'
The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.
The offending line appears to be:
- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here
This condition is useless and it also creates issues we don't see in
our CI. ceph_release is set by either ceph-common or ceph-docker-common,
so let's keep it this way.
Sébastien Han [Mon, 14 Jan 2019 15:31:45 +0000 (16:31 +0100)]
switch: do not fail on missing key
Some people use the switch playbook to perform upgrades, so they end up in
the same situation as https://bugzilla.redhat.com/show_bug.cgi?id=1650572
This applies the same fix as 729744c6a8c69f5fdf66b67fb28063297996e30a.
We don't want to fail on keys that are not present, since they will get
created after the mons are updated. They will be created by the task
"create potentially missing keys (rbd and rbd-mirror)".
Also please note that commit 488281187e8ac6c587db74961db9e075f31c8eae was merged (in PR 3477)
"as is" (despite merge conflicts), which ideally was not supposed to be
the case. As a side effect, the feature of supporting
multiple NTP daemons (the new ones being chronyd and timesyncd) was
also backported, which is itself against the convention. For
consistency's sake the feature was backported to stable-3.1 as well.
Sometimes we play the whole `ceph-defaults` role just to access the
default value of some variables. It means we play the `facts.yml` part
of this role while it's not desired. Splitting this role will speed up
the playbook.
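A sketch of the intent, assuming the split produces a dedicated facts role (the role name `ceph-facts` is an assumption here): plays that only need default values can skip fact collection entirely.
```
- hosts: osds
  gather_facts: false
  tasks:
    - import_role:
        name: ceph-defaults   # default variables only
    - import_role:
        name: ceph-facts      # fact collection, played only where needed
```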
The OSD part of the purge delegates commands to the monitor node, so we need
to gather the monitors' facts to know the `ansible_hostname` fact that is used
in the `docker_exec_cmd` fact.
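A minimal sketch of such a fact-gathering play placed ahead of the OSD purge, so the monitors' hostvars carry `ansible_hostname` when `docker_exec_cmd` is built:
```
- hosts: "{{ mon_group_name | default('mons') }}"
  become: true
  gather_facts: true
  tasks: []
```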
Sébastien Han [Mon, 3 Dec 2018 21:59:17 +0000 (22:59 +0100)]
purge-cluster: add support for mon/mgr collocation
Recently we introduced the default collocation of mon/mgr without the
need for a dedicated mgrs section. This means we have to stop the mgr
process on that machine too.
Sébastien Han [Mon, 3 Dec 2018 21:46:52 +0000 (22:46 +0100)]
purge-docker-cluster: add support for mgr/mon collocation
Recently we introduced the collocation of mon and mgr by default, so we
don't need to have an explicit mgrs section for this. This means we have
to remove the mgr container on the mon machines too.
Sébastien Han [Thu, 4 Oct 2018 15:40:25 +0000 (17:40 +0200)]
purge-docker-cluster: add ceph-volume support
This commit adds support for purging clusters that were deployed
with ceph-volume. It also nicely separates, with a block instruction, the
work to do depending on whether lvm is used or not.
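An illustrative sketch of the block structure only; the real zap/purge tasks are more involved:
```
- name: purge ceph-volume prepared osds
  block:
    - name: zap and destroy osd lvm volumes
      command: "ceph-volume lvm zap --destroy {{ item.data }}"
      with_items: "{{ lvm_volumes | default([]) }}"
  when: osd_scenario == 'lvm'
```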
Bruceforce [Fri, 4 Jan 2019 16:26:02 +0000 (17:26 +0100)]
The nfs_ganesha_dev_apt_repo variable was set incorrectly in the task
"fetch nfs-ganesha development repository".
This has to be pushed directly to stable-3.2 since master has diverged.
Rishabh Dave [Wed, 12 Dec 2018 11:15:00 +0000 (16:45 +0530)]
ceph-infra: merge ntp_debian.yml and ntp_rpm.yml
Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called
setup_ntp.yml) since they are almost identical. Also avoid repetition
of the common setup step for ntpd and chronyd services.
update: do not enforce `serial: 1` on client nodes
There is no need to enforce `serial: 1` on client nodes.
Let's make it parameterizable by introducing a new *extra* variable
`client_update_batch`; if not set, this will default to `{{
ansible_forks }}`.
NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`
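A sketch of the client play header with the new variable; the batch size is only honoured when passed as an extra variable:
```
- hosts: clients
  serial: "{{ client_update_batch | default(ansible_forks) }}"
  gather_facts: false
  tasks: []
```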
Rishabh Dave [Mon, 17 Dec 2018 10:34:46 +0000 (16:04 +0530)]
set any_errors_fatal to true for all host sections
Add `any_errors_fatal: true` to all host sections in `site.yml.sample`
and `site-container.yml.sample` so that playbook execution
stops immediately when an error occurs.
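A sketch of what one host section looks like with the new setting (roles trimmed for brevity):
```
- hosts: mons
  any_errors_fatal: true
  become: true
  roles:
    - ceph-defaults
    - ceph-mon
```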
- reintroduce `purge_cluster_container` and `purge_cluster_non_container`
on `stable-3.2`,
- remove all purge scenarios based on ceph-disk,
- remove purge_lvm_osds_* scenarios.
Sébastien Han [Tue, 4 Dec 2018 09:44:28 +0000 (10:44 +0100)]
test: disable nfs for containers
Based on https://github.com/ceph/ceph-container/pull/1269 and given
there are no stable packages and no reliable repository, we disable
nfs-ganesha temporarily.
Sébastien Han [Fri, 30 Nov 2018 10:20:03 +0000 (11:20 +0100)]
osd: discover osd_objectstore on the fly
Applying and passing the OSD_BLUESTORE/FILESTORE flag on the fly is wrong for
existing clusters, as their config will be changed.
Typically, if an OSD was prepared with ceph-disk on filestore and we
change the default objectstore to bluestore, the activation will fail.
The osd_objectstore flag should only be used for preparation, not
activation. Activation in this case detects the OSD objectstore, which
prevents failures like the one described above.
Sébastien Han [Tue, 27 Nov 2018 16:50:44 +0000 (17:50 +0100)]
ceph-osd: change jinja condition
If an existing cluster runs this config and has ceph-disk OSDs, the
`expose_partitions` call won't be expected by jinja since it's inside the
'old' if. We need it as part of the osd_scenario != 'lvm' condition.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1640273 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bef522627e1e9827b86710c7a54f35a0cd596fbb)
Sébastien Han [Thu, 29 Nov 2018 13:26:41 +0000 (14:26 +0100)]
rolling_update: do not fail on missing keys
We don't want to fail on keys that are not present, since they will get
created after the mons are updated. They will be created by the task
"create potentially missing keys (rbd and rbd-mirror)".
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ebc901c6af67300f7b7b8da1b2d0a74147798da5)
Noah Watkins [Fri, 30 Nov 2018 23:46:42 +0000 (15:46 -0800)]
rgw: use correct default rgw frontend address
Since 0.0.0.0 is the default radosgw address (not 'address'), not
configuring an address explicitly and instead configuring the radosgw
interface would result in 0.0.0.0 being used, instead of falling
through to the section that inspects the interface config option.
Backport note: this cannot be cherry-picked from master since this code
doesn't exist in master.
Sébastien Han [Wed, 28 Nov 2018 23:27:49 +0000 (00:27 +0100)]
rolling_update: default ceph json output to empty dict
So we can avoid the following failure:
The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
' failed. The error was: No JSON object could be decoded
We just need to set a default; the next iteration will have a more
complete json since the command won't fail.
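A sketch of the defaulted lookup, so `from_json` never parses an empty string on the first, failing iteration (the task shape is illustrative):
```
- name: waiting for the monitor to join the quorum
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} -s -f json"
  register: ceph_health_raw
  retries: 10
  delay: 10
  until: >
    hostvars[mon_host]['ansible_hostname'] in
    (ceph_health_raw.stdout | default('{}') | from_json).get('quorum_names', [])
```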
Add real default value for osd pool size customization.
Ceph itself has an `osd_pool_default_size` default value of `3`.
If users don't specify a pool size in various pools definition within
ceph-ansible, we should default to `3`.
By the way, this kind of condition isn't really clear:
```
when:
- rbd_pool_size | default ("")
```
We should try to get the customized value, then default to what is in
`osd_pool_default_size` (which itself defaults to
`ceph_osd_pool_default_size`, i.e. `3`), and compare it to
`ceph_osd_pool_default_size`.
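A sketch of the clearer comparison described above; the variable names are taken from the message, the exact task differs:
```
- name: customize rbd pool size (sketch)
  command: "ceph --cluster {{ cluster }} osd pool set rbd size {{ rbd_pool_size | default(osd_pool_default_size) }}"
  when: rbd_pool_size | default(osd_pool_default_size) | int != ceph_osd_pool_default_size | int
```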
mon: move `osd_pool_default_pg_num` in `ceph-defaults`
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.
tests: do not fully override previous ceph_conf_overrides
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`; we must append a new value instead of just
overwriting it, otherwise this can lead to errors in the CI.
Sébastien Han [Wed, 21 Nov 2018 15:18:58 +0000 (16:18 +0100)]
rolling_update: create rbd and rbd-mirror keyrings
During an upgrade, ceph won't create keys that did not exist on the
previous version. So after the upgrade from, say, Jewel to Luminous, once
all the monitors have the new version they should get or create the
keys. It's ok to have the task fail, especially for the rbd-mirror
key, which only appears in Nautilus.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4e267bee4f9263b9ac3b5649f1e3cf3cbaf12d10)
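A rough sketch of the idea only; the key names and caps below are assumptions, not copied from the playbook:
```
- name: create potentially missing keys (rbd and rbd-mirror)
  command: >
    {{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }}
    auth get-or-create client.bootstrap-{{ item }}
    mon "allow profile bootstrap-{{ item }}"
  with_items:
    - rbd
    - rbd-mirror
  delegate_to: "{{ mon_host }}"
```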
Sébastien Han [Wed, 21 Nov 2018 15:17:04 +0000 (16:17 +0100)]
ceph_key: add a get_key function
When checking if a key exists, we also have to ensure that the key exists
on the filesystem; the key can change on Ceph but still have an outdated
version on the filesystem. This solves that issue.
Sébastien Han [Fri, 16 Nov 2018 15:15:24 +0000 (16:15 +0100)]
switch: disable all ceph units
Prior to this commit we were only disabling the ceph-osd units, but forgot
ceph.target, which controls everything and will restart the
ceph-osd units at each reboot.
Now that everything gets disabled, there won't be any conflicts between
the old non-container and the new container units.
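A minimal sketch of the added step: disable ceph.target as well, so nothing re-enables or restarts the old non-container units at the next reboot.
```
- name: disable ceph.target
  systemd:
    name: ceph.target
    enabled: false
```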
Sébastien Han [Wed, 28 Nov 2018 23:10:29 +0000 (00:10 +0100)]
osd: re-introduce disk_list check
This commit
https://github.com/ceph/ceph-ansible/commit/4cc1506303739f13bb7a6e1022646ef90e004c90#diff-51bbe3572e46e3b219ad726da44b64ebL13
accidentally removed this check.
This is a must-have for ceph-disk based containerized OSDs.