git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

ceph_crush: fix rstrip for python 3
Removing bytes literals since rstrip only supports type String or None.

Please backport to stable-3.2

Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit 6d506dba1a6fb3a827460d3a7090517cf3241c39)

ceph_volume: fix rstrip for python 3
Removing bytes literals since rstrip only supports type String or None.

Signed-off-by: Bruceforce <markus.greis@gmx.de>

Remove trailing forward slash in ceph_docker_registry variable from group_vars/rhcs.yml.sample file.

Also fixed rhcs_edits.txt for variable ceph_docker_registry.

Moved namespace to ceph_docker_image variable.

Signed-off-by: Phuong Nguyen <pnguyen@redhat.com>
(cherry picked from commit 3305309e87b16c42af5f7faf35fd322241e8e964)

osd: backward compatibility with old disk_list.sh location

Since all files in container image have moved to `/opt/ceph-container`
this check must look for new AND the old path so it's backward
compatible. Otherwise it could end up by templating an inconsistent
`ceph-osd-run.sh`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 987bdac963cee8d8aba1f10659f23bb68c2b1d1b)

ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet

When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.

TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}

With this patch we will fail before the ceph deployment with an
explicit failure message.

Resolves: rhbz#1673687

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5c39735be530b2c7339510486bc4078687236bbb)

Change docker_container parameter network to network_mode

Addressing "populate kv_store with custom ceph.conf":
Unsupported parameters for (docker_container) module. Looking at
https://docs.ansible.com/ansible/latest/modules/docker_container_module.html
shows that the correct parameter is network_mode, not network.

Signed-off-by: Gregory Orange <gregoryo2014@users.noreply.github.com>

Set the default crush rule in ceph.conf

Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.

This implies two behaviors:

1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to a
non necessary daemons restart.

2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).

This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if exist) from the current ceph
configuration.
The default configuration is -1 (ceph default).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d8538ad4e16fe76d63e607491d41793303f929b1)

add-osd.yml: Add become flag for ceph-validate

The check_devices task fails if the ceph-validate role isn't executed
as a privileged user (Permission denied).

failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error:
Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb",
"msg": "Error while getting device information with parted script:
'/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b23c05ae5200255fb8452c26834de1e9db1497cc)

ceph-osd: Install numactl package when needed

With 3e32dce we can run OSD containers with numactl support.
When using numactl command in a containerized deployment we need to
be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7f4e3e7c7d73d931fd7fec4c940384c890dee42)

osd: support numactl options on OSD activate

This commit adds OSD containers activate with numactl support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3eb9206fada05df811602217d8770db854e0adf)

tests: add mgrs section in non_container-collocation

No mgrs are deployed in this scenario, causing the testinfra jobs to
fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: fix collocation scenario

ceph_origin and ceph_repository are mandatory variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: use memory backend for cache fact

force ansible to generate facts for each run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a1bafdc2181b3a951991fcc9a5108edde757615)

tests: pin testinfra version

As of testinfra 2.0.0, the binary name is `py.test`.

But let's pin the version to 1.19.0.
Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit.
Since we don't have the bandwidth ATM for this, it's better to simply
keep testing with testinfra 1.19.0.

Note that I've replaced all `testinfra` occurences by `py.test` anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b42250332a0c9afadcb5a670cc7153d72fd8daec)

add-osd: gather facts in second part of playbook

otherwise, it will end up with error like following:

```
FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"}
```

because facts won't have been gathered.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a4408785332512f6fab6134ee488a4cec18639c1)

purge: fix rbd-mirror group name

the default is rbdmirrors in ceph-defaults

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47ebef374ffbcaed496ff42b3f13bfd21951c333)

purge: fix rbd mirror purge

as of b70d54ac809a92cd88e39e3efa7ed3fee864a866 the service launched isn't
ceph-rbd-mirror@admin.service.

it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a9153084778d5bc219d33e3769485cbea8d7a6a9)

purge: do not remove /var/lib/apt/lists/*

removing the content of this directory seems a bit agressive and cause a
redeployment to fail after a purge on debian based distrubition.

Typical error:
```
fatal: [mon0]: FAILED! => changed=false
  attempts: 3
  msg: No package matching 'ceph' is available
```

The following task will consider the cache is still valid, so apt
doesn't refresh it:
```
- name: update apt cache if cache_valid_time has expired
  apt:
    update_cache: yes
    cache_valid_time: 3600
  register: result
  until: result is succeeded
```

since the task installing ceph packages has a `update_cache: no` it
fails:

```
- name: install ceph for debian
  apt:
    name: "{{ debian_ceph_pkgs | unique }}"
    update_cache: no
    state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}"
    default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}"
  register: result
  until: result is succeeded
```

/tmp/* isn't specific to ceph as well, so we shouldn't remove everything
in this directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3849f30f58e35d559bfa137fa301214192db993b)

purge: fix purge of lvm devices

using `shell` module seems to be the only way to make this task working
on rhel based distribution AND debian based distributions.

on ubuntu, using `command` ansible module fails like following
(not due to `sudo` usage or not):
```
ok: [osd1] => changed=false
  cmd: command -v ceph-volume
  failed_when_result: false
  msg: '[Errno 2] No such file or directory: ''command'': ''command'''
  rc: 2
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 89f77589fa0a431490933379c0781e1df2b95440)

Extends check_devices tasks to non-collocated an lvm-batch scenarios

Tuned name of a task and error message to make it more user understandable

Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 34c25ef49b10ef6c789447e785a4bf6938c2a804)

Convert interface names to underscores

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540881
Signed-off-by: Tomas Petr <tpetr@redhat.com>
(cherry picked from commit 573adce7dd4f306c384b3308c8049ae49ef59716)

osd: add ipc=host in systemd template for containers

in addition to 15812970f033206b8680cc68351952d49cc18314

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d5be83e5042a5e22ace6250234ccd81acaffb0a2)

tests: update ceph_volume tests

accordingly to change introduced by b5548ea9412cd7741bee993dddcbfd9daa34cb02

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2dcb02d213e862c5a5498c2d12cd86b22676c84)

cv: expose host ipc namespace to ceph-volume container

this is needed to properly handle semaphore synchronization for udev
actions via dmcrypt/cryptsetup.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683770
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 15812970f033206b8680cc68351952d49cc18314)

# Conflicts:
# library/ceph_volume.py

tests: add lvm bluestore dmcrypt support

Add coverage for container / non container lvm bluestore dmcrypt OSDs

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 207fae38d480f7de369106c5bda1dfe0f1b6033c)

Removed not needed mountpoint and removed ubuntu section

Referring to BZ#1683290, as dsavineau suggests, being this
bug tripleO specific, removed the ubuntu section and removed
useless mountpoints.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683290
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit 21fad7ced344e441ffcd5c4010d634b81ead517f)

Added to the ceph-radosgw service template the ca-trust
volume avoiding to expose useless information.
This bug is referred to the following bugzilla:

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683290
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit 0c1944236bfb397e9dff6ef436569556bc00379d)

Set permissions on monitor directory to u=rwX,g=rX,o=rX recursive

Set directories to 755 and files to 644 to
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} recursively instead of
setting files and directories to 755 recursively. The ceph mon
process writes files to this path with permissions 644. This update stops
ansible from updating the permissions in
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} every time ceph mon writes
a file and increases idempotency.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683997
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit d327681b99915578fc8b389fda69556966db905f)

mon: Move client admin variable to defaults

There's no need to set the client_admin_ceph_authtool_cap variable
via a set_fact task.
Instead we can set this in the role defaults.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 58a9d310d5651171214dc2a621cf2ba197229951)

mon: Add mds permissions to client.admin

The administrator keyring needs full capabilities on mds like mon,
osd and mgr.
Whithout this, the client.admin key won't be able to run commands
against mds (like ceph tell mds.0 session ls)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1672878
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dd7b7604de62c49cc979adfc89b4b89c1b39ae6e)

common: do not override ceph_release when ceph_repository is 'rhcs'

We shouldn't reset `ceph_release` with `ceph_stable_release` when
`ceph_repository` is `rhcs`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b60a356343677da6371b7861ee657bfd42c54fd)

osd: make the 'wait for all osd to be up' task configurable

introduce two new variables to make the check that 'wait for all osd to
be up' configurable.
It's possible that for some deployments, OSDs can take longer to be seen
as UP and IN.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1676763
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 21e5db8982afd6e075541e7fc88620d59a1df498)

ensure at least one osd is up

The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing.

The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0).

This is related to Bugzilla 1578086.

In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing.

Signed-off-by: David Waiting <david_waiting@comcast.com>
(cherry picked from commit 3930791cb7d2872e3388d33713171d7a0c1951e8)

ceph_key: fix rstrip for python 3

Removing bytes literals since rstrip only supports type String or None.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f5c2ca3710844f73960b5e3652c521de97fb3383)

setup_ntp: call handler to disable ntpd if chronyd used

The task setup chronyd called the handler disable chronyd, which of
course defeats the purpose.

Changing the task to disable ntpd instead fixes the issue of chronyd
being disabled after it got enabled.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1673664
Fixes: #3582
Signed-off-by: Patrick C. F. Ernzer pcfe@redhat.com
(cherry picked from commit c605ff6a68720ab43b63086c3ac1d529a651f585)

iscsi: fix permission denied error

Typical error:
```
fatal: [iscsi-gw0]: FAILED! =>
msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'''
```

`become: True` is not needed on the following task:

`copy crt file(s) to gateway nodes`.

Since it's already set in the main playbook (site.yml/site-container.yml)

The thing is that the files get generated in the 'fetch_directory' with
root user because there is a 'delegate_to' + we run the playbook with
`become: True` (from main playbook).

The idea here is to create files under ansible user so we can open them
later to copy them on the remote machine.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d590f4339a4d758f07388bf97b7eabdcbca6043)

add 'custom' as valid ceph_repository value

This is documented as valid:

https://github.com/ceph/ceph-ansible/blob/561746f75e3913b30e6ae3f14768ebc8a516bf66/group_vars/all.yml.sample#L245

Signed-off-by: Justin Riley <justin.t.riley@gmail.com>
(cherry picked from commit 6a79870d62565eae9ae34a7e5d386941fc8ba590)

Fix uses of default(omit) with string concatenation

When {{omit}} is concatenated with another string, it expands to something
like __omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d.
However, in these use-cases we need an empty string.

Regression introduced in d53f55e807e.

Signed-off-by: Leah Neukirchen <leah.neukirchen@mayflower.de>

tests: do not deploy iscsigw on ubuntu

not supported on non rhel based distribution

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add inventory file

add missing inventory file for ubuntu-container-all_daemons job

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ansible: increase fact cache timeout

10m seems a bit low, indeed, a complete run can take more than 1h.
Let's increase it to 2h

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b37c4adb32715b8749b7d6714a20b8b538bdf214)

osd: expose udev into the container

In order to be able to retrieve udev information, we must expose its
socket. As per, https://github.com/ceph/ceph/pull/25201 ceph-volume will
start consuming udev output.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 997667a8734eddaa616fe642e57f6378408736a9)

osd: bind mount /var/run/udev/

without this, the command `ceph-volume lvm list --format json` hangs and
takes a very long time to complete.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7ade0328072896e99817b070b6a82448024bfb84)

shrink_osd: use cv zap by fsid to remove parts/lvs

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1569413
https://bugzilla.redhat.com/show_bug.cgi?id=1572933

Note: rebased

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 9a43674d2e91ef46917cabe49651c46b630e5ace)

test: add missing test dependency

[nwatkins@smash ceph-ansible]$ virtualenv env
[nwatkins@smash ceph-ansible]$ env/bin/pip install -r tests/requirements.txt
[nwatkins@smash ceph-ansible]$ env/bin/python -c "import mock"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'mock'

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 8a5530ee98d3128c9558e8e8e38f9517fb34d7cf)

cv: support zap by osd fsid

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit fce9f6ef60e3725ac6912bcb150ae59e36ff56fb)

set `any_errors_fatal` true for left out host sections

Many hosts sections in site.yml.sample were left out during the
backport commit 6e2cd0930fa17f5d50c73496eff71074301f55bd.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

use shortname in keyring path

socket.gethostname may return a FQDN. Problem found in Linode.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8cd0308f5f570635d66295c442ea49dc2c043194)

tests: run lvm_setup.yml only when osd_scenario is lvm

especially for ooo_collocation scenario which is still using ceph-disk
testing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add nodes for container-all_daemons scenario

add back iscsigw and rbdmirror vm in all_daemons testing

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Add a ceph-volume aware shrink-osd playbook

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit f5dacbf7de38a9b08cfcf041438d49acce792afe)

Rename ceph-disk version of shrink-osd playbook

This will be replaced by a ceph-volume aware verison.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 0782cfc546ec398cfa405fb4c9c8226ab52a7960)

tests: specify docker params for shrink-osd

Otherwise, it will go with the default values, eg:

"latest" for `ceph_docker_image_tag`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Fixup shrink_osd[_container] scenario config

** configuration seems to be for filestore:

[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes

** Removing `radosgw_interface: eth1` to resolve:

The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'

The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.

The offending line appears to be:

- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 50255b964084ab52d6ca949b50f413c0ad9e2362)

tests: refact testing in stable-3.2

Apply the same refact recently introduced in master to stable-3.2

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

override ceph_release with ceph_stable_release

when `ceph_origin` is set to `'repository'` and `ceph_repository` to
`'community'` we need to ensure `ceph_release` reflect
`ceph_stable_release`.

4a3f180f9d29d5a31468ebb3d1c5f31a53a93960 simply removed the override
while it should just have to be run only when the condition mentioned
above is satisfied.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bfefdd5bc06b4f1dd03d9060b0a38a6f447b207)

config: remove code related to ceph release prior to luminous

This part of the code is not needed since ceph-ansible@master is
intended to deploy ceph@master only.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1bbdde272f6ae2107174cd521132b4c2f7ed325a)

ceph-default: rm useless condition

This condition is useless and it's also creating issues we don't see in
our CI. ceph_release is set by either ceph-common or ceph-docker-common
so let's keep it this way.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
(cherry picked from commit e9188cd202663656a773eb9e2276c6dbc0684599)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Preserve rolling_update backward compatibility with ansible < 2.5

Let's enforce the default value for `client_update_batch` to 20 since
`ansible_forks` isn't always available.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit ff8dbe114cc1e13c8972993c340cd3b1a189d326)

Vagrantfile: remove useless default values

Those default values are useless and might cause issues.

- `osd_scenario` should be mandatory anyway.
- `pool_default_size` is not used anymore (this has been refactored
recently.

Closes: #3468
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c7a929b2dc75afefc1c30deb6e32e3513a81dc1c)

start_osds: use list instead of keys (re-introduce)

the python3 fix merged by:

https://github.com/ceph/ceph-ansible/pull/3346

was reintroduced a few days later by:

https://github.com/ceph/ceph-ansible/commit/82a6b5adec4d72eb4b7219147f2225b7b2904460

and this patch fixes it again :)

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 3cf5fd2c3ee1fc342ac8dc3365ed82d863c7127e)

site: Make sure is_atomic is defined

configure_firewall tests the is_atomic variable if the firewalld package
is not present. is_atomic is defined in ceph_facts so include that.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
(cherry picked from commit 55fab6f547db8c0604abf536b8f1f469c4cfe2ec)

ceph-facts: resync group_vars file

Signed-off-by: Sébastien Han <seb@redhat.com>

switch: do not fail on missing key

Some people use the switch playbook to perform upgrade so they end up in
the same situation than https://bugzilla.redhat.com/show_bug.cgi?id=1650572
This is applying the same fix as
729744c6a8c69f5fdf66b67fb28063297996e30a.

We don't want to fail on key that are not present since they will get
created after the mons are updated. They will be created by the task
"create potentially missing keys (rbd and rbd-mirror)".

Signed-off-by: Sébastien Han <seb@redhat.com>

ceph-infra: remove ntp_rmp.yml and ntp_debian.yml

This commit fixes the merge conflict that occurred during the
auto-backport and auto-merge of the commit
488281187e8ac6c587db74961db9e075f31c8eae.

Also please note that the commit
488281187e8ac6c587db74961db9e075f31c8eae was merged (on PR 3477)
"as it is" (despite of merge conflicts) which was not supposed to be
the case ideally. This had a side-effect that the feature of supporting
multiple NTP daemons (new ones are namely chronyd and timesyncd) was
also backported which is itself against the convention. For
consistency's sake the feature was backported to stable-3.1 as well.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

introduce new role ceph-facts

sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
in this role while it's not desired. Splitting this role will speedup
the playbook.

Closes: #3282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0eb56e36f8ce52015aa6c343faccd589e5fd2c6c)

purge-container: move facts gathering after ceph-defaults role import

This task has to be called after the role `ceph-defaults` has been
played, otherwise, `mon_group_name` will never be known.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a12de3e048e1b62b8cbf6bc6d089db1bb880d37c)

purge-container: fix wrong syntax

we want a default value for `mon_group_name`, not for
`groups[mon_group_name]`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d0b3cb7f851ece584da47af65c4416ad93541065)

purge-docker: do not call ceph-osd role

calling ceph-osd role in purge playbook is not needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ae7f3d66a67d9e446bf66aed4f26dd5701afdbda)

purge: gather monitors facts in OSD purge

the OSD part of the purge delegates commands on monitor node, we need to
gather monitors facts to know the `ansible_hostname` fact that is used
in the `docker_exec_cmd` fact.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1a4a6ec855fcb2943d78d2a84c29a7e8c588824f)

purge-container: gather fact before calling ceph-defaults

ceph-defaults relies on facts so we must gather facts before running it.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 62111ff53c6a139b7ce2195c2b13d0c0b22e2769)

purge-cluster: add support for mon/mgr collocation

Recently we introduced the default collocation of mon/mgr without the
need of a dedicated mgrs section. This means we have to stop the mgr
process on that machine too.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fc6ebd8ebb39c74544ece57c401783908ad1ed24)

purge-cluster: remove support for other init system

We only support systemd and use the service module anyway.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3a154fa0ad64f6704a832743571e5d20b84e9813)

purge-docker-cluster: add support for mgr/mon collocation

Recently we introduced the collocation of mon and mgr by default, so we
don't need to have an explicit mgrs section for this. This means we have
to remove the mgr container on the mon machines too.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 325a159415a0eb8699a45c04b2d8ea233b2157c2)

# Conflicts:
# infrastructure-playbooks/purge-docker-cluster.yml

purge-docker-cluste: add a task to check hosts

It's useful when running on CI to see what might remain on the machines.
So we list all the containers and images. We expect the list to be
empty.

We fail if we see containers running.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2bcc00896f623e3baa2f306128115134bdce84ce)

purge-docker-cluster: add ceph-volume support

This commits adds the support for purging cluster that were deployed
with ceph-volume. It also separates nicely with a block intruction the
work to do when lvm is used or not.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1751885bc9292581e5114f88fe5b513cb396ed72)

The nfs_ganesha_dev_apt_repo variable was set incorrect in task
"fetch nfs-ganesha development repository"
This has to be pushed directly to stable-3.2 since master has diverged

Signed-off-by: Bruceforce <Bruceforce@users.noreply.github.com>

ceph-infra: disable unrequired NTP services

When one of the currently supported NTP services has been set up,
disable rest of the NTP services on Ceph nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6fa757d34358e90ae3a2f035b50d319193521ec5)

ceph-infra: merge ntp_debian.yml and ntp_rpm.yml

Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called
setup_ntp.yml) since they are almost identical. Also avoid repetition
of the common setup step for ntpd and chronyd services.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit b03ab607422eda0094d74223d52024a373b7ee9a)

# Conflicts:
# roles/ceph-infra/tasks/ntp_debian.yml
# roles/ceph-infra/tasks/ntp_rpm.yml

fix json data type

Json is a type structure which is always typed as a string, where before
this we were declaring a dict, which is not a json valid structure.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1663026
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 896676ee80226121785f44f50d1f01fff5aa2fd7)

update: do not enforce `serial: 1` on client nodes

There is no need to enforce `serial: 1` on client nodes.
Let's make it parameterizable by introducing a new *extra* variable
`client_update_batch`, if not filled this will default to `{{
ansible_forks }}`.

NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 268f2cef821dcb5835bd925c42585ddda5a07861)

set any_errors_fatal to true for all host sections

Add `any_errors_fatal: true` to all host sections in `site.yml.sample`
and `site-container.yml.sample` so that the playbook execution
ceases spontaneously and instantaneously when errors occurs.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 5f43dae5938b4c0a3bfbafccf9e2aa13816a237f)

add support for rocksdb and wal on the same partition in non-collocated

Signed-off-by: Kai Wembacher <kai@ktwe.de>
(cherry picked from commit a273ed7f6038b51d3ddb5198d4f3ab57d45bc328)

purge: tox add lvm-setup

Since we deploy > purge > deploy the LVs are gone so we much recreate
them.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 656fbd290121a79722bd5f3af4bd44e928e74ae2)

purge-cluster: skip tasks that use ceph-volume if it's not installed

This will allow the playbook to be idempotent.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1656935
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit ffd56177e7616ba6345f1f1cc1f3b3e6ea7d66f3)

ceph_keys: pass in module for error messages

fixes: #3421

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 114fac15dc3200bbf9da183c75d889fd75794654)

RELASE-NOTE: fix PR links

Fix wrong position of link and names. The format is [name](link).

Signed-off-by: Sébastien Han <seb@redhat.com>

Add release note for stable-3.2

Signed-off-by: Sébastien Han <seb@redhat.com>

tests: reintroduce purge_cluster scenario

- reintroduce `purge_cluster_container` and `purge_cluster_non_container`
on `stable-3.2`,
- remove all purge scenario based on ceph-disk,
- remove purge_lvm_osds_* scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add purge_lvm_osds_container scenario

This commits adds the purge_lvm_osds_container scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b04fe72f35a1d857968463b8ac0421e9b0e03872)

purge: add iscsi support

add iscsi support for both non containerized and containerized
deployment in purge playbooks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 78116fa6dbe269d1319213251b01e43f1b8f3cff)

revert infra: don't restart firewalld if unit is masked

If firewalld unit is masked, setting `configure_firewall: false` is
enough

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655059
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1cff1f98065bf3b4056810a15998411f7300b58a)

rolling_update: fail if less than 3 MONs

... for non-containerized deployments as well.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit cb784c601d2063b95fb7d2514e39518137164e12)

disable nfs scenario

The packages are broken, so let's remove it, until this solved.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a502327e52b2577a721790fce1cdc5e3201678bf)

test: disable nfs for containers

Based on https://github.com/ceph/ceph-container/pull/1269 and given
there are no stable packages and reliable repository, we disable nfs
ganesha temporarly.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6c3ef90ebe94eb874b415d1cfcf329e20232ba9a)

osd: discover osd_objectstore on the fly

Applying and passing the OSD_BLUESTORE/FILESTORE on the fly is wrong for
existing clusters as their config will be changed.

Typically, if an OSD was prepared with ceph-disk on filestore and we
change the default objectstore to bluestore, the activation will fail.
The flag osd_objectstore should only be used for the preparation, not
activation. The activate in this case detects the osd objecstore which
prevents failures like the one described above.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c5113019893c92c4d75c9fc457b04158b86398b)

ceph-osd: change jinja condition

If an existing cluster runs this config, and has ceph-disk OSD, the
`expose_partitions` won't be expected by jinja since it's inside the
'old' if. We need it as part of the osd_scenario != 'lvm' condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1640273
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bef522627e1e9827b86710c7a54f35a0cd596fbb)

rolling_update: do not fail on missing keys

We don't want to fail on key that are not present since they will get
created after the mons are updated. They will be created by the task
"create potentially missing keys (rbd and rbd-mirror)".

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ebc901c6af67300f7b7b8da1b2d0a74147798da5)

rgw: use correct default rgw frontend address

since 0.0.0.0 is the default radosgw address (not 'address'), not
configuring an address explicitly, and instead configuring the radosgw
interface, would result in 0.0.0.0 being used, instead of falling
through to section that inspects the interface config option.

backport note: this cannot be cherry-picked from master since this code
doesn't exist in master.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1655131

Signed-off-by: Noah Watkins <nwatkins@redhat.com>

tox.ini: setup LVs in OSD hosts for '*-cluster' scenarios

... as the scenarios set up ceph clusters with LVM OSDs.

Closes: https://github.com/ceph/ceph-ansible/issues/3399
Signed-off-by: Ramana Raja <rraja@redhat.com>