git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

iscsi: assign application (rbd) to pool 'rbd'

If we don't assign the rbd application tag to this pool,
the cluster will enter the `HEALTH_WARN` state, as follows:

```
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'rbd'
```
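A minimal sketch of the command this fix implies, assuming the usual
`ceph osd pool application enable <pool> <app>` CLI form. The helper name
and the `cluster` parameter are hypothetical, and the command is only
built here, not executed:

```python
def pool_application_cmd(pool, application, cluster="ceph"):
    """Build the CLI call that tags `pool` with `application`."""
    return [
        "ceph", "--cluster", cluster,
        "osd", "pool", "application", "enable", pool, application,
    ]


# Tag the 'rbd' pool with the 'rbd' application.
print(" ".join(pool_application_cmd("rbd", "rbd")))
```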

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4cf17a6fddc052c944026ae1d138263131e677f8)

ceph-handler: replace fuser by /proc/net/unix

We're using the fuser command to see if a process is using a ceph unix
socket file. But the fuser command runs through every PID present in
/proc/<PID> to see if one of them is using the file.
On a system running thousands of processes, the fuser command can take
a long time to finish.
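As an illustration of the idea (not the handler's actual code), the check
can be done by scanning the Path column of /proc/net/unix once instead of
walking every PID. The function names and the sample socket path below
are hypothetical:

```python
def socket_listed(lines, socket_path):
    """Return True if socket_path appears in the Path column of the table."""
    for line in lines:
        # Columns: Num RefCount Protocol Flags Type St Inode [Path]
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[7].strip() == socket_path:
            return True
    return False


def socket_in_use(socket_path, table="/proc/net/unix"):
    """Check the live kernel table (Linux only)."""
    with open(table) as handle:
        return socket_listed(handle, socket_path)


# Made-up sample of the table layout, used instead of the real file.
SAMPLE = [
    "Num       RefCount Protocol Flags    Type St Inode Path",
    "0000000000000000: 00000002 00000000 00010000 0001 01 23456 /var/run/ceph/ceph-mon.mon0.asok",
    "0000000000000000: 00000003 00000000 00000000 0001 03 34567",
]
print(socket_listed(SAMPLE, "/var/run/ceph/ceph-mon.mon0.asok"))
```

The cost is one pass over a single file, independent of how many
processes are running.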

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1717011

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit da9891da1e8b9a8c91077c74e54a9df8ebb7070d)

validate: fail in check_devices at the right task

see https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17 for details.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168#c17
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 771648304d7d867e053f8b8fe3ce5b36e061f100)

spec: bring back possibility to install ceph with custom repo

This can be seen as a regression for customers who were used to deploying
in offline environments with custom repositories.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1673254
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c933645bf7015e08c97058186954483c40ecbfbd)

update default rhcs values and docs

The RHCS documentation mentioned in the default values and
group_vars directory refers to RHCS 2.x while it should be
3.x.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1702732

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

vagrant: Default box to centos/7

We don't use ceph/ubuntu-xenial anymore but only centos/7 and
centos/atomic-host.
Changing the default to centos/7.

Resolves: #4036

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 24d0fd70030e3014405bf3bf2d628ede4cee6466)

tox: Refact lvm_osds scenario

The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore and with or
without dmcrypt on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 52b9f3fb2886d703b25f650221ea973147c68ed6)

igw: Fix rolling update service ordering

We must stop tcmu-runner after the other rbd-target-* services
because they may need to interact with tcmu-runner during shutdown.
There is also a bug in some kernels where IO can get stuck in the
kernel and by stopping rbd-target-* first we can make sure all IO is
flushed.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d7ef12910e7b583fa42f84a7173a87e7c679e79e)

Revert "Revert "cv: support zap by osd fsid""

This reverts commit addcc1e61abb50f53bb82ddac22c643c5ce636b7.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Revert "Revert "shrink_osd: use cv zap by fsid to remove parts/lvs""

This reverts commit 043ee8c1584147665b1a38f27f43e599fc2a775f.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

osds: allow passing devices by path

ceph-volume didn't work when the devices were passed by path.
Since it now supports it, let's allow this feature in ceph-ansible

Closes: #3812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8f2c45dfd3d1d3875a480247ca047aa52d7cd1b1)

Revert "cv: support zap by osd fsid"

This reverts commit 8454f0144af10834da0cddb508a5dea11bda3c72.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Revert "shrink_osd: use cv zap by fsid to remove parts/lvs"

This reverts commit be59e0b451df6028c71eca54754d4d1464a8cc83.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

osd: set default bluestore_wal_devices empty

We only need to set the wal dedicated device when there are three tiers
of storage used.
Currently the block.wal partition will also be created on the same
device as block.db.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1685253
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

rolling_update: restart all ceph-iscsi services

Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f1048627eaab27563511011fa3cc31b525e2f4c9)

ceph-mds: Increase cpu limit to 4

In containerized deployments the default mds cpu quota is too low
for production environments.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 1999cf3d1902456aa123ed3c96116c21e88799bb)

ceph-osd: Fix merge conflict from mergify

The PR #3916 was merged automatically by mergify even though there was a
conflict in the ceph-osd-run.sh.j2 template.
This commit resolves the conflict.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-osd: Increase cpu limit to 4

In containerized deployments the default osd cpu quota is too low
for production environments using NVMe devices.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c17106874c29f3eafb196a30b97fd1f8fd52e768)

# Conflicts:
# roles/ceph-osd/templates/ceph-osd-run.sh.j2

ansible.cfg: Add library path to configuration

Ceph module path needs to be configured if we want to avoid issues
like:

no action detected in task. This often indicates a misspelled module
name, or incorrect module path

Currently the ansible-lint command in Travis CI complains about that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1668478
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a1a871cadee5e86d181e1306c985e620b81fccac)

ceph-mon: increase timeout waiting for admin and bootstrap keys

With a large and/or busy cluster, it can take significantly more than
30s for a restarted monitor to get to the point where
`ceph-create-keys` returns successfully. A recent upgrade of our
production cluster failed here because it took a couple of minutes for
the newly-upgraded `mon` to be ready. So increase the timeout
significantly.
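A generic sketch of the wait-with-timeout pattern described here. The
function name, the `check` callable and the default values are
placeholders for illustration, not the ones used by the patch:

```python
import time


def wait_for(check, timeout=600, delay=5, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(delay)


# Example: a check that only succeeds on its third call.
answers = iter([False, False, True])
print(wait_for(lambda: next(answers), timeout=60, delay=0))
```

A larger timeout only costs time in the failure case; a monitor that
becomes ready quickly still returns on the first successful check.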

This patch is applied to stable-3.2, because the affected code is
refactored in stable-4.0 and ceph-create-keys is no longer called.

Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>

tests: Add debug to ceph-override.json

It's useful to have logs in debug mode enabled in order to have
more information for developers.
Also reindent the json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d25af1b872628607e37741c760aa31b88229f3da)

tests/functional: use ceph-override.json symlink

We don't need to have multiple ceph-override.json copies. We
already have a symlink to all_daemons/ceph-override.json so
we can do it for all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit a19054be18d748a5133216a2339f1989a0e0b27b)

ceph-mds: Set application pool to cephfs

We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something other than the default
value it will break the application pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d2efb7f02b9e6f6888449dbaeba0e2435606ca43)

remove all NBSPs char in stable-3.2 branch

this can cause issues, let's replace all of these chars with real
spaces.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

UCA: Uncomment UCA variables in defaults, fix consequent breakage

The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", and so try to install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 9dd913cf8a3dcc12683b55ae13d95bca6f15cd32)

ceph-osd: Drop memory flag with bluestore

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dc1c0dcee21deefa359cae30d3a733a4348bfd2f)

mon/rgw: use last ipv6 address

When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's a VIP on that network the first filter could return the
wrong value.
This seems to affect only IPv6 setups because the VIP addresses are
added to the ansible facts at the beginning of the list. This is the
opposite (at the end) when using IPv4.
This causes the mon/rgw processes to bind on the VIP address.
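A small sketch of the selection logic described above, assuming the VIP
is listed first for IPv6 and last for IPv4. The function name and the
sample addresses are hypothetical, and the real fix lives in Jinja
templates rather than Python:

```python
import ipaddress


def pick_address(fact_addresses, address_block):
    """Pick the address inside address_block that is least likely to be a VIP."""
    network = ipaddress.ip_network(address_block)
    matches = [a for a in fact_addresses if ipaddress.ip_address(a) in network]
    if not matches:
        return None
    # IPv6: VIPs show up first in the facts, so take the last match.
    # IPv4: VIPs show up last, so keep taking the first match.
    return matches[-1] if network.version == 6 else matches[0]


ipv6_facts = ["2001:db8::100", "2001:db8::10", "fe80::1"]  # VIP listed first
print(pick_address(ipv6_facts, "2001:db8::/64"))
```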

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

tests: fix update job

jenkins sets CEPH_ANSIBLE_BRANCH to stable-3.2, which makes all
nightly jobs fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

rgw multisite: add more than 1 rgw to the master or secondary zone

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 37f46a8c5de9585c2639cc4741ee8f62bc2c854b)

tests: run lvm_setup.yml on secondary cluster

otherwise ceph-osd fails:

```
ceph-volume lvm prepare: error: Unable to proceed with non-existing device: test_group/data-lv2
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

radosgw: Raise cpu limit to 8

In containerized deployments the default radosgw quota is too low
for production environments.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d3ae9fd05fe46933a1437501b0d8a5edb4ca2056)

tests: do not deploy ceph@master in rgw_multisite

deploying ceph@master in stable-3.2 is not possible.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add back testinfra testing

136bfe0 removed testinfra testing on all scenarios except all_daemons

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8d106c2c58d354e10335ca017fd8df4c427e38a6)

tests: pin pytest-xdist to 1.27.0

looks like newer version of pytest-xdist requires pytest>=4.4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ba0a95211cc00b2cae14b018722f437c0091a2ef)

purge: fix lvm-batch purge osd

`lvm_volumes` and/or `devices` variable(s) can be undefined depending on
the scenario chosen.

These tasks should be run only if these variables are defined, otherwise
it ends up with undefined variable errors.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 01807383132c2897e331dcc665f062f8be0feeb8)

tests: test idempotency only on all_daemons job

there's no need to test this on all scenarios.
testing idempotency on all_daemons should be enough and allow us to save
precious resources for the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 136bfe096c5e97c5c983d02882919d4af2af48a6)

rolling_update: Update systemd unit regex for nvme

The systemd unit regex doesn't handle nvme devices (/dev/nvmeXn1).
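To illustrate the kind of mismatch involved, here is a sketch with two
hypothetical patterns; neither is the expression used in the playbook.
The first only accepts letter-only device names, the second also accepts
the nvmeXnY form:

```python
import re

# Hypothetical "before" pattern: letter-only device names such as sdb.
OLD_UNIT_RE = re.compile(r"^ceph-osd@[a-z]+\.service$")
# Hypothetical "after" pattern: also accept nvme<controller>n<namespace>.
NEW_UNIT_RE = re.compile(r"^ceph-osd@([a-z]+|nvme[0-9]+n[0-9]+)\.service$")

for unit in ("ceph-osd@sdb.service", "ceph-osd@nvme0n1.service"):
    print(unit, bool(OLD_UNIT_RE.match(unit)), bool(NEW_UNIT_RE.match(unit)))
```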

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1687828
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit c8442f3705d0bd7b64fe2b14d925a82d52a052e4)

tests: refact update scenario (stable-3.2)

refact the update scenario as was done in master.
(see f0e616962)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

purge-docker-cluster: Remove ceph-osd service

The systemd ceph-osd@.service file used for starting the ceph osd
containers is used in all osd_scenarios.
Currently purging a containerized deployment using the lvm scenario
doesn't remove the ceph-osd systemd service.
If the next deployment is a non-containerized deployment, the OSDs
won't be online because the file is still present and overrides the
one from the package.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 7cc626b72dbb242a00f714d925b6aea6b4524c37)

tox: Fix container purge jobs

On containerized CI jobs the playbook executed is purge-cluster.yml
but it should be set to purge-docker-cluster.yml

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit bd0869cd01090e135a9312a6890ed7611f8e3a1c)

tests: add mgr and nfs nodes in all_daemons

even if they are not used, we need to fire up those VMs to be able to
perform the upgrade in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Add uca to ceph_repository choices validation

Ubuntu cloud archive is configurable via ceph_repository variable but
the uca choice isn't accepted.
This commit fixes this issue and also validates the associated uca
repository variables.

Resolves: #3739

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 94505a3af264fc0e847e71cbf633dc02ce58e6ff)

defaults: change default value for ceph_docker_image_tag

Since nautilus has been released, it's now the latest stable release, it
means the tag `latest` now refers to nautilus.

`stable-3.2` isn't intended to deploy nautilus, therefore, we should
change the default value for this variable to the latest release
stable-3.2 is able to deploy (mimic).

Closes: #3734
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-osd: Ensure lvm2 is installed

When using osd_scenario lvm, we never check if the lvm2 package is
present on the host.
When using containerized deployment and docker on CentOS/RedHat this
package will be automatically installed as a dependency, but not on
Ubuntu distributions.
OSDs deployed via ceph-volume require the lvmetad.socket to be active
and running.

Resolves: #3728

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 179fdfbc19ab7001fdd185f131039043d690bbe8)

ceph_crush: fix rstrip for python 3

Removing bytes literals since rstrip only supports type String or None.
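A short illustration of the Python 3 behaviour behind this fix. The
`clean_output` helper is hypothetical and not part of the module:

```python
def clean_output(raw):
    """Decode command output if needed, then strip trailing newlines."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    # The argument is a str literal, which is what str.rstrip expects.
    return text.rstrip("\r\n")


# On Python 3, passing a bytes literal to str.rstrip raises TypeError.
try:
    "value\n".rstrip(b"\r\n")
except TypeError as exc:
    print("bytes argument rejected:", exc)

print(clean_output("value\r\n"))
```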

Please backport to stable-3.2

Signed-off-by: Bruceforce <markus.greis@gmx.de>
(cherry picked from commit 6d506dba1a6fb3a827460d3a7090517cf3241c39)

ceph_volume: fix rstrip for python 3

Removing bytes literals since rstrip only supports type String or None.

Signed-off-by: Bruceforce <markus.greis@gmx.de>

Remove trailing forward slash in ceph_docker_registry variable from group_vars/rhcs.yml.sample file.

Also fixed rhcs_edits.txt for variable ceph_docker_registry.

Moved namespace to ceph_docker_image variable.

Signed-off-by: Phuong Nguyen <pnguyen@redhat.com>
(cherry picked from commit 3305309e87b16c42af5f7faf35fd322241e8e964)

osd: backward compatibility with old disk_list.sh location

Since all files in the container image have moved to `/opt/ceph-container`
this check must look for the new AND the old path so it's backward
compatible. Otherwise it could end up templating an inconsistent
`ceph-osd-run.sh`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 987bdac963cee8d8aba1f10659f23bb68c2b1d1b)

ceph-validate: fail if there's no ipaddr available in monitor_address_block subnet

When using monitor_address_block to determine the ip address of the
monitor node, we need an ip address available in that cidr to be
present in the ansible facts (ansible_all_ipv[46]_addresses).
Currently we don't check if there's an ip address available during
the ceph-validate role.
As a result, the ceph-config role fails due to an empty list during
ceph.conf template creation but the error isn't explicit.

TASK [ceph-config : generate ceph.conf configuration file] *****
fatal: [0]: FAILED! => {"msg": "No first item, sequence was empty."}

With this patch we will fail before the ceph deployment with an
explicit failure message.

Resolves: rhbz#1673687

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5c39735be530b2c7339510486bc4078687236bbb)

Change docker_container parameter network to network_mode

Addressing "populate kv_store with custom ceph.conf":
Unsupported parameters for (docker_container) module. Looking at
https://docs.ansible.com/ansible/latest/modules/docker_container_module.html
shows that the correct parameter is network_mode, not network.

Signed-off-by: Gregory Orange <gregoryo2014@users.noreply.github.com>

Set the default crush rule in ceph.conf

Currently the default crush rule value is added to the ceph config
on the mon nodes as an extra configuration applied after the template
generation via the ansible ini module.

This implies two behaviors:

1/ On each ceph-ansible run, the ceph.conf will be regenerated via
ceph-config+template and then ceph-mon+ini_file. This leads to an
unnecessary daemon restart.

2/ When other ceph daemons are collocated on the monitor nodes
(like mgr or rgw), the default crush rule value will be erased by
the ceph.conf template (mon -> mgr -> rgw).

This patch adds the osd_pool_default_crush_rule config to the ceph
template and only for the monitor nodes (like crush_rules.yml).
The default crush rule id is read (if it exists) from the current ceph
configuration.
The default configuration is -1 (ceph default).
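A sketch of the "read the current value, else fall back to -1" step,
assuming the option sits in the `[global]` section of ceph.conf. The
function name is hypothetical and the patch itself does this with
Ansible tasks and the template, not Python:

```python
import configparser


def current_default_crush_rule(ceph_conf_text, default=-1):
    """Return the crush rule id already present in ceph.conf, else `default`."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ceph_conf_text)
    # ceph accepts both spaces and underscores in option names.
    for key in ("osd pool default crush rule", "osd_pool_default_crush_rule"):
        if parser.has_option("global", key):
            return parser.getint("global", key)
    return default


print(current_default_crush_rule("[global]\nosd pool default crush rule = 2\n"))
```

Because the value is rendered by the template itself, a later template
run for a collocated daemon no longer drops it.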

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1638092
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d8538ad4e16fe76d63e607491d41793303f929b1)

add-osd.yml: Add become flag for ceph-validate

The check_devices task fails if the ceph-validate role isn't executed
as a privileged user (Permission denied).

failed: [osd0] (item=/dev/sdb) => {"changed": false, "err": "Error:
Error opening /dev/sdb: Permission denied\n", "item": "/dev/sdb",
"msg": "Error while getting device information with parted script:
'/sbin/parted -s -m /dev/sdb -- unit 'MiB' print'", "out": "", "rc": 1}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b23c05ae5200255fb8452c26834de1e9db1497cc)

ceph-osd: Install numactl package when needed

With 3e32dce we can run OSD containers with numactl support.
When using numactl command in a containerized deployment we need to
be sure that the corresponding package is installed on the host.
The package installation is only executed when the
ceph_osd_numactl_opts variable isn't empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit b7f4e3e7c7d73d931fd7fec4c940384c890dee42)

osd: support numactl options on OSD activate

This commit adds numactl support when activating OSD containers.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1684146
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b3eb9206fada05df811602217d8770db854e0adf)

tests: add mgrs section in non_container-collocation

No mgrs are deployed in this scenario, causing the testinfra jobs to
fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: fix collocation scenario

ceph_origin and ceph_repository are mandatory variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: use memory backend for cache fact

force ansible to generate facts for each run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4a1bafdc2181b3a951991fcc9a5108edde757615)

tests: pin testinfra version

As of testinfra 2.0.0, the binary name is `py.test`.

But let's pin the version to 1.19.0.
Indeed, migrating to 2.0.0 requires our current testing to be reworked a bit.
Since we don't have the bandwidth ATM for this, it's better to simply
keep testing with testinfra 1.19.0.

Note that I've replaced all `testinfra` occurrences by `py.test` anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b42250332a0c9afadcb5a670cc7153d72fd8daec)

add-osd: gather facts in second part of playbook

otherwise, it will end up with an error like the following:

```
FAILED! => {"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_hostname'"}
```

because facts won't have been gathered.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1670663
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a4408785332512f6fab6134ee488a4cec18639c1)

purge: fix rbd-mirror group name

the default is rbdmirrors in ceph-defaults

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47ebef374ffbcaed496ff42b3f13bfd21951c333)

purge: fix rbd mirror purge

as of b70d54ac809a92cd88e39e3efa7ed3fee864a866 the service launched isn't
ceph-rbd-mirror@admin.service.

it's now `ceph-rbd-mirror@rbd-mirror.{{ ansible_hostname }}`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a9153084778d5bc219d33e3769485cbea8d7a6a9)

purge: do not remove /var/lib/apt/lists/*

removing the content of this directory seems a bit aggressive and causes a
redeployment to fail after a purge on debian based distributions.

Typical error:
```
fatal: [mon0]: FAILED! => changed=false
  attempts: 3
  msg: No package matching 'ceph' is available
```

The following task will consider the cache is still valid, so apt
doesn't refresh it:
```
- name: update apt cache if cache_valid_time has expired
  apt:
    update_cache: yes
    cache_valid_time: 3600
  register: result
  until: result is succeeded
```

since the task installing ceph packages has `update_cache: no`, it
fails:

```
- name: install ceph for debian
  apt:
    name: "{{ debian_ceph_pkgs | unique }}"
    update_cache: no
    state: "{{ (upgrade_ceph_packages|bool) | ternary('latest','present') }}"
    default_release: "{{ ceph_stable_release_uca | default('') }}{{ ansible_distribution_release ~ '-backports' if ceph_origin == 'distro' and ceph_use_distro_backports else '' }}"
  register: result
  until: result is succeeded
```

/tmp/* isn't specific to ceph either, so we shouldn't remove everything
in this directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3849f30f58e35d559bfa137fa301214192db993b)

purge: fix purge of lvm devices

using the `shell` module seems to be the only way to make this task work
on rhel based distributions AND debian based distributions.

on ubuntu, using the `command` ansible module fails like the following
(regardless of whether `sudo` is used):
```
ok: [osd1] => changed=false
  cmd: command -v ceph-volume
  failed_when_result: false
  msg: '[Errno 2] No such file or directory: ''command'': ''command'''
  rc: 2
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1653307
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 89f77589fa0a431490933379c0781e1df2b95440)

Extends check_devices tasks to non-collocated and lvm-batch scenarios

Tuned the name of a task and its error message to make them easier for users to understand

Fixes BZ 1648168 - ceph-validate : devices are not validated in non-collocated and lvm_batch scenario

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1648168
Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 34c25ef49b10ef6c789447e785a4bf6938c2a804)

Convert interface names to underscores

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540881
Signed-off-by: Tomas Petr <tpetr@redhat.com>
(cherry picked from commit 573adce7dd4f306c384b3308c8049ae49ef59716)

osd: add ipc=host in systemd template for containers

in addition to 15812970f033206b8680cc68351952d49cc18314

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d5be83e5042a5e22ace6250234ccd81acaffb0a2)

tests: update ceph_volume tests

according to the change introduced by b5548ea9412cd7741bee993dddcbfd9daa34cb02

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2dcb02d213e862c5a5498c2d12cd86b22676c84)

cv: expose host ipc namespace to ceph-volume container

this is needed to properly handle semaphore synchronization for udev
actions via dmcrypt/cryptsetup.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683770
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 15812970f033206b8680cc68351952d49cc18314)

# Conflicts:
# library/ceph_volume.py

tests: add lvm bluestore dmcrypt support

Add coverage for container / non container lvm bluestore dmcrypt OSDs

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 207fae38d480f7de369106c5bda1dfe0f1b6033c)

Removed not needed mountpoint and removed ubuntu section

Referring to BZ#1683290: as dsavineau suggests, since this
bug is TripleO specific, removed the ubuntu section and the
useless mountpoints.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683290
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit 21fad7ced344e441ffcd5c4010d634b81ead517f)

Added the ca-trust volume to the ceph-radosgw service template,
which avoids exposing useless information.
This bug is tracked in the following bugzilla:

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683290
Signed-off-by: fpantano <fpantano@redhat.com>
(cherry picked from commit 0c1944236bfb397e9dff6ef436569556bc00379d)

Set permissions on monitor directory to u=rwX,g=rX,o=rX recursive

Set directories to 755 and files to 644 to
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} recursively instead of
setting files and directories to 755 recursively. The ceph mon
process writes files to this path with permissions 644. This update stops
ansible from updating the permissions in
/var/lib/ceph/mon/{{ cluster }}-{{ monitor_name }} every time ceph mon writes
a file and increases idempotency.
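A sketch of what the symbolic mode u=rwX,g=rX,o=rX amounts to when
applied recursively: directories get 755 and files get 644 unless they
already carry an execute bit. The helper name and the temporary tree are
hypothetical, and the role itself sets this through Ansible's file
module rather than Python:

```python
import os
import stat
import tempfile


def apply_mon_modes(root):
    """Apply u=rwX,g=rX,o=rX below root (755 dirs, 644 non-executable files)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        os.chmod(dirpath, 0o755)
        for name in filenames:
            path = os.path.join(dirpath, name)
            current = stat.S_IMODE(os.stat(path).st_mode)
            # Capital X keeps execute only where an execute bit is already set.
            os.chmod(path, 0o755 if current & 0o111 else 0o644)


# Demo on a throwaway tree standing in for the monitor directory.
with tempfile.TemporaryDirectory() as root:
    store = os.path.join(root, "store.db")
    os.mkdir(store, 0o700)
    data_file = os.path.join(store, "CURRENT")
    with open(data_file, "w") as handle:
        handle.write("demo\n")
    os.chmod(data_file, 0o600)
    apply_mon_modes(root)
    demo_modes = (
        stat.S_IMODE(os.stat(store).st_mode),
        stat.S_IMODE(os.stat(data_file).st_mode),
    )

print([oct(mode) for mode in demo_modes])
```

Files written by the mon process with mode 644 already match the target,
so a second run reports no change.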

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683997
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit d327681b99915578fc8b389fda69556966db905f)

mon: Move client admin variable to defaults

There's no need to set the client_admin_ceph_authtool_cap variable
via a set_fact task.
Instead we can set this in the role defaults.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 58a9d310d5651171214dc2a621cf2ba197229951)

mon: Add mds permissions to client.admin

The administrator keyring needs full capabilities on mds like mon,
osd and mgr.
Without this, the client.admin key won't be able to run commands
against mds (like ceph tell mds.0 session ls)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1672878
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit dd7b7604de62c49cc979adfc89b4b89c1b39ae6e)

common: do not override ceph_release when ceph_repository is 'rhcs'

We shouldn't reset `ceph_release` with `ceph_stable_release` when
`ceph_repository` is `rhcs`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1645379
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b60a356343677da6371b7861ee657bfd42c54fd)

osd: make the 'wait for all osd to be up' task configurable

introduce two new variables to make the check that 'wait for all osd to
be up' configurable.
It's possible that for some deployments, OSDs can take longer to be seen
as UP and IN.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1676763
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 21e5db8982afd6e075541e7fc88620d59a1df498)

ensure at least one osd is up

The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing.

The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0).

This is related to Bugzilla 1578086.

In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing.
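The condition can be sketched as follows, using the `num_osds` and
`num_up_osds` counters named above. The function name and the dict shape
are assumptions made for the example:

```python
def all_osds_up(osd_stat):
    """True only when at least one OSD exists and every OSD is up."""
    num_osds = osd_stat["num_osds"]
    num_up_osds = osd_stat["num_up_osds"]
    return num_osds > 0 and num_osds == num_up_osds


# Before any OSD is discovered, 0 == 0 must not count as success.
print(all_osds_up({"num_osds": 0, "num_up_osds": 0}))
print(all_osds_up({"num_osds": 3, "num_up_osds": 3}))
```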

Signed-off-by: David Waiting <david_waiting@comcast.com>
(cherry picked from commit 3930791cb7d2872e3388d33713171d7a0c1951e8)

ceph_key: fix rstrip for python 3

Removing bytes literals since rstrip only supports type String or None.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f5c2ca3710844f73960b5e3652c521de97fb3383)

setup_ntp: call handler to disable ntpd if chronyd used

The task setup chronyd called the handler disable chronyd, which of
course defeats the purpose.

Changing the task to disable ntpd instead fixes the issue of chronyd
being disabled after it got enabled.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1673664
Fixes: #3582
Signed-off-by: Patrick C. F. Ernzer pcfe@redhat.com
(cherry picked from commit c605ff6a68720ab43b63086c3ac1d529a651f585)

iscsi: fix permission denied error

Typical error:
```
fatal: [iscsi-gw0]: FAILED! =>
msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'''
```

`become: True` is not needed on the following task:

`copy crt file(s) to gateway nodes`.

Since it's already set in the main playbook (site.yml/site-container.yml)

The thing is that the files get generated in the 'fetch_directory' with
root user because there is a 'delegate_to' + we run the playbook with
`become: True` (from main playbook).

The idea here is to create files under ansible user so we can open them
later to copy them on the remote machine.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d590f4339a4d758f07388bf97b7eabdcbca6043)

add 'custom' as valid ceph_repository value

This is documented as valid:

https://github.com/ceph/ceph-ansible/blob/561746f75e3913b30e6ae3f14768ebc8a516bf66/group_vars/all.yml.sample#L245

Signed-off-by: Justin Riley <justin.t.riley@gmail.com>
(cherry picked from commit 6a79870d62565eae9ae34a7e5d386941fc8ba590)

Fix uses of default(omit) with string concatenation

When {{omit}} is concatenated with another string, it expands to something
like __omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d.
However, in these use-cases we need an empty string.
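A plain-Python model of the problem, assuming (as in Ansible) that a
parameter is dropped only when its value is exactly the placeholder
string. The names `OMIT` and `drop_omitted` and the hash input are
hypothetical stand-ins:

```python
import hashlib

# Stand-in for the placeholder that {{ omit }} expands to.
OMIT = "__omit_place_holder__" + hashlib.sha1(b"demo").hexdigest()


def drop_omitted(args):
    """Remove parameters whose value is exactly the omit placeholder."""
    return {key: value for key, value in args.items() if value != OMIT}


# 'owner' is exactly the placeholder, so it is dropped as intended.
# 'path' concatenates it with a prefix, so the placeholder leaks through.
broken = drop_omitted({"path": "/etc/ceph/" + OMIT, "owner": OMIT})
# Concatenating with an empty-string default gives the intended value.
fixed = drop_omitted({"path": "/etc/ceph/" + "", "owner": OMIT})
print(broken["path"])
print(fixed["path"])
```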

Regression introduced in d53f55e807e.

Signed-off-by: Leah Neukirchen <leah.neukirchen@mayflower.de>

tests: do not deploy iscsigw on ubuntu

not supported on non rhel based distribution

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add inventory file

add missing inventory file for ubuntu-container-all_daemons job

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ansible: increase fact cache timeout

10m seems a bit low: a complete run can take more than 1h.
Let's increase it to 2h.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b37c4adb32715b8749b7d6714a20b8b538bdf214)

osd: expose udev into the container

In order to be able to retrieve udev information, we must expose its
socket. As per https://github.com/ceph/ceph/pull/25201, ceph-volume will
start consuming udev output.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 997667a8734eddaa616fe642e57f6378408736a9)

osd: bind mount /var/run/udev/

without this, the command `ceph-volume lvm list --format json` hangs and
takes a very long time to complete.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7ade0328072896e99817b070b6a82448024bfb84)

shrink_osd: use cv zap by fsid to remove parts/lvs

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1569413
https://bugzilla.redhat.com/show_bug.cgi?id=1572933

Note: rebased
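
As a rough illustration of zapping by fsid, the sketch below builds the argument list for a `ceph-volume lvm zap` call. `zap_command` is a hypothetical helper written for this example and is not the code used by the playbook; the fsid in the usage note is made up.

```python
def zap_command(osd_fsid, destroy=True):
    """Build a `ceph-volume lvm zap` argument list for one OSD fsid."""
    cmd = ["ceph-volume", "lvm", "zap"]
    if destroy:
        # --destroy also removes the partitions / logical volumes.
        cmd.append("--destroy")
    cmd.extend(["--osd-fsid", osd_fsid])
    return cmd
```

The resulting list could be passed to `subprocess.run(zap_command("<osd fsid>"), check=True)` on the OSD host. Selecting by fsid means the caller does not need to know which devices or LVs back the OSD.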

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 9a43674d2e91ef46917cabe49651c46b630e5ace)

test: add missing test dependency

[nwatkins@smash ceph-ansible]$ virtualenv env
[nwatkins@smash ceph-ansible]$ env/bin/pip install -r tests/requirements.txt
[nwatkins@smash ceph-ansible]$ env/bin/python -c "import mock"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'mock'

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 8a5530ee98d3128c9558e8e8e38f9517fb34d7cf)

cv: support zap by osd fsid

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit fce9f6ef60e3725ac6912bcb150ae59e36ff56fb)

set `any_errors_fatal` true for left out host sections

Many hosts sections in site.yml.sample were left out during the
backport commit 6e2cd0930fa17f5d50c73496eff71074301f55bd.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

use shortname in keyring path

socket.gethostname may return an FQDN. The problem was found on Linode.
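
A minimal sketch of the idea, assuming the short name is simply everything before the first dot. `short_hostname` and `keyring_path` are hypothetical helpers, and the path layout is only an example of where the name ends up.

```python
import socket


def short_hostname(name=None):
    """Return the host part of a name, dropping any domain suffix."""
    if name is None:
        name = socket.gethostname()
    return name.split(".")[0]


def keyring_path(hostname, cluster="ceph"):
    # Hypothetical path layout, used only to show the effect of the short name.
    return "/var/lib/ceph/mon/{}-{}/keyring".format(
        cluster, short_hostname(hostname)
    )
```

With this, `mon0.example.com` and `mon0` both resolve to the same keyring path, whichever form the system reports.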

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 8cd0308f5f570635d66295c442ea49dc2c043194)

tests: run lvm_setup.yml only when osd_scenario is lvm

especially for ooo_collocation scenario which is still using ceph-disk
testing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add nodes for container-all_daemons scenario

add back iscsigw and rbdmirror vm in all_daemons testing

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Add a ceph-volume aware shrink-osd playbook

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit f5dacbf7de38a9b08cfcf041438d49acce792afe)

Rename ceph-disk version of shrink-osd playbook

This will be replaced by a ceph-volume aware version.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 0782cfc546ec398cfa405fb4c9c8226ab52a7960)

tests: specify docker params for shrink-osd

Otherwise, it will go with the default values, e.g.:

"latest" for `ceph_docker_image_tag`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Fixup shrink_osd[_container] scenario config

** configuration seems to be for filestore:

[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes

** Removing `radosgw_interface: eth1` to resolve:

The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'

The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.

The offending line appears to be:

- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
(cherry picked from commit 50255b964084ab52d6ca949b50f413c0ad9e2362)

tests: refact testing in stable-3.2

Apply the same refactoring recently introduced in master to stable-3.2

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

override ceph_release with ceph_stable_release

when `ceph_origin` is set to `'repository'` and `ceph_repository` to
`'community'` we need to ensure `ceph_release` reflects
`ceph_stable_release`.

4a3f180f9d29d5a31468ebb3d1c5f31a53a93960 simply removed the override,
whereas it should only be applied when the condition mentioned
above is satisfied.
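
The condition can be sketched as a small Python function. `effective_ceph_release` is a hypothetical helper used for illustration; the playbook expresses the same rule as a conditional task rather than as Python code.

```python
def effective_ceph_release(ceph_origin, ceph_repository,
                           ceph_release, ceph_stable_release):
    """Return the release name to use (illustrative helper, not playbook code)."""
    # Override only for community repositories; otherwise keep the value as is.
    if ceph_origin == "repository" and ceph_repository == "community":
        return ceph_stable_release
    return ceph_release
```

For any other combination of `ceph_origin` and `ceph_repository`, the function returns `ceph_release` unchanged, which is the behaviour the unconditional override broke.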

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0bfefdd5bc06b4f1dd03d9060b0a38a6f447b207)