git-server-git.apps.pok.os.sepia.ceph.com Git

replace 'master' references with 'main'

Because the branch'master' was renamed 'main'

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0ce39416e8dcd8d5b1093492df4b7e275fc3abc1)

tests: skip rbdmirror tests on non-secondary daemon

the daemon is not running on the 'primary' daemon.
Therefore, these tests are not needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 37e67fb67253b3864a938e592b4a8e52d9a36d3f)

tests: set no_log_on_ceph_key_tasks=false

In order to not have to always reproduce it when a failure shows up in the CI
having the failure logged can make us save some time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f1239b6907da5750b8dac17dca78338f864eb5c9)

rbd-mirror: follow up on recent rbd-mirror refactor

- ensure /var/lib/ceph/bootstrap-rbd-mirror exists
- always install ceph-base on rbdmirror nodes (otherwise, ceph-crash
isn't present)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 041435e1e371f00b7112efed3306beec91d5a434)
(cherry picked from commit b634fb1cb3cef6d895678501dfcd817a4fdea99c)
(cherry picked from commit 302da16c274b6b9e5cced970dd0220d7c56e6e5c)

Set ceph_rbd_mirror_pool default value

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 0c50bfac98cd174bd785487574af54a468b82ae6)
(cherry picked from commit 8a0b5a9571df55cff3b4d4d5ab8b41d739a6e1cb)
(cherry picked from commit 4359464bb652f6efad9cb82b34c7cc17fca30a15)

rbd-mirror: major refactor

- Use config-key store to add cluster peer.
- Support multiple pools mirroring.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b74ff6e22c0d1b95e71384e4d7e2fb2ad556ac39)

config: do not always set _osd_memory_target

When 'osd_memory_target' is overridden in ceph_conf_overrides.
The task that sets the fact `osd_memory_target` in the ceph-config role
should be skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2056675#c11
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit cb5d6b48fb2395e99a7f7ee6e8d2afbcba45901d)

Playbook fails when using --limit to install new MDS

"set_fact container_run_cmd" is not set when using --limit on MDS as facts
were not run on first MON.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2111017
Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 9a4a3f5f1921b76fa3a2a28ada2a632b5df70ca2)

config: followup on 8a5628b51

Add missing `--cluster {{ cluster }}` on task
`set osd_memory_target` in the main.yml file of the
ceph-config role.
Also it moves the task after ceph configuration file is actually written.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f59c7286fcfcf1c15cb02c19f86a7ad91d4e4a3)

shrink-osd: use command instead of ceph_volume_simple_scan

This module isn't available in RHCS 4

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2071035
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

backup-and-restore: use archive/unarchive approach

current approach is too complex and causes too many issues permission
issues.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit dffe7b47de70b6eeec71a3fa86f8c407adb4dd8e)

backup-and-restore: various fixes

- preserve mode and ownership on main directories
- make sure the directories are well present prior to restoring files.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 047af3a3f66b6135cf7ab6a7c28b9c30dea3c836)

backup-and-restore: fix check on 'target_node' variable

If the user doesn't pass a valid name (present in the inventory)
the playbook will fail like following:

```
fatal: [localhost -> {{ target_node }}]: FAILED! =>
msg: |-
The task includes an option with an undefined variable. The error was: "hostvars['10.70.46.40']" is undefined
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b18a1aa3cafa235dfa4f564d6574318ee539f961)

backup-and-restore: fix check on 'mode' variable

Typical failure:

```
fatal: [localhost]: FAILED! =>
msg: |-
The conditional check 'mode not in ['backup', 'restore']' failed. The error was: error while evaluating conditional (mode not in ['backup', 'restore']): 'mode' is undefined
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 848dd03fa66c0febaf1ed081d4986d5de8472a09)

backup-and-restore: fix a typo

Typo introduced during initial implementation.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e28c486e528517439997583bcb0faa3df16ab848)

contrib: add a playbook

this playbook can backup or restore some ceph files.
(/etc/ceph, /var/lib/ceph, ...)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2051640
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed0bba4d770d7b886de83efe25d482e08d558f97)

config/osd: various fixes

- sets `osd_memory_target` per osd host.
- ceph.conf refactor (osd)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2056675
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8a5628b51616b0b9740680bd14c669d4316941f3)

config: fix indentation in main.yml

For consistency and readability.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5283fa6e966ae3bcde82559240794b8fd5c0d8f3)

Refresh /etc/ceph/osd json files content before zapping the disks

If the physical disk to device path mapping has changed since the
last ceph-volume simple scan (e.g. addition or removal of disks),
a wrong disk could be deleted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2071035
Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 64e08f2c0bdea6f4c4ad5862dc8f350c6adbe2cd)

facts: fix set_radosgw_address.yml

use `include_tasks` instead of `import_tasks`.
Given that with `import_tasks` statements are preprocessed
and the tasks that defines it hasn't been run yet, it will fail
and complain like following:

```
The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_interface'
```

Using `include_tasks` instead fixes this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 434793e2feec64fdbbb8a7bdb959b08dacbb2006)
(cherry picked from commit d57377ef619c3304c1eaf4d466dae4055b492d7b)

facts: fix deployments with different net interface names

Deployments when radosgws don't have the same names for
network interface.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2095605
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f6b49f78a9f14c37b2ca81fa6172beba8f43adc4)
(cherry picked from commit ab6fc3e6f771f17c8726618158a512cf9a375c71)

purge: reset-failed ceph-crash

This ensures we always reset-failed the ceph-crash service.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e368ee0fc9455d7aa23b84f7690a4d3ee4deb156)

purge: remove ceph directories on client nodes

Otherwise any ceph directories are left over on client nodes
after the purge.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2024815
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 20035852a4e51f3800808bc4c71b0e0be86b8132)
(cherry picked from commit 346d4a1e1dc19238020886e3572b296e6b711c70)

update: speed up client play

there's no need to run the roles ceph-facts, ceph-config and ceph-client
altogether on client nodes in rolling update playbook.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2019831
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 817c03bc0ebee9df270093d6f414d02f8b95866a)
(cherry picked from commit c0da98b1d67177a55ed29bd3b72ac9774d2e320c)

container: align systemd units with rpm

Update `After=` and `Wants=` parameters in container systemd units
and make them be aligned with the systemd units that come
from the packaging.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2027440
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f01536ea195a56c3ea2b31c7232391387e909c41)
(cherry picked from commit 690c879aef47984136244d1fc8672c0afd6961f5)

switch2containers: fail if less than 3 monitors

This playbook doesn't support less than 3 monitors present in the inventory.
Just like the rolling_update playbook, let's fail if less than
3 monitors are present.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2049132
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f08129edf2efb8bad21043b7e90af59e6f3e1a8d)
(cherry picked from commit b970ab6691b10c0156fb8cf4a7b8fcc123a453e3)

dashboard: allow collecting stats from the host

This commit makes podman bindmount `/:/rootfs:ro` so the container can
collect data from the host.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2028775
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0f34cd16d808dbb800efd0f558b29bcfb218bbfa)
(cherry picked from commit 2e2d893d28d74f101a9c8c94ca98d12445ff2267)

purge: ceph-crash purge fixes

This fixes the service file removal and makes the playbook
call `systemctl reset-failed` on the service because in Ceph
Nautilus, ceph-crash doesn't handle `SIGTERM` signal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2055992
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2f11982590e298e92f7d3c4a06026fb00bea40cd)
(cherry picked from commit 7a570c719e16b9b802aab3703c747b7e429b7b77)

common: support setting pg autoscaler to off

The current implementation doesn't allow to disable the pg autoscaler
on created pools. This allows only 'on' or 'warn'.

With this commit, this is now possible to disable it.

Valid values would be ['on', 'yes', 'true', 'off', 'no', 'false']

```
openstack_glance_pool:
  name: "images"
  pg_num: 128
  pgp_num: 128
  rule_name: "replicated_rule"
  type: 1
  application: "rbd"
  size: 3
  pg_autoscale_mode: off
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2062621
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d1ff8f236ed02b6ac7fc8a7f8e056ff6f52f14c)

Turn off SELinux separation for containers MON and RGW

Initially MONs and RGW binded /etc/pki/ca-trust/extracted using the :z flag
(introduced to solve an OSP TripleO issue on RHEL - #3638) but using
this flag prevents local services (like sssd) running on the host from accessing
the certificates/files in that folder.

Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 7e8ce2567ec7f163c763547252a7a5bcc983fd98)
(cherry picked from commit cf44ad76f6858520bae97940bb3aaf171f74e24c)

facts: follow up on aa0cc93

when these variables are defined in the inventory host file,
all tasks are skipped then because the node being played isn't
aware about the values from the rgw nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 328bd7c975f5578e1e852872adb72d105f74a876)

facts: fix mon/mgr collocation

`service dump` hangs when no active mgr is available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 617dce5e10bfb2e309b043124999c8f32ebdaf4f)

dashboard: fix regression

introduced by ceph/ceph-ansible/pull/7150

when no rgw is present, it fails.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2076192
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1a56fd6a21b3e39c8acd4d3d20451df9b96754d2)

dashboard: support --limit execution with rgw

When the following conditions are met:

- rgw is deployed,
- dashboard is deployed,
- playbook is called with --limit,
- a node being processed is collocated on either a mon or mgr.

The playbook fails because `rgw_instances` is undefined.
The idea here is to make sure this variable is always defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aa0cc9381d8491c30a47d27b395f58e776c5ef0b)

dashboard: always set `dashboard_server_addr`

When running the playbook with `--limit`, if the play targeted doesn't match
hosts present in the mgr group the playbook can fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2063029
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 72e4654aaea9638627c3e44d917d5af2a6ecdf0c)
(cherry picked from commit d1e4b831062988af8b43416ce9ac3cb97f7f3c80)

dashboard: fix radosgw system user creation

The radosgw system user creation will fail when `rgw_instances`
is set at the host_var level because this variable won't bet set
on monitor nodes, given that this is where the tasks is delegated, it fails.

The idea here is to check over all rgw instances that are defined and set a
boolean fact in order to check if at least one instance has `rgw_zonemaster` set
to `True`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2034595
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

validate: fix bug when using vault

since a variable encrypted with vault is no longer a string but a
encrypted object we can't use the filter | length, we have to convert it
to a string before.

Fixes: #6991
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6ad7e5286920fceff6f4483672cbdca44f06a25f)

mgr: append balancer module to ceph_mgr_modules

otherwise the osd play in rolling_update can fail when it tries to
disable it before upgrading osd nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 45a1d634d850ad971c71dbe9191b97caa3068b5a)

update: move a set_fact

ceph-facts roles makes decisions based on the fact `rolling_update` so
it must be called before we run this role.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5edcc4214348945e1566641a490b68ef8886cf0)

update: support --limit on monitor nodes

Change needed in order to support --limit on mon nodes.
Otherwise, a call to `hostvars[groups[mon_group_name][0]]['_current_monitor_address']`
throws an error:

```
"The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute '_current_monitor_address'"
```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304#c28
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 82eee4303bce3e41b5043bcb03fa3143dcdfd30d)

nfs/rgw: support enforcing keys

if one sets `ceph_nfs_rgw_access_key` and/or `ceph_nfs_rgw_secret_key`,
the nfs/rgw user creation won't take those variables into account and it
will generate a user with automatically generated credentials.
It ends up with a mismatch between what will be set in ganesha.conf and
the created user.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2010754
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

shrink-osd: fix regression because of a wrong regex

968891f4498da9625acfdd34bfb01fe445d1eef2 introduced a regression.
The regex is wrong because it doesn't allow to shrink osds with id
greater than 9

Fixes: #6950
Signed-off-by: Per Abildgaard Toft <per@minfejl.dk>
(cherry picked from commit 84118a3063e38ed9d274cca90d115809353819b4)

shrink-osd: check osd id format

This adds a check early in order to ensure the format of osd ids passed
is correct.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2005734
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 968891f4498da9625acfdd34bfb01fe445d1eef2)

rolling_update: modify default health_osd_check_*

let's do more retries with a shorter delay.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 50a21d695eb571bdc5b4d67bde914cf58c502b44)

tests: add new scenario subset_update

new scenario in order to test the subset upgrade approach using tags.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fb8a66149bc5605c0e51ab137f46c2c48580452a)

rolling_update: fix pre and post osd upgrade play

when using --limit osds, the play before and after osd upgrade are
skipped because we use `hosts: "{{ mon_group_name | default('mons') }}[0]"`
using `hosts: "{{ osds_group_name | default('osds') }}" with
`delegate_to` to the first monitor addresses this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fc9f87c45f6e58a595f59365b58bf1b0e3909bf5)

update: support upgrading a subset of nodes

It can be useful in a large cluster deployment to split the upgrade and
only upgrade a group of nodes at a time.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2014304
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e5cf9db2b04f55196d867f5a7248b455307f4407)

tests: ensure ca-certificates is up to date

otherwise the `rpm_key` module fails because it can't verify the
certificate.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: remove all references to ceph_stable_release

this is legacy and not needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f277a39dfe4ea7ea7d7f211a6a554866ac519f52)

ceph-defaults: set ceph_stable_release default to the stable branch release

ceph_stable_release is a legacy from the time where a single branch of ceph-ansible supported more than one release of ceph

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit fb99626987740d676d649b0bce2215bce72ca0cf)

tests: set rgw_instances in collect-logs.yml

in order to gather rgw logs, we need rgw_instances to be set.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c2e46fe5a5b9ebeab0828da1b7dd6540b3766fb2)

tests: update collect-logs.yml playbook

- change `ceph -s` output to json-pretty.
- gather rgw logs
- add `health detail` command

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b2ccc7234a8413a8bbd5d471da9ff2cf8f3ccde2)

tests: move collect-logs.yml to ceph-ansible repo

related ceph-build PR: ceph/ceph-build#1914

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 702564518b9cb5019648f1a9edcdc4cc962a36d9)

purge: add remove_docker tag

This can help to skip docker removal tasks

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit ff39c8d70b7326f4215d32e78e2f89b632b07008)

purge: add container_binary needed for zap osds

`container_binary` isn't set anymore in the purge osd play because of a
regression introduced by 60aa70a.
The CI didn't catch it because the play purging node-exporter sets this
variable for all nodes before we run the purge osd play.

This commit fixes this regression.

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit a51ce767ca6749b3eb4c0c871e436daf3828e6c6)

ceph-defaults: set quay.io as the default registry

Because the ceph container images are now only pushed to the quay.io
registry then this updates the default registry value.
The docker.io registry can still be used but doesn't receive updated
container images.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e7b43c1fc632237376351f43363597c23bd33cb7)

ceph-container-engine: allow override container_package_name and container_service_name

Only include specific variables when they are undefined

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
(cherry picked from commit 95bce32270c7f5ea7e397588340b674efd7db63f)

purge-dashboard: remove cid files

This adds the service cid file cleanup as supported in the classic purge
playbook since b9dd253

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1786691
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cddc23f51134a6a95fd7492ee27a5d89bf7ebf9f)

tests/rgw: use json format output for user info

If the radosgw user already exists then we need to have the output in json
format because we are expecting to load the output with json.loads()
Otherwise we have pytest failure like:

```console
self = <json.decoder.JSONDecoder object at 0x7fa2f00a5fd0>, s = '', idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.

        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.

        """
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration as err:
>           raise JSONDecodeError("Expecting value", s, err.value) from None
E           json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f2bd8ae70f0ae5d0c7c7a36bd6ecdad6383feed9)

tests/rgw: add timeout 5s to radosgw-admin command

If the radosgw daemons aren't up and running correctly (like not registered
in the servicemap or the OSD are down) then the radosgw-admin will hang
forever.
Jenkins will kill the jobs after 3h but we don't want to wait until this global
timeout.
Adding the timeout 5 command to the radosgw-admin commands (which is already
present on other ceph calls) allows the job to fail earlier.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f01ae82eeccb1c3ebc97a84b0d8a547f360741d3)

container: explicitly pull monitoring images

We don't pull the monitoring container images (alertmanager, prometheus,
node-exporter and grafana) in a dedicated task like we're doing for the
ceph container image.
This means that the container image pull is done during the start of the
systemd service.
By doing this, pulling the image behind a proxy isn't working with podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1995574
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5bb7240f878ff9b369ea028839d26fd46342ff77)

iscsi: don't set default value for trusted_ip_list

It restricts access to the iSCSI API.
It can be left empty if the API isn't going to be access from outside the
gateway node

Even though this seems to be a limited use case, it's better to leave it
empty by default than having a meaningless default value.

We could make this variable mandatory but that would be a breaking
change. Let's just add a logic in the template in order to set this
variable in the configuration file only if it was specified by users.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1994930
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 6802b8dddd7f8d1f1c47f4eb3b7dd6a6a48820dc)

containers: introduce target systemd unit

This adds ceph-*.target systemd unit files support for containerized
deployments.
This also fixes a regression introduced by PR #6719 (rgw and nfs systemd
units not getting purged)

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1962748
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09ef465f62fde775bd2490be5b43d7796e2a9c6c)

roles: remove leftover from pr #4319

pr #4319 introduced some uesless `become: true` on systemd tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1db8fa89895546571a831289ebbe0f83d02b1e0a)

Vagrantfile: fallback on 'varant_variables.yml.sample'

When using a vagrant command from the root directory of the repo, it
throws an error if no 'vagrant_variables.yml' file is present.

```
Message: Errno::ENOENT: No such file or directory @ rb_sysopen - /home/guits/workspaces/ceph-ansible/vagrant_variables.yml
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3d27f9e7dc7ee775be57c27c3620009f9935ddcc)

update: gather facts only one time

this play doesn't need to gather facts from localhost

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c14e9114baebd155996b42b18744567698178836)

ceph-mon: do not log monitor keyring

We don't want to display the keyring in the ansible log.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e44075abd607648da88b4e3555353a99ecb171a6)

common: do not log keyring secret

let's not display any keyring secret by default in ansible log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1980744
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7511195738e9d1e8f3d3ec77ad4473fa90d17d22)

ceph-rgw: Work around Jinja2 < 2.8 missng eq test

EL7 ships with Jinja2 version 2.7, which is missing the `eq` test.

Work around this by using `match` instead.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>

ceph-rgw: Set pg_num on RGW pool if required

If the `pg_num` value specified in `rgw_create_pools` is different from the
actual value in the cluster, apply it with `ceph osd pool set`.

This corresponds to the behavior of the `ceph_pool` module used in Ceph Ansible
5.0 onward.

Also avoid setting the pool application if it's already done.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>

switch2container: fix mon quorum check

This was reverted by 7ddbe74

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1990733
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-dashboard: fix TLS cert openssl generation

With OpenSSL version prior 1.1.1 (like CentOS 7 with 1.0.2k), the -addext
doesn't exist.
As a solution, this uses the default openssl.cnf configuration file as a
template and add the subjectAltName in the v3_ca section. This temp openssl
configuration file is removed after the TLS certificate creation.
This patch also move the run_once statement at the block level.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 5e0ace7e5493f7d8299155e915435691a0f1a007)

dashboard: subj_alt_names fact refactor

the current way the variable is built results in:

```
2021-08-03 04:18:23,020 - ceph.ceph - INFO - ok: [ceph-sangadi-4x-indpt6-node1-installer] => changed=false
  ansible_facts:
    subj_alt_names: |-
      subjectAltName=ceph-sangadi-4x-indpt6-node1-installer/subjectAltName=10.0.210.223/subjectAltName=ceph-sangadi-4x-indpt6-node1-installersubjectAltName=ceph-sangadi-4x-indpt6-node2/subjectAltName=10.0.210.252/subjectAltName=ceph-sangadi-4x-indpt6-node2/
```

which is incorrect.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978869
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6f1a0634f73ad1f41af613a9452dc9c5f70b2702)

Fixes typo in rgw-add-users-buckets playbook

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 478d9fdcb6fe6fb6ef7d00c9fe09dd48acd345cd)

add-osd: use container_exec_cmd fact from mon host

Because we're delegating the task to the first monitor node, we need to be
sure that the container_exec_cmd fact is the one from that node too otherwise
we could have a mismatch on the ceph-mon container name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1990772
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

podman pids.max default value is 2048, docker's one is 4096 which are
sufficient for the default value (512) of rgw thread pool size.
But if its value is increased near to the pids-limit value,
it does not leave place for the other processes to spawn and run within
the container and the container crashes.

pids-limit set to unlimited regardless of the container engine.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041
Signed-off-by: Teoman ONAY <tonay@redhat.com>
(cherry picked from commit 9b5d97adb95a788bc1fdedbba562a9c71a1808be)

infra: use dedicated variables for balancer status

The balancer status is registered during the cephadm-adopt, rolling_update
and swith2container playbooks. But it is also used in the ceph-handler role
which is included in those playbooks too.
Even if the ceph-handler tasks are skipped for rolling_update and
switch2container, the balancer_status variable is erased with the skip task
result.

play1:
  register: balancer_status
play2:
  register: balancer_status <-- skipped
play3:
  when: (balancer_status.stdout | from_json)['active'] | bool

This leads to issue like:

The conditional check '(balancer_status.stdout | from_json)['active'] | bool'
failed. The error was: Unexpected templating type error occurred on
({% if (balancer_status.stdout | from_json)['active'] | bool %} True
{% else %} False {% endif %}): expected string or buffer.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1982054
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 386661699bcfe05a220de6d58b9d50baa7eb6dc1)

osds: use osd pool ls instead of osd dump command

The ceph osd pool ls detail command is a subset of the ceph osd dump
command.

$ ceph osd dump --format json|wc -c
10117
$ ceph osd pool ls detail --format json|wc -c
4740

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 06471a4b82d63ebb35f80d45aa6ae629a4daeedc)

rolling_update: get ceph version when mons exist

eec3878 introduced a regression for upgrade scenarios where there's no
monitor nodes at all (like ganesha standalone, external clients, etc..)

TASK [get the ceph release being deployed] ************************************
task path: infrastructure-playbooks/rolling_update.yml:121
Thursday 29 July 2021 15:55:29 +0000 (0:00:00.484) 0:00:15.802 *********
fatal: [client0]: FAILED! =>
msg: '''dict object'' has no attribute ''mons'''

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit e87a47cf0cc0d01050d0cb94cabbb8bc42db0c57)

infrastructure-playbooks: Get Ceph info in check mode

In the `set osd flags` block, run the Ceph commands that gather information
from the cluster (and don't make any changes to it) even when running in check
mode.

This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit d7653dca95247e52c4a6821c1eec00748263082a)

ceph-handler: Fix osd handler in check mode

Run the Ceph commands that only gather information (without making any changes
to the cluster) when running Ansible in check mode.

This allows the tasks that depend on the variables set by those tasks to
succeed in check mode.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 498acd7527410f7f359b2b0181e83ca39c682ec0)

library: remove unused module import

Move the import at the top of the file and remove unused module import.

- E402 module level import not at top of file
- F401 'xxxx' imported but unused

This also removes the '# noqa E402' statement from the code.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 2138a00a3294b222d5e8325495300841ed5a7f5f)

library: flake8 ceph-ansible modules

This commit ensure all ceph-ansible modules pass flake8 properly.

Signed-off-by: Wong Hoi Sing Edison <hswong3i@pantarei-design.com>
(cherry picked from commit beda1fe77381fbacb40fb75e5c06f36fbbad4a4a)

ceph-defaults: add missing grafana dashboards

The radosgw-sync-overview and rbd-details grafana dashboars were missing
from the list.

Closes: #6758
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit f0ccf3ebf0b1ad6737f0d65174c0024b49db00a4)

update: check the ceph release

Check early which Ceph release is going to be deployed and fail if it
doesn't correspond to the ceph-ansible version being used.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1978643
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit eec38784ecfaae6bf51af6cc5e3aea934d1d3d58)

alertmanager: allow disable dashboard tls verify

When using self-signed/untrusted CA certificates, alertmanager displays
an error in logs. With this commit this should make those messages
disappear.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1936299
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9f77b929d145512e0d8886b96caf6047c5072a68)

dashboard: support dedicated network for the dashboard

This introduces a new variable `dashboard_network` in order to support
deploying the dashboard on a different subnet.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1927574
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f4f73b61972f416db9fe6ec305de282094581e07)

multisite: use node fqdn for endpoints when https

When the rgw_multisite_proto variable is set to https then we shoudn't use
the IP address in the zone endpoints list but the node FQDN to match the
TLS certificate CN.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1965504
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit ad05a0816048a69adba0e9b27683ed799e3c40bd)

purge: support osd_auto_discovery

This adds a task that zaps by osd id so we can support the scenario
where osds were deployed with `osd_auto_discovery` is true.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4144074a50ffd1e8893e3af2242fc44a23fd9c3e)

purge: merge playbooks

This refactor merges the two playbooks so we only have to maintain 1
playbook.
(Symlink the old purge-container-cluster.yml playbook for backward
compatibility).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 17cd83bf3a35b482fa453b6bef8445e5e1ad8bce)

purge: drop variables from 'hosts' sections

Those variables are useless given this is not possible to override them.
Let's replace them with the hardcoded name instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6b50401d0c2021fe691ee4f2be083b059d991c8b)

purge: reindent playbook

This commit reindents the playbook.
Also improve readability by adding an extra line between plays.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 60aa70a12820835412835063972c34a1c93cac7d)

ceph-mgr: don't install dashboard pkg by default

This is a partial backport of 2547ab60.

We are currently installing the ceph-mgr-dashboard package even if the
dashboard_enabled variable is set to false.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-mgr: move mgr module list to common

Populating the ceph_mgr_modules list in the mgr_modules doesn't make sense
since that file is only executed if the list isn't empty or we're using the
dashboard.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit cd06e7c046b3e56920b1f9bdc1907429382bee5c)

ceph-nfs: allow overriding NFS_CORE_PARAM

We already have config override variables for existing block (like
ganesha_ceph_export_overrides, ganesha_log_overrides, etc...) or a
global one (ganesha_conf_overrides) but redefining the NFS_CORE_PARAM
block in that variable will erase all previous values (currently only
Bind_Addr).

ganesha_core_param_overrides: |
Enable_UDP = false;
NFS_Port = 2050;

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1941775
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 9817d29543099ca640ce8b23da2ab9f26179cba5)

lib/ceph-volume: support zapping by osd_id

This commit adds the support for zapping an osd by osd_id in the
ceph_volume module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 70f1d6e2cd9ed4abb4db599f9faa816703430d80)

rolling_update: check quorum state before upgrade

If one a the monitor is out of the quorum then nothing prevents the upgrade
playbook to run.
We only check if we have at least three monitor nodes but we should also
check if those monitor nodes are correctly present in the quorum.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1952571
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 97148dd58c77a84aff1235dc9be3cb8c9d73cc09)

ceph-facts: move device facts to its own file

Instead of reusing the condition 'inventory_hostname in groups[osds]'
on each device facts tasks then we can move all the tasks into a
dedicated file and set the condition on the import_tasks statement.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit d704b05e52d10910cd68c49033933bd7e6ded268)

ceph-validate: check logical volumes

We currently don't check if the logical volume used in lvm_volumes list
for either bluestore data/db/wal or filestore data/journal exist.
We're only doing this on raw devices for batch scenario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 55bca07cb612b766bc099e14e0a5661185a7f9a6)

ceph-validate: check db/journal/wal devices too

When using dedicated devices for db/journal/wal objecstore with
ceph-volume lvm batch then we should also validate that those devices
exist and don't use a gpt partition table in addition of the devices
and lvm_volume.data variables.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit 808e7106dec5f3f7a743fe343ba3023c9390a1ba)