git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

ceph-osd: Add ulimit nofile on container start

On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

Set proper ownership command performance improvement

By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here.

On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed.

In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid

Added context note to all set proper ownership tasks

Signed-off-by: Kevin Jones <kevinjones@redhat.com>

ceph-nfs: fail on openSUSE Leap using distro packages

roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap
using `ceph_origin = distro`, as the ganesha packages are not available from
the distribution repositories

Fixes: #4342
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>

install ceph-mds packages on SUSE/openSUSE

install packages on SUSE/openSUSE distributions, using the
same logic as on RedHat-based distributions

Fixes #4340

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>

handler: do not validate the server certificate against the CA

Otherwise rgw handler ends up with an error when using https.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

remove duplicate task installing suse dependencies

roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that
installs the dependencies, as this is done later in install_suse_packages.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>

validate: do not validate devices or lvm_volumes in osd_auto_discovery case

we shouldn't validate these two variables when `osd_auto_discovery` is
set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

osd: remove useless condition

just like `ceph_osd_pool_default_size`, a pool size might change after an
initial deployment. Having this condition prevents from customizing the
pool in that case.
This is not needed so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

mergify: disable automatic merging on master

automatic merging by mergify is failing for a while now.
Until we can figure out what's wrong, let's disable it on master for now
so we don't merge "failing" PRs although they passed all scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

doc: update backport section

Only maintainers can set labels on PRs, so let's clarify that point in
the doc which says something confusing at the moment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: tests switch_to_containers against octopus

since we have container images for ceph@master, we shouldn't use
nautilus anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

common: replace shell module

there is no need to use `shell` in these tasks. Let's use `command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

shrink-mon: refact 'verify the monitor is out of the cluster' task

use `from_json` filter instead of a `| python` so we can get rid of the
`shell` module usage here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

osd: refact 'wait for all osd to be up' task

let's use `until` instead of doing test in bash using python oneliner
also, use `command` instead of `shell`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

common: use discovered_interpreter_python fact

in order to use the right binary name when using python cli in command
or shell module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: update test_mgr_is_up()

the data structure has changed in octopus:

```
    "mgrmap": {
        "available": true,
        "modules": [
            "dashboard",
            "prometheus"
        ],
        "num_standbys": 0,
        "services": {
            "prometheus": "http://mgr0:9283/"
        }
    },
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

osd: update the check for 'all osd to be up'

the data structure has changed in octopus.
eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

refact python installation

This commit refacts the python installation when no available.

In order to avoid generating errors, we check for each package manager
to detect which system we are running on.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: fix wrong paths for lv-create in tox.ini
solution: change paths inside tox.ini file
Fixes: #4311
Signed-off-by: Bogomolov Igor <igor95n@gmail.com>

Revert "tests: disable nfs-ganesha deployment"

This reverts commit 83940e624bc4faf6bc7bb1a7637711556d6b3e8c.

Because nfs-ganesha@master (2.9-dev) build has been fixed by [1] then
we can test nfs-ganesha in the CI for master/octopus.

[1] https://github.com/ceph/ceph-build/pull/1346

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

mgr: refact 'wait for all mgr to be up' task

There's no need to use `shell` module here.
Instead of using `| python -c`, let's use `from_json` filter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

mgr/dashboard: Fix grafana/prometheus url config

When configuring grafana/prometheus embed in the mgr/dashboard, we need
to use the address of the grafana-server node and not the current
hostname because mgr/dashboard and grafana/prometheus could be present
on different hosts.
We should instead rely on the grafana_server_addr variable and remove
the dashboard_url.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

dashboard: run dashboard role on mgr/mon nodes

We don't need to execute the ceph-dashboard role on the nodes present
in the grafana-server group. This one is dedicated to the grafana and
prometheus stack.
The ceph-dashboard needs to executed where the ceph-mgr is running. It
is either on the dedicated mgr nodes or if mgr and mon are collocated
implicitly on the mon nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-dashboard: Add run_once on delegate tasks

Because we need to execute commands from a monitor node (the first one
in the mons list) we are using delegate_to option.
If there's multiple nodes running the ceph-dashboard role then the
delegated task will be executed multiple times.
Also remove a mgr config-key option not present for nautilus+ releases.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

only support openSUSE Leap 15.x, fail on 42.x

openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to
'openSUSE Leap 15.x' to align with SLES15 development.
The previous logic did not correctly allow the current release, as 15.x matched
the 'less than 42.3' condition.

For now only support openSUSE Leap 15.x, and extend support once 16.x is
released (or whatever the exact version will be)

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>

ceph-infra: Apply firewall rules with container

We don't have a reason to not apply firewall rules on the host when
using a containerized deployment.
The TripleO environments already manage the ceph firewall rules outside
ceph-ansible and set the configure_firewall variable to false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-grafana: Set grafana uid/gid on files

We don't need to create a grafana system user (in fact we even don't
set the righ uid to this user) because we're using a container setup.
Instead we just need to be sure to set the owner/group to 472 (grafana
user/group from the container) like we do for ceph/167.
We don't need to set the user/group recursively on /etc/grafana
directory in a dedicated task.
Also on Ubuntu system, the ceph-grafana-dashboards isn't present so on
non containerized deployment we won't have the
/etc/grafana/dashboards/ceph-dashboard directory present (coming with
the package) so we need to be sure it exists.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

dashboard: do not deploy on Debian based OS/non-containerized

in non-containerized deployment, we can't deploy dashboard on Debian
based distribution since the package `ceph-grafana-dashboards` isn't
available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

docs: Correct weird wording

for the Ceph master branch.

Signed-off-by: Theo Ouzhinski touzhinski@gmail.com

tests/shrink_rgw: Disable dashboard

The shrink_rgw scenario has been merge just after the PR about enable
ceph dashboard by default.
So right now the shrink_rgw scenrio doesn't have nodes in the grafana
group and fails.
We just need to set dashboard_enabled to false.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

tests: add more memory in podman job

Typical error :

```
fatal: [mon1 -> mon0]: FAILED! => changed=true
  cmd:
  - podman
  - exec
  - ceph-mon-mon0
  - ceph
  - config
  - set
  - mgr
  - mgr/dashboard/ssl
  - 'false'
  delta: '0:00:00.644870'
  end: '2019-07-30 10:17:32.715639'
  msg: non-zero return code
  rc: 1
  start: '2019-07-30 10:17:32.070769'
  stderr: |-
    Traceback (most recent call last):
      File "/usr/bin/ceph", line 140, in <module>
        import rados
    ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
    Error: exit status 1
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Let's add more memory to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: deploy dashboard on mons

there's no dedicated nodes for mgr, let's use monitor nodes.
The mgr0 instance spawned isn't used, so if this node is part of the
inventory for this scenario, testinfra will complain because there's no
ceph.conf on this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

dashboard: fix timeout usage on rgw user creation command

For some reason, this is making the playbook failing like following:

```
TASK [ceph-dashboard : create radosgw system user] ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106
Tuesday 30 July 2019  10:04:54 +0200 (0:00:01.910)       0:11:22.319 **********
FAILED - RETRYING: create radosgw system user (3 retries left).
FAILED - RETRYING: create radosgw system user (2 retries left).
FAILED - RETRYING: create radosgw system user (1 retries left).
fatal: [mgr0 -> mon0]: FAILED! => changed=true
  attempts: 3
  cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system
  delta: '0:00:20.021973'
  end: '2019-07-30 08:06:32.656066'
  msg: non-zero return code
  rc: 124
  start: '2019-07-30 08:06:12.634093'
  stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

using `timeout -f -s KILL` fixes this issue.

Also, there is no need to use `shell` module here, let's switch to
`command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests/functional: add a test for shrink-rgw.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and RGW and then runs shrink-rgw.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

add a playbook the remove rgw from a given node

Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that
can remove a RGW from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>

osd: add 'osd blacklist' cap for osp keyrings

This commits adds the `osd blacklist` cap on all OSP clients keyrings.

Fixes: #2296
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-osd: check container engine rc for pools

When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could but either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

tests: test dashboard deployment with podman scenario

This commit adds a grafana-server section in order to test dashboard
deployment with podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

validate: add checks for grafana-server group definition

this commit adds two checks:
- check that the `[grafana-server]` group is defined
- check that the `[grafana-server]` contains at least one node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

mgr: fix a typo

this tasks isn't using the right container_exec_cmd, that's delegating
to the wrong node.
Let's use the right fact to fix this command.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

dashboard: remove cfg80211 module installation

According to this comment [1], this seems to be needed to detect wifi
devices.

In node exporter we can see this:

```
--collector.wifi Enable the wifi collector (default: disabled).
```

since it's enabled by default and we don't even change this in our
systemd templates for node-exporter, we can easily assume in the end
it's not needed. Therefore, let's remove this.

[1] https://github.com/ceph/ceph-ansible/commit/dbf81b6b5bd6d7e977706a93f9e75b38efe32305#diff-961545214e21efed3b84a9e178927a08L21-L23

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

dashboard: use dedicated group only

There's no need to add complexity and trying to fallback on other group.
Let's deploy dashboard on all nodes present in grafana-server group.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

dashboard: move code into a dedicated playbook

Move dashboard, grafana/prometheus and node-exporter plays into a
dedicated playbook in infrastructure-playbook directory.
To avoid using 'dashboard_enabled | bool' condition multiple time
in the main playbook we can just import the dashboard playbook or
not.
This patch also allows to use an unique dashboard playbook for
both baremetal and container playbooks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

dashboard: enable dashboard by default

This commit enables dashboard deployment by default.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Remove NBSP characters

Some NBSP are still present in the yaml files.
Adding a test in travis CI.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

infra-playbooks: rewite a condition for better readability

Use facility built-in in Ansible to check whether a command was executed
successfully rather looking at its return value.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

container: rename docker directories

Those 2 directories should be renamed to be more generic (docker vs.
podman).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: disable nfs-ganesha deployment

nfs-ganesha repositories @ dev are broken, this commit disables the
nfs-ganesha deployment so the CI isn't stuck.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Avoid to setup provisioners in a fully containerized environment

This commit adds a when clause to avoid the setup of grafana
provisioners in a fully containerized scenario.
This is needed when the ceph-grafana-dashboards package is not
installed and this task could result in a wrong grafana
configuration that let the container crash.

Signed-off-by: fmount <fpantano@redhat.com>

Fix backward compat with old cephfs_pools format

Previously cephfs_pools items used to have a pgs: key but not
pgp_num: nor pg_num:

Signed-off-by: Giulio Fidente <gfidente@redhat.com>

handler: fix bug in osd handlers

fbf4ed42aee8fa5fd18c4c289cbb80ffeda8f72e introduced a bug when
container binary is podman.
podman doesn't support ps -f using regular expression, the container id
is never set in the restart script causing the handler to fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

library/ceph_volume.py: remove six dependency

The ceph nodes couldn't have the python six library installed which
could lead to error during the ceph_volume custom module execution.

ImportError: No module named six

The six library isn't useful in this module if we're sure that all
action variables passed to the build_ceph_volume_cmd function are a
list and not a string.

Resolves: #4071

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

validate: fail if gpt header found on unprepared devices

ceph-volume will complain if gpt headers are found on devices.
This commit checks whether a gpt header is present on devices passed in
`devices` variable and fail early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-dashboard: enable rgw options conditionally

The dashboard rgw frontend options only need to be applied when there's
some nodes present in the rgw ansible group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

tests/dashboard: use the dedicated grafana node

The Vagrant dashboard scenario creates a dedicated grafana node but
was not use in the ansible inventory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

dashboard: use variables for port value

The current port value for alertmanager, grafana, node-exporter and
prometheus is hardcoded in the roles so it's not possible to change the
port binding of those services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

tests: remove useless setting

this setting is not needed here since we explicitely set it for
container and non container context.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

shrink-rbdmirror: check if rbdmirror is well removed from cluster

This commits adds a check to ensure the daemon has been removed from the
cluster.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests/functional: add a test for shrink-rbdmirror.yml

Add a new functional test that deploys Ceph cluster with three nodes for
MON, OSD and RBD Mirror and, then, runs shrink-rbdmirror.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

add a playbook that removes rbd-mirror from a node

Add a playbook named "shrink-rbdmirror.yml" in infrastructure-playbooks/
that removes a RBD Mirror from a node in an already deployed Ceph
cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>

ceph-infra: update handler with daemon variable

Both ntp and chrony daemon use variable for the service name because it
could be different depending on the GNU/Linux distribution.
This has been update in 9d88d3199 for chrony but only for the start part
not for the handler.
The commit fixes this for both ntp and chrony.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-infra: Open prometheus port

The Prometheus porrt 9090 isn't open in the firewall configuration.
Also the dashboard task on the grafana node was not required because
it's already present on the mgr node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

handler: remove legacy condition

since everything is already in a block with the same condition, it's not
needed to leave all of them on these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

validate: improve message printed in check_devices.yml

The message prints the whole content of the registered variable in the
playbook, this is not needed and makes the message pretty unclear and
unreadable.

```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-iscsi: Update gateway config/template

- Remove gateway_keyring from the configuration file because it's
not used in ceph-iscsi 3.x release.
- Use config_template instead of template module for iscsi-gateway
configuration file. Because the file is an ini file and we might want
to override more parameters than those present in ceph-ansible.
- Because we can now set the pool name in the configuration, we should
use a variable for that. This is refact with the iscsi_pool_* variables
also used to configure the pool size.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-dashboard: remove bool filter for rgw vars

Some dashboard_rgw_api_* variables are using the bool filter but those
variables are strings with an empty string as default value.
So we should test the variable against an empty string instead of a
bool.

dashboard_rgw_api_host: ''
dashboard_rgw_api_port: ''
dashboard_rgw_api_scheme: ''
dashboard_rgw_api_admin_resource: ''

Resolves: #4179

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

dashboard: Use upstream default port

We are currently using incorrect dashboard default port. The upstream
uses 8443 instead of 8234 by default. This should get us closer to the
upstream project.

Signed-off-by: Boris Ranto <branto@redhat.com>

tests/functional: add a test for shrink-mgr.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MGR and then runs shrink-mgr.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

add a playbook that removes manager from a node

Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/
that removes a MGR from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>

ceph-handler: fix cluster name in socket path

c90f605b5 introduces the default ceph cluster name value in the rgw
socket path for the rgw restart script. But this should use the
`cluster` variable instead.
This commit also fixes this in the osd restart script.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

shrink-mds: refact post tasks

This commit refacts the way we check the "mds_to_kill" node is well
stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>

tests/functional: add a test for shrink-mds.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MDS and then runs shrink-mds.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>

add a playbook that removes mds from a node

Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/
that removes a MDS from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>

Add package-install tag on ceph-grafana-dashboard pkg install.

According to the OSP pattern, we need the package-install tag
to control what is installed on the host. This commit just add
the missing tag to meet the TripleO requirements.

See: /issues/4197 for details

Fixes: #4197
Signed-off-by: fmount <fpantano@redhat.com>

ceph-iscsi-gw: Update log directories bind mount

On containerized deployment we need to bind mount the ceph-iscsi
directory to avoid writing the logs in the container.
The /var/log/ceph directory isn't use by rbd-targe-api/gw services
because they have their own log directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

ceph-mon: Fix cluster name parameter

The ability to add nodes with the monitor role to an existing cluster
whose name differs from the default name is fixed.

Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>

iscsi: refact deprecated variables

This commit moves some old variables into ceph-defaults so we can move
the `use_new_ceph_iscsi` fact in ceph-facts role in order.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

igw: Add check for missing iqn

If the user is still using the older packages and does not setup
the target iqn you will just get a vague error message later on.
This adds a check during the validate task, so it is clear to the
user.

Signed-off-by: Mike Christie <mchristi@redhat.com>

igw: Add check for mismatch ceph-iscsi and iscsigws.yml settings

If the user has manually installed ceph-iscsi but is trying to setup a
iscsi object in iscsigws.yml you will just a python crash. This patch
adds a check and more user friendly error message for the case.

Signed-off-by: Mike Christie <mchristi@redhat.com>

igw: Update tests to use ceph-iscsi package

gateway_ip_list is depreciated and is only used when using the old
ceph-iscsi-config/cli packages that are no longer being developed
(GH repos are archived). Because ceph-iscsi-config/cli is no longer
being worked on, this modifies the tests to stress the ceph-iscsi
based installs.

Signed-off-by: Mike Christie <mchristi@redhat.com>

igw: Update iscsigws.yml.sample for ceph-iscsi support

Update iscsigws.yml.sample to document that we cannot use ansible to
setup iSCSI objects and use the new ceph-iscsi package.

Signed-off-by: Mike Christie <mchristi@redhat.com>

igw: Support ceph-iscsi package for install

This adds support for the ceph-iscsi package during install. ceph-iscsi
does not support setting up targets/gws, luns and clients with the
current library/igw_* code. Going forward those tasks should be done with
gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi
objects setup so we do not break existing setups.

The next patch will update the iscsigws.yml.sample to document that
users must not setup any iscsi object if they want to use the new
package and tools.

Signed-off-by: Mike Christie <mchristi@redhat.com>

igw: drop gateway_ip_list for container setups

The gateway_ip_list is not used in container setups, so drop it
for that case.

Signed-off-by: Mike Christie <mchristi@redhat.com>

igw: move gateway_ip_list check to validate role

Signed-off-by: Mike Christie <mchristi@redhat.com>

igw: Support new ceph-iscsi package during purge

The ceph-iscsi-config and ceph-iscsi-cli packages were combined into
ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to
support the new API and old one.

Signed-off-by: Mike Christie <mchristi@redhat.com>

tests: wait 30sec before running testinfra

adding back a sleep 30s after nodes have rebooted before running
testinfra.
This was removed accidentally by d5be83e

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-handler: Fix rgw socket in restart script

Since Mimic the radosgw socket has two extra fields in the socket
name (before the .asok suffix): <pid>.<ctid>

Before:
  /var/run/ceph/ceph-client.rgw.cephaio-1.asok
After:
  /var/run/ceph/ceph-client.rgw.cephaio-1.16913.23928832.asok

The radosgw restart script doesn't handle this and could fail during
an upgrade.
If the SOCKETS variable isn't defined in the script then the test
command won't fail because the return code is 0

$ test -S
$ echo $?
0

There multiple issues in that script:
  - The default SOCKETS value isn't defined due to a typo
SOCKET vs SOCKETS.
  - Because the socket name uses the pid then we need to check the
socket name after the service restart.
  - After restarting the radosgw service we need to wait few seconds
otherwise the socket won't be created.
  - Update the wget parameters because the command is doing a loop.
We now use the same option than curl.
  - The check_rest function doesn't test the radosgw at all due to
a wrong test command (test against a string) and always returns 0.
This needs to use the DOCKER_EXECS variable in order to execute the
command.

$ test 'wget http://192.168.100.11:8080'
$ echo $?
0

Also remove the test based on the ansible_fqdn because we only use
the ansible_hostname + rgw instance name.

Finally group all for loop into a single one.

Resolves: #3926

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

Add radosgw_frontend_ssl_certificate parameter

This is necessary when configuring RGW with SSL because
in addition to passing specific frontend options, civetweb
appends the 's' character to the binding port and beast uses
ssl_endpoint instead of endpoint.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071
Signed-off-by: Giulio Fidente <gfidente@redhat.com>

tox-podman.ini: add missing reruns option

The first py.test call didn't have the reruns option. The commit
fixes it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

Add condition on dashboard installer phase

Even if dashboard feature is disabled then the installer status will
still report dashboard, grafana and node-exporter roles timing.

INSTALLER STATUS **********************************
Install Ceph Monitor           : Complete (0:01:21)
Install Ceph Manager           : Complete (0:00:49)
Install Ceph Dashboard         : Complete (0:00:00)
Install Ceph Grafana           : Complete (0:00:02)

When need to set the dashboard_enabled condition on those installer
phase pre/post tasks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

nfs: clean template

remove legacy options

```
ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:13): Unknown parameter (Dir_Max)
ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:14): Unknown parameter (Cache_FDs)

```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

containers: improve logging

bindmount /var/log/ceph on all containers so it's possible to retrieve
logs from the host.

related ceph-container PR: ceph/ceph-container#1408

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-osd: Add CONTAINER_IMAGE env variable

This environment variable was added in cb381b4 but was removed in
4d35e9e.
This commit reintroduces the change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>

Set grafana_server_addr fact for ipv6 scenarios.

As the bz1721914 describes, the grafana_server_addr
fact is not defined if ip_version used is ipv6.
This commit adds the ip_version condition to set
correctly this fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914
Signed-off-by: fmount <fpantano@redhat.com>

facts: fix bug in grafana_server_addr fact setting

If no grafana-server group is defined while an mgr group is, that task
will fail because `hostvars[groups[grafana_server_group_name][0]` can't
return anything since `groups['grafana-server']` will be a non existing
key.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: clean nfs_ganesha variables

- clean some leftover.
- move nfs_ganesha_[stable|dev] in group_vars so dev_setup.yml can modify them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

nfs: add missing | bool filters

To address this warning:
```
[DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this
behaviour will go away and you might need to add |bool to the expression in the
future
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

nfs: remove duplicate task

This task is already present in pre_requisite_non_container.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: test nfs-ganesha deployment

Add back the nfs-ganesha deployment testing which was removed because of
broken dependencies.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

validate.py: Fix alphabetical order on uca

Alphabetized ceph_repository_uca keys due to errors validating when
using UCA/queens repository on Ubuntu 16.04

An exception occurred during task execution. To see the full
traceback, use -vvv. The error was:
SchemaError: -> ceph_stable_repo_uca schema item is not
alphabetically ordered

Closes: #4154
Signed-off-by: Gabriel Ramirez <gabrielramirez1109@gmail.com>