git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
6 years ago ceph-osd: Add ulimit nofile on container start
Dimitri Savineau [Tue, 6 Aug 2019 15:52:59 +0000 (11:52 -0400)]
ceph-osd: Add ulimit nofile on container start

On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago Set proper ownership command performance improvement
Kevin Jones [Sat, 10 Aug 2019 19:44:32 +0000 (15:44 -0400)]
Set proper ownership command performance improvement

By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here.

On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed.

In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid

Added context note to all set proper ownership tasks

Signed-off-by: Kevin Jones <kevinjones@redhat.com>
6 years ago ceph-nfs: fail on openSUSE Leap using distro packages
Johannes Kastl [Fri, 16 Aug 2019 09:53:16 +0000 (11:53 +0200)]
ceph-nfs: fail on openSUSE Leap using distro packages

roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap
using `ceph_origin = distro`, as the ganesha packages are not available from
the distribution repositories

Fixes: #4342
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years ago install ceph-mds packages on SUSE/openSUSE
Johannes Kastl [Wed, 14 Aug 2019 20:48:34 +0000 (22:48 +0200)]
install ceph-mds packages on SUSE/openSUSE

install packages on SUSE/openSUSE distributions, using the
same logic as on RedHat-based distributions

Fixes #4340

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years ago handler: do not validate the server certificate against the CA
Guillaume Abrioux [Tue, 20 Aug 2019 09:47:48 +0000 (11:47 +0200)]
handler: do not validate the server certificate against the CA

Otherwise rgw handler ends up with an error when using https.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago remove duplicate task installing suse dependencies
Johannes Kastl [Tue, 20 Aug 2019 09:23:29 +0000 (11:23 +0200)]
remove duplicate task installing suse dependencies

roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that
installs the dependencies, as this is done later in install_suse_packages.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years ago validate: do not validate devices or lvm_volumes in osd_auto_discovery case
Guillaume Abrioux [Wed, 14 Aug 2019 12:20:58 +0000 (14:20 +0200)]
validate: do not validate devices or lvm_volumes in osd_auto_discovery case

we shouldn't validate these two variables when `osd_auto_discovery` is
set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: remove useless condition
Guillaume Abrioux [Mon, 19 Aug 2019 11:51:14 +0000 (13:51 +0200)]
osd: remove useless condition

just like `ceph_osd_pool_default_size`, a pool size might change after an
initial deployment. Having this condition prevents customizing the
pool in that case.
This is not needed so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago mergify: disable automatic merging on master
Guillaume Abrioux [Tue, 6 Aug 2019 07:35:25 +0000 (09:35 +0200)]
mergify: disable automatic merging on master

automatic merging by mergify has been failing for a while now.
Until we can figure out what's wrong, let's disable it on master for now
so we don't merge "failing" PRs although they passed all scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago doc: update backport section
Guillaume Abrioux [Wed, 14 Aug 2019 12:32:51 +0000 (14:32 +0200)]
doc: update backport section

Only maintainers can set labels on PRs, so let's clarify that point in
the doc which says something confusing at the moment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: tests switch_to_containers against octopus
Guillaume Abrioux [Wed, 14 Aug 2019 11:58:47 +0000 (13:58 +0200)]
tests: tests switch_to_containers against octopus

since we have container images for ceph@master, we shouldn't use
nautilus anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago common: replace shell module
Guillaume Abrioux [Wed, 14 Aug 2019 09:10:12 +0000 (11:10 +0200)]
common: replace shell module

there is no need to use `shell` in these tasks. Let's use `command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago shrink-mon: refact 'verify the monitor is out of the cluster' task
Guillaume Abrioux [Wed, 14 Aug 2019 09:04:30 +0000 (11:04 +0200)]
shrink-mon: refact 'verify the monitor is out of the cluster' task

use `from_json` filter instead of a `| python` so we can get rid of the
`shell` module usage here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: refact 'wait for all osd to be up' task
Guillaume Abrioux [Wed, 14 Aug 2019 08:47:40 +0000 (10:47 +0200)]
osd: refact 'wait for all osd to be up' task

let's use `until` instead of doing the test in bash with a python one-liner.
Also, use `command` instead of `shell`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago common: use discovered_interpreter_python fact
Guillaume Abrioux [Wed, 14 Aug 2019 07:56:41 +0000 (09:56 +0200)]
common: use discovered_interpreter_python fact

in order to use the right binary name when using python cli in command
or shell module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: update test_mgr_is_up()
Guillaume Abrioux [Wed, 14 Aug 2019 07:08:24 +0000 (09:08 +0200)]
tests: update test_mgr_is_up()

the data structure has changed in octopus:

```
    "mgrmap": {
        "available": true,
        "modules": [
            "dashboard",
            "prometheus"
        ],
        "num_standbys": 0,
        "services": {
            "prometheus": "http://mgr0:9283/"
        }
    },
```
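
As a minimal sketch of how a test could read this structure, assuming the snippet above sits at the top level of the JSON status output (the outer braces and the helper name `mgr_is_available` are illustrative assumptions, not the actual test code):

```python
import json

# Sample taken from the commit message above. Wrapping it in an outer
# object is an assumption about the shape of the full status output.
SAMPLE_STATUS = """
{
    "mgrmap": {
        "available": true,
        "modules": ["dashboard", "prometheus"],
        "num_standbys": 0,
        "services": {"prometheus": "http://mgr0:9283/"}
    }
}
"""


def mgr_is_available(status_json):
    """Return True when the mgrmap reports an available manager."""
    status = json.loads(status_json)
    # .get() avoids a KeyError if a key is missing from the output.
    return bool(status.get("mgrmap", {}).get("available", False))


print(mgr_is_available(SAMPLE_STATUS))
```

Reading the flag through `.get()` lets the check report "not available" rather than crash when the layout differs.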

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: update the check for 'all osd to be up'
Guillaume Abrioux [Wed, 14 Aug 2019 05:31:09 +0000 (07:31 +0200)]
osd: update the check for 'all osd to be up'

the data structure has changed in octopus.
eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`.
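
A minimal sketch of a check that tolerates both layouts. The older nested path `["osdmap"]["osdmap"]` and the `num_up_osds` key are assumptions not stated in this commit message, and the helper names are hypothetical:

```python
def osd_counts(status):
    """Return (num_osds, num_up_osds) from a parsed status document."""
    osdmap = status["osdmap"]
    # Assumption: older releases nest the counters one level deeper.
    if "num_osds" not in osdmap and "osdmap" in osdmap:
        osdmap = osdmap["osdmap"]
    return osdmap["num_osds"], osdmap["num_up_osds"]


def all_osds_up(status):
    """True when at least one OSD exists and every OSD is up."""
    total, up = osd_counts(status)
    return total > 0 and total == up


new_layout = {"osdmap": {"num_osds": 3, "num_up_osds": 3}}
old_layout = {"osdmap": {"osdmap": {"num_osds": 3, "num_up_osds": 2}}}
print(all_osds_up(new_layout), all_osds_up(old_layout))
```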

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago refact python installation
Guillaume Abrioux [Thu, 1 Aug 2019 07:37:34 +0000 (09:37 +0200)]
refact python installation

This commit refacts the python installation when not available.

In order to avoid generating errors, we check for each package manager
to detect which system we are running on.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: fix wrong paths for lv-create in tox.ini
Igor [Tue, 6 Aug 2019 08:57:26 +0000 (11:57 +0300)]
tests: fix wrong paths for lv-create in tox.ini
solution: change paths inside tox.ini file
Fixes: #4311
Signed-off-by: Bogomolov Igor <igor95n@gmail.com>
6 years ago Revert "tests: disable nfs-ganesha deployment"
Dimitri Savineau [Fri, 2 Aug 2019 15:48:32 +0000 (11:48 -0400)]
Revert "tests: disable nfs-ganesha deployment"

This reverts commit 83940e624bc4faf6bc7bb1a7637711556d6b3e8c.

Because nfs-ganesha@master (2.9-dev) build has been fixed by [1] then
we can test nfs-ganesha in the CI for master/octopus.

[1] https://github.com/ceph/ceph-build/pull/1346

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago mgr: refact 'wait for all mgr to be up' task
Guillaume Abrioux [Tue, 6 Aug 2019 09:11:15 +0000 (11:11 +0200)]
mgr: refact 'wait for all mgr to be up' task

There's no need to use `shell` module here.
Instead of using `| python -c`, let's use `from_json` filter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago mgr/dashboard: Fix grafana/prometheus url config
Dimitri Savineau [Fri, 2 Aug 2019 17:35:22 +0000 (13:35 -0400)]
mgr/dashboard: Fix grafana/prometheus url config

When configuring grafana/prometheus embedded in the mgr/dashboard, we need
to use the address of the grafana-server node and not the current
hostname because mgr/dashboard and grafana/prometheus could be present
on different hosts.
We should instead rely on the grafana_server_addr variable and remove
the dashboard_url.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago dashboard: run dashboard role on mgr/mon nodes
Dimitri Savineau [Fri, 2 Aug 2019 15:24:03 +0000 (11:24 -0400)]
dashboard: run dashboard role on mgr/mon nodes

We don't need to execute the ceph-dashboard role on the nodes present
in the grafana-server group. This one is dedicated to the grafana and
prometheus stack.
The ceph-dashboard needs to be executed where the ceph-mgr is running. It
is either on the dedicated mgr nodes or if mgr and mon are collocated
implicitly on the mon nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-dashboard: Add run_once on delegate tasks
Dimitri Savineau [Fri, 2 Aug 2019 14:58:11 +0000 (10:58 -0400)]
ceph-dashboard: Add run_once on delegate tasks

Because we need to execute commands from a monitor node (the first one
in the mons list) we are using delegate_to option.
If there are multiple nodes running the ceph-dashboard role then the
delegated task will be executed multiple times.
Also remove a mgr config-key option not present for nautilus+ releases.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago only support openSUSE Leap 15.x, fail on 42.x
Johannes Kastl [Sat, 27 Jul 2019 14:09:26 +0000 (16:09 +0200)]
only support openSUSE Leap 15.x, fail on 42.x

openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to
'openSUSE Leap 15.x' to align with SLES15 development.
The previous logic did not correctly allow the current release, as 15.x matched
the 'less than 42.3' condition.

For now only support openSUSE Leap 15.x, and extend support once 16.x is
released (or whatever the exact version will be)
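
The version comparison problem can be sketched as follows. This is an illustration only, with hypothetical helper names; the actual check lives in the Ansible roles and may be expressed differently:

```python
def parse_version(text):
    """Turn a version string such as '15.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def old_check_rejects(version):
    # Sketch of the previous logic: anything below 42.3 is treated as too
    # old, which also catches the newer 15.x releases.
    return parse_version(version) < (42, 3)


def new_check_accepts(version):
    # Sketch of the new logic: only the 15.x series is supported.
    return parse_version(version)[0] == 15


for release in ("42.3", "15.0", "15.1"):
    print(release, old_check_rejects(release), new_check_accepts(release))
```

Because the release numbering went backwards from 42.x to 15.x, a plain "less than" comparison cannot tell old releases from current ones, so matching on the major version is the simpler rule.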

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years ago ceph-infra: Apply firewall rules with container
Dimitri Savineau [Wed, 31 Jul 2019 18:02:41 +0000 (14:02 -0400)]
ceph-infra: Apply firewall rules with container

We don't have a reason to not apply firewall rules on the host when
using a containerized deployment.
The TripleO environments already manage the ceph firewall rules outside
ceph-ansible and set the configure_firewall variable to false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-grafana: Set grafana uid/gid on files
Dimitri Savineau [Tue, 30 Jul 2019 20:09:47 +0000 (16:09 -0400)]
ceph-grafana: Set grafana uid/gid on files

We don't need to create a grafana system user (in fact we don't even
set the right uid for this user) because we're using a container setup.
Instead we just need to be sure to set the owner/group to 472 (grafana
user/group from the container) like we do for ceph/167.
We don't need to set the user/group recursively on /etc/grafana
directory in a dedicated task.
Also, on Ubuntu systems the ceph-grafana-dashboards package isn't
available, so on non-containerized deployments the
/etc/grafana/dashboards/ceph-dashboard directory (which comes with the
package) won't be present; we need to make sure it exists.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago dashboard: do not deploy on Debian based OS/non-containerized
Guillaume Abrioux [Wed, 31 Jul 2019 14:15:43 +0000 (16:15 +0200)]
dashboard: do not deploy on Debian based OS/non-containerized

in non-containerized deployment, we can't deploy dashboard on Debian
based distribution since the package `ceph-grafana-dashboards` isn't
available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago docs: Correct weird wording
Theo Ouzhinski [Thu, 1 Aug 2019 00:25:54 +0000 (20:25 -0400)]
docs: Correct weird wording

for the Ceph master branch.

Signed-off-by: Theo Ouzhinski touzhinski@gmail.com
6 years ago tests/shrink_rgw: Disable dashboard
Dimitri Savineau [Wed, 31 Jul 2019 18:17:48 +0000 (14:17 -0400)]
tests/shrink_rgw: Disable dashboard

The shrink_rgw scenario was merged just after the PR that enabled the
ceph dashboard by default.
So right now the shrink_rgw scenario doesn't have nodes in the grafana
group and fails.
We just need to set dashboard_enabled to false.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests: add more memory in podman job
Guillaume Abrioux [Tue, 30 Jul 2019 11:04:15 +0000 (13:04 +0200)]
tests: add more memory in podman job

Typical error :

```
fatal: [mon1 -> mon0]: FAILED! => changed=true
  cmd:
  - podman
  - exec
  - ceph-mon-mon0
  - ceph
  - config
  - set
  - mgr
  - mgr/dashboard/ssl
  - 'false'
  delta: '0:00:00.644870'
  end: '2019-07-30 10:17:32.715639'
  msg: non-zero return code
  rc: 1
  start: '2019-07-30 10:17:32.070769'
  stderr: |-
    Traceback (most recent call last):
      File "/usr/bin/ceph", line 140, in <module>
        import rados
    ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
    Error: exit status 1
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Let's add more memory to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: deploy dashboard on mons
Guillaume Abrioux [Tue, 30 Jul 2019 09:25:59 +0000 (11:25 +0200)]
tests: deploy dashboard on mons

there's no dedicated nodes for mgr, let's use monitor nodes.
The mgr0 instance spawned isn't used, so if this node is part of the
inventory for this scenario, testinfra will complain because there's no
ceph.conf on this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago dashboard: fix timeout usage on rgw user creation command
Guillaume Abrioux [Tue, 30 Jul 2019 08:16:23 +0000 (10:16 +0200)]
dashboard: fix timeout usage on rgw user creation command

For some reason, this makes the playbook fail as follows:

```
TASK [ceph-dashboard : create radosgw system user] ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106
Tuesday 30 July 2019  10:04:54 +0200 (0:00:01.910)       0:11:22.319 **********
FAILED - RETRYING: create radosgw system user (3 retries left).
FAILED - RETRYING: create radosgw system user (2 retries left).
FAILED - RETRYING: create radosgw system user (1 retries left).
fatal: [mgr0 -> mon0]: FAILED! => changed=true
  attempts: 3
  cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system
  delta: '0:00:20.021973'
  end: '2019-07-30 08:06:32.656066'
  msg: non-zero return code
  rc: 124
  start: '2019-07-30 08:06:12.634093'
  stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

using `timeout -f -s KILL` fixes this issue.

Also, there is no need to use `shell` module here, let's switch to
`command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests/functional: add a test for shrink-rgw.yml
Rishabh Dave [Wed, 26 Jun 2019 06:09:45 +0000 (11:39 +0530)]
tests/functional: add a test for shrink-rgw.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and RGW and then runs shrink-rgw.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago add a playbook to remove rgw from a given node
Rishabh Dave [Wed, 26 Jun 2019 05:59:50 +0000 (11:29 +0530)]
add a playbook to remove rgw from a given node

Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that
can remove a RGW from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago osd: add 'osd blacklist' cap for osp keyrings
Guillaume Abrioux [Mon, 15 Jul 2019 07:57:06 +0000 (09:57 +0200)]
osd: add 'osd blacklist' cap for osp keyrings

This commit adds the `osd blacklist` cap on all OSP clients keyrings.

Fixes: #2296
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago ceph-osd: check container engine rc for pools
Dimitri Savineau [Mon, 22 Jul 2019 20:58:40 +0000 (16:58 -0400)]
ceph-osd: check container engine rc for pools

When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could be either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.
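
The intended handling of the return codes can be sketched like this. The function name and the returned labels are hypothetical; only the rc values (0, 2, 1, 125) come from the description above:

```python
def classify_pool_check(rc):
    """Map the return code of the pool lookup to the next action."""
    if rc == 0:
        # The pool already exists, nothing to create.
        return "exists"
    if rc == 2:
        # The lookup ran fine but the pool was not found: create it.
        return "create"
    # Anything else (e.g. 1 with docker, 125 with podman) means the
    # container engine command itself failed, so stop right here.
    return "fail"


for code in (0, 2, 1, 125):
    print(code, classify_pool_check(code))
```

Treating "not 0" and "exactly 2" as different cases is what lets the failure surface at the lookup step instead of in the later pool creation tasks.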

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests: test dashboard deployment with podman scenario
Guillaume Abrioux [Mon, 29 Jul 2019 12:08:40 +0000 (14:08 +0200)]
tests: test dashboard deployment with podman scenario

This commit adds a grafana-server section in order to test dashboard
deployment with podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago validate: add checks for grafana-server group definition
Guillaume Abrioux [Mon, 29 Jul 2019 08:03:48 +0000 (10:03 +0200)]
validate: add checks for grafana-server group definition

this commit adds two checks:
- check that the `[grafana-server]` group is defined
- check that the `[grafana-server]` contains at least one node.
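
A minimal sketch of the two checks, assuming the inventory is available as a mapping of group names to host lists (the function name and error strings are hypothetical):

```python
def check_grafana_group(groups):
    """Return a list of validation errors for the grafana-server group."""
    errors = []
    if "grafana-server" not in groups:
        # First check: the group must be defined in the inventory.
        errors.append("the [grafana-server] group is not defined")
    elif len(groups["grafana-server"]) < 1:
        # Second check: the group must contain at least one node.
        errors.append("the [grafana-server] group is empty")
    return errors


print(check_grafana_group({"mons": ["mon0"]}))
print(check_grafana_group({"grafana-server": []}))
print(check_grafana_group({"grafana-server": ["mon0"]}))
```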

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago mgr: fix a typo
Guillaume Abrioux [Fri, 26 Jul 2019 15:33:07 +0000 (17:33 +0200)]
mgr: fix a typo

this task isn't using the right container_exec_cmd, so it is delegating
to the wrong node.
Let's use the right fact to fix this command.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago dashboard: remove cfg80211 module installation
Guillaume Abrioux [Fri, 26 Jul 2019 15:23:19 +0000 (17:23 +0200)]
dashboard: remove cfg80211 module installation

According to this comment [1], this seems to be needed to detect wifi
devices.

In node exporter we can see this:

```
--collector.wifi          Enable the wifi collector (default: disabled).
```

since it's disabled by default and we don't even change this in our
systemd templates for node-exporter, we can easily assume in the end
it's not needed. Therefore, let's remove this.

[1] https://github.com/ceph/ceph-ansible/commit/dbf81b6b5bd6d7e977706a93f9e75b38efe32305#diff-961545214e21efed3b84a9e178927a08L21-L23

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago dashboard: use dedicated group only
Guillaume Abrioux [Thu, 25 Jul 2019 17:08:22 +0000 (19:08 +0200)]
dashboard: use dedicated group only

There's no need to add complexity by trying to fall back on another group.
Let's deploy dashboard on all nodes present in grafana-server group.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago dashboard: move code into a dedicated playbook
Dimitri Savineau [Fri, 28 Jun 2019 14:39:38 +0000 (10:39 -0400)]
dashboard: move code into a dedicated playbook

Move dashboard, grafana/prometheus and node-exporter plays into a
dedicated playbook in the infrastructure-playbooks directory.
To avoid using the 'dashboard_enabled | bool' condition multiple times
in the main playbook we can just import the dashboard playbook or
not.
This patch also allows using a single dashboard playbook for
both baremetal and container playbooks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago dashboard: enable dashboard by default
Guillaume Abrioux [Thu, 25 Jul 2019 12:29:11 +0000 (14:29 +0200)]
dashboard: enable dashboard by default

This commit enables dashboard deployment by default.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago Remove NBSP characters
Dimitri Savineau [Thu, 18 Jul 2019 18:57:46 +0000 (14:57 -0400)]
Remove NBSP characters

Some NBSP are still present in the yaml files.
Adding a test in travis CI.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago infra-playbooks: rewrite a condition for better readability
Rishabh Dave [Thu, 25 Jul 2019 11:32:32 +0000 (17:02 +0530)]
infra-playbooks: rewrite a condition for better readability

Use Ansible's built-in facility to check whether a command was executed
successfully rather than looking at its return value.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago container: rename docker directories
Guillaume Abrioux [Wed, 24 Jul 2019 08:44:34 +0000 (10:44 +0200)]
container: rename docker directories

Those 2 directories should be renamed to be more generic (docker vs.
podman).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: disable nfs-ganesha deployment
Guillaume Abrioux [Tue, 23 Jul 2019 07:50:51 +0000 (09:50 +0200)]
tests: disable nfs-ganesha deployment

nfs-ganesha repositories @ dev are broken, this commit disables the
nfs-ganesha deployment so the CI isn't stuck.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago Avoid to setup provisioners in a fully containerized environment
fmount [Fri, 12 Jul 2019 09:03:57 +0000 (11:03 +0200)]
Avoid to setup provisioners in a fully containerized environment

This commit adds a when clause to avoid the setup of grafana
provisioners in a fully containerized scenario.
This is needed when the ceph-grafana-dashboards package is not
installed, since this task could then result in a wrong grafana
configuration that makes the container crash.

Signed-off-by: fmount <fpantano@redhat.com>
6 years ago Fix backward compat with old cephfs_pools format
Giulio Fidente [Fri, 19 Jul 2019 08:58:49 +0000 (10:58 +0200)]
Fix backward compat with old cephfs_pools format

Previously cephfs_pools items used to have a pgs: key but not
pgp_num: nor pg_num:
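
One way to picture the backward-compatible handling, as a sketch only: the helper name is hypothetical, and the assumption that a legacy `pgs` value feeds both `pg_num` and `pgp_num` is an inference from the description, not the actual change:

```python
def normalize_pool(pool):
    """Return a copy of a cephfs_pools item using pg_num/pgp_num keys."""
    result = dict(pool)
    # Old format: a single `pgs` key. Assumption: it maps to both values.
    legacy = result.pop("pgs", None)
    if legacy is not None:
        # setdefault keeps explicit pg_num/pgp_num values if both formats
        # happen to be present on the same item.
        result.setdefault("pg_num", legacy)
        result.setdefault("pgp_num", legacy)
    return result


print(normalize_pool({"name": "cephfs_data", "pgs": 8}))
print(normalize_pool({"name": "cephfs_data", "pg_num": 16, "pgp_num": 16}))
```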

Signed-off-by: Giulio Fidente <gfidente@redhat.com>
6 years ago handler: fix bug in osd handlers
Guillaume Abrioux [Thu, 18 Jul 2019 12:06:23 +0000 (14:06 +0200)]
handler: fix bug in osd handlers

fbf4ed42aee8fa5fd18c4c289cbb80ffeda8f72e introduced a bug when
container binary is podman.
podman doesn't support ps -f using regular expressions, so the container id
is never set in the restart script, causing the handler to fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago library/ceph_volume.py: remove six dependency
Dimitri Savineau [Fri, 12 Jul 2019 14:19:48 +0000 (10:19 -0400)]
library/ceph_volume.py: remove six dependency

The ceph nodes might not have the python six library installed, which
could lead to an error during the ceph_volume custom module execution.

  ImportError: No module named six

The six library isn't useful in this module if we're sure that all
action variables passed to the build_ceph_volume_cmd function are a
list and not a string.
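
A rough sketch of why list-only input makes six unnecessary. `build_cmd_sketch` is a hypothetical stand-in, not the real `build_ceph_volume_cmd`, and the `--cluster` handling and the explicit type check are assumptions:

```python
def build_cmd_sketch(action, cluster="ceph"):
    """Assemble a ceph-volume command line from a list of action words."""
    # With a guaranteed list there is no need for a six.string_types test.
    # A bare string would be split into characters by extend(), so it is
    # rejected explicitly in this sketch.
    if not isinstance(action, list):
        raise TypeError("action must be a list, e.g. ['lvm', 'create']")
    cmd = ["ceph-volume", "--cluster", cluster]
    cmd.extend(action)
    return cmd


print(build_cmd_sketch(["lvm", "create"]))
```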

Resolves: #4071

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago validate: fail if gpt header found on unprepared devices
Guillaume Abrioux [Wed, 17 Jul 2019 13:55:12 +0000 (15:55 +0200)]
validate: fail if gpt header found on unprepared devices

ceph-volume will complain if gpt headers are found on devices.
This commit checks whether a gpt header is present on devices passed in
`devices` variable and fails early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago ceph-dashboard: enable rgw options conditionally
Dimitri Savineau [Thu, 11 Jul 2019 14:38:44 +0000 (10:38 -0400)]
ceph-dashboard: enable rgw options conditionally

The dashboard rgw frontend options only need to be applied when there
are nodes present in the rgw ansible group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests/dashboard: use the dedicated grafana node
Dimitri Savineau [Thu, 11 Jul 2019 14:06:09 +0000 (10:06 -0400)]
tests/dashboard: use the dedicated grafana node

The Vagrant dashboard scenario creates a dedicated grafana node but
it was not used in the ansible inventory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago dashboard: use variables for port value
Dimitri Savineau [Wed, 10 Jul 2019 21:15:45 +0000 (17:15 -0400)]
dashboard: use variables for port value

The current port value for alertmanager, grafana, node-exporter and
prometheus is hardcoded in the roles so it's not possible to change the
port binding of those services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests: remove useless setting
Guillaume Abrioux [Mon, 15 Jul 2019 08:57:46 +0000 (10:57 +0200)]
tests: remove useless setting

this setting is not needed here since we explicitly set it for
container and non container context.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago shrink-rbdmirror: check if rbdmirror is well removed from cluster
Guillaume Abrioux [Thu, 11 Jul 2019 09:26:50 +0000 (11:26 +0200)]
shrink-rbdmirror: check if rbdmirror is well removed from cluster

This commit adds a check to ensure the daemon has been removed from the
cluster.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests/functional: add a test for shrink-rbdmirror.yml
Rishabh Dave [Tue, 25 Jun 2019 13:30:53 +0000 (19:00 +0530)]
tests/functional: add a test for shrink-rbdmirror.yml

Add a new functional test that deploys Ceph cluster with three nodes for
MON, OSD and RBD Mirror and, then, runs shrink-rbdmirror.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago add a playbook that removes rbd-mirror from a node
Rishabh Dave [Tue, 25 Jun 2019 13:08:17 +0000 (18:38 +0530)]
add a playbook that removes rbd-mirror from a node

Add a playbook named "shrink-rbdmirror.yml" in infrastructure-playbooks/
that removes a RBD Mirror from a node in an already deployed Ceph
cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago ceph-infra: update handler with daemon variable
Dimitri Savineau [Wed, 10 Jul 2019 18:58:58 +0000 (14:58 -0400)]
ceph-infra: update handler with daemon variable

Both ntp and chrony daemon use variable for the service name because it
could be different depending on the GNU/Linux distribution.
This was updated in 9d88d3199 for chrony, but only for the start part,
not for the handler.
This commit fixes this for both ntp and chrony.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-infra: Open prometheus port
Dimitri Savineau [Wed, 10 Jul 2019 21:19:29 +0000 (17:19 -0400)]
ceph-infra: Open prometheus port

The Prometheus port 9090 isn't open in the firewall configuration.
Also the dashboard task on the grafana node was not required because
it's already present on the mgr node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago handler: remove legacy condition
Guillaume Abrioux [Tue, 9 Jul 2019 14:03:26 +0000 (16:03 +0200)]
handler: remove legacy condition

since everything is already in a block with the same condition, it's not
needed to leave all of them on these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago validate: improve message printed in check_devices.yml
Guillaume Abrioux [Wed, 10 Jul 2019 13:08:39 +0000 (15:08 +0200)]
validate: improve message printed in check_devices.yml

The message prints the whole content of the registered variable in the
playbook, this is not needed and makes the message pretty unclear and
unreadable.

```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago ceph-iscsi: Update gateway config/template
Dimitri Savineau [Mon, 8 Jul 2019 18:36:07 +0000 (14:36 -0400)]
ceph-iscsi: Update gateway config/template

- Remove gateway_keyring from the configuration file because it's
not used in ceph-iscsi 3.x release.
- Use config_template instead of template module for iscsi-gateway
configuration file. Because the file is an ini file and we might want
to override more parameters than those present in ceph-ansible.
- Because we can now set the pool name in the configuration, we should
use a variable for that. This is refactored with the iscsi_pool_* variables
also used to configure the pool size.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-dashboard: remove bool filter for rgw vars
Dimitri Savineau [Mon, 8 Jul 2019 13:43:13 +0000 (09:43 -0400)]
ceph-dashboard: remove bool filter for rgw vars

Some dashboard_rgw_api_* variables are using the bool filter but those
variables are strings with an empty string as default value.
So we should test the variable against an empty string instead of a
bool.

dashboard_rgw_api_host: ''
dashboard_rgw_api_port: ''
dashboard_rgw_api_scheme: ''
dashboard_rgw_api_admin_resource: ''
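
To illustrate why the bool filter misbehaves on these strings, here is a rough Python approximation of a `bool`-style filter. The accepted truthy values are an assumption about Ansible's behaviour at the time, and both helper names are hypothetical:

```python
def approx_bool_filter(value):
    """Approximate a `| bool` filter: only a few literal values are true."""
    if value is None or isinstance(value, bool):
        return value
    if isinstance(value, str):
        value = value.lower()
    # Assumption: these are the only values treated as true.
    return value in ("yes", "on", "1", "true", 1)


def is_set(value):
    """The replacement test: compare against the empty-string default."""
    return value != ""


for candidate in ("", "rgw0.example.com", "8080", "https"):
    print(repr(candidate), approx_bool_filter(candidate), is_set(candidate))
```

A host name or port number is a non-empty string but not one of the truthy literals, so a bool-based condition would treat a configured value as unset, while the empty-string comparison does not.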

Resolves: #4179

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago dashboard: Use upstream default port
Boris Ranto [Tue, 9 Jul 2019 20:32:38 +0000 (22:32 +0200)]
dashboard: Use upstream default port

We are currently using an incorrect dashboard default port. The upstream
uses 8443 instead of 8234 by default. This should get us closer to the
upstream project.

Signed-off-by: Boris Ranto <branto@redhat.com>
6 years ago tests/functional: add a test for shrink-mgr.yml
Rishabh Dave [Thu, 20 Jun 2019 10:23:22 +0000 (15:53 +0530)]
tests/functional: add a test for shrink-mgr.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MGR and then runs shrink-mgr.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago add a playbook that removes manager from a node
Rishabh Dave [Wed, 12 Jun 2019 04:46:00 +0000 (10:16 +0530)]
add a playbook that removes manager from a node

Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/
that removes a MGR from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago ceph-handler: fix cluster name in socket path
Dimitri Savineau [Wed, 3 Jul 2019 15:26:42 +0000 (11:26 -0400)]
ceph-handler: fix cluster name in socket path

c90f605b5 introduces the default ceph cluster name value in the rgw
socket path for the rgw restart script. But this should use the
`cluster` variable instead.
This commit also fixes this in the osd restart script.
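
As a minimal sketch of the idea (not the actual restart script, which is
a template), the socket path can be built from a `cluster` variable
rather than a hardcoded "ceph". The helper name `rgw_socket_path` and the
"prod" cluster name are assumptions made for illustration; the path
layout follows the short socket name form quoted elsewhere in this log.

```shell
# Hypothetical helper: build the rgw admin socket path from the cluster
# name instead of hardcoding "ceph". Falls back to "ceph" when $cluster
# is unset or empty.
rgw_socket_path() {
  host="$1"
  echo "/var/run/ceph/${cluster:-ceph}-client.rgw.${host}.asok"
}

cluster="prod"               # illustrative non-default cluster name
rgw_socket_path cephaio-1
```

With a hardcoded name, a cluster called "prod" would be checked against
a socket path that never exists.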

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoshrink-mds: refact post tasks
Guillaume Abrioux [Wed, 3 Jul 2019 08:45:46 +0000 (10:45 +0200)]
shrink-mds: refact post tasks

This commit refactors how we check that the "mds_to_kill" node is
properly stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agotests/functional: add a test for shrink-mds.yml
Rishabh Dave [Fri, 10 May 2019 16:10:07 +0000 (21:40 +0530)]
tests/functional: add a test for shrink-mds.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MDS and then runs shrink-mds.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoadd a playbook that removes mds from a node
Rishabh Dave [Thu, 20 Jun 2019 09:59:25 +0000 (15:29 +0530)]
add a playbook that removes mds from a node

Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/
that removes a MDS from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoAdd package-install tag on ceph-grafana-dashboard pkg install.
fmount [Thu, 4 Jul 2019 09:43:39 +0000 (11:43 +0200)]
Add package-install tag on ceph-grafana-dashboard pkg install.

According to the OSP pattern, we need the package-install tag
to control what is installed on the host. This commit just adds
the missing tag to meet the TripleO requirements.

See: /issues/4197 for details

Fixes: #4197
Signed-off-by: fmount <fpantano@redhat.com>
6 years agoceph-iscsi-gw: Update log directories bind mount
Dimitri Savineau [Thu, 4 Jul 2019 14:10:00 +0000 (10:10 -0400)]
ceph-iscsi-gw: Update log directories bind mount

On containerized deployment we need to bind mount the ceph-iscsi
directory to avoid writing the logs in the container.
The /var/log/ceph directory isn't used by the rbd-target-api/gw services
because they have their own log directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-mon: Fix cluster name parameter
ilyashestopalov [Wed, 3 Jul 2019 16:58:32 +0000 (19:58 +0300)]
ceph-mon: Fix cluster name parameter

The ability to add nodes with the monitor role to an existing cluster
whose name differs from the default name is fixed.

Signed-off-by: ilyashestopalov <usr.tester@yandex.ru>
6 years agoiscsi: refact deprecated variables
Guillaume Abrioux [Tue, 2 Jul 2019 13:30:12 +0000 (15:30 +0200)]
iscsi: refact deprecated variables

This commit moves some old variables into ceph-defaults so we can move
the `use_new_ceph_iscsi` fact into the ceph-facts role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoigw: Add check for missing iqn
Mike Christie [Thu, 30 May 2019 16:17:09 +0000 (11:17 -0500)]
igw: Add check for missing iqn

If the user is still using the older packages and does not set up
the target iqn, they will just get a vague error message later on.
This adds a check during the validate task, so the problem is clear to
the user.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: Add check for mismatch ceph-iscsi and iscsigws.yml settings
Mike Christie [Mon, 13 May 2019 17:59:27 +0000 (12:59 -0500)]
igw: Add check for mismatch ceph-iscsi and iscsigws.yml settings

If the user has manually installed ceph-iscsi but is trying to set up an
iSCSI object in iscsigws.yml, they will just get a Python crash. This
patch adds a check and a more user-friendly error message for that case.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: Update tests to use ceph-iscsi package
Mike Christie [Thu, 6 Jun 2019 14:47:56 +0000 (09:47 -0500)]
igw: Update tests to use ceph-iscsi package

gateway_ip_list is deprecated and is only used when using the old
ceph-iscsi-config/cli packages that are no longer being developed
(GH repos are archived). Because ceph-iscsi-config/cli is no longer
being worked on, this modifies the tests to stress the ceph-iscsi
based installs.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: Update iscsigws.yml.sample for ceph-iscsi support
Mike Christie [Sun, 12 May 2019 07:38:46 +0000 (02:38 -0500)]
igw: Update iscsigws.yml.sample for ceph-iscsi support

Update iscsigws.yml.sample to document that we cannot use ansible to
set up iSCSI objects when using the new ceph-iscsi package.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: Support ceph-iscsi package for install
Mike Christie [Thu, 30 May 2019 16:28:34 +0000 (11:28 -0500)]
igw: Support ceph-iscsi package for install

This adds support for the ceph-iscsi package during install. ceph-iscsi
does not support setting up targets/gws, luns and clients with the
current library/igw_* code. Going forward those tasks should be done with
gwcli or dashboard. ceph-iscsi will only be used if the user has no iscsi
objects set up, so we do not break existing setups.

The next patch will update the iscsigws.yml.sample to document that
users must not set up any iscsi object if they want to use the new
package and tools.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: drop gateway_ip_list for container setups
Mike Christie [Thu, 30 May 2019 15:55:45 +0000 (10:55 -0500)]
igw: drop gateway_ip_list for container setups

The gateway_ip_list is not used in container setups, so drop it
for that case.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: move gateway_ip_list check to validate role
Mike Christie [Thu, 30 May 2019 15:54:04 +0000 (10:54 -0500)]
igw: move gateway_ip_list check to validate role

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: Support new ceph-iscsi package during purge
Mike Christie [Sat, 11 May 2019 22:23:37 +0000 (17:23 -0500)]
igw: Support new ceph-iscsi package during purge

The ceph-iscsi-config and ceph-iscsi-cli packages were combined into
ceph-iscsi and its APIs changed. This fixes up the iscsi purge task to
support the new API and old one.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agotests: wait 30sec before running testinfra
Guillaume Abrioux [Wed, 3 Jul 2019 13:27:18 +0000 (15:27 +0200)]
tests: wait 30sec before running testinfra

Add back the 30s sleep after the nodes have rebooted, before running
testinfra.
It was accidentally removed by d5be83e.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-handler: Fix rgw socket in restart script
Dimitri Savineau [Tue, 7 May 2019 20:33:21 +0000 (16:33 -0400)]
ceph-handler: Fix rgw socket in restart script

Since Mimic the radosgw socket has two extra fields in the socket
name (before the .asok suffix): <pid>.<ctid>

Before:
  /var/run/ceph/ceph-client.rgw.cephaio-1.asok
After:
  /var/run/ceph/ceph-client.rgw.cephaio-1.16913.23928832.asok

The radosgw restart script doesn't handle this and could fail during
an upgrade.
If the SOCKETS variable isn't defined in the script then the test
command won't fail because the return code is 0

$ test -S
$ echo $?
0
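
The pitfall can be reproduced in any POSIX shell. The sketch below is
illustrative only and is not taken from the restart script: `SOCKETS` is
deliberately left empty, and the `unquoted`/`quoted` result variables are
names chosen for this example.

```shell
SOCKETS=""

# Unquoted: the empty variable vanishes and this becomes `test -S`,
# a one-argument test that only checks that the string "-S" is non-empty.
if test -S $SOCKETS; then unquoted="passes"; else unquoted="fails"; fi

# Quoted: the empty argument is kept, so test looks for a socket at "".
if test -S "$SOCKETS"; then quoted="passes"; else quoted="fails"; fi

echo "unquoted check $unquoted, quoted check $quoted"
```

Quoting the variable (or checking that it is non-empty first) is what
makes the socket test meaningful.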

There are multiple issues in that script:
  - The default SOCKETS value isn't defined due to a typo
    (SOCKET vs SOCKETS).
  - Because the socket name includes the pid, we need to check the
    socket name after the service restart.
  - After restarting the radosgw service we need to wait a few seconds,
    otherwise the socket won't have been created yet.
  - Update the wget parameters because the command is doing a loop.
    We now use the same option as curl.
  - The check_rest function doesn't test the radosgw at all due to
    a wrong test command (test against a string) and always returns 0.
    This needs to use the DOCKER_EXECS variable in order to execute the
    command.

$ test 'wget http://192.168.100.11:8080'
$ echo $?
0
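
The difference between testing a string and running a command can be
sketched as follows. This is an illustration under stated assumptions,
not the script's code: the builtin `false` stands in for a failing
wget call so that the example needs no network access, and the result
variable names are made up for the example.

```shell
# `test` with a single non-empty argument is always true; the string is
# never executed as a command.
if test 'false'; then string_check="passes"; else string_check="fails"; fi

# Running the command itself makes the exit status meaningful.
if false; then command_check="passes"; else command_check="fails"; fi

echo "string check $string_check, command check $command_check"
```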

Also remove the test based on the ansible_fqdn because we only use
the ansible_hostname + rgw instance name.

Finally, group all the for loops into a single one.

Resolves: #3926

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoAdd radosgw_frontend_ssl_certificate parameter
Giulio Fidente [Wed, 19 Jun 2019 12:59:15 +0000 (14:59 +0200)]
Add radosgw_frontend_ssl_certificate parameter

This is necessary when configuring RGW with SSL because
in addition to passing specific frontend options, civetweb
appends the 's' character to the binding port and beast uses
ssl_endpoint instead of endpoint.
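
A rough sketch of how the two frontends differ once SSL is enabled is
shown below. The helper name `rgw_ssl_frontend`, the address, the port
and the certificate path are all assumptions chosen for illustration;
only the trailing 's' for civetweb and the `ssl_endpoint` keyword for
beast come from the description above.

```shell
# Hypothetical helper: print the frontend options for an SSL-enabled RGW.
# Arguments: frontend type, bind address, port, certificate path.
rgw_ssl_frontend() {
  frontend="$1"; addr="$2"; port="$3"; cert="$4"
  case "$frontend" in
    # civetweb marks an SSL port by appending 's' to the port number.
    civetweb) echo "civetweb port=${addr}:${port}s ssl_certificate=${cert}" ;;
    # beast switches from endpoint= to ssl_endpoint=.
    beast)    echo "beast ssl_endpoint=${addr}:${port} ssl_certificate=${cert}" ;;
    *)        echo "unsupported frontend: $frontend" >&2; return 1 ;;
  esac
}

rgw_ssl_frontend civetweb 192.168.100.11 8080 /etc/ceph/rgw.pem
rgw_ssl_frontend beast 192.168.100.11 8080 /etc/ceph/rgw.pem
```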

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1722071
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
6 years agotox-podman.ini: add missing reruns option
Dimitri Savineau [Fri, 28 Jun 2019 20:07:20 +0000 (16:07 -0400)]
tox-podman.ini: add missing reruns option

The first py.test call didn't have the reruns option. The commit
fixes it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoAdd condition on dashboard installer phase
Dimitri Savineau [Fri, 28 Jun 2019 18:14:53 +0000 (14:14 -0400)]
Add condition on dashboard installer phase

Even if the dashboard feature is disabled, the installer status still
reports the dashboard, grafana and node-exporter role timings.

INSTALLER STATUS **********************************
Install Ceph Monitor           : Complete (0:01:21)
Install Ceph Manager           : Complete (0:00:49)
Install Ceph Dashboard         : Complete (0:00:00)
Install Ceph Grafana           : Complete (0:00:02)

We need to set the dashboard_enabled condition on the pre/post tasks of
those installer phases.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agonfs: clean template
Guillaume Abrioux [Fri, 21 Jun 2019 16:04:47 +0000 (18:04 +0200)]
nfs: clean template

remove legacy options

```
ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:13): Unknown parameter (Dir_Max)
ganesha.nfsd-115[main] config_errs_to_log :CONFIG :WARN :Config File (/etc/ganesha/ganesha.conf:14): Unknown parameter (Cache_FDs)

```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agocontainers: improve logging
Guillaume Abrioux [Wed, 26 Jun 2019 13:38:58 +0000 (15:38 +0200)]
containers: improve logging

bindmount /var/log/ceph on all containers so it's possible to retrieve
logs from the host.

related ceph-container PR: ceph/ceph-container#1408

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1710548
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-osd: Add CONTAINER_IMAGE env variable
Dimitri Savineau [Thu, 27 Jun 2019 14:26:40 +0000 (10:26 -0400)]
ceph-osd: Add CONTAINER_IMAGE env variable

This environment variable was added in cb381b4 but was removed in
4d35e9e.
This commit reintroduces the change.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoSet grafana_server_addr fact for ipv6 scenarios.
fmount [Wed, 19 Jun 2019 11:52:31 +0000 (13:52 +0200)]
Set grafana_server_addr fact for ipv6 scenarios.

As bz1721914 describes, the grafana_server_addr
fact is not defined if the ip_version used is ipv6.
This commit adds the ip_version condition so that
this fact is set correctly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721914
Signed-off-by: fmount <fpantano@redhat.com>
6 years agofacts: fix bug in grafana_server_addr fact setting
Guillaume Abrioux [Mon, 24 Jun 2019 20:30:23 +0000 (22:30 +0200)]
facts: fix bug in grafana_server_addr fact setting

If no grafana-server group is defined while an mgr group is, that task
will fail because `hostvars[groups[grafana_server_group_name][0]]` can't
return anything, since `groups['grafana-server']` will be a nonexistent
key.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: clean nfs_ganesha variables
Guillaume Abrioux [Tue, 25 Jun 2019 16:03:27 +0000 (18:03 +0200)]
tests: clean nfs_ganesha variables

- clean some leftovers.
- move nfs_ganesha_[stable|dev] in group_vars so dev_setup.yml can modify them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agonfs: add missing | bool filters
Guillaume Abrioux [Tue, 25 Jun 2019 15:11:28 +0000 (17:11 +0200)]
nfs: add missing | bool filters

To address this warning:
```
[DEPRECATION WARNING]: evaluating nfs_ganesha_dev as a bare variable, this
behaviour will go away and you might need to add |bool to the expression in the
 future
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agonfs: remove duplicate task
Guillaume Abrioux [Tue, 25 Jun 2019 08:23:57 +0000 (10:23 +0200)]
nfs: remove duplicate task

This task is already present in pre_requisite_non_container.yml

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: test nfs-ganesha deployment
Guillaume Abrioux [Tue, 25 Jun 2019 05:44:43 +0000 (07:44 +0200)]
tests: test nfs-ganesha deployment

Add back the nfs-ganesha deployment testing which was removed because of
broken dependencies.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agovalidate.py: Fix alphabetical order on uca
Gabriel Ramirez [Tue, 25 Jun 2019 04:52:11 +0000 (21:52 -0700)]
validate.py: Fix alphabetical order on uca

Alphabetize the ceph_repository_uca keys to fix validation errors when
using the UCA/queens repository on Ubuntu 16.04

An exception occurred during task execution. To see the full
traceback, use -vvv. The error was:
SchemaError: -> ceph_stable_repo_uca  schema item is not
alphabetically ordered
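
The ordering requirement can be illustrated with a small shell check.
This is only a sketch and not how validate.py works: `keys_sorted` is a
hypothetical helper, the key names are used purely as sample input, and
plain byte-wise ordering is assumed.

```shell
# Hypothetical pre-check: succeed only if the given keys are already in
# alphabetical (byte-wise) order. `sort -c` checks the order without
# printing the sorted data.
keys_sorted() {
  printf '%s\n' "$@" | LC_ALL=C sort -c 2>/dev/null
}

if keys_sorted ceph_stable_release_uca ceph_stable_repo_uca; then
  echo "keys are alphabetically ordered"
fi

if ! keys_sorted ceph_stable_repo_uca ceph_stable_release_uca; then
  echo "keys are not alphabetically ordered"
fi
```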

Closes: #4154
Signed-off-by: Gabriel Ramirez <gabrielramirez1109@gmail.com>