git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
6 years agolint: fix error [201,206]
Dimitri Savineau [Thu, 29 Aug 2019 18:11:46 +0000 (14:11 -0400)]
lint: fix error [201,206]

 [201] Trailing whitespace
 [206] Variables should have spaces before and after: {{ var_name }}

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-common: remove ceph_stable repo on dev
Dimitri Savineau [Tue, 9 Apr 2019 13:44:27 +0000 (09:44 -0400)]
ceph-common: remove ceph_stable repo on dev

When upgrading from stable to devel release with redhat community
packages, the rpm packages are not updated due to priority introduced
via a7b1e35 (starting nautilus).
We need to remove the ceph stable repositories when configuring the
dev repositories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoAdd octopus release
Dimitri Savineau [Thu, 4 Apr 2019 16:52:49 +0000 (12:52 -0400)]
Add octopus release

Add the 15th ceph release: octopus.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoAdd http_addr option to grafana config
fmount [Fri, 23 Aug 2019 08:00:30 +0000 (10:00 +0200)]
Add http_addr option to grafana config

We have no reason to make grafana container
listen on *:<port>, so this change adds the
http_addr option to the grafana config file
and adds the related option on the wait_for
tasks.
Since grafana_server_addr should exist, we
shouldn't rely on the _current_monitor_addr
default in the prometheus/grafana templates.
This change also removes this default value,
which is no longer necessary.

Signed-off-by: fmount <fpantano@redhat.com>
6 years agoceph_custom_repo: define apt and rpm key for custom repo
Anthony Rusdi [Sun, 25 Aug 2019 18:47:32 +0000 (01:47 +0700)]
ceph_custom_repo: define apt and rpm key for custom repo

This commit also removes the notify on the newly added Debian repo,
forces update_cache to yes and defines sample ceph_custom_key vars.

Signed-off-by: Anthony Rusdi <33247310+antrusd@users.noreply.github.com>
6 years agoopenSUSE OBS repo using ceph_stable_release
Johannes Kastl [Wed, 21 Aug 2019 19:45:57 +0000 (21:45 +0200)]
openSUSE OBS repo using ceph_stable_release

Instead of hardcoding `luminous`, use the `ceph_stable_release` variable
to point to the correct repository.

This is now uncommented in roles/ceph-defaults/defaults/main.yml to be
available, as it is only used if ceph_repository is set to 'obs'.

group_vars/*.sample files have been regenerated using the
./generate_group_vars_sample.sh script.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agofix openSUSE OBS repo creation
Johannes Kastl [Thu, 22 Aug 2019 20:12:51 +0000 (22:12 +0200)]
fix openSUSE OBS repo creation

roles/ceph-common/tasks/installs/suse_obs_repository.yml:
Ansible's zypper_repository module has no 'uri' parameter; the parameter is
called 'repo' instead.

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-infra: open ceph iscsi/prometheus port
Nick Erdmann [Tue, 27 Aug 2019 09:25:02 +0000 (11:25 +0200)]
ceph-infra: open ceph iscsi/prometheus port

Signed-off-by: Nick Erdmann <n@nirf.de>
6 years agotests: use a single grafana node on podman
Dimitri Savineau [Wed, 28 Aug 2019 14:59:30 +0000 (10:59 -0400)]
tests: use a single grafana node on podman

We don't use multiple grafana nodes in the other scenarios for the moment,
and I don't think this is supposed to work.
We often see grafana failures in that scenario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: change container image tag for switch_to_containers
Guillaume Abrioux [Fri, 23 Aug 2019 07:29:19 +0000 (09:29 +0200)]
tests: change container image tag for switch_to_containers

Test the switch_to_containers job against the latest ceph@master
ceph-container image tag available.
This ensures that the ceph release deployed in the first step
(non-containerized deployment) isn't newer than the tag used for the
containerized migration (which would mean we try to downgrade the
version).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoset discovered_python_interpreter if ansible_python_interpreter is defined
Johannes Kastl [Thu, 22 Aug 2019 15:46:05 +0000 (17:46 +0200)]
set discovered_python_interpreter if ansible_python_interpreter is defined

If the user has set the `ansible_python_interpreter`, ansible will not try to
discover python, so `discovered_python_interpreter` will not be set.

Solution: Set `discovered_python_interpreter` to `ansible_python_interpreter`
if `ansible_python_interpreter` is defined

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-mon: Bind mount the ca-trust directory
Dimitri Savineau [Mon, 26 Aug 2019 14:47:05 +0000 (10:47 -0400)]
ceph-mon: Bind mount the ca-trust directory

On containerized deployments, the mon container sometimes needs to
access the radosgw endpoint (via the radosgw-admin command). When
using TLS on the radosgw with self-signed certificates, we need to
access the CA certificate from the mon container.
The CA certificate needs to be added on the host and then the directory
will be bind mounted into the container.

Resolves: #4358

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-client: Use profile rbd in keyring caps
Dimitri Savineau [Mon, 26 Aug 2019 19:35:19 +0000 (15:35 -0400)]
ceph-client: Use profile rbd in keyring caps

Like the OpenStack keyrings, we can use the profile rbd for the clients
keyring (both mon and osd).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoRevert "osd: add 'osd blacklist' cap for osp keyrings"
Dimitri Savineau [Mon, 26 Aug 2019 19:04:41 +0000 (15:04 -0400)]
Revert "osd: add 'osd blacklist' cap for osp keyrings"

This reverts commit 2d955757ee9324a018374f628664e2e15dcb7903.

The "osd blacklist" cap isn't an osd cap; it should be used with mon caps.
Also, the correct cap for this is: 'allow command "osd blacklist"'.
The current change breaks the openstack and client keyrings.
By using the profile rbd (which is already used) we already rely on the
ability to blacklist dead clients.

Resolves: #4385

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoglobal: add newline at end of file
Guillaume Abrioux [Thu, 22 Aug 2019 18:29:40 +0000 (20:29 +0200)]
global: add newline at end of file

This commit re-adds a newline at the end of files where it is missing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoglobal: make directories mode parameterizable
Artur Fijalkowski [Wed, 1 Aug 2018 12:37:40 +0000 (14:37 +0200)]
global: make directories mode parameterizable

This commit makes it possible to parametrize the modes of the ceph directories.
It changes the hardcoded mode for ceph related directories from 0755 to
a value customizable with the `ceph_directories_mode` variable.

Closes: #2920
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agorgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf
guihecheng [Wed, 23 Jan 2019 01:36:25 +0000 (09:36 +0800)]
rgw/multisite: assign 'rgw_zone' to the exact section in ceph.conf

since the following commit:
  commit 1ac94c048ff1d1385de2892d0ecef7879ec563e9
  rgw: add support for multiple rgw instances on a single host

we have multi-instance rgw support on a single host and
the config section name of the rgw changed from
[client.rgw.$(hostname)] -> [client.rgw.$(hostname).rgwX]
where X is the sequence number: 0,1,2,...
So we should assign 'rgw_zone' item to the exact rgw instance
config section in ceph.conf
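
A minimal Python sketch of the naming change described above. Only the two section-name patterns come from the commit message; the function names and the instance count argument are hypothetical and are not part of ceph-ansible.

```python
def legacy_rgw_section_name(hostname):
    """Section name used before multi-instance support (one rgw per host)."""
    return f"client.rgw.{hostname}"


def rgw_section_names(hostname, instance_count):
    """One ceph.conf section name per rgw instance: rgw0, rgw1, ..."""
    return [f"client.rgw.{hostname}.rgw{i}" for i in range(instance_count)]


if __name__ == "__main__":
    # 'rgw_zone' would have to be written under each of these sections.
    for section in rgw_section_names("node1", 2):
        print(f"[{section}]")
```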

Signed-off-by: guihecheng <guihecheng@cmiot.chinamobile.com>
6 years agolint: fix error [301], add `changed_when: false` when needed
Guillaume Abrioux [Wed, 31 Jul 2019 07:51:12 +0000 (09:51 +0200)]
lint: fix error [301], add `changed_when: false` when needed

This commit fixes the error [301]:

`[301] Commands should not change things if nothing needs doing`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agolint: fix error [306], add pipefail on shell command using pipe
Guillaume Abrioux [Wed, 31 Jul 2019 07:31:50 +0000 (09:31 +0200)]
lint: fix error [306], add pipefail on shell command using pipe

This commit fixes the error [306]:

`[306] Shells that use pipes should set the pipefail option`

using `/bin/bash` as executable because Debian/Ubuntu systems use `dash`
by default which doesn't have the `-o pipefail`. (See:
https://github.com/ansible/ansible-lint/issues/497#issue-424623501)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoplugins/actions/validate.py: allow ceph_repository 'obs' on openSUSE
Johannes Kastl [Wed, 21 Aug 2019 19:54:03 +0000 (21:54 +0200)]
plugins/actions/validate.py: allow ceph_repository 'obs' on openSUSE

Allow the use of 'obs' as a valid value for ceph_repository, and validate that
- OS is openSUSE
- ceph_obs_repo is defined

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-validate: Refactor check for installation check on SUSE/openSUSE
Johannes Kastl [Wed, 21 Aug 2019 18:56:36 +0000 (20:56 +0200)]
ceph-validate: Refactor check for installation check on SUSE/openSUSE

Move the validation from roles/ceph-common/tasks/installs/install_on_suse.yml
to roles/ceph-validate/ and fix the syntax.

There are two valid combinations of `ceph_origin` and `ceph_repository` on
SUSE/openSUSE:
- ceph_origin == 'distro'
- ceph_origin == 'repository' and ceph_repository == 'obs'

The current when condition would fail even in the valid second combination,
as ceph_origin != distro would be true then
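
The difference between the two checks can be sketched in Python. This is an illustration only: the function names are hypothetical, and the "old" condition is reconstructed from the description above rather than copied from the role.

```python
def old_check_fails(ceph_origin):
    """Old behaviour as described: fail whenever ceph_origin is not 'distro'."""
    return ceph_origin != "distro"


def suse_install_source_is_valid(ceph_origin, ceph_repository=None):
    """Accept the two combinations listed in the commit message."""
    if ceph_origin == "distro":
        return True
    return ceph_origin == "repository" and ceph_repository == "obs"


if __name__ == "__main__":
    # The second valid combination was rejected by the old condition.
    print(old_check_fails("repository"))
    print(suse_install_source_is_valid("repository", "obs"))
```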

Fixes: #4362
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agofacts: fix a typo
Johannes Kastl [Thu, 22 Aug 2019 15:39:47 +0000 (17:39 +0200)]
facts: fix a typo

This commit fixes a typo in roles/ceph-facts/tasks/facts.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-config: Set changed_when to false on fact gathering statements
Kevin Coakley [Thu, 8 Aug 2019 22:32:38 +0000 (15:32 -0700)]
ceph-config: Set changed_when to false on fact gathering statements

The "run 'ceph-volume lvm batch --report' to see how many osds are to be
created" and "run 'ceph-volume lvm list' to see how many osds have already been
created" statements only register the lvm_batch_report and lvm_list variables.
Running those ceph-volume commands should never produce a change on the system.
Adding changed_when: false prevents irrelevant change messages from Ansible.

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
6 years agofix SUSE/openSUSE naming
Johannes Kastl [Wed, 21 Aug 2019 16:17:00 +0000 (18:17 +0200)]
fix SUSE/openSUSE naming

As SUSE 15.x and openSUSE Leap 15.x share the same base, make clear
that both are targeted by the respective tasks

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoroles/ceph-validate/tasks/check_system.yml: fail on unsupported SUSE versions
Johannes Kastl [Wed, 21 Aug 2019 19:36:38 +0000 (21:36 +0200)]
roles/ceph-validate/tasks/check_system.yml: fail on unsupported SUSE versions

Fail if SUSE distributions other than 15.x are found, similar to what we have
for openSUSE

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-osd: Add ulimit nofile on container start
Dimitri Savineau [Tue, 6 Aug 2019 15:52:59 +0000 (11:52 -0400)]
ceph-osd: Add ulimit nofile on container start

On containerized deployment, the OSD entrypoint runs some ceph-volume
commands (lvm/simple scan and/or activate) which perform badly without
the ulimit option.
This option was added for all previous ceph-volume commands but not on
the ceph-osd container startup.
Also updating hard limit value to 4096 to reflect default baremetal
value.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoSet proper ownership command performance improvement
Kevin Jones [Sat, 10 Aug 2019 19:44:32 +0000 (15:44 -0400)]
Set proper ownership command performance improvement

By changing the set ownership command from using the file module in combination with a with_items loop to a raw chown command, we can achieve a 98% performance increase here.

On a ceph cluster with a significant amount of directories and files in /var/lib/ceph, the file module has to run checks on ownership of all those directories and files to determine whether a change is needed.

In this case, we just want to explicitly set the ownership of all these directories and files to the ceph_uid

Added context note to all set proper ownership tasks

Signed-off-by: Kevin Jones <kevinjones@redhat.com>
6 years agoceph-nfs: fail on openSUSE Leap using distro packages
Johannes Kastl [Fri, 16 Aug 2019 09:53:16 +0000 (11:53 +0200)]
ceph-nfs: fail on openSUSE Leap using distro packages

roles/ceph-validate/tasks/check_nfs.yml: fail on openSUSE Leap
using `ceph_origin = distro`, as the ganesha packages are not available from
the distribution repositories

Fixes: #4342
Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoinstall ceph-mds packages on SUSE/openSUSE
Johannes Kastl [Wed, 14 Aug 2019 20:48:34 +0000 (22:48 +0200)]
install ceph-mds packages on SUSE/openSUSE

install packages on SUSE/openSUSE distributions, using the
same logic as on RedHat-based distributions

Fixes #4340

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agohandler: do not validate the server certificate against the CA
Guillaume Abrioux [Tue, 20 Aug 2019 09:47:48 +0000 (11:47 +0200)]
handler: do not validate the server certificate against the CA

Otherwise rgw handler ends up with an error when using https.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoremove duplicate task installing suse dependencies
Johannes Kastl [Tue, 20 Aug 2019 09:23:29 +0000 (11:23 +0200)]
remove duplicate task installing suse dependencies

roles/ceph-common/tasks/installs/install_on_suse.yml: remove the task that
installs the dependencies, as this is done later in install_suse_packages.yml

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agovalidate: do not validate devices or lvm_volumes in osd_auto_discovery case
Guillaume Abrioux [Wed, 14 Aug 2019 12:20:58 +0000 (14:20 +0200)]
validate: do not validate devices or lvm_volumes in osd_auto_discovery case

we shouldn't validate these two variables when `osd_auto_discovery` is
set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1644623
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: remove useless condition
Guillaume Abrioux [Mon, 19 Aug 2019 11:51:14 +0000 (13:51 +0200)]
osd: remove useless condition

Just like `ceph_osd_pool_default_size`, a pool size might change after an
initial deployment. Having this condition prevents customizing the
pool in that case.
It is not needed, so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agomergify: disable automatic merging on master
Guillaume Abrioux [Tue, 6 Aug 2019 07:35:25 +0000 (09:35 +0200)]
mergify: disable automatic merging on master

Automatic merging by mergify has been failing for a while now.
Until we can figure out what's wrong, let's disable it on master for now
so we don't merge "failing" PRs although they passed all scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodoc: update backport section
Guillaume Abrioux [Wed, 14 Aug 2019 12:32:51 +0000 (14:32 +0200)]
doc: update backport section

Only maintainers can set labels on PRs, so let's clarify that point in
the doc which says something confusing at the moment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: tests switch_to_containers against octopus
Guillaume Abrioux [Wed, 14 Aug 2019 11:58:47 +0000 (13:58 +0200)]
tests: tests switch_to_containers against octopus

since we have container images for ceph@master, we shouldn't use
nautilus anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agocommon: replace shell module
Guillaume Abrioux [Wed, 14 Aug 2019 09:10:12 +0000 (11:10 +0200)]
common: replace shell module

there is no need to use `shell` in these tasks. Let's use `command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoshrink-mon: refact 'verify the monitor is out of the cluster' task
Guillaume Abrioux [Wed, 14 Aug 2019 09:04:30 +0000 (11:04 +0200)]
shrink-mon: refact 'verify the monitor is out of the cluster' task

use `from_json` filter instead of a `| python` so we can get rid of the
`shell` module usage here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: refact 'wait for all osd to be up' task
Guillaume Abrioux [Wed, 14 Aug 2019 08:47:40 +0000 (10:47 +0200)]
osd: refact 'wait for all osd to be up' task

let's use `until` instead of doing test in bash using python oneliner
also, use `command` instead of `shell`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agocommon: use discovered_interpreter_python fact
Guillaume Abrioux [Wed, 14 Aug 2019 07:56:41 +0000 (09:56 +0200)]
common: use discovered_interpreter_python fact

in order to use the right binary name when using python cli in command
or shell module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: update test_mgr_is_up()
Guillaume Abrioux [Wed, 14 Aug 2019 07:08:24 +0000 (09:08 +0200)]
tests: update test_mgr_is_up()

the data structure has changed in octopus:

```
    "mgrmap": {
        "available": true,
        "modules": [
            "dashboard",
            "prometheus"
        ],
        "num_standbys": 0,
        "services": {
            "prometheus": "http://mgr0:9283/"
        }
    },
```
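
As a rough Python sketch of a check against this structure: the helper below only reads the `available` key shown in the excerpt. The function name and the sample document are assumptions, and the real `test_mgr_is_up()` may inspect different fields.

```python
import json

# Sample status document limited to the keys shown in the excerpt above.
SAMPLE_STATUS = """
{
    "mgrmap": {
        "available": true,
        "modules": ["dashboard", "prometheus"],
        "num_standbys": 0,
        "services": {"prometheus": "http://mgr0:9283/"}
    }
}
"""


def mgr_is_available(status_json):
    """Return True when the status document reports an available mgr."""
    status = json.loads(status_json)
    return bool(status.get("mgrmap", {}).get("available", False))


if __name__ == "__main__":
    print(mgr_is_available(SAMPLE_STATUS))
```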

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: update the check for 'all osd to be up'
Guillaume Abrioux [Wed, 14 Aug 2019 05:31:09 +0000 (07:31 +0200)]
osd: update the check for 'all osd to be up'

the data structure has changed in octopus.
eg: the path to `num_osds` is now `["osdmap"]["num_osds"]`.
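
A hedged Python sketch of a lookup that tolerates both layouts. The octopus path comes from the message above; the older nested `["osdmap"]["osdmap"]` layout and the `num_up_osds` key are assumptions about the status JSON, and the function names are hypothetical.

```python
def osd_counts(status):
    """Return (num_osds, num_up_osds) from a parsed status document."""
    osdmap = status["osdmap"]
    # Assumption: releases before octopus wrap the counters in a second
    # "osdmap" key; octopus exposes them directly under ["osdmap"].
    if "osdmap" in osdmap:
        osdmap = osdmap["osdmap"]
    return osdmap["num_osds"], osdmap["num_up_osds"]


def all_osds_up(status):
    """True when at least one OSD exists and every OSD is up."""
    num_osds, num_up_osds = osd_counts(status)
    return num_osds > 0 and num_osds == num_up_osds


if __name__ == "__main__":
    octopus = {"osdmap": {"num_osds": 3, "num_up_osds": 3}}
    older = {"osdmap": {"osdmap": {"num_osds": 3, "num_up_osds": 2}}}
    print(all_osds_up(octopus), all_osds_up(older))
```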

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agorefact python installation
Guillaume Abrioux [Thu, 1 Aug 2019 07:37:34 +0000 (09:37 +0200)]
refact python installation

This commit refactors the Python installation when Python is not available.

In order to avoid generating errors, we check for each package manager
to detect which system we are running on.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: fix wrong paths for lv-create in tox.ini
Igor [Tue, 6 Aug 2019 08:57:26 +0000 (11:57 +0300)]
tests: fix wrong paths for lv-create in tox.ini
solution: change paths inside tox.ini file
Fixes: #4311
Signed-off-by: Bogomolov Igor <igor95n@gmail.com>
6 years agoRevert "tests: disable nfs-ganesha deployment"
Dimitri Savineau [Fri, 2 Aug 2019 15:48:32 +0000 (11:48 -0400)]
Revert "tests: disable nfs-ganesha deployment"

This reverts commit 83940e624bc4faf6bc7bb1a7637711556d6b3e8c.

Because nfs-ganesha@master (2.9-dev) build has been fixed by [1] then
we can test nfs-ganesha in the CI for master/octopus.

[1] https://github.com/ceph/ceph-build/pull/1346

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agomgr: refact 'wait for all mgr to be up' task
Guillaume Abrioux [Tue, 6 Aug 2019 09:11:15 +0000 (11:11 +0200)]
mgr: refact 'wait for all mgr to be up' task

There's no need to use `shell` module here.
Instead of using `| python -c`, let's use `from_json` filter.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agomgr/dashboard: Fix grafana/prometheus url config
Dimitri Savineau [Fri, 2 Aug 2019 17:35:22 +0000 (13:35 -0400)]
mgr/dashboard: Fix grafana/prometheus url config

When configuring grafana/prometheus embedded in the mgr/dashboard, we need
to use the address of the grafana-server node and not the current
hostname because mgr/dashboard and grafana/prometheus could be present
on different hosts.
We should instead rely on the grafana_server_addr variable and remove
the dashboard_url.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: run dashboard role on mgr/mon nodes
Dimitri Savineau [Fri, 2 Aug 2019 15:24:03 +0000 (11:24 -0400)]
dashboard: run dashboard role on mgr/mon nodes

We don't need to execute the ceph-dashboard role on the nodes present
in the grafana-server group. This one is dedicated to the grafana and
prometheus stack.
The ceph-dashboard role needs to be executed where the ceph-mgr is running.
That is either on the dedicated mgr nodes or, if mgr and mon are collocated,
implicitly on the mon nodes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-dashboard: Add run_once on delegate tasks
Dimitri Savineau [Fri, 2 Aug 2019 14:58:11 +0000 (10:58 -0400)]
ceph-dashboard: Add run_once on delegate tasks

Because we need to execute commands from a monitor node (the first one
in the mons list) we are using delegate_to option.
If there are multiple nodes running the ceph-dashboard role then the
delegated task will be executed multiple times.
Also remove a mgr config-key option not present for nautilus+ releases.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoonly support openSUSE Leap 15.x, fail on 42.x
Johannes Kastl [Sat, 27 Jul 2019 14:09:26 +0000 (16:09 +0200)]
only support openSUSE Leap 15.x, fail on 42.x

openSUSE switched from 'openSUSE 13.x' to 'openSUSE Leap 42.x' and then to
'openSUSE Leap 15.x' to align with SLES15 development.
The previous logic did not correctly allow the current release, as 15.x matched
the 'less than 42.3' condition.

For now only support openSUSE Leap 15.x, and extend support once 16.x is
released (or whatever the exact version will be)
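
The version-ordering problem can be illustrated with a small Python sketch. The helper names are hypothetical and the "old" condition is reconstructed from the description above, not taken from the role itself.

```python
def parse_version(text):
    """Turn '15.1' into (15, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def old_check_rejects(version):
    """Old behaviour as described: reject anything less than 42.3."""
    return parse_version(version) < (42, 3)


def is_supported_leap(version):
    """New behaviour as described: only openSUSE Leap 15.x is supported."""
    return parse_version(version)[0] == 15


if __name__ == "__main__":
    # 15.1 is newer than 42.3 chronologically but smaller numerically.
    print(old_check_rejects("15.1"), is_supported_leap("15.1"))
```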

Signed-off-by: Johannes Kastl <kastl@b1-systems.de>
6 years agoceph-infra: Apply firewall rules with container
Dimitri Savineau [Wed, 31 Jul 2019 18:02:41 +0000 (14:02 -0400)]
ceph-infra: Apply firewall rules with container

We don't have a reason to not apply firewall rules on the host when
using a containerized deployment.
The TripleO environments already manage the ceph firewall rules outside
ceph-ansible and set the configure_firewall variable to false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1733251
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-grafana: Set grafana uid/gid on files
Dimitri Savineau [Tue, 30 Jul 2019 20:09:47 +0000 (16:09 -0400)]
ceph-grafana: Set grafana uid/gid on files

We don't need to create a grafana system user (in fact we don't even
set the right uid for this user) because we're using a container setup.
Instead we just need to be sure to set the owner/group to 472 (grafana
user/group from the container) like we do for ceph/167.
We don't need to set the user/group recursively on /etc/grafana
directory in a dedicated task.
Also on Ubuntu system, the ceph-grafana-dashboards isn't present so on
non containerized deployment we won't have the
/etc/grafana/dashboards/ceph-dashboard directory present (coming with
the package) so we need to be sure it exists.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: do not deploy on Debian based OS/non-containerized
Guillaume Abrioux [Wed, 31 Jul 2019 14:15:43 +0000 (16:15 +0200)]
dashboard: do not deploy on Debian based OS/non-containerized

in non-containerized deployment, we can't deploy dashboard on Debian
based distribution since the package `ceph-grafana-dashboards` isn't
available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocs: Correct weird wording
Theo Ouzhinski [Thu, 1 Aug 2019 00:25:54 +0000 (20:25 -0400)]
docs: Correct weird wording

for the Ceph master branch.

Signed-off-by: Theo Ouzhinski touzhinski@gmail.com
6 years agotests/shrink_rgw: Disable dashboard
Dimitri Savineau [Wed, 31 Jul 2019 18:17:48 +0000 (14:17 -0400)]
tests/shrink_rgw: Disable dashboard

The shrink_rgw scenario was merged just after the PR enabling the
ceph dashboard by default.
So right now the shrink_rgw scenario doesn't have nodes in the grafana
group and fails.
We just need to set dashboard_enabled to false.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: add more memory in podman job
Guillaume Abrioux [Tue, 30 Jul 2019 11:04:15 +0000 (13:04 +0200)]
tests: add more memory in podman job

Typical error :

```
fatal: [mon1 -> mon0]: FAILED! => changed=true
  cmd:
  - podman
  - exec
  - ceph-mon-mon0
  - ceph
  - config
  - set
  - mgr
  - mgr/dashboard/ssl
  - 'false'
  delta: '0:00:00.644870'
  end: '2019-07-30 10:17:32.715639'
  msg: non-zero return code
  rc: 1
  start: '2019-07-30 10:17:32.070769'
  stderr: |-
    Traceback (most recent call last):
      File "/usr/bin/ceph", line 140, in <module>
        import rados
    ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
    Error: exit status 1
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

Let's add more memory to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: deploy dashboard on mons
Guillaume Abrioux [Tue, 30 Jul 2019 09:25:59 +0000 (11:25 +0200)]
tests: deploy dashboard on mons

There are no dedicated nodes for mgr, so let's use the monitor nodes.
The mgr0 instance spawned isn't used, so if this node is part of the
inventory for this scenario, testinfra will complain because there's no
ceph.conf on this node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: fix timeout usage on rgw user creation command
Guillaume Abrioux [Tue, 30 Jul 2019 08:16:23 +0000 (10:16 +0200)]
dashboard: fix timeout usage on rgw user creation command

For some reason, this makes the playbook fail as follows:

```
TASK [ceph-dashboard : create radosgw system user] ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /home/guits/ceph-ansible/roles/ceph-dashboard/tasks/configure_dashboard.yml:106
Tuesday 30 July 2019  10:04:54 +0200 (0:00:01.910)       0:11:22.319 **********
FAILED - RETRYING: create radosgw system user (3 retries left).
FAILED - RETRYING: create radosgw system user (2 retries left).
FAILED - RETRYING: create radosgw system user (1 retries left).
fatal: [mgr0 -> mon0]: FAILED! => changed=true
  attempts: 3
  cmd: timeout 20 podman exec ceph-mon-mon0 radosgw-admin user create --uid=ceph-dashboard --display-name='Ceph dashboard' --system
  delta: '0:00:20.021973'
  end: '2019-07-30 08:06:32.656066'
  msg: non-zero return code
  rc: 124
  start: '2019-07-30 08:06:12.634093'
  stderr: 'exec failed: container_linux.go:336: starting container process caused "process_linux.go:82: copying bootstrap data to pipe caused \"write init-p: broken pipe\""'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
```

using `timeout -f -s KILL` fixes this issue.

Also, there is no need to use `shell` module here, let's switch to
`command`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests/functional: add a test for shrink-rgw.yml
Rishabh Dave [Wed, 26 Jun 2019 06:09:45 +0000 (11:39 +0530)]
tests/functional: add a test for shrink-rgw.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and RGW and then runs shrink-rgw.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoadd a playbook the remove rgw from a given node
Rishabh Dave [Wed, 26 Jun 2019 05:59:50 +0000 (11:29 +0530)]
add a playbook the remove rgw from a given node

Add a playbook named shrink-rgw.yml to infrastructure-playbooks/ that
can remove a RGW from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoosd: add 'osd blacklist' cap for osp keyrings
Guillaume Abrioux [Mon, 15 Jul 2019 07:57:06 +0000 (09:57 +0200)]
osd: add 'osd blacklist' cap for osp keyrings

This commit adds the `osd blacklist` cap on all OSP client keyrings.

Fixes: #2296
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-osd: check container engine rc for pools
Dimitri Savineau [Mon, 22 Jul 2019 20:58:40 +0000 (16:58 -0400)]
ceph-osd: check container engine rc for pools

When creating OpenStack pools, we only check if the return code from
the pool list command isn't 0 (ie: if it doesn't exist). In that case,
the return code will be 2. That's why the next condition is rc != 0 for
the pool creation.
But in containerized deployment, the return code could be different if
there's a failure on the container engine command (like container not
running). In that case, the return code could be either 1 (docker) or
125 (podman) so we should fail at this point and not in the next tasks.
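
The return-code handling described above can be sketched in Python as follows. The return codes 0, 2, 1 and 125 are the ones named in the message; the function name and the returned labels are hypothetical.

```python
POOL_EXISTS_RC = 0
POOL_MISSING_RC = 2  # the pool does not exist yet


def classify_pool_check(rc):
    """Map the pool check return code to the next action."""
    if rc == POOL_EXISTS_RC:
        return "skip"    # pool already exists, nothing to create
    if rc == POOL_MISSING_RC:
        return "create"  # pool is missing, create it
    # Anything else (e.g. 1 for docker, 125 for podman) means the
    # container engine command itself failed, so stop here.
    return "fail"


if __name__ == "__main__":
    for rc in (0, 2, 1, 125):
        print(rc, classify_pool_check(rc))
```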

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1732157

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: test dashboard deployment with podman scenario
Guillaume Abrioux [Mon, 29 Jul 2019 12:08:40 +0000 (14:08 +0200)]
tests: test dashboard deployment with podman scenario

This commit adds a grafana-server section in order to test dashboard
deployment with podman.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agovalidate: add checks for grafana-server group definition
Guillaume Abrioux [Mon, 29 Jul 2019 08:03:48 +0000 (10:03 +0200)]
validate: add checks for grafana-server group definition

this commit adds two checks:
- check that the `[grafana-server]` group is defined
- check that the `[grafana-server]` contains at least one node.
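
The two checks above could look roughly like this in Python. This is a sketch over a plain dict of inventory groups; the function name and the error messages are hypothetical and do not reflect the actual validate plugin.

```python
def check_grafana_server_group(groups):
    """Return a list of error messages; an empty list means both checks pass."""
    if "grafana-server" not in groups:
        return ["the [grafana-server] group must be defined"]
    if len(groups["grafana-server"]) < 1:
        return ["the [grafana-server] group must contain at least one node"]
    return []


if __name__ == "__main__":
    print(check_grafana_server_group({"mons": ["mon0"]}))
    print(check_grafana_server_group({"grafana-server": []}))
    print(check_grafana_server_group({"grafana-server": ["mon0"]}))
```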

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agomgr: fix a typo
Guillaume Abrioux [Fri, 26 Jul 2019 15:33:07 +0000 (17:33 +0200)]
mgr: fix a typo

This task isn't using the right container_exec_cmd, so it is delegating
to the wrong node.
Let's use the right fact to fix this command.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: remove cfg80211 module installation
Guillaume Abrioux [Fri, 26 Jul 2019 15:23:19 +0000 (17:23 +0200)]
dashboard: remove cfg80211 module installation

According to this comment [1], this seems to be needed to detect wifi
devices.

In node exporter we can see this:

```
--collector.wifi          Enable the wifi collector (default: disabled).
```

Since it's disabled by default and we don't even change this in our
systemd templates for node-exporter, we can safely assume it's not
needed. Therefore, let's remove this.

[1] https://github.com/ceph/ceph-ansible/commit/dbf81b6b5bd6d7e977706a93f9e75b38efe32305#diff-961545214e21efed3b84a9e178927a08L21-L23

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: use dedicated group only
Guillaume Abrioux [Thu, 25 Jul 2019 17:08:22 +0000 (19:08 +0200)]
dashboard: use dedicated group only

There's no need to add complexity by trying to fall back on another group.
Let's deploy dashboard on all nodes present in grafana-server group.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodashboard: move code into a dedicated playbook
Dimitri Savineau [Fri, 28 Jun 2019 14:39:38 +0000 (10:39 -0400)]
dashboard: move code into a dedicated playbook

Move the dashboard, grafana/prometheus and node-exporter plays into a
dedicated playbook in the infrastructure-playbooks directory.
To avoid using the 'dashboard_enabled | bool' condition multiple times
in the main playbook, we can simply import the dashboard playbook or
not.
This patch also allows using a single dashboard playbook for
both baremetal and container playbooks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: enable dashboard by default
Guillaume Abrioux [Thu, 25 Jul 2019 12:29:11 +0000 (14:29 +0200)]
dashboard: enable dashboard by default

This commit enables dashboard deployment by default.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1726739
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoRemove NBSP characters
Dimitri Savineau [Thu, 18 Jul 2019 18:57:46 +0000 (14:57 -0400)]
Remove NBSP characters

Some NBSP characters are still present in the YAML files.
This commit removes them and adds a test in Travis CI.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoinfra-playbooks: rewrite a condition for better readability
Rishabh Dave [Thu, 25 Jul 2019 11:32:32 +0000 (17:02 +0530)]
infra-playbooks: rewrite a condition for better readability

Use Ansible's built-in facility to check whether a command executed
successfully rather than looking at its return value.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agocontainer: rename docker directories
Guillaume Abrioux [Wed, 24 Jul 2019 08:44:34 +0000 (10:44 +0200)]
container: rename docker directories

Those 2 directories should be renamed to be more generic (docker vs.
podman).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: disable nfs-ganesha deployment
Guillaume Abrioux [Tue, 23 Jul 2019 07:50:51 +0000 (09:50 +0200)]
tests: disable nfs-ganesha deployment

nfs-ganesha repositories @ dev are broken, this commit disables the
nfs-ganesha deployment so the CI isn't stuck.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoAvoid setting up provisioners in a fully containerized environment
fmount [Fri, 12 Jul 2019 09:03:57 +0000 (11:03 +0200)]
Avoid setting up provisioners in a fully containerized environment

This commit adds a when clause to avoid setting up the grafana
provisioners in a fully containerized scenario.
This is needed when the ceph-grafana-dashboards package is not
installed, in which case this task could produce a wrong grafana
configuration that makes the container crash.

Signed-off-by: fmount <fpantano@redhat.com>
6 years agoFix backward compat with old cephfs_pools format
Giulio Fidente [Fri, 19 Jul 2019 08:58:49 +0000 (10:58 +0200)]
Fix backward compat with old cephfs_pools format

Previously, cephfs_pools items had a pgs: key but neither a pgp_num:
nor a pg_num: key.
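
A minimal sketch of the kind of fallback this implies, written in Python
rather than as the actual Ansible change. The helper name
`normalize_cephfs_pool` and the sample pool names are hypothetical and are
not taken from the ceph-ansible code.

```python
def normalize_cephfs_pool(item):
    """Return a copy of a cephfs_pools item with pg_num/pgp_num filled in.

    Hypothetical helper: legacy items only define ``pgs``, newer items
    define ``pg_num`` and ``pgp_num`` directly.
    """
    pool = dict(item)  # work on a copy, leave the caller's data untouched
    legacy_pgs = pool.get("pgs")
    if "pg_num" not in pool and legacy_pgs is not None:
        pool["pg_num"] = legacy_pgs
    if "pgp_num" not in pool and "pg_num" in pool:
        # assumption: pgp_num simply follows pg_num when it is not given
        pool["pgp_num"] = pool["pg_num"]
    return pool


old_style = {"name": "cephfs_data", "pgs": 8}
new_style = {"name": "cephfs_metadata", "pg_num": 16, "pgp_num": 16}
print(normalize_cephfs_pool(old_style))
print(normalize_cephfs_pool(new_style))
```

Items already in the new format pass through unchanged, so both formats can
be handled by the same code path.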

Signed-off-by: Giulio Fidente <gfidente@redhat.com>
6 years agohandler: fix bug in osd handlers
Guillaume Abrioux [Thu, 18 Jul 2019 12:06:23 +0000 (14:06 +0200)]
handler: fix bug in osd handlers

fbf4ed42aee8fa5fd18c4c289cbb80ffeda8f72e introduced a bug when the
container binary is podman.
podman doesn't support ps -f with a regular expression, so the container id
is never set in the restart script, causing the handler to fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1721536
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agolibrary/ceph_volume.py: remove six dependency
Dimitri Savineau [Fri, 12 Jul 2019 14:19:48 +0000 (10:19 -0400)]
library/ceph_volume.py: remove six dependency

The ceph nodes might not have the python six library installed, which
could lead to an error during the ceph_volume custom module execution.

  ImportError: No module named six

The six library isn't needed in this module if we're sure that all
action variables passed to the build_ceph_volume_cmd function are
lists and not strings.
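
A rough sketch of the idea, assuming the string/list distinction was the
only reason for the six import. `build_cmd_sketch` is a hypothetical,
simplified stand-in and not the module's real function.

```python
def build_cmd_sketch(action, cluster="ceph"):
    """Hypothetical, simplified stand-in for a ceph-volume command builder.

    ``action`` must already be a list of sub-command words, for example
    ['lvm', 'batch'], so no string-versus-list check (and therefore no
    ``six``) is required.
    """
    if not isinstance(action, list):
        raise TypeError("action must be a list, got %s" % type(action).__name__)
    return ["ceph-volume", "--cluster", cluster] + action


print(build_cmd_sketch(["lvm", "batch"]))
```

Requiring a list at every call site moves the type handling to the callers
and keeps the module free of third-party imports.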

Resolves: #4071

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agovalidate: fail if gpt header found on unprepared devices
Guillaume Abrioux [Wed, 17 Jul 2019 13:55:12 +0000 (15:55 +0200)]
validate: fail if gpt header found on unprepared devices

ceph-volume will complain if gpt headers are found on devices.
This commit checks whether a gpt header is present on the devices passed
in the `devices` variable and fails early.
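
One possible way to express such a check, sketched in Python under the
assumption that the label is read from `parted -s -m <device> print` style
machine-readable output. The helper name `has_gpt_label` and the sample
output are illustrative only; the actual validation task may work
differently.

```python
def has_gpt_label(parted_machine_output):
    """Return True if machine-readable parted output reports a gpt label.

    Assumes the colon-separated device summary line
    path:size:transport:logical-sector:physical-sector:label:model...;
    """
    for line in parted_machine_output.splitlines():
        fields = line.rstrip(";").split(":")
        if line.startswith("/dev/") and len(fields) >= 6:
            return fields[5] == "gpt"
    return False


# illustrative sample, not captured from a real host
sample = "BYT;\n/dev/sdb:51200MiB:scsi:512:512:gpt:QEMU HARDDISK:;\n"
if has_gpt_label(sample):
    print("gpt header found on /dev/sdb, failing early")
```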

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1730541
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-dashboard: enable rgw options conditionally
Dimitri Savineau [Thu, 11 Jul 2019 14:38:44 +0000 (10:38 -0400)]
ceph-dashboard: enable rgw options conditionally

The dashboard rgw frontend options only need to be applied when there
are nodes present in the rgw ansible group.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests/dashboard: use the dedicated grafana node
Dimitri Savineau [Thu, 11 Jul 2019 14:06:09 +0000 (10:06 -0400)]
tests/dashboard: use the dedicated grafana node

The Vagrant dashboard scenario creates a dedicated grafana node, but it
was not used in the ansible inventory.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: use variables for port value
Dimitri Savineau [Wed, 10 Jul 2019 21:15:45 +0000 (17:15 -0400)]
dashboard: use variables for port value

The current port value for alertmanager, grafana, node-exporter and
prometheus is hardcoded in the roles so it's not possible to change the
port binding of those services.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agotests: remove useless setting
Guillaume Abrioux [Mon, 15 Jul 2019 08:57:46 +0000 (10:57 +0200)]
tests: remove useless setting

This setting is not needed here since we explicitly set it for
container and non-container contexts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoshrink-rbdmirror: check if rbdmirror is well removed from cluster 4238/head
Guillaume Abrioux [Thu, 11 Jul 2019 09:26:50 +0000 (11:26 +0200)]
shrink-rbdmirror: check if rbdmirror is well removed from cluster

This commit adds a check to ensure the daemon has been removed from the
cluster.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests/functional: add a test for shrink-rbdmirror.yml
Rishabh Dave [Tue, 25 Jun 2019 13:30:53 +0000 (19:00 +0530)]
tests/functional: add a test for shrink-rbdmirror.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and RBD Mirror and then runs shrink-rbdmirror.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoadd a playbook that removes rbd-mirror from a node
Rishabh Dave [Tue, 25 Jun 2019 13:08:17 +0000 (18:38 +0530)]
add a playbook that removes rbd-mirror from a node

Add a playbook, named "shrink-rbdmirror.yml", in infrastructure-playbooks/
that removes an RBD Mirror from a node in an already deployed Ceph
cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoceph-infra: update handler with daemon variable
Dimitri Savineau [Wed, 10 Jul 2019 18:58:58 +0000 (14:58 -0400)]
ceph-infra: update handler with daemon variable

Both the ntp and chrony daemons use a variable for the service name
because it can differ depending on the GNU/Linux distribution.
This was updated in 9d88d3199 for chrony, but only for the start task
and not for the handler.
This commit fixes this for both ntp and chrony.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-infra: Open prometheus port
Dimitri Savineau [Wed, 10 Jul 2019 21:19:29 +0000 (17:19 -0400)]
ceph-infra: Open prometheus port

The Prometheus port 9090 isn't open in the firewall configuration.
Also, the dashboard task on the grafana node was not required because
it's already present on the mgr node.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agohandler: remove legacy condition
Guillaume Abrioux [Tue, 9 Jul 2019 14:03:26 +0000 (16:03 +0200)]
handler: remove legacy condition

Since everything is already in a block with the same condition, there is
no need to repeat that condition on each of these tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agovalidate: improve message printed in check_devices.yml
Guillaume Abrioux [Wed, 10 Jul 2019 13:08:39 +0000 (15:08 +0200)]
validate: improve message printed in check_devices.yml

The message prints the whole content of the registered variable in the
playbook. This is not needed and makes the message unclear and
unreadable.

```
"msg": "{'_ansible_parsed': True, 'changed': False, '_ansible_no_log': False, u'err': u'Error: Could not stat device /dev/sdf - No such file or directory.\\n', 'item': u'/dev/sdf', '_ansible_item_result': True, u'failed': False, '_ansible_item_label': u'/dev/sdf', u'msg': u\"Error while getting device information with parted script: '/sbin/parted -s -m /dev/sdf -- unit 'MiB' print'\", u'rc': 1, u'invocation': {u'module_args': {u'part_start': u'0%', u'part_end': u'100%', u'name': None, u'align': u'optimal', u'number': None, u'label': u'msdos', u'state': u'info', u'part_type': u'primary', u'flags': None, u'device': u'/dev/sdf', u'unit': u'MiB'}}, 'failed_when_result': False, '_ansible_ignore_errors': None, u'out': u''} is not a block special file!"
```
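
The intent can be illustrated with a short Python sketch that builds the
message from the device name alone instead of the whole registered result.
The helper name `device_error_message` is hypothetical, and the dictionary
is a trimmed-down version of the output shown above.

```python
def device_error_message(result):
    """Build a short message from a registered loop result (hypothetical)."""
    return "%s is not a block special file!" % result["item"]


# trimmed-down version of the registered variable shown above
registered_result = {
    "item": "/dev/sdf",
    "rc": 1,
    "failed": False,
    "err": "Error: Could not stat device /dev/sdf - No such file or directory.\n",
}
print(device_error_message(registered_result))
```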

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1719023
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-iscsi: Update gateway config/template
Dimitri Savineau [Mon, 8 Jul 2019 18:36:07 +0000 (14:36 -0400)]
ceph-iscsi: Update gateway config/template

- Remove gateway_keyring from the configuration file because it's
not used in the ceph-iscsi 3.x release.
- Use config_template instead of the template module for the iscsi-gateway
configuration file, because the file is an ini file and we might want
to override more parameters than those present in ceph-ansible.
- Because we can now set the pool name in the configuration, we should
use a variable for that. This is refactored with the iscsi_pool_* variables
that are also used to configure the pool size.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoceph-dashboard: remove bool filter for rgw vars
Dimitri Savineau [Mon, 8 Jul 2019 13:43:13 +0000 (09:43 -0400)]
ceph-dashboard: remove bool filter for rgw vars

Some dashboard_rgw_api_* variables are using the bool filter but those
variables are strings with an empty string as default value.
So we should test the variable against an empty string instead of a
bool.

dashboard_rgw_api_host: ''
dashboard_rgw_api_port: ''
dashboard_rgw_api_scheme: ''
dashboard_rgw_api_admin_resource: ''
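
To see why the bool filter is the wrong test for these values, here is a
Python sketch. `to_bool_sketch` is a simplified approximation of how a
bool-style conversion treats strings (only a few literal words count as
true); it is not the Ansible implementation, and the host value is
hypothetical.

```python
def to_bool_sketch(value):
    """Simplified approximation of a bool-style filter applied to strings."""
    if value is None or isinstance(value, bool):
        return value
    if isinstance(value, str):
        value = value.lower()
    return value in ("yes", "on", "1", "true", 1)


dashboard_rgw_api_host = "192.168.0.10"  # hypothetical non-empty setting

# the bool-style test treats a perfectly valid host as false
print(to_bool_sketch(dashboard_rgw_api_host))
# comparing against the empty-string default gives the intended answer
print(dashboard_rgw_api_host != "")
```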

Resolves: #4179

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agodashboard: Use upstream default port
Boris Ranto [Tue, 9 Jul 2019 20:32:38 +0000 (22:32 +0200)]
dashboard: Use upstream default port

We are currently using an incorrect default port for the dashboard.
Upstream uses 8443 instead of 8234 by default. This should get us closer
to the upstream project.

Signed-off-by: Boris Ranto <branto@redhat.com>
6 years agotests/functional: add a test for shrink-mgr.yml
Rishabh Dave [Thu, 20 Jun 2019 10:23:22 +0000 (15:53 +0530)]
tests/functional: add a test for shrink-mgr.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MGR and then runs shrink-mgr.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoadd a playbook that removes manager from a node
Rishabh Dave [Wed, 12 Jun 2019 04:46:00 +0000 (10:16 +0530)]
add a playbook that removes manager from a node

Add a playbook, named "shrink-mgr.yml", in infrastructure-playbooks/
that removes a MGR from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoceph-handler: fix cluster name in socket path
Dimitri Savineau [Wed, 3 Jul 2019 15:26:42 +0000 (11:26 -0400)]
ceph-handler: fix cluster name in socket path

c90f605b5 introduces the default ceph cluster name value in the rgw
socket path for the rgw restart script. But this should use the
`cluster` variable instead.
This commit also fixes this in the osd restart script.
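
As an illustration only, the following Python sketch builds an admin socket
path from a `cluster` value instead of a hardcoded "ceph". It assumes
Ceph's default `$cluster-$name.asok` naming under /var/run/ceph; the helper
name and the daemon names are hypothetical, and the real restart scripts
may compose the path differently.

```python
def admin_socket_path(cluster, daemon_name, run_dir="/var/run/ceph"):
    """Compose an admin socket path from the cluster name (hypothetical)."""
    return "%s/%s-%s.asok" % (run_dir, cluster, daemon_name)


# with the default cluster name the hardcoded and variable forms coincide
print(admin_socket_path("ceph", "client.rgw.node1"))
# with a custom cluster name only the variable form finds the socket
print(admin_socket_path("prod", "osd.0"))
```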

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years agoshrink-mds: refact post tasks
Guillaume Abrioux [Wed, 3 Jul 2019 08:45:46 +0000 (10:45 +0200)]
shrink-mds: refact post tasks

This commit refactors the way we check that the "mds_to_kill" node is
properly stopped.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agotests/functional: add a test for shrink-mds.yml
Rishabh Dave [Fri, 10 May 2019 16:10:07 +0000 (21:40 +0530)]
tests/functional: add a test for shrink-mds.yml

Add a new functional test that deploys a Ceph cluster with three nodes
for MON, OSD and MDS and then runs shrink-mds.yml to test it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoadd a playbook that removes mds from a node
Rishabh Dave [Thu, 20 Jun 2019 09:59:25 +0000 (15:29 +0530)]
add a playbook that removes mds from a node

Add a playbook, named "shrink-mds.yml", in infrastructure-playbooks/
that removes an MDS from a node in an already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoAdd package-install tag on ceph-grafana-dashboard pkg install.
fmount [Thu, 4 Jul 2019 09:43:39 +0000 (11:43 +0200)]
Add package-install tag on ceph-grafana-dashboard pkg install.

According to the OSP pattern, we need the package-install tag
to control what is installed on the host. This commit just adds
the missing tag to meet the TripleO requirements.

See: /issues/4197 for details

Fixes: #4197
Signed-off-by: fmount <fpantano@redhat.com>
6 years agoceph-iscsi-gw: Update log directories bind mount
Dimitri Savineau [Thu, 4 Jul 2019 14:10:00 +0000 (10:10 -0400)]
ceph-iscsi-gw: Update log directories bind mount

On containerized deployments we need to bind mount the ceph-iscsi
directory to avoid writing the logs in the container.
The /var/log/ceph directory isn't used by the rbd-target-api/gw services
because they have their own log directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>