git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
6 years agomon: force peer addition
Sébastien Han [Tue, 8 Jan 2019 17:36:14 +0000 (18:36 +0100)]
mon: force peer addition

Something changed with the introduction of msgr2, and we now have to
add each node as a peer so the monitors can form a quorum. This might be
due to our CI environment, but adding this is completely harmless
and solves monitors not being able to form quorum.

It seems that the initial monitor map didn't contain the right
information about the peers (addresses like 0.0.0.0/0r1, for each rank).

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agotestinfra/osds double the amount of ports OSDs listen to
Alfredo Deza [Mon, 7 Jan 2019 17:59:43 +0000 (12:59 -0500)]
testinfra/osds double the amount of ports OSDs listen to

Since msgr2 changes got merged, the OSDs in master (to be nautilus) will
double the amount of ports they listen to.

Signed-off-by: Alfredo Deza <adeza@redhat.com>
6 years agonfs-ganesha: fixed nfs_ganesha_dev_apt_repo variable
Bruceforce [Thu, 3 Jan 2019 15:08:58 +0000 (16:08 +0100)]
nfs-ganesha: fixed nfs_ganesha_dev_apt_repo variable
The nfs_ganesha_dev_apt_repo variable was set incorrectly in the task
"fetch nfs-ganesha development repository"

Signed-off-by: Bruceforce <Bruceforce@users.noreply.github.com>
6 years agorgw: do not create mandatory directories
Sébastien Han [Wed, 12 Dec 2018 10:35:02 +0000 (11:35 +0100)]
rgw: do not create mandatory directories

The packages are responsible for this, currently tracked in ceph https://github.com/ceph/ceph/pull/25503

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agorbd-mirror: copy bootstrap key after package install
Sébastien Han [Mon, 10 Dec 2018 14:30:48 +0000 (15:30 +0100)]
rbd-mirror: copy bootstrap key after package install

If we don't copy the key after the package install, the directory /var/lib/ceph/bootstrap-rbd-mirror
will not exist and the copy will fail.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoconfig: only pre-create ceph dirs on containers
Sébastien Han [Mon, 10 Dec 2018 13:43:30 +0000 (14:43 +0100)]
config: only pre-create ceph dirs on containers

We don't need to create the directories on non-containers, they are
created by the packages.

Closes: https://github.com/ceph/ceph-ansible/issues/3430
Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph-infra: disable unrequired NTP services
Rishabh Dave [Wed, 12 Dec 2018 11:23:23 +0000 (16:53 +0530)]
ceph-infra: disable unrequired NTP services

When one of the currently supported NTP services has been set up,
disable the rest of the NTP services on Ceph nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoceph-infra: merge ntp_debian.yml and ntp_rpm.yml
Rishabh Dave [Wed, 12 Dec 2018 11:15:00 +0000 (16:45 +0530)]
ceph-infra: merge ntp_debian.yml and ntp_rpm.yml

Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called
setup_ntp.yml) since they are almost identical. Also avoid repetition
of the common setup step for ntpd and chronyd services.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agocopy certificates as root user
Rishabh Dave [Wed, 2 Jan 2019 11:34:49 +0000 (17:04 +0530)]
copy certificates as root user

Since the current user on the controller node might not have
permission to read the TLS certificate and related files, copy these
files to the Ceph nodes as the root user.

Fixes: https://github.com/ceph/ceph-ansible/issues/3465
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agoupdate: do not enforce `serial: 1` on client nodes
Guillaume Abrioux [Wed, 2 Jan 2019 15:53:06 +0000 (16:53 +0100)]
update: do not enforce `serial: 1` on client nodes

There is no need to enforce `serial: 1` on client nodes.
Let's make it configurable by introducing a new *extra* variable,
`client_update_batch`; if it is not set, it defaults to
`{{ ansible_forks }}`.

NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
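
The batching behaviour described in the commit above can be sketched in Python. This is only an illustration of the idea, not the playbook's implementation: the function name `client_batches` and the host names are hypothetical, and the `forks` default of 5 is an assumption standing in for `ansible_forks`.

```python
from typing import List, Optional


def client_batches(hosts: List[str],
                   client_update_batch: Optional[int] = None,
                   forks: int = 5) -> List[List[str]]:
    """Split client hosts into update batches.

    Hypothetical helper: when client_update_batch is not set, the batch
    size falls back to the number of forks (assumed to be 5 here).
    """
    size = client_update_batch if client_update_batch else forks
    # Slice the host list into consecutive chunks of at most `size` hosts.
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]
```

With `client_update_batch=1` each client is updated on its own, which matches the old `serial: 1` behaviour; leaving it unset updates up to `forks` clients at a time.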
6 years agoVagrantfile: remove useless default values
Guillaume Abrioux [Wed, 2 Jan 2019 13:46:54 +0000 (14:46 +0100)]
Vagrantfile: remove useless default values

Those default values are useless and might cause issues.

- `osd_scenario` should be mandatory anyway.
- `pool_default_size` is not used anymore (this has been refactored
recently).

Closes: #3468
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoadd 'custom' as valid ceph_repository value
Justin Riley [Fri, 21 Dec 2018 02:33:05 +0000 (21:33 -0500)]
add 'custom' as valid ceph_repository value

This is documented as valid:

https://github.com/ceph/ceph-ansible/blob/561746f75e3913b30e6ae3f14768ebc8a516bf66/group_vars/all.yml.sample#L245

Signed-off-by: Justin Riley <justin.t.riley@gmail.com>
6 years agoceph_key: if initial keys are missing, report which ones
Dan Mick [Tue, 18 Dec 2018 00:43:35 +0000 (16:43 -0800)]
ceph_key: if initial keys are missing, report which ones

Fixes: #3461
Signed-off-by: Dan Mick <dan.mick@redhat.com>
6 years agodocument missing support for non-containerized deployment
Kai Wembacher [Thu, 20 Dec 2018 23:35:53 +0000 (00:35 +0100)]
document missing support for non-containerized deployment

Signed-off-by: Kai Wembacher <kai@ktwe.de>
6 years agoCONTRIBUTING: add more guidelines for backport
Sébastien Han [Thu, 20 Dec 2018 14:25:45 +0000 (15:25 +0100)]
CONTRIBUTING: add more guidelines for backport

Clarify the backport process.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoClarify RGWs configuration when using ceph_conf_overrides.
jtudelag [Mon, 26 Feb 2018 08:49:57 +0000 (09:49 +0100)]
Clarify RGWs configuration when using ceph_conf_overrides.

To avoid future misconfigurations, clarify that the only valid
scheme is [client.rgw.*] instead of [client.radosgw.*].

6 years agoExample ceph_add_users_buckets playbook
Daniel-Pivonka [Thu, 29 Nov 2018 19:04:04 +0000 (14:04 -0500)]
Example ceph_add_users_buckets playbook

This example playbook shows how to bulk add rgw users and buckets.

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
6 years agorgw users/buckets module
Daniel-Pivonka [Thu, 29 Nov 2018 15:17:40 +0000 (10:17 -0500)]
rgw users/buckets module

ansible module to bulk create rgw users and buckets

Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
6 years agoadd support for rocksdb and wal on the same partition in non-collocated
Kai Wembacher [Thu, 13 Dec 2018 07:42:49 +0000 (08:42 +0100)]
add support for rocksdb and wal on the same partition in non-collocated

Signed-off-by: Kai Wembacher <kai@ktwe.de>
6 years agoset any_errors_fatal to true for all host sections
Rishabh Dave [Mon, 17 Dec 2018 10:34:46 +0000 (16:04 +0530)]
set any_errors_fatal to true for all host sections

Add `any_errors_fatal: true` to all host sections in `site.yml.sample`
and `site-container.yml.sample` so that playbook execution
stops immediately when an error occurs.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agolint: Remote package tasks should have a retry
Sébastien Han [Thu, 20 Dec 2018 09:00:26 +0000 (10:00 +0100)]
lint: Remote package tasks should have a retry

Make linter happy and add more robustness to remote tasks by retrying 3
times (the default) before failing.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoretry on packages and repositories failures
Guillaume Abrioux [Wed, 19 Dec 2018 13:55:01 +0000 (14:55 +0100)]
retry on packages and repositories failures

add register/until on all packaging related tasks to avoid spurious CI
failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoplaybook: report storage device inventory
Noah Watkins [Wed, 5 Dec 2018 23:15:02 +0000 (15:15 -0800)]
playbook: report storage device inventory

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
6 years agoceph-volume: add support for inventory command
Noah Watkins [Wed, 5 Dec 2018 23:14:08 +0000 (15:14 -0800)]
ceph-volume: add support for inventory command

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
6 years agotest: use yaml stdout callback
Sébastien Han [Fri, 14 Dec 2018 10:14:30 +0000 (11:14 +0100)]
test: use yaml stdout callback

This provides a much more readable output for tasks. It'll be easier to
debug traces.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomon: remove ceph aliases for containers
Sébastien Han [Thu, 13 Dec 2018 10:38:23 +0000 (11:38 +0100)]
mon: remove ceph aliases for containers

These aliases have led to several issues by making users believe that ceph
binaries are actually present on the host when running the command.
It wasn't explicit that the commands were only run inside a
container.
This has caused too much confusion, so we decided to remove them.

Closes: https://github.com/ceph/ceph-ansible/issues/3445
Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agopurge-cluster: skip tasks that use ceph-volume if it's not installed
Andrew Schoen [Tue, 11 Dec 2018 16:52:26 +0000 (10:52 -0600)]
purge-cluster: skip tasks that use ceph-volume if it's not installed

This will allow the playbook to be idempotent.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1656935
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
6 years agopurge-container: move facts gathering after ceph-defaults role import
Guillaume Abrioux [Wed, 12 Dec 2018 15:34:14 +0000 (16:34 +0100)]
purge-container: move facts gathering after ceph-defaults role import

This task has to be called after the role `ceph-defaults` has been
played, otherwise, `mon_group_name` will never be known.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agopurge-container: fix wrong syntax
Guillaume Abrioux [Wed, 12 Dec 2018 08:53:32 +0000 (09:53 +0100)]
purge-container: fix wrong syntax

we want a default value for `mon_group_name`, not for
`groups[mon_group_name]`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
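
The difference between the two defaults in the commit above can be shown with a rough Python analogy. This is a sketch, not the playbook's Jinja: the `groups` mapping, both function names, and the fallback value `"mons"` are assumptions made for illustration.

```python
# Hypothetical inventory groups, used only for illustration.
groups = {"mons": ["mon0", "mon1"]}


def hosts_default_on_lookup(variables):
    # Default applied to the lookup result: the key expression is evaluated
    # first, so an unset mon_group_name fails before the default can apply.
    return groups.get(variables["mon_group_name"], [])


def hosts_default_on_name(variables):
    # Default applied to the variable itself, then used for the lookup.
    name = variables.get("mon_group_name", "mons")
    return groups[name]
```

Calling `hosts_default_on_lookup({})` raises `KeyError`, while `hosts_default_on_name({})` returns the monitor list.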
6 years agointroduce new role ceph-facts
Guillaume Abrioux [Mon, 10 Dec 2018 14:46:32 +0000 (15:46 +0100)]
introduce new role ceph-facts

sometimes we play the whole role `ceph-defaults` just to access the
default value of some variables. It means we play the `facts.yml` part
of this role even when it's not desired. Splitting this role will speed up
the playbook.

Closes: #3282
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agopurge-docker: do not call ceph-osd role
Guillaume Abrioux [Mon, 10 Dec 2018 20:43:35 +0000 (21:43 +0100)]
purge-docker: do not call ceph-osd role

calling ceph-osd role in purge playbook is not needed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agometa: set the right minimum ansible version required for galaxy
Guillaume Abrioux [Mon, 10 Dec 2018 14:49:23 +0000 (15:49 +0100)]
meta: set the right minimum ansible version required for galaxy

ceph-ansible@master requires the latest stable ansible version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agopurge: gather monitors facts in OSD purge
Guillaume Abrioux [Wed, 5 Dec 2018 08:06:53 +0000 (09:06 +0100)]
purge: gather monitors facts in OSD purge

the OSD part of the purge delegates commands on monitor node, we need to
gather monitors facts to know the `ansible_hostname` fact that is used
in the `docker_exec_cmd` fact.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agopurge-container: gather fact before calling ceph-defaults
Sébastien Han [Tue, 4 Dec 2018 08:22:34 +0000 (09:22 +0100)]
purge-container: gather fact before calling ceph-defaults

ceph-defaults relies on facts so we must gather facts before running it.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agopurge: tox add lvm-setup
Sébastien Han [Tue, 4 Dec 2018 08:21:51 +0000 (09:21 +0100)]
purge: tox add lvm-setup

Since we deploy > purge > deploy, the LVs are gone so we must recreate
them.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agopurge-cluster: add support for mon/mgr collocation
Sébastien Han [Mon, 3 Dec 2018 21:59:17 +0000 (22:59 +0100)]
purge-cluster: add support for mon/mgr collocation

Recently we introduced the default collocation of mon/mgr without the
need of a dedicated mgrs section. This means we have to stop the mgr
process on that machine too.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agopurge-cluster: remove support for other init system
Sébastien Han [Mon, 3 Dec 2018 21:58:19 +0000 (22:58 +0100)]
purge-cluster: remove support for other init system

We only support systemd and use the service module anyway.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agopurge-docker-cluster: add support for mgr/mon collocation
Sébastien Han [Mon, 3 Dec 2018 21:46:52 +0000 (22:46 +0100)]
purge-docker-cluster: add support for mgr/mon collocation

Recently we introduced the collocation of mon and mgr by default, so we
don't need to have an explicit mgrs section for this. This means we have
to remove the mgr container on the mon machines too.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agopurge-docker-cluster: add a task to check hosts
Sébastien Han [Mon, 3 Dec 2018 15:46:38 +0000 (16:46 +0100)]
purge-docker-cluster: add a task to check hosts

It's useful when running on CI to see what might remain on the machines.
So we list all the containers and images. We expect the list to be
empty.

We fail if we see containers running.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agopurge-docker-cluster: add ceph-volume support
Sébastien Han [Thu, 4 Oct 2018 15:40:25 +0000 (17:40 +0200)]
purge-docker-cluster: add ceph-volume support

This commit adds support for purging clusters that were deployed
with ceph-volume. It also uses a block instruction to cleanly separate the
work to do depending on whether lvm is used.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph_keys: pass in module for error messages
Noah Watkins [Thu, 6 Dec 2018 18:34:49 +0000 (10:34 -0800)]
ceph_keys: pass in module for error messages

fixes: #3421

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
6 years agorgw multisite: update documentation
Ali Maredia [Fri, 7 Dec 2018 19:15:39 +0000 (14:15 -0500)]
rgw multisite: update documentation

Signed-off-by: Ali Maredia <amaredia@redhat.com>
6 years agoceph-defaults: do not use podman only on atomic
Sébastien Han [Fri, 7 Dec 2018 11:20:27 +0000 (12:20 +0100)]
ceph-defaults: do not use podman only on atomic

We want to test podman on f29 non-atomic, atomic is not a hard
requirement. However, if you want to get podman then you will have to
install it first before running the playbook.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomgr: little refact
Sébastien Han [Thu, 6 Dec 2018 12:58:37 +0000 (13:58 +0100)]
mgr: little refact

This commit removes the default module, so ceph-ansible does not enable
any manager module.
To enable a module you need to set a value to 'ceph_mgr_modules', you
can pass a list of modules like this:

ceph_mgr_modules:
  - status
  - dashboard

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agostart_osds: use list instead of keys (re-introduce)
Noah Watkins [Wed, 5 Dec 2018 22:04:48 +0000 (14:04 -0800)]
start_osds: use list instead of keys (re-introduce)

the python3 fix merged by:

  https://github.com/ceph/ceph-ansible/pull/3346

was undone a few days later by:

  https://github.com/ceph/ceph-ansible/commit/82a6b5adec4d72eb4b7219147f2225b7b2904460

and this patch fixes it again :)

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
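
For context, the general Python 3 pitfall behind "use list instead of keys" can be sketched as follows. This is an illustration of the language behaviour only, not the code changed by the commit; the `osd_devices` mapping and both function names are hypothetical.

```python
# Hypothetical OSD mapping, used only for illustration.
osd_devices = {"0": "/dev/sdb", "1": "/dev/sdc"}


def first_osd_id_py2_style(devices):
    # On Python 2, keys() returns a list and can be indexed. On Python 3 it
    # returns a dict_keys view, so indexing raises TypeError.
    return devices.keys()[0]


def first_osd_id(devices):
    # Building a list first can be indexed on both Python 2 and Python 3.
    return list(devices)[0]
```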
6 years agouse pre_tasks and post_tasks when necessary
Rishabh Dave [Mon, 12 Nov 2018 12:21:26 +0000 (17:51 +0530)]
use pre_tasks and post_tasks when necessary

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agodon't use private option for import_role
Rishabh Dave [Mon, 12 Nov 2018 11:10:40 +0000 (16:40 +0530)]
don't use private option for import_role

Since sharing variables among roles has been the default since
Ansible 2.6, the private option has been deprecated; so stop using it.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agotox: add missing ceph_docker_image_tag in shrink
Sébastien Han [Tue, 4 Dec 2018 17:41:36 +0000 (18:41 +0100)]
tox: add missing ceph_docker_image_tag in shrink

When calling shrink on containerized deployment, we were first doing the
setup with `latest-master` and then when calling the playbook we were
using the default value for `ceph_docker_image_tag` that comes from
ceph-defaults. Now we pass
`ceph_docker_image_tag={env:CEPH_DOCKER_IMAGE_TAG:latest-master}` so the
play will run the right container image.

Signed-off-by: Sébastien Han <seb@redhat.com>
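
The `{env:CEPH_DOCKER_IMAGE_TAG:latest-master}` substitution above reads an environment variable and falls back to a default. A minimal Python sketch of the same lookup, assuming a hypothetical helper name `image_tag`:

```python
import os


def image_tag(environ=None):
    """Return the image tag from the environment, or 'latest-master'.

    Hypothetical helper mirroring the fallback described above; `environ`
    can be passed explicitly to make the function easy to test.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CEPH_DOCKER_IMAGE_TAG", "latest-master")
```

This way the setup and the shrink playbook resolve the same tag unless the variable is overridden.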
6 years agovalidate: expand all jinja2 templates before validating them
Rishabh Dave [Tue, 13 Nov 2018 16:22:44 +0000 (21:52 +0530)]
validate: expand all jinja2 templates before validating them

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agotravis: remove sudo: required
Christian Berendt [Tue, 4 Dec 2018 15:58:25 +0000 (16:58 +0100)]
travis: remove sudo: required

The sudo keyword will be fully deprecated.

https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration
Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
6 years agorevert infra: don't restart firewalld if unit is masked
Guillaume Abrioux [Fri, 30 Nov 2018 16:12:21 +0000 (17:12 +0100)]
revert infra: don't restart firewalld if unit is masked

If firewalld unit is masked, setting `configure_firewall: false` is
enough

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agorolling_update: fail if less than 3 MONs
Ramana Raja [Mon, 3 Dec 2018 14:25:42 +0000 (19:55 +0530)]
rolling_update: fail if less than 3 MONs

... for non-containerized deployments as well.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470
Signed-off-by: Ramana Raja <rraja@redhat.com>
6 years agotest: disable nfs for containers
Sébastien Han [Tue, 4 Dec 2018 09:44:28 +0000 (10:44 +0100)]
test: disable nfs for containers

Based on https://github.com/ceph/ceph-container/pull/1269 and given
there are no stable packages or reliable repository, we disable nfs
ganesha temporarily.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agofix json data type
Sébastien Han [Tue, 4 Dec 2018 08:59:47 +0000 (09:59 +0100)]
fix json data type

JSON is always serialized as a string, whereas before
this we were declaring a dict, which is not a valid JSON structure.

Signed-off-by: Sébastien Han <seb@redhat.com>
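
The string-versus-dict distinction above can be illustrated in Python. This is a sketch of the general point, not the code touched by the commit; the function names are hypothetical.

```python
import json


def to_json_string(value):
    # Serialize a Python structure into a JSON document, which is a string.
    return json.dumps(value)


def is_valid_json(text):
    # json.loads raises TypeError for non-string input such as a dict, and
    # ValueError (JSONDecodeError) for strings that are not valid JSON.
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        return False
```

A dict passed directly, or its `str()` form with single quotes, is rejected, while the output of `to_json_string` is accepted.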
6 years agotest: remove leftover [mgrs]
Sébastien Han [Tue, 4 Dec 2018 08:56:53 +0000 (09:56 +0100)]
test: remove leftover [mgrs]

Since we now collocate mgrs and mons on the same machine we have to
remove the mgrs section; it is not needed anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agotests: add purge_lvm_osds_container scenario
Guillaume Abrioux [Fri, 30 Nov 2018 10:25:25 +0000 (11:25 +0100)]
tests: add purge_lvm_osds_container scenario

This commit adds the purge_lvm_osds_container scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agopurge: add iscsi support
Guillaume Abrioux [Thu, 29 Nov 2018 16:52:18 +0000 (17:52 +0100)]
purge: add iscsi support

add iscsi support for both non containerized and containerized
deployment in purge playbooks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: manage legacy ceph-disk non-container startup
Sébastien Han [Thu, 29 Nov 2018 13:59:25 +0000 (14:59 +0100)]
osd: manage legacy ceph-disk non-container startup

The code is now able (again) to start osds that were configured with
ceph-disk in a non-container scenario.

Closes: https://github.com/ceph/ceph-ansible/issues/3388
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 452069cb3a2d0ee11552f88924474e3608f7d912)

6 years agoosd: re-introduce disk_list check
Sébastien Han [Wed, 28 Nov 2018 23:10:29 +0000 (00:10 +0100)]
osd: re-introduce disk_list check

This commit
https://github.com/ceph/ceph-ansible/commit/4cc1506303739f13bb7a6e1022646ef90e004c90#diff-51bbe3572e46e3b219ad726da44b64ebL13
accidentally removed this check.

This is a must have for ceph-disk based containerized OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9b5a93e3a58bff07ce965ce2d6dabd4060537b5c)

6 years agoosd: discover osd_objectstore on the fly
Sébastien Han [Fri, 30 Nov 2018 10:20:03 +0000 (11:20 +0100)]
osd: discover osd_objectstore on the fly

Applying and passing the OSD_BLUESTORE/FILESTORE on the fly is wrong for
existing clusters as their config will be changed.

Typically, if an OSD was prepared with ceph-disk on filestore and we
change the default objectstore to bluestore, the activation will fail.
The flag osd_objectstore should only be used for the preparation, not
activation. The activation in this case detects the osd objectstore, which
prevents failures like the one described above.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph-osd: change jinja condition
Sébastien Han [Tue, 27 Nov 2018 16:50:44 +0000 (17:50 +0100)]
ceph-osd: change jinja condition

If an existing cluster runs this config, and has ceph-disk OSD, the
`expose_partitions` won't be expected by jinja since it's inside the
'old' if. We need it as part of the osd_scenario != 'lvm' condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1640273
Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph-mgr: refact role for containers
Sébastien Han [Mon, 3 Dec 2018 10:15:30 +0000 (11:15 +0100)]
ceph-mgr: refact role for containers

Now we simplify the invocation of start and remove some code and the
directory 'docker'.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph_key: allow setting 'dest' to a file
Sébastien Han [Mon, 3 Dec 2018 10:59:49 +0000 (11:59 +0100)]
ceph_key: allow setting 'dest' to a file

This is useful in situations where you fetch the key from the mon store
and want to write the file with a different name to a dedicated
directory. This is important when fetching the mgr key, they are created
as mgr.ceph-mon2 but we want them in /var/lib/ceph/mgr/ceph-ceph-mon0/keyring

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomon: do not serialize container bootstrap
Sébastien Han [Fri, 16 Nov 2018 09:50:38 +0000 (10:50 +0100)]
mon: do not serialize container bootstrap

This commit unifies the container and non-container code, which in the
meantime gives us the ability to deploy N mon containers at the same
time without having to serialize the deployment. This will drastically
reduce the time needed to bootstrap the cluster.
Note, this is only possible since Nautilus because the monitors
bootstrap the initial keys on their own once they reach quorum. In the
Nautilus version of the ceph-container mon, we stopped generating the
keys 'manually' from inside the container, for more detail see: https://github.com/ceph/ceph-container/pull/1238

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomgr: only copy keys with dedicated mgr
Sébastien Han [Fri, 26 Oct 2018 12:32:49 +0000 (14:32 +0200)]
mgr: only copy keys with dedicated mgr

When collocating mon and mgr, the mgr container will attempt to create
its own key since it has the admin key at its disposal. Also at this
point there is nothing to fetch since the key is not created by the
mons; as mentioned above, the mgr creates the key on its own.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agosite: collocated mon and mgr by default
Sébastien Han [Tue, 16 Oct 2018 13:40:35 +0000 (15:40 +0200)]
site: collocated mon and mgr by default

This will speed up the deployment and also deploy mon and mgr collocated
just as recommended.
This won't prevent you from adding more dedicated machines for mgr if
needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agodisable nfs scenario
Sébastien Han [Mon, 3 Dec 2018 09:04:38 +0000 (10:04 +0100)]
disable nfs scenario

The packages are broken, so let's remove it until this is solved.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomon: add missing include_tasks instead of import_tasks
Sébastien Han [Sun, 2 Dec 2018 13:48:19 +0000 (14:48 +0100)]
mon: add missing include_tasks instead of import_tasks

This was probably a leftover/mistake so let's fix this and make the file
consistent.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoconfig: add missing bootstrap mgr directory
Sébastien Han [Mon, 26 Nov 2018 10:07:40 +0000 (11:07 +0100)]
config: add missing bootstrap mgr directory

This directory is needed so we can fetch the bootstrap mgr key in it.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomon: default ceph_health_raw to json
Sébastien Han [Mon, 26 Nov 2018 10:08:27 +0000 (11:08 +0100)]
mon: default ceph_health_raw to json

During the first iteration, the command won't return anything, or can
simply fail and might not return a valid json structure. Ansible will
fail parsing it in the filter `from_json` so let's default that variable
to an empty dictionary.

Signed-off-by: Sébastien Han <seb@redhat.com>
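
The defaulting idea described above can be sketched in Python. This is an analogy for the `from_json` behaviour, not the task's actual Jinja expression; the function name `quorum_names` is hypothetical.

```python
import json


def quorum_names(stdout):
    """Return the quorum member names from raw command output.

    Hypothetical helper: empty or missing output is treated as an empty
    JSON object, so parsing does not fail on the first iteration.
    """
    health = json.loads(stdout or "{}")
    return health.get("quorum_names", [])
```

Note that output which is non-empty but malformed would still raise in this sketch; only the empty case is defaulted.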
6 years agocontainer-common: remove old check
Sébastien Han [Mon, 26 Nov 2018 09:56:14 +0000 (10:56 +0100)]
container-common: remove old check

This removes a bit of unnecessary code, the check was always wrong
because of the condition 'not ceph_current_status.get('rc', 1) == 0'
It will never match since `Not` is used for bool and we are checking for
an rc.
Also, even if the check worked, it would be a major blocker when recovering
from a complete meltdown. If the whole platform is shut down then nothing will
be up but files will be present, so this check is definitely wrong.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agorolling-update: remove old condition
Sébastien Han [Fri, 26 Oct 2018 12:13:43 +0000 (14:13 +0200)]
rolling-update: remove old condition

This failure condition was only valid at the time where clusters didn't
have ceph-mgr activated. Now since we collocate the ceph-mgr with the
mon by default, if the daemon wasn't present it will be created during
the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph_volume: fix unit tests
Sébastien Han [Tue, 27 Nov 2018 17:38:37 +0000 (18:38 +0100)]
ceph_volume: fix unit tests

Fix the container_binary to use by mocking the CEPH_CONTAINER_BINARY env
variable.

Signed-off-by: Sébastien Han <seb@redhat.com>
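
Mocking an environment variable in a unit test, as described above, could look like the following sketch. It uses `unittest.mock.patch.dict` from the standard library; the helper `container_binary` and the wrapper `binary_under_mock` are hypothetical and are not the module's real functions.

```python
import os
from unittest import mock


def container_binary():
    # Hypothetical helper: read the container binary name from the
    # environment. Returns None when the variable is unset.
    return os.environ.get("CEPH_CONTAINER_BINARY")


def binary_under_mock(name):
    # patch.dict sets the variable inside the block and restores the
    # original os.environ contents when the block exits.
    with mock.patch.dict(os.environ, {"CEPH_CONTAINER_BINARY": name}):
        return container_binary()
```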
6 years agoceph_key: apply permissions using ansible code module
Sébastien Han [Fri, 16 Nov 2018 09:46:10 +0000 (10:46 +0100)]
ceph_key: apply permissions using ansible code module

Instead of applying file permissions from our code, let's rely on the
ansible code 'file' module for this. This is now handled at the task
declaration level instead of inside the module.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agofw: update rules for mon/mgr collocation
Sébastien Han [Fri, 26 Oct 2018 10:12:20 +0000 (12:12 +0200)]
fw: update rules for mon/mgr collocation

Since we now deploy mgr on mon we need to open fw rules so the mgr can
reach out to the osds.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomon: remove old ubuntu login status
Sébastien Han [Fri, 16 Nov 2018 09:31:57 +0000 (10:31 +0100)]
mon: remove old ubuntu login status

We don't support Ubuntu Precise, so this feature does not exist
anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agosites: fail the playbook on any failure
Sébastien Han [Mon, 26 Nov 2018 10:06:10 +0000 (11:06 +0100)]
sites: fail the playbook on any failure

We need to apply `any_errors_fatal: true` to every play so it can take
effect, not only on the initial pass. With this flag, any error in the
playbook will cause the playbook to stop.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agosite-container: retry image pull
Sébastien Han [Mon, 26 Nov 2018 10:05:13 +0000 (11:05 +0100)]
site-container: retry image pull

Sometimes pulling an image fails because of a network hiccup, so let's retry 5
times at 5-second intervals.

Signed-off-by: Sébastien Han <seb@redhat.com>
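
The retry policy described above (5 attempts, 5 seconds apart) can be sketched as a generic Python helper. This is an illustration of the pattern, not the playbook task; the function name `retry` and its parameters are hypothetical.

```python
import time


def retry(action, retries=5, delay=5.0, sleep=time.sleep):
    """Call `action` until it succeeds or `retries` attempts are used up.

    `sleep` is injectable so the wait can be skipped in tests.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < retries:
                sleep(delay)
    # All attempts failed: surface the last error to the caller.
    raise last_error
```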
6 years agotravis: run modules unit tests
Sébastien Han [Fri, 16 Nov 2018 09:57:14 +0000 (10:57 +0100)]
travis: run modules unit tests

Travis now runs our modules unit tests to make sure they always pass.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agomon: secure cluster on container
Sébastien Han [Fri, 16 Nov 2018 09:29:05 +0000 (10:29 +0100)]
mon: secure cluster on container

Add the ability to protect pools on containerized clusters.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoosd: remove a leftover
Guillaume Abrioux [Sat, 1 Dec 2018 04:46:16 +0000 (05:46 +0100)]
osd: remove a leftover

this file is never included in ceph-osd; it looks like a leftover, so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoosd: remove an incorrect information
Guillaume Abrioux [Sat, 1 Dec 2018 04:24:11 +0000 (05:24 +0100)]
osd: remove an incorrect information

This is false: `./defaults/main.yml` is not supposed to be modified
directly. group_vars and/or host_vars should always be preferred.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoremove kv store support
Guillaume Abrioux [Mon, 26 Nov 2018 13:54:02 +0000 (14:54 +0100)]
remove kv store support

the next stable release will drop this feature.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agorolling_update: create missing keyring only on running mon
Guillaume Abrioux [Thu, 29 Nov 2018 15:04:28 +0000 (16:04 +0100)]
rolling_update: create missing keyring only on running mon

try to create the potentially missing keys only on monitors that are
actually running.
The current node being played is stopped before this task.
By the way, delegating the command on all nodes but the current node
being played ensures that the generated keys will be present on all
monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoAdd missing space before }}
Christian Berendt [Thu, 29 Nov 2018 13:58:13 +0000 (14:58 +0100)]
Add missing space before }}

This will fix the following yamllint warning:

Variables should have spaces after {{ and before }}

Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
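
The spacing rule quoted above can be approximated with a small regular expression. This is a simplified sketch, not the linter's actual implementation; the pattern and the function name `has_bad_spacing` are assumptions made for illustration.

```python
import re

# Match "{{" not followed by a space, or "}}" not preceded by a space.
BAD_SPACING = re.compile(r"\{\{(?! )|(?<! )\}\}")


def has_bad_spacing(line):
    """Return True if a template variable lacks a space inside its braces."""
    return bool(BAD_SPACING.search(line))
```

For example, `{{ cluster}}` is flagged while `{{ cluster }}` is accepted.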
6 years agoconfig: write jinja comment with appropriate syntax
Guillaume Abrioux [Thu, 29 Nov 2018 09:16:52 +0000 (10:16 +0100)]
config: write jinja comment with appropriate syntax

jinja comment should be written using the jinja syntax `{# ... #}`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654441
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agorolling_update: default ceph json output to empty dict
Sébastien Han [Wed, 28 Nov 2018 23:27:49 +0000 (00:27 +0100)]
rolling_update: default ceph json output to empty dict

So we can avoid the following failure:

The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
' failed. The error was: No JSON object could be decoded

We just need to set a default, the next iteration will have a more
complete json since the command won't fail.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agovalidate: change default value for `radosgw_address`
Guillaume Abrioux [Wed, 28 Nov 2018 19:53:10 +0000 (20:53 +0100)]
validate: change default value for `radosgw_address`

change default value of `radosgw_address` to keep consistency with
`monitor_address`.
Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine
if it has to run `check_eth_rgw.yml`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: rgw_multisite allow clusters to talk to each other
Guillaume Abrioux [Wed, 28 Nov 2018 17:46:45 +0000 (18:46 +0100)]
tests: rgw_multisite allow clusters to talk to each other

Adding this rule on the hypervisor will allow the clusters to talk to each
other.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: update default pg num and pool size for podman scenario
Guillaume Abrioux [Wed, 28 Nov 2018 10:30:25 +0000 (11:30 +0100)]
tests: update default pg num and pool size for podman scenario

Bring the recent refactoring of `osd_pool_default_pg_num` and
`osd_pool_default_size` into the podman scenario as well.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: fix image tag for secondary rgw cluster (rgw_multisite)
Guillaume Abrioux [Tue, 27 Nov 2018 13:50:18 +0000 (14:50 +0100)]
tests: fix image tag for secondary rgw cluster (rgw_multisite)

The first cluster is using `latest-master` while the second is using
`latest`, which is not the right version to use here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agotests: apply dev_setup on the secondary cluster for rgw_multisite
Guillaume Abrioux [Tue, 27 Nov 2018 09:26:41 +0000 (10:26 +0100)]
tests: apply dev_setup on the secondary cluster for rgw_multisite

We must apply this playbook before deploying the secondary cluster.
Otherwise, there will be a mismatch between the two deployed clusters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agomgr: fix mgr keyring error on rolling_update
Guillaume Abrioux [Tue, 27 Nov 2018 12:42:41 +0000 (13:42 +0100)]
mgr: fix mgr keyring error on rolling_update

When upgrading from RHCS 2.5 to 3.2, the upgrade fails because the task `create
ceph mgr keyring(s) when mon is containerized` has a when condition
`inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` refers to a
mgr node, so this condition would never have been satisfied.
Then, this condition combined with `serial: 1` causes the mgr keyring creation
to be skipped on the first node. Further, the `ceph-mgr` role tries to copy the
mgr keyring (it is not aware we are running `serial: 1`), which leads to a
failure like the following:

```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018  12:03:34 +0000 (0:00:00.296)       0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```

The ceph_key module is idempotent, so there is no need to have such a
condition.
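
The effect of the removed condition can be traced with a small simulation. This is a hedged Python sketch rather than the playbook logic itself: the group names `mons` and `mgrs`, the host names, and both helper functions are hypothetical.

```python
def keyring_task_runs(inventory_hostname, groups, mon_group_name="mons"):
    """Mimic the removed condition: inventory_hostname == groups[mon_group_name] | last."""
    return inventory_hostname == groups[mon_group_name][-1]


def hosts_skipping_keyring(groups, mgr_group_name="mgrs"):
    """With serial: 1 each mgr host is handled alone; list the hosts that skip the task."""
    return [h for h in groups[mgr_group_name] if not keyring_task_runs(h, groups)]


# Dedicated mgr nodes: the condition never matches, so every mgr skips the task.
dedicated = {"mons": ["mon0", "mon1", "mon2"], "mgrs": ["mgr0", "mgr1"]}

# Collocated mon/mgr nodes: only the last mon matches, so the first nodes skip it.
collocated = {"mons": ["node0", "node1", "node2"], "mgrs": ["node0", "node1", "node2"]}
```

Here `hosts_skipping_keyring(dedicated)` returns both mgr hosts and `hosts_skipping_keyring(collocated)` returns `node0` and `node1`. In either layout the first host processed has no keyring to copy, which matches the failure shown above.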

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoceph-osd fix batch with container binary
Sébastien Han [Tue, 27 Nov 2018 09:03:07 +0000 (10:03 +0100)]
ceph-osd fix batch with container binary

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph_key: fix after rebase
Sébastien Han [Tue, 27 Nov 2018 08:59:07 +0000 (09:59 +0100)]
ceph_key: fix after rebase

Fix the tests

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agofix template generation
Sébastien Han [Mon, 26 Nov 2018 16:22:04 +0000 (17:22 +0100)]
fix template generation

Put the right condition on ceph_docker_version: activate it when
the container_binary is 'docker'.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agocontainer-common: remove leftover
Sébastien Han [Mon, 26 Nov 2018 10:52:04 +0000 (11:52 +0100)]
container-common: remove leftover

ntp installation is managed by the ceph-infra role.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoshrink-osd: add missing CEPH_BINARY
Sébastien Han [Thu, 22 Nov 2018 16:32:25 +0000 (17:32 +0100)]
shrink-osd: add missing CEPH_BINARY

We need to add the right binary to do the docker exec.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agodefaults: play set_radosgw_address.yml only on rgw nodes
Guillaume Abrioux [Thu, 22 Nov 2018 13:38:57 +0000 (14:38 +0100)]
defaults: play set_radosgw_address.yml only on rgw nodes

There is no need to play these tasks on nodes that are not in the rgw
group.

Always playing this code makes `shrink_mon.yml` fail.

Typical error:

```
TASK [ceph-defaults : set_fact _radosgw_address to radosgw_interface - ipv4] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-shrink_mon/roles/ceph-defaults/tasks/set_radosgw_address.yml:21
Thursday 22 November 2018  12:34:51 +0000 (0:00:00.154)       0:00:12.371 *****
fatal: [localhost]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth1'
```

Indeed, `radosgw_interface` is a network interface that exists on rgw nodes
only. The same interface is not expected to exist on `localhost`, so when
running `shrink_mon.yml`, the role `ceph-defaults` is called in
`hosts: localhost` and causes the playbook to fail.
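
The failure and the guard can be reproduced with plain dictionaries. This is a minimal Python sketch under stated assumptions: the `rgws` group name, the host names, the address, and the function `set_radosgw_address` are hypothetical, and the nested `ipv4`/`address` keys follow the usual layout of Ansible interface facts.

```python
def set_radosgw_address(hostvars, groups, host, radosgw_interface, rgw_group_name="rgws"):
    """Resolve the rgw address only for hosts in the rgw group; skip everyone else."""
    if host not in groups.get(rgw_group_name, []):
        # Hosts such as localhost have no fact for this interface, so skip them.
        return None
    fact = hostvars[host]["ansible_" + radosgw_interface]
    return fact["ipv4"]["address"]


hostvars = {
    "rgw0": {"ansible_eth1": {"ipv4": {"address": "192.168.1.10"}}},
    "localhost": {},  # no ansible_eth1 fact here
}
groups = {"rgws": ["rgw0"]}
```

Without the group check, looking up `hostvars["localhost"]["ansible_eth1"]` raises a `KeyError`, which is the Python counterpart of the undefined-variable error above. With the check, `localhost` is skipped and `rgw0` resolves to its address.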

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodefaults: declare container_binary
Sébastien Han [Tue, 20 Nov 2018 21:29:53 +0000 (22:29 +0100)]
defaults: declare container_binary

Always declare container_binary and assign it a correct value.

Signed-off-by: Sébastien Han <seb@redhat.com>