Git - ceph-ansible.git/log (git.apps.os.sepia.ceph.com)
Guillaume Abrioux [Mon, 11 Jun 2018 13:40:54 +0000 (15:40 +0200)]
tests: set CEPH_DOCKER_IMAGE_TAG when ceph release is luminous

Since `latest` points to mimic for the ceph container images, we need to
set `CEPH_DOCKER_IMAGE_TAG` to `latest-luminous` when the ceph release is
luminous.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Mon, 11 Jun 2018 08:16:26 +0000 (10:16 +0200)]
validate: be more explicit with error msg when notario isn't installed

This error message may be confusing and needs to be more explicit about
where notario has to be installed: people may think this library must be
installed on the configured nodes, while it must be installed on the
node you are running the playbook from.

Fixes: #2649
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Konstantin Shalygin [Fri, 8 Jun 2018 18:03:00 +0000 (01:03 +0700)]
ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.

If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
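The guard amounts to a `when` condition on the `set_fact` task; a minimal sketch, assuming the fact is simply derived from `openstack_keys` (task details are illustrative):

```yaml
- name: set_fact openstack_keys_tmp
  set_fact:
    openstack_keys_tmp: "{{ openstack_keys }}"
  when:
    - openstack_config | bool   # skip entirely when openstack_config is false
```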
Vishal Kanaujia [Fri, 8 Jun 2018 07:04:49 +0000 (12:34 +0530)]
Fix to run secure cluster only once in a run

The current secure cluster play runs with all the monitors. The rerun
of this task is unnecessary and can be skipped.

Fixes: #2737
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
Sébastien Han [Fri, 8 Jun 2018 10:01:16 +0000 (18:01 +0800)]
test: only on containerized iscsi

We don't have the same service running on non-containerized deployments
for now. This will change soon, but for now let's only run the test on
containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Fri, 8 Jun 2018 06:49:37 +0000 (08:49 +0200)]
client: keyrings aren't created when single client node

Combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause a bug when the only
node being run is not the first in the group.

In a deployment with a single client node this can be an issue, because
sometimes the keyring won't be created since the task could be skipped
entirely.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
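The two patterns can be sketched like this; the keyring command itself is illustrative, not the role's actual task:

```yaml
# Buggy: if the first client in the group is not part of the current
# play, the task is skipped on it, and `run_once` prevents any other
# host from picking it up.
- name: create cephx keyring
  command: ceph auth get-or-create client.test   # illustrative command
  run_once: true
  when: inventory_hostname == groups.get(client_group_name) | first

# Safer: always execute on the first client through delegation.
- name: create cephx keyring
  command: ceph auth get-or-create client.test   # illustrative command
  run_once: true
  delegate_to: "{{ groups.get(client_group_name) | first }}"
```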
Sébastien Han [Wed, 6 Jun 2018 06:41:46 +0000 (14:41 +0800)]
contrib: fix generate group_vars samples

For the ceph-iscsi-gw and ceph-rbd-mirror roles the group names are (by
default) different from the role names, so we have to change the script
to generate the correct names.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 6 Jun 2018 04:07:33 +0000 (12:07 +0800)]
ceph-iscsi: rename group iscsi_gws

Let's try to avoid using dashes, as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes;
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 29 Mar 2018 10:19:29 +0000 (12:19 +0200)]
ci: add functional tests for iscsi

We test if:

* packages are installed
* services are running
* service units are enabled

Also fix linting issues

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 23 Mar 2018 04:06:58 +0000 (12:06 +0800)]
site-docker: add iscsi role

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 23 Mar 2018 03:27:35 +0000 (11:27 +0800)]
ci: add iscsi test

Add iscsi CI coverage; this will now deploy iscsi gateways in containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 23 Mar 2018 03:24:56 +0000 (11:24 +0800)]
ceph-iscsi: support for containerized deployment

We now have the ability to deploy a containerized version of ceph-iscsi.
The result is similar to the non-containerized version; you simply have
3 containers running for the following services:

* rbd-target-api
* rbd-target-gw
* tcmu-runner

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144
Signed-off-by: Sébastien Han <seb@redhat.com>
Andrew Schoen [Wed, 23 May 2018 14:10:39 +0000 (09:10 -0500)]
pin version of ansible to 2.4 in requirements.txt

This is the latest version that we support. If we don't pin this we
get a 2.5.x version installed that causes the playbook to fail in
various ways.

Fixes: https://github.com/ceph/ceph-ansible/issues/2631
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Andrew Schoen [Wed, 6 Jun 2018 19:59:47 +0000 (14:59 -0500)]
tests: increase ssh timeout and retries in ansible.cfg

We see quite a few failures in the CI related to testing nodes losing
ssh connection. This modification allows ansible to retry more times and
wait longer before timing out. This seems to really affect testing
scenarios that use a large amount of testing nodes. The centos7_cluster
scenario specifically has 12 nodes and suffered from these failures
often.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
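Such a change lives in `ansible.cfg`; the values below are illustrative, not necessarily the ones this commit picked:

```ini
[defaults]
# how long to wait on a connection attempt before giving up
timeout = 60

[ssh_connection]
# how many times ansible retries an ssh connection before failing the host
retries = 10
```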
Guillaume Abrioux [Thu, 7 Jun 2018 14:11:39 +0000 (16:11 +0200)]
tests: update ooo inventory hostfile

Update the inventory host for the tripleo testing scenario so it uses
the same parameters as the tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Thu, 7 Jun 2018 13:49:03 +0000 (15:49 +0200)]
client: add a default value for keyring file

Potential error if someone doesn't pass the mode in the `keys` dict for
client nodes:

```
fatal: [client2]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'

The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: get client cephx keys
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'

```

Adding a default value will avoid the deployment failing because of this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
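The usual fix for a missing dict key is Jinja2's `default` filter; a sketch, where the paths and the fallback mode are assumptions:

```yaml
- name: get client cephx keys
  copy:
    src: "/etc/ceph/{{ cluster }}.{{ item.name }}.keyring"   # illustrative path
    dest: "/etc/ceph/"
    mode: "{{ item.mode | default('0600') }}"   # no longer fails when 'mode' is absent
  with_items: "{{ keys }}"
```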
Guillaume Abrioux [Tue, 5 Jun 2018 21:42:08 +0000 (23:42 +0200)]
tests: add a dummy value for 'dev' release

Functional tests are broken when testing against the 'dev' ceph release.
Adding a dummy value here will make it possible to run the ceph-ansible
CI against the dev ceph release.

Typical error:

```
>       if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']:
E       KeyError: 'dev'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fd1487d93f21b609a637053f5b33cd2a4e408d00)

Andrew Schoen [Mon, 4 Jun 2018 15:36:32 +0000 (10:36 -0500)]
ceph-common: move firewall checks after package installation

We need to do this because on dev or rhcs installs `ceph_stable_release`
is not mandatory, and the firewall checks include a task that is
conditional on the installed version of ceph. If we perform those
checks after package install, they will not fail on dev or rhcs
installs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Guillaume Abrioux [Wed, 6 Jun 2018 11:59:26 +0000 (13:59 +0200)]
client: use dummy created container when there is no mon in inventory

The `docker_exec_cmd` fact set in the client role when there is no
monitor in the inventory is wrong: `ceph-client-{{ hostname }}` is never
created, so it will fail anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 6 Jun 2018 19:56:38 +0000 (21:56 +0200)]
tests: improve mds tests

The expected number of mds daemons consists of the number of daemons
that are 'up' plus the number of daemons that are 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 6 Jun 2018 17:13:18 +0000 (19:13 +0200)]
osd: copy openstack keys over to all mon

When configuring openstack, the created keyrings aren't copied over to
all monitor nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 5 Jun 2018 14:30:12 +0000 (16:30 +0200)]
rolling_update: fix facts gathering delegation

this is kind of follow up on what has been made in #2560.
See #2560 and #2553 for details.

Closes: #2708
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Patrick Donnelly [Mon, 4 Jun 2018 19:58:57 +0000 (12:58 -0700)]
change max_mds default to 1

Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set
on every new FS.

Introduced by: c8573fe0d745e4667b5d757433efec9dac0150bc

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Guillaume Abrioux [Tue, 5 Jun 2018 07:31:42 +0000 (09:31 +0200)]
tests: skip disabling fastest mirror detection on atomic host

There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 5 Jun 2018 09:26:11 +0000 (11:26 +0200)]
tests: fix rgw tests

41b4632 has introduced a change in the functional tests.
Since the admin keyring isn't copied on rgw nodes anymore in tests, let's
use the rgw keyring to run them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Vishal Kanaujia [Tue, 5 Jun 2018 08:01:35 +0000 (13:31 +0530)]
Syntax error fix in rgw multisite role

This commit fixes a syntax error in a `when` clause of the RGW
multisite role.

Fixes: #2704
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
Sébastien Han [Tue, 5 Jun 2018 03:56:55 +0000 (11:56 +0800)]
test: do not always copy admin key

The admin key must be copied on the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where it's being executed.

Now we only copy the key when running on the shrink-osd scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Fri, 1 Jun 2018 15:33:54 +0000 (17:33 +0200)]
rgw: refact rgw pools creation

Refactor of 8704144e3157aa253fb7563fe701d9d434bf2f3e.
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated to a monitor node so we don't have to care
whether the admin keyring is present on the rgw node.
By the way, only one task is needed to create the pools; we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Ha Phan [Mon, 4 Jun 2018 09:36:48 +0000 (17:36 +0800)]
Use python instead of python2

The initial keyring is generated locally on the ansible server and the snippet works well for both v2 and v3 of python.

I don't see any reason why we should explicitly invoke `python2` instead of just `python`.

In some setups, `python2` is not symlinked to `python`, while `python` and `python3` refer to v2 and v3 respectively.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
Sébastien Han [Mon, 4 Jun 2018 02:40:14 +0000 (10:40 +0800)]
ceph-common: add firewall rules for ceph-mgr

Prior to this commit the firewall tasks were not opening the ceph-mgr
ports. This would lead to an unclean configuration since the ceph-mgr
daemons cannot connect to the OSDs.
This commit opens the right ports on the ceph-mgr nodes to talk with the
OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400
Signed-off-by: Sébastien Han <seb@redhat.com>
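A sketch of such a firewall task: ceph daemons (including ceph-mgr) use the 6800-7300/tcp range, but the exact task layout below is an assumption:

```yaml
- name: open ceph-mgr ports
  firewalld:
    port: "6800-7300/tcp"   # port range used by ceph daemons such as ceph-mgr
    zone: public
    permanent: true
    immediate: true
    state: enabled
  when: mgr_group_name in group_names
```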
Vishal Kanaujia [Wed, 30 May 2018 07:55:18 +0000 (13:25 +0530)]
Rolling upgrades should use norebalance flag for OSDs

The rolling upgrades playbook should have the norebalance flag set for
OSD upgrades so that it waits only for recovery.

Fixes: #2657
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
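In playbook form this typically looks like the following sketch; the flag handling around the actual OSD restart is an assumption:

```yaml
- name: set osd flags before upgrading osds
  command: "ceph --cluster {{ cluster }} osd set {{ item }}"
  with_items:
    - noout
    - norebalance
  delegate_to: "{{ groups[mon_group_name][0] }}"

# ... upgrade and restart the OSDs, waiting for recovery ...

- name: unset osd flags after the upgrade
  command: "ceph --cluster {{ cluster }} osd unset {{ item }}"
  with_items:
    - noout
    - norebalance
  delegate_to: "{{ groups[mon_group_name][0] }}"
```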
Erwan Velu [Fri, 1 Jun 2018 16:53:10 +0000 (18:53 +0200)]
ceph-defaults: Enable local epel repository

During the tests, the remote epel repository is generating a lot of
errors leading to broken jobs (issue #2666).

This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl to our local http mirror.

That should speed up the build process but also avoid the random errors
we face.

This patch is part of a patch series that tries to remove all possible yum failures.

Signed-off-by: Erwan Velu <erwan@redhat.com>
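The steps above can be sketched as two tasks; the mirror URL is a placeholder, not the actual internal mirror:

```yaml
- name: install epel-release
  package:
    name: epel-release
    state: present

- name: drop the metalink and enforce a local baseurl
  ini_file:
    dest: /etc/yum.repos.d/epel.repo
    section: epel
    option: "{{ item.option }}"
    value: "{{ item.value | default(omit) }}"
    state: "{{ item.state }}"
  with_items:
    - { option: metalink, state: absent }
    # hypothetical local mirror URL
    - { option: baseurl, value: "http://local-mirror.example.com/epel/7/$basearch", state: present }
```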
jtudelag [Thu, 31 May 2018 15:01:44 +0000 (17:01 +0200)]
rgws: renames create_pools variable with rgw_create_pools.

Renamed to be consistent with the role (rgw) and have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
jtudelag [Sun, 4 Mar 2018 22:06:48 +0000 (23:06 +0100)]
Adds RGWs pool creation to containerized installation.

The ceph command has to be executed from one of the monitor containers
if no admin keyring is present on the RGWs, so the task has to be delegated.

Adds a test to check proper RGW pool creation for Docker container scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
Guillaume Abrioux [Fri, 1 Jun 2018 13:11:21 +0000 (15:11 +0200)]
mons: move set_fact of openstack_keys in ceph-osd

Since openstack_config.yml has been moved to `ceph-osd` we must move
this `set_fact` to ceph-osd as well, otherwise the tasks in
`openstack_config.yml` using `openstack_keys` will actually use the
default value from `ceph-defaults`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1585139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Andrew Schoen [Thu, 31 May 2018 17:02:46 +0000 (12:02 -0500)]
ceph-defaults: add the nautilus 14.x entry to ceph_release_num

The first 14.x tag has been cut so this needs to be added so that
version detection will still work on the master branch of ceph.

Fixes: https://github.com/ceph/ceph-ansible/issues/2671
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Guillaume Abrioux [Fri, 1 Jun 2018 08:38:46 +0000 (10:38 +0200)]
osds: wait for osds to be up before creating pools

This is a follow up on #2628.
Even with the openstack pools creation moved later in the playbook,
there is still an issue because OSDs are not all UP when trying to
create pools.

Adding a task which checks for all OSDs to be UP with a `retries/until`
condition should definitely fix this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
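A sketch of such a check; the JSON field names of `ceph osd stat` and the retry counts are assumptions:

```yaml
- name: wait for all osds to be up
  command: "{{ docker_exec_cmd }} ceph --cluster {{ cluster }} osd stat --format json"
  register: osd_stat
  retries: 60
  delay: 10
  # retry until every registered OSD reports as up
  until: >
    (osd_stat.stdout | from_json).num_osds > 0 and
    (osd_stat.stdout | from_json).num_osds == (osd_stat.stdout | from_json).num_up_osds
  delegate_to: "{{ groups[mon_group_name][0] }}"
```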
Guillaume Abrioux [Thu, 31 May 2018 09:25:49 +0000 (11:25 +0200)]
Makefile: followup on #2585

Fix a typo in the `tag` target: double quotes are missing here.

Without them, the `make tag` command fails like this:

```
if [[ "v3.0.35" ==  ]]; then \
            echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35"; \
            exit 1; \
        fi
/bin/sh: -c: line 0: unexpected argument `]]' to conditional binary operator
/bin/sh: -c: line 0: syntax error near `;'
/bin/sh: -c: line 0: `if [[ "v3.0.35" ==  ]]; then     echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35";     exit 1; fi'
make: *** [tag] Error 2
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
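With the quotes restored, the right-hand side expands safely even when the variable is empty; a sketch with assumed variable names:

```makefile
tag:
	@if [[ "$(VERSION)" == "$(LATEST_TAG)" ]]; then \
	    echo "$(SHA1) on $(BRANCH) is already tagged as $(VERSION)"; \
	    exit 1; \
	fi
```

Without quotes around the second operand, an empty `$(LATEST_TAG)` leaves `[[ "v3.0.35" == ]]`, which is the syntax error shown above.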
Guillaume Abrioux [Tue, 29 May 2018 07:18:08 +0000 (09:18 +0200)]
mdss: do not make pg_num a mandatory params

When playing the ceph-mds role, mon nodes have set a fact with the
default pg num for osd pools, so we can simply default to this value for
cephfs pools (the `cephfs_pools` variable).

At the moment the variable definition for `cephfs_pools` looks like:

```
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "" }
  - { name: "{{ cephfs_metadata }}", pgs: "" }
```

and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.

We can simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
leave users the possibility to override this value.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
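With that default in place, the variable definition would become something like:

```yaml
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "{{ hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'] }}" }
  - { name: "{{ cephfs_metadata }}", pgs: "{{ hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'] }}" }
```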
Guillaume Abrioux [Wed, 30 May 2018 10:09:16 +0000 (12:09 +0200)]
osds: do not set docker_exec_cmd fact

In `ceph-osd` there is no need to set `docker_exec_cmd`, since the only
place where this fact is used is in `openstack_config.yml`, which
delegates all docker commands to a monitor node. That means we need the
`docker_exec_cmd` fact that has been set referring to the `ceph-mon-*`
containers; this fact is already set earlier in `ceph-defaults`.

By the way, when collocating an OSD with a MON it fails because the
container `ceph-osd-{{ ansible_hostname }}` doesn't exist.

Removing this task will allow collocating an OSD with a MON.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1584179
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Mon, 28 May 2018 08:30:42 +0000 (10:30 +0200)]
tests: fix broken symlink

`requirements2.5.txt` is pointing to `tests/requirements2.4.txt` while
it should point to `requirements2.4.txt` since they are in the same
directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 30 May 2018 07:17:09 +0000 (09:17 +0200)]
tests: resize root partition when atomic host

For some time now we have seen failures in the CI for containerized
scenarios because VMs are running out of space at some point.

The default in the images used is to have only 3Gb for the root
partition, which doesn't sound like a lot.

Typical error seen:

```
STDERR:

failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```

Indeed, on the machine we can see:
```
Every 2.0s: df -h                                    Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```

The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.

```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
  Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
  Logical volume atomicos/root successfully resized.
```

```
-bash-4.2# xfs_growfs /
meta-data=/dev/mapper/atomicos-root isize=512    agcount=4, agsize=192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=768000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 768000 to 2543616
```

```
-bash-4.2# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  9.7G  1.4G  8.4G  14% /
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Erwan Velu [Mon, 28 May 2018 13:56:39 +0000 (15:56 +0200)]
CONTRIBUTING.md: Initial release

As per issue #2623, it is important to define commit guidelines.
This commit adds a first version of them.

Fixes: #2653
Signed-off-by: Erwan Velu <erwan@redhat.com>
Guillaume Abrioux [Mon, 28 May 2018 10:02:49 +0000 (12:02 +0200)]
tests: avoid yum failures

In the CI we often see failures like the following:

`Failure talking to yum: Cannot find a valid baseurl for repo:
base/7/x86_64`

It seems the fastest mirror detection is sometimes counterproductive and
leads yum to fail.

This fix has been added in `setup.yml`.
Until now this playbook was only played just before `testinfra`, but it
could be used before running ceph-ansible so we can add some
provisioning tasks.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Erwan Velu <evelu@redhat.com>
Ha Phan [Fri, 25 May 2018 13:07:40 +0000 (21:07 +0800)]
python-netaddr is required to generate ceph.conf

ceph-config: add netaddr to python requirements

netaddr is required to generate ceph.conf, let's add this requirement in `requirements.txt`

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
Sébastien Han [Thu, 10 May 2018 22:57:59 +0000 (15:57 -0700)]
rolling_update: add role ceph-iscsi-gw

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829
Signed-off-by: Sébastien Han <seb@redhat.com>
Paul Cuzner [Fri, 25 May 2018 00:13:20 +0000 (12:13 +1200)]
Add privilege escalation to iscsi purge tasks

Without the escalation, invocation from non-root
users will fail when accessing the rados config
object, or when attempting to log to /var/log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Guillaume Abrioux [Fri, 25 May 2018 00:39:01 +0000 (02:39 +0200)]
mds: move mds fs pools creation

When collocating an mds on a monitor node, the cephfs deployment will
fail because `docker_exec_cmd` is reset to `ceph-mds-monXX`, which is
incorrect because we need to delegate the task to `ceph-mon-monXX`.
In addition, it wouldn't have worked since the `ceph-mds-monXX`
container isn't started yet.

Moving the task earlier in the `ceph-mds` role will fix this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Mon, 16 Apr 2018 13:57:23 +0000 (15:57 +0200)]
rgw: container add option to configure multi-site zone

You can now use RGW_ZONE and RGW_ZONEGROUP on each rgw host from your
inventory and assign them a value. Once the rgw container starts, it'll
pick up the info and add itself to the right zone.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1551637
Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Thu, 24 May 2018 13:07:56 +0000 (15:07 +0200)]
playbook: follow up on #2553

Since we fixed the `gather and delegate facts` task, this exception is
not needed anymore. It's a leftover that should be removed to save some
time when deploying a cluster with a large number of clients.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Wed, 23 May 2018 19:44:24 +0000 (12:44 -0700)]
group_vars: resync group_vars

The previous commit changed the content of roles/$ROLE/defaults/main.yml
so we have to regenerate the group_vars files.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Wed, 23 May 2018 03:07:38 +0000 (05:07 +0200)]
mdss: move cephfs pools creation in ceph-mds

When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move cephfs pools creation in `ceph-mds` role.

[1] https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5673

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 23 May 2018 02:59:37 +0000 (04:59 +0200)]
tests: move cephfs_pools variable

Let's move this variable to group_vars/all.yml in all testing scenarios,
in line with commit 1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4, so we keep
consistency between the playbook and the tests.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 22 May 2018 14:41:40 +0000 (16:41 +0200)]
osds: move openstack pools creation in ceph-osd

When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.

The idea here is to move openstack pools creation at the end of `ceph-osd` role.

[1] https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5673

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 22 May 2018 14:04:15 +0000 (16:04 +0200)]
defaults: resync sample files with actual defaults

6644dba5e3a46a5a8c1cf7e66b97f7b7d62e8e95 and
1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4 introduced some changes in the
defaults variables files, but it seems we forgot to regenerate the
sample files.
This commit aims to resync the content of `all.yml.sample`,
`mons.yml.sample` and `rhcs.yml.sample`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Luigi Toscano [Tue, 22 May 2018 09:46:33 +0000 (11:46 +0200)]
ceph-radosgw: disable NSS PKI db when SSL is disabled

The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a122b7ce8ec6a5c27a70bc19589d177.
Now profiles drive the setting of rgw keystone *.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
Sébastien Han [Fri, 4 May 2018 23:41:49 +0000 (01:41 +0200)]
rhcs: bump version to 3.0 for stable 3.1

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835
Signed-off-by: Sébastien Han <seb@redhat.com>
Vishal Kanaujia [Wed, 16 May 2018 09:58:31 +0000 (15:28 +0530)]
Skip GPT header creation for lvm osd scenario

The LVM lvcreate fails if the disk already has a GPT header.
We create a GPT header regardless of the OSD scenario. The fix is to
skip header creation for the lvm scenario.

fixes: https://github.com/ceph/ceph-ansible/issues/2592

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
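The skip amounts to a `when` guard on the partition-label task, sketched here with assumed task details:

```yaml
- name: create gpt disk label
  command: "parted --script {{ item }} mklabel gpt"
  with_items: "{{ devices }}"
  when: osd_scenario != 'lvm'   # lvcreate handles the disk itself for the lvm scenario
```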
Sébastien Han [Tue, 22 May 2018 23:52:40 +0000 (16:52 -0700)]
rolling_update: fix get fsid for containers

When running ansible2.4-update_docker_cluster there is an issue on the
"get current fsid" task. The current task only works for
non-containerized deployment but will run all the time (even for
containerized). This currently results in the following error:

TASK [get current fsid] ********************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214
Tuesday 22 May 2018  22:48:32 +0000 (0:00:02.615)       0:11:01.035 ***********
fatal: [mgr0 -> mon0]: FAILED! => {
    "changed": true,
    "cmd": [
        "ceph",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:05:00.260674",
    "end": "2018-05-22 22:53:34.555743",
    "rc": 1,
    "start": "2018-05-22 22:48:34.295069"
}

STDERR:

2018-05-22 22:48:34.495651 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000
2018-05-22 22:48:34.495684 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault

This is not really representative of the real error since the 'ceph' cli is available on that machine.
On other environments we will have something like "command not found: ceph".

Signed-off-by: Sébastien Han <seb@redhat.com>
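One way to cover both cases with a single task is to prefix the command with `docker_exec_cmd`, which would be empty on non-containerized deployments; a sketch, not the exact fix from this commit:

```yaml
- name: get current fsid
  # docker_exec_cmd expands to e.g. "docker exec ceph-mon-mon0" in
  # containerized deployments and to nothing otherwise
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} fsid"
  register: current_fsid
  delegate_to: "{{ groups[mon_group_name][0] }}"
```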
Subhachandra Chandra [Fri, 16 Mar 2018 17:10:14 +0000 (10:10 -0700)]
Fix restarting OSDs twice during a rolling update.

During a rolling update, OSDs are restarted twice currently. Once, by the
handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks
in the rolling_update playbook. This change turns off restarts by the handler.
Further, the restart initiated by the rolling_update playbook is more
efficient as it restarts all the OSDs on a host as one operation and waits
for them to rejoin the cluster. The restart task in the handler restarts one
OSD at a time and waits for it to join the cluster.

Alfredo Deza [Mon, 21 May 2018 12:09:00 +0000 (08:09 -0400)]
validate: split schema for lvm osd scenario per objectstore

The bluestore lvm osd scenario does not require a journal entry. For
this reason we need to have separate schemas for bluestore and
filestore, or notario will fail validation for the bluestore lvm
scenario because the journal key does not exist in lvm_volumes.

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit d916246bfeb927779fa920bab2e0cc736128c8a7)

Andrew Schoen [Mon, 21 May 2018 15:11:22 +0000 (10:11 -0500)]
ceph-validate: do not check ceph version on dev or rhcs installs

A dev or rhcs install does not require ceph_stable_release to be set;
instead, it is generated by looking at the installed ceph version.
However, at this point in the playbook ceph may not have been installed
yet and ceph-common has not been run.
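
A hedged sketch of the resulting guard (the variable name and repository values are illustrative of ceph-ansible's ceph_repository options):

```python
# Illustrative guard: dev and rhcs installs derive the release later
# from the installed ceph version, so skip the ceph_stable_release
# check for them at validation time.
def needs_release_check(ceph_repository):
    return ceph_repository not in ("dev", "rhcs")
```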

Fixes: https://github.com/ceph/ceph-ansible/issues/2618
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agopurge_cluster: fix dmcrypt purge
Guillaume Abrioux [Fri, 18 May 2018 15:56:03 +0000 (17:56 +0200)]
purge_cluster: fix dmcrypt purge

dmcrypt devices aren't closed properly; therefore, redeploying after a
purge may fail.

Typical errors:

```
ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command
'/sbin/blkid' returned non-zero exit status 2
```

```
ceph-disk: Error: unable to read dm-crypt key:
/var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf:
/etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key
```

Properly closing dmcrypt devices allows redeploying without error.
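
The fix boils down to tearing down the dm-crypt mappings before wiping. A hedged Python sketch of that step ('cryptsetup luksClose' is the real command; the helper name and runner hook are illustrative):

```python
import subprocess

# Illustrative helper: close leftover dm-crypt mappings so the disks
# can be re-prepared on the next deploy. 'cryptsetup luksClose NAME'
# tears down /dev/mapper/NAME.
def close_dmcrypt_devices(uuids, runner=subprocess.run):
    for uuid in uuids:
        runner(["cryptsetup", "luksClose", uuid], check=False)
```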

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoceph-validate: move system checks from ceph-common to ceph-validate
Andrew Schoen [Thu, 17 May 2018 18:47:27 +0000 (13:47 -0500)]
ceph-validate: move system checks from ceph-common to ceph-validate

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoset the python-notario version to >= 0.0.13 in ceph-ansible.spec.in
Andrew Schoen [Thu, 17 May 2018 18:42:36 +0000 (13:42 -0500)]
set the python-notario version to >= 0.0.13 in ceph-ansible.spec.in

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite.yml: combine validate play with fact gathering play
Andrew Schoen [Tue, 15 May 2018 18:01:54 +0000 (13:01 -0500)]
site.yml: combine validate play with fact gathering play

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agodocs: explain the ceph-validate role and how it validates configuration
Andrew Schoen [Thu, 10 May 2018 16:30:39 +0000 (11:30 -0500)]
docs: explain the ceph-validate role and how it validates configuration

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: support validation of osd_auto_discovery
Andrew Schoen [Wed, 9 May 2018 18:36:35 +0000 (13:36 -0500)]
validate: support validation of osd_auto_discovery

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: remove objectstore from osd options schema
Andrew Schoen [Wed, 9 May 2018 14:36:08 +0000 (09:36 -0500)]
validate: remove objectstore from osd options schema

objectstore is not a valid option; the correct one is osd_objectstore,
and it's already validated in install_options.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-defaults: remove backwards compat for containerized_deployment
Andrew Schoen [Tue, 8 May 2018 14:26:07 +0000 (09:26 -0500)]
ceph-defaults: remove backwards compat for containerized_deployment

The validation module does not get config options with the template
syntax rendered, so we're going to remove that and just default it to
False. The backwards compat was scheduled to be removed in 3.1 anyway.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite-docker: validate config before pulling container images
Andrew Schoen [Thu, 3 May 2018 21:47:54 +0000 (16:47 -0500)]
site-docker: validate config before pulling container images

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: adds a CEPH_RELEASES constant
Andrew Schoen [Thu, 3 May 2018 21:27:44 +0000 (16:27 -0500)]
validate: adds a CEPH_RELEASES constant

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: add support for containerized_deployment
Andrew Schoen [Wed, 2 May 2018 21:11:51 +0000 (16:11 -0500)]
validate: add support for containerized_deployment

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: show an error and stop the playbook when notario is missing
Andrew Schoen [Wed, 2 May 2018 21:06:08 +0000 (16:06 -0500)]
validate: show an error and stop the playbook when notario is missing
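
A hedged sketch of the early check (the function name and message wording are illustrative):

```python
# Fail fast, on the machine running ansible, if notario can't be
# imported, instead of surfacing a raw traceback mid-play.
def ensure_notario():
    try:
        import notario  # noqa: F401
    except ImportError:
        raise SystemExit(
            "python-notario is missing on the host running the "
            "playbook; install it with: pip install notario>=0.0.13")
```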

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite-docker.yml: add config validation play
Andrew Schoen [Wed, 2 May 2018 18:52:28 +0000 (13:52 -0500)]
site-docker.yml: add config validation play

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite.yml: the validation play must use become: true
Andrew Schoen [Wed, 2 May 2018 18:51:39 +0000 (13:51 -0500)]
site.yml: the validation play must use become: true

The ceph-defaults role expects this.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agodocs: add instructions for installing ansible and notario
Andrew Schoen [Wed, 2 May 2018 18:09:59 +0000 (13:09 -0500)]
docs: add instructions for installing ansible and notario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoadds a requirements.txt file for the project
Andrew Schoen [Wed, 2 May 2018 17:55:33 +0000 (12:55 -0500)]
adds a requirements.txt file for the project

With the addition of the validate module we need to ensure
that notario is installed. This will be done with the use
of this requirements.txt file and pip.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agotests: use notario>=0.0.13 when testing
Andrew Schoen [Wed, 2 May 2018 17:54:11 +0000 (12:54 -0500)]
tests: use notario>=0.0.13 when testing

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-defaults: fix failing tasks when osd_scenario was not set correctly
Andrew Schoen [Tue, 1 May 2018 16:22:31 +0000 (11:22 -0500)]
ceph-defaults: fix failing tasks when osd_scenario was not set correctly

When devices is not defined because you want to use the 'lvm'
osd_scenario, but you've made a mistake selecting that scenario, these
tasks should not fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: improve error messages when config fails validation
Andrew Schoen [Tue, 1 May 2018 16:21:48 +0000 (11:21 -0500)]
validate: improve error messages when config fails validation

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite.yml: abort playbook when it fails during config validation
Andrew Schoen [Mon, 30 Apr 2018 20:47:35 +0000 (15:47 -0500)]
site.yml: abort playbook when it fails during config validation

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-defaults: move cephfs vars from the ceph-mon role
Andrew Schoen [Mon, 30 Apr 2018 19:21:12 +0000 (14:21 -0500)]
ceph-defaults: move cephfs vars from the ceph-mon role

We're doing this so we can validate them in the ceph-validate role.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: only validate cephfs_pools on mon nodes
Andrew Schoen [Mon, 30 Apr 2018 18:42:58 +0000 (13:42 -0500)]
validate: only validate cephfs_pools on mon nodes

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: only validate osd config options on osd hosts
Andrew Schoen [Mon, 30 Apr 2018 18:08:49 +0000 (13:08 -0500)]
validate: only validate osd config options on osd hosts

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: only check mon and rgw config if the node is in those groups
Andrew Schoen [Mon, 30 Apr 2018 17:59:07 +0000 (12:59 -0500)]
validate: only check mon and rgw config if the node is in those groups

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite.yml: remove the testing task that fails the playbook run
Andrew Schoen [Mon, 30 Apr 2018 16:39:25 +0000 (11:39 -0500)]
site.yml: remove the testing task that fails the playbook run

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: check rados config options
Andrew Schoen [Mon, 30 Apr 2018 16:04:42 +0000 (11:04 -0500)]
validate: check rados config options

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: make sure ceph_stable_release is set to the correct value
Andrew Schoen [Thu, 26 Apr 2018 20:47:33 +0000 (15:47 -0500)]
validate: make sure ceph_stable_release is set to the correct value
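
A hedged sketch of such a check (the release list is illustrative of this era of Ceph, not exhaustive, and the real module keeps the accepted names in the CEPH_RELEASES constant added earlier):

```python
# Illustrative list of accepted release names.
CEPH_RELEASES = ["jewel", "kraken", "luminous", "mimic"]

def check_stable_release(value):
    if value not in CEPH_RELEASES:
        raise ValueError("ceph_stable_release must be one of %s, got %r"
                         % (", ".join(CEPH_RELEASES), value))
    return value
```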

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-validate: move var checks from ceph-common into this role
Andrew Schoen [Thu, 26 Apr 2018 16:15:02 +0000 (11:15 -0500)]
ceph-validate: move var checks from ceph-common into this role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-validate: move var checks from ceph-osd into this role
Andrew Schoen [Thu, 26 Apr 2018 15:09:47 +0000 (10:09 -0500)]
ceph-validate: move var checks from ceph-osd into this role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-validate: move ceph-mon config checks into this role
Andrew Schoen [Wed, 25 Apr 2018 19:59:54 +0000 (14:59 -0500)]
ceph-validate: move ceph-mon config checks into this role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoadds a new ceph-validate role
Andrew Schoen [Wed, 25 Apr 2018 19:57:27 +0000 (14:57 -0500)]
adds a new ceph-validate role

This will be used to validate config given to ceph-ansible.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: validate osd_scenarios
Andrew Schoen [Mon, 23 Apr 2018 16:06:22 +0000 (11:06 -0500)]
validate: validate osd_scenarios

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: check monitor options
Andrew Schoen [Wed, 18 Apr 2018 18:29:25 +0000 (13:29 -0500)]
validate: check monitor options

validates monitor_address, monitor_address_block and monitor_interface
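
A hedged sketch of the rule (the placeholder defaults are illustrative of ceph-ansible's unset values; exactly one of the three settings should carry a real value):

```python
# Illustrative check: exactly one monitor setting must be changed from
# its placeholder default.
PLACEHOLDERS = {
    "monitor_address": "0.0.0.0",
    "monitor_address_block": "subnet",
    "monitor_interface": "interface",
}

def monitor_source(config):
    provided = [key for key, placeholder in PLACEHOLDERS.items()
                if config.get(key, placeholder) != placeholder]
    if len(provided) != 1:
        raise ValueError("set exactly one of: %s"
                         % ", ".join(sorted(PLACEHOLDERS)))
    return provided[0]
```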

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite.yml: move validate task to its own play
Andrew Schoen [Fri, 13 Apr 2018 15:55:38 +0000 (10:55 -0500)]
site.yml: move validate task to its own play

This needs to be in its own play, with ceph-defaults included,
so that I can validate things that might be defaulted in that
role.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agovalidate: first pass at validating the install options
Andrew Schoen [Wed, 11 Apr 2018 20:03:53 +0000 (15:03 -0500)]
validate: first pass at validating the install options

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agosite: add validation task
Alfredo Deza [Fri, 23 Mar 2018 14:28:11 +0000 (10:28 -0400)]
site: add validation task

Signed-off-by: Alfredo Deza <adeza@redhat.com>
7 years agorpm: add python-notario as a dependency for validation
Alfredo Deza [Fri, 23 Mar 2018 14:21:34 +0000 (10:21 -0400)]
rpm: add python-notario as a dependency for validation

Signed-off-by: Alfredo Deza <adeza@redhat.com>
7 years agolibrary: add a placeholder module for the validate action plugin
Alfredo Deza [Fri, 23 Mar 2018 14:00:54 +0000 (10:00 -0400)]
library: add a placeholder module for the validate action plugin

Signed-off-by: Alfredo Deza <adeza@redhat.com>