git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
7 years ago mdss: move cephfs pools creation in ceph-mds
Guillaume Abrioux [Wed, 23 May 2018 03:07:38 +0000 (05:07 +0200)]
mdss: move cephfs pools creation in ceph-mds

When deploying a large number of OSD nodes, pool creation can fail: the
protection check [1] won't pass because the playbook tries to create pools
before all OSDs are active.

The idea here is to move cephfs pools creation into the `ceph-mds` role.

[1] https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5673

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago tests: move cephfs_pools variable
Guillaume Abrioux [Wed, 23 May 2018 02:59:37 +0000 (04:59 +0200)]
tests: move cephfs_pools variable

Let's move this variable to group_vars/all.yml in all testing scenarios,
in line with commit 1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4, so that the
playbook and the tests stay consistent.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago osds: move openstack pools creation in ceph-osd
Guillaume Abrioux [Tue, 22 May 2018 14:41:40 +0000 (16:41 +0200)]
osds: move openstack pools creation in ceph-osd

When deploying a large number of OSD nodes, pool creation can fail: the
protection check [1] won't pass because the playbook tries to create pools
before all OSDs are active.

The idea here is to move openstack pools creation to the end of the `ceph-osd` role.

[1] https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5673

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago defaults: resync sample files with actual defaults
Guillaume Abrioux [Tue, 22 May 2018 14:04:15 +0000 (16:04 +0200)]
defaults: resync sample files with actual defaults

6644dba5e3a46a5a8c1cf7e66b97f7b7d62e8e95 and
1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4 introduced some changes
in the default variables files, but it seems we forgot to
regenerate the sample files.
This commit resyncs the content of `all.yml.sample`,
`mons.yml.sample` and `rhcs.yml.sample`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago ceph-radosgw: disable NSS PKI db when SSL is disabled
Luigi Toscano [Tue, 22 May 2018 09:46:33 +0000 (11:46 +0200)]
ceph-radosgw: disable NSS PKI db when SSL is disabled

The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a122b7ce8ec6a5c27a70bc19589d177.
Now profiles drive the setting of rgw keystone *.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
7 years ago rhcs: bump version to 3.0 for stable 3.1
Sébastien Han [Fri, 4 May 2018 23:41:49 +0000 (01:41 +0200)]
rhcs: bump version to 3.0 for stable 3.1

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago Skip GPT header creation for lvm osd scenario
Vishal Kanaujia [Wed, 16 May 2018 09:58:31 +0000 (15:28 +0530)]
Skip GPT header creation for lvm osd scenario

The LVM lvcreate command fails if the disk already has a GPT header.
We currently create a GPT header regardless of the OSD scenario. The fix
is to skip header creation for the lvm scenario.

fixes: https://github.com/ceph/ceph-ansible/issues/2592

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
7 years ago rolling_update: fix get fsid for containers
Sébastien Han [Tue, 22 May 2018 23:52:40 +0000 (16:52 -0700)]
rolling_update: fix get fsid for containers

When running ansible2.4-update_docker_cluster there is an issue on the
"get current fsid" task. The current task only works for
non-containerized deployment but will run all the time (even for
containerized). This currently results in the following error:

TASK [get current fsid] ********************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214
Tuesday 22 May 2018  22:48:32 +0000 (0:00:02.615)       0:11:01.035 ***********
fatal: [mgr0 -> mon0]: FAILED! => {
    "changed": true,
    "cmd": [
        "ceph",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:05:00.260674",
    "end": "2018-05-22 22:53:34.555743",
    "rc": 1,
    "start": "2018-05-22 22:48:34.295069"
}

STDERR:

2018-05-22 22:48:34.495651 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000
2018-05-22 22:48:34.495684 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault

This is not really representative of the real error, since the 'ceph' CLI is available on that machine.
On other environments we will have something like "command not found: ceph".

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago Fix restarting OSDs twice during a rolling update.
Subhachandra Chandra [Fri, 16 Mar 2018 17:10:14 +0000 (10:10 -0700)]
Fix restarting OSDs twice during a rolling update.

During a rolling update, OSDs are currently restarted twice: once by the
handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks
in the rolling_update playbook. This change turns off restarts by the handler.
Further, the restart initiated by the rolling_update playbook is more
efficient as it restarts all the OSDs on a host as one operation and waits
for them to rejoin the cluster. The restart task in the handler restarts one
OSD at a time and waits for it to join the cluster.

7 years ago validate: split schema for lvm osd scenario per objectstore
Alfredo Deza [Mon, 21 May 2018 12:09:00 +0000 (08:09 -0400)]
validate: split schema for lvm osd scenario per objectstore

The bluestore lvm osd scenario does not require a journal entry. For
this reason we need to have a separate schema for that and filestore or
notario will fail validation for the bluestore lvm scenario because the
journal key does not exist in lvm_volumes.
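
As a rough illustration of the split, a per-objectstore check could look like
the sketch below. It is plain Python rather than the notario schema used by
the validate plugin, and the required-key sets and the `missing_keys` helper
are assumptions made for the example. Only the fact that bluestore does not
need a `journal` entry comes from this commit message.

```python
# Hypothetical required keys per objectstore. Only "journal" being
# filestore-specific is taken from the commit message above.
REQUIRED_KEYS = {
    "filestore": {"data", "journal"},
    "bluestore": {"data"},
}


def missing_keys(lvm_volume, objectstore):
    """Return the sorted required keys absent from one lvm_volumes entry."""
    return sorted(REQUIRED_KEYS[objectstore] - set(lvm_volume))


volume = {"data": "data-lv1", "data_vg": "vg1"}
print(missing_keys(volume, "bluestore"))
print(missing_keys(volume, "filestore"))
```

With a single shared schema, the second call's result would be reported for
bluestore entries as well, which is the failure described above.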

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit d916246bfeb927779fa920bab2e0cc736128c8a7)

7 years ago ceph-validate: do not check ceph version on dev or rhcs installs
Andrew Schoen [Mon, 21 May 2018 15:11:22 +0000 (10:11 -0500)]
ceph-validate: do not check ceph version on dev or rhcs installs

A dev or rhcs install does not require ceph_stable_release to be set and
instead derives it from the installed ceph version.
However, at this point in the playbook ceph may not have been installed
yet and ceph-common has not been run.

Fixes: https://github.com/ceph/ceph-ansible/issues/2618
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago purge_cluster: fix dmcrypt purge
Guillaume Abrioux [Fri, 18 May 2018 15:56:03 +0000 (17:56 +0200)]
purge_cluster: fix dmcrypt purge

dmcrypt devices aren't closed properly, so a redeployment after a purge
may fail.

Typical errors:

```
ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command
'/sbin/blkid' returned non-zero exit status 2
```

```
ceph-disk: Error: unable to read dm-crypt key:
/var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf:
/etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key
```

Closing dmcrypt devices properly allows redeploying without error.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago ceph-validate: move system checks from ceph-common to ceph-validate
Andrew Schoen [Thu, 17 May 2018 18:47:27 +0000 (13:47 -0500)]
ceph-validate: move system checks from ceph-common to ceph-validate

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago set the python-notario version to >= 0.0.13 in ceph-ansible.spec.in
Andrew Schoen [Thu, 17 May 2018 18:42:36 +0000 (13:42 -0500)]
set the python-notario version to >= 0.0.13 in ceph-ansible.spec.in

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site.yml: combine validate play with fact gathering play
Andrew Schoen [Tue, 15 May 2018 18:01:54 +0000 (13:01 -0500)]
site.yml: combine validate play with fact gathering play

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago docs: explain the ceph-validate role and how it validates configuration
Andrew Schoen [Thu, 10 May 2018 16:30:39 +0000 (11:30 -0500)]
docs: explain the ceph-validate role and how it validates configuration

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: support validation of osd_auto_discovery
Andrew Schoen [Wed, 9 May 2018 18:36:35 +0000 (13:36 -0500)]
validate: support validation of osd_auto_discovery

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: remove objectstore from osd options schema
Andrew Schoen [Wed, 9 May 2018 14:36:08 +0000 (09:36 -0500)]
validate: remove objectstore from osd options schema

objectstore is not a valid option; the option is osd_objectstore, and it
is already validated in install_options.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago ceph-defaults: remove backwards compat for containerized_deployment
Andrew Schoen [Tue, 8 May 2018 14:26:07 +0000 (09:26 -0500)]
ceph-defaults: remove backwards compat for containerized_deployment

The validation module does not get config options with the template
syntax rendered, so we're going to remove that and just default it to
False. The backwards compat was scheduled to be removed in 3.1 anyway.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site-docker: validate config before pulling container images
Andrew Schoen [Thu, 3 May 2018 21:47:54 +0000 (16:47 -0500)]
site-docker: validate config before pulling container images

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: adds a CEPH_RELEASES constant
Andrew Schoen [Thu, 3 May 2018 21:27:44 +0000 (16:27 -0500)]
validate: adds a CEPH_RELEASES constant

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: add support for containerized_deployment
Andrew Schoen [Wed, 2 May 2018 21:11:51 +0000 (16:11 -0500)]
validate: add support for containerized_deployment

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: show an error and stop the playbook when notario is missing
Andrew Schoen [Wed, 2 May 2018 21:06:08 +0000 (16:06 -0500)]
validate: show an error and stop the playbook when notario is missing

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site-docker.yml: add config validation play
Andrew Schoen [Wed, 2 May 2018 18:52:28 +0000 (13:52 -0500)]
site-docker.yml: add config validation play

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site.yml: the validation play must use become: true
Andrew Schoen [Wed, 2 May 2018 18:51:39 +0000 (13:51 -0500)]
site.yml: the validation play must use become: true

The ceph-defaults role expects this.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago docs: add instructions for installing ansible and notario
Andrew Schoen [Wed, 2 May 2018 18:09:59 +0000 (13:09 -0500)]
docs: add instructions for installing ansible and notario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago adds a requirements.txt file for the project
Andrew Schoen [Wed, 2 May 2018 17:55:33 +0000 (12:55 -0500)]
adds a requirements.txt file for the project

With the addition of the validate module we need to ensure
that notario is installed. This will be done with the use
of this requirements.txt file and pip.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago tests: use notario>=0.0.13 when testing
Andrew Schoen [Wed, 2 May 2018 17:54:11 +0000 (12:54 -0500)]
tests: use notario>=0.0.13 when testing

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago ceph-defaults: fix failing tasks when osd_scenario was not set correctly
Andrew Schoen [Tue, 1 May 2018 16:22:31 +0000 (11:22 -0500)]
ceph-defaults: fix failing tasks when osd_scenario was not set correctly

When `devices` is not defined because you want to use the 'lvm'
osd_scenario, but you've made a mistake selecting that scenario, these
tasks should not fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: improve error messages when config fails validation
Andrew Schoen [Tue, 1 May 2018 16:21:48 +0000 (11:21 -0500)]
validate: improve error messages when config fails validation

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site.yml: abort playbook when it fails during config validation
Andrew Schoen [Mon, 30 Apr 2018 20:47:35 +0000 (15:47 -0500)]
site.yml: abort playbook when it fails during config validation

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago ceph-defaults: move cephfs vars from the ceph-mon role
Andrew Schoen [Mon, 30 Apr 2018 19:21:12 +0000 (14:21 -0500)]
ceph-defaults: move cephfs vars from the ceph-mon role

We're doing this so we can validate this in the ceph-validate role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: only validate cephfs_pools on mon nodes
Andrew Schoen [Mon, 30 Apr 2018 18:42:58 +0000 (13:42 -0500)]
validate: only validate cephfs_pools on mon nodes

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: only validate osd config options on osd hosts
Andrew Schoen [Mon, 30 Apr 2018 18:08:49 +0000 (13:08 -0500)]
validate: only validate osd config options on osd hosts

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: only check mon and rgw config if the node is in those groups
Andrew Schoen [Mon, 30 Apr 2018 17:59:07 +0000 (12:59 -0500)]
validate: only check mon and rgw config if the node is in those groups

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site.yml: remove the testing task that fails the playbook run
Andrew Schoen [Mon, 30 Apr 2018 16:39:25 +0000 (11:39 -0500)]
site.yml: remove the testing task that fails the playbook run

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: check rados config options
Andrew Schoen [Mon, 30 Apr 2018 16:04:42 +0000 (11:04 -0500)]
validate: check rados config options

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: make sure ceph_stable_release is set to the correct value
Andrew Schoen [Thu, 26 Apr 2018 20:47:33 +0000 (15:47 -0500)]
validate: make sure ceph_stable_release is set to the correct value

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago ceph-validate: move var checks from ceph-common into this role
Andrew Schoen [Thu, 26 Apr 2018 16:15:02 +0000 (11:15 -0500)]
ceph-validate: move var checks from ceph-common into this role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago ceph-validate: move var checks from ceph-osd into this role
Andrew Schoen [Thu, 26 Apr 2018 15:09:47 +0000 (10:09 -0500)]
ceph-validate: move var checks from ceph-osd into this role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago ceph-validate: move ceph-mon config checks into this role
Andrew Schoen [Wed, 25 Apr 2018 19:59:54 +0000 (14:59 -0500)]
ceph-validate: move ceph-mon config checks into this role

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago adds a new ceph-validate role
Andrew Schoen [Wed, 25 Apr 2018 19:57:27 +0000 (14:57 -0500)]
adds a new ceph-validate role

This will be used to validate config given to ceph-ansible.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: validate osd_scenarios
Andrew Schoen [Mon, 23 Apr 2018 16:06:22 +0000 (11:06 -0500)]
validate: validate osd_scenarios

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: check monitor options
Andrew Schoen [Wed, 18 Apr 2018 18:29:25 +0000 (13:29 -0500)]
validate: check monitor options

validates monitor_address, monitor_address_block and monitor_interface

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site.yml: move validate task to its own play
Andrew Schoen [Fri, 13 Apr 2018 15:55:38 +0000 (10:55 -0500)]
site.yml: move validate task to its own play

This needs to be in its own play with ceph-defaults included
so that I can validate things that might be defaulted in that
role.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago validate: first pass at validating the install options
Andrew Schoen [Wed, 11 Apr 2018 20:03:53 +0000 (15:03 -0500)]
validate: first pass at validating the install options

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years ago site: add validation task
Alfredo Deza [Fri, 23 Mar 2018 14:28:11 +0000 (10:28 -0400)]
site: add validation task

Signed-off-by: Alfredo Deza <adeza@redhat.com>
7 years ago rpm: add python-notario as a dependency for validation
Alfredo Deza [Fri, 23 Mar 2018 14:21:34 +0000 (10:21 -0400)]
rpm: add python-notario as a dependency for validation

Signed-off-by: Alfredo Deza <adeza@redhat.com>
7 years ago library: add a placeholder module for the validate action plugin
Alfredo Deza [Fri, 23 Mar 2018 14:00:54 +0000 (10:00 -0400)]
library: add a placeholder module for the validate action plugin

Signed-off-by: Alfredo Deza <adeza@redhat.com>
7 years ago plugins: create an action plugin for validation using notario
Alfredo Deza [Fri, 23 Mar 2018 13:57:28 +0000 (09:57 -0400)]
plugins: create an action plugin for validation using notario

Signed-off-by: Alfredo Deza <adeza@redhat.com>
7 years ago defaults: restart_osd_daemon unit spaces
Sébastien Han [Fri, 18 May 2018 12:43:57 +0000 (14:43 +0200)]
defaults: restart_osd_daemon unit spaces

Extra space in systemctl list-units can cause restart_osd_daemon.sh to
fail.

It looks like when more services are enabled on the node, the gap
between "loaded" and "active" grows wider than the single space
expected by the command [1].
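
The matching problem can be sketched in Python as follows. This only
illustrates the idea of tolerating a variable amount of whitespace between
columns; it is not the actual change made to restart_osd_daemon.sh (a shell
script), and the sample rows and the `is_loaded_active` helper are made up
for the example.

```python
import re

# Accept any run of whitespace between the two columns instead of
# assuming exactly one space.
LOADED_ACTIVE = re.compile(r"\bloaded\s+active\b")


def is_loaded_active(line):
    """True when a list-units row shows the unit as loaded and active."""
    return LOADED_ACTIVE.search(line) is not None


# Hypothetical rows: the second one has wider column padding.
narrow = "ceph-osd@0.service loaded active running Ceph object storage daemon"
wide = "ceph-osd@0.service   loaded    active   running Ceph object storage daemon"
print(is_loaded_active(narrow), is_loaded_active(wide))
```

A literal match on "loaded active" with one space would accept the first row
and miss the second.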

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago Do nothing when mgr module is in good state
Michael Vollman [Thu, 17 May 2018 19:17:29 +0000 (15:17 -0400)]
Do nothing when mgr module is in good state

Check whether a mgr module is supposed to be disabled before disabling
it and whether it is already enabled before enabling it.
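
A minimal sketch of that check, assuming the currently enabled modules and
the desired modules are both available as lists (the function name and the
sample module names are illustrative, not taken from the role):

```python
def plan_module_changes(enabled, wanted):
    """Return (to_enable, to_disable); modules already in the desired
    state appear in neither list and are left untouched."""
    enabled, wanted = set(enabled), set(wanted)
    return sorted(wanted - enabled), sorted(enabled - wanted)


to_enable, to_disable = plan_module_changes(
    enabled=["restful", "status"],
    wanted=["dashboard", "status"],
)
print(to_enable, to_disable)
```

Here `status` is already enabled and wanted, so no command needs to be run
for it.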

Signed-off-by: Michael Vollman <michael.b.vollman@gmail.com>
7 years ago take-over: fix bug when trying to override variable
Guillaume Abrioux [Thu, 17 May 2018 15:29:20 +0000 (17:29 +0200)]
take-over: fix bug when trying to override variable

A customer faced an issue when trying to override `monitor_interface`
in the inventory host file.
In their use case, all nodes except one had the same interface name for
`monitor_interface`. They therefore tried to override this variable for
that node in the inventory host file, but the
take-over-existing-cluster playbook failed when generating the new
ceph.conf file because of an undefined variable.

Typical error:

```
fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"}
```

Including variables with `include_vars: group_vars/all.yml` prevents
us from overriding anything in the inventory host file, because it
overwrites everything you would have defined in the inventory.
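
The effect can be modelled with a simplified two-level merge. This is only a
sketch of Ansible's variable precedence, in which `include_vars` ranks above
inventory host variables; the `eth1` override is a hypothetical value, since
the customer's actual interface name is not given here.

```python
def effective_vars(inventory_host_vars, included_vars):
    """Merge in precedence order: later sources override earlier ones."""
    merged = dict(inventory_host_vars)
    merged.update(included_vars)  # include_vars outranks inventory host vars
    return merged


group_all = {"monitor_interface": "bond0.15"}
host_override = {"monitor_interface": "eth1"}  # hypothetical per-host value

print(effective_vars(host_override, group_all)["monitor_interface"])
```

The per-host value is lost, so the template looks up facts for an interface
the host does not have.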

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago Adding mgr_vms variable
Ha Phan [Thu, 17 May 2018 15:26:08 +0000 (23:26 +0800)]
Adding mgr_vms variable

7 years ago Fix template reference for ganesha.conf
Andy McCrae [Mon, 19 Feb 2018 16:57:18 +0000 (16:57 +0000)]
Fix template reference for ganesha.conf

We can simply reference the template name since it exists within the
role that we are calling. We don't need to check the ANSIBLE_ROLE_PATH
or playbooks directory for the file.

7 years ago switch: disable ceph-disk units
Sébastien Han [Wed, 16 May 2018 15:37:10 +0000 (17:37 +0200)]
switch: disable ceph-disk units

During the transition from jewel non-container to container, old ceph
units are disabled. ceph-disk units can still remain in some cases and
will appear as 'loaded failed'. This is not a problem, although operators
might not like to see these units failing. That's why we remove them if
we find them.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago purge_cluster: wipe all partitions
Guillaume Abrioux [Wed, 16 May 2018 15:34:38 +0000 (17:34 +0200)]
purge_cluster: wipe all partitions

In order to ensure there is no leftover after having purged a cluster,
we must wipe all partitions properly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago purge_cluster: fix bug when building device list
Guillaume Abrioux [Wed, 16 May 2018 14:04:25 +0000 (16:04 +0200)]
purge_cluster: fix bug when building device list

there are leftovers on devices when purging osds because of an invalid
device list construction.

typical error:
```
changed: [osd3] => (item=/dev/sda sda1) => {
    "changed": true,
    "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600",
    "delta": "0:00:00.015188",
    "end": "2018-05-16 12:41:40.408597",
    "item": "/dev/sda sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:40.393409"
}

STDOUT:

sgdisk -Z /dev/sda sda1
dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200
udevadm settle --timeout=600

STDERR:

Error: Could not stat device /dev/sda sda1 - No such file or directory.
```

the devices list in the task `resolve parent device` isn't built
properly because the command used to resolve the parent device doesn't
return the expected output

eg:

```
changed: [osd3] => (item=/dev/sda1) => {
    "changed": true,
    "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")",
    "delta": "0:00:00.013634",
    "end": "2018-05-16 12:41:09.068166",
    "item": "/dev/sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:09.054532"
}

STDOUT:

/dev/sda sda1
```

For instance, it results in a devices list like:
`['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']`
where we expect to have:
`['/dev/sda', '/dev/sdb', '/dev/sdc']`
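
One way to picture the expected behaviour is the Python sketch below. It
assumes the extra token appears because `lsblk -no pkname` printed more than
one name and that the first name is the parent, which matches the expected
list above. It is not the change made in the playbook, and both helper names
are hypothetical.

```python
def parent_device(pkname_output):
    """Build /dev/<parent> from the first name in the lsblk output."""
    names = pkname_output.split()
    if not names:
        raise ValueError("lsblk returned no parent device name")
    return "/dev/" + names[0]


def parent_devices(outputs):
    """Resolve every output and drop duplicates, keeping the order."""
    devices = []
    for output in outputs:
        device = parent_device(output)
        if device not in devices:
            devices.append(device)
    return devices


print(parent_devices(["sda\nsda1", "sdb", "sdc\nsdc1"]))
```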

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago rolling_update: move osd flag section
Sébastien Han [Wed, 16 May 2018 14:02:41 +0000 (16:02 +0200)]
rolling_update: move osd flag section

During a minor update from one jewel version to a higher one (10.2.9 to
10.2.10 for example), osd flags don't get applied because they were set
in the mgr section, which is skipped in jewel since this daemon does not
exist.
Moving the set flag section after all the mons have been updated solves
that problem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago Makefile: add "make tag" command
Ken Dreyer [Thu, 10 May 2018 23:08:05 +0000 (17:08 -0600)]
Makefile: add "make tag" command

Add a new "make tag" command. This automates some common operations:

1) Automatically determine the next Git tag version number to create.
   For example:
   "3.2.0beta1" -> "3.2.0beta2"
   "3.2.0rc1" -> "3.2.0rc2"
   "3.2.0" -> "3.2.1"

2) Create the Git tag, and print instructions for the user to push it to
   GitHub.

3) Sanity check that HEAD is a stable-* branch or master (bail on
   everything else).

4) Sanity check that HEAD is not already tagged.

Note, we will still need to tag manually once each time we change the
format, for example when moving from tagging "betas" to tagging "rcs",
or "rcs" to "stable point releases".
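
The version bump in step 1 can be sketched in Python as below. This is an
illustration of the three examples above, not the Makefile's implementation;
the regular expressions, the optional leading "v" and the `next_tag` name
are assumptions.

```python
import re

# Tag shapes taken from the examples: X.Y.Z, X.Y.ZbetaN and X.Y.ZrcN.
PRERELEASE = re.compile(r"^(\d+\.\d+\.\d+)(beta|rc)(\d+)$")
STABLE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def next_tag(tag):
    """Return the next tag in the same series, or raise on unknown formats."""
    prefix = "v" if tag.startswith("v") else ""  # assumed optional prefix
    body = tag[len(prefix):]
    match = PRERELEASE.match(body)
    if match:
        base, kind, number = match.groups()
        return f"{prefix}{base}{kind}{int(number) + 1}"
    match = STABLE.match(body)
    if match:
        major, minor, patch = match.groups()
        return f"{prefix}{major}.{minor}.{int(patch) + 1}"
    raise ValueError(f"unrecognized tag format: {tag!r}")


for tag in ("3.2.0beta1", "3.2.0rc1", "3.2.0"):
    print(tag, "->", next_tag(tag))
```

Because the bump stays within one series, the switch from betas to rcs or
from rcs to point releases still needs a manual tag, as noted above.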

Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago contrib: check for lt 3 arguments
Sébastien Han [Wed, 16 May 2018 17:03:33 +0000 (19:03 +0200)]
contrib: check for lt 3 arguments

The script now supports 3 or 4 arguments so we need to check if the
script has fewer than 3 args.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago contrib: ability to set a prefix on backport script
Sébastien Han [Wed, 16 May 2018 08:38:24 +0000 (10:38 +0200)]
contrib: ability to set a prefix on backport script

When pushing a PR it might be handy to set the [skip ci] flag if we know
upfront the content should not trigger the CI.

Now you can add [skip ci] as $4 in your command line.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago Install packages as a list
Andy McCrae [Tue, 1 May 2018 11:20:19 +0000 (12:20 +0100)]
Install packages as a list

To make the package installation more efficient we should install
packages as a list rather than as individual tasks or using a
"with_items" loop. The package managers can handle a list passed to them
to install in one go.

We can use a specified list and substitute any packages that are not to
be installed with the ceph-common package, which is installed on every
package install, then apply the unique filter to the package install
list.
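
The substitution trick can be sketched in Python as follows. The
`(name, wanted)` pairs and the package names other than ceph-common are
assumptions made for the example; in the roles this is done with Jinja2
expressions and the `unique` filter rather than Python code.

```python
def package_list(candidates):
    """candidates is a list of (package_name, wanted) pairs.

    Unwanted packages are replaced by ceph-common, which is always
    installed, and duplicates are then dropped while keeping the order.
    """
    substituted = [name if wanted else "ceph-common" for name, wanted in candidates]
    return list(dict.fromkeys(substituted))


print(package_list([
    ("ceph-common", True),
    ("ceph-mon", True),
    ("ceph-osd", False),  # not wanted on this host
    ("ceph-mds", False),  # not wanted on this host
]))
```

The resulting list can be handed to the package manager in a single call.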

7 years ago mon: refactor of mgr key fetching
Guillaume Abrioux [Tue, 15 May 2018 10:20:08 +0000 (12:20 +0200)]
mon: refactor of mgr key fetching

There is no need to stat for created mgr keyrings since they are created
anyway when deploying a ceph cluster > jewel. In case of a jewel
deployment we won't enter that block.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago mgr: delete copy_configs.yml (containerized)
Guillaume Abrioux [Tue, 15 May 2018 09:51:53 +0000 (11:51 +0200)]
mgr: delete copy_configs.yml (containerized)

This file is a leftover from PR ceph/ceph-ansible#2516
It is not used anymore so it can be removed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago rolling_update: fix dest path for mgr keys fetching
Guillaume Abrioux [Tue, 15 May 2018 09:41:26 +0000 (11:41 +0200)]
rolling_update: fix dest path for mgr keys fetching

the role `ceph-mgr` that is played later in the playbook fails because
the destination path for the fetched keys is wrong.
This patch fixes the destination path used in the task `fetch ceph mgr
key(s)` so there is no mismatch.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago rolling_update: get fsid in mgr pre_task
Guillaume Abrioux [Fri, 11 May 2018 06:05:11 +0000 (08:05 +0200)]
rolling_update: get fsid in mgr pre_task

{{ fsid }} points to {{ cluster_uuid.stdout }} which is not defined in
this part of the rolling_update playbook.
Since we need to call {{ fsid }} we must get the fsid and register it to
`cluster_uuid`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago rolling_update: move mgr key creation
Sébastien Han [Thu, 10 May 2018 17:38:55 +0000 (10:38 -0700)]
rolling_update: move mgr key creation

Until all the mons have been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are, the key creation is done after the mons are upgraded to
Luminous.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago Revert "mon: fix mgr keyring creation when upgrading from jewel"
Sébastien Han [Thu, 10 May 2018 17:02:44 +0000 (10:02 -0700)]
Revert "mon: fix mgr keyring creation when upgrading from jewel"

This reverts commit 259fae931d77f056b7e1077b023710cfab1e5cca.

7 years ago iscsi-gw: fix issue when trying to mask target
Guillaume Abrioux [Mon, 14 May 2018 15:39:25 +0000 (17:39 +0200)]
iscsi-gw: fix issue when trying to mask target

trying to mask target when `/etc/systemd/system/target.service` doesn't
exist seems to be a bug.
There is no need to mask a unit file which doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago iscsi: add python-rtslib repository
Sébastien Han [Mon, 14 May 2018 07:21:48 +0000 (09:21 +0200)]
iscsi: add python-rtslib repository

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago Allow os_tuning_params to overwrite fs.aio-max-nr
Andy McCrae [Thu, 10 May 2018 10:15:30 +0000 (11:15 +0100)]
Allow os_tuning_params to overwrite fs.aio-max-nr

The order of fs.aio-max-nr (which is hard-coded to 1048576) means that
if you set fs.aio-max-nr in os_tuning_params it will effectively be
ignored for bluestore scenarios.

To resolve this we should move the setting of fs.aio-max-nr above the
setting of os_tuning_params, in this way the operator can define the
value of fs.aio-max-nr to be something other than 1048576 if they want
to.

Additionally, we can make the sysctl settings happen in 1 task rather
than multiple.
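
The ordering argument can be illustrated with a short Python sketch. It
assumes `os_tuning_params` is a list of name/value entries and that settings
are applied in list order so that the last value for a key wins; the helper
name and the override value are hypothetical.

```python
# Hard-coded default mentioned above.
DEFAULT_AIO_MAX_NR = {"name": "fs.aio-max-nr", "value": "1048576"}


def sysctl_settings(os_tuning_params):
    """Apply the default first, then the operator's parameters.

    Later entries for the same key replace earlier ones, so an
    operator-supplied fs.aio-max-nr takes effect.
    """
    applied = {}
    for param in [DEFAULT_AIO_MAX_NR] + list(os_tuning_params):
        applied[param["name"]] = param["value"]
    return applied


override = [{"name": "fs.aio-max-nr", "value": "2097152"}]  # hypothetical
print(sysctl_settings(override)["fs.aio-max-nr"])
print(sysctl_settings([])["fs.aio-max-nr"])
```

Building one combined list also matches the idea of applying all sysctl
settings in a single task.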

7 years ago Makefile: bail out on unknown Git tag formats
Ken Dreyer [Thu, 10 May 2018 20:39:07 +0000 (14:39 -0600)]
Makefile: bail out on unknown Git tag formats

Prior to this change, if we created entirely new Git tag patterns like
"3.2.0alpha" or "3.2.0foobar", the Makefile would incorrectly translate
the Git tag name into a Name-Version-Release that would prevent upgrades
to "newer" versions.

This happened for example in
https://bugs.centos.org/view.php?id=14593, "Incorrect naming scheme for
a build of ceph-ansible prevents subsequent updates to be installed"

If we encounter a new Git tag format that we cannot parse,
pessimistically bail out early instead of trying to build an RPM.

The purpose of this safeguard is to prevent Jenkins from building RPMs
that cannot be easily upgraded.

7 years ago client: remove default value for pg_num in pools creation
Guillaume Abrioux [Thu, 3 May 2018 19:36:21 +0000 (21:36 +0200)]
client: remove default value for pg_num in pools creation

trying to set the default value for pg_num to
`hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'])` will
break in case of external client nodes deployment.
The `pg_num` attribute should be mandatory and be checked by the future
`ceph-validate` role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago contrib: update backport script to reflect stable branch
Sébastien Han [Wed, 9 May 2018 21:30:12 +0000 (14:30 -0700)]
contrib: update backport script to reflect stable branch

Since we now do backports on stable-3.0 and stable-3.1 we have to use
the name of the stable branch in the backport branch name. If we don't
do this we will end up with conflicting branch names.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago adds missing state needed to upgrade nfs-ganesha
Gregory Meno [Wed, 9 May 2018 18:17:26 +0000 (11:17 -0700)]
adds missing state needed to upgrade nfs-ganesha

in tasks for os_family Red Hat we were missing this

fixes: bz1575859
Signed-off-by: Gregory Meno <gmeno@redhat.com>
7 years ago mon: fix mgr keyring creation when upgrading from jewel (tag: v3.1.0rc2)
Guillaume Abrioux [Wed, 9 May 2018 12:42:27 +0000 (14:42 +0200)]
mon: fix mgr keyring creation when upgrading from jewel

On containerized deployments, mgr keyring creation fails when upgrading
from jewel to luminous: the command that creates the mgr keyring is
executed in a container that is still running jewel, because the
container is only restarted later with the new image. It therefore
fails with a bad entity error.

To get around this situation, we can delegate the command that creates
these keyrings to the first monitor when we are running the playbook on
the last monitor.
That way we ensure we issue the command in a container that has
already been restarted with the new image.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoosd: clean legacy syntax in ceph-osd-run.sh.j2
Guillaume Abrioux [Wed, 9 May 2018 01:10:30 +0000 (03:10 +0200)]
osd: clean legacy syntax in ceph-osd-run.sh.j2

Quick cleanup of a legacy syntax due to e0a264c7e

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoMake sure the restart_mds_daemon script is created with the correct MDS name
Simone Caronni [Thu, 5 Apr 2018 14:14:23 +0000 (16:14 +0200)]
Make sure the restart_mds_daemon script is created with the correct MDS name

7 years agocommon: enable Tools repo for rhcs clients
Sébastien Han [Tue, 8 May 2018 14:11:14 +0000 (07:11 -0700)]
common: enable Tools repo for rhcs clients

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574458
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoFix install of nfs-ganesha-ceph for Debian/SuSE v3.1.0beta9
Andy McCrae [Thu, 22 Mar 2018 12:19:22 +0000 (12:19 +0000)]
Fix install of nfs-ganesha-ceph for Debian/SuSE

The Debian and SuSE installs of nfs-ganesha from the non-rhcs repository
require allow_unauthenticated for Debian and disable_gpg_check for SuSE.
The nfs-ganesha-rgw package already does this, but the nfs-ganesha-ceph
package fails to install because of this same issue.

This PR moves the installations so they happen when the appropriate flags
are set to True (nfs_obj_gw & nfs_file_gw), but does it per distro (one
for SuSE and one for Debian) so that the appropriate flag can be passed to
ignore the GPG check.

7 years agoplaybook: improve facts gathering
Guillaume Abrioux [Thu, 3 May 2018 16:41:16 +0000 (18:41 +0200)]
playbook: improve facts gathering

There is no need to gather facts in an O(N^2) way.
Only one node should gather facts from the other nodes.

Fixes: #2553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoceph-nfs: disable attribute caching
Ramana Raja [Thu, 3 May 2018 12:10:13 +0000 (17:40 +0530)]
ceph-nfs: disable attribute caching

When 'ceph_nfs_disable_caching' is set to True, disable attribute
caching done by Ganesha for all Ganesha exports.

Signed-off-by: Ramana Raja <rraja@redhat.com>
7 years agocommon: copy iso files if rolling_update
Sébastien Han [Thu, 3 May 2018 14:54:53 +0000 (16:54 +0200)]
common: copy iso files if rolling_update

If we are in the middle of an update we want the new package version to
be installed, so the task that copies the repo files should not be
skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572032
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoMove apt cache update to individual task per role
Andy McCrae [Thu, 26 Apr 2018 09:42:11 +0000 (10:42 +0100)]
Move apt cache update to individual task per role

The apt-cache update can fail due to transient issues related to the
action being a network operation. To reduce the impact of these
transient failures this patch adds a retry to the update_cache task.

However, the apt_repository tasks, which would perform an apt_update,
won't retry the apt_update on failure in the same way. As such, this PR
moves the apt_update into an individual task, once per role.

Finally, the apt_repository tasks no longer have a changed_when: false,
and the apt_cache update is only performed once per role, if the
repositories change. Otherwise the cache is updated on the "apt" install
tasks if the cache_timeout has been reached.
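
The retry idea described above can be sketched generically as follows. This is only an illustration of retrying a transient network operation; the `retry` helper and its parameters are hypothetical and are not the Ansible task from this patch.

```python
import time


def retry(action, retries=3, delay=1.0):
    """Call `action` until it succeeds, at most `retries` times.

    Sleeps `delay` seconds between attempts and re-raises the last
    error if every attempt fails.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return action()
        except Exception as err:  # a transient failure, e.g. a network error
            last_error = err
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error
```

A cache update wrapped this way only fails the run if the error persists across all attempts.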

7 years agoclient: fix pool creation
Guillaume Abrioux [Mon, 30 Apr 2018 18:53:42 +0000 (20:53 +0200)]
client: fix pool creation

The value in `docker_exec_client_cmd` doesn't allow checking for
existing pools because it is set with a wrong value for the entrypoint
that is going to be used.
This means the check was going to fail anyway, even if the pools actually exist.

Using jinja syntax to set `docker_exec_cmd` handles the case
where you don't have monitors in your inventory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agomon: change application pool support
Sébastien Han [Thu, 26 Apr 2018 17:55:48 +0000 (19:55 +0200)]
mon: change application pool support

If openstack_pools contains an application key it will be used to apply
this application pool type to a pool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1562220
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocheck if pools already exist before creating them
Guillaume Abrioux [Fri, 27 Apr 2018 12:48:33 +0000 (14:48 +0200)]
check if pools already exist before creating them

Add a task to check if pools already exist before we create them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
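
As an illustration of the check-before-create logic, here is a small sketch. The `pools_to_create` helper and the shape of the pool definitions (dicts with a `name` key) are assumptions made for the example; the actual change is an Ansible task, not this code.

```python
def pools_to_create(wanted_pools, existing_pool_names):
    """Return the wanted pool definitions that do not exist yet.

    `wanted_pools` is a list of dicts with at least a "name" key and
    `existing_pool_names` is an iterable of pool names already present.
    """
    existing = set(existing_pool_names)
    return [pool for pool in wanted_pools if pool["name"] not in existing]


if __name__ == "__main__":
    wanted = [{"name": "images", "pg_num": 8}, {"name": "volumes", "pg_num": 8}]
    print(pools_to_create(wanted, ["images"]))
```

Only the pools returned by the helper would then be passed to the creation step, so running it again against an unchanged cluster creates nothing.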
7 years agotests: update the type for the rule used in pools
Guillaume Abrioux [Wed, 25 Apr 2018 15:33:35 +0000 (17:33 +0200)]
tests: update the type for the rule used in pools

As of ceph 12.2.5 the parameter `type` is no longer a name but an id,
therefore an `int` is expected; otherwise it fails with an error.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoswitch: fix ceph_uid fact for osd
Guillaume Abrioux [Wed, 25 Apr 2018 12:20:35 +0000 (14:20 +0200)]
switch: fix ceph_uid fact for osd

In addition to b324c17, this commit fixes the ceph uid for the osd role in
the switch from non containerized to containerized playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoswitch: resolve device path so we can umount the osd data dir
Sébastien Han [Thu, 19 Apr 2018 12:45:03 +0000 (14:45 +0200)]
switch: resolve device path so we can umount the osd data dir

If we don't do this, umounting devices declared like this
/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001

will fail like:

umount: /dev/disk/by-id/ata-QEMU_HARDDISK_QM000011: mountpoint not found

Since we append '1' (partition 1), this won't work.
So we need to resolve the link to get something like /dev/sdb and then
append 1 to get /dev/sdb1.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
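
A minimal sketch of the resolve-then-append step, assuming a device whose partitions are named by appending the number directly (as with /dev/sdb and /dev/sdb1). The `first_partition` helper is hypothetical and is not the playbook code.

```python
import os


def first_partition(device: str) -> str:
    """Resolve symlinks in `device`, then append the partition number.

    Appending "1" to a /dev/disk/by-id/... link gives a path that does
    not exist, so the link is resolved to the real device node first.
    """
    real_device = os.path.realpath(device)
    return real_device + "1"
```

On a host where /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 links to /dev/sdb, the helper returns /dev/sdb1 instead of the broken by-id path with a trailing 1.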
7 years agoswitch: fix ceph_uid fact
Sébastien Han [Thu, 19 Apr 2018 08:28:56 +0000 (10:28 +0200)]
switch: fix ceph_uid fact

Latest is now centos, not ubuntu anymore, so the condition was wrong.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRevert "add .vscode/ to gitignore"
Sébastien Han [Fri, 27 Apr 2018 11:19:25 +0000 (13:19 +0200)]
Revert "add .vscode/ to gitignore"

This reverts commit 3c4319ca4b5355d69b2925e916420f86d29ee524.

7 years agomon/client: honor key mode when copying it to other nodes v3.1.0beta8
Sébastien Han [Mon, 23 Apr 2018 08:02:16 +0000 (10:02 +0200)]
mon/client: honor key mode when copying it to other nodes

The last mon creates the keys with a particular mode. When copying them
to the other mons (first and second) we must re-use the mode that was
set.

The same applies to the client node: the slurp preserves the initial
'item' so we can get the mode for the copy.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: bump client nodes to 2
Sébastien Han [Mon, 23 Apr 2018 08:01:23 +0000 (10:01 +0200)]
ci: bump client nodes to 2

In order to test the key distribution is correct we must have 2 client
nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: remove redundant copy task
Sébastien Han [Mon, 23 Apr 2018 07:52:18 +0000 (09:52 +0200)]
mon: remove redundant copy task

We had the same task twice, and one of them was overriding the mode.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon/client: remove acl code
Sébastien Han [Fri, 20 Apr 2018 14:44:41 +0000 (16:44 +0200)]
mon/client: remove acl code

Applying ACL on the keyrings is not used anymore so let's remove this
code.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon/client: apply mode from ceph_key
Sébastien Han [Fri, 20 Apr 2018 14:37:05 +0000 (16:37 +0200)]
mon/client: apply mode from ceph_key

Do not use a dedicated task for this but use the ceph_key module
capability to set file mode.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph_key: ability to apply a mode to a file
Sébastien Han [Fri, 20 Apr 2018 14:35:39 +0000 (16:35 +0200)]
ceph_key: ability to apply a mode to a file

You can now create keys and set a file mode on them. Use the 'mode'
parameter for that; the mode must be given in octal, e.g. 0644.

Signed-off-by: Sébastien Han <seb@redhat.com>
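
To illustrate why the mode has to be octal, here is a small sketch. The `write_keyring` helper is hypothetical and is not the ceph_key module's implementation; it only shows how an octal mode is applied to a file.

```python
import os
import stat


def write_keyring(path: str, content: str, mode: int = 0o600) -> None:
    """Write `content` to `path`, then apply `mode` (an octal literal)."""
    with open(path, "w") as keyring:
        keyring.write(content)
    os.chmod(path, mode)


def file_mode(path: str) -> int:
    """Return only the permission bits of `path`."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Note that 0o644 is 420 in decimal, so passing the decimal number 644 would set different permission bits than intended.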
7 years agoadd AArch64 to supported architecture
Di Xu [Mon, 23 Apr 2018 02:08:48 +0000 (10:08 +0800)]
add AArch64 to supported architecture

works on AArch64 platform