git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
7 years agoAdd default for radosgw_keystone_ssl
Andy McCrae [Sat, 27 Jan 2018 19:40:09 +0000 (19:40 +0000)]
Add default for radosgw_keystone_ssl

This should default to False. Keystone does not use PKI keys by default,
and anybody relying on this setting had to set it manually before anyway.

Fixes: #2111
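
A minimal sketch of the new default, assuming it lives in the role's defaults file:

```yaml
# Assumed defaults entry: keep the Keystone SSL/PKI handling disabled unless
# the deployer explicitly enables it.
radosgw_keystone_ssl: false
```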
7 years agoRevert "monitor_interface: document need to use monitor_address when using IPv6"
Guillaume Abrioux [Wed, 24 Jan 2018 13:06:47 +0000 (14:06 +0100)]
Revert "monitor_interface: document need to use monitor_address when using IPv6"

This reverts commit 10b91661ceef7992354032030c7c2673a90d40f4.

This reverts also the same comment added in
1359869497a44df0c3b4157f41453b84326b58e7

7 years agoconfig: add host-specific ceph_conf_overrides evaluation and generation.
Eduard Egorov [Thu, 9 Nov 2017 11:49:00 +0000 (11:49 +0000)]
config: add host-specific ceph_conf_overrides evaluation and generation.

This allows us to use host-specific variables in the ceph_conf_overrides variable. For example, this fixes the usage of such variables (e.g. 'nss db path' containing {{ ansible_hostname }}) in ceph_conf_overrides for the rados gateway configuration (see profiles/rgw-keystone-v3) - issue #2157.

Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
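
A hypothetical group_vars sketch of what this enables; the section name and values are illustrative, loosely following the rgw-keystone-v3 profile:

```yaml
# Host-specific Jinja2 inside ceph_conf_overrides is now evaluated per host,
# so {{ ansible_hostname }} expands to each node's own name.
ceph_conf_overrides:
  "client.rgw.{{ ansible_hostname }}":
    rgw keystone api version: 3
    nss db path: "/var/lib/ceph/radosgw/ceph-rgw.{{ ansible_hostname }}/nss"
```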
7 years agoupgrade: skip luminous tasks for jewel minor update
Guillaume Abrioux [Thu, 25 Jan 2018 15:57:45 +0000 (16:57 +0100)]
upgrade: skip luminous tasks for jewel minor update

These tasks are needed only when upgrading to luminous.
They are not needed for a Jewel minor upgrade and, in any case, they fail there
because the `ceph versions` command doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agodefaults: avoid getting stuck (ceph --connect-timeout)
Guillaume Abrioux [Wed, 24 Jan 2018 17:49:41 +0000 (18:49 +0100)]
defaults: avoid getting stuck (ceph --connect-timeout)

Sometimes the playbook gets stuck because, even with the `--connect-timeout=`
option, the connection to the existing ceph cluster never times out.

As a workaround, wrapping the call with the `timeout` command provided by
coreutils makes it actually time out when we can't connect to the cluster.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537003
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
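
A minimal sketch of the workaround, assuming a task that probes the running cluster's fsid (task name and registered variable are illustrative):

```yaml
# Wrap the ceph CLI in coreutils' timeout so the play cannot hang even when
# --connect-timeout is not honoured by the client.
- name: check if a ceph cluster is already running
  command: "timeout 5 ceph --connect-timeout 3 --cluster {{ cluster }} fsid"
  register: ceph_current_fsid
  changed_when: false
  failed_when: false
```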
7 years agodocs for creating encrypted OSDs with the lvm scenario
Andrew Schoen [Mon, 22 Jan 2018 16:53:40 +0000 (10:53 -0600)]
docs for creating encrypted OSDs with the lvm scenario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-osd: adds dmcrypt to the lvm scenario
Andrew Schoen [Fri, 19 Jan 2018 15:44:59 +0000 (09:44 -0600)]
ceph-osd: adds dmcrypt to the lvm scenario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-volume: adds a dmcrypt param to the ceph_volume module
Andrew Schoen [Fri, 19 Jan 2018 15:43:48 +0000 (09:43 -0600)]
ceph-volume: adds a dmcrypt param to the ceph_volume module

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoansible: set ssh retry option to 5
Guillaume Abrioux [Tue, 23 Jan 2018 13:38:35 +0000 (14:38 +0100)]
ansible: set ssh retry option to 5

We noticed that ceph-ansible can sometimes fail with the error:

`Failed to connect to the host via ssh:`

It can occur after the task `restart firewalld` has been played.

Setting `retries` to 5 should prevent such unexpected ssh failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoosds: change default value for `dedicated_devices`
Guillaume Abrioux [Mon, 22 Jan 2018 13:28:15 +0000 (14:28 +0100)]
osds: change default value for `dedicated_devices`

This is to keep backward compatibility with stable-2.2 and satisfy the
"verify dedicated devices have been provided" check in
`check_mandatory_vars.yml`. This check looks for
`dedicated_devices`, so we need to default its value to
`raw_journal_devices` when `raw_multi_journal` is set to `True`.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1536098
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
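
A sketch of the resulting default, assuming it sits in the defaults of the role (the exact filter chain is illustrative):

```yaml
# Fall back to the legacy stable-2.2 variable so older inventories still pass
# the "verify dedicated devices have been provided" check.
dedicated_devices: "{{ raw_journal_devices | default([]) }}"
```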
7 years agotests: remove crush_device_class from lvm tests
Andrew Schoen [Thu, 18 Jan 2018 13:57:45 +0000 (07:57 -0600)]
tests: remove crush_device_class from lvm tests

The --crush-device-class flag for ceph-volume is not available in luminous, so let's
remove this testing option for now until it is more widely available.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agorgw: disable legacy unit
Sébastien Han [Thu, 18 Jan 2018 09:06:34 +0000 (10:06 +0100)]
rgw: disable legacy unit

Some systems that were deployed with old tools can leave units named
"ceph-radosgw@radosgw.gateway.service". As a consequence, they will
prevent the new unit from starting.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1509584
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorolling update: add mgr exception for jewel minor updates
Sébastien Han [Wed, 17 Jan 2018 14:18:11 +0000 (15:18 +0100)]
rolling update: add mgr exception for jewel minor updates

When updating from one minor Jewel version to another, the playbook
fails on the task "fail if no mgr host is present in the inventory".
This can now be worked around by running Ansible with

-e jewel_minor_update=true

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1535382
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agopurge-container: use lsblk to resolve parent device
Guillaume Abrioux [Wed, 17 Jan 2018 08:08:16 +0000 (09:08 +0100)]
purge-container: use lsblk to resolve parent device

Using `lsblk` to resolve the parent device is better than just stripping the
last character off the device name before passing it to the zap container.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
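
A minimal sketch of the lsblk-based lookup; the partition list and registered variable names here are illustrative:

```yaml
# Ask lsblk for the parent kernel device (e.g. sda for /dev/sda1) instead of
# trimming the trailing character off the partition name.
- name: resolve parent device of each ceph partition
  command: "lsblk --nodeps -no pkname {{ item }}"
  register: parent_devices
  with_items: "{{ ceph_data_partlabels.stdout_lines | default([]) }}"
```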
7 years agopurge-container: remove awk usage in favor of blkid
Guillaume Abrioux [Wed, 17 Jan 2018 08:06:43 +0000 (09:06 +0100)]
purge-container: remove awk usage in favor of blkid

Avoid using `awk` to extract the devices from the partition label;
using `blkid` is more readable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
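
A sketch of the blkid-based lookup (the exact partlabel and variable names are assumptions):

```yaml
# List OSD data partitions directly by their partition label with blkid.
- name: get ceph data partitions
  shell: blkid -o device -t PARTLABEL="ceph data"
  register: ceph_data_partlabels
  failed_when: false
  changed_when: false
```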
7 years agodocs for the crush_device_class option of lvm_volumes
Andrew Schoen [Fri, 12 Jan 2018 14:46:30 +0000 (08:46 -0600)]
docs for the crush_device_class option of lvm_volumes

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agotests: adds crush_device_class to lvm tests
Andrew Schoen [Thu, 11 Jan 2018 17:00:23 +0000 (11:00 -0600)]
tests: adds crush_device_class to lvm tests

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-osd: adds the crush_device_class param to the lvm scenario
Andrew Schoen [Thu, 11 Jan 2018 16:59:01 +0000 (10:59 -0600)]
ceph-osd: adds the crush_device_class param to the lvm scenario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: adds the crush_device_class param
Andrew Schoen [Thu, 11 Jan 2018 16:56:39 +0000 (10:56 -0600)]
ceph_volume: adds the crush_device_class param

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoMakefile: handle "beta" Git tags
Ken Dreyer [Thu, 11 Jan 2018 17:06:21 +0000 (10:06 -0700)]
Makefile: handle "beta" Git tags

With this change, "make srpm" will generate an RPM with "beta" in the
Release value.

For example, "v3.2.0beta1" will create
"ceph-ansible-3.2.0-0.beta1.1.el7.src.rpm"

7 years agocrush: create rack type buckets and build crush tree according to {{ osd_crush_locati...
Eduard Egorov [Thu, 16 Nov 2017 14:26:27 +0000 (14:26 +0000)]
crush: create rack type buckets and build crush tree according to {{ osd_crush_location }}.

Currently, we can define a crush location for each host, but only crush roots and crush rules are created. This commit automates the remaining routines for a complete solution:
  1) Creates rack type crush buckets defined in {{ ceph_crush_rack }} of each osd host. If it is not defined by the user, a rack named 'default_rack_{{ ceph_crush_root }}' will be added and used in the next steps.
  2) Moves the rack type crush buckets defined in {{ ceph_crush_rack }} into the crush roots defined in {{ ceph_crush_root }} of each osd host.
  3) Moves each osd host into its rack bucket defined in {{ ceph_crush_rack }}.

Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
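
A hypothetical host_vars sketch showing the variables this automation consumes (names come from the commit message; values and exact format are illustrative):

```yaml
# Per-OSD-host crush placement; the playbook now creates the rack bucket,
# moves it under the root, and moves the host under the rack.
ceph_crush_root: default
ceph_crush_rack: rack1
osd_crush_location: "root={{ ceph_crush_root }} rack={{ ceph_crush_rack }} host={{ ansible_hostname }}"
```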
7 years agoosd: skip devices marked as '/dev/dead'
Sébastien Han [Tue, 19 Dec 2017 17:54:19 +0000 (18:54 +0100)]
osd: skip devices marked as '/dev/dead'

On a non-collocated scenario, if a drive is faulty we can't simply
remove it from the 'devices' list without messing up or having to
re-arrange the order of 'dedicated_devices'. We want to keep this
device list ordered. Marking the device as '/dev/dead' prevents the
activation from failing on a device we know is broken, while keeping
the devices-to-dedicated_devices mapping intact.

Signed-off-by: Sébastien Han <seb@redhat.com>
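
A hypothetical inventory excerpt illustrating the idea: the failed drive keeps its slot so the pairing with dedicated_devices stays aligned.

```yaml
devices:
  - /dev/sdb
  - /dev/dead      # failed drive, kept as a placeholder and skipped by the playbook
  - /dev/sdd
dedicated_devices:
  - /dev/nvme0n1
  - /dev/nvme0n1
  - /dev/nvme0n1
```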
7 years agoci: test on ansible 2.4.2
Sébastien Han [Thu, 21 Dec 2017 18:57:01 +0000 (19:57 +0100)]
ci: test on ansible 2.4.2

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainer: trigger handlers on systemd file change
Guillaume Abrioux [Mon, 8 Jan 2018 14:00:32 +0000 (15:00 +0100)]
container: trigger handlers on systemd file change

When a systemd unit file is changed we should trigger handlers to
restart the services.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agohandlers: avoid duplicate handler
Guillaume Abrioux [Mon, 8 Jan 2018 09:00:25 +0000 (10:00 +0100)]
handlers: avoid duplicate handler

Having handlers in both the ceph-defaults and ceph-docker-common roles can make
the playbook restart services twice: handlers can be triggered a first
time because of a change in ceph.conf and a second time because a new
image has been pulled.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agocontainer: restart container when there is a new image
Sébastien Han [Fri, 15 Dec 2017 18:43:23 +0000 (19:43 +0100)]
container: restart container when there is a new image

There was no really good way to implement this.
We had several options and none of them were ideal, since handlers can
not be triggered cross-roles.
We could have achieved it by doing:

* option 1 was to add a dependency in the meta of the ceph-docker-common
role. We had that long ago and we decided to stop, so everything is now
managed via site.yml.

* option 2 was to import files from another role. This is messy and we
don't do that anywhere in the current code base, and we intend to keep it
that way.

* option 3 was to pull the image from the ceph-config role. This is not
suitable either, since the docker command won't be available unless you
run an Atomic distro. It would also mean pulling twice: first in
ceph-config, second in ceph-docker-common.

The only option I came up with was to duplicate a bit of the ceph-config
handlers code.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainers: fix bug when looking for existing cluster
Guillaume Abrioux [Wed, 10 Jan 2018 09:18:27 +0000 (10:18 +0100)]
containers: fix bug when looking for existing cluster

In containerized deployments, `docker_exec_cmd` is not set before the
task that tries to retrieve the current fsid is played, which means the
playbook considers there is no existing fsid and tries to generate a new one.

Typical error:

```
ok: [mon0 -> mon0] => {
    "changed": false,
    "cmd": [
        "ceph",
        "--connect-timeout",
        "3",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:00:00.179909",
    "end": "2018-01-09 10:36:58.759846",
    "failed": false,
    "failed_when_result": false,
    "rc": 1,
    "start": "2018-01-09 10:36:58.579937"
}

STDERR:

Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',)
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agocontainer: change the way we force no logs inside the container
Sébastien Han [Tue, 9 Jan 2018 13:34:09 +0000 (14:34 +0100)]
container: change the way we force no logs inside the container

Previously we were using ceph_conf_overrides; however, this doesn't play
nicely with software like TripleO that uses ceph_conf_overrides in its
own code. For now, and since this is the only occurrence, we disable
logging directly through the ceph.conf template.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1532619
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefaults: rename check_socket files for containers
Guillaume Abrioux [Wed, 10 Jan 2018 08:08:01 +0000 (09:08 +0100)]
defaults: rename check_socket files for containers

In containerized deployments, we are not looking for a socket but for a
running container.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agomon: use crush rules for non-container too
Sébastien Han [Tue, 9 Jan 2018 12:54:50 +0000 (13:54 +0100)]
mon: use crush rules for non-container too

There is no reason why we can't use crush rules when deploying
containers, so the include is moved into main.yml so it can be called.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocontainers: bump memory limit beta-3.1.0 v3.1.0beta2 v3.1.0rc1
Sébastien Han [Mon, 8 Jan 2018 15:41:42 +0000 (16:41 +0100)]
containers: bump memory limit

A default value of 4GB for the MDS and 3GB for the OSD is more appropriate.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1531607
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotest: set UPDATE_CEPH_DOCKER_IMAGE_TAG for jewel tests
Andrew Schoen [Fri, 5 Jan 2018 19:47:10 +0000 (13:47 -0600)]
test: set UPDATE_CEPH_DOCKER_IMAGE_TAG for jewel tests

We want to be explicit here and update to luminous and not
the 'latest' tag.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoswitch-to-containers: do not fail when stopping the nfs-ganesha service
Andrew Schoen [Fri, 5 Jan 2018 18:42:16 +0000 (12:42 -0600)]
switch-to-containers: do not fail when stopping the nfs-ganesha service

If we're working with a jewel cluster then this service will not exist.

This is mainly a problem with CI testing because our tests are set up to
work with both jewel and luminous, meaning that even though we want to
test jewel we still have an nfs-ganesha host in the test, causing these
tasks to run.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoswitch-to-containers: do not fail when stopping the ceph-mgr daemon
Andrew Schoen [Fri, 5 Jan 2018 18:37:36 +0000 (12:37 -0600)]
switch-to-containers: do not fail when stopping the ceph-mgr daemon

If we are working with a jewel cluster, ceph-mgr does not exist
and this makes the playbook fail.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agorolling_update: do not fail the playbook if nfs-ganesha is not present
Andrew Schoen [Fri, 5 Jan 2018 16:06:53 +0000 (10:06 -0600)]
rolling_update: do not fail the playbook if nfs-ganesha is not present

The rolling update playbook was attempting to stop the
nfs-ganesha service on nodes where jewel is still installed.
The nfs-ganesha service did not exist in jewel so the task fails.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agodoc: corrected a typo
Aviolat Romain [Fri, 29 Dec 2017 23:17:31 +0000 (00:17 +0100)]
doc: corrected a typo

7 years agomon: always run ceph-create-keys
Sébastien Han [Wed, 20 Dec 2017 14:29:02 +0000 (15:29 +0100)]
mon: always run ceph-create-keys

ceph-create-keys is idempotent, so it's not an issue to run it each time
we play ansible. This also fixes issues where the 'creates' arg skips the
task and no keys get generated on newer versions, e.g. during an upgrade.

Closes: https://github.com/ceph/ceph-ansible/issues/2228
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorgw: disable legacy rgw service unit
Sébastien Han [Thu, 21 Dec 2017 09:19:22 +0000 (10:19 +0100)]
rgw: disable legacy rgw service unit

When upgrading from OSP11 to containerized OSP12, ceph-ansible attempts to
disable the RGW service provided by the overcloud image. The task
attempts to stop/disable ceph-rgw@{{ ansible_hostname }} and
ceph-radosgw@{{ ansible_hostname }}.service. The actual service name is
ceph-radosgw@radosgw.$name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525209
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: fix check gpt
Guillaume Abrioux [Tue, 19 Dec 2017 09:55:02 +0000 (10:55 +0100)]
osd: fix check gpt

The GPT label creation doesn't work reliably even with the parted module.
This commit fixes it by using the parted command directly instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
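
A minimal sketch of the command-based approach (task name and device list are assumptions):

```yaml
# Create the GPT label with the parted CLI rather than the parted module.
- name: create gpt disk label
  command: "parted --script {{ item }} mklabel gpt"
  with_items: "{{ devices }}"
```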
7 years agopurge-cluster: clean some code
Guillaume Abrioux [Wed, 13 Dec 2017 14:23:47 +0000 (15:23 +0100)]
purge-cluster: clean some code

Avoid using regexp to match device

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agopurge-cluster: wipe disk using dd
Guillaume Abrioux [Wed, 13 Dec 2017 14:24:33 +0000 (15:24 +0100)]
purge-cluster: wipe disk using dd

The `bluestore_purge_osd_non_container` scenario is failing because old
osd_uuid information is kept on the devices, which causes `ceph-disk activate`
to fail when trying to redeploy a new cluster after a purge.

Typical error seen:

```
2017-12-13 14:29:48.021288 7f6620651d00 -1
bluestore(/var/lib/ceph/tmp/mnt.2_3gh6/block) _check_or_set_bdev_label
bdev /var/lib/ceph/tmp/mnt.2_3gh6/block fsid
770080e2-20db-450f-bc17-81b55f167982 does not match our fsid
f33efff0-2f07-4203-ad8d-8a0844d6bda0
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
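
A sketch of the dd-based wipe; the device list and the amount of data zeroed are illustrative:

```yaml
# Zero the start of each OSD device so stale bluestore labels/osd_uuid data
# cannot survive the purge and break a later `ceph-disk activate`.
- name: wipe the beginning of each osd device
  shell: "dd if=/dev/zero of={{ item }} bs=1M count=200 oflag=direct"
  with_items: "{{ devices }}"
```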
7 years agofix jewel scenarios on container
Sébastien Han [Wed, 20 Dec 2017 12:39:33 +0000 (13:39 +0100)]
fix jewel scenarios on container

When deploying Jewel from master we still need to enable this code since
the container image still performs such a check. The check exists because
ceph-disk is not able to create a GPT label on a drive that does not
have one.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agosite-docker: ability to disable fact sharing
Sébastien Han [Tue, 19 Dec 2017 14:10:05 +0000 (15:10 +0100)]
site-docker: ability to disable fact sharing

When deploying with Ansible at large scale, the delegate_facts method
consumes a lot of memory on the host that is running Ansible. This can
cause various issues like memory exhaustion on that machine.
You can now run Ansible with "-e delegate_facts_host=False" to disable
the fact sharing.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: best effort if no device is found during activation
Sébastien Han [Mon, 18 Dec 2017 15:43:37 +0000 (16:43 +0100)]
osd: best effort if no device is found during activation

We have a scenario where we switch from non-container to containers. In
that case we don't know anything about the ceph partitions associated with
an OSD. Normally, in a containerized context, we have files containing the
preparation sequence, and from these files we can get the capabilities of
each OSD. As a last resort we use a ceph-disk call inside a dummy bash
container to discover the ceph journal on the current osd.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1525612
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorolling_update: do not require root to answer question
Sébastien Han [Fri, 15 Dec 2017 16:39:32 +0000 (17:39 +0100)]
rolling_update: do not require root to answer question

There is no need to ask for root on the local action. It would prompt
for a password if the current user is not part of sudoers, which is
unnecessary anyway.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1516947
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agonfs: fix package install for debian/suse systems
Sébastien Han [Tue, 19 Dec 2017 10:17:04 +0000 (11:17 +0100)]
nfs: fix package install for debian/suse systems

This resolves the following error:
E: There were unauthenticated packages and -y was used without
--allow-unauthenticated

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRename fact docker_version to ceph_docker_version
Christian Berendt [Tue, 12 Dec 2017 10:06:15 +0000 (11:06 +0100)]
Rename fact docker_version to ceph_docker_version

The name docker_version is very generic and is also used by other
roles. As a result, there may be name conflicts. To avoid this, a
ceph_ prefix should be used for this fact. Since it is an internal
fact, renaming it is not a problem.

7 years agoroles: ceph-mgr: Install the ceph-mgr package on SUSE
Markos Chandras [Thu, 14 Dec 2017 18:13:09 +0000 (18:13 +0000)]
roles: ceph-mgr: Install the ceph-mgr package on SUSE

The ceph-mgr package name is identical to the Red Hat one, so add the SUSE
family to the existing task.

7 years agocontrib: do not skip ci on backport
Sébastien Han [Thu, 14 Dec 2017 16:23:02 +0000 (17:23 +0100)]
contrib: do not skip ci on backport

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoclient: don't make `osd_pool_default_pg_num` mandatory
Guillaume Abrioux [Tue, 12 Dec 2017 10:28:36 +0000 (11:28 +0100)]
client: don't make `osd_pool_default_pg_num` mandatory

Making `osd_pool_default_pg_num` mandatory is a bit aggressive and is
irrelevant when you just want to create user keyrings.

Closes: #2241
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoclient: don't try to generate keys
Guillaume Abrioux [Tue, 12 Dec 2017 10:25:26 +0000 (11:25 +0100)]
client: don't try to generate keys

The entrypoint to generate user keyrings is `ceph-authtool`; therefore,
`$(ceph-authtool --gen-print-key)` can only be expanded inside the
container. Users must generate a keyring themselves.
This commit also adds a check to ensure keyrings are properly filled in when
`user_config: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agodocker: add missing condition for selinux tasks
Guillaume Abrioux [Tue, 12 Dec 2017 13:55:02 +0000 (14:55 +0100)]
docker: add missing condition for selinux tasks

On the `client` and `mds` roles, it tries to set SELinux even on non-RHEL-based
distributions.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agodefault: look for the right return code on socket stat in-use
Sébastien Han [Thu, 14 Dec 2017 10:31:28 +0000 (11:31 +0100)]
default: look for the right return code on socket stat in-use

As reported in https://github.com/ceph/ceph-ansible/issues/2254, the
check with fuser is not ideal: if fuser is not available, the return code
is 127. Here we want to make sure that we look for the correct return
code, i.e. 1.

Closes: https://github.com/ceph/ceph-ansible/issues/2254
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoAdd flags for OSD 'docker run --cpuset-{cpus,mems}'
John Fulton [Mon, 11 Dec 2017 21:17:22 +0000 (16:17 -0500)]
Add flags for OSD 'docker run --cpuset-{cpus,mems}'

Add the variables ceph_osd_docker_cpuset_cpus and
ceph_osd_docker_cpuset_mems, so that a user may specify
the CPUs and memory nodes of NUMA systems on which OSD
containers are run.

Provides an example in osds.yaml.sample to guide the user,
based on sample `lscpu` output, since cpuset-mems refers
to memory by NUMA node only while cpuset-cpus can
refer to individual vCPUs within a NUMA node.

7 years agofirewall: add mds, nfs, restapi and iscsi ports, remove 'configure_firewall' variable...
Eduard Egorov [Mon, 20 Nov 2017 14:11:38 +0000 (14:11 +0000)]
firewall: add mds, nfs, restapi and iscsi ports, remove 'configure_firewall' variable used for conditional execution. Include the task only on rpm-based systems.

Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
7 years agofirewall: configure firewalld if it's already installed on the host (#2192).
Eduard Egorov [Fri, 17 Nov 2017 12:32:48 +0000 (12:32 +0000)]
firewall: configure firewalld if it's already installed on the host (#2192).

Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>
7 years agoRevert "tests: set CEPH_STABLE_RELEASE in ceph-build"
Guillaume Abrioux [Wed, 6 Dec 2017 14:18:42 +0000 (15:18 +0100)]
Revert "tests: set CEPH_STABLE_RELEASE in ceph-build"

This reverts commit 7a1d7d92ff4d6f38be9f11f4c26909b361b58f99.

7 years agoConvert interface names to underscores for facts
Major Hayden [Mon, 11 Dec 2017 15:56:56 +0000 (09:56 -0600)]
Convert interface names to underscores for facts

If a deployer uses an interface name with a dash/hyphen in it, such
as 'br-storage' for the monitor_interface group_var, the ceph.conf.j2
template fails to find the right facts. It looks for
'ansible_br-storage' but only 'ansible_br_storage' exists.

This patch converts the interface name to underscores when the
template does the fact lookup.

7 years agoceph-osd: respect nvme partitions when device is a disk.
Konstantin Shalygin [Tue, 28 Nov 2017 14:27:09 +0000 (21:27 +0700)]
ceph-osd: respect nvme partitions when device is a disk.

7 years agodefaults: fix CI issue with ceph_uid fact
Guillaume Abrioux [Mon, 11 Dec 2017 17:48:13 +0000 (18:48 +0100)]
defaults: fix CI issue with ceph_uid fact

The CI complains because of the `ceph_uid` fact, which doesn't get set since
the docker image tag used in the CI doesn't match this condition.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoceph-osd: adds osd_objectstore to the name when using the ceph_volume module
Andrew Schoen [Fri, 1 Dec 2017 19:37:44 +0000 (13:37 -0600)]
ceph-osd: adds osd_objectstore to the name when using the ceph_volume module

This allows for easier debugging if verbosity is not set high enough.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agotests for the ceph_volume module
Andrew Schoen [Fri, 1 Dec 2017 19:27:13 +0000 (13:27 -0600)]
tests for the ceph_volume module

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agorefactor ceph_volume.py so it's easier to test
Andrew Schoen [Fri, 1 Dec 2017 19:26:36 +0000 (13:26 -0600)]
refactor ceph_volume.py so it's easier to test

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-osd: use the cluster param with the ceph_volume module
Andrew Schoen [Fri, 1 Dec 2017 14:41:40 +0000 (08:41 -0600)]
ceph-osd: use the cluster param with the ceph_volume module

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: adds the cluster param
Andrew Schoen [Fri, 1 Dec 2017 14:40:33 +0000 (08:40 -0600)]
ceph_volume: adds the cluster param

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph-osd: use the new ceph_volume module for the lvm scenario
Andrew Schoen [Fri, 1 Dec 2017 13:33:16 +0000 (07:33 -0600)]
ceph-osd: use the new ceph_volume module for the lvm scenario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoadds a ceph_volume module
Andrew Schoen [Fri, 1 Dec 2017 13:25:13 +0000 (07:25 -0600)]
adds a ceph_volume module

This module uses ceph-volume to create OSDs. Currently
it only supports the 'lvm' subcommand and 'create'.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
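
A hypothetical playbook snippet using the new module (parameter names beyond 'cluster' follow ceph-volume lvm terminology and are assumptions here):

```yaml
# Create a bluestore OSD from an existing logical volume via ceph-volume lvm create.
- name: create an OSD with the ceph_volume module
  ceph_volume:
    cluster: ceph
    objectstore: bluestore
    data: data-lv1
    data_vg: data-vg1
```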
7 years agoVagrantfile: Fixed repeated OSD controller creation
Prisacari Dmitrii [Fri, 8 Dec 2017 17:09:50 +0000 (19:09 +0200)]
Vagrantfile: Fixed repeated OSD controller creation

7 years agoMerge pull request #2226 from andymcc/gpt_mklabel
Sébastien Han [Mon, 11 Dec 2017 09:12:46 +0000 (03:12 -0600)]
Merge pull request #2226 from andymcc/gpt_mklabel

Skip mklabel gpt if already gpt

7 years agoUse parted module instead of command 2226/head
Andy McCrae [Thu, 30 Nov 2017 17:46:55 +0000 (17:46 +0000)]
Use parted module instead of command

7 years agoMerge pull request #2211 from fultonj/admin-key-perms
Guillaume Abrioux [Thu, 7 Dec 2017 07:43:02 +0000 (08:43 +0100)]
Merge pull request #2211 from fultonj/admin-key-perms

Set tighter permissions on keyrings when containerized

7 years agoSet tighter permissions on keyrings when containerized 2211/head
John Fulton [Wed, 22 Nov 2017 21:38:30 +0000 (16:38 -0500)]
Set tighter permissions on keyrings when containerized

During a containerized deployment, set the permissions
of ceph.client.admin.keyring and other keyrings to
mode 600 and chown them to ceph.

7 years agoMerge pull request #2221 from ceph/fix_purge_waitfor
Sébastien Han [Wed, 29 Nov 2017 13:56:54 +0000 (14:56 +0100)]
Merge pull request #2221 from ceph/fix_purge_waitfor

purge: fix bug on 'wait_for' task

7 years agopurge: fix bug on 'wait_for' task 2221/head
Guillaume Abrioux [Wed, 29 Nov 2017 10:10:56 +0000 (11:10 +0100)]
purge: fix bug on 'wait_for' task

This task hangs because `{{ inventory_hostname }}` doesn't resolve to an
actual IP address.
Using `hostvars[inventory_hostname]['ansible_default_ipv4']['address']`
fixes this because it reaches the node on its actual IP
address.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
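
A minimal sketch of the fixed task (port, delays and task name are assumptions):

```yaml
# Wait for the rebooted node on its real IP address instead of the inventory name.
- name: wait for the server to come back
  local_action:
    module: wait_for
    host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
    port: 22
    state: started
    delay: 10
    timeout: 500
```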
7 years agoMerge pull request #2215 from squidboylan/support_loopback_devices
Guillaume Abrioux [Tue, 28 Nov 2017 13:04:47 +0000 (14:04 +0100)]
Merge pull request #2215 from squidboylan/support_loopback_devices

Add support for using loopback devices as OSDs

7 years agoMerge pull request #2214 from ceph/bz-1510555
Sébastien Han [Tue, 28 Nov 2017 11:22:50 +0000 (12:22 +0100)]
Merge pull request #2214 from ceph/bz-1510555

handlers: restart daemons only if docker is running

7 years agoMerge pull request #2202 from ceph/remove_leftover
Sébastien Han [Tue, 28 Nov 2017 11:21:13 +0000 (12:21 +0100)]
Merge pull request #2202 from ceph/remove_leftover

osd: remove leftover and fix a typo

7 years agoMerge pull request #2212 from wintamute/master
Guillaume Abrioux [Tue, 28 Nov 2017 11:14:09 +0000 (12:14 +0100)]
Merge pull request #2212 from wintamute/master

Group_vars mon sample - replaced hardcoded pool names for Openstack

7 years agoOpenstack: replaced hardcoded pool names with variables for openstack (nova) user 2212/head
wintamute [Mon, 27 Nov 2017 10:21:05 +0000 (11:21 +0100)]
Openstack: replaced hardcoded pool names with variables for openstack (nova) user

(cherry picked from commit 2bf48f1)

7 years agoAdd support for using loopback devices as OSDs 2215/head
Caleb Boylan [Tue, 28 Nov 2017 00:02:36 +0000 (16:02 -0800)]
Add support for using loopback devices as OSDs

This is particularly useful in CI environments where you don't have
the option of adding extra devices or volumes to the host. It is also
a simple change to support loopback devices.

7 years agohandlers: restart daemons only if docker is running 2214/head
Guillaume Abrioux [Mon, 27 Nov 2017 13:59:30 +0000 (14:59 +0100)]
handlers: restart daemons only if docker is running

In the case where the docker CLI is available but docker is not running, we
don't want to trigger a restart of the daemons.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoMerge pull request #2177 from jprovaznik/rados
Sébastien Han [Thu, 23 Nov 2017 09:36:58 +0000 (10:36 +0100)]
Merge pull request #2177 from jprovaznik/rados

Allow to use rados for ganesha exports

7 years agoMerge pull request #2207 from ceph/mds-debian
Sébastien Han [Wed, 22 Nov 2017 16:45:29 +0000 (17:45 +0100)]
Merge pull request #2207 from ceph/mds-debian

common: install ceph-common on all the machines

7 years agocommon: install ceph-common on all the machines 2207/head
Sébastien Han [Wed, 22 Nov 2017 16:11:50 +0000 (17:11 +0100)]
common: install ceph-common on all the machines

Since some daemons now install their own packages, the task checking the
ceph version fails on Debian systems, so the 'ceph-common' package must
be installed on all the machines.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoAllow to use rados for ganesha exports 2177/head
Jan Provaznik [Wed, 15 Nov 2017 11:59:36 +0000 (12:59 +0100)]
Allow to use rados for ganesha exports

7 years agoMerge pull request #2185 from ceph/fix_purge-cluster
Guillaume Abrioux [Tue, 21 Nov 2017 12:30:15 +0000 (13:30 +0100)]
Merge pull request #2185 from ceph/fix_purge-cluster

purge-cluster: remove usage of `with_fileglob`

7 years agoosd: remove leftover and fix a typo 2202/head
Guillaume Abrioux [Tue, 21 Nov 2017 10:11:34 +0000 (11:11 +0100)]
osd: remove leftover and fix a typo

This task was originally needed to fix a docker installation issue
(see: #1030). This has been fixed, therefore it can be removed.

Fixes: #2199
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agopurge-cluster: remove usage of `with_fileglob` 2185/head
Guillaume Abrioux [Thu, 16 Nov 2017 10:49:18 +0000 (11:49 +0100)]
purge-cluster: remove usage of `with_fileglob`

`with_fileglob` loops over files on the machine where ansible-playbook
is being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoMerge pull request #2197 from ceph/fix_gpt_prepare
Sébastien Han [Mon, 20 Nov 2017 09:40:18 +0000 (10:40 +0100)]
Merge pull request #2197 from ceph/fix_gpt_prepare

osd: ensure a gpt label is set on device

7 years agoosd: ensure a gpt label is set on device 2197/head
Guillaume Abrioux [Fri, 17 Nov 2017 16:32:23 +0000 (17:32 +0100)]
osd: ensure a gpt label is set on device

ceph-disk prepare will fail on jewel if a GPT label is not present on
the device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoMerge pull request #2189 from fultonj/empty-acl
Guillaume Abrioux [Thu, 16 Nov 2017 18:39:01 +0000 (19:39 +0100)]
Merge pull request #2189 from fultonj/empty-acl

Make openstack_keys param support no acls list

7 years agoMerge pull request #2184 from ceph/fix_wildcard_remove
Guillaume Abrioux [Thu, 16 Nov 2017 17:02:35 +0000 (18:02 +0100)]
Merge pull request #2184 from ceph/fix_wildcard_remove

purge-docker: remove osd disk prepare logs

7 years agoMake openstack_keys param support no acls list 2189/head
John Fulton [Thu, 16 Nov 2017 16:29:59 +0000 (11:29 -0500)]
Make openstack_keys param support no acls list

A recent change [1] required that the openstack_keys
param always contain an acls list. However, it's
possible it might not contain that list. Thus, this
patch sets a default for that list to be empty if it
is not in the structure as defined by the user.
[1] d65cbaa53952269ec9a2e76fca8203ce7ad22c2b

7 years agoMerge pull request #2182 from ceph/fix_reboot_rbd
Sébastien Han [Thu, 16 Nov 2017 15:55:39 +0000 (16:55 +0100)]
Merge pull request #2182 from ceph/fix_reboot_rbd

rbd: enable ceph-rbd-mirror.target on releases prior to luminous

7 years agoMerge pull request #2186 from ceph/dmcrypt-fixfg
Guillaume Abrioux [Thu, 16 Nov 2017 15:52:13 +0000 (16:52 +0100)]
Merge pull request #2186 from ceph/dmcrypt-fixfg

osd: multiple fixes

7 years agoosd: remove leftover from osd partition 2186/head
Sébastien Han [Thu, 16 Nov 2017 13:58:40 +0000 (14:58 +0100)]
osd: remove leftover from osd partition

We used to support OSDs that are a partition. That is long gone, so this
task is removed.
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: remove failed_when on activation
Sébastien Han [Thu, 16 Nov 2017 13:57:49 +0000 (14:57 +0100)]
osd: remove failed_when on activation

There is no need to continue if the activation fails.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: fix bad activation for dmcrypt
Sébastien Han [Thu, 16 Nov 2017 13:55:08 +0000 (14:55 +0100)]
osd: fix bad activation for dmcrypt

We were activating dmcrypt devices with the wrong command. Basically, the
first task executed the wrong activate command. The task failed but
continued because of 'failed_when: false'. The correct activation
sequence was then done by the next task.
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoMerge pull request #2151 from hwoarang/add-opensuse
Sébastien Han [Thu, 16 Nov 2017 13:35:28 +0000 (14:35 +0100)]
Merge pull request #2151 from hwoarang/add-opensuse

Add openSUSE Leap 42.3 support

7 years agopurge-docker: remove osd disk prepare logs 2184/head
Guillaume Abrioux [Thu, 16 Nov 2017 10:36:17 +0000 (11:36 +0100)]
purge-docker: remove osd disk prepare logs

`with_fileglob` loops over files on the machine that runs the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>