git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
7 years agorhcs: re-add apt-pinning
Sébastien Han [Tue, 17 Apr 2018 13:59:52 +0000 (15:59 +0200)]
rhcs: re-add apt-pinning

When installing rhcs on Debian systems the Red Hat repos must have the
highest priority so we avoid package conflicts and install the rhcs
version.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1565850
Signed-off-by: Sébastien Han <seb@redhat.com>
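A pin like this must outrank every other source. As a sketch, the corresponding Ansible task could look like the following; the preferences path, origin string, and priority value are illustrative assumptions, not the actual change:

```yaml
# Hypothetical sketch: pin the Red Hat Ceph Storage repo above everything
# else on Debian hosts so apt prefers the rhcs packages. A priority above
# 1000 even wins over already-installed newer versions.
- name: add apt pinning for rhcs packages
  copy:
    dest: /etc/apt/preferences.d/rhcs
    owner: root
    group: root
    mode: '0644'
    content: |
      Package: *
      Pin: origin "rhcs.example.com"
      Pin-Priority: 1001
  when: ansible_os_family == 'Debian'
```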
7 years agodefaults: check only 1 time if there is a running cluster
Guillaume Abrioux [Mon, 9 Apr 2018 16:07:31 +0000 (18:07 +0200)]
defaults: check only 1 time if there is a running cluster

There is no need to check for a running cluster n*nodes times in
`ceph-defaults`, so let's add `run_once: true` to save some resources
and time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
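As a sketch, the pattern looks like this; the probe itself is illustrative, only `run_once: true` is the point:

```yaml
# With run_once, this check executes on a single host per play instead of
# on every node; the registered result is shared with the other hosts.
- name: check for a running ceph cluster
  command: ceph --connect-timeout 3 health   # illustrative probe
  register: ceph_health
  changed_when: false
  failed_when: false
  run_once: true
```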
7 years agosite: make it more readable
Guillaume Abrioux [Tue, 10 Apr 2018 13:30:16 +0000 (15:30 +0200)]
site: make it more readable

These conditions introduced by d981c6bd2 were insane.
This should be a bit easier to read.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoosd: do not do anything if the dev has a partition
Sébastien Han [Fri, 13 Apr 2018 14:36:43 +0000 (16:36 +0200)]
osd: do not do anything if the dev has a partition

Regardless of whether the partition is 'ceph' or something else, we don't
want to be as strict as checking for a particular partition.
If the drive has a partition, we just don't do anything.

This solves the case where the server reboots and disks get a different
/dev/sda (node) allocation. In this case, prior to restarting the server
/dev/sda was an OSD, but now it's /dev/sdb and the other way around.
In such a scenario, we would try to prepare the OSD and create a new
partition, so let's not mess around with devices that have partitions.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498303
Signed-off-by: Sébastien Han <seb@redhat.com>
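A hedged sketch of the guard, using the stock `parted` module; variable and task names are assumptions, not the actual change:

```yaml
# Read the partition table of each candidate device.
- name: read partition information
  parted:
    device: "{{ item }}"
  register: dev_info
  with_items: "{{ devices }}"

# Only prepare devices whose partition table came back empty,
# regardless of what kind of partitions the others carry.
- name: prepare OSD devices
  command: ceph-disk prepare {{ item.item }}   # illustrative
  with_items: "{{ dev_info.results }}"
  when: item.partitions | length == 0
```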
7 years agotests: update tests for mds to cover multimds case
Guillaume Abrioux [Thu, 12 Apr 2018 07:55:25 +0000 (09:55 +0200)]
tests: update tests for mds to cover multimds case

In case of multimds we must check for the number of mds up instead of
just checking if the hostname of the node is in the fsmap.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
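The check could be sketched like this; the JSON paths into the fsmap are assumptions about `ceph fs dump` output, not the actual test code:

```yaml
# Count the active MDS ranks instead of grepping for one hostname.
- name: dump the fs map
  command: ceph fs dump --format json
  register: fs_dump
  changed_when: false
  run_once: true

- name: verify the expected number of active mds
  assert:
    that:
      - ((fs_dump.stdout | from_json).filesystems | first).mdsmap['in'] | length == max_mds | int
```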
7 years agocommon: add tools repo for iscsi gw
Sébastien Han [Thu, 12 Apr 2018 10:15:35 +0000 (12:15 +0200)]
common: add tools repo for iscsi gw

To install iscsi gw packages we need to enable the tools repo.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1547849
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRemove deprecated allow_multimds
Douglas Fuller [Wed, 4 Apr 2018 18:23:25 +0000 (14:23 -0400)]
Remove deprecated allow_multimds

allow_multimds will be officially deprecated in Mimic; specify it
only for versions of Ceph where it was declared stable. Going
forward, specify only max_mds.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
7 years agoFixed a typo (extra space) v3.1.0beta6
vasishta p shastry [Tue, 10 Apr 2018 13:37:35 +0000 (19:07 +0530)]
Fixed a typo (extra space)

7 years agoosd: to support copy_admin_key
vasishta p shastry [Tue, 10 Apr 2018 13:21:50 +0000 (18:51 +0530)]
osd: to support copy_admin_key

7 years agomds: to support copy_admin_keyring
vasishta p shastry [Tue, 10 Apr 2018 12:39:43 +0000 (18:09 +0530)]
mds: to support copy_admin_keyring

7 years agonfs: to support copy_admin_key - containerized
vasishta p shastry [Tue, 10 Apr 2018 12:37:11 +0000 (18:07 +0530)]
nfs: to support copy_admin_key - containerized

7 years agonfs: ensure nfs-server server is stopped
Ali Maredia [Mon, 2 Apr 2018 17:47:31 +0000 (13:47 -0400)]
nfs: ensure nfs-server server is stopped

NFS-Ganesha cannot start if the nfs-server service
is running. This commit stops nfs-server, in case it
is running on a (Debian, Red Hat, SUSE) node, before
the nfs-ganesha service starts up.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506

Signed-off-by: Ali Maredia <amaredia@redhat.com>
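A minimal sketch of the task; unit name handling across distros is simplified here:

```yaml
# Both the kernel NFS server and Ganesha want the NFS ports and the
# rpcbind registration, so make sure nfs-server is down first.
- name: ensure nfs-server is stopped and disabled
  service:
    name: nfs-server
    state: stopped
    enabled: no
  failed_when: false   # the unit may not exist on every node
```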
7 years agoceph-nfs: allow disabling ganesha caching
Ramana Raja [Mon, 9 Apr 2018 12:03:33 +0000 (17:33 +0530)]
ceph-nfs: allow disabling ganesha caching

Add a variable, ceph_nfs_disable_caching, that if set to true
disables ganesha's directory and attribute caching as much as
possible.

Also, disable caching done by ganesha, when 'nfs_file_gw'
variable is true, i.e., when Ganesha is used as CephFS's gateway.
This is the recommended Ganesha setting as libcephfs already caches
information. And doing so helps avoid cache incoherency issues
especially with clustered ganesha over CephFS.

Fixes: https://tracker.ceph.com/issues/23393
Signed-off-by: Ramana Raja <rraja@redhat.com>
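As a sketch, "caching disabled as much as possible" maps onto a ganesha.conf fragment roughly like the one below; the block and option names follow nfs-ganesha 2.x conventions and are assumptions here, not the rendered template:

```
CACHEINODE {
    # Shrink the inode/attribute cache to its minimum
    NParts = 1;
    Cache_Size = 1;
    # Disable directory chunk caching
    Dir_Chunk = 0;
}

# Abbreviated EXPORT block, shown only for the caching knob
EXPORT {
    Export_ID = 1;
    Path = "/";
    # Do not cache attributes at all
    Attr_Expiration_Time = 0;
}
```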
7 years agoceph-defaults: bring backward compatibility for old syntax
Sébastien Han [Tue, 10 Apr 2018 13:39:44 +0000 (15:39 +0200)]
ceph-defaults: bring backward compatibility for old syntax

If people keep on using mon_cap, osd_cap, etc., the playbook will
translate this old syntax on the fly.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: fix tripleO scenario
Sébastien Han [Mon, 9 Apr 2018 22:33:33 +0000 (00:33 +0200)]
ci: fix tripleO scenario

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: client copy admin key
Sébastien Han [Thu, 5 Apr 2018 16:52:23 +0000 (18:52 +0200)]
ci: client copy admin key

If we don't copy the admin key we can't add the key into ceph.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: remove useless tests
Sébastien Han [Wed, 4 Apr 2018 14:31:04 +0000 (16:31 +0200)]
ci: remove useless tests

These are already handled by ceph-client/defaults/main.yml so the keys
will be created once user_config is set to True.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph_key: use ceph_key in the playbook
Sébastien Han [Wed, 4 Apr 2018 14:22:36 +0000 (16:22 +0200)]
ceph_key: use ceph_key in the playbook

Replaced all the occurrences of raw commands using the 'command' module
with the ceph_key module instead.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoinfra: add playbook example for ceph_key module
Sébastien Han [Fri, 30 Mar 2018 14:56:44 +0000 (16:56 +0200)]
infra: add playbook example for ceph_key module

Helper playbook to manage CephX keys.

Signed-off-by: Sébastien Han <seb@redhat.com>
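A hedged sketch of such a helper play; the module's exact argument names are assumptions based on common CephX fields, not the committed example:

```yaml
- hosts: mons
  gather_facts: false
  become: true
  tasks:
    # Create a CephX key with restricted caps using the new module
    - name: create a key for an application client
      ceph_key:
        name: client.app1
        state: present
        caps:
          mon: "allow r"
          osd: "allow rwx pool=app1"
```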
7 years agoadd ceph_key module
Sébastien Han [Sun, 18 Mar 2018 14:53:45 +0000 (15:53 +0100)]
add ceph_key module

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph_volume: objectstore should default to 'bluestore'
Andrew Schoen [Thu, 5 Apr 2018 14:12:32 +0000 (09:12 -0500)]
ceph_volume: objectstore should default to 'bluestore'

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: refactor to not run ceph osd destroy
Andrew Schoen [Tue, 3 Apr 2018 16:55:36 +0000 (11:55 -0500)]
ceph_volume: refactor to not run ceph osd destroy

This changes state to action and gives the options 'create'
or 'zap'. The zap parameter is also removed.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: preserve newlines in stdout and stderr when zapping
Andrew Schoen [Wed, 28 Mar 2018 16:10:17 +0000 (11:10 -0500)]
ceph_volume: preserve newlines in stdout and stderr when zapping

Because we have many commands we might need to run, the
ANSIBLE_STDOUT_CALLBACK won't format these nicely, since we're
not reporting them back at the root level of the JSON result.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agopurge-cluster: no need to use objectstore for ceph_volume module
Andrew Schoen [Wed, 14 Mar 2018 19:46:37 +0000 (14:46 -0500)]
purge-cluster: no need to use objectstore for ceph_volume module

When zapping, the objectstore parameter is not required.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: rc should be 0 on successful runs
Andrew Schoen [Wed, 14 Mar 2018 17:26:43 +0000 (12:26 -0500)]
ceph_volume: rc should be 0 on successful runs

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: defines the zap param in module_args
Andrew Schoen [Wed, 14 Mar 2018 17:19:42 +0000 (12:19 -0500)]
ceph_volume: defines the zap param in module_args

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: make state not required so I can provide a default
Andrew Schoen [Wed, 14 Mar 2018 16:49:48 +0000 (11:49 -0500)]
ceph_volume: make state not required so I can provide a default

I want a default value of 'present' for state, so it cannot
be made required. Otherwise it'll throw a 'Module alias error'
from ansible.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: objectstore is now optional except when state is present
Andrew Schoen [Wed, 14 Mar 2018 16:47:07 +0000 (11:47 -0500)]
ceph_volume: objectstore is now optional except when state is present

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agopurge-cluster: use ceph_volume module to zap and destroy OSDs
Andrew Schoen [Wed, 14 Mar 2018 16:32:19 +0000 (11:32 -0500)]
purge-cluster: use ceph_volume module to zap and destroy OSDs

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agotests: no need to remove partitions in lvm_setup.yml
Andrew Schoen [Mon, 12 Mar 2018 19:06:39 +0000 (14:06 -0500)]
tests: no need to remove partitions in lvm_setup.yml

Now that we are using ceph_volume_zap the partitions are
kept around and should be able to be reused.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: adds a zap property and reworks to support state: absent
Andrew Schoen [Wed, 14 Mar 2018 16:24:40 +0000 (11:24 -0500)]
ceph_volume: adds a zap property and reworks to support state: absent

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: adds a state property
Andrew Schoen [Wed, 14 Mar 2018 15:14:21 +0000 (10:14 -0500)]
ceph_volume: adds a state property

This can be either present or absent.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoceph_volume: remove the subcommand argument
Andrew Schoen [Wed, 14 Mar 2018 14:57:49 +0000 (09:57 -0500)]
ceph_volume: remove the subcommand argument

This really isn't needed currently and I don't believe it is a good
mechanism for switching subcommands anyway. The user of this module
should not have to be familiar with all ceph-volume subcommands.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agopurge-docker: added conditionals needed to successfully re-run purge
Randy J. Martinez [Wed, 28 Mar 2018 23:46:54 +0000 (18:46 -0500)]
purge-docker: added conditionals needed to successfully re-run purge

Added 'ignore_errors: true' to multiple tasks which run docker commands, to cover cases where docker is no longer installed. Without this, certain tasks in purge-docker-cluster.yml cause the playbook to fail if re-run and stop the purge. This leaves behind a dirty environment and a playbook which can no longer be run.
Fix regex on line 275: sometimes 'list-units' will output 4 spaces between loaded+active. The update accounts for both scenarios.
purge fetch_directory: in other roles fetch_directory is hard linked, e.g.: "{{ fetch_directory }}"/"{{ somedir }}". That being said, fetch_directory will never have a trailing slash in all.yml, so this task was never being run (causing failures when trying to re-deploy).

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
7 years agoFixed wrong path of ceph.conf in docs. v3.1.0beta5
JohnHaan [Tue, 10 Apr 2018 00:48:47 +0000 (09:48 +0900)]
Fixed wrong path of ceph.conf in docs.

The ceph.conf sample template moved to ceph-config.
Therefore the docs need to be changed to point at the right directory.

Signed-off-by: JohnHaan <yongiman@gmail.com>
7 years agodefaults: fix backward compatibility
Guillaume Abrioux [Mon, 9 Apr 2018 11:02:44 +0000 (13:02 +0200)]
defaults: fix backward compatibility

Backward compatibility with `ceph_mon_docker_interface` and
`ceph_mon_docker_subnet` was not working since there was no lookup on
`monitor_interface` and `public_network`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agocommon: upgrade/install ceph-test RPM first
Ken Dreyer [Thu, 5 Apr 2018 19:40:15 +0000 (13:40 -0600)]
common: upgrade/install ceph-test RPM first

Prior to this change, if a user had ceph-test-12.2.1 installed, and
upgraded to ceph v12.2.3 or newer, the RPM upgrade process would
fail.

The problem is that the ceph-test RPM did not depend on an exact version
of ceph-common until v12.2.3.

In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved
from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum
encounters package conflicts between the older ceph-test and newer
ceph-base.

When all users have upgraded beyond Ceph < 12.2.3, this is no longer
relevant.

7 years agoceph-defaults: fix ceph_uid for container image tag latest
Sébastien Han [Mon, 9 Apr 2018 08:01:30 +0000 (10:01 +0200)]
ceph-defaults: fix ceph_uid for container image tag latest

According to our recent change, we now use CentOS as the base of the
latest container image. We need to reflect this in the ceph_uid.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotox: use container latest tag for upgrades
Sébastien Han [Thu, 5 Apr 2018 08:28:51 +0000 (10:28 +0200)]
tox: use container latest tag for upgrades

Currently tag-build-master-luminous-ubuntu-16.04 is not used anymore.
Also now, 'latest' points to CentOS so we need to make that switch here
too.

We now have latest tags for each stable release, so let's use them and
point tox at them to deploy the right version.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoUse the CentOS repo for Red Hat dev packages
Zack Cerza [Fri, 6 Apr 2018 16:17:48 +0000 (10:17 -0600)]
Use the CentOS repo for Red Hat dev packages

No use even trying to use something that doesn't exist.

Signed-off-by: Zack Cerza <zack@redhat.com>
7 years agosite-docker: followup on #2487
Guillaume Abrioux [Wed, 4 Apr 2018 09:46:51 +0000 (11:46 +0200)]
site-docker: followup on #2487

Use a non-empty array as the default value for `groups.get('clients')`;
otherwise the `| first` filter will complain because it can't work with
an empty array.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoadd .vscode/ to gitignore
Sébastien Han [Wed, 4 Apr 2018 14:23:54 +0000 (16:23 +0200)]
add .vscode/ to gitignore

I personally dev on vscode and I have some preferences to save when it
comes to running the python unit tests. So escaping this directory is
actually useful.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoDeploying without managed monitors failed
Attila Fazekas [Wed, 4 Apr 2018 13:30:55 +0000 (15:30 +0200)]
Deploying without managed monitors failed

TripleO deployment failed when the monitors are not managed
by TripleO itself, with:
    FAILED! => {"msg": "list object has no element 0"}

The failing play item was introduced by
 f46217b69ae18317cb0c1cc3e391a0bca5767eb6 .

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327

Signed-off-by: Attila Fazekas <afazekas@redhat.com>
7 years agodefaults: remove `run_once: true` when creating fetch_directory
Guillaume Abrioux [Tue, 3 Apr 2018 11:43:53 +0000 (13:43 +0200)]
defaults: remove `run_once: true` when creating fetch_directory

Because of `serial: 1`, it can be an issue when the playbook is being
run on client nodes.
Since the refactor of `ceph-client` we skip the role `ceph-defaults` on
every node except the first client node, which means the task is not
going to be played because of `run_once: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoconfig: use fact `ceph_uid`
Guillaume Abrioux [Tue, 3 Apr 2018 11:41:07 +0000 (13:41 +0200)]
config: use fact `ceph_uid`

Use fact `ceph_uid` in the task which ensures `/etc/ceph` exists in
containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoclients: refactor `ceph-clients` role
Guillaume Abrioux [Fri, 30 Mar 2018 11:48:17 +0000 (13:48 +0200)]
clients: refactor `ceph-clients` role

This commit refactors this role so we don't have to pull the container
image on client nodes just to create pools and keys.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1550977
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoclient: remove legacy code
Guillaume Abrioux [Fri, 30 Mar 2018 10:50:14 +0000 (12:50 +0200)]
client: remove legacy code

This seems to be a leftover.
This commit removes an unnecessary 'set linux permissions' on
`/var/lib/ceph`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agocontainer: play docker-common only on first client node
Guillaume Abrioux [Fri, 30 Mar 2018 10:45:15 +0000 (12:45 +0200)]
container: play docker-common only on first client node

This commit aims to set the default behavior to play
`ceph-docker-common` only on first node in clients group.

Currently, we play docker-common to pull the container image so we can run
ceph commands in order to generate keys or create pools.
On a cluster with a large number of client nodes, proceeding this way can
be time consuming. An alternative is to pull the container image on the
first node only and then copy the keys to the other nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agomove selinux check to `ceph-defaults`
Guillaume Abrioux [Fri, 30 Mar 2018 10:38:41 +0000 (12:38 +0200)]
move selinux check to `ceph-defaults`

This check is alone in `ceph-docker-common` since a previous code
refactor.
Moving this check in `ceph-defaults` allows us to run `ceph-clients`
without having to run `ceph-docker-common` even in non-containerized
deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoceph-iscsi: fix certificates generation and distribution
Sébastien Han [Tue, 3 Apr 2018 13:20:06 +0000 (15:20 +0200)]
ceph-iscsi: fix certificates generation and distribution

Prior to this patch, the certificates were being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed on all the gateway nodes.

This would require a second ansible run to work. This patch fixes the
creation and distribution of the keys on all the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodo not delegate facts on client nodes
Guillaume Abrioux [Wed, 21 Mar 2018 18:01:51 +0000 (19:01 +0100)]
do not delegate facts on client nodes

This commit is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1550977

We iterate over all nodes on each node and we delegate the facts gathering.
This consumes a lot of memory when there is a large number of nodes in the
inventory.
That way of gathering is not necessary for client nodes, so we can simply
gather local facts for these nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agopurge-docker: remove redundant task
Guillaume Abrioux [Tue, 27 Mar 2018 12:26:12 +0000 (14:26 +0200)]
purge-docker: remove redundant task

The `remove_packages` prompt is redundant to the `ireallymeanit` prompt
since it does exactly the same thing. I guess the only goal of this task
was to make a break to warn the user about the `--skip-tags=with_pkg`
feature. This warning should be part of the first prompt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoceph-mds: delete duplicate tasks which cause multimds container deployments to fail.
Randy J. Martinez [Thu, 29 Mar 2018 00:15:19 +0000 (19:15 -0500)]
ceph-mds: delete duplicate tasks which cause multimds container deployments to fail.

This update resolves the error ('cephfs' is undefined) in multimds container deployments.
See: roles/ceph-mon/tasks/create_mds_filesystems.yml. The same last two tasks are present there, and actually need to happen in that role since "{{ cephfs }}" gets defined in
roles/ceph-mon/defaults/main.yml, and not in roles/ceph-mds/defaults/main.yml.

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
7 years agoceph-osd note that some scenarios use ceph-disk vs. ceph-volume
Alfredo Deza [Wed, 28 Mar 2018 20:40:04 +0000 (16:40 -0400)]
ceph-osd note that some scenarios use ceph-disk vs. ceph-volume

Signed-off-by: Alfredo Deza <adeza@redhat.com>
7 years agoRefer to expected-num-objects as expected_num_objects, not size
John Fulton [Sun, 25 Mar 2018 20:36:27 +0000 (20:36 +0000)]
Refer to expected-num-objects as expected_num_objects, not size

Follow up patch to PR 2432 [1] which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects as is used in the Ceph documentation [2].

[1] https://github.com/ceph/ceph-ansible/pull/2432/files
[2] http://docs.ceph.com/docs/jewel/rados/operations/pools

7 years agocleanup osd.conf.j2 in ceph-osd
Ning Yao [Fri, 23 Mar 2018 15:48:16 +0000 (23:48 +0800)]
cleanup osd.conf.j2 in ceph-osd

osd crush location is set by ceph_crush in the library,
osd.conf.j2 is not used any more.

Signed-off-by: Ning Yao <yaoning@unitedstack.com>
7 years agosetup cephx keys when not nfs_obj_gw
Patrick Donnelly [Sat, 10 Mar 2018 19:27:10 +0000 (11:27 -0800)]
setup cephx keys when not nfs_obj_gw

Copy the admin key when configured nfs_file_gw (but not nfs_obj_gw). Also,
copy/setup RGW related directories only when configured as nfs_obj_gw.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
7 years agoceph-defaults: set is_atomic variable
Andrew Schoen [Tue, 20 Mar 2018 19:13:28 +0000 (14:13 -0500)]
ceph-defaults: set is_atomic variable

This variable is needed for containerized clusters and is required for
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample, so if ceph-docker-common is used outside
of that playbook it needs to be set in another way. Moving the creation of
the variable inside this role means playbooks don't need to worry
about setting it.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
7 years agoFix config_template to consistently order sections
Andy McCrae [Fri, 16 Mar 2018 15:24:53 +0000 (15:24 +0000)]
Fix config_template to consistently order sections

In ec042219e64a321fa67fce0384af76eeb238c645 we added OrderedDict and
sorted to be able to preserve order for config_template k,v pairs inside
a section.

This patch adds a similar ordering for the sections themselves, which
could otherwise still change order and initiate handler restarts.

OrderedDict isn't needed because we use .items() to return a list that
can then be sorted().

7 years agoSimplify ceph.conf generation
Andy McCrae [Mon, 12 Mar 2018 14:13:53 +0000 (14:13 +0000)]
Simplify ceph.conf generation

Since the approach to creating a ceph.conf file has changed, and now
no-longer relies on assembling config file fragments in /etc/ceph/ceph.d
we can avoid the conf_overrides rendering on the local host and skip out
the tasks related to that, instead using just the config_template task
to configure the file directly.

7 years agoosd: add fs.aio-max-nr tuning v3.1.0beta4
Sébastien Han [Wed, 14 Mar 2018 22:46:23 +0000 (23:46 +0100)]
osd: add fs.aio-max-nr tuning

The number of osds per node is limited by aio-max-nr; the default is low,
so we need to increase it.

Full story:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: apply sysctl right away
Sébastien Han [Wed, 14 Mar 2018 22:41:53 +0000 (23:41 +0100)]
osd: apply sysctl right away

Without `sysctl_set: yes` the sysctl tuning will only get applied to
sysctl.conf but not on the fly.

Signed-off-by: Sébastien Han <seb@redhat.com>
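The difference can be sketched with the stock `sysctl` module; the key and value here are illustrative:

```yaml
# state: present alone only writes the config file; sysctl_set: yes also
# loads the value into the running kernel right away.
- name: apply kernel tuning immediately and persistently
  sysctl:
    name: fs.aio-max-nr
    value: "1048576"           # illustrative value
    state: present
    sysctl_set: yes
```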
7 years agomove system tuning to osd role
Sébastien Han [Wed, 14 Mar 2018 22:39:10 +0000 (23:39 +0100)]
move system tuning to osd role

The changes from these tasks only apply to osd nodes so there is no
reason to have them in ceph-common.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: re-arrange group_vars files
Sébastien Han [Wed, 7 Mar 2018 16:28:20 +0000 (17:28 +0100)]
ci: re-arrange group_vars files

We should stop putting everything in 'all'. It is too easy and error
prone; variables should be separated by host type, which is what you
should do.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: remove left over iscsi_gws file
Sébastien Han [Wed, 7 Mar 2018 16:27:29 +0000 (17:27 +0100)]
ci: remove left over iscsi_gws file

This was the wrong file and is not used; only the iscsi-ggw file that is present is correct.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoremove unused ceph_rgw_civetweb_port variable
Sébastien Han [Wed, 7 Mar 2018 16:26:24 +0000 (17:26 +0100)]
remove unused ceph_rgw_civetweb_port variable

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoclient: implement proper pools creation
Sébastien Han [Wed, 7 Mar 2018 13:50:27 +0000 (14:50 +0100)]
client: implement proper pools creation

Just like we did for the monitor and openstack_config we now have the
ability to precisely create pools.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: add support for erasure code pool
Sébastien Han [Tue, 6 Mar 2018 13:26:53 +0000 (14:26 +0100)]
mon: add support for erasure code pool

You can now specify type: erasure and the erasure_profile to use when
declaring the pool dictionary.

Signed-off-by: Sébastien Han <seb@redhat.com>
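As a sketch, the pool dictionary could then carry entries like these; the key names are assumptions based on the commit message:

```yaml
pools:
  - name: ec_pool
    pgs: 32
    type: erasure
    erasure_profile: myprofile   # must already exist in the cluster
  - name: replicated_pool
    pgs: 64
    type: replicated
```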
7 years agomon: add support for pgp, pool type and rule name
Sébastien Han [Tue, 6 Mar 2018 13:22:48 +0000 (14:22 +0100)]
mon: add support for pgp, pool type and rule name

When creating pools, it's crucial to expose all the options available as
part of the pool creation command. As explained in:
http://docs.ceph.com/docs/jewel/rados/operations/pools/

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: test pool creation on container
Sébastien Han [Mon, 5 Mar 2018 09:08:16 +0000 (10:08 +0100)]
ci: test pool creation on container

On containerized scenario we also want to test pool creation.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: fail if pool creation fails
Sébastien Han [Mon, 5 Mar 2018 09:05:28 +0000 (10:05 +0100)]
mon: fail if pool creation fails

There is no reason to continue the deployment if these tasks fail.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1546185
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: add support for expected-num-objects
Sébastien Han [Mon, 5 Mar 2018 08:56:03 +0000 (09:56 +0100)]
mon: add support for expected-num-objects

This commit adds the support for expected-num-objects when creating a pool.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1541520
Signed-off-by: Sébastien Han <seb@redhat.com>
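Underneath, this maps onto the `ceph osd pool create` invocation; a hedged sketch of the task, where the argument order follows the jewel-era CLI and the variable names are assumptions:

```yaml
# expected-num-objects is a positional argument after the rule name in
# `ceph osd pool create <name> <pg> <pgp> replicated <rule> <expected>`.
- name: create pools with expected-num-objects
  command: >
    ceph osd pool create {{ item.name }} {{ item.pgs }} {{ item.pgs }}
    replicated {{ item.rule_name | default('replicated_rule') }}
    {{ item.expected_num_objects | default(0) }}
  with_items: "{{ pools }}"
  changed_when: false
```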
7 years agodefaults: add useful info if daemon are not restarted properly
Sébastien Han [Wed, 7 Mar 2018 10:56:30 +0000 (11:56 +0100)]
defaults: add useful info if daemon are not restarted properly

If OSDs don't restart normally we now also dump info of the crush map,
crush rules, crush tree and pools.

If the monitors don't restart normally we also print the socket status
by calling mon_status and quorum_status.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoTune ansible.cfg
jtudelag [Thu, 8 Mar 2018 15:54:43 +0000 (16:54 +0100)]
Tune ansible.cfg

Based on the OpenShift one:
https://docs.openshift.com/container-platform/3.7/scaling_performance/install_practices.html#scaling-performance-install-optimization

* Increases number of forks.
* Disables host_key_checking
* Smart gathering facts
* Fact caching jsonfile
* Enables profile_tasks callback
* Multiplexes ssh connections (ControlMaster)
* Enables pipelining
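The bullet list above corresponds roughly to an ansible.cfg like this (values are illustrative, not the committed ones):

```ini
[defaults]
forks = 25
host_key_checking = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ./fact_cache
callback_whitelist = profile_tasks

[ssh_connection]
# Multiplex ssh connections and enable pipelining
ssh_args = -o ControlMaster=auto -o ControlPersist=600s
pipelining = True
```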

7 years agoCleanup plugins directories and references
Andy McCrae [Tue, 13 Mar 2018 11:30:09 +0000 (11:30 +0000)]
Cleanup plugins directories and references

Having callback_plugins, and action plugins in random locations causes
a lot of disparity.

We should centralize this into one place in the plugins directory and
fix up the ansible.cfg to reflect this.

Additionally, since the ansible.cfg already reflects action_plugins, we
don't need a link to action_plugins in the base of the repository.

7 years agoAdds handy ceph aliases when using containerized installations.
jtudelag [Wed, 28 Feb 2018 17:53:57 +0000 (18:53 +0100)]
Adds handy ceph aliases when using containerized installations.

Same approach as openshift-ansible etcdctl:

* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml
* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh

7 years agoclient: fix pgs num for client pool creation
Guillaume Abrioux [Mon, 26 Feb 2018 15:03:30 +0000 (16:03 +0100)]
client: fix pgs num for client pool creation

The `pools` dict defined in `roles/ceph-client/defaults/main.yml`
shouldn't have `{{ ceph_conf_overrides.global.osd_pool_default_pg_num
}}` as default value for `pgs` keys.

For instance, if you want some pools to be created without explicitly
specifying the pgs for these pools (meaning you want to use
`osd_pool_default_pg_num`), you would be obliged to define
`{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` anyway, while you
wanted to use the current default value already defined in the cluster, which is
retrieved early in the playbook and stored in the
`{{ osd_pool_default_pg_num }}` fact.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agocommon: run updatedb task on debian systems only
Sébastien Han [Mon, 5 Mar 2018 17:57:29 +0000 (18:57 +0100)]
common: run updatedb task on debian systems only

The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: fix osd_pool_default_crush_rule persistence and effectiveness
Sébastien Han [Fri, 2 Mar 2018 14:50:01 +0000 (15:50 +0100)]
mon: fix osd_pool_default_crush_rule persistence and effectiveness

Running the last portion (insert new default and add new default crush
tasks) of crush_rules.yml only on the last monitor is
wrong since ceph CLI calls usually end up on the master having the
quorum, which is by default the one with the lowest IP.
So if we run the command and end up on another mon, the creation will
happen on the default crush rule because that particular mon hasn't been
updated.
To fix this we remove the |last on the include and use run_once: true on
certain tasks, then we let the final two tasks run on all the monitors.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: fix set crush default rule
Sébastien Han [Fri, 2 Mar 2018 13:53:57 +0000 (14:53 +0100)]
mon: fix set crush default rule

On releases after jewel the option
'osd_pool_default_crush_replicated_ruleset' does not exist anymore; it's
called osd_pool_default_crush_rule.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: remove old crush_location implementation
Sébastien Han [Wed, 21 Feb 2018 14:56:32 +0000 (15:56 +0100)]
osd: remove old crush_location implementation

This was causing a lot of pain with the handlers. Also the
implementation was not ideal since we were assembling files. Everything
can now be done with the ceph_crush module so let's remove that.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotest: add tests for creating crush tree
Sébastien Han [Wed, 21 Feb 2018 14:26:30 +0000 (15:26 +0100)]
test: add tests for creating crush tree

We now run tests on the newly created ceph_crush module. Now the CI will
create a specific hierarchy for the OSD.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomon: use ceph_crush module in the playbook
Sébastien Han [Wed, 21 Feb 2018 14:20:24 +0000 (15:20 +0100)]
mon: use ceph_crush module in the playbook

Instead of creating the CRUSH hierarchy with Ansible tasks using the
command module we now rely on the ceph_crush module.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoadd ceph_crush module
Sébastien Han [Mon, 19 Feb 2018 09:13:06 +0000 (10:13 +0100)]
add ceph_crush module

This module allows us to create a Ceph CRUSH hierarchy. The module works
with hostvars from individual OSD hosts.
Here is an example of the expected configuration in the inventory file:

[osds]
ceph-osd-01 osd_crush_location="{ 'root': 'mon-roottt', 'rack':
'mon-rackkkk', 'pod': 'monpod', 'host': 'localhost' }"  # valid case

Then, if create_crush_tree is enabled, the module will create the
appropriate CRUSH buckets and their types in Ceph.

Some prerequisites:

* a 'host' bucket must be defined
* at least two buckets must be defined (this includes the 'host')
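
A hypothetical playbook task consuming such an inventory could look like
the following; the exact parameters may differ from the real ceph_crush
module:

```yaml
# Build the CRUSH tree once, from the osd_crush_location hostvars
# declared on each OSD host in the inventory.
- name: create crush tree from hostvars
  ceph_crush:
    cluster: "{{ cluster }}"
    location: "{{ hostvars[item]['osd_crush_location'] }}"
  with_items: "{{ groups['osds'] }}"
  run_once: true
  when: create_crush_tree
```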

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomons: Current crush_rule playbook does not work if there is no default rule defined...
Greg Charot [Tue, 6 Feb 2018 18:44:03 +0000 (19:44 +0100)]
mons: Current crush_rule playbook does not work if there is no default rule defined (default: true).
One could want to add new crush rules while keeping the current default rule.
Fixed it so that it works with all rules defined as "default: false". If
multiple rules are defined as default (which should not happen), the last
rule listed in "crush_rules" is taken as the default.
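
For illustration, a crush_rules definition where no rule is the default
might look like this (rule names are made up):

```yaml
crush_rules:
  - name: replicated_hdd
    root: default
    type: host
    default: false
  - name: replicated_ssd
    root: ssd
    type: host
    default: false   # no default rule: the cluster's current default is kept
```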

7 years agono reason the ceph-ansible default provided crush_rule_hdd rule should be...
Greg Charot [Tue, 6 Feb 2018 18:26:54 +0000 (19:26 +0100)]
no reason the ceph-ansible default provided crush_rule_hdd rule should be set as rack root + default ruleset

7 years agoWe don't want to automatically move the rbd pool to the new default crush rule. This...
Greg Charot [Tue, 6 Feb 2018 18:20:17 +0000 (19:20 +0100)]
We don't want to automatically move the rbd pool to the new default crush rule. This operation shall be performed by the cluster operator.

7 years agoadd support for installation checkpoint
Sébastien Han [Wed, 28 Feb 2018 16:08:07 +0000 (17:08 +0100)]
add support for installation checkpoint

This was taken from the openshift ansible repository here:
https://github.com/leseb/openshift-ansible/tree/master/roles/installer_checkpoint

Rationale:

A complete OpenShift cluster installation comprises many different
components which can take 30 minutes to several hours to complete. If
the installation should fail, it could be confusing to understand at
which component the failure occurred. Additionally, it may be desired to
re-run only the component which failed instead of starting over from the
beginning. Components which came after the failed component would also
need to be run individually.

Ceph has a similar situation so we can benefit from that
callback_plugin.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRemove vars that are no longer used
Andy McCrae [Mon, 5 Mar 2018 19:06:09 +0000 (19:06 +0000)]
Remove vars that are no longer used

As part of fcba2c801a122b7ce8ec6a5c27a70bc19589d177 these vars were
removed and no longer do anything:

radosgw_dns_name
radosgw_resolve_cname

This patch removes them from the group_vars files and defaults/main.yml

7 years agoMakes use of docker_exec_cmd in ceph-mon role.
jtudelag [Sun, 4 Mar 2018 21:13:22 +0000 (22:13 +0100)]
Makes use of docker_exec_cmd in ceph-mon role.

Keeps consistency inside the role and among roles.
Makes the code more readable.
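
A minimal sketch of the pattern, assuming docker_exec_cmd is a fact that
expands to a 'docker exec <container>' prefix on containerized deployments
and to an empty string otherwise:

```yaml
# The same task works bare-metal and containerized: the prefix is
# simply empty when not running in a container.
- name: get ceph cluster status
  command: "{{ docker_exec_cmd }} ceph --cluster {{ cluster }} -s"
  register: ceph_status
  changed_when: false
```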

7 years agocommon: run updatedb task on debian systems only
Sébastien Han [Thu, 1 Mar 2018 16:33:33 +0000 (17:33 +0100)]
common: run updatedb task on debian systems only

The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.
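
As a sketch, the guard could look like this (the real task may differ):

```yaml
# updatedb is not shipped on Red Hat systems, so only run it on Debian.
- name: exclude ceph directories from updatedb scans
  command: updatedb -e /var/lib/ceph
  changed_when: false
  when: ansible_os_family == 'Debian'
```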

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorgw: add cluster name option to the handler
Sébastien Han [Thu, 1 Mar 2018 15:50:06 +0000 (16:50 +0100)]
rgw: add cluster name option to the handler

If the cluster name is different from 'ceph', the command will fail, so
we need to pass the cluster name.
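
A sketch of the idea; the actual handler command differs, and
rgw_socket_path is a placeholder variable, but the point is the --cluster
flag:

```yaml
# Without --cluster, the CLI assumes the default cluster name 'ceph'
# and fails on clusters deployed with a custom name.
- name: check for a rados gateway socket
  command: "ceph --cluster {{ cluster }} --admin-daemon {{ rgw_socket_path }} version"
  changed_when: false
```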

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoci: add copy_admin_key test to container scenario
Sébastien Han [Thu, 1 Mar 2018 15:47:37 +0000 (16:47 +0100)]
ci: add copy_admin_key test to container scenario

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorgw: ability to copy ceph admin key on containerized
Sébastien Han [Thu, 1 Mar 2018 15:47:22 +0000 (16:47 +0100)]
rgw: ability to copy ceph admin key on containerized

If we now set copy_admin_key while running a containerized scenario, the
ceph admin key will be copied to the node.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorgw: run the handler on a mon host
Sébastien Han [Thu, 1 Mar 2018 15:46:01 +0000 (16:46 +0100)]
rgw: run the handler on a mon host

In case the admin key wasn't copied over to the node, this command would
fail. So it's safer to run it from a monitor directly.
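
A sketch of delegating the check to a monitor (group and task names are
illustrative):

```yaml
# A monitor always has the admin key, so run the CLI there instead
# of on the rgw node itself.
- name: check rgw key from a monitor
  command: "ceph --cluster {{ cluster }} auth get client.rgw.{{ ansible_hostname }}"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  changed_when: false
```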

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotests: make CI jobs using 'ansible.cfg' v3.1.0beta3
Guillaume Abrioux [Mon, 26 Feb 2018 13:35:36 +0000 (14:35 +0100)]
tests: make CI jobs using 'ansible.cfg'

The jobs launched by the CI were not using 'ansible.cfg'.
It contains some parameters that should avoid the SSH failures we have
been seeing in the CI so far.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoclient: use `ceph_uid` fact to set uid/gid on admin key
Guillaume Abrioux [Fri, 16 Feb 2018 08:04:23 +0000 (09:04 +0100)]
client: use `ceph_uid` fact to set uid/gid on admin key

That task is failing on containerized deployments because the `ceph:ceph`
user and group don't exist there.
The idea here is to use `{{ ceph_uid }}` to set the ownership of the
admin keyring when containerized_deployment is set.
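
A minimal sketch of the idea, assuming ceph_uid was set earlier as a fact
depending on the container image family:

```yaml
# Inside the container image the 'ceph' user only exists as a numeric
# uid/gid, so chown by number rather than by name.
- name: set admin keyring ownership inside containers
  file:
    path: "/etc/ceph/{{ cluster }}.client.admin.keyring"
    owner: "{{ ceph_uid }}"
    group: "{{ ceph_uid }}"
  when: containerized_deployment
```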

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540578
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agomds: fix ansible_service_mgr typo
Grant Slater [Sun, 25 Feb 2018 01:44:07 +0000 (01:44 +0000)]
mds: fix ansible_service_mgr typo

This commit fixes a typo introduced by 4671b9e74e657988137f6723ef12e38c66d9cd40

7 years agoRevert "[TEST] Test setting up correct systemd file for nfs-ganesha"
Andy McCrae [Wed, 21 Feb 2018 08:41:27 +0000 (08:41 +0000)]
Revert "[TEST] Test setting up correct systemd file for nfs-ganesha"

The nfs-ganesha package has been fixed as part of this commit:
https://github.com/nfs-ganesha/nfs-ganesha-debian/commit/963b6681dfac459c27c947cb8decc788bc9e5422

Once the package is rebuilt this should be good to merge.

This reverts commit e88af3c4cb314f1f640447ebdce343f0aca85fb4.

7 years agoMake rule_name optional when defining items in openstack_pools
Giulio Fidente [Thu, 22 Feb 2018 18:57:47 +0000 (19:57 +0100)]
Make rule_name optional when defining items in openstack_pools

Previously it was necessary to provide a value (possibly an
empty string) for the "rule_name" key of each item in
openstack_pools. This change makes the key optional, defaulting to an
empty string when not given.
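
A sketch of how the default can be applied with Jinja's default filter
(task shown for illustration, not the exact change):

```yaml
# item.rule_name | default('') lets items omit the key entirely.
- name: create openstack pools
  command: >
    ceph --cluster {{ cluster }} osd pool create {{ item.name }}
    {{ item.pg_num }} {{ item.pg_num }} {{ item.rule_name | default('') }}
  with_items: "{{ openstack_pools }}"
  changed_when: false
```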