git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

JohnHaan [Tue, 10 Apr 2018 00:48:47 +0000 (09:48 +0900)]

Fixed wrong path of ceph.conf in docs.

The ceph.conf sample template moved to ceph-config.
Therefore the docs need to be changed to point to the right directory.

Signed-off-by: JohnHaan <yongiman@gmail.com>

Guillaume Abrioux [Mon, 9 Apr 2018 11:02:44 +0000 (13:02 +0200)]

defaults: fix backward compatibility

Backward compatibility with `ceph_mon_docker_interface` and
`ceph_mon_docker_subnet` was not working because there was no lookup on
`monitor_interface` and `public_network`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Ken Dreyer [Thu, 5 Apr 2018 19:40:15 +0000 (13:40 -0600)]

common: upgrade/install ceph-test RPM first

Prior to this change, if a user had ceph-test-12.2.1 installed, and
upgraded to ceph v12.2.3 or newer, the RPM upgrade process would
fail.

The problem is that the ceph-test RPM did not depend on an exact version
of ceph-common until v12.2.3.

In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved
from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum
encounters package conflicts between the older ceph-test and newer
ceph-base.

Once all users have upgraded to Ceph v12.2.3 or newer, this is no
longer relevant.

Sébastien Han [Mon, 9 Apr 2018 08:01:30 +0000 (10:01 +0200)]

ceph-defaults: fix ceph_uid for container image tag latest

According to our recent change, we now use "CentOS" as the latest
container image. We need to reflect this on the ceph_uid.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Thu, 5 Apr 2018 08:28:51 +0000 (10:28 +0200)]

tox: use container latest tag for upgrades

The tag-build-master-luminous-ubuntu-16.04 tag is not used anymore.
Also, 'latest' now points to CentOS, so we need to make that switch here
too.

We now have latest tags for each stable release, so let's use them and
point tox at them to deploy the right version.

Signed-off-by: Sébastien Han <seb@redhat.com>

Zack Cerza [Fri, 6 Apr 2018 16:17:48 +0000 (10:17 -0600)]

Use the CentOS repo for Red Hat dev packages

No use even trying to use something that doesn't exist.

Signed-off-by: Zack Cerza <zack@redhat.com>

Guillaume Abrioux [Wed, 4 Apr 2018 09:46:51 +0000 (11:46 +0200)]

site-docker: followup on #2487

Use a non-empty array as the default value for `groups.get('clients')`;
otherwise the `| first` filter will complain because it cannot work with
an empty array.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Sébastien Han [Wed, 4 Apr 2018 14:23:54 +0000 (16:23 +0200)]

add .vscode/ to gitignore

I personally develop in VS Code and have some preferences to save for
running the Python unit tests, so ignoring this directory is actually
useful.

Signed-off-by: Sébastien Han <seb@redhat.com>

Attila Fazekas [Wed, 4 Apr 2018 13:30:55 +0000 (15:30 +0200)]

Deploying without managed monitors failed

TripleO deployment failed when the monitors were not managed
by TripleO itself, with:
FAILED! => {"msg": "list object has no element 0"}

The failing play item was introduced by
f46217b69ae18317cb0c1cc3e391a0bca5767eb6 .

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327

Signed-off-by: Attila Fazekas <afazekas@redhat.com>

Guillaume Abrioux [Tue, 3 Apr 2018 11:43:53 +0000 (13:43 +0200)]

defaults: remove `run_once: true` when creating fetch_directory

Because of `serial: 1`, this can be an issue when the playbook is run
on client nodes.
Since the refactor of `ceph-client`, we skip the `ceph-defaults` role on
every node except the first client node, which means that the task is
not going to be played because of `run_once: true`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Tue, 3 Apr 2018 11:41:07 +0000 (13:41 +0200)]

config: use fact `ceph_uid`

Use fact `ceph_uid` in the task which ensures `/etc/ceph` exists in
containerized deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Fri, 30 Mar 2018 11:48:17 +0000 (13:48 +0200)]

clients: refactor `ceph-clients` role

This commit refactors this role so we don't have to pull the container
image on client nodes just to create pools and keys.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1550977
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Fri, 30 Mar 2018 10:50:14 +0000 (12:50 +0200)]

client: remove legacy code

This seems to be a leftover.
This commit removes an unnecessary 'set linux permissions' task on
`/var/lib/ceph`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Fri, 30 Mar 2018 10:45:15 +0000 (12:45 +0200)]

container: play docker-common only on first client node

This commit sets the default behavior to play `ceph-docker-common`
only on the first node in the clients group.

Currently, we play docker-common to pull the container image so we can
run ceph commands in order to generate keys or create pools.
On a cluster with a large number of client nodes, proceeding this way
can be time consuming. An alternative is to pull the container image
only on the first node and then copy the keys to the other nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Fri, 30 Mar 2018 10:38:41 +0000 (12:38 +0200)]

move selinux check to `ceph-defaults`

This check has been alone in `ceph-docker-common` since a previous code
refactor.
Moving this check to `ceph-defaults` allows us to run `ceph-clients`
without having to run `ceph-docker-common`, even in non-containerized
deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Sébastien Han [Tue, 3 Apr 2018 13:20:06 +0000 (15:20 +0200)]

ceph-iscsi: fix certificates generation and distribution

Prior to this patch, the certificates were being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed on all the gateway nodes.

This would require a second ansible run to work. This patch fixes the
creation of the certificates and the distribution of the keys on all
the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>

Guillaume Abrioux [Wed, 21 Mar 2018 18:01:51 +0000 (19:01 +0100)]

do not delegate facts on client nodes

This commit is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1550977

We iterate over all nodes on each node and we delegate the facts gathering.
This consumes a lot of memory when there is a large number of nodes in
the inventory.
That way of gathering is not necessary for client nodes, so we can simply
gather local facts for these nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Tue, 27 Mar 2018 12:26:12 +0000 (14:26 +0200)]

purge-docker: remove redundant task

The `remove_packages` prompt is redundant with the `ireallymeanit` prompt
since it does exactly the same thing. I guess the only goal of this task
was to pause and warn the user about the `--skip-tags=with_pkg` feature.
This warning should be part of the first prompt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Randy J. Martinez [Thu, 29 Mar 2018 00:15:19 +0000 (19:15 -0500)]

ceph-mds: delete duplicate tasks which cause multimds container deployments to fail.

This update resolves the error ['cephfs' is undefined.] in multimds
container deployments.
See: roles/ceph-mon/tasks/create_mds_filesystems.yml. The same last two
tasks are present there, and actually need to happen in that role since
"{{ cephfs }}" gets defined in roles/ceph-mon/defaults/main.yml, and not
roles/ceph-mds/defaults/main.yml.

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>

Alfredo Deza [Wed, 28 Mar 2018 20:40:04 +0000 (16:40 -0400)]

ceph-osd: note that some scenarios use ceph-disk vs. ceph-volume

Signed-off-by: Alfredo Deza <adeza@redhat.com>

John Fulton [Sun, 25 Mar 2018 20:36:27 +0000 (20:36 +0000)]

Refer to expected-num-objects as expected_num_objects, not size

Follow-up patch to PR 2432 [1] which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects, as used in the Ceph documentation [2].

[1] https://github.com/ceph/ceph-ansible/pull/2432/files
[2] http://docs.ceph.com/docs/jewel/rados/operations/pools

Ning Yao [Fri, 23 Mar 2018 15:48:16 +0000 (23:48 +0800)]

cleanup osd.conf.j2 in ceph-osd

The OSD crush location is set by ceph_crush in the library, so
osd.conf.j2 is no longer used.

Signed-off-by: Ning Yao <yaoning@unitedstack.com>

Patrick Donnelly [Sat, 10 Mar 2018 19:27:10 +0000 (11:27 -0800)]

setup cephx keys when not nfs_obj_gw

Copy the admin key when configured as nfs_file_gw (but not nfs_obj_gw).
Also, copy/set up RGW related directories only when configured as
nfs_obj_gw.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

Andrew Schoen [Tue, 20 Mar 2018 19:13:28 +0000 (14:13 -0500)]

ceph-defaults: set is_atomic variable

This variable is needed for containerized clusters and is required for
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample, so if ceph-docker-common is used outside
of that playbook it needs to be set in another way. Moving the creation
of the variable inside this role means playbooks don't need to worry
about setting it.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

Andy McCrae [Fri, 16 Mar 2018 15:24:53 +0000 (15:24 +0000)]

Fix config_template to consistently order sections

In ec042219e64a321fa67fce0384af76eeb238c645 we added OrderedDict and
sorted to be able to preserve order for config_template k,v pairs inside
a section.

This patch adds a similar ordering for the sections themselves, which
could still change order and initiate handler restarts.

OrderedDict isn't needed because we use .items() to return a list that
can then be sorted().
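
As a rough illustration of the idea (not the actual config_template code), the sketch below renders a nested dict with both sections and keys sorted, so the generated text is the same whatever order the input was built in. The function name `render_ini` and the sample values are hypothetical.

```python
def render_ini(config):
    """Render a {section: {key: value}} mapping with sorted sections and keys."""
    lines = []
    # .items() gives (name, options) pairs; sorted() orders them by section
    # name, so the output does not depend on dict insertion order.
    for section, options in sorted(config.items()):
        lines.append(f"[{section}]")
        for key, value in sorted(options.items()):
            lines.append(f"{key} = {value}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample data: same content, different insertion order.
a = {"osd": {"osd_journal_size": 1024}, "global": {"fsid": "abc", "cluster": "ceph"}}
b = {"global": {"cluster": "ceph", "fsid": "abc"}, "osd": {"osd_journal_size": 1024}}
print(render_ini(a) == render_ini(b))  # True
```

Because the rendered file is byte-identical across runs, a change-detecting task has no spurious diff to react to.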

Andy McCrae [Mon, 12 Mar 2018 14:13:53 +0000 (14:13 +0000)]

Simplify ceph.conf generation

Since the approach to creating a ceph.conf file has changed, and no
longer relies on assembling config file fragments in /etc/ceph/ceph.d,
we can avoid the conf_overrides rendering on the local host and skip
the tasks related to that, instead using just the config_template task
to configure the file directly.

Sébastien Han [Wed, 14 Mar 2018 22:46:23 +0000 (23:46 +0100)]

osd: add fs.aio-max-nr tuning

The number of OSDs per node is limited by aio-max-nr. The default is
low, so we need to increase it.

Full story:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-August/020408.html

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1553407
Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 14 Mar 2018 22:41:53 +0000 (23:41 +0100)]

osd: apply sysctl right away

Without sysctl_set: yes, the sysctl tuning is only written to
sysctl.conf and not applied on the fly.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 14 Mar 2018 22:39:10 +0000 (23:39 +0100)]

move system tuning to osd role

The changes from these tasks only apply to osd nodes so there is no
reason to have them in ceph-common.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 7 Mar 2018 16:28:20 +0000 (17:28 +0100)]

ci: re-arrange group_vars files

We should stop putting everything in 'all'. It is too easy, and it is
also error prone for those who separate variables by host type, which
is what you should do.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 7 Mar 2018 16:27:29 +0000 (17:27 +0100)]

ci: remove leftover iscsi_gws file

This is a wrong file that is not used; only the iscsi-ggw file that is
present is correct.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 7 Mar 2018 16:26:24 +0000 (17:26 +0100)]

remove unused ceph_rgw_civetweb_port variable

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 7 Mar 2018 13:50:27 +0000 (14:50 +0100)]

client: implement proper pools creation

Just like we did for the monitor and openstack_config we now have the
ability to precisely create pools.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Tue, 6 Mar 2018 13:26:53 +0000 (14:26 +0100)]

mon: add support for erasure code pool

You can now specify type: erasure and the erasure_profile to use when
declaring the pool dictionary.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Tue, 6 Mar 2018 13:22:48 +0000 (14:22 +0100)]

mon: add support for pgp, pool type and rule name

When creating pools, it's crucial to expose all the options available as
part of the pool creation command, as explained in:
http://docs.ceph.com/docs/jewel/rados/operations/pools/

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Mon, 5 Mar 2018 09:08:16 +0000 (10:08 +0100)]

ci: test pool creation on container

In the containerized scenario we also want to test pool creation.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Mon, 5 Mar 2018 09:05:28 +0000 (10:05 +0100)]

mon: fail if pool creation fails

There is no reason to continue the deployment if these tasks fail.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1546185
Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Mon, 5 Mar 2018 08:56:03 +0000 (09:56 +0100)]

mon: add support for expected-num-objects

This commit adds support for expected-num-objects when creating a pool.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1541520
Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 7 Mar 2018 10:56:30 +0000 (11:56 +0100)]

defaults: add useful info if daemons are not restarted properly

If OSDs don't restart normally we now also dump info of the crush map,
crush rules, crush tree and pools.

If the monitors don't restart normally we also print the socket status
by calling mon_status and quorum_status.

Signed-off-by: Sébastien Han <seb@redhat.com>

jtudelag [Thu, 8 Mar 2018 15:54:43 +0000 (16:54 +0100)]

Tune ansible.cfg

Based on the OpenShift one:
https://docs.openshift.com/container-platform/3.7/scaling_performance/install_practices.html#scaling-performance-install-optimization

* Increases the number of forks
* Disables host_key_checking
* Enables smart fact gathering
* Enables jsonfile fact caching
* Enables the profile_tasks callback
* Multiplexes SSH connections (ControlMaster)
* Enables pipelining

Andy McCrae [Tue, 13 Mar 2018 11:30:09 +0000 (11:30 +0000)]

Cleanup plugins directories and references

Having callback_plugins and action_plugins in random locations causes
a lot of disparity.

We should centralize this into one place in the plugins directory and
fix up the ansible.cfg to reflect this.

Additionally, since the ansible.cfg already reflects action_plugins, we
don't need a link to action_plugins in the base of the repository.

jtudelag [Wed, 28 Feb 2018 17:53:57 +0000 (18:53 +0100)]

Adds handy ceph aliases for containerized installations.

Same approach as openshift-ansible etcdctl:

* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/tasks/auxiliary/drop_etcdctl.yml
* https://github.com/openshift/openshift-ansible/blob/release-3.7/roles/etcd/etcdctl.sh

Guillaume Abrioux [Mon, 26 Feb 2018 15:03:30 +0000 (16:03 +0100)]

client: fix pgs num for client pool creation

The `pools` dict defined in `roles/ceph-client/defaults/main.yml`
shouldn't have `{{ ceph_conf_overrides.global.osd_pool_default_pg_num
}}` as the default value for the `pgs` keys.

For instance, if you want some pools to be created without explicitly
specifying the pgs for these pools (meaning you want to use
`osd_pool_default_pg_num`), you are obliged to define
`{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` anyway, even
though you wanted to use the default value already defined in the
cluster, which is retrieved early in the playbook and stored in the
`{{ osd_pool_default_pg_num }}` fact.
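
The intended fallback can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the role's actual implementation (which is written as Ansible tasks); `resolve_pg_num` and the sample pool dicts are hypothetical.

```python
def resolve_pg_num(pool, cluster_default_pg_num):
    """Return the pool's explicit 'pgs' value, or the cluster default."""
    pgs = pool.get("pgs")
    # A missing or empty value means "not specified": fall back to the
    # default retrieved from the running cluster.
    if pgs in (None, ""):
        return cluster_default_pg_num
    return int(pgs)


# Hypothetical pools: one relies on the cluster default, one overrides it.
print(resolve_pg_num({"name": "test"}, 8))                # 8
print(resolve_pg_num({"name": "test2", "pgs": "16"}, 8))  # 16
```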

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Sébastien Han [Mon, 5 Mar 2018 17:57:29 +0000 (18:57 +0100)]

common: run updatedb task on debian systems only

The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Fri, 2 Mar 2018 14:50:01 +0000 (15:50 +0100)]

mon: fix osd_pool_default_crush_rule persistence and effectiveness

Running the last portion of crush_rules.yml (the insert new default and
add new default crush tasks) only on the last monitor is wrong, since
ceph CLI calls usually end up on the master having the quorum, which is
by default the one with the lowest IP.
So if we run the command and end up on another mon, the creation will
happen on the default crush rule because that particular mon hasn't been
updated.
To fix this we remove the |last on the include and use run_once: true on
certain tasks, then we let the final two tasks run on all the monitors.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Fri, 2 Mar 2018 13:53:57 +0000 (14:53 +0100)]

mon: fix set crush default rule

On releases after jewel the option
'osd_pool_default_crush_replicated_ruleset' does not exist anymore; it
is now called osd_pool_default_crush_rule.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 21 Feb 2018 14:56:32 +0000 (15:56 +0100)]

osd: remove old crush_location implementation

This was causing a lot of pain with the handlers. Also the
implementation was not ideal since we were assembling files. Everything
can now be done with the ceph_crush module so let's remove that.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 21 Feb 2018 14:26:30 +0000 (15:26 +0100)]

test: add tests for creating crush tree

We now run tests on the newly created ceph_crush module. Now the CI will
create a specific hierarchy for the OSD.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Wed, 21 Feb 2018 14:20:24 +0000 (15:20 +0100)]

mon: use ceph_crush module in the playbook

Instead of creating the CRUSH hierarchy with Ansible tasks using the
command module we now rely on the ceph_crush module.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Mon, 19 Feb 2018 09:13:06 +0000 (10:13 +0100)]

add ceph_crush module

This module allows us to create a Ceph CRUSH hierarchy. The module works
with hostvars from individual OSD hosts.
Here is an example of the expected configuration in the inventory file:

[osds]
ceph-osd-01 osd_crush_location="{ 'root': 'mon-roottt', 'rack': 'mon-rackkkk', 'pod': 'monpod', 'host': 'localhost' }" # valid case

Then, if create_crush_tree is enabled the module will create the
appropriate CRUSH buckets and their types in Ceph.

Some prerequisites:

* a 'host' bucket must be defined
* at least two buckets must be defined (this includes the 'host')
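
The two prerequisites can be expressed as a small check. This is a minimal sketch assuming the location is available as a plain dict; `validate_crush_location` is a hypothetical helper and is not taken from the ceph_crush module itself.

```python
def validate_crush_location(location):
    """Raise ValueError if the location breaks one of the two prerequisites."""
    if "host" not in location:
        raise ValueError("a 'host' bucket must be defined")
    if len(location) < 2:
        # 'host' counts towards the total, so one more bucket is needed.
        raise ValueError("at least two buckets must be defined")
    return True


# Same values as the inventory example in this commit message.
valid = {"root": "mon-roottt", "rack": "mon-rackkkk",
         "pod": "monpod", "host": "localhost"}
print(validate_crush_location(valid))  # True
```

A location holding only `host`, or one with several buckets but no `host`, raises `ValueError` in this sketch.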

Signed-off-by: Sébastien Han <seb@redhat.com>

Greg Charot [Tue, 6 Feb 2018 18:44:03 +0000 (19:44 +0100)]

mons: Current crush_rule playbook does not work if there is no default rule defined (default: true).
One could want to add new crush rules while keeping the current default rule.
Fixed it so that it works with all rules defined as "default: false". If multiple rules are defined as default (which should not happen), the last rule listed in "crush_rules" is taken as default.
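
The selection behavior described here (no default is fine, and the last rule flagged as default wins) might look like this in Python. It is a sketch of the described logic only; `pick_default_rule` and the sample rule names are hypothetical.

```python
def pick_default_rule(crush_rules):
    """Return the last rule flagged default, or None when no rule is flagged."""
    default = None
    for rule in crush_rules:
        if rule.get("default", False):
            default = rule  # a later entry overrides an earlier one
    return default


# Hypothetical rule lists.
none_default = [{"name": "hdd", "default": False}, {"name": "ssd", "default": False}]
two_defaults = [{"name": "hdd", "default": True}, {"name": "ssd", "default": True}]
print(pick_default_rule(none_default))          # None
print(pick_default_rule(two_defaults)["name"])  # ssd
```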

Greg Charot [Tue, 6 Feb 2018 18:26:54 +0000 (19:26 +0100)]

There is no reason the crush_rule_hdd rule provided by default in ceph-ansible should be set as rack root + default ruleset

Greg Charot [Tue, 6 Feb 2018 18:20:17 +0000 (19:20 +0100)]

We don't want to automatically move the rbd pool to the new default crush rule. This operation shall be performed by the cluster operator.

Sébastien Han [Wed, 28 Feb 2018 16:08:07 +0000 (17:08 +0100)]

add support for installation checkpoint

This was taken from the openshift ansible repository here:
https://github.com/leseb/openshift-ansible/tree/master/roles/installer_checkpoint

Rationale:

A complete OpenShift cluster installation is comprised of many different
components which can take 30 minutes to several hours to complete. If
the installation should fail, it could be confusing to understand at
which component the failure occurred. Additionally, it may be desired to
re-run only the component which failed instead of starting over from the
beginning. Components which came after the failed component would also
need to be run individually.

Ceph has a similar situation so we can benefit from that
callback_plugin.

Signed-off-by: Sébastien Han <seb@redhat.com>

Andy McCrae [Mon, 5 Mar 2018 19:06:09 +0000 (19:06 +0000)]

Remove vars that are no longer used

As part of fcba2c801a122b7ce8ec6a5c27a70bc19589d177 these vars were
removed and no longer do anything:

radosgw_dns_name
radosgw_resolve_cname

This patch removes them from the group_vars files and defaults/main.yml

jtudelag [Sun, 4 Mar 2018 21:13:22 +0000 (22:13 +0100)]

Makes use of docker_exec_cmd in ceph-mon role.

Keeps consistency inside the role and among roles.
Makes the code more readable.

Sébastien Han [Thu, 1 Mar 2018 16:33:33 +0000 (17:33 +0100)]

common: run updatedb task on debian systems only

The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Thu, 1 Mar 2018 15:50:06 +0000 (16:50 +0100)]

rgw: add cluster name option to the handler

If the cluster name is different than 'ceph', the command will fail so
we need to pass the cluster name.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Thu, 1 Mar 2018 15:47:37 +0000 (16:47 +0100)]

ci: add copy_admin_key test to container scenario

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Thu, 1 Mar 2018 15:47:22 +0000 (16:47 +0100)]

rgw: ability to copy ceph admin key on containerized

If we now set copy_admin_key while running a containerized scenario, the
ceph admin key will be copied on the node.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Thu, 1 Mar 2018 15:46:01 +0000 (16:46 +0100)]

rgw: run the handler on a mon host

In case the admin key wasn't copied over to the node this command would
fail. So it's safer to run it from a monitor directly.

Signed-off-by: Sébastien Han <seb@redhat.com>

Guillaume Abrioux [Mon, 26 Feb 2018 13:35:36 +0000 (14:35 +0100)]

tests: make CI jobs use 'ansible.cfg'

The jobs launched by the CI are not using 'ansible.cfg'.
It contains some parameters that should avoid the SSH failures we have
been seeing in the CI so far.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Fri, 16 Feb 2018 08:04:23 +0000 (09:04 +0100)]

client: use `ceph_uid` fact to set uid/gid on admin key

That task is failing on containerized deployment because `ceph:ceph`
doesn't exist.
The idea here is to use `{{ ceph_uid }}` to set the ownership of
the admin keyring when containerized_deployment is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540578
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Grant Slater [Sun, 25 Feb 2018 01:44:07 +0000 (01:44 +0000)]

mds: fix ansible_service_mgr typo

This commit fixes a typo introduced by 4671b9e74e657988137f6723ef12e38c66d9cd40

Andy McCrae [Wed, 21 Feb 2018 08:41:27 +0000 (08:41 +0000)]

Revert "[TEST] Test setting up correct systemd file for nfs-ganesha"

The nfs-ganesha package has been fixed as part of this commit:
https://github.com/nfs-ganesha/nfs-ganesha-debian/commit/963b6681dfac459c27c947cb8decc788bc9e5422

Once the package is rebuilt this should be good to merge.

This reverts commit e88af3c4cb314f1f640447ebdce343f0aca85fb4.

Giulio Fidente [Thu, 22 Feb 2018 18:57:47 +0000 (19:57 +0100)]

Make rule_name optional when defining items in openstack_pools

Previously it was necessary to provide a value (possibly an
empty string) for the "rule_name" key for each item in
openstack_pools. This change makes that optional and defaults to
an empty string when not given.

Sébastien Han [Thu, 22 Feb 2018 14:05:28 +0000 (15:05 +0100)]

remove kernel.pid_max

This is now managed by Ceph packages.

See: https://github.com/ceph/ceph/pull/18544/files

http://tracker.ceph.com/issues/21929

Closes: https://github.com/ceph/ceph-ansible/issues/2410
Signed-off-by: Sébastien Han <seb@redhat.com>

Guillaume Abrioux [Fri, 23 Feb 2018 10:23:13 +0000 (11:23 +0100)]

tests: change ceph_docker_image_tag for 2nd run

The ceph-ansible upstream CI runs several tests, including an
'idempotency/handlers' test. It means the playbook is run a first time
and then a second time with another container image version to ensure the
handlers run properly and the containers are restarted correctly.
This can cause issues.
For instance, in the specific case which drove me to submit this commit,
I hit the case where the `latest` image ships ceph 12.2.3 while the
`stable-3.0` image (which is used for the second run) ships ceph 12.2.2.

The goal of this test is not to verify that we can upgrade from a specific
version to another but to ensure handlers are working, even if it's a valid
failure here.
It should be caught by a test dedicated to that use case.

For the upstream CI, we just need a container image with the same
content but a different image id in the registry, since the test relies
on the image id to decide whether the container should be restarted.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Fri, 16 Feb 2018 12:53:52 +0000 (13:53 +0100)]

ci: add tripleo scenario testing

This should help catch failures in a TripleO deployment scenario earlier.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Andy McCrae [Mon, 19 Feb 2018 18:13:21 +0000 (18:13 +0000)]

Adjust /etc/updatedb.conf to not parse /var/lib/ceph

Using updatedb -e doesn't make a permanent change; it only runs updatedb
once without the passed path.

To make this change more permanent we should update the
/etc/updatedb.conf file to include /var/lib/ceph.
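
One way to make such a change idempotent is sketched below. It assumes the path is added to the `PRUNEPATHS="..."` entry that updatedb.conf commonly uses for excluded directories; the commit message does not name the setting, so treat that and the helper name `add_prune_path` as assumptions. The sketch works on the file content as a string and does not touch the real file.

```python
import re


def add_prune_path(conf_text, path):
    """Return conf_text with `path` present in the PRUNEPATHS entry."""
    pattern = re.compile(r'^([ \t]*PRUNEPATHS[ \t]*=[ \t]*")([^"]*)(")', re.MULTILINE)
    match = pattern.search(conf_text)
    if match is None:
        # No PRUNEPATHS entry yet: append one on its own line.
        sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
        return conf_text + sep + 'PRUNEPATHS="' + path + '"\n'
    paths = match.group(2).split()
    if path in paths:
        return conf_text  # already excluded, nothing to change
    paths.append(path)
    new_entry = match.group(1) + " ".join(paths) + match.group(3)
    return conf_text[:match.start()] + new_entry + conf_text[match.end():]


# Hypothetical file content.
sample = 'PRUNE_BIND_MOUNTS="yes"\nPRUNEPATHS="/tmp /var/spool"\n'
print(add_prune_path(sample, "/var/lib/ceph"))
```

Running the function a second time on its own output returns the text unchanged, which is the property a configuration task needs to avoid reporting a change on every run.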

Andy McCrae [Mon, 19 Feb 2018 17:23:32 +0000 (17:23 +0000)]

[TEST] Test setting up correct systemd file for nfs-ganesha

Don't merge this.
Test to see whether the Xenial CI job works if we copy over the
nfs-ganesha-lock.service.debian8 file properly.

The upstream download.ceph.com nfs-ganesha package should be fixed for
xenial (which is in progress).

Guillaume Abrioux [Fri, 16 Feb 2018 12:45:26 +0000 (13:45 +0100)]

update: look for short and fqdn in ceph_health_raw

Depending on the hostname configuration, the task waiting for mons to be
in quorum might fail.
The idea here is to look for both the short name and the fqdn in
`ceph_health_raw` instead of just `ansible_hostname`.
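
The check can be illustrated as follows. This sketch assumes `ceph_health_raw` holds JSON status output with a `quorum_names` list; the function name and the sample JSON are hypothetical and only meant to show the "short name or fqdn" test.

```python
import json


def mon_in_quorum(ceph_health_raw, short_name, fqdn):
    """True if either host name appears in the reported quorum member list."""
    status = json.loads(ceph_health_raw)
    quorum_names = status.get("quorum_names", [])
    return short_name in quorum_names or fqdn in quorum_names


# Hypothetical status: the monitors registered under their fqdn.
raw = '{"quorum_names": ["mon0.example.com", "mon1.example.com"]}'
print(mon_in_quorum(raw, "mon0", "mon0.example.com"))  # True, matched on the fqdn
```

Checking only the short name would report `mon0` as missing from the quorum in this sample, which is the failure mode the commit describes.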

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1546127
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Paul Bourke [Fri, 16 Feb 2018 16:21:24 +0000 (16:21 +0000)]

Remove redundant task to check if atomic

This fact is already set in site-docker.yml so there's no need to check
it again in ceph-docker-common

Signed-off-by: Paul Bourke <paul.bourke@oracle.com>

Andy McCrae [Wed, 20 Dec 2017 03:49:16 +0000 (13:49 +1000)]

Restart services if handler called

This patch fixes an issue where, if hosts have different service lists,
restarts are prevented for changes on services that run later on.

For example, hostA in the mons and rgws group would initiate a config
change and restart of services on all mons and rgws hosts, even though
a separate hostB (which is only in the rgws group) has not had its
configuration changed yet. Additionally, when the second host has its
configuration changed as part of the ceph-rgw role, it will not initiate
a restart since its inventory name != the first host's.

To fix this we should run the restart once (using run_once: True)
as long as the host has called the handler. This will ensure that even
if only 1 host has called the handler it will initiate a restart on all
hosts that have called the handler.

Additionally, we add a var that is set when the handler runs, this will
ensure that only hosts that have called the handler get restarted.
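
The effect of the per-host variable can be modelled in plain Python. This is only a toy model of the selection logic, not how Ansible handlers are implemented; `hosts_to_restart` and the `handler_called` mapping are hypothetical names.

```python
def hosts_to_restart(play_hosts, handler_called):
    """Return, in play order, the hosts whose handler flag is set."""
    return [host for host in play_hosts if handler_called.get(host, False)]


play_hosts = ["hostA", "hostB", "hostC"]
handler_called = {"hostA": True, "hostC": True}  # hostB never notified the handler
print(hosts_to_restart(play_hosts, handler_called))  # ['hostA', 'hostC']
```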

Includes a minor fix to remove the unneeded "inventory_hostname in
play_hosts" when: clause. This is no longer required since the handlers
were changed. The host calling the handler will be in play_hosts
already.

Sébastien Han [Wed, 14 Feb 2018 00:44:18 +0000 (01:44 +0100)]

container: osd remove run_once

When used along with delegate, run_once does not work well. Thus,
using | last always brings the desired result.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Thu, 8 Feb 2018 16:35:05 +0000 (17:35 +0100)]

docker-common: fix container restart on new image

We now look for any existing containers and, if there are any, we
compare their running image with the latest pulled container image.
For OSDs, we iterate over the list of running OSDs. This handles the
case where the first OSD of the list has been updated (runs the new
image) and not the others.
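
The comparison can be sketched as below, assuming we already have a mapping from container name to the image id it runs. The helper name, container names and image ids are all hypothetical; the real role gathers this information through Ansible tasks.

```python
def containers_to_restart(running_images, latest_image_id):
    """Return names of containers whose running image differs from the latest one."""
    return sorted(
        name
        for name, image_id in running_images.items()
        if image_id != latest_image_id
    )


# Hypothetical state: the first OSD already runs the new image, the others do not.
running = {"ceph-osd-sda": "sha-new", "ceph-osd-sdb": "sha-old", "ceph-osd-sdc": "sha-old"}
print(containers_to_restart(running, "sha-new"))  # ['ceph-osd-sdb', 'ceph-osd-sdc']
```

Checking every OSD rather than only the first one is what catches the partially updated case.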

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526513
Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Tue, 13 Feb 2018 08:37:14 +0000 (09:37 +0100)]

default: remove duplicate code

This is already defined in ceph-defaults.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Fri, 9 Feb 2018 17:15:25 +0000 (18:15 +0100)]

test: add test for containers resources changes

We change the ceph_mon_docker_memory_limit on the second run, this
should trigger a restart of services.

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Fri, 9 Feb 2018 17:11:07 +0000 (18:11 +0100)]

test: add test for restart on new container image

Since we have a task to test the handlers we can test a new container to
validate the service restart on a new container image.

Signed-off-by: Sébastien Han <seb@redhat.com>

Andrew Schoen [Mon, 12 Feb 2018 20:52:27 +0000 (14:52 -0600)]

rolling update: fix undefined jewel_minor_update failure

Variables set at the play level with ``vars`` do
not carry over into the next play in the playbook.

The var jewel_minor_update was set in a previous play but
used in this one and was failing because it was not defined.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1544029

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Fri, 9 Feb 2018 20:02:07 +0000 (14:02 -0600)]

infra: do not include host_vars/* in take-over-existing-cluster.yml

These are better collected by ansible automatically. This would also
fail if the host_var file didn't exist.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Caleb Boylan [Thu, 28 Dec 2017 16:52:02 +0000 (08:52 -0800)]

osd: Add support for multipath disks

Multipath disks have partitions with a different format than what
ceph-ansible currently supports. This update makes ceph-ansible
aware of that format so multipath disks can be used as OSDs.

Signed-off-by: Caleb Boylan <caleb.boylan@ormuco.com>

commit | commitdiff | tree

Andy McCrae [Fri, 9 Feb 2018 14:12:35 +0000 (14:12 +0000)]

Set application for OpenStack pools

Since Luminous we need to set the application tag for each pool,
otherwise a CEPH_WARNING is generated when the pools are in use.

We should assign the OpenStack pools to their default which would be
"rbd". When updating to Luminous this would happen automatically to the
vms, images, backups and volumes pools, but for new deploys this is not
the case.

commit | commitdiff | tree

Sébastien Han [Thu, 8 Feb 2018 16:44:19 +0000 (17:44 +0100)]

site: ability to only generate a ceph.conf on the machines

Now by running the playbook like this:

ansible-playbook site.yml --tags='ceph_update_config'

You can generate only a ceph configuration file on the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1543434
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Thu, 8 Feb 2018 13:51:15 +0000 (14:51 +0100)]

default: define 'osd_scenario' variable

osd_scenario does not exist in the ceph-default role, so if we try to
play ceph-default on an OSD node, the playbook will fail with an undefined
variable error.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 8 Feb 2018 12:27:45 +0000 (13:27 +0100)]

osd: fix osd restart when dmcrypt

This commit fixes a bug that occurs especially for dmcrypt scenarios.

There is an issue where the 'disk_list' container can't reach the ceph
cluster because it's not launched with `--net=host`.

If this container can't reach the cluster, it will hang on this step
(when trying to retrieve the dm-crypt key) :

```
+common_functions.sh:448: open_encrypted_part(): ceph --cluster abc12 --name \
client.osd-lockbox.9138767f-7445-49e0-baad-35e19adca8bb --keyring \
/var/lib/ceph/osd-lockbox/9138767f-7445-49e0-baad-35e19adca8bb/keyring \
config-key get dm-crypt/osd/9138767f-7445-49e0-baad-35e19adca8bb/luks
+common_functions.sh:452: open_encrypted_part(): base64 -d
+common_functions.sh:452: open_encrypted_part(): cryptsetup --key-file \
-luksOpen /dev/sdb1 9138767f-7445-49e0-baad-35e19adca8bb
```

It means the `ceph-run-osd.sh` script won't be able to start the
`osd_disk_activate` process in ceph-container because it won't have
filled the `$DOCKER_ENV` environment variable properly.

Adding `--net=host` to the 'disk_list' container fixes this issue.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1543284
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Grant [Wed, 7 Feb 2018 12:35:11 +0000 (12:35 +0000)]

Update Documentation example link to 3.0

Update the Documentation link from 2.2 -> 3.0

Helpful for newbies.

commit | commitdiff | tree

Giulio Fidente [Fri, 2 Feb 2018 08:45:07 +0000 (09:45 +0100)]

Check for docker sockets named after both _hostname or _fqdn

While hostname -f will always return a hostname including its
domain part and -s without the domain part, the behavior when
no arguments are given can include or not include the domain part
depending on how the system is configured; the socket name might
not match the instance name then.
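
A rough sketch of the idea, assuming the check simply tries both names: build one candidate socket path per distinct host name and accept whichever exists. The function name and the path template are hypothetical and are not taken from the role.

```python
def candidate_socket_paths(short_name, fqdn,
                           template="/var/run/ceph/ceph-mon.{name}.asok"):
    """Return the socket paths to check, one per distinct host name.

    The template is an assumed naming scheme used only for illustration.
    """
    names = []
    for name in (short_name, fqdn):
        if name and name not in names:  # skip empty values and duplicates
            names.append(name)
    return [template.format(name=name) for name in names]


print(candidate_socket_paths("mon0", "mon0.example.com"))
```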

commit | commitdiff | tree

Greg Charot [Fri, 2 Feb 2018 14:12:18 +0000 (15:12 +0100)]

mon: Fixed crush_rule_config for containerised deployment.

Was called too early, container was not yet started so the commands failed.
Moved the section after include docker/main.yml

Signed-off-by: Greg Charot <gcharot@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 2 Feb 2018 10:55:18 +0000 (11:55 +0100)]

purge-docker: fix ceph-osd-zap name container

The `zap ceph osd disks` task should iterate over `resolved_parent_device`,
which contains only the base device name, instead of
`combined_devices_list`, which contains the full path name.

This fixes the issue where docker complains about the container name because
of illegal characters such as `/`:
```
"/usr/bin/docker-current: Error response from daemon: Invalid container
name (ceph-osd-zap-magna074-/dev/sdb1), only [a-zA-Z0-9][a-zA-Z0-9_.-]
are allowed.","See '/usr/bin/docker-current run --help'."
""
```

Having the basename of the device path is enough for the container
name.
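
As an illustration of why the basename is enough, the sketch below checks a name against the character classes quoted in the error message. The helper names are hypothetical, and the `+` repetition with a full match is an assumption about how docker applies the pattern.

```python
import os.path
import re

# Character classes copied from the error message; the repetition is assumed.
NAME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def is_valid_container_name(name):
    """True if the whole name consists of allowed characters only."""
    return NAME_RE.fullmatch(name) is not None


def zap_container_name(host, device):
    """Build the container name from the basename of the device path."""
    return "ceph-osd-zap-{}-{}".format(host, os.path.basename(device))


print(is_valid_container_name("ceph-osd-zap-magna074-/dev/sdb1"))
print(zap_container_name("magna074", "/dev/sdb1"))
```

The first name is rejected because of the `/` characters, while the basename-based name passes the same check.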

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 31 Jan 2018 08:31:11 +0000 (09:31 +0100)]

common: do not use `shell` module when it is not needed

There is no need here to use `shell` instead of `command`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 31 Jan 2018 08:23:28 +0000 (09:23 +0100)]

syntax: change local_action syntax

Use a nicer syntax for `local_action` tasks.
We used to have one-liners like this:
```
local_action: wait_for port=22 host={{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }} state=started delay=10 timeout=500 }}
```

The usual syntax:
```
    local_action:
      module: wait_for
      port: 22
      host: "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}"
      state: started
      delay: 10
      timeout: 500
```
is nicer and keeps the syntax consistent across the whole
playbook.

This also fixes a potential issue with a missing quotation mark:

```
Traceback (most recent call last):
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 213, in <module>
    main()
  File "/tmp/ansible_wQtWsi/ansible_module_command.py", line 185, in main
    rc, out, err = module.run_command(args, executable=executable, use_unsafe_shell=shell, encoding=None, data=stdin)
  File "/tmp/ansible_wQtWsi/ansible_modlib.zip/ansible/module_utils/basic.py", line 2710, in run_command
  File "/usr/lib64/python2.7/shlex.py", line 279, in split
    return list(lex)
  File "/usr/lib64/python2.7/shlex.py", line 269, in next
    token = self.get_token()
  File "/usr/lib64/python2.7/shlex.py", line 96, in get_token
    raw = self.read_token()
  File "/usr/lib64/python2.7/shlex.py", line 172, in read_token
    raise ValueError, "No closing quotation"
ValueError: No closing quotation
```

Writing `local_action: shell echo {{ fsid }} | tee {{ fetch_directory }}/ceph_cluster_uuid.conf`
can cause trouble because the command fails with a missing-quotes error; this fix solves that issue.
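
The `ValueError` in the traceback comes from Python's standard `shlex` module, which refuses to split a string containing an unbalanced quote. The sketch below reproduces that behaviour in isolation; the input strings are made up and are not the values that triggered the original bug.

```python
import shlex


def split_or_error(command):
    """Split a command line like a shell would, or report why it failed."""
    try:
        return shlex.split(command)
    except ValueError as exc:
        return "error: {}".format(exc)


print(split_or_error('echo "abc'))   # unbalanced double quote
print(split_or_error('echo "abc"'))  # balanced, splits cleanly
```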

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510555
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 30 Jan 2018 13:41:52 +0000 (14:41 +0100)]

osd: resync group_vars file

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 30 Jan 2018 13:39:58 +0000 (14:39 +0100)]

config: remove any spaces in public_network or cluster_network

With two public networks configured, we found that with
"NETWORK_ADDR_1, NETWORK_ADDR_2" the install process consistently broke:
it tried to find the docker registry on the second network and did not
find the mon container.

Without spaces ("NETWORK_ADDR_1,NETWORK_ADDR_2") the install succeeds,
so the containerized install is more sensitive to the formatting of this line.
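
A minimal sketch of the normalization, assuming it amounts to dropping every space from the value. The function name and the example subnets are hypothetical.

```python
def strip_network_spaces(value):
    """Remove all spaces from a comma-separated list of networks."""
    return value.replace(" ", "")


print(strip_network_spaces("192.168.1.0/24, 192.168.2.0/24"))
```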

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1534003
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 30 Jan 2018 16:27:53 +0000 (17:27 +0100)]

purge: fix resolve parent device task

This is a typo caused by leftover code.
It was previously written like this :
`shell: echo /dev/$(lsblk -no pkname "{{ item }}") }}")`
and has been rewritten to :
`shell: $(lsblk --nodeps -no pkname "{{ item }}") }}")`
because '/dev/' is appended later, in the next task.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1540137
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 29 Jan 2018 13:28:23 +0000 (14:28 +0100)]

Do not search osd ids if ceph-volume

Description of problem: The 'get osd id' task goes through all 10 retries (and their respective timeouts) to make sure that the number of OSDs in the osd directory matches the number of devices.

This always happens, regardless of whether the setup and deployment are correct.

Version-Release number of selected component (if applicable): Surely the latest. But any ceph-ansible version that contains ceph-volume support is affected.

How reproducible: 100%

Steps to Reproduce:
1. Use ceph-volume (LVM) to deploy OSDs
2. Avoid using anything in the 'devices' section
3. Deploy the cluster

Actual results:
TASK [ceph-osd : get osd id _uses_shell=True, _raw_params=ls /var/lib/ceph/osd/ | sed 's/.*-//'] **********************************************************************************************************************************************
task path: /Users/alfredo/python/upstream/ceph/src/ceph-volume/ceph_volume/tests/functional/lvm/.tox/xenial-filestore-dmcrypt/tmp/ceph-ansible/roles/ceph-osd/tasks/start_osds.yml:6
FAILED - RETRYING: get osd id (10 retries left).
FAILED - RETRYING: get osd id (9 retries left).
FAILED - RETRYING: get osd id (8 retries left).
FAILED - RETRYING: get osd id (7 retries left).
FAILED - RETRYING: get osd id (6 retries left).
FAILED - RETRYING: get osd id (5 retries left).
FAILED - RETRYING: get osd id (4 retries left).
FAILED - RETRYING: get osd id (3 retries left).
FAILED - RETRYING: get osd id (2 retries left).
FAILED - RETRYING: get osd id (1 retries left).
ok: [osd0] => {
    "attempts": 10,
    "changed": false,
    "cmd": "ls /var/lib/ceph/osd/ | sed 's/.*-//'",
    "delta": "0:00:00.002717",
    "end": "2018-01-21 18:10:31.237933",
    "failed": true,
    "failed_when_result": false,
    "rc": 0,
    "start": "2018-01-21 18:10:31.235216"
}

STDOUT:

0
1
2

Expected results:
There aren't any (or just a few) timeouts while the OSDs are found

Additional info:
This is happening because the check is mapping the number of "devices" defined for ceph-disk (in this case it would be 0) to match the number of OSDs found.

Basically this line:

    until: osd_id.stdout_lines|length == devices|unique|length

Means in this 2 OSD case it is trying to ensure the following incorrect condition:

    until: 2 == 0
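
To see why every retry is used up, the loop can be mimicked outside Ansible. This is a sketch only: `retry_until` is a hypothetical stand-in for the `until`/`retries` behaviour, and the lists mirror the STDOUT above and an empty 'devices' section.

```python
def retry_until(condition, retries=10):
    """Return the attempt on which condition() held, or None if it never did."""
    for attempt in range(1, retries + 1):
        if condition():
            return attempt
    return None


osd_ids = ["0", "1", "2"]  # OSD ids listed in the STDOUT above
devices = []               # nothing set under 'devices' with ceph-volume (LVM)

# Equivalent of: osd_id.stdout_lines|length == devices|unique|length
print(retry_until(lambda: len(osd_ids) == len(set(devices))))
```

With an empty `devices` list the comparison can never be true once any OSD exists, so all retries run out even though the deployment is fine.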

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1537103

commit | commitdiff | tree

Andy McCrae [Sat, 27 Jan 2018 19:40:09 +0000 (19:40 +0000)]

Add default for radosgw_keystone_ssl

This should default to False. The default for Keystone is not to use PKI
keys. Additionally, anybody using this setting had to have been manually
setting it before.

Fixes: #2111

commit | commitdiff | tree

Guillaume Abrioux [Wed, 24 Jan 2018 13:06:47 +0000 (14:06 +0100)]

Revert "monitor_interface: document need to use monitor_address when using IPv6"

This reverts commit 10b91661ceef7992354032030c7c2673a90d40f4.

This also reverts the same comment added in
1359869497a44df0c3b4157f41453b84326b58e7

commit | commitdiff | tree

Eduard Egorov [Thu, 9 Nov 2017 11:49:00 +0000 (11:49 +0000)]

config: add host-specific ceph_conf_overrides evaluation and generation.

This allows us to use host-specific variables in ceph_conf_overrides variable. For example, this fixes usage of such variables (e.g. 'nss db path' having {{ ansible_hostname }} inside) in ceph_conf_overrides for rados gateway configuration (see profiles/rgw-keystone-v3) - issue #2157.

Signed-off-by: Eduard Egorov <eduard.egorov@icl-services.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 25 Jan 2018 15:57:45 +0000 (16:57 +0100)]

upgrade: skip luminous tasks for jewel minor update

These tasks are needed only when upgrading to luminous.
They are not needed in a Jewel minor upgrade; moreover, they fail because the
`ceph versions` command doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
