Sébastien Han [Fri, 13 Apr 2018 14:36:43 +0000 (16:36 +0200)]
osd: do not do anything if the dev has a partition
Regardless of whether the partition is 'ceph' or something else, we don't want
to be as strict as checking for a particular partition.
If the drive has a partition, we simply don't do anything.
This solves the case where the server reboots and the disks get a different
device node allocation. For example, prior to restarting the server
/dev/sda was an OSD, but afterwards it is /dev/sdb and the other way around.
In such a scenario, we would try to prepare the OSD and create a new
partition, so let's not mess around with devices that already have partitions.
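A minimal sketch of such a guard, assuming the Ansible `parted` module and a `devices` list variable (the role's actual task names and checks may differ):

```yaml
- name: read partition information for each candidate OSD device
  parted:
    device: "{{ item }}"
    state: info
  register: dev_info
  with_items: "{{ devices }}"   # list of candidate OSD devices (assumed name)

- name: prepare the OSD only when the device has no partition at all
  command: ceph-disk prepare {{ item.item }}
  with_items: "{{ dev_info.results }}"
  when: item.partitions | length == 0
```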
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1498303
Signed-off-by: Sébastien Han <seb@redhat.com>
Douglas Fuller [Wed, 4 Apr 2018 18:23:25 +0000 (14:23 -0400)]
Remove deprecated allow_multimds
allow_multimds will be officially deprecated in Mimic; specify it
only for the versions of Ceph where it was declared stable. Going
forward, specify only max_mds.
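For example, the number of active MDS daemons would then be driven only by max_mds, along these lines (a sketch; the filesystem name and the variable are assumptions):

```yaml
- name: set the number of active MDS daemons via max_mds only
  command: "ceph --cluster {{ cluster | default('ceph') }} fs set cephfs max_mds {{ mds_max_mds | default(1) }}"
  run_once: true
  changed_when: false
```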
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Ali Maredia [Mon, 2 Apr 2018 17:47:31 +0000 (13:47 -0400)]
nfs: ensure the nfs-server service is stopped
NFS-Ganesha cannot start if the nfs-server service
is running. This commit stops nfs-server, in case it
is running, on (Debian, Red Hat, SUSE) nodes before
the nfs-ganesha service starts up.
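A sketch of the kind of task this adds (the exact task wording is illustrative):

```yaml
- name: ensure the kernel nfs-server service is stopped and disabled
  service:
    name: nfs-server
    state: stopped
    enabled: no
  failed_when: false   # the unit may not exist on every distribution
```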
Ramana Raja [Mon, 9 Apr 2018 12:03:33 +0000 (17:33 +0530)]
ceph-nfs: allow disabling ganesha caching
Add a variable, ceph_nfs_disable_caching, that if set to true
disables ganesha's directory and attribute caching as much as
possible.
Also disable caching done by Ganesha when the 'nfs_file_gw'
variable is true, i.e., when Ganesha is used as CephFS's gateway.
This is the recommended Ganesha setting, as libcephfs already caches
information, and doing so helps avoid cache-incoherency issues,
especially with clustered Ganesha over CephFS.
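Usage is just a matter of setting the new variable for the NFS gateway hosts, e.g. (a sketch; the group_vars file name is an assumption):

```yaml
# group_vars/nfss.yml (sketch)
ceph_nfs_disable_caching: true   # turn off Ganesha directory/attribute caching
nfs_file_gw: true                # CephFS gateway; caching is also disabled implicitly here
```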
Fixes: https://tracker.ceph.com/issues/23393
Signed-off-by: Ramana Raja <rraja@redhat.com>
Andrew Schoen [Wed, 28 Mar 2018 16:10:17 +0000 (11:10 -0500)]
ceph_volume: preserve newlines in stdout and stderr when zapping
Because we have many commands we might need to run, the
ANSIBLE_STDOUT_CALLBACK won't format these nicely, since we're
not reporting them back at the root level of the JSON result.
Andrew Schoen [Wed, 14 Mar 2018 14:57:49 +0000 (09:57 -0500)]
ceph_volume: remove the subcommand argument
This really isn't needed currently, and I don't believe it is a good
mechanism for switching subcommands anyway. The user of this module
should not have to be familiar with all ceph-volume subcommands.
purge-docker: added conditionals needed to successfully re-run purge
Added 'ignore_errors: true' to multiple tasks that run docker commands, which can fail when docker is no longer installed. Without it, certain tasks in purge-docker-cluster.yml cause the playbook to fail when re-run and stop the purge. This leaves behind a dirty environment and a playbook which can no longer be run.
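A sketch of the pattern applied to the docker-related tasks (the task and command shown are illustrative):

```yaml
- name: remove ceph mon container
  command: docker rm -f ceph-mon-{{ ansible_hostname }}
  ignore_errors: true   # docker may already be uninstalled on a re-run
```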
Fix regex on line 275: sometimes 'list-units' will output 4 spaces between 'loaded' and 'active'. The update accounts for both scenarios.
purge fetch_directory: in other roles, fetch_directory is referenced with a hard-coded join, e.g. "{{ fetch_directory }}"/"{{ somedir }}". That being said, fetch_directory will never have a trailing slash in all.yml, so this task was never being run (causing failures when trying to re-deploy).
Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
Backward compatibility with `ceph_mon_docker_interface` and
`ceph_mon_docker_subnet` was not working since there was no lookup on
`monitor_interface` and `public_network`.
Ken Dreyer [Thu, 5 Apr 2018 19:40:15 +0000 (13:40 -0600)]
common: upgrade/install ceph-test RPM first
Prior to this change, if a user had ceph-test-12.2.1 installed, and
upgraded to ceph v12.2.3 or newer, the RPM upgrade process would
fail.
The problem is that the ceph-test RPM did not depend on an exact version
of ceph-common until v12.2.3.
In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved
from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum
encounters package conflicts between the older ceph-test and newer
ceph-base.
When all users have upgraded beyond Ceph < 12.2.3, this is no longer
relevant.
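In Ansible terms this boils down to upgrading ceph-test before the rest of the packages, roughly (a sketch; package selection and conditions are simplified):

```yaml
- name: upgrade ceph-test first so it no longer conflicts with the new ceph-base
  yum:
    name: ceph-test
    state: latest

- name: upgrade the remaining ceph packages
  yum:
    name: ceph
    state: latest
```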
Sébastien Han [Wed, 4 Apr 2018 14:23:54 +0000 (16:23 +0200)]
add .vscode/ to gitignore
I personally develop in VS Code and I have some preferences to save when it
comes to running the Python unit tests, so ignoring this directory is
actually useful.
defaults: remove `run_once: true` when creating fetch_directory
Because of `serial: 1`, it can be an issue when the playbook is being
run on client nodes.
Since the refactor of `ceph-client` we skip the role `ceph-defaults` on
every node except the first client node, which means that the task is not
going to be played because of `run_once: true`.
container: play docker-common only on first client node
This commit aims to set the default behavior to play
`ceph-docker-common` only on the first node in the clients group.
Currently, we play ceph-docker-common to pull the container image so we can
run ceph commands in order to generate keys or create pools.
On a cluster with a large number of client nodes this can be time consuming.
An alternative is to pull the container image
only on the first node and then copy the keys to the other nodes.
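A sketch of what that default behavior looks like in the playbook (the group name and roles are the usual defaults and may differ):

```yaml
- hosts: clients[0]          # only the first node in the clients group
  become: true
  roles:
    - ceph-defaults
    - ceph-docker-common     # pulls the container image, generates keys, creates pools
```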
This check has been alone in `ceph-docker-common` since a previous code
refactor.
Moving this check into `ceph-defaults` allows us to run `ceph-clients`
without having to run `ceph-docker-common`, even in a non-containerized
deployment.
Sébastien Han [Tue, 3 Apr 2018 13:20:06 +0000 (15:20 +0200)]
ceph-iscsi: fix certificates generation and distribution
Prior to this patch, the certificates were being generated on a single
node only (because of the run_once: true). Thus certificates were not
distributed to all the gateway nodes.
This would require a second ansible run to work. This patch fixes the
creation and the keys' distribution on all the nodes.
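A sketch of the generate-once-then-distribute pattern (paths, group name and openssl arguments are assumptions, not the role's exact tasks):

```yaml
- name: generate the certificates on the first gateway only
  command: >
    openssl req -newkey rsa:2048 -nodes -x509 -days 365
    -keyout /etc/ceph/iscsi-gateway.key
    -out /etc/ceph/iscsi-gateway.crt
    -subj "/CN=iscsi-gw"
  run_once: true
  delegate_to: "{{ groups['iscsigws'][0] }}"

- name: fetch the certificates back to the ansible controller
  fetch:
    src: "/etc/ceph/{{ item }}"
    dest: "{{ fetch_directory }}/"
    flat: yes
  run_once: true
  delegate_to: "{{ groups['iscsigws'][0] }}"
  with_items:
    - iscsi-gateway.key
    - iscsi-gateway.crt

- name: distribute the certificates to every gateway node
  copy:
    src: "{{ fetch_directory }}/{{ item }}"
    dest: "/etc/ceph/{{ item }}"
    mode: '0400'
  with_items:
    - iscsi-gateway.key
    - iscsi-gateway.crt
```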
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
This commit is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1550977
We iterate over all nodes on each node and delegate the fact gathering.
This consumes a lot of memory when there is a large number of nodes in the
inventory.
That way of gathering is not necessary for client nodes, so we can simply
gather local facts for those nodes.
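The resulting pattern looks roughly like this (a sketch; the group name and filter usage are assumptions):

```yaml
- name: gather and delegate facts for all non-client nodes
  setup:
  delegate_to: "{{ item }}"
  delegate_facts: true
  with_items: "{{ groups['all'] | difference(groups.get('clients', [])) }}"
  run_once: true

- name: gather only local facts on client nodes
  setup:
  when: inventory_hostname in groups.get('clients', [])
```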
The `remove_packages` prompt is redundant with the `ireallymeanit` prompt
since it does exactly the same thing. I guess the only goal of this task
was to pause and warn the user about the `--skip-tags=with_pkg` feature.
This warning should be part of the first prompt.
ceph-mds: delete duplicate tasks which cause multimds container deployments to fail.
This update resolves the error ['cephfs' is undefined] in multimds container deployments.
See: roles/ceph-mon/tasks/create_mds_filesystems.yml. The same last two tasks are present there, and they actually need to happen in that role since "{{ cephfs }}" gets defined in
roles/ceph-mon/defaults/main.yml, and not roles/ceph-mds/defaults/main.yml.
Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
John Fulton [Sun, 25 Mar 2018 20:36:27 +0000 (20:36 +0000)]
Refer to expected-num-objects as expected_num_objects, not size
Follow-up patch to PR 2432 [1], which replaces "size" (sorry if
the original bug used that term, which can be confusing) with
expected_num_objects, as used in the Ceph documentation [2].
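With that, an item in openstack_pools would look something like this (a sketch; the other keys shown are illustrative):

```yaml
openstack_pools:
  - name: images
    pg_num: 32
    rule_name: replicated_rule
    expected_num_objects: 1000000   # hint passed through to 'ceph osd pool create'
```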
Andrew Schoen [Tue, 20 Mar 2018 19:13:28 +0000 (14:13 -0500)]
ceph-defaults: set is_atomic variable
This variable is needed for containerized clusters and is required for
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample, though, so if ceph-docker-common is used outside
of that playbook it needs to be set in another way. Moving the creation of
the variable inside this role means playbooks don't need to worry
about setting it.
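A sketch of how such a fact can be set inside the role; checking for /run/ostree-booted is a common way to detect an Atomic host, though the role's exact implementation may differ:

```yaml
- name: check if the host is an Atomic host
  stat:
    path: /run/ostree-booted
  register: stat_ostree

- name: set the is_atomic fact
  set_fact:
    is_atomic: "{{ stat_ostree.stat.exists }}"
```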
Andy McCrae [Mon, 12 Mar 2018 14:13:53 +0000 (14:13 +0000)]
Simplify ceph.conf generation
Since the approach to creating a ceph.conf file has changed, and now
no longer relies on assembling config file fragments in /etc/ceph/ceph.d,
we can avoid the conf_overrides rendering on the local host and skip
the tasks related to that, instead using just the config_template task
to configure the file directly.
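A sketch of the direct generation task, assuming the config_template action plugin (parameter names follow its usual invocation):

```yaml
- name: generate /etc/ceph/ceph.conf in one pass
  config_template:
    src: ceph.conf.j2
    dest: /etc/ceph/ceph.conf
    owner: ceph
    group: ceph
    mode: '0644'
    config_overrides: "{{ ceph_conf_overrides }}"
    config_type: ini
```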
Sébastien Han [Wed, 7 Mar 2018 16:28:20 +0000 (17:28 +0100)]
ci: re-arrange group_vars files
We should stop putting everything in 'all'. This is too easy and it is
error prone as well for those who separate variables by host
type, which is what you should do.
Sébastien Han [Tue, 6 Mar 2018 13:22:48 +0000 (14:22 +0100)]
mon: add support for pgp, pool type and rule name
When creating pools, it's crucial to expose all the options available as
part of the pool creation command. As explained in:
http://docs.ceph.com/docs/jewel/rados/operations/pools/
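A sketch of the extended creation command (the item keys shown are illustrative of the intent rather than the role's exact dict):

```yaml
- name: create pools with explicit pgp_num, pool type and crush rule name
  command: >
    ceph --cluster {{ cluster | default('ceph') }} osd pool create
    {{ item.name }} {{ item.pg_num }} {{ item.pgp_num | default(item.pg_num) }}
    {{ item.type | default('replicated') }} {{ item.rule_name | default('') }}
  with_items: "{{ pools }}"
  changed_when: false
```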
jtudelag [Thu, 8 Mar 2018 15:54:43 +0000 (16:54 +0100)]
Tune ansible.cfg
Based on the OpenShift one:
https://docs.openshift.com/container-platform/3.7/scaling_performance/install_practices.html#scaling-performance-install-optimization
The `pools` dict defined in `roles/ceph-client/defaults/main.yml`
shouldn't have `{{ ceph_conf_overrides.global.osd_pool_default_pg_num
}}` as default value for `pgs` keys.
For instance, if you want some pools to be created but without explicitly
specifying the pgs for these pools (meaning you want to use the
`osd_pool_default_pg_num`), you will be obliged to define
`{{ ceph_conf_overrides.global.osd_pool_default_pg_num }}` anyway, even though you
wanted to use the current default value already defined in the cluster, which is
retrieved early in the playbook and stored in the
`{{ osd_pool_default_pg_num }}` fact.
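In other words, the defaults should simply reuse the fact retrieved from the cluster, roughly (a sketch; pool names are illustrative):

```yaml
pools:
  - { name: volumes, pgs: "{{ osd_pool_default_pg_num }}" }
  - { name: images,  pgs: "{{ osd_pool_default_pg_num }}" }
```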
Sébastien Han [Fri, 2 Mar 2018 14:50:01 +0000 (15:50 +0100)]
mon: fix osd_pool_default_crush_rule persistence and effectiveness
Running the last portion (insert new default and add new default crush
tasks) of crush_rules.yml only on the last monitor is
wrong since ceph CLI calls usually end up on the monitor holding the
quorum leadership, which is by default the one with the lowest IP.
So if we run the command and end up on another mon, the creation will
happen with the default crush rule because that particular mon hasn't been
updated.
To fix this we remove the |last on the include and use run_once: true on
certain tasks, then we let the final two tasks run on all the monitors.
Sébastien Han [Wed, 21 Feb 2018 14:56:32 +0000 (15:56 +0100)]
osd: remove old crush_location implementation
This was causing a lot of pain with the handlers. Also the
implementation was not ideal since we were assembling files. Everything
can now be done with the ceph_crush module so let's remove that.
Sébastien Han [Mon, 19 Feb 2018 09:13:06 +0000 (10:13 +0100)]
add ceph_crush module
This module allows us to create a Ceph CRUSH hierarchy. The module works
with hostvars from individual OSD hosts.
Here is an example of the expected configuration in the inventory file:
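(The original example is not reproduced in this log; the following is a hypothetical sketch in YAML inventory form, and the exact key names are assumptions.)

```yaml
osds:
  hosts:
    ceph-osd-0:
      osd_crush_location: { root: default, rack: rack1, host: ceph-osd-0 }
    ceph-osd-1:
      osd_crush_location: { root: default, rack: rack2, host: ceph-osd-1 }
```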
Greg Charot [Tue, 6 Feb 2018 18:44:03 +0000 (19:44 +0100)]
mons: the current crush_rule playbook does not work if there is no default rule defined (default: true).
One could want to add new crush rules while keeping their current default rule.
Fixed it so that it works with all rules defined as "default: false". If multiple rules are defined as default (which should not happen), then the last rule listed in "crush_rules" is taken as the default.
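A sketch of what such a crush_rules definition might look like (key names are illustrative):

```yaml
crush_rules:
  - { name: replicated_rule, root: default, type: host, default: true }
  - { name: rack_rule,       root: default, type: rack, default: false }
```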
Sébastien Han [Wed, 28 Feb 2018 16:08:07 +0000 (17:08 +0100)]
add support for installation checkpoint
This was taken from the openshift ansible repository here:
https://github.com/leseb/openshift-ansible/tree/master/roles/installer_checkpoint
Rationale:
A complete OpenShift cluster installation is comprised of many different
components which can take 30 minutes to several hours to complete. If
the installation should fail, it could be confusing to understand at
which component the failure occurred. Additionally, it may be desired to
re-run only the component which failed instead of starting over from the
beginning. Components which came after the failed component would also
need to be run individually.
Ceph has a similar situation so we can benefit from that
callback_plugin.
The jobs launched by the CI are not using 'ansible.cfg'.
There are some parameters that should avoid the SSH failures that we have
been seeing in the CI so far.
client: use `ceph_uid` fact to set uid/gid on admin key
That task is failing on containerized deployments because `ceph:ceph`
doesn't exist.
The idea here is to use `{{ ceph_uid }}` to set the ownership of
the admin keyring when containerized_deployment is true.
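A sketch of the ownership task (the keyring path and condition are assumptions):

```yaml
- name: set admin keyring ownership using the ceph_uid fact
  file:
    path: "/etc/ceph/{{ cluster | default('ceph') }}.client.admin.keyring"
    owner: "{{ ceph_uid }}"
    group: "{{ ceph_uid }}"
  when: containerized_deployment | bool
```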
Andy McCrae [Wed, 21 Feb 2018 08:41:27 +0000 (08:41 +0000)]
Revert "[TEST] Test setting up correct systemd file for nfs-ganesha"
The nfs-ganesha package has been fixed as part of this commit:
https://github.com/nfs-ganesha/nfs-ganesha-debian/commit/963b6681dfac459c27c947cb8decc788bc9e5422
Once the package is rebuilt this should be good to merge.
Giulio Fidente [Thu, 22 Feb 2018 18:57:47 +0000 (19:57 +0100)]
Make rule_name optional when defining items in openstack_pools
Previously it was necessary to provide a value (possibly an
empty string) for the "rule_name" key for each item in
openstack_pools. This change makes it optional, defaulting to an
empty string when not given.