Sébastien Han [Fri, 16 Nov 2018 09:41:12 +0000 (10:41 +0100)]
test_build_key_path_bootstrap_osd: fix
The entity name is client.bootstrap-osd (as returned by Ceph), and not
bootstrap-osd. The build_key_path function split 'client.bootstrap-osd'
on the '.' so using bootstrap-osd fails with index out of range.
Sébastien Han [Fri, 16 Nov 2018 09:37:07 +0000 (10:37 +0100)]
ceph_key: remove set-uid support
The support of set-uid was remove from Ceph during the Nautilus cycle by
the following commit: d6def8ba1126209f8dcb40e296977dc2b09a376e so this
will not work anymore when deploying Nautilus clusters and above.
Sébastien Han [Sat, 17 Nov 2018 18:58:54 +0000 (19:58 +0100)]
client: do not use a dummy container anymore
Since 84fcf4639140c390a7f1fcd790ba190503713f86 we now use the container
binary cli to create ceph keys instead of creating a container and
'docker execing' into it.
Sébastien Han [Fri, 16 Nov 2018 09:46:10 +0000 (10:46 +0100)]
ceph_key: rework container support
Previously, we were doing a 'docker exec' inside a mon container, this
worked but this wasn't ideal since it required a mon to be up to
generate keys. We must be able to generate a key without a running mon,
e.g, when we create the initial key or simply when you want to generate
a key from any node that is not a mon.
Now, just like the ceph_volume module we use a 'docker run' command with
the right binary as an entrypoint to perform the choosen action, this is
more elegant and also only requires an env variable to be set in the
playbook: CEPH_CONTAINER_IMAGE.
Andrew Schoen [Tue, 20 Nov 2018 20:28:58 +0000 (14:28 -0600)]
ceph-volume: be idempotent when the batch strategy changes
If you deploy with 2 HDDs and 1 SDD then each subsequent deploy both
HDD drives will be filtered out, because they're already used by ceph.
ceph-volume will report this as a 'strategy change' because the device
list went from a mixed type of HDD and SDD to a single type of only SDD.
This situation results in a non-zero exit code from ceph-volume. We want
to handle this situation gracefully and report that nothing will be changed.
A similar json structure to what would have been given by ceph-volume is
returned in the 'stdout' key.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650306 Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Sébastien Han [Mon, 26 Nov 2018 16:58:49 +0000 (17:58 +0100)]
osd: expose udev into the container
In order to be able to retrieve udev information, we must expose its
socket. As per, https://github.com/ceph/ceph/pull/25201 ceph-volume will
start consuming udev output.
tests: do not fully override previous ceph_conf_overrides
We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`, we must append a new value instead of just
overwritting it, otherwise, this can lead to error in the CI.
Sébastien Han [Wed, 21 Nov 2018 15:18:58 +0000 (16:18 +0100)]
rolling_update: create rbd and rbd-mirror keyrings
During an upgrade ceph won't create keys that were not existing on the
previous version. So after the upgrade of let's Jewel to Luminous, once
all the monitors have the new version they should get or create the
keys. It's ok to have the task fails, especially for the rbd-mirror
key, which only appears in Nautilus.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572 Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 21 Nov 2018 15:17:04 +0000 (16:17 +0100)]
ceph_key: add a get_key function
When checking if a key exists we also have to ensure that the key exists
on the filesystem, the key can change on Ceph but still have an outdated
version on the filesystem. This solves this issue.
Sébastien Han [Fri, 16 Nov 2018 15:15:24 +0000 (16:15 +0100)]
switch: disable all ceph units
Prior to this commit we were only disabling ceph-osd units, but forgot
the ceph.target which is controlling everything and will restart the
ceph-osd units at each reboot.
Now that everything gets disabled there won't be any conflicts between
the old non-container and the new container units.
Add real default value for osd pool size customization.
Ceph itself has an `osd_pool_default_size` default value to `3`.
If users don't specify a pool size in various pools definition within
ceph-ansible, we should default to `3`.
By the way, this kind of condition isn't really clear:
```
when:
- rbd_pool_size | default ("")
```
we should try to get the customized value then default to what is in
`osd_pool_default_size` (which has its default value pointing to
`ceph_osd_pool_default_size` (`3`) as well) and compare it to
`ceph_osd_pool_default_size`.
mon: move `osd_pool_default_pg_num` in `ceph-defaults`
`osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specifc group of nodes, it
will fail when trying to access this variables since it wouldn't be
defined.
Typical error seen:
```
$ sudo ceph daemon osd.2 config get osd_memory_target
Can't get admin socket path: unable to get conf option admin_socket for osd.2:
parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast:
unit prefix not recognized)
```
This commit ensures the value inserted in ceph.conf will be an integer.
Sébastien Han [Tue, 20 Nov 2018 17:03:14 +0000 (18:03 +0100)]
site: resync container playbook
This PR https://github.com/ceph/ceph-ansible/pull/3251 forgot to create a symlink from site-docker.yml.sample to site-container.yml.sample.
This commit resyncs and put the symlink in place.
Boris Ranto [Tue, 20 Nov 2018 10:46:08 +0000 (11:46 +0100)]
defaults/facts: Use list instead of keys
It is safer to use the list filter than the keys() method since the keys
method does have some interoperability issues between python2 and
python3 based ansible/jinja.
Boris Ranto [Mon, 19 Nov 2018 23:45:40 +0000 (00:45 +0100)]
start_osds: Use list instead of keys
If you use python3 based ansible then keys() returns a dict_keys object,
not a list of keys. This breaks the installation on such a system. Using
the list filter provides a more robust solution that should work on both
python2 and python3 based ansible. You can find some more information
about the issue, here:
Dan Mick [Sat, 17 Nov 2018 00:28:54 +0000 (16:28 -0800)]
validate plugin: handle missing exception fields without traceback
"missing variable" errors introduced by PR3058 would attempt to
be reported, but since the exception contained no "path" definition,
would cause a second exception in the Invalid exception handler.
Make the exception handler verify that any field it tries to use
exists, clean up its message formatting, and reduce the verbose
level to see the literal error from notario in case more goes
wrong in future.
Sébastien Han [Sun, 18 Nov 2018 20:48:47 +0000 (21:48 +0100)]
ceph.ceph-container-common remove symlink
This error was introduced in the recent refactor of ceph-docker-common
in https://github.com/ceph/ceph-ansible/pull/3251. However, the Ansible
galaxy linter is not happy about it and fails importing the role.
Removing this since it's not used anymore.
since `ceph-volume` introduction, there is no need to split those tasks.
Let's refact this part of the code so it's clearer.
By the way, this was breaking rolling_update.yml when `openstack_config:
true` playbook because nothing ensured OSDs were started in ceph-osd role (In
`openstack_config.yml` there is a check ensuring all OSD are UP which was
obviously failing) and resulted with OSDs on the last OSD node not started
anyway.
setting this setting to 1 makes the CI covering the related code in the
playbook without breaking the upgrade scenarios.
Those scenarios were broken because there is a check `TASK [waiting for
clean pgs...]` in rolling_update.yml, since the pool size for
`cephfs_metadata` and `cephfs_data` are updated to `2` in
`ceph-override.json` and there is not enough osd to honor this size,
some PGs are degraded and make the mentioned check failing.
Add quotes around package names added in the commit da6f38422396307605d62ef63980bd0c5b7868f6 so that the difference between
the Ansible variables and package names is clear.
Rishabh Dave [Thu, 8 Nov 2018 09:26:58 +0000 (04:26 -0500)]
pass the list of packages to package management modules
Instead of looping over a list of packages or repeating the task
separately for different packages, pass the list of packages to the
task performing package management.
Sébastien Han [Tue, 23 Oct 2018 16:38:40 +0000 (18:38 +0200)]
ceph_key: add fetch_initial_keys capability
This is needed for Nautilus since the ceph-create-keys script goes away.
(https://github.com/ceph/ceph/pull/21305)
Now the module if called with 'state: fetch_initial_keys' will lookup
keys generated by the monitor and write them down on the filesystem to
the right location (/etc/ceph and /var/lib/ceph/boostrap*).
This is not applicable to container since keys are generated by the
container only.
Noah Watkins [Tue, 6 Nov 2018 16:49:39 +0000 (08:49 -0800)]
Fixup shrink_osd[_container] scenario config
** configuration seems to be for filestore:
[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes
** Removing `radosgw_interface: eth1` to resolve:
The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'
The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.
The offending line appears to be:
- name: set_fact _radosgw_address to radosgw_interface - ipv4
^ here
Sébastien Han [Tue, 23 Oct 2018 16:38:40 +0000 (18:38 +0200)]
ceph_key: add fetch_initial_keys capability
This is needed for Nautilus since the ceph-create-keys script goes away.
(https://github.com/ceph/ceph/pull/21305)
Now the module if called with 'state: fetch_initial_keys' will lookup
keys generated by the monitor and write them down on the filesystem to
the right location (/etc/ceph and /var/lib/ceph/boostrap*).
This is not applicable to container since keys are generated by the
container only.
Rishabh Dave [Wed, 31 Oct 2018 16:07:25 +0000 (12:07 -0400)]
don't loop over a task using package management modules
For tasks using (Ansible) modules for package management utilities,
pass the list of packages to be installed instead of repeating the task
for each package. Using the latter manner of installing a list of
packages leads to a deprecation warning by ansible-playbook command.
Fixes: https://github.com/ceph/ceph-ansible/issues/3293 Signed-off-by: Rishabh Dave <ridave@redhat.com>
Rishabh Dave [Wed, 31 Oct 2018 14:46:13 +0000 (10:46 -0400)]
remove configuration files for ceph packages on ubuntu clusters
For apt-get, purge command needs to be used, instead of remove command,
to remove related configuration files. Otherwise, packages might be
shown as installed while running dpkg command even after removing them.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061 Signed-off-by: Rishabh Dave <ridave@redhat.com>
VasishtaShastry [Sun, 28 Oct 2018 17:37:21 +0000 (23:07 +0530)]
ceph-validate : Added functions to accept true and flase
ceph-validate used to throw error for setting flags as 'true' or 'false' for True and False
Now user can set the flags 'dmcrypt' and 'osd_auto_discovery' as 'true' or 'false'
Alfredo Deza [Thu, 1 Nov 2018 18:48:06 +0000 (14:48 -0400)]
ceph-common: update_cache whenever a new deb repo is added
The use of a handler meant that the cache would be updated at the very
end of the play, which doesn't work when adding a development repo and
trying to install right after it. This mostly reverts 53cdddf88699263763b36643565e5f846d9d13a8 without an actual `git revert`
because that caused other conflicts.
Sébastien Han [Tue, 30 Oct 2018 16:13:20 +0000 (17:13 +0100)]
lint: skip the linter
Do not run the linter for these 3:
* we use latest for pip docker-py package
* for ssl keys this is a false positive since the inital command is a
'shell' it'll always change
* for keystone, we must use shell since the with_items contains pipes
Sébastien Han [Tue, 30 Oct 2018 14:51:32 +0000 (15:51 +0100)]
lint: add changed_when to command
Calling command should have changed_when false otherwise each time it
runs it will show as 'changed' and this is irrelevant.
Commands should not change things if nothing needs doing