git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
Sébastien Han [Fri, 16 Nov 2018 09:46:10 +0000 (10:46 +0100)]
ceph_key: apply permissions using ansible core module

Instead of applying file permissions from our code, let's rely on the
Ansible core 'file' module for this. This is now handled at the task
declaration level instead of inside the module.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 26 Oct 2018 10:12:20 +0000 (12:12 +0200)]
fw: update rules for mon/mgr collocation

Since we now deploy the mgr on monitor nodes, we need to open firewall
rules so the mgr can reach the OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 09:31:57 +0000 (10:31 +0100)]
mon: remove old ubuntu login status

We don't support Ubuntu Precise, so this feature does not exist
anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 26 Nov 2018 10:06:10 +0000 (11:06 +0100)]
sites: fail the playbook on any failure

We need to apply `any_errors_fatal: true` to every play so it takes
effect, not only on the initial pass. With this flag, any error in the
playbook causes the playbook to stop.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 26 Nov 2018 10:05:13 +0000 (11:05 +0100)]
site-container: retry image pull

Sometimes pulling an image fails because of a network hiccup, so let's
retry up to 5 times at 5-second intervals.
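A minimal Python sketch of the retry policy described above (5 attempts, 5 seconds apart). The playbook itself presumably relies on Ansible's task-level `until`/`retries`/`delay` keywords; `retry` and the flaky action below are hypothetical and only illustrate the behaviour.

```python
import time
from typing import Callable


def retry(action: Callable[[], bool], retries: int = 5, delay: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `action` until it returns True, at most `retries` times.

    Returns the number of attempts made; raises RuntimeError if every
    attempt failed. `sleep` is injectable so tests can skip the wait.
    """
    for attempt in range(1, retries + 1):
        if action():
            return attempt
        if attempt < retries:
            sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"action failed after {retries} attempts")


if __name__ == "__main__":
    # Hypothetical flaky pull: fails twice (network hiccup), then succeeds.
    outcomes = iter([False, False, True])
    print(retry(lambda: next(outcomes), sleep=lambda s: None))  # 3
```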

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 09:57:14 +0000 (10:57 +0100)]
travis: run modules unit tests

Travis now runs our modules unit tests to make sure they always pass.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 09:29:05 +0000 (10:29 +0100)]
mon: secure cluster on container

Add the ability to protect pools on containerized clusters.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Sat, 1 Dec 2018 04:46:16 +0000 (05:46 +0100)]
osd: remove a leftover

This file is never included in ceph-osd; it looks like a leftover, so let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Sat, 1 Dec 2018 04:24:11 +0000 (05:24 +0100)]
osd: remove an incorrect information

This is false: `./defaults/main.yml` is not supposed to be modified
directly. `group_vars` and/or `host_vars` should always be preferred.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Mon, 26 Nov 2018 13:54:02 +0000 (14:54 +0100)]
remove kv store support

the next stable release will drop this feature.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Thu, 29 Nov 2018 15:04:28 +0000 (16:04 +0100)]
rolling_update: create missing keyring only on running mon

Try to create the potentially missing keys only on monitors that are
actually running, since the current node being played is stopped before
this task. Delegating the command to all nodes except the current node
being played also ensures that the generated keys will be present on all
monitors.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Christian Berendt [Thu, 29 Nov 2018 13:58:13 +0000 (14:58 +0100)]
Add missing space before }}

This will fix the following yamllint warning:

Variables should have spaces after {{ and before }}

Signed-off-by: Christian Berendt <berendt@betacloud-solutions.de>
Guillaume Abrioux [Thu, 29 Nov 2018 09:16:52 +0000 (10:16 +0100)]
config: write jinja comment with appropriate syntax

Jinja comments should be written using the Jinja comment syntax `{# ... #}`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654441
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Wed, 28 Nov 2018 23:27:49 +0000 (00:27 +0100)]
rolling_update: default ceph json output to empty dict

So we can avoid the following failure:

The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
' failed. The error was: No JSON object could be decoded

We just need to set a default; the next iteration will have more
complete JSON since the command won't fail.
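In the playbook the fix presumably sits in the Jinja expression (a `default` applied before `from_json`). The same guard, written as a hypothetical Python helper so the fallback is explicit:

```python
import json


def parse_quorum_names(stdout: str) -> list:
    """Return the "quorum_names" list from JSON command output.

    Empty or undecodable output (e.g. the command failed on this
    iteration) falls back to an empty dict, so the caller gets [] instead
    of a decoding error.
    """
    try:
        data = json.loads(stdout) if stdout else {}
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        data = {}
    if not isinstance(data, dict):
        data = {}
    return data.get("quorum_names", [])


if __name__ == "__main__":
    print(parse_quorum_names(""))                                     # []
    print(parse_quorum_names('{"quorum_names": ["mon0", "mon1"]}'))  # ['mon0', 'mon1']
```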

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Wed, 28 Nov 2018 19:53:10 +0000 (20:53 +0100)]
validate: change default value for `radosgw_address`

Change the default value of `radosgw_address` to keep consistency with
`monitor_address`.
Moreover, `ceph-validate` checks whether the value is '0.0.0.0' to
determine whether it has to run `check_eth_rgw.yml`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 28 Nov 2018 17:46:45 +0000 (18:46 +0100)]
tests: rgw_multisite allow clusters to talk to each other

Adding this rule on the hypervisor allows the clusters to talk to each
other.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 28 Nov 2018 10:30:25 +0000 (11:30 +0100)]
tests: update default pg num and pool size for podman scenario

Bring the recent refactor of `osd_pool_default_pg_num` and
`osd_pool_default_size` into the podman scenario as well.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 27 Nov 2018 13:50:18 +0000 (14:50 +0100)]
tests: fix image tag for secondary rgw cluster (rgw_multisite)

The first cluster uses `latest-master` while the second uses `latest`,
which is not the right version to use here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 27 Nov 2018 09:26:41 +0000 (10:26 +0100)]
tests: apply dev_setup on the secondary cluster for rgw_multisite

We must apply this playbook before deploying the secondary cluster.
Otherwise, there will be a mismatch between the two deployed clusters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 27 Nov 2018 12:42:41 +0000 (13:42 +0100)]
mgr: fix mgr keyring error on rolling_update

When upgrading from RHCS 2.5 to 3.2, the upgrade fails because the task
`create ceph mgr keyring(s) when mon is containerized` has the when
condition `inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` refers to a mgr
node, which means this condition could never be satisfied.
Then, this condition combined with `serial: 1` causes the mgr keyring
creation to be skipped on the first node. Further, the `ceph-mgr` role
tries to copy the mgr keyring (it's not aware we are running
`serial: 1`), which leads to a failure like the following:

```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018  12:03:34 +0000 (0:00:00.296)       0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```

The ceph_key module is idempotent, so there is no need to have such a
condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Tue, 27 Nov 2018 09:03:07 +0000 (10:03 +0100)]
ceph-osd fix batch with container binary

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 27 Nov 2018 08:59:07 +0000 (09:59 +0100)]
ceph_key: fix after rebase

Fix the tests

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 26 Nov 2018 16:22:04 +0000 (17:22 +0100)]
fix template generation

Put the right condition on ceph_docker_version: activate it only when
container_binary is 'docker'.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 26 Nov 2018 10:52:04 +0000 (11:52 +0100)]
container-common: remove leftover

NTP installation is managed by the ceph-infra role.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 22 Nov 2018 16:32:25 +0000 (17:32 +0100)]
shrink-osd: add missing CEPH_BINARY

We need to add the right binary to do the docker exec.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Thu, 22 Nov 2018 13:38:57 +0000 (14:38 +0100)]
defaults: play set_radosgw_address.yml only on rgw nodes

There is no need to play these tasks on nodes that are not in the rgw
group.

Always playing this code makes `shrink_mon.yml` fail.

Typical error:

```
TASK [ceph-defaults : set_fact _radosgw_address to radosgw_interface - ipv4] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-shrink_mon/roles/ceph-defaults/tasks/set_radosgw_address.yml:21
Thursday 22 November 2018  12:34:51 +0000 (0:00:00.154)       0:00:12.371 *****
fatal: [localhost]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute u'ansible_eth1'
```

Indeed, `radosgw_interface` is a network interface that exists on rgw
nodes only. This interface is not expected to exist on `localhost`, so
when running `shrink_mon.yml`, the `ceph-defaults` role is called in a
play with `hosts: localhost` and causes the playbook to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Tue, 20 Nov 2018 21:29:53 +0000 (22:29 +0100)]
defaults: declare container_binary

Always declare container_binary and assign it a correct value.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 20 Nov 2018 10:28:02 +0000 (11:28 +0100)]
ceph-defaults: use podman on Fedora only

It seems Atomic 7.5 already ships podman, however this is an old version
(0.4). The podman integration is targeting RHEL 8, and Fedora is
currently the closest to that.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 20 Nov 2018 13:35:41 +0000 (14:35 +0100)]
site: symlink site-docker to site-container

We deprecated site-docker in favor of site-container, so let's add a
symlink for backward compatibility.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 16:16:00 +0000 (17:16 +0100)]
rolling_update: update ceph_key task for container

Use the new way to create keys on containerized env as introduced by: 1098b71bda90db3dad19ac179f0ba900ccb0f953

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 16:13:37 +0000 (17:13 +0100)]
infra playbooks: use the right container binary

Use podman or docker depending on which one is available. podman is
prioritized over docker if present.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 14:14:55 +0000 (15:14 +0100)]
site: choose the right container runtime binary

We need to check whether podman exists; if it does, we use it instead
of docker.
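The selection boils down to a PATH lookup in which podman wins. A minimal Python sketch, assuming "exists" means "found on PATH" (the playbook does this with Ansible tasks, which may check differently); `choose_container_binary` is a hypothetical name.

```python
import shutil
from typing import Callable, Optional


def choose_container_binary(
        which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "podman" if it is on PATH, otherwise fall back to "docker".

    `which` defaults to shutil.which and can be swapped out in tests.
    """
    return "podman" if which("podman") else "docker"


if __name__ == "__main__":
    print(choose_container_binary())
```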

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 13:12:43 +0000 (14:12 +0100)]
iscsi: expose /dev/log in the container

During initialisation, both rbd-target-api and rbd-target-gw try to
open /dev/log for their syslog handler. If the device is not present,
the service fails to start. Exposing /dev/log from the host in the
container solves that problem.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 10:09:30 +0000 (11:09 +0100)]
testinfra: linting

Make flake8 happy on the testinfra files.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 10:03:45 +0000 (11:03 +0100)]
testinfra: add support for podman

Since we are now testing on both docker and podman, our functional tests
must reflect that. So now, if we detect the podman binary we use it;
otherwise we default to docker.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 08:56:45 +0000 (09:56 +0100)]
ceph_key: fix rstrip for python 3

Remove bytes literals, since in Python 3 str.rstrip() only accepts a str
or None argument.
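The Python 3 behaviour behind this fix, shown in isolation (the key value is made up):

```python
raw = b"AQBxyz==\n"         # hypothetical subprocess output (bytes)
text = raw.decode("utf-8")  # str

try:
    text.rstrip(b"\n")      # bytes argument: rejected by str.rstrip()
except TypeError as exc:
    print("TypeError:", exc)

key = text.rstrip("\n")     # str argument (or None) works
print(key)  # AQBxyz==
```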

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 09:42:34 +0000 (10:42 +0100)]
test_lookup_ceph_initial_entities: fix

The previous dict was missing 2 entities:

* client.bootstrap-mgr
* client.bootstrap-rbd-mirror

So the test was failing since it expects 7 entities to match.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 09:41:12 +0000 (10:41 +0100)]
test_build_key_path_bootstrap_osd: fix

The entity name is client.bootstrap-osd (as returned by Ceph), not
bootstrap-osd. The build_key_path function splits 'client.bootstrap-osd'
on '.', so passing bootstrap-osd fails with an index out of range error.
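A hypothetical stand-in for the split performed by build_key_path, showing why the short name fails; `entity_short_name` is not the module's actual code.

```python
def entity_short_name(entity: str) -> str:
    """Return the part of a Ceph entity name after its type prefix.

    "client.bootstrap-osd" -> "bootstrap-osd". A name without a "."
    produces a one-element list, so index [1] raises IndexError.
    """
    return entity.split(".", 1)[1]


if __name__ == "__main__":
    print(entity_short_name("client.bootstrap-osd"))  # bootstrap-osd
    try:
        entity_short_name("bootstrap-osd")
    except IndexError as exc:
        print("IndexError:", exc)  # list index out of range
```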

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 09:37:07 +0000 (10:37 +0100)]
ceph_key: remove set-uid support

Support for set-uid was removed from Ceph during the Nautilus cycle by
the following commit: d6def8ba1126209f8dcb40e296977dc2b09a376e, so this
will not work anymore when deploying Nautilus clusters and above.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Sun, 18 Nov 2018 15:10:11 +0000 (16:10 +0100)]
ceph_key: use the right container runtime binary

Rework all the ceph_key invocations to use either the docker or the
podman binary.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Sat, 17 Nov 2018 18:58:54 +0000 (19:58 +0100)]
client: do not use a dummy container anymore

Since 84fcf4639140c390a7f1fcd790ba190503713f86 we now use the container
binary cli to create ceph keys instead of creating a container and
'docker execing' into it.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 8 Nov 2018 09:02:37 +0000 (10:02 +0100)]
Add new container scenario

Test with podman instead of docker, and with Python 3 only.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 09:46:10 +0000 (10:46 +0100)]
ceph_key: rework container support

Previously, we were doing a 'docker exec' inside a mon container. This
worked, but it wasn't ideal since it required a mon to be up to
generate keys. We must be able to generate a key without a running mon,
e.g., when we create the initial key or simply when we want to generate
a key from any node that is not a mon.
Now, just like the ceph_volume module, we use a 'docker run' command with
the right binary as an entrypoint to perform the chosen action. This is
more elegant and only requires an environment variable to be set in the
playbook: CEPH_CONTAINER_IMAGE.
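A rough Python sketch of how such a command line can be assembled. The mount points and flags are assumptions for illustration, not necessarily what the ceph_key module passes; only the use of `run` with the binary as `--entrypoint` and the `CEPH_CONTAINER_IMAGE` variable come from the description above. `container_run_cmd` is a hypothetical name.

```python
import os
from typing import List


def container_run_cmd(container_binary: str, entrypoint: str,
                      args: List[str]) -> List[str]:
    """Build a `<docker|podman> run` command for a one-shot Ceph tool call.

    The image comes from the CEPH_CONTAINER_IMAGE environment variable.
    The volume mounts are assumptions: they expose the host's Ceph config
    and data directories so the tool can read and write keyrings.
    """
    image = os.environ["CEPH_CONTAINER_IMAGE"]
    return [
        container_binary, "run", "--rm", "--net=host",
        "-v", "/etc/ceph:/etc/ceph:z",
        "-v", "/var/lib/ceph:/var/lib/ceph:z",
        "--entrypoint=" + entrypoint,
        image,
    ] + args


if __name__ == "__main__":
    # Placeholder image name; set CEPH_CONTAINER_IMAGE for a real run.
    os.environ.setdefault("CEPH_CONTAINER_IMAGE", "docker.io/ceph/daemon:latest-master")
    print(" ".join(container_run_cmd("docker", "ceph-authtool", ["--gen-print-key"])))
```

`ceph-authtool --gen-print-key` generates a secret locally, which is the kind of operation that no longer needs a running mon container to exec into.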

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 27 Nov 2018 09:45:05 +0000 (10:45 +0100)]
handler: show unit logs on error

This will tremendously help debugging daemons that fail on restart by
showing the systemd unit logs.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Mon, 26 Nov 2018 22:29:50 +0000 (23:29 +0100)]
validate: add nautilus release

validate must accept ceph nautilus release.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Andrew Schoen [Tue, 20 Nov 2018 20:28:58 +0000 (14:28 -0600)]
ceph-volume: be idempotent when the batch strategy changes

If you deploy with 2 HDDs and 1 SSD, then on each subsequent deploy both
HDDs will be filtered out, because they're already used by Ceph.
ceph-volume will report this as a 'strategy change' because the device
list went from a mix of HDD and SSD to a single type, SSD only.

This situation results in a non-zero exit code from ceph-volume. We want
to handle this situation gracefully and report that nothing will be changed.
A JSON structure similar to what ceph-volume would have returned is
placed in the 'stdout' key.
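A simplified, hypothetical model of why the strategy appears to change between runs. It is not ceph-volume's real classification logic, only the device-type reasoning from the message above.

```python
from typing import Dict, List


def batch_strategy(devices: List[Dict]) -> str:
    """Classify the devices left after filtering out those already used by Ceph.

    "mixed" when both rotational (HDD) and non-rotational (SSD) devices
    remain, otherwise "single".
    """
    free = [d for d in devices if not d["used_by_ceph"]]
    kinds = {d["rotational"] for d in free}
    return "mixed" if len(kinds) > 1 else "single"


if __name__ == "__main__":
    first_run = [
        {"path": "/dev/sda", "rotational": True, "used_by_ceph": False},
        {"path": "/dev/sdb", "rotational": True, "used_by_ceph": False},
        {"path": "/dev/sdc", "rotational": False, "used_by_ceph": False},
    ]
    # On the next run the two HDDs already carry OSDs and are filtered out.
    second_run = [dict(d, used_by_ceph=d["rotational"]) for d in first_run]
    print(batch_strategy(first_run), batch_strategy(second_run))  # mixed single
```

The flip from "mixed" to "single" is the 'strategy change' that ceph-volume reports; the change described above turns it into a "nothing to do" result instead of a failure.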

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650306
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Sébastien Han [Mon, 26 Nov 2018 16:58:49 +0000 (17:58 +0100)]
osd: expose udev into the container

In order to retrieve udev information, we must expose its socket. As
per https://github.com/ceph/ceph/pull/25201, ceph-volume will start
consuming udev output.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Mon, 26 Nov 2018 13:10:19 +0000 (14:10 +0100)]
update: fix a typo

`hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo.
That should be `hostvars[mon_host]['ansible_hostname']`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Thu, 22 Nov 2018 10:33:20 +0000 (11:33 +0100)]
tests: do not fully override previous ceph_conf_overrides

We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`; we must append a new value instead of just
overwriting it, otherwise this can lead to errors in the CI.
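Appending rather than resetting amounts to merging into the existing dict instead of replacing it. A minimal Python sketch (in Ansible, the `combine` filter with `recursive=True` behaves similarly; the option names below are only examples, and `merge_overrides` is a hypothetical helper):

```python
from typing import Any, Dict


def merge_overrides(base: Dict[str, Any], extra: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `extra` into a copy of `base`.

    Nested sections (e.g. "global") are merged key by key, so adding a new
    option keeps the ones set by the initial deployment. `base` is not
    modified.
    """
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    initial = {"global": {"osd_pool_default_size": 1}}
    rerun = {"global": {"mon_allow_pool_delete": True}}
    print(merge_overrides(initial, rerun))
    # {'global': {'osd_pool_default_size': 1, 'mon_allow_pool_delete': True}}
```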

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Thu, 22 Nov 2018 16:52:58 +0000 (17:52 +0100)]
rolling_update: refact set_fact `mon_host`

Each monitor node should select another monitor that isn't itself.
Otherwise, one node in the monitor group won't set this fact, which
causes a failure.

Typical error:
```
TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200
Thursday 22 November 2018  14:02:30 +0000 (0:00:07.493)       0:02:50.005 *****
fatal: [mon1]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2'
```
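The selection rule can be expressed as a small Python function. This only illustrates the rule stated above; the actual change is a `set_fact` in rolling_update.yml, and `pick_other_mon` is a hypothetical name.

```python
from typing import List


def pick_other_mon(current: str, mons: List[str]) -> str:
    """Return the first monitor in `mons` that is not `current`."""
    for mon in mons:
        if mon != current:
            return mon
    raise ValueError("no other monitor available")


if __name__ == "__main__":
    mons = ["mon0", "mon1", "mon2"]
    # Every monitor, including the first one, gets a value.
    print({m: pick_other_mon(m, mons) for m in mons})
    # {'mon0': 'mon1', 'mon1': 'mon0', 'mon2': 'mon0'}
```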

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Wed, 21 Nov 2018 15:18:58 +0000 (16:18 +0100)]
rolling_update: create rbd and rbd-mirror keyrings

During an upgrade, Ceph won't create keys that did not exist in the
previous version. So after an upgrade from, say, Jewel to Luminous, once
all the monitors run the new version they should get or create the
keys. It's OK for the task to fail, especially for the rbd-mirror
key, which only appears in Nautilus.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 21 Nov 2018 15:17:04 +0000 (16:17 +0100)]
ceph_key: add a get_key function

When checking whether a key exists, we also have to ensure that the key
exists on the filesystem: the key can change in Ceph while an outdated
version remains on the filesystem. This solves that issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 19 Nov 2018 13:58:03 +0000 (14:58 +0100)]
switch: do not look for devices anymore

It's easier to look up a directory than the block devices, especially
because ceph-volume and ceph-disk handle devices differently.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 16 Nov 2018 15:15:24 +0000 (16:15 +0100)]
switch: disable all ceph units

Prior to this commit we were only disabling ceph-osd units, but forgot
the ceph.target, which controls everything and restarts the ceph-osd
units at each reboot.
Now that everything gets disabled, there won't be any conflicts between
the old non-container and the new container units.

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 13 Nov 2018 16:43:21 +0000 (17:43 +0100)]
switch: do not mask systemd unit

If we mask it, we won't be able to start the OSD container, since the
OSD container now uses the OSD ID as a name, such as: ceph-osd@0

Fixes the error:  Failed to execute operation: Cannot send after transport endpoint shutdown

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Wed, 21 Nov 2018 16:28:31 +0000 (17:28 +0100)]
tests: change default pools size

default pool size in our test should be explicitly set to 1

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 21 Nov 2018 16:28:00 +0000 (17:28 +0100)]
client: change default pool size

default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 21 Nov 2018 16:27:11 +0000 (17:27 +0100)]
defaults: change default size for openstack pools

default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 21 Nov 2018 16:08:19 +0000 (17:08 +0100)]
defaults: change default pool size for cephfs_pools

default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 21 Nov 2018 10:06:45 +0000 (11:06 +0100)]
defaults: add ceph related vars file

This adds a level of granularity: we can keep Ceph-specific variables
here that users shouldn't have to change.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 21 Nov 2018 10:00:11 +0000 (11:00 +0100)]
refact osd pool size customization

Add a real default value for osd pool size customization.
Ceph itself sets the `osd_pool_default_size` default value to `3`.

If users don't specify a pool size in the various pool definitions
within ceph-ansible, we should default to `3`.

By the way, this kind of condition isn't really clear:
```
when:
  - rbd_pool_size | default ("")
```

We should try to get the customized value, then default to what is in
`osd_pool_default_size` (whose default value points to
`ceph_osd_pool_default_size` (`3`) as well), and compare it to
`ceph_osd_pool_default_size`.
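A minimal Python sketch of that lookup order. Treating "equal to Ceph's default" as "no explicit size needed" is our reading of the comparison described above, not confirmed behaviour, and `pool_size_to_apply` is a hypothetical helper.

```python
from typing import Dict, Optional

CEPH_OSD_POOL_DEFAULT_SIZE = 3  # Ceph's own default, per the text above


def pool_size_to_apply(pool: Dict,
                       osd_pool_default_size: int = CEPH_OSD_POOL_DEFAULT_SIZE) -> Optional[int]:
    """Return the size to set on `pool`, or None when Ceph's default already applies."""
    # 1. Customized per-pool value (empty values such as "" count as unset),
    # 2. otherwise the cluster-wide osd_pool_default_size.
    size = pool.get("size") or osd_pool_default_size
    # 3. Only act when the result differs from Ceph's built-in default.
    return None if size == CEPH_OSD_POOL_DEFAULT_SIZE else size


if __name__ == "__main__":
    print(pool_size_to_apply({"name": "rbd"}))                           # None
    print(pool_size_to_apply({"name": "rbd", "size": 2}))                # 2
    print(pool_size_to_apply({"name": "rbd"}, osd_pool_default_size=1))  # 1
```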

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Tue, 13 Nov 2018 14:40:35 +0000 (15:40 +0100)]
mon: move `osd_pool_default_pg_num` in `ceph-defaults`

The `osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specific group of nodes, it
fails when trying to access this variable since it isn't defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Wed, 21 Nov 2018 13:38:25 +0000 (14:38 +0100)]
config: convert _osd_memory_target to int

ceph.conf doesn't accept float values.

Typical error seen:
```
$ sudo ceph daemon osd.2 config get osd_memory_target
Can't get admin socket path: unable to get conf option admin_socket for osd.2:
parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast:
unit prefix not recognized)
```

This commit ensures the value inserted in ceph.conf will be an integer.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Tue, 20 Nov 2018 17:03:14 +0000 (18:03 +0100)]
site: resync container playbook

PR https://github.com/ceph/ceph-ansible/pull/3251 forgot to create a symlink from site-docker.yml.sample to site-container.yml.sample.
This commit resyncs and puts the symlink in place.

Signed-off-by: Sébastien Han <seb@redhat.com>
Boris Ranto [Tue, 20 Nov 2018 10:46:08 +0000 (11:46 +0100)]
defaults/facts: Use list instead of keys

It is safer to use the list filter than the keys() method, since the
keys() method has interoperability issues between Python 2 and
Python 3 based Ansible/Jinja.

Signed-off-by: Boris Ranto <branto@redhat.com>
Boris Ranto [Mon, 19 Nov 2018 23:45:40 +0000 (00:45 +0100)]
start_osds: Use list instead of keys

If you use python3 based ansible then keys() returns a dict_keys object,
not a list of keys. This breaks the installation on such a system. Using
the list filter provides a more robust solution that should work on both
python2 and python3 based ansible. You can find some more information
about the issue, here:

https://github.com/ansible/ansible/issues/19514
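In the templates the fix presumably amounts to piping through Jinja's `list` filter instead of calling `.keys()`. The underlying Python 3 difference, in isolation (the data is made up):

```python
osds = {"sdb": {"id": 0}, "sdc": {"id": 1}}

keys_view = osds.keys()  # Python 3: a dict_keys view, not a list
as_list = list(osds)     # a real list on both Python 2 and Python 3

print(type(keys_view).__name__)  # dict_keys
print(as_list)                   # ['sdb', 'sdc']

try:
    keys_view[0]  # views do not support indexing
except TypeError as exc:
    print("TypeError:", exc)
```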

Signed-off-by: Boris Ranto <branto@redhat.com>
Valentin Lorentz [Mon, 19 Nov 2018 20:49:45 +0000 (21:49 +0100)]
Discover rbd facts.

Signed-off-by: Valentin Lorentz <progval+git@progval.net>
Dan Mick [Sat, 17 Nov 2018 00:28:54 +0000 (16:28 -0800)]
validate plugin: handle missing exception fields without traceback

"missing variable" errors introduced by PR3058 would be reported, but
since the exception contained no "path" definition, reporting them
caused a second exception in the Invalid exception handler.
Make the exception handler verify that any field it tries to use
exists, clean up its message formatting, and reduce the verbosity
level needed to see the literal error from notario in case more goes
wrong in the future.
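The defensive pattern can be sketched in Python with `getattr` defaults. `Invalid` here is a local stand-in class; the field names `path` and `reason` are assumptions for illustration, not necessarily notario's actual attributes.

```python
class Invalid(Exception):
    """Local stand-in for a validation error; fields may or may not be set."""


def format_invalid(error: Exception) -> str:
    """Build a message from whichever fields exist, never raising itself."""
    parts = []
    path = getattr(error, "path", None)
    if path:
        parts.append("path: " + " -> ".join(str(p) for p in path))
    reason = getattr(error, "reason", None)
    if reason:
        parts.append("reason: " + str(reason))
    # Fall back to the exception's own text when no structured field exists.
    return "; ".join(parts) or str(error)


if __name__ == "__main__":
    print(format_invalid(Invalid("missing variable: monitor_address")))
    err = Invalid("bad value")
    err.path = ["osd_scenario"]
    err.reason = "not in allowed values"
    print(format_invalid(err))  # path: osd_scenario; reason: not in allowed values
```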

Signed-off-by: Dan Mick <dan.mick@redhat.com>
Neha Ojha [Mon, 19 Nov 2018 06:50:02 +0000 (06:50 +0000)]
osd_memory_target: standardize unit and fix calculation

* The default value of osd_memory_target used by Ceph is 4294967296 bytes,
so use the same value as the ceph-ansible default.

* Convert ansible_memtotal_mb to bytes to calculate osd_memory_target
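The unit handling can be sketched in Python: `ansible_memtotal_mb` is in megabytes, so it is multiplied by 1024 * 1024 before being divided among OSDs, and the result is truncated to an integer because ceph.conf rejects fractional values. The even split and the `safety_factor` parameter are illustrative assumptions, not necessarily the formula the role uses.

```python
BYTES_PER_MB = 1024 * 1024
CEPH_DEFAULT_OSD_MEMORY_TARGET = 4294967296  # Ceph's default quoted above (4 GiB)


def osd_memory_target(memtotal_mb: int, num_osds: int,
                      safety_factor: float = 0.7) -> int:
    """Per-OSD memory target in bytes, as an integer suitable for ceph.conf.

    The even split across OSDs and `safety_factor` are assumptions.
    """
    per_osd_bytes = memtotal_mb * BYTES_PER_MB * safety_factor / num_osds
    return int(per_osd_bytes)  # truncate: ceph.conf rejects fractional values


if __name__ == "__main__":
    print(CEPH_DEFAULT_OSD_MEMORY_TARGET == 4 * 1024 ** 3)  # True
    print(osd_memory_target(1024, 1, safety_factor=1.0))    # 1073741824
```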

Signed-off-by: Neha Ojha <nojha@redhat.com>
Guillaume Abrioux [Mon, 19 Nov 2018 08:42:08 +0000 (09:42 +0100)]
doc: update doc to add stable-3.2 information

Since the branch has been created, we must reflect it in the doc.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Sébastien Han [Sun, 18 Nov 2018 20:48:47 +0000 (21:48 +0100)]
ceph.ceph-container-common remove symlink

This error was introduced in the recent refactor of ceph-docker-common
in https://github.com/ceph/ceph-ansible/pull/3251. However, the Ansible
galaxy linter is not happy about it and fails importing the role.
Removing this since it's not used anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Sat, 17 Nov 2018 16:40:35 +0000 (17:40 +0100)]
client: fix a typo in create_users_keys.yml

cd1e4ee024ef400ded25e8c99948648ead3a0892 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Thu, 15 Nov 2018 20:56:11 +0000 (21:56 +0100)]
infra: don't restart firewalld if unit is masked

if firewalld.service systemd unit is masked, the handler will fail when
trying to restart it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Noah Watkins [Thu, 15 Nov 2018 22:04:45 +0000 (14:04 -0800)]
Remove outdated documentation

Fixes BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1640525

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
Sébastien Han [Tue, 13 Nov 2018 16:54:14 +0000 (17:54 +0100)]
tox: add lvm setup to shrink mon

Fix the shrink mon scenario by setting up LVM so we can configure
ceph-volume lvm OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>
Guillaume Abrioux [Wed, 7 Nov 2018 10:45:29 +0000 (11:45 +0100)]
osd: commonize start_osd code

Since the introduction of `ceph-volume`, there is no need to split those
tasks.

Let's refactor this part of the code so it's clearer.

By the way, this was breaking rolling_update.yml with
`openstack_config: true`, because nothing ensured OSDs were started in
the ceph-osd role (in `openstack_config.yml` there is a check ensuring all
OSDs are up, which was obviously failing), and it resulted in OSDs on the
last OSD node not being started anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Guillaume Abrioux [Thu, 8 Nov 2018 08:08:28 +0000 (09:08 +0100)]
tests: set pool size to 1 in ceph-override.json

Setting this value to 1 lets the CI cover the related code in the
playbook without breaking the upgrade scenarios.

Those scenarios were broken because of the
`TASK [waiting for clean pgs...]` check in rolling_update.yml: since the
pool size for `cephfs_metadata` and `cephfs_data` is updated to `2` in
`ceph-override.json` and there are not enough OSDs to honor this size,
some PGs are degraded, which makes that check fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agosite-docker: rename to 'site-container.yml.sample'
Guillaume Abrioux [Fri, 19 Oct 2018 15:49:34 +0000 (17:49 +0200)]
site-docker: rename to 'site-container.yml.sample'

Add a symlink for backward compatibility

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocker-common: rename role
Guillaume Abrioux [Fri, 19 Oct 2018 15:07:55 +0000 (17:07 +0200)]
docker-common: rename role

rename `ceph-docker-common` role to `ceph-container-common`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocker-common: remove system_checks.yml
Guillaume Abrioux [Fri, 19 Oct 2018 14:58:44 +0000 (16:58 +0200)]
docker-common: remove system_checks.yml

This check is now part of `ceph-validate`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocker-common: remove check_mandatory_vars.yml
Guillaume Abrioux [Fri, 19 Oct 2018 14:51:38 +0000 (16:51 +0200)]
docker-common: remove check_mandatory_vars.yml

This check is now part of the `ceph-validate` role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocker-common: remove dirs_permissions.yml
Guillaume Abrioux [Fri, 19 Oct 2018 14:06:44 +0000 (16:06 +0200)]
docker-common: remove dirs_permissions.yml

This is already done in the `ceph-config` role.
Let's remove this duplicated task.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocker-common: remove legacy tasks for ntp configuration
Guillaume Abrioux [Fri, 19 Oct 2018 13:57:18 +0000 (15:57 +0200)]
docker-common: remove legacy tasks for ntp configuration

Those tasks are no longer needed in docker-common since the introduction
of the `ceph-infra` role; they are duplicates.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocker-common: remove duplicate running cluster check
Guillaume Abrioux [Fri, 19 Oct 2018 13:52:28 +0000 (15:52 +0200)]
docker-common: remove duplicate running cluster check

This is already done in ceph-defaults; there is no need to have this
check in ceph-docker-common.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agodocker-common: remove duplicate set_fact (monitor_name)
Guillaume Abrioux [Wed, 17 Oct 2018 15:30:36 +0000 (17:30 +0200)]
docker-common: remove duplicate set_fact (monitor_name)

This fact is already set in ceph-defaults; there is no need to set it
again in ceph-docker-common.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years agoadd quotes around package names added in da6f384
Rishabh Dave [Thu, 8 Nov 2018 13:08:07 +0000 (18:38 +0530)]
add quotes around package names added in da6f384

Add quotes around package names added in the commit
da6f38422396307605d62ef63980bd0c5b7868f6 so that the difference between
the Ansible variables and package names is clear.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agopass the list of packages to package management modules
Rishabh Dave [Thu, 8 Nov 2018 09:26:58 +0000 (04:26 -0500)]
pass the list of packages to package management modules

Instead of looping over a list of packages or repeating the task
separately for different packages, pass the list of packages to the
task performing package management.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years agorbd-mirror: use the new rbd-mirror key
Sébastien Han [Mon, 5 Nov 2018 16:14:31 +0000 (17:14 +0100)]
rbd-mirror: use the new rbd-mirror key

Instead of using the old rbd key, let's use the new rbd-mirror key to
bootstrap the rbd-mirror daemon.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoceph_key: add fetch_initial_keys capability
Sébastien Han [Tue, 23 Oct 2018 16:38:40 +0000 (18:38 +0200)]
ceph_key: add fetch_initial_keys capability

This is needed for Nautilus since the ceph-create-keys script goes away
(https://github.com/ceph/ceph/pull/21305).
Now, when called with 'state: fetch_initial_keys', the module will look up
the keys generated by the monitor and write them to the right location on
the filesystem (/etc/ceph and /var/lib/ceph/bootstrap*).
This is not applicable to containers since keys are generated by the
container only.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years agoigw: open iscsi target port
Mike Christie [Tue, 30 Oct 2018 19:03:37 +0000 (14:03 -0500)]
igw: open iscsi target port

Open the port the iscsi target uses for iscsi traffic.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: use api_port variable for firewall port setting
Mike Christie [Thu, 8 Nov 2018 21:23:24 +0000 (15:23 -0600)]
igw: use api_port variable for firewall port setting

Don't hard-code the API port because the user might override it.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: fix firewall iscsi_group_name check
Mike Christie [Tue, 30 Oct 2018 18:54:52 +0000 (13:54 -0500)]
igw: fix firewall iscsi_group_name check

The firewall for igw is not getting set up because iscsi_group_name
does not exist. It should be iscsi_gw_group_name.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: Fix default api port
Mike Christie [Tue, 30 Oct 2018 18:54:03 +0000 (13:54 -0500)]
igw: Fix default api port

The default igw API port is 5000 in the manual setup docs and in the
ceph-iscsi-config package, so this syncs up ansible.

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agoigw: stop tcmu-runner on iscsi purge
Mike Christie [Thu, 8 Nov 2018 21:38:08 +0000 (15:38 -0600)]
igw: stop tcmu-runner on iscsi purge

When the iscsi purge playbook is run, we stop the gw and api daemons but
not tcmu-runner, which I forgot in the previous PR.

Fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1621255

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years agovalidate: do not validate ceph_repository if deploying containers
Andrew Schoen [Wed, 31 Oct 2018 15:25:26 +0000 (10:25 -0500)]
validate: do not validate ceph_repository if deploying containers

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1630975
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
6 years agodon't use "role" or "roles" to include roles
Noah Watkins [Wed, 7 Nov 2018 20:54:45 +0000 (12:54 -0800)]
don't use "role" or "roles" to include roles

see 3f62fc585f60fcaecbc783316a99cd8314aab062

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
6 years agoFix comments in shrink-osd-ceph-disk playbook
Noah Watkins [Wed, 7 Nov 2018 20:54:38 +0000 (12:54 -0800)]
Fix comments in shrink-osd-ceph-disk playbook

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
6 years agoAdd a ceph-volume aware shrink-osd playbook
Noah Watkins [Wed, 31 Oct 2018 18:17:16 +0000 (11:17 -0700)]
Add a ceph-volume aware shrink-osd playbook

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
6 years agoFixup shrink_osd[_container] scenario config
Noah Watkins [Tue, 6 Nov 2018 16:49:39 +0000 (08:49 -0800)]
Fixup shrink_osd[_container] scenario config

** configuration seems to be for filestore:

[ERROR]: [ceph-osd0] Validation failed for variable: lvm_volumes

** Removing `radosgw_interface: eth1` to resolve:

The task includes an option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
u'ansible_eth1'

The error appears to have been in
'/home/nwatkins/src/ceph-ansible/roles/ceph-defaults/tasks/set_radosgw_address.yml':
line 21, column 5, but may be elsewhere in the file depending on the
exact syntax problem.

The offending line appears to be:

  - name: set_fact _radosgw_address to radosgw_interface - ipv4
    ^ here

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
6 years agoRename ceph-disk version of shrink-osd playbook
Noah Watkins [Wed, 31 Oct 2018 18:14:08 +0000 (11:14 -0700)]
Rename ceph-disk version of shrink-osd playbook

This will be replaced by a ceph-volume aware version.

Signed-off-by: Noah Watkins <nwatkins@redhat.com>