git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log


Sébastien Han [Mon, 3 Dec 2018 21:58:19 +0000 (22:58 +0100)]

purge-cluster: remove support for other init systems

We only support systemd and use the service module anyway.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3a154fa0ad64f6704a832743571e5d20b84e9813)

Sébastien Han [Mon, 3 Dec 2018 21:46:52 +0000 (22:46 +0100)]

purge-docker-cluster: add support for mgr/mon collocation

Recently we introduced the collocation of mon and mgr by default, so we
don't need to have an explicit mgrs section for this. This means we have
to remove the mgr container on the mon machines too.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 325a159415a0eb8699a45c04b2d8ea233b2157c2)

# Conflicts:
# infrastructure-playbooks/purge-docker-cluster.yml

Sébastien Han [Mon, 3 Dec 2018 15:46:38 +0000 (16:46 +0100)]

purge-docker-cluster: add a task to check hosts

It's useful when running on CI to see what might remain on the machines.
So we list all the containers and images. We expect the list to be
empty.

We fail if we see containers running.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2bcc00896f623e3baa2f306128115134bdce84ce)

Sébastien Han [Thu, 4 Oct 2018 15:40:25 +0000 (17:40 +0200)]

purge-docker-cluster: add ceph-volume support

This commit adds support for purging clusters that were deployed
with ceph-volume. It also uses a block instruction to cleanly separate
the work to do depending on whether lvm is used.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1751885bc9292581e5114f88fe5b513cb396ed72)

Bruceforce [Fri, 4 Jan 2019 16:26:02 +0000 (17:26 +0100)]

The nfs_ganesha_dev_apt_repo variable was set incorrectly in the task
"fetch nfs-ganesha development repository".
This has to be pushed directly to stable-3.2 since master has diverged.

Signed-off-by: Bruceforce <Bruceforce@users.noreply.github.com>

Rishabh Dave [Wed, 12 Dec 2018 11:23:23 +0000 (16:53 +0530)]

ceph-infra: disable unrequired NTP services

When one of the currently supported NTP services has been set up,
disable the rest of the NTP services on Ceph nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 6fa757d34358e90ae3a2f035b50d319193521ec5)

Rishabh Dave [Wed, 12 Dec 2018 11:15:00 +0000 (16:45 +0530)]

ceph-infra: merge ntp_debian.yml and ntp_rpm.yml

Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called
setup_ntp.yml) since they are almost identical. Also avoid repetition
of the common setup step for ntpd and chronyd services.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit b03ab607422eda0094d74223d52024a373b7ee9a)

# Conflicts:
# roles/ceph-infra/tasks/ntp_debian.yml
# roles/ceph-infra/tasks/ntp_rpm.yml

Sébastien Han [Tue, 4 Dec 2018 08:59:47 +0000 (09:59 +0100)]

fix json data type

JSON is a data format that is always serialized as a string, whereas
before this change we were declaring a dict, which is not a valid JSON
structure.
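
To illustrate the distinction, here is a minimal Python sketch (not the
code touched by this commit). The `overrides_dict` content and the
`is_valid_json` helper are made up for the example; only the standard
`json` module is used.

```python
import json

# Hypothetical data; the key name is illustrative only.
overrides_dict = {"example_key": {"enabled": True}}

# The textual form of a native dict is not JSON: it uses single quotes
# and `True`, both of which a JSON parser rejects.
not_json = str(overrides_dict)

# Serializing the dict yields a JSON document, which is always a string.
as_json = json.dumps(overrides_dict)


def is_valid_json(text):
    """Return True when `text` parses as a JSON document."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```

Only `as_json` passes `is_valid_json`; the stringified dict does not.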

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1663026
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 896676ee80226121785f44f50d1f01fff5aa2fd7)

Guillaume Abrioux [Wed, 2 Jan 2019 15:53:06 +0000 (16:53 +0100)]

update: do not enforce `serial: 1` on client nodes

There is no need to enforce `serial: 1` on client nodes.
Let's make it configurable by introducing a new *extra* variable
`client_update_batch`. If it is not set, it defaults to
`{{ ansible_forks }}`.

NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`
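
The fallback can be pictured with a small Python sketch. This is not how
the playbook implements it (that is done with a Jinja default in the
play); `client_batch_size` and the `extra_vars` dict are hypothetical
names used only to show the lookup order.

```python
def client_batch_size(extra_vars, ansible_forks):
    """Return the batch size for client nodes.

    `extra_vars` stands in for the variables passed with `-e`; values
    given on the command line arrive as strings, hence the int().
    """
    return int(extra_vars.get("client_update_batch", ansible_forks))
```

With no extra variable the result is the forks value; with
`-e client_update_batch=2` it is 2.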

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650184
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 268f2cef821dcb5835bd925c42585ddda5a07861)

Rishabh Dave [Mon, 17 Dec 2018 10:34:46 +0000 (16:04 +0530)]

set any_errors_fatal to true for all host sections

Add `any_errors_fatal: true` to all host sections in `site.yml.sample`
and `site-container.yml.sample` so that the playbook execution
stops immediately when an error occurs.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 5f43dae5938b4c0a3bfbafccf9e2aa13816a237f)

Kai Wembacher [Thu, 13 Dec 2018 07:42:49 +0000 (08:42 +0100)]

add support for rocksdb and wal on the same partition in non-collocated

Signed-off-by: Kai Wembacher <kai@ktwe.de>
(cherry picked from commit a273ed7f6038b51d3ddb5198d4f3ab57d45bc328)

Sébastien Han [Tue, 4 Dec 2018 08:21:51 +0000 (09:21 +0100)]

purge: tox add lvm-setup

Since we deploy > purge > deploy, the LVs are gone, so we must recreate
them.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 656fbd290121a79722bd5f3af4bd44e928e74ae2)

Andrew Schoen [Tue, 11 Dec 2018 16:52:26 +0000 (10:52 -0600)]

purge-cluster: skip tasks that use ceph-volume if it's not installed

This will allow the playbook to be idempotent.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1656935
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit ffd56177e7616ba6345f1f1cc1f3b3e6ea7d66f3)

Noah Watkins [Thu, 6 Dec 2018 18:34:49 +0000 (10:34 -0800)]

ceph_keys: pass in module for error messages

fixes: #3421

Signed-off-by: Noah Watkins <nwatkins@redhat.com>
(cherry picked from commit 114fac15dc3200bbf9da183c75d889fd75794654)

Sébastien Han [Mon, 10 Dec 2018 08:47:39 +0000 (09:47 +0100)]

RELEASE-NOTE: fix PR links

Fix wrong position of link and names. The format is [name](link).

Signed-off-by: Sébastien Han <seb@redhat.com>

Sébastien Han [Mon, 29 Oct 2018 10:38:22 +0000 (11:38 +0100)]

Add release note for stable-3.2

Signed-off-by: Sébastien Han <seb@redhat.com>

Guillaume Abrioux [Tue, 4 Dec 2018 09:29:22 +0000 (10:29 +0100)]

tests: reintroduce purge_cluster scenario

- reintroduce `purge_cluster_container` and `purge_cluster_non_container`
on `stable-3.2`,
- remove all purge scenarios based on ceph-disk,
- remove purge_lvm_osds_* scenarios.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Fri, 30 Nov 2018 10:25:25 +0000 (11:25 +0100)]

tests: add purge_lvm_osds_container scenario

This commit adds the purge_lvm_osds_container scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b04fe72f35a1d857968463b8ac0421e9b0e03872)

Guillaume Abrioux [Thu, 29 Nov 2018 16:52:18 +0000 (17:52 +0100)]

purge: add iscsi support

Add iscsi support for both non-containerized and containerized
deployments in purge playbooks.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1651054
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 78116fa6dbe269d1319213251b01e43f1b8f3cff)

Guillaume Abrioux [Fri, 30 Nov 2018 16:12:21 +0000 (17:12 +0100)]

revert infra: don't restart firewalld if unit is masked

If firewalld unit is masked, setting `configure_firewall: false` is
enough

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655059
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1cff1f98065bf3b4056810a15998411f7300b58a)

Ramana Raja [Mon, 3 Dec 2018 14:25:42 +0000 (19:55 +0530)]

rolling_update: fail if less than 3 MONs

... for non-containerized deployments as well.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1655470
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit cb784c601d2063b95fb7d2514e39518137164e12)

Sébastien Han [Mon, 3 Dec 2018 09:04:38 +0000 (10:04 +0100)]

disable nfs scenario

The packages are broken, so let's disable it until this is solved.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a502327e52b2577a721790fce1cdc5e3201678bf)

Sébastien Han [Tue, 4 Dec 2018 09:44:28 +0000 (10:44 +0100)]

test: disable nfs for containers

Based on https://github.com/ceph/ceph-container/pull/1269 and given
there are no stable packages or reliable repository, we disable nfs
ganesha temporarily.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6c3ef90ebe94eb874b415d1cfcf329e20232ba9a)

Sébastien Han [Fri, 30 Nov 2018 10:20:03 +0000 (11:20 +0100)]

osd: discover osd_objectstore on the fly

Applying and passing the OSD_BLUESTORE/FILESTORE on the fly is wrong for
existing clusters as their config will be changed.

Typically, if an OSD was prepared with ceph-disk on filestore and we
change the default objectstore to bluestore, the activation will fail.
The flag osd_objectstore should only be used for the preparation, not
activation. Activation in this case detects the OSD objectstore, which
prevents failures like the one described above.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c5113019893c92c4d75c9fc457b04158b86398b)

Sébastien Han [Tue, 27 Nov 2018 16:50:44 +0000 (17:50 +0100)]

ceph-osd: change jinja condition

If an existing cluster runs this config, and has ceph-disk OSD, the
`expose_partitions` won't be expected by jinja since it's inside the
'old' if. We need it as part of the osd_scenario != 'lvm' condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1640273
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bef522627e1e9827b86710c7a54f35a0cd596fbb)

Sébastien Han [Thu, 29 Nov 2018 13:26:41 +0000 (14:26 +0100)]

rolling_update: do not fail on missing keys

We don't want to fail on keys that are not present since they will get
created after the mons are updated. They will be created by the task
"create potentially missing keys (rbd and rbd-mirror)".

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ebc901c6af67300f7b7b8da1b2d0a74147798da5)

Noah Watkins [Fri, 30 Nov 2018 23:46:42 +0000 (15:46 -0800)]

rgw: use correct default rgw frontend address

Since 0.0.0.0 is the default radosgw address (not 'address'),
configuring the radosgw interface instead of an explicit address would
result in 0.0.0.0 being used, rather than falling through to the
section that inspects the interface config option.
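
The intended fall-through can be sketched in Python as follows. This is
an illustration of the decision only, not the template code changed by
this commit; `rgw_bind_source` and its return shape are hypothetical.

```python
DEFAULT_RADOSGW_ADDRESS = "0.0.0.0"  # the real default named above


def rgw_bind_source(radosgw_address, radosgw_interface):
    """Decide where the frontend address should come from.

    An address counts as explicitly configured only when it differs
    from the real default; otherwise fall through to the interface.
    """
    if radosgw_address != DEFAULT_RADOSGW_ADDRESS:
        return ("address", radosgw_address)
    return ("interface", radosgw_interface)
```

Comparing against a placeholder such as 'address' instead of the real
default would make the first branch match an untouched setting, which is
the behaviour described in this message.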

backport note: this cannot be cherry-picked from master since this code
doesn't exist in master.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1655131

Signed-off-by: Noah Watkins <nwatkins@redhat.com>

Ramana Raja [Fri, 30 Nov 2018 15:01:13 +0000 (20:31 +0530)]

tox.ini: setup LVs in OSD hosts for '*-cluster' scenarios

... as the scenarios set up ceph clusters with LVM OSDs.

Closes: https://github.com/ceph/ceph-ansible/issues/3399
Signed-off-by: Ramana Raja <rraja@redhat.com>

Sébastien Han [Thu, 29 Nov 2018 13:59:25 +0000 (14:59 +0100)]

osd: manage legacy ceph-disk non-container startup

The code is now able (again) to start OSDs that were configured with
ceph-disk on a non-container scenario.

Closes: https://github.com/ceph/ceph-ansible/issues/3388
Signed-off-by: Sébastien Han <seb@redhat.com>

Guillaume Abrioux [Thu, 29 Nov 2018 09:16:52 +0000 (10:16 +0100)]

config: write jinja comment with appropriate syntax

jinja comment should be written using the jinja syntax `{# ... #}`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1654441
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a86c2b85263f84891e2cbf7e782f7ac8891257b3)

Sébastien Han [Wed, 28 Nov 2018 23:27:49 +0000 (00:27 +0100)]

rolling_update: default ceph json output to empty dict

So we can avoid the following failure:

The conditional check 'hostvars[mon_host]['ansible_hostname'] in (ceph_health_raw.stdout | from_json)["quorum_names"] or hostvars[mon_host]['ansible_fqdn'] in (ceph_health_raw.stdout | from_json)["quorum_names"]
' failed. The error was: No JSON object could be decoded

We just need to set a default, the next iteration will have a more
complete json since the command won't fail.
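
As a rough Python equivalent of the idea (the playbook does this with
Jinja filters, not Python), the sketch below falls back to an empty JSON
object when the command printed nothing. The function names are
hypothetical, and using `.get()` with an empty list is a choice made for
this example.

```python
import json


def quorum_names(raw_stdout):
    """Parse the quorum names, tolerating empty command output."""
    # An empty string is not valid JSON, so substitute an empty object.
    data = json.loads(raw_stdout or "{}")
    return data.get("quorum_names", [])


def mon_in_quorum(hostname, fqdn, raw_stdout):
    """True when either name of the monitor appears in the quorum."""
    names = quorum_names(raw_stdout)
    return hostname in names or fqdn in names
```

With empty output the check is simply false, so a retry can run instead
of the task aborting on a decode error.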

Signed-off-by: Sébastien Han <seb@redhat.com>

Guillaume Abrioux [Wed, 21 Nov 2018 16:28:00 +0000 (17:28 +0100)]

client: change default pool size

default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ed42262b372ace8688c2b20a05d143e46174ec08)

Guillaume Abrioux [Wed, 21 Nov 2018 16:27:11 +0000 (17:27 +0100)]

defaults: change default size for openstack pools

default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6d1fe329980b91944cae58e68b909d34892667e7)

Guillaume Abrioux [Wed, 21 Nov 2018 16:08:19 +0000 (17:08 +0100)]

defaults: change default pool size for cephfs_pools

default pool size should match the real default that is defined in ceph
itself.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fdc438dd0dd7ad91b296008a7335460a88c2ca4a)

Guillaume Abrioux [Wed, 21 Nov 2018 10:06:45 +0000 (11:06 +0100)]

defaults: add ceph related vars file

This adds a level of granularity.
We can have ceph-specific variables that users shouldn't have to change
here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f1735e9bb016dd30c2164b9f8ec6f644914052b1)

Guillaume Abrioux [Wed, 21 Nov 2018 10:00:11 +0000 (11:00 +0100)]

refact osd pool size customization

Add a real default value for osd pool size customization.
Ceph itself sets `osd_pool_default_size` to `3` by default.

If users don't specify a pool size in the various pool definitions within
ceph-ansible, we should default to `3`.

By the way, this kind of condition isn't really clear:
```
when:
- rbd_pool_size | default ("")
```

we should try to get the customized value then default to what is in
`osd_pool_default_size` (which has its default value pointing to
`ceph_osd_pool_default_size` (`3`) as well) and compare it to
`ceph_osd_pool_default_size`.
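
The lookup order described above can be sketched in Python. This is an
illustration only, not the Ansible condition itself; the pool is modeled
as a plain dict with an optional `size` key, and both function names are
hypothetical.

```python
CEPH_OSD_POOL_DEFAULT_SIZE = 3  # Ceph's own default, as stated above


def effective_pool_size(pool, osd_pool_default_size=CEPH_OSD_POOL_DEFAULT_SIZE):
    """Use the pool's customized size, else `osd_pool_default_size`."""
    return int(pool.get("size", osd_pool_default_size))


def needs_size_customization(pool, osd_pool_default_size=CEPH_OSD_POOL_DEFAULT_SIZE):
    """True when the effective size differs from Ceph's default."""
    return (
        effective_pool_size(pool, osd_pool_default_size)
        != CEPH_OSD_POOL_DEFAULT_SIZE
    )
```

A pool without `size` and with no override needs no customization; a
pool with `size: 2`, or a cluster-wide default of 1, does.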

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7774069d45477df9f37c98bc414b3bf38cf41feb)

Guillaume Abrioux [Tue, 13 Nov 2018 14:40:35 +0000 (15:40 +0100)]

mon: move `osd_pool_default_pg_num` in `ceph-defaults`

The `osd_pool_default_pg_num` parameter is set in `ceph-mon`.
When using ceph-ansible with `--limit` on a specific group of nodes, it
will fail when trying to access this variable since it wouldn't be
defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1518696
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d4c0960f04342e995db2453b50940aa9933ceb09)

Guillaume Abrioux [Wed, 21 Nov 2018 16:28:31 +0000 (17:28 +0100)]

tests: change default pools size

default pool size in our test should be explicitly set to 1

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Guillaume Abrioux [Mon, 26 Nov 2018 13:10:19 +0000 (14:10 +0100)]

update: fix a typo

`hostvars[groups[mon_host]]['ansible_hostname']` seems to be a typo.
That should be `hostvars[mon_host]['ansible_hostname']`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7c99b6df6d8f0daa05ed8da987984d638af3a794)

Guillaume Abrioux [Thu, 22 Nov 2018 10:33:20 +0000 (11:33 +0100)]

tests: do not fully override previous ceph_conf_overrides

We run an initial deployment with `osd_pool_default_size: 1` in
`ceph_conf_overrides`.
When re-running the playbook to test idempotency and handlers, we reset
`ceph_conf_overrides`. We must append a new value instead of just
overwriting it; otherwise, this can lead to errors in the CI.
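
The difference between appending and overwriting can be shown with a
short Python sketch. It is not the test code from this commit;
`append_override` is a hypothetical helper and `example_option` is a
made-up setting.

```python
import copy


def append_override(existing, section, key, value):
    """Return a copy of `existing` with one more override added."""
    merged = copy.deepcopy(existing)
    merged.setdefault(section, {})[key] = value
    return merged


# Overrides used for the initial deployment.
initial = {"global": {"osd_pool_default_size": 1}}

# Second run: add a setting while keeping the earlier one.
second_run = append_override(initial, "global", "example_option", "value")
```

`second_run` still carries `osd_pool_default_size: 1`, whereas assigning
a brand new dict would have dropped it.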

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f290e49df86a6c878dfffa4d017537f3be6ff615)

Guillaume Abrioux [Thu, 22 Nov 2018 16:52:58 +0000 (17:52 +0100)]

rolling_update: refact set_fact `mon_host`

Each monitor node should select another monitor, not itself.
Otherwise, one node in the monitor group won't set this fact, which
causes a failure.
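
The selection rule can be sketched in Python as below. This only
illustrates the rule, it is not the `set_fact` task; `pick_mon_host` is
a hypothetical name and the monitor group is modeled as a plain list.

```python
def pick_mon_host(inventory_hostname, mons):
    """Return the first monitor in `mons` that is not the current host."""
    for candidate in mons:
        if candidate != inventory_hostname:
            return candidate
    raise ValueError("no other monitor available")
```

Every monitor in a group of two or more gets a value this way, so no
node is left with the fact undefined.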

Typical error:
```
TASK [create potentially missing keys (rbd and rbd-mirror) when mon is containerized] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-dev-update_docker_cluster/rolling_update.yml:200
Thursday 22 November 2018 14:02:30 +0000 (0:00:07.493) 0:02:50.005 *****
fatal: [mon1]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute u'mon2'
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit af78173584f1b3a99515e9b94f450be22420c545)

Sébastien Han [Wed, 21 Nov 2018 15:18:58 +0000 (16:18 +0100)]

rolling_update: create rbd and rbd-mirror keyrings

During an upgrade, ceph won't create keys that did not exist in the
previous version. So after an upgrade from, say, Jewel to Luminous, once
all the monitors have the new version they should get or create the
keys. It's ok for the task to fail, especially for the rbd-mirror
key, which only appears in Nautilus.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650572
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4e267bee4f9263b9ac3b5649f1e3cf3cbaf12d10)

Sébastien Han [Wed, 21 Nov 2018 15:17:04 +0000 (16:17 +0100)]

ceph_key: add a get_key function

When checking if a key exists we also have to ensure that the key exists
on the filesystem, the key can change on Ceph but still have an outdated
version on the filesystem. This solves this issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 691f373543d96d26b1af61c4ff7731fd888a9ce9)

Sébastien Han [Mon, 19 Nov 2018 13:58:03 +0000 (14:58 +0100)]

switch: do not look for devices anymore

It's easier to look up a directory instead of the block devices,
especially because ceph-volume and ceph-disk handle devices
differently.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c14f9b78ff7b88419148ac2dd01611b7ec830598)

Sébastien Han [Fri, 16 Nov 2018 15:15:24 +0000 (16:15 +0100)]

switch: disable all ceph units

Prior to this commit we were only disabling ceph-osd units, but forgot
the ceph.target which is controlling everything and will restart the
ceph-osd units at each reboot.
Now that everything gets disabled there won't be any conflicts between
the old non-container and the new container units.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit cd56dad9fa4574f8474c362083d97003f62926ab)

Sébastien Han [Tue, 13 Nov 2018 16:43:21 +0000 (17:43 +0100)]

switch: do not mask systemd unit

If we mask it we won't be able to start the OSD container, since the
osd container now uses the osd ID as a name, such as: ceph-osd@0

Fixes the error: Failed to execute operation: Cannot send after transport endpoint shutdown

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fe1d09925ae1525e99f22a3eab9ca1823c079bda)

Sébastien Han [Wed, 28 Nov 2018 23:10:29 +0000 (00:10 +0100)]

osd: re-introduce disk_list check

This commit
https://github.com/ceph/ceph-ansible/commit/4cc1506303739f13bb7a6e1022646ef90e004c90#diff-51bbe3572e46e3b219ad726da44b64ebL13
accidentally removed this check.

This is a must have for ceph-disk based containerized OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>

Guillaume Abrioux [Wed, 28 Nov 2018 19:53:10 +0000 (20:53 +0100)]

validate: change default value for `radosgw_address`

change default value of `radosgw_address` to keep consistency with
`monitor_address`.
Moreover, `ceph-validate` checks if the value is '0.0.0.0' to determine
if it has to run `check_eth_rgw.yml`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e4869ac8bd574af56952f02b1c8f63ecae0d5d86)

Guillaume Abrioux [Wed, 28 Nov 2018 17:46:45 +0000 (18:46 +0100)]

tests: rgw_multisite allow clusters to talk to each other

Adding this rule on the hypervisor will allow the clusters to talk to
each other.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 96ce8761ba7656d4a657df319362084f72080320)

Guillaume Abrioux [Thu, 8 Nov 2018 08:08:28 +0000 (09:08 +0100)]

tests: set pool size to 1 in ceph-override.json

Setting this value to 1 lets the CI cover the related code in the
playbook without breaking the upgrade scenarios.

Those scenarios were broken because there is a check `TASK [waiting for
clean pgs...]` in rolling_update.yml. Since the pool sizes for
`cephfs_metadata` and `cephfs_data` were updated to `2` in
`ceph-override.json` and there are not enough OSDs to honor this size,
some PGs are degraded and make the mentioned check fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3ac6619fb9aa0ea29041ce122b4dfac9a51fc235)

Guillaume Abrioux [Wed, 7 Nov 2018 10:45:29 +0000 (11:45 +0100)]

osd: commonize start_osd code

Since the introduction of `ceph-volume`, there is no need to split those
tasks.

Let's refactor this part of the code so it's clearer.

By the way, this was breaking rolling_update.yml when `openstack_config:
true` was set, because nothing ensured OSDs were started in the ceph-osd
role (in `openstack_config.yml` there is a check ensuring all OSDs are UP,
which was obviously failing), and it resulted in OSDs on the last OSD node
not being started anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f7fcc012e9a5b5d37bcffd39f3062adbc2886006)

Guillaume Abrioux [Tue, 27 Nov 2018 12:42:41 +0000 (13:42 +0100)]

mgr: fix mgr keyring error on rolling_update

When upgrading from RHCS 2.5 to 3.2, it fails because the task `create
ceph mgr keyring(s) when mon is containerized` has a when condition
`inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` refers to a
mgr node, which means this condition would never have been satisfied.
Then, this condition combined with `serial: 1` causes the mgr keyring
creation to be skipped on the first node. Further, the `ceph-mgr` role
tries to copy the mgr keyring (it's not aware we are running
`serial: 1`), which leads to a failure like the following:

```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```

The ceph_key module is idempotent, so there is no need to have such a
condition.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1649957
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 73287f91bcdbc4e9cf95f52f8389b561418cf3bd)

Guillaume Abrioux [Tue, 27 Nov 2018 09:26:41 +0000 (10:26 +0100)]

tests: apply dev_setup on the secondary cluster for rgw_multisite

We must apply this playbook before deploying the secondary cluster.
Otherwise, there will be a mismatch between the two deployed clusters.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3d8f4e63045d9453549e86fc280556663a9a9a1c)

Sébastien Han [Tue, 27 Nov 2018 09:45:05 +0000 (10:45 +0100)]

handler: show unit logs on error

This will tremendously help debugging daemons that fail on restart by
showing the systemd unit logs.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a9b337ba660da641f36c79a92e0aace217175ff0)

Andrew Schoen [Tue, 20 Nov 2018 20:28:58 +0000 (14:28 -0600)]

ceph-volume: be idempotent when the batch strategy changes

If you deploy with 2 HDDs and 1 SSD, then on each subsequent deploy both
HDD drives will be filtered out, because they're already used by ceph.
ceph-volume will report this as a 'strategy change' because the device
list went from a mixed type of HDD and SSD to a single type of only SSD.

This situation results in a non-zero exit code from ceph-volume. We want
to handle this situation gracefully and report that nothing will be changed.
A similar json structure to what would have been given by ceph-volume is
returned in the 'stdout' key.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650306
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e13f32c1c5be2e4007714f704297827b16488ec6)

Guillaume Abrioux [Wed, 21 Nov 2018 13:38:25 +0000 (14:38 +0100)]

config: convert _osd_memory_target to int

ceph.conf doesn't accept float values.

Typical error seen:
```
$ sudo ceph daemon osd.2 config get osd_memory_target
Can't get admin socket path: unable to get conf option admin_socket for osd.2:
parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast:
unit prefix not recognized)
```

This commit ensures the value inserted in ceph.conf will be an integer.
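
For illustration, the same conversion in Python (the actual change is in
the Jinja template, not in Python); `render_osd_memory_target` is a
hypothetical helper name.

```python
def render_osd_memory_target(value):
    """Format a computed memory target as a whole number of bytes."""
    # int() truncates the fractional part, so no decimal separator can
    # end up in the rendered configuration value.
    return str(int(value))
```

A computed value such as 7823740108.8 is rendered as `7823740108`.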

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 68dde424f6b254c657d75f1b5c47131dc84d9fc3)

Guillaume Abrioux [Thu, 15 Nov 2018 20:56:11 +0000 (21:56 +0100)]

infra: don't restart firewalld if unit is masked

if firewalld.service systemd unit is masked, the handler will fail when
trying to restart it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1650281
(cherry picked from commit 63b9835cbb0510415a2d0077697a0107e2d6c4f3)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Neha Ojha [Mon, 19 Nov 2018 06:50:02 +0000 (06:50 +0000)]

osd_memory_target: standardize unit and fix calculation

* The default value of osd_memory_target used by ceph is 4294967296 bytes,
so use the same as ceph-ansible default.

* Convert ansible_memtotal_mb to bytes to calculate osd_memory_target
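
The unit conversion can be sketched in Python as below. This is an
illustration and not the role's calculation; it assumes that
`ansible_memtotal_mb` counts units of 1024 * 1024 bytes, and the helper
names are hypothetical.

```python
DEFAULT_OSD_MEMORY_TARGET = 4294967296  # bytes, the Ceph default above
BYTES_PER_MB = 1024 * 1024  # assumption: one "mb" is 1048576 bytes


def memtotal_bytes(ansible_memtotal_mb):
    """Convert the host memory fact from megabytes to bytes."""
    return int(ansible_memtotal_mb) * BYTES_PER_MB


def exceeds_default(ansible_memtotal_mb):
    """True when the host has more memory than the default target."""
    return memtotal_bytes(ansible_memtotal_mb) > DEFAULT_OSD_MEMORY_TARGET
```

Under that assumption, 4096 MB is exactly the default target, so both
sides of any comparison are expressed in bytes.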

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 10538e9a23c60c4e634226aafe0456680c5ccc6d)

Guillaume Abrioux [Sat, 17 Nov 2018 16:40:35 +0000 (17:40 +0100)]

client: fix a typo in create_users_keys.yml

cd1e4ee024ef400ded25e8c99948648ead3a0892 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 393ab94728cfff9ab2d4846eb39095becf69ad32)

Guillaume Abrioux [Thu, 15 Nov 2018 21:03:28 +0000 (22:03 +0100)]

validate: allow stable-3.2 to run with ansible 2.4

Although this is not officially supported, this commit allows
`stable-3.2` to run against ansible 2.4.
This should ease the transition in RHOSP.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

Jason Dillaman [Fri, 2 Nov 2018 14:30:34 +0000 (10:30 -0400)]

igw: add support for IPv6

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 0aff0e9ede433d75d040a70d1a21b0acd8f4790f)

Conflicts:
library/igw_purge.py: trivial resolution
roles/ceph-iscsi-gw/library/igw_purge.py: trivial resolution

Mike Christie [Tue, 30 Oct 2018 19:03:37 +0000 (14:03 -0500)]

igw: open iscsi target port

Open the port the iscsi target uses for iscsi traffic.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 5ba7d1671ed421995e263f6abf6c2ccffac12422)

Mike Christie [Thu, 8 Nov 2018 21:23:24 +0000 (15:23 -0600)]

igw: use api_port variable for firewall port setting

Don't hard code api port because it might be overridden by the user.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit e2f1f81de4c829b52760dc6a98e2f8751d51255e)

Mike Christie [Tue, 30 Oct 2018 18:54:52 +0000 (13:54 -0500)]

igw: fix firewall iscsi_group_name check

The firewall setup for igw is not getting set up because iscsi_group_name
does not exist. It should be iscsi_gw_group_name.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit a4ff52842cc53917388901971a01242b036455e4)

Mike Christie [Tue, 30 Oct 2018 18:54:03 +0000 (13:54 -0500)]

igw: Fix default api port

The default igw api port is 5000 in the manual setup docs and
ceph-iscsi-config package so this syncs up ansible.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit a10853c5f8bbd113b07efbb7ae93a2ef3f8304da)

VasishtaShastry [Sun, 28 Oct 2018 17:37:21 +0000 (23:07 +0530)]

ceph-validate : Added functions to accept true and false

ceph-validate used to throw an error when flags were set to 'true' or 'false' instead of True and False.
Now users can set the flags 'dmcrypt' and 'osd_auto_discovery' to 'true' or 'false'.

Will fix - Bug 1638325
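
A minimal Python sketch of such a check, assuming nothing about the
functions actually added to ceph-validate: `to_bool_flag` is a
hypothetical name, and accepting any letter case is a choice made for
this example.

```python
def to_bool_flag(value):
    """Accept True/False or the strings 'true'/'false'."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError("expected a boolean flag, got %r" % (value,))
```

Both `dmcrypt: true` read as a boolean and `dmcrypt: 'true'` read as a
string pass; anything else is rejected.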

Signed-off-by: VasishtaShastry <vipin.indiasmg@gmail.com>
(cherry picked from commit 098f42f2334c442bf418f09d3f4b3b99750c7ba0)

Rishabh Dave [Wed, 31 Oct 2018 14:46:13 +0000 (10:46 -0400)]

remove configuration files for ceph packages on ubuntu clusters

For apt-get, purge command needs to be used, instead of remove command,
to remove related configuration files. Otherwise, packages might be
shown as installed while running dpkg command even after removing them.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 640cad3fd810f0aacd41fc35b96f0be3f85fbd0d)

Mike Christie [Thu, 8 Nov 2018 21:38:08 +0000 (15:38 -0600)]

igw: stop tcmu-runner on iscsi purge

When the iscsi purge playbook is run we stop the gw and api daemons but
not tcmu-runner which I forgot on the previous PR.

Fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1621255

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit b523a44a1a9cf60f7512af833d97c52c1dee1bba)

Guillaume Abrioux [Tue, 6 Nov 2018 08:17:29 +0000 (09:17 +0100)]

tests: test ooo_collocation against v3.0.3 ceph-container image

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 811f043947e946eb60bf2fc70a8f7f300a0cd4dc)

commit | commitdiff | tree

Sébastien Han [Mon, 5 Nov 2018 17:53:44 +0000 (18:53 +0100)]

rbd-mirror: enable ceph-rbd-mirror.target

Without this the daemon will never start after reboot.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b7a791e9029e4aca31b00a118e6eb6ac1737dc6d)

commit | commitdiff | tree

Andrew Schoen [Wed, 31 Oct 2018 15:25:26 +0000 (10:25 -0500)]

validate: do not validate ceph_repository if deploying containers

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1630975
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 9cd8ecf0cc74e5edfe54cbb5cebf1d72ca3bab8a)

commit | commitdiff | tree

Guillaume Abrioux [Tue, 30 Oct 2018 14:01:46 +0000 (15:01 +0100)]

rgw: move multisite default variables into ceph-defaults

Move all rgw multisite variables into ceph-defaults so ceph-validate can
go through them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 29 Oct 2018 17:50:31 +0000 (18:50 +0100)]

tests: add more memory for rgw_multisite scenarios

Adding more memory to the VMs for rgw_multisite scenarios could avoid the
error below, which I recently hit in the CI.

(Setting 1024 MB is worthwhile since there are only 2 nodes in those
scenarios.)

```
fatal: [osd0]: FAILED! => {
    "changed": false,
    "cmd": [
        "docker",
        "run",
        "--rm",
        "--entrypoint",
        "/usr/bin/ceph",
        "docker.io/ceph/daemon:latest-luminous",
        "--version"
    ],
    "delta": "0:00:04.799084",
    "end": "2018-10-29 17:10:39.136602",
    "rc": 1,
    "start": "2018-10-29 17:10:34.337518"
}

STDERR:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 125, in <module>
    import rados
ImportError: libceph-common.so.0: cannot map zero-fill pages: Cannot allocate memory
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 29 Oct 2018 15:37:12 +0000 (16:37 +0100)]

rgw: move multisite related tasks after docker/main.yml

We must play this task after the container has started, otherwise the
rgw_multisite tasks will fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 29 Oct 2018 13:05:59 +0000 (14:05 +0100)]

rgw: add rgw_multisite for containerized deployments

Run commands in the containers for containerized deployments.
(At the moment, all commands are run on the host only.)

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 29 Oct 2018 12:30:59 +0000 (13:30 +0100)]

tests: add rgw_multisite functional test

Add a playbook that uploads a file to the master and then tries to get
info from the secondary node, so we can check whether the replication
is ok.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 24 Oct 2018 20:27:28 +0000 (22:27 +0200)]

rgw: add testing scenario for rgw multisite

This will set up 2 clusters with rgw multisite enabled.
The first cluster will act as the 'master', the second will be the
secondary one.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 29 Oct 2018 11:05:09 +0000 (12:05 +0100)]

validate: remove check on rgw_multisite_endpoint_addr definition

Since `rgw_multisite_endpoint_addr` defaults to
`{{ ansible_fqdn }}`, it shouldn't be mandatory to set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Ali Maredia [Fri, 26 Oct 2018 14:39:56 +0000 (14:39 +0000)]

rgw: add ceph-validate tasks for multisite, other fixes

- updated README-MULTISITE
- re-added destroy.yml
- added tasks in ceph-validate to make sure the
rgw multisite vars are set

Signed-off-by: Ali Maredia <amaredia@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 26 Oct 2018 09:14:12 +0000 (11:14 +0200)]

rgw: add a dedicated variable for multisite endpoint

We should let users set the IP they want as the multisite endpoint.
The default value is `{{ ansible_fqdn }}` so they are not forced to
set this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Ali Maredia [Mon, 18 Sep 2017 22:33:23 +0000 (18:33 -0400)]

rgw: update rgw multisite tasks

- remove destroy tasks
- cleanup conditionals and syntax
- remove unnecessary realm pulls
- enable multisite to be tested in automated
testing infra
- add multisite related vars to main.yml and
group_vars
- update README-MULTISITE
- ensure all `radosgw-admin` commands are being run
on a mon

Signed-off-by: Ali Maredia <amaredia@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 30 Oct 2018 11:18:16 +0000 (12:18 +0100)]

travis: add ansible-galaxy integration

This instructs Travis to notify Galaxy when a build completes. Since 3.0,
ansible-galaxy has been able to build and push roles from repos
with multiple roles.

Closes: https://github.com/ceph/ceph-ansible/issues/3165
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 30 Oct 2018 11:20:44 +0000 (12:20 +0100)]

gitignore: add mergify and travis as exceptions

Git must notice changes from .travis.yml and .mergify.yml

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 30 Oct 2018 11:04:59 +0000 (12:04 +0100)]

contrib: rm script push-roles-to-ansible-galaxy.sh

The script is not used anymore, and Travis CI will soon take over the job
of pushing the roles to Galaxy.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 30 Oct 2018 10:28:23 +0000 (11:28 +0100)]

clean up the repo's root

Remove old files and move scripts to the contrib directory.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Maciej Naruszewicz [Fri, 19 Oct 2018 20:40:36 +0000 (22:40 +0200)]

ceph-volume: fix TypeError exception when setting osds-per-device > 1

osds-per-device needs to be passed to run_command as a string.
Otherwise, the expandvars method will try to iterate over an integer.

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
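
A minimal sketch of the fix described in this commit, assuming the command is assembled as a list and each element is later passed through an expandvars call. `build_batch_args` is a hypothetical helper for illustration and not the module's real code; the argument layout is simplified.

```python
import os


def build_batch_args(devices, osds_per_device):
    """Return ceph-volume style arguments with every element as a string."""
    args = ["ceph-volume", "lvm", "batch"]
    if osds_per_device > 1:
        # str() matters: an int here would reach expandvars and raise TypeError.
        args.extend(["--osds-per-device", str(osds_per_device)])
    args.extend(devices)
    return args


cmd = build_batch_args(["/dev/sdb", "/dev/sdc"], 2)
# Mimic an expandvars pass over each argument, as the commit message describes.
expanded = [os.path.expandvars(arg) for arg in cmd]
```

Converting at the point where the list is built keeps the rest of the code free of type checks.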

commit | commitdiff | tree

Sébastien Han [Mon, 29 Oct 2018 15:24:45 +0000 (16:24 +0100)]

testinfra: change test osds for containers

We do not use @<device> anymore so we don't need to perform the
readlink check anymore.

Also we are making an exception for ooo which is still using ceph-disk.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 26 Oct 2018 14:30:32 +0000 (16:30 +0200)]

ceph_volume: add container support for batch

https://tracker.ceph.com/issues/36363 has been resolved and the patch
has been backported to luminous and mimic so let's enable the container
support.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1541415
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 29 Oct 2018 11:00:40 +0000 (12:00 +0100)]

test_osd: dynamically get the osd container

Do not enforce the container name since this will fail when we have
multiple VMs running OSDs.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Wed, 10 Oct 2018 19:29:56 +0000 (15:29 -0400)]

test: convert all the tests to use lvm

ceph-disk is now deprecated in ceph-ansible so let's convert all the ci
tests to use lvm instead of ceph-disk.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Thu, 25 Oct 2018 14:15:36 +0000 (16:15 +0200)]

tox: change container image to use master

We have a latest-master image which contains builds from upstream ceph
so let's use it to verify build.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Wed, 10 Oct 2018 18:55:20 +0000 (14:55 -0400)]

test: remove ceph-disk CI tests

Since we are removing the ceph-disk test from the CI in master,
there is no need to keep the functional tests in master anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 29 Oct 2018 10:46:46 +0000 (11:46 +0100)]

roles: fix *_docker_memory_limit default value

Append the 'm' suffix to specify the unit used in all
`*_docker_memory_limit` variables.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
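
To illustrate why the suffix matters, here is a small sketch. `docker_memory_limit` is a hypothetical helper, and the value 1024 is only an example, not a default taken from the roles.

```python
def docker_memory_limit(megabytes):
    """Format a memory limit for `docker run --memory` with an explicit unit."""
    # Docker accepts the unit suffixes b, k, m and g; a bare number is
    # understood as bytes, so the 'm' suffix has to be spelled out.
    return "{}m".format(int(megabytes))


limit = docker_memory_limit(1024)
run_args = ["docker", "run", "--memory={}".format(limit)]
```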

commit | commitdiff | tree

Neha Ojha [Thu, 25 Oct 2018 17:45:00 +0000 (17:45 +0000)]

roles: do not limit docker_memory_limit for various daemons

Since we do not have enough data to put valid upper bounds for the memory
usage of these daemons, do not put artificial limits by default. This will
help us avoid failures like OOM kills due to low default values.

Whenever required, these limits can be manually enforced by the user.

More details in
https://bugzilla.redhat.com/show_bug.cgi?id=1638148

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 29 Oct 2018 13:53:47 +0000 (14:53 +0100)]

Merge branch 'jcsp-wip-rm-calamari'

commit | commitdiff | tree

Sébastien Han [Mon, 29 Oct 2018 13:50:37 +0000 (14:50 +0100)]

Merge branch 'master' into wip-rm-calamari

commit | commitdiff | tree

Ali Maredia [Mon, 29 Oct 2018 06:01:25 +0000 (06:01 +0000)]

infrastructure playbooks: ensure nvme_device is defined in lv-create.yml

Signed-off-by: Ali Maredia <amaredia@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 26 Oct 2018 13:27:33 +0000 (15:27 +0200)]

nfs: do not create the nfs user if already present

Check if the user exists and skip its creation if true.

Closes: https://github.com/ceph/ceph-ansible/issues/3254
Signed-off-by: Sébastien Han <seb@redhat.com>
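
The check-then-skip logic can be sketched in Python with the standard `pwd` module. This is only an illustration of the idea; the playbook itself uses Ansible tasks, and both function names below are hypothetical.

```python
import pwd


def user_exists(name):
    """Return True if `name` is a known account on this host."""
    try:
        pwd.getpwnam(name)
    except KeyError:
        return False
    return True


def should_create_user(name):
    # Create the account only when it is not already present.
    return not user_exists(name)
```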

commit | commitdiff | tree

Jairo Llopis [Thu, 4 Oct 2018 05:48:03 +0000 (07:48 +0200)]

Fix problem with ceph_key in python3

Pretty basic problem of iteritems removal.

Signed-off-by: Jairo Llopis <yajo.sk8@gmail.com>
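
For context, `dict.iteritems()` exists only on Python 2, while `dict.items()` works on both interpreters. The sketch below shows the portable form; the `caps` dictionary and the way the arguments are assembled are assumptions made for the example, not the module's actual code.

```python
# Hypothetical capabilities, keyed by daemon type.
caps = {"mon": "allow r", "osd": "allow rwx"}

# caps.iteritems() would raise AttributeError on Python 3.
# dict.items() is available on both Python 2 and Python 3.
cap_args = []
for daemon, cap in sorted(caps.items()):
    cap_args.extend(["--cap", daemon, cap])
```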

commit | commitdiff | tree

Sébastien Han [Wed, 24 Oct 2018 14:55:52 +0000 (16:55 +0200)]

ceph_volume: better error handling

If the JSON we load is invalid, we should fail with a meaningful
error.

Signed-off-by: Sébastien Han <seb@redhat.com>
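
A minimal sketch of this kind of error handling, assuming the JSON arrives as a string. `parse_report` and the error message wording are hypothetical and are not taken from the module.

```python
import json


def parse_report(raw):
    """Parse JSON text, failing with a readable message if it is malformed."""
    try:
        return json.loads(raw)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError on Python 3.
        raise RuntimeError(
            "Unable to parse JSON output: {}; got: {!r}".format(exc, raw)
        )


report = parse_report('{"osds": []}')
```

Including the offending text in the message makes it easier to see what the command actually returned.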
