Setting this value to 1 lets the CI cover the related code in the
playbook without breaking the upgrade scenarios.
Those scenarios were broken because of the `TASK [waiting for
clean pgs...]` check in rolling_update.yml: the pool size for
`cephfs_metadata` and `cephfs_data` is raised to `2` in
`ceph-override.json`, and since there are not enough OSDs to honor this size,
some PGs stay degraded and make that check fail.
Since the introduction of `ceph-volume`, there is no need to split those tasks.
Let's refactor this part of the code so it's clearer.
Incidentally, this was breaking rolling_update.yml when `openstack_config:
true` was set, because nothing in the ceph-osd role ensured OSDs were started
(`openstack_config.yml` contains a check ensuring all OSDs are up, which was
obviously failing), and the OSDs on the last OSD node ended up not being
started at all.
When upgrading from RHCS 2.5 to 3.2, the upgrade fails because the task
`create ceph mgr keyring(s) when mon is containerized` has the when condition
`inventory_hostname == groups[mon_group_name]|last`.
First, this is incorrect because `inventory_hostname` refers to a mgr node
here, so this condition can never be satisfied.
Second, this condition combined with `serial: 1` causes the mgr keyring
creation to be skipped on the first node. The `ceph-mgr` role then tries to
copy the mgr keyring (it is not aware we are running with `serial: 1`), which
leads to a failure like the following:
```
TASK [ceph-mgr : copy ceph keyring(s) if needed] ***************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ceph-ansible/roles/ceph-mgr/tasks/common.yml:10
Tuesday 27 November 2018 12:03:34 +0000 (0:00:00.296) 0:11:01.290 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'
failed: [magna021] (item={u'dest': u'/var/lib/ceph/mgr/local-magna021/keyring', u'name': u'/etc/ceph/local.mgr.magna021.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/local-magna021/keyring", "name": "/etc/ceph/local.mgr.magna021.keyring"}, "msg": "Could not find or access '~/ceph-ansible-keys/48d78ac1-e0d6-4e35-ab3e-772aea7828fc//etc/ceph/local.mgr.magna021.keyring'"}
```
The ceph_key module is idempotent, so there is no need to have such a
condition.
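As a hypothetical minimal reproduction (the `mgrs` and `mons` group names are
assumptions, not the real inventory), a play targeting the mgr group can never
satisfy a condition comparing `inventory_hostname` against the last monitor:
```
# inventory_hostname is a mgr node here, so the comparison against the last
# mon is always false and the task is silently skipped.
- hosts: mgrs
  gather_facts: false
  tasks:
    - name: create ceph mgr keyring(s) when mon is containerized (stand-in)
      debug:
        msg: "the keyring would be created here"
      when: inventory_hostname == groups['mons'] | last
```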
Andrew Schoen [Tue, 20 Nov 2018 20:28:58 +0000 (14:28 -0600)]
ceph-volume: be idempotent when the batch strategy changes
If you deploy with 2 HDDs and 1 SSD, then on each subsequent deploy both
HDD drives will be filtered out because they're already used by Ceph.
ceph-volume reports this as a 'strategy change' because the device
list went from a mixed type of HDD and SSD to a single type of only SSD.
This situation results in a non-zero exit code from ceph-volume. We want
to handle it gracefully and report that nothing will be changed.
A JSON structure similar to what ceph-volume would have produced is
returned in the 'stdout' key.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1650306
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e13f32c1c5be2e4007714f704297827b16488ec6)
Typical error seen:
```
$ sudo ceph daemon osd.2 config get osd_memory_target
Can't get admin socket path: unable to get conf option admin_socket for osd.2:
parse error setting 'osd_memory_target' to '7823740108,8' (strict_si_cast:
unit prefix not recognized)
```
This commit ensures the value inserted in ceph.conf will be an integer.
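A minimal sketch of the idea (the variable names and the formula are
placeholders, not the actual role code): the trailing `int` filter is what
keeps a non-integer value like the one shown above out of ceph.conf.
```
# Cast the computed memory target so ceph.conf only ever gets an integer.
- name: compute osd_memory_target as an integer
  set_fact:
    osd_memory_target: "{{ (ansible_memtotal_mb * 1048576 * 0.7 / osd_count) | int }}"
```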
VasishtaShastry [Sun, 28 Oct 2018 17:37:21 +0000 (23:07 +0530)]
ceph-validate: added functions to accept true and false
ceph-validate used to throw an error when flags were set to the strings
'true' or 'false' instead of the booleans True and False.
Users can now set the flags `dmcrypt` and `osd_auto_discovery` to 'true' or 'false'.
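For example, both of the following forms now pass validation, where previously
only the unquoted booleans were accepted (the file path in the comment is
illustrative):
```
# e.g. group_vars/osds.yml – plain booleans and quoted strings both validate
dmcrypt: true
osd_auto_discovery: 'false'
```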
Rishabh Dave [Wed, 31 Oct 2018 14:46:13 +0000 (10:46 -0400)]
remove configuration files for ceph packages on ubuntu clusters
For apt-get, the purge command needs to be used instead of the remove
command in order to also remove the related configuration files. Otherwise,
packages might still be shown as installed by dpkg even after removing them.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1640061
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 640cad3fd810f0aacd41fc35b96f0be3f85fbd0d)
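A minimal sketch of the change (the package list is illustrative): with
Ansible's apt module, purging rather than removing drops the configuration
files as well.
```
# "purge: true" only takes effect together with "state: absent"; it removes
# the packages' configuration files so dpkg no longer reports them.
- name: purge ceph packages on Ubuntu
  apt:
    name: ['ceph', 'ceph-common', 'ceph-base']
    state: absent
    purge: true
```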
rgw: add a dedicated variable for multisite endpoint
We should give users the possibility to set whichever IP they want as the
multisite endpoint, while defaulting the value to `{{ ansible_fqdn }}` so
they are not forced to set this variable.
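Sketched as a defaults entry (the variable name is an assumption; only the
defaulting pattern matters):
```
# Let users override the multisite endpoint address, falling back to the
# node's FQDN so nothing has to be set explicitly.
rgw_multisite_endpoint_addr: "{{ ansible_fqdn }}"
```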
Ali Maredia [Mon, 18 Sep 2017 22:33:23 +0000 (18:33 -0400)]
rgw: update rgw multisite tasks
- remove destroy tasks
- cleanup conditionals and syntax
- remove unnecessary realm pulls
- enable multisite to be tested in automated
testing infra
- add multisite related vars to main.yml and
group_vars
- update README-MULTISITE
- ensure all `radosgw-admin` commands are run
on a mon (see the sketch after this list)
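A sketch of that last point (the task name and command are illustrative;
`mon_group_name` is the usual ceph-ansible group variable): delegating to the
first monitor keeps `radosgw-admin` off the rgw nodes.
```
# Run radosgw-admin on a monitor regardless of which hosts the play targets.
- name: get the current period
  command: radosgw-admin period get
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
  changed_when: false
```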
Sébastien Han [Tue, 30 Oct 2018 11:18:16 +0000 (12:18 +0100)]
travis: add ansible-galaxy integration
This instructs Travis to notify Galaxy when a build completes. Since 3.0,
Ansible Galaxy has the ability to build and push roles from repos
containing multiple roles.
Closes: https://github.com/ceph/ceph-ansible/issues/3165
Signed-off-by: Sébastien Han <seb@redhat.com>
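The integration boils down to a webhook notification in `.travis.yml`; a
sketch of the kind of stanza involved (the exact notification URL should be
taken from the Galaxy documentation):
```
# Ping Galaxy when a build completes so it can re-import the roles from
# this repository.
notifications:
  webhooks: https://galaxy.ansible.com/api/v1/notifications/
```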
Neha Ojha [Thu, 25 Oct 2018 17:45:00 +0000 (17:45 +0000)]
roles: do not limit docker_memory_limit for various daemons
Since we do not have enough data to put valid upper bounds for the memory
usage of these daemons, do not put artificial limits by default. This will
help us avoid failures like OOM kills due to low default values.
Whenever required, these limits can be manually enforced by the user.
More details in
https://bugzilla.redhat.com/show_bug.cgi?id=1638148
Since we set `configure_firewall: true` in
`ceph-defaults/defaults/main.yml`, there is no need to explicitly set it
in the `centos7_cluster` and `docker_cluster` testing scenarios.
Sébastien Han [Thu, 9 Aug 2018 09:32:53 +0000 (11:32 +0200)]
rolling_update: fix upgrade when using fqdn
Clusters that were deployed using `mon_use_fqdn` have a different unit
name, so during the upgrade this name must be used, otherwise the upgrade
will fail looking for a unit that does not exist.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1597516
Signed-off-by: Sébastien Han <seb@redhat.com>
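A minimal sketch of the idea (task names are illustrative; `mon_use_fqdn`
comes from the commit message): derive the unit instance name from how the
cluster was deployed before touching the service.
```
# FQDN-deployed clusters use the FQDN as the systemd instance name, other
# clusters use the short hostname.
- name: set the monitor unit instance name
  set_fact:
    monitor_name: "{{ ansible_fqdn if mon_use_fqdn | bool else ansible_hostname }}"

- name: restart the ceph monitor
  systemd:
    name: "ceph-mon@{{ monitor_name }}"
    state: restarted
```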
Andrew Schoen [Tue, 16 Oct 2018 15:20:54 +0000 (10:20 -0500)]
validate: check the version of python-notario
If the version of python-notario is < 0.0.13, an error message such as
"TypeError: validate() got an unexpected keyword argument
'defined_keys'" is given, which is not helpful in figuring
out that you have an incorrect version of python-notario.
This check will avoid that situation by telling the user that they need
to upgrade python-notario before they hit that error.
- fix a typo in vagrant_variables that causes a networking issue for the
containerized scenario.
- add containerized_deployment: true
- remove a useless block of code: the fact docker_exec_cmd is set in
ceph-defaults, which is played right after.
Sébastien Han [Thu, 24 May 2018 17:47:29 +0000 (10:47 -0700)]
infra: add a gather-ceph-logs.yml playbook
Add a gather-ceph-logs.yml playbook which will log onto all the machines from
your inventory and gather their ceph logs. This is not intended to work
in containerized environments since the logs are stored in journald.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582280
Signed-off-by: Sébastien Han <seb@redhat.com>
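A minimal sketch of what such a playbook can look like (the log path and the
local destination are assumptions): find the ceph logs on every host and fetch
them back to the ansible controller.
```
- hosts: all
  gather_facts: false
  tasks:
    - name: find ceph log files
      find:
        paths: /var/log/ceph
        patterns: '*.log'
      register: ceph_log_files

    - name: fetch ceph logs to the ansible controller
      fetch:
        src: "{{ item.path }}"
        dest: "gathered-ceph-logs/{{ inventory_hostname }}/"
      loop: "{{ ceph_log_files.files }}"
```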
Sébastien Han [Thu, 27 Sep 2018 14:31:22 +0000 (16:31 +0200)]
infra: rename osd-configure to add-osd and improve it
The playbook has various improvements:
* run ceph-validate role before doing anything
* run ceph-fetch-keys only on the first monitor of the inventory list
* set noup flag so PGs get distributed once all the new OSDs have been
added to the cluster and unset it when they are up and running (see the
sketch after this entry)
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
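A sketch of the noup handling from the list above (task names are
illustrative; `mon_group_name` is the usual ceph-ansible group variable):
```
# Set noup before the new OSDs are created so PGs are not remapped one OSD
# at a time, then unset it once all of them are up.
- name: set the noup flag
  command: ceph osd set noup
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
  changed_when: false

# ... OSD creation happens here ...

- name: unset the noup flag
  command: ceph osd unset noup
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
  changed_when: false
```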
Sébastien Han [Thu, 27 Sep 2018 14:29:22 +0000 (16:29 +0200)]
ceph-fetch-keys: refact
This commit simplifies the usage of the ceph-fetch-keys role. The role
now has a nicer way to find the various ceph keys and fetch them onto the
ansible server.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1624962
Signed-off-by: Sébastien Han <seb@redhat.com>
Andy McCrae [Fri, 5 Oct 2018 13:36:36 +0000 (14:36 +0100)]
Add ability to use a different client container
Currently a throw-away container is built to run ceph client
commands to set up users, pools & auth keys. It utilises
the same base ceph container, which has all the ceph services
inside it.
This PR allows the use of a separate container if the deployer
wishes, but defaults to using the same full ceph container.
This can be used for different architectures or distributions
which may support the Ceph client but not the Ceph server,
and allows the deployer to build and specify a separate client
container if need be.
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
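Sketched as defaults (the client-specific variable names are assumptions;
`ceph_docker_image` and `ceph_docker_image_tag` are the existing image
variables): default the client image to the full ceph image so existing
deployments are unchanged.
```
# Deployers can point these at a lighter, client-only image; by default the
# regular ceph image is reused.
ceph_client_docker_image: "{{ ceph_docker_image }}"
ceph_client_docker_image_tag: "{{ ceph_docker_image_tag }}"
```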
infra: fix wrong condition on firewalld start task
A task that is not skipped won't have the `skipped` attribute, so the `start
firewalld` task will complain about it.
Indeed, the `skipped` and `rc` attributes won't both exist, since the first
task, `check firewalld installation on redhat or suse`, won't be skipped in
the case of a non-containerized deployment.
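A sketch of a safer condition (the register name is illustrative): never
access `skipped` or `rc` directly, default them instead.
```
# A task that actually ran has no "skipped" attribute and a skipped task has
# no "rc", so both accesses need a default.
- name: start firewalld
  service:
    name: firewalld
    state: started
    enabled: true
  when:
    - not (firewalld_pkg_query.skipped | default(false))
    - firewalld_pkg_query.rc | default(1) == 0
```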
binhong.hua [Wed, 10 Oct 2018 15:24:30 +0000 (23:24 +0800)]
vagrantfile: remove disk path of OSD nodes
The OSD nodes' disks remain on the vagrant host when running "vagrant destroy",
because we use the time as part of the disk path, and the time at destroy does
not equal the time at create.
We already use random_hostname with the libvirt backend, so it will create the
disks using the hostname as part of the disk name, for example:
vagrant_osd2_1539159988_065f15e3e1fa6ceb0770-hda.qcow2.
`ceph_osd_container_stat` might not be set on the other OSD nodes.
We must ensure we are on the last node before trying to evaluate
`ceph_osd_container_stat`.
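Sketched as a guard (the debug task is a stand-in for the real consumer of
the variable): check that we are on the last OSD node before dereferencing the
registered result.
```
# The first condition short-circuits on every node except the last one, so
# ceph_osd_container_stat is never evaluated where it may be undefined.
- name: act on the container stat (stand-in task)
  debug:
    msg: "osd containers already checked on this node"
  when:
    - inventory_hostname == groups[osd_group_name] | last
    - ceph_osd_container_stat is defined
```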
Sébastien Han [Tue, 2 Oct 2018 15:37:06 +0000 (17:37 +0200)]
ceph-handler: change osd container check
Now that the container is named ceph-osd@<id>, looking for something that
contains a hostname is no longer necessary. This is also backward compatible
as it will continue to match container names with a hostname in them.
Sébastien Han [Fri, 28 Sep 2018 16:07:08 +0000 (18:07 +0200)]
osd: ceph-volume activate, just pass the OSD_ID
We don't need to pass the device and discover the OSD ID. We have a
task that gathers all the OSD IDs present on that machine, so we simply
re-use them and activate them. This also handles the situation where you
have multiple OSDs running on the same device.
Sébastien Han [Fri, 28 Sep 2018 11:06:18 +0000 (13:06 +0200)]
ceph_volume: add container support for batch command
The batch option was added recently, and while rebasing this patch it was
necessary to implement it. So now the batch option also works in
containerized environments.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1630977
Signed-off-by: Sébastien Han <seb@redhat.com>