git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
7 years agotests: do not deploy all daemons for shrink osds scenarios
Guillaume Abrioux [Mon, 23 Jul 2018 14:40:49 +0000 (16:40 +0200)]
tests: do not deploy all daemons for shrink osds scenarios

Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
Also, doing so will save some time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b89cc1746f8652b67d95410ed80473d1a2c3d312)

7 years agoshrink-osd: purge osd on containerized deployment
Sébastien Han [Wed, 18 Jul 2018 14:20:47 +0000 (16:20 +0200)]
shrink-osd: purge osd on containerized deployment

Prior to this commit we were only stopping the container, but now we
also purge the devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ce1dd8d2b3986d3bd08b4d73efd88b4de72fcc00)

7 years agotests: stop hardcoding ansible version
Guillaume Abrioux [Thu, 19 Jul 2018 11:52:36 +0000 (13:52 +0200)]
tests: stop hardcoding ansible version

In addition to ceph/ceph-build#1082

Let's set the ansible version in each ceph-ansible branch's respective
requirements.txt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agotests: add latest-bis-jewel for jewel tests
Guillaume Abrioux [Tue, 17 Jul 2018 08:47:28 +0000 (10:47 +0200)]
tests: add latest-bis-jewel for jewel tests

Since no latest-bis-jewel exists, the tests are using latest-bis, which
points to ceph mimic. In our testing, using it for idempotency/handlers
tests means upgrading from jewel to mimic, which is not what we want to do.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 05852b03013d15f6f400fe6728c24b11f22b75de)

7 years agoceph-iscsi: rename group iscsi_gws
Sébastien Han [Wed, 6 Jun 2018 04:07:33 +0000 (12:07 +0800)]
ceph-iscsi: rename group iscsi_gws

Let's try to avoid using dashes as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes;
using an underscore fixes the issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 20c8065e48b6ea3027669bd5fa2f79779eac570a)

7 years agomgr: fix condition to add modules to ceph-mgr
Guillaume Abrioux [Wed, 11 Jul 2018 14:34:09 +0000 (16:34 +0200)]
mgr: fix condition to add modules to ceph-mgr

Follow up on #2784

We must check the generated fact `_disabled_ceph_mgr_modules` to
enable disabled mgr modules.
Otherwise, this task is skipped because it's not comparing the
right list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ce5ac930c5b91621a46fc69ddd0dcafb2a24947d)

7 years agomgr: fix enabling of mgr module on mimic
Guillaume Abrioux [Mon, 18 Jun 2018 15:26:21 +0000 (17:26 +0200)]
mgr: fix enabling of mgr module on mimic

The data structure has slightly changed on mimic.

Prior to mimic, it used to be:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        "balancer",
        "dashboard",
        "influx",
        "localpool",
        "prometheus",
        "restful",
        "selftest",
        "zabbix"
    ]
}
```

From mimic it looks like this:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        {
            "name": "balancer",
            "can_run": true,
            "error_string": ""
        },
        {
            "name": "dashboard",
            "can_run": true,
            "error_string": ""
        }
    ]
}
```

This means we can't simply check membership with `item in
_ceph_mgr_modules.disabled_modules`.

The idea here is to use the `map(attribute='name')` filter to build a
list of names when deploying mimic.

Fixes: #2766
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3abc253fecc91f29c90e23ae95e1b83f8ffd3de6)
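As an illustration (hypothetical helper name, not the playbook's actual code), the normalization that the `map(attribute='name')` filter performs can be sketched in Python as:

```python
def disabled_module_names(mgr_module_ls):
    """Return the disabled module names for both formats.

    Pre-mimic, "disabled_modules" is a list of strings; from mimic
    onward it is a list of dicts carrying "name", "can_run" and
    "error_string" keys.
    """
    return [m["name"] if isinstance(m, dict) else m
            for m in mgr_module_ls["disabled_modules"]]


# Pre-mimic format: already a plain list of names.
luminous = {"disabled_modules": ["balancer", "dashboard"]}
# Mimic format: list of dicts, the names must be extracted.
mimic = {"disabled_modules": [
    {"name": "balancer", "can_run": True, "error_string": ""},
    {"name": "dashboard", "can_run": True, "error_string": ""},
]}

assert disabled_module_names(luminous) == ["balancer", "dashboard"]
assert disabled_module_names(mimic) == ["balancer", "dashboard"]
```

With both formats reduced to a plain list of names, a simple `item in` membership check works on every release.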

7 years agoAdd ability to enable ceph mgr modules.
Fabien Brachere [Mon, 16 Oct 2017 13:04:23 +0000 (15:04 +0200)]
Add ability to enable ceph mgr modules.

(Couldn't find actual author email to add the signoff accordingly so I've
added it with my email to make the CI happy.)

(cherry picked from commit 3a587575d76ecfec050d3d12cd416853baf590af)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agotests: skip tests for node iscsi-gw when deploying jewel
Guillaume Abrioux [Fri, 22 Jun 2018 09:56:24 +0000 (11:56 +0200)]
tests: skip tests for node iscsi-gw when deploying jewel

The CI provisions an iscsi-gw node anyway, but nothing is deployed on
it, so let's skip the tests accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2d560b562a2d439fbed34b3066de93dfdff650ff)

7 years agotests: fix broken test when collocated daemons scenarios
Guillaume Abrioux [Wed, 20 Jun 2018 11:44:08 +0000 (13:44 +0200)]
tests: fix broken test when collocated daemons scenarios

At the moment, a lot of tests are skipped when daemons are collocated.
Our tests consider a node to belong to only one group, while in certain
scenarios it can belong to multiple groups.

Also pinning to pytest 3.6.1 so we can use `request.node.iter_markers()`

Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d83b24d27121591dafcba8297288c6f3a7ede42e)

7 years agotests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined
Guillaume Abrioux [Fri, 22 Jun 2018 09:20:33 +0000 (11:20 +0200)]
tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined

Since the ooo_collocation scenario is supposed to be the same scenario
as the one tested by OSP, and it doesn't pass `rgw_create_pools`, the
test `test_docker_rgw_tuning_pools_are_set` fails:
```
>       pools = node["vars"]["rgw_create_pools"]
E       KeyError: 'rgw_create_pools'
```

Skipping this test when `node["vars"]["rgw_create_pools"]` is not
defined fixes this failure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1c3dae4a90816ac6503967779b7fd77ff84900b5)

7 years agotests: factorize docker tests using docker_exec_cmd logic
Guillaume Abrioux [Mon, 25 Jun 2018 15:10:37 +0000 (17:10 +0200)]
tests: factorize docker tests using docker_exec_cmd logic

Avoid duplicating tests unnecessarily just because of the docker exec
syntax. Using the same logic as in the playbook with `docker_exec_cmd`
allows us to execute the same test on both containerized and
non-containerized environments.

The idea is to set a variable `docker_exec_cmd` to the
'docker exec <container-name>' string when containerized and
to '' when non-containerized.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2e57a56db2801818135ba85479fedfc00eae30c)
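A minimal Python sketch of the same idea (hypothetical function name; the real tests read `docker_exec_cmd` from the node vars):

```python
def build_status_cmd(containerized, container_name, cluster="ceph"):
    """Prefix the ceph command with 'docker exec <container-name>' on
    containerized deployments; use an empty prefix otherwise, so the
    same test body runs in both environments."""
    docker_exec_cmd = "docker exec {}".format(container_name) if containerized else ""
    # strip() drops the leading space left behind by the empty prefix.
    return "{} ceph --cluster {} -s -f json".format(docker_exec_cmd, cluster).strip()


assert build_status_cmd(False, "ceph-mon-mon0") == "ceph --cluster ceph -s -f json"
assert (build_status_cmd(True, "ceph-mon-mon0")
        == "docker exec ceph-mon-mon0 ceph --cluster ceph -s -f json")
```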

7 years agotests: add mimic support for test_rbd_mirror_is_up()
Guillaume Abrioux [Thu, 5 Jul 2018 13:16:19 +0000 (15:16 +0200)]
tests: add mimic support for test_rbd_mirror_is_up()

Prior to mimic, the data structure returned by `ceph -s -f json`, used
to gather information about rbd-mirror daemons, looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

From mimic onward, this part has changed to:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward
compatibility with `luminous`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 09d795b5b737a05164772f5e3ba469577d605344)
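A sketch of how both layouts can be handled with one code path (trimmed structures, hypothetical helper name): since the hostname is present under `metadata` in both releases, reading it from there instead of from the top-level daemon key works for luminous and mimic alike.

```python
def rbd_mirror_hostnames(servicemap):
    """Collect rbd-mirror daemon hostnames from `ceph -s -f json`.

    Luminous keys the daemons dict by hostname, mimic keys it by gid,
    but both carry the hostname under metadata.
    """
    daemons = servicemap["services"]["rbd-mirror"]["daemons"]
    return sorted(v["metadata"]["hostname"]
                  for k, v in daemons.items() if k != "summary")


luminous = {"services": {"rbd-mirror": {"daemons": {
    "summary": "",
    "ceph-nano-luminous-faa32aebf00b": {
        "metadata": {"hostname": "ceph-nano-luminous-faa32aebf00b"}},
}}}}
mimic = {"services": {"rbd-mirror": {"daemons": {
    "summary": "",
    "14151": {"metadata": {"hostname": "ceph-rbd-mirror0"}},
}}}}

assert rbd_mirror_hostnames(luminous) == ["ceph-nano-luminous-faa32aebf00b"]
assert rbd_mirror_hostnames(mimic) == ["ceph-rbd-mirror0"]
```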

7 years agotests: fix `_get_osd_id_from_host()` in TestOSDs()
Guillaume Abrioux [Mon, 9 Jul 2018 09:51:24 +0000 (11:51 +0200)]
tests: fix `_get_osd_id_from_host()` in TestOSDs()

We must initialize the `children` variable in `_get_osd_id_from_host()`:
otherwise, if for any reason the deployment has failed and results in
an osd host with no OSD registered, we never enter the condition,
`children` is never set, and the function tries to return
something undefined.

Typical error:
```
E       UnboundLocalError: local variable 'children' referenced before assignment
```

Fixes: #2860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a65ec231d7f76655caa964e1d9228ba7a910dea)
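The shape of the fix, reduced to a standalone sketch (simplified tree, hypothetical function name):

```python
def get_osd_ids_from_host(osd_tree, hostname):
    """Return the OSD ids attached to `hostname` in a `ceph osd tree` dump.

    `children` is initialized before the loop: if the host has no OSD
    registered (e.g. a failed deployment), we return an empty list
    instead of raising UnboundLocalError.
    """
    children = []
    for node in osd_tree.get("nodes", []):
        if node.get("type") == "host" and node.get("name") == hostname:
            children = node.get("children", [])
    return children


tree = {"nodes": [{"name": "osd0", "type": "host", "children": [0, 1]}]}
assert get_osd_ids_from_host(tree, "osd0") == [0, 1]
# Host with no registered OSD: empty list, no UnboundLocalError.
assert get_osd_ids_from_host(tree, "osd1") == []
```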

7 years agotests: refact test_all_*_osds_are_up_and_in
Guillaume Abrioux [Fri, 22 Jun 2018 23:43:49 +0000 (01:43 +0200)]
tests: refact test_all_*_osds_are_up_and_in

These tests are skipped on bluestore osds scenarios.
They were going to fail anyway since they are run on mon nodes, whereas
`devices` is only defined in the inventory for the osd nodes;
`num_devices * num_osd_hosts` therefore returns `0`.
The result is that the tests expect to have 0 OSDs up.

The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks its own OSDs: if a node has 2 devices defined in
the `devices` variable, we check for 2 OSDs to be up on that node. If
each node has all its OSDs up, we can say all OSDs are up.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe79a5d24086fc61ae84a59bd61a053cddb62941)

7 years agocommon: switch from docker module to docker_container
Guillaume Abrioux [Mon, 9 Jul 2018 13:50:52 +0000 (15:50 +0200)]
common: switch from docker module to docker_container

As of ansible 2.4, `docker` module has been removed (was deprecated
since ansible 2.1).
We must switch to `docker_container` instead.

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d0746e08586556b7d44ba4b20cd553a27860f30b)

7 years agomon: ensure socket is purged when mon is stopped
Guillaume Abrioux [Tue, 10 Jul 2018 09:56:17 +0000 (11:56 +0200)]
mon: ensure socket is purged when mon is stopped

On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.

Typical error:

```
fatal: [osd0]: FAILED! => {}

MSG:

'dict object' has no attribute 'osd_pool_default_pg_num'
```

The fact is not set because of this earlier failure:

```
ok: [mon0] => {
    "changed": false,
    "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
    "delta": "0:00:00.217382",
    "end": "2018-07-09 22:25:53.155969",
    "failed_when_result": false,
    "rc": 22,
    "start": "2018-07-09 22:25:52.938587"
}

STDERR:

admin_socket: exception getting command descriptions: [Errno 111] Connection refused

MSG:

non-zero return code
```

This failure happens when the ceph-mon service is stopped: since the
socket isn't purged, it's a leftover which confuses the process.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f54b3b4a7c1a7015def3a7987e3e9e426385251)

7 years agoceph-common: fix typo in task 2842/head v3.0.39
Sébastien Han [Thu, 5 Jul 2018 07:44:31 +0000 (09:44 +0200)]
ceph-common: fix typo in task

Somehow one option moved up and broke the yaml. Put it back in the
right place.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-common: fix rhcs condition v3.0.38
Sébastien Han [Wed, 4 Jul 2018 14:39:33 +0000 (16:39 +0200)]
ceph-common: fix rhcs condition

We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fcf11ecc3567398f92b9f91e1a0749edb921131f)

7 years agoceph-mon: Generate initial keyring
Ha Phan [Thu, 21 Jun 2018 08:08:39 +0000 (16:08 +0800)]
ceph-mon: Generate initial keyring

Minor fix so that the initial keyring can be generated using python3.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
(cherry picked from commit a7b7735b6fd23985d24a492f1bf4c5be7f1961b2)

7 years agotests: skip iscsi-gw nodes on jewel
Guillaume Abrioux [Tue, 3 Jul 2018 07:35:38 +0000 (09:35 +0200)]
tests: skip iscsi-gw nodes on jewel

On stable-3.0 we can't test iscsi-gw nodes on jewel because it's not
implemented yet.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoEnable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients v3.0.37
Vasu Kulkarni [Tue, 26 Jun 2018 21:41:14 +0000 (14:41 -0700)]
Enable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
(cherry picked from commit 1d454b611f9ec5403a474fcb45a6333ca6d36715)

7 years agotests: no need to remove partitions in lvm_setup.yml
Andrew Schoen [Mon, 12 Mar 2018 19:06:39 +0000 (14:06 -0500)]
tests: no need to remove partitions in lvm_setup.yml

Now that we are using ceph_volume_zap, the partitions are
kept around and should be able to be reused.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 98e237d234872a66b064ef9e3284a8763dda9186)

7 years agotest: when creating the /dev/sdc2 partition specify label as gpt
Andrew Schoen [Mon, 30 Oct 2017 20:31:04 +0000 (15:31 -0500)]
test: when creating the /dev/sdc2 partition specify label as gpt

ansible==2.4 requires that label be set to gpt, or it will default
to msdos.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 37a48209ccf702a817b445e8b6164e38b60ea708)

7 years agoSkip GPT header creation for lvm osd scenario
Vishal Kanaujia [Wed, 16 May 2018 09:58:31 +0000 (15:28 +0530)]
Skip GPT header creation for lvm osd scenario

The LVM lvcreate fails if the disk already has a GPT header.
We create a GPT header regardless of the OSD scenario. The fix is to
skip header creation for the lvm scenario.

fixes: https://github.com/ceph/ceph-ansible/issues/2592

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
(cherry picked from commit ef5f52b1f36188c3cab40337640a816dec2542fa)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agopurge-docker: added conditionals needed to successfully re-run purge
Randy J. Martinez [Wed, 28 Mar 2018 23:46:54 +0000 (18:46 -0500)]
purge-docker: added conditionals needed to successfully re-run purge

Added 'ignore_errors: true' to multiple tasks which run docker commands, so they no longer fail in cases where docker is no longer installed. Without this, certain tasks in purge-docker-cluster.yml cause the playbook to fail when re-run and stop the purge. This leaves behind a dirty environment and a playbook which can no longer be run.
Fix regex on line 275: sometimes 'list-units' outputs 4 spaces between loaded and active; the update accounts for both scenarios.
purge fetch_directory: in other roles, fetch_directory is concatenated, e.g.: "{{ fetch_directory }}"/"{{ somedir }}". That being said, fetch_directory will never have a trailing slash in all.yml, so this task was never being run (causing failures when trying to re-deploy).

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
(cherry picked from commit d1f2d64b15479524d9e347303c8d5e4bfe7c15d8)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agosystemd: remove changed_when: false
Sébastien Han [Thu, 28 Jun 2018 07:53:03 +0000 (09:53 +0200)]
systemd: remove changed_when: false

When using a module there is no need to apply this Ansible option. The
module will handle the idempotency on its own. So the module decides
whether or not the task has changed during the execution.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f6239972716b013bafcb9313c9c83723615aa7d6)

# Conflicts:
# roles/ceph-iscsi-gw/tasks/container/containerized.yml

7 years agoceph-osd: trigger osd container restart on script change
Sébastien Han [Thu, 28 Jun 2018 07:54:24 +0000 (09:54 +0200)]
ceph-osd: trigger osd container restart on script change

The script ceph-osd-run.sh holds the config options used to start the
container; if one of these options is modified, we must restart the
container. This was not the case before because the 'notify' flag
wasn't present.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit abdb53e16a7f46ceebbf4de65ed1add04da0d543)

7 years agotests: reduce the amount of time we wait
Guillaume Abrioux [Tue, 26 Jun 2018 11:42:27 +0000 (13:42 +0200)]
tests: reduce the amount of time we wait

This sleep 120 looks a bit long, let's reduce this to 30sec and see if
things go faster.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 081600842ff3758910109d9f636b54cb12a85ed9)

7 years agomon: honour mon_docker_net_host option
Sébastien Han [Wed, 27 Jun 2018 09:23:00 +0000 (11:23 +0200)]
mon: honour mon_docker_net_host option

--net=host was hardcoded in the startup line, so even though
mon_docker_net_host was set to False the net option would always be
activated.
mon_docker_net_host is set to True by default so this commit does not
change the behaviour.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 322e2de7d27c498a9d89b31b96729200bed56e19)

7 years agotests: add more nodes in ooo testing scenario 2822/head
Guillaume Abrioux [Wed, 13 Jun 2018 14:46:40 +0000 (16:46 +0200)]
tests: add more nodes in ooo testing scenario

Adding more nodes in this scenario could help to get better coverage,
so we can catch more potential bugs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 481c14455aa3bcd05cc1c190458148a4d516e991)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agomon/osd: bump container memory limit
Sébastien Han [Fri, 15 Jun 2018 19:39:34 +0000 (15:39 -0400)]
mon/osd: bump container memory limit

As discussed with the cores, the current limits are too low and should
be bumped to higher value.
So now by default monitors get 3GB and OSDs get 5GB.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a9ed3579ae680949b4f53aee94003ca50d1ae721)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotests: fix *_has_correct_value tests
Guillaume Abrioux [Tue, 19 Jun 2018 16:08:10 +0000 (18:08 +0200)]
tests: fix *_has_correct_value tests

It might happen that the lists of ips/hosts in the following lines
(ceph.conf)
- `mon initial members = <hosts>`
- `mon host = <ips>`

are not ordered the same way depending on the deployment.

This patch makes the tests look for each ip or hostname in their
respective lines.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f68936ca7e7f556c8d8cee8b2c4565a3c94f72f9)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
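The order-insensitive check can be sketched like this (hypothetical helper name; the real tests operate on the parsed ceph.conf):

```python
def line_has_all_members(conf_line, expected_members):
    """True when every expected host/ip appears in the ceph.conf line,
    regardless of the order the deployment wrote them in."""
    _, _, value = conf_line.partition("=")
    actual = [item.strip() for item in value.split(",")]
    return all(member in actual for member in expected_members)


line = "mon initial members = mon0,mon1,mon2"
# Any ordering of the expected members passes.
assert line_has_all_members(line, ["mon2", "mon0", "mon1"])
assert not line_has_all_members(line, ["mon3"])
```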
7 years agotests: keep same ceph release during handlers/idempotency test
Guillaume Abrioux [Fri, 15 Jun 2018 08:44:25 +0000 (10:44 +0200)]
tests: keep same ceph release during handlers/idempotency test

since `latest` points to `mimic`, we need to force the test to keep the
same ceph release when testing anything else than `mimic`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 21894655a7eaf0db6c99a46955e0e8ebc59a83af)

7 years agotests: increase memory to 1024Mb for centos7_cluster scenario
Guillaume Abrioux [Mon, 11 Jun 2018 08:49:39 +0000 (10:49 +0200)]
tests: increase memory to 1024Mb for centos7_cluster scenario

We see more and more failures like `fatal: [mon0]: UNREACHABLE! => {}`
in the `centos7_cluster` scenario. Since we have 30Gb of RAM on the
hypervisors, we can give the monitors a bit more RAM. Besides, nodes in
the containerized cluster testing scenario already have 1024Mb of
memory allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bbb869133563c3b0ddd0388727b894306b6b8b26)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agotests: improve mds tests
Guillaume Abrioux [Wed, 6 Jun 2018 19:56:38 +0000 (21:56 +0200)]
tests: improve mds tests

The expected number of mds daemons consists of the number of daemons
that are 'up' plus the number of daemons in 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c94ada69e80d7a1ddfbd2de2b13086d57a6fdfcd)

7 years agomon: copy openstack keys over to all mon
Guillaume Abrioux [Thu, 7 Jun 2018 07:09:38 +0000 (09:09 +0200)]
mon: copy openstack keys over to all mon

When configuring openstack, the created keyrings aren't copied over to
all monitor nodes.

This should have been backported from
433ecc7cbcc1ac91cab509dabe5c647d58c18c7f but that would imply too many
changes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agorolling_update: fix facts gathering delegation
Guillaume Abrioux [Tue, 5 Jun 2018 14:30:12 +0000 (16:30 +0200)]
rolling_update: fix facts gathering delegation

This is a follow-up on what was done in #2560.
See #2560 and #2553 for details.

Closes: #2708
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 232a16d77ff1048a2d3c4aa743c44e864fa2b80b)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoplaybook: follow up on #2553
Guillaume Abrioux [Thu, 24 May 2018 13:07:56 +0000 (15:07 +0200)]
playbook: follow up on #2553

Since we fixed the `gather and delegate facts` task, this exception is
not needed anymore. It's a leftover that should be removed to save some
time when deploying a cluster with a large number of clients.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 828848017cefd981e14ca9e4690dd7d1320f0eef)

7 years agorgws: renames create_pools variable with rgw_create_pools.
jtudelag [Thu, 31 May 2018 15:01:44 +0000 (17:01 +0200)]
rgws: renames create_pools variable with rgw_create_pools.

Renamed to be consistent with the role (rgw) and have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 600e1e2c2680e8102f4ef17855d4bcd89d6ef733)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoAdds RGWs pool creation to containerized installation.
jtudelag [Sun, 4 Mar 2018 22:06:48 +0000 (23:06 +0100)]
Adds RGWs pool creation to containerized installation.

The ceph command has to be executed from one of the monitor containers
if no admin copy is present on the RGWs; the task has to be delegated
then.

Adds a test to check proper RGW pool creation for Docker container
scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 8704144e3157aa253fb7563fe701d9d434bf2f3e)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotests: skip disabling fastest mirror detection on atomic host
Guillaume Abrioux [Tue, 5 Jun 2018 07:31:42 +0000 (09:31 +0200)]
tests: skip disabling fastest mirror detection on atomic host

There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0cd4b065144843762b9deca667e05a1903b2121)

7 years agoceph-defaults: Enable local epel repository
Erwan Velu [Fri, 1 Jun 2018 16:53:10 +0000 (18:53 +0200)]
ceph-defaults: Enable local epel repository

During the tests, the remote epel repository generates a lot of
errors, leading to broken jobs (issue #2666).

This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl to our local http mirror.

That should speed up the build process but also avoid the random errors
we face.

This patch is part of a patch series that tries to remove all possible yum failures.

Signed-off-by: Erwan Velu <erwan@redhat.com>
(cherry picked from commit 493f615eae3510021687e8cfc821364cc26a71ac)

7 years agoMakefile: followup on #2585 v3.0.36
Guillaume Abrioux [Thu, 31 May 2018 09:25:49 +0000 (11:25 +0200)]
Makefile: followup on #2585

Fix a typo in the `tag` target: double quotes are missing here.

Without them, the `make tag` command fails like this:

```
if [[ "v3.0.35" ==  ]]; then \
            echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35"; \
            exit 1; \
        fi
/bin/sh: -c: line 0: unexpected argument `]]' to conditional binary operator
/bin/sh: -c: line 0: syntax error near `;'
/bin/sh: -c: line 0: `if [[ "v3.0.35" ==  ]]; then     echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35";     exit 1; fi'
make: *** [tag] Error 2
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b67f42feb95594fb403908d61383dc25d6cd342)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoMakefile: add "make tag" command
Ken Dreyer [Thu, 10 May 2018 23:08:05 +0000 (17:08 -0600)]
Makefile: add "make tag" command

Add a new "make tag" command. This automates some common operations:

1) Automatically determine the next Git tag version number to create.
   For example:
   "3.2.0beta1 -> "3.2.0beta2"
   "3.2.0rc1 -> "3.2.0rc2"
   "3.2.0" -> "3.2.1"

2) Create the Git tag, and print instructions for the user to push it to
   GitHub.

3) Sanity check that HEAD is a stable-* branch or master (bail on
   everything else).

4) Sanity check that HEAD is not already tagged.

Note, we will still need to tag manually once each time we change the
format, for example when moving from tagging "betas" to tagging "rcs",
or "rcs" to "stable point releases".

Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fcea56849578bd47e65b130ab6884e0b96f9d89d)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
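The version-bump rule from step 1 can be sketched as follows (a Python illustration of the rule only, not the Makefile's actual shell):

```python
import re


def next_tag(current):
    """Bump the trailing number of a tag, mirroring the examples above:
    3.2.0beta1 -> 3.2.0beta2, 3.2.0rc1 -> 3.2.0rc2, 3.2.0 -> 3.2.1."""
    head, number = re.match(r"^(.*?)(\d+)$", current).groups()
    return "{}{}".format(head, int(number) + 1)


assert next_tag("3.2.0beta1") == "3.2.0beta2"
assert next_tag("3.2.0rc1") == "3.2.0rc2"
assert next_tag("3.2.0") == "3.2.1"
```

As noted above, changing format (betas to rcs, rcs to stable releases) still requires a manual tag, since only the trailing number is bumped.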
7 years agorgw: container add option to configure multi-site zone
Sébastien Han [Mon, 16 Apr 2018 13:57:23 +0000 (15:57 +0200)]
rgw: container add option to configure multi-site zone

You can now use RGW_ZONE and RGW_ZONEGROUP on each rgw host from your
inventory and assign them a value. Once the rgw container starts it'll
pick the info and add itself to the right zone.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1551637
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1c084efb3cb7e48d96c9cbd6bd05ca4f93526853)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotests: resize root partition when atomic host
Guillaume Abrioux [Wed, 30 May 2018 07:17:09 +0000 (09:17 +0200)]
tests: resize root partition when atomic host

For some time we have been seeing failures in the CI for containerized
scenarios because VMs are running out of space at some point.

The default in the images used is to have only 3Gb for the root
partition, which doesn't sound like a lot.

Typical error seen:

```
STDERR:

failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```

Indeed, on the machine we can see:
```
Every 2.0s: df -h                                                                                                                                                                                                                                       Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```

The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.

```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
  Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
  Logical volume atomicos/root successfully resized.
```

```
-bash-4.2# xfs_growfs /
meta-data=/dev/mapper/atomicos-root isize=512    agcount=4, agsize=192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=768000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 768000 to 2543616
```

```
-bash-4.2# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  9.7G  1.4G  8.4G  14% /
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 34f70428521ab30414ce8806c7e2967a7387ff00)

7 years agotests: avoid yum failures
Guillaume Abrioux [Mon, 28 May 2018 10:02:49 +0000 (12:02 +0200)]
tests: avoid yum failures

In the CI we often see failures like the following:

`Failure talking to yum: Cannot find a valid baseurl for repo:
base/7/x86_64`

It seems the fastest mirror detection is sometimes counterproductive and
leads yum to fail.

This fix has been added in `setup.yml`.
Until now this playbook was only played just before `testinfra`, but it
can also be run before ceph-ansible so we can add some provisioning
tasks.
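One common way to work around the fastestmirror problem (shown here as a sketch; the actual provisioning task added to `setup.yml` may differ) is to disable the plugin on CentOS 7:

```ini
; /etc/yum/pluginconf.d/fastestmirror.conf
[main]
enabled=0
```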

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Erwan Velu <evelu@redhat.com>
(cherry picked from commit 98cb6ed8f602d9c54b63c5381a17dbca75df6bc2)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agoAdd privilege escalation to iscsi purge tasks v3.0.35
Paul Cuzner [Fri, 25 May 2018 00:13:20 +0000 (12:13 +1200)]
Add privilege escalation to iscsi purge tasks

Without the escalation, invocation from non-root
users will fail when accessing the rados config
object, or when attempting to log to /var/log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 2890b57cfc2e1ef9897a791ce60f4a5545011907)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-radosgw: disable NSS PKI db when SSL is disabled
Luigi Toscano [Tue, 22 May 2018 09:46:33 +0000 (11:46 +0200)]
ceph-radosgw: disable NSS PKI db when SSL is disabled

The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a122b7ce8ec6a5c27a70bc19589d177.
Now profiles drive the rgw keystone settings.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
(cherry picked from commit 43e96c1f98312734e2f12a1ea5ef29981e9072bd)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoFix restarting OSDs twice during a rolling update.
Subhachandra Chandra [Fri, 16 Mar 2018 17:10:14 +0000 (10:10 -0700)]
Fix restarting OSDs twice during a rolling update.

During a rolling update, OSDs are restarted twice currently. Once, by the
handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks
in the rolling_update playbook. This change turns off restarts by the handler.
Further, the restart initiated by the rolling_update playbook is more
efficient as it restarts all the OSDs on a host as one operation and waits
for them to rejoin the cluster. The restart task in the handler restarts one
OSD at a time and waits for it to join the cluster.

(cherry picked from commit c7e269fcf5620a49909b880f57f5cbb988c27b07)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefaults: restart_osd_daemon unit spaces
Sébastien Han [Fri, 18 May 2018 12:43:57 +0000 (14:43 +0200)]
defaults: restart_osd_daemon unit spaces

An extra space in the `systemctl list-units` output can cause
restart_osd_daemon.sh to fail.

It looks like when more services are enabled on the node, the space
between "loaded" and "active" grows wider than the single space
expected by the command[1].
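The fix is essentially to match a run of whitespace instead of a literal single space. A minimal Python sketch of the idea (the real fix lives in the restart_osd_daemon.sh shell script; the sample lines below are illustrative):

```python
import re

# Two lines as `systemctl list-units` might print them: the column padding
# between "loaded" and "active" varies with the unit names present.
lines = [
    "ceph-osd@0.service  loaded active running Ceph object storage daemon",
    "ceph-osd@12.service loaded    active running Ceph object storage daemon",
]

# Matching "loaded active" literally (one space) misses the second line;
# matching any run of whitespace catches both.
fragile = [l for l in lines if "loaded active" in l]
robust = [l for l in lines if re.search(r"loaded\s+active", l)]

print(len(fragile), len(robust))  # → 1 2
```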

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2f43e9dab5f077276162069f449978ea97c2e9c0)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agopurge_cluster: fix dmcrypt purge
Guillaume Abrioux [Fri, 18 May 2018 15:56:03 +0000 (17:56 +0200)]
purge_cluster: fix dmcrypt purge

dmcrypt devices aren't closed properly; therefore, redeploying after a
purge may fail.

Typical errors:

```
ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command
'/sbin/blkid' returned non-zero exit status 2
```

```
ceph-disk: Error: unable to read dm-crypt key:
/var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf:
/etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key
```

Properly closing dmcrypt devices allows redeploying without error.
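A sketch of the kind of cleanup involved: for each lockbox UUID, a `cryptsetup luksClose` call releases the dm-crypt mapping before the device is wiped (the helper and the UUID list here are illustrative, not the playbook's actual task):

```python
# Hypothetical helper: build the `cryptsetup luksClose` invocations needed
# to release dm-crypt mappings so the underlying devices can be wiped.
def close_commands(uuids):
    return [["cryptsetup", "luksClose", u] for u in uuids]

cmds = close_commands(["c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf"])
print(cmds[0])
```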

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9801bde4d4ce501208fc297d5cb0ab2e0aa28702)

7 years agopurge_cluster: wipe all partitions
Guillaume Abrioux [Wed, 16 May 2018 15:34:38 +0000 (17:34 +0200)]
purge_cluster: wipe all partitions

In order to ensure there is no leftover after having purged a cluster,
we must wipe all partitions properly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a9247c4de78dec8a63f17400deb8b06ce91e7267)

7 years agopurge_cluster: fix bug when building device list
Guillaume Abrioux [Wed, 16 May 2018 14:04:25 +0000 (16:04 +0200)]
purge_cluster: fix bug when building device list

There are leftovers on devices when purging OSDs because of an invalid
device list construction.

Typical error:
```
changed: [osd3] => (item=/dev/sda sda1) => {
    "changed": true,
    "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600",
    "delta": "0:00:00.015188",
    "end": "2018-05-16 12:41:40.408597",
    "item": "/dev/sda sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:40.393409"
}

STDOUT:

sgdisk -Z /dev/sda sda1
dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200
udevadm settle --timeout=600

STDERR:

Error: Could not stat device /dev/sda sda1 - No such file or directory.
```

The device list in the `resolve parent device` task isn't built
properly because the command used to resolve the parent device doesn't
return the expected output.

eg:

```
changed: [osd3] => (item=/dev/sda1) => {
    "changed": true,
    "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")",
    "delta": "0:00:00.013634",
    "end": "2018-05-16 12:41:09.068166",
    "item": "/dev/sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:09.054532"
}

STDOUT:

/dev/sda sda1
```

For instance, it results in a device list like:
`['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']`
where we expect to have:
`['/dev/sda', '/dev/sdb', '/dev/sdc']`
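Since the command can print more than one token, keeping only the first one recovers the parent device. A Python sketch of the idea (the sample outputs mirror those observed above; the actual fix in the playbook may be implemented differently):

```python
def parent_device(lsblk_output):
    """Reduce a multi-token `lsblk -no pkname`-style output such as
    'sda sda1' to the parent device path (first token only)."""
    return "/dev/" + lsblk_output.split()[0]

# Outputs in the style observed in the bug report above.
raw_outputs = ["sda sda1", "sdb", "sdc sdc1"]
devices = sorted({parent_device(o) for o in raw_outputs})
print(devices)  # → ['/dev/sda', '/dev/sdb', '/dev/sdc']
```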

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9cad113e2f22132d08208cd58462f11056c41305)

7 years agoswitch: fix ceph_uid fact for osd
Guillaume Abrioux [Wed, 25 Apr 2018 12:20:35 +0000 (14:20 +0200)]
switch: fix ceph_uid fact for osd

In addition to b324c17, this commit fixes the ceph uid for the osd role in
the switch from non-containerized to containerized playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit adeecc51f8adf7834b936b7cf6a1be7e6bb82d27)

7 years agoswitch: fix ceph_uid fact
Sébastien Han [Thu, 19 Apr 2018 08:28:56 +0000 (10:28 +0200)]
switch: fix ceph_uid fact

The `latest` tag now points to CentOS, not Ubuntu anymore, so the condition was wrong.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 767abb5de02c0ecdf81a18f6ca63f2e978d3d7a4)

7 years agoswitch: disable ceph-disk units
Sébastien Han [Wed, 16 May 2018 15:37:10 +0000 (17:37 +0200)]
switch: disable ceph-disk units

During the transition from jewel non-container to container, old ceph
units are disabled. ceph-disk units can still remain in some cases and
will appear as 'loaded failed'. This is not a problem, although operators
might not like to see these units failing. That's why we remove them if
we find them.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 49a47124859e6577fb99e6dd680c5244ccd6f38f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotake-over: fix bug when trying to override variable
Guillaume Abrioux [Thu, 17 May 2018 15:29:20 +0000 (17:29 +0200)]
take-over: fix bug when trying to override variable

A customer has been facing an issue when trying to override
`monitor_interface` in inventory host file.
In his use case, all nodes had the same interface for
`monitor_interface` name except one. Therefore, they tried to override
this variable for that node in the inventory host file but the
take-over-existing-cluster playbook was failing when trying to generate
the new ceph.conf file because of an undefined variable.

Typical error:

```
fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"}
```

Including variables like this (`include_vars: group_vars/all.yml`) prevents
us from overriding anything in the inventory host file because it
overwrites everything you would have defined in the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 415dc0a29b10b28cbd047fe28eb4dd38419ea5dc)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agorolling_update: move osd flag section
Sébastien Han [Wed, 16 May 2018 14:02:41 +0000 (16:02 +0200)]
rolling_update: move osd flag section

During a minor update from one jewel version to a higher one (10.2.9 to
10.2.10 for example), osd flags don't get applied because they were set
in the mgr section, which is skipped in jewel since this daemon does not
exist.
Moving the set flag section after all the mons have been updated solves
that problem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d80a871a078a175d0775e91df00baf625dc39725)

7 years agoiscsi: add python-rtslib repository
Sébastien Han [Mon, 14 May 2018 07:21:48 +0000 (09:21 +0200)]
iscsi: add python-rtslib repository

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c7c11b774f54078b32b652481145699dbbd79ff)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoiscsi-gw: fix issue when trying to mask target
Guillaume Abrioux [Mon, 14 May 2018 15:39:25 +0000 (17:39 +0200)]
iscsi-gw: fix issue when trying to mask target

Trying to mask the target when `/etc/systemd/system/target.service` doesn't
exist seems to be a bug.
There is no need to mask a unit file which doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a145caf947aec64467150a007b7aafe57abe2891)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoFIX: run restart scripts in `noexec` /tmp v3.0.34
Arano-kai [Mon, 6 Nov 2017 14:02:47 +0000 (16:02 +0200)]
FIX: run restart scripts in `noexec` /tmp

- One cannot run scripts directly in place on a filesystem mounted with the
`noexec` option, but one can run scripts as arguments to `bash`/`sh`.
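The workaround can be demonstrated as follows (simulated here with a non-executable file rather than an actual `noexec` mount, but the effect is the same: direct execution is denied while the interpreter can still read the file):

```python
import os
import stat
import subprocess
import tempfile

# Write a tiny restart-style script without the exec bit set, mimicking a
# script sitting on a noexec /tmp.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("echo restarted\n")
    path = f.name
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # readable, not executable

# Passing the path as an argument to the interpreter sidesteps the exec
# restriction: sh reads the file instead of the kernel exec'ing it.
out = subprocess.run(["sh", path], capture_output=True, text=True).stdout
print(out.strip())  # → restarted
os.unlink(path)
```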

Signed-off-by: Arano-kai <captcha.is.evil@gmail.com>
(cherry picked from commit 5cde3175aede783feb89cbbc4ebb5c2f05649b99)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: clean legacy syntax in ceph-osd-run.sh.j2
Guillaume Abrioux [Wed, 9 May 2018 01:10:30 +0000 (03:10 +0200)]
osd: clean legacy syntax in ceph-osd-run.sh.j2

Quick cleanup of a legacy syntax introduced by e0a264c7e

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b387b506a21fd71eedd7aabab9f114353b63abc)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoadds missing state needed to upgrade nfs-ganesha v3.0.33
Gregory Meno [Wed, 9 May 2018 18:17:26 +0000 (11:17 -0700)]
adds missing state needed to upgrade nfs-ganesha

In the tasks for os_family Red Hat we were missing this state.

fixes: bz1575859
Signed-off-by: Gregory Meno <gmeno@redhat.com>
(cherry picked from commit 26f6a650425517216fb57c08e1a8bda39ddcf2b5)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocommon: make the delegate_facts feature optional
Guillaume Abrioux [Tue, 31 Oct 2017 13:39:29 +0000 (14:39 +0100)]
common: make the delegate_facts feature optional

Since we encountered issues with this on Ansible 2.2, this commit provides
the ability to enable or disable it depending on which Ansible version we
are running.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 4596fbaac1322a4c670026bc018e3b5b061b072b)

7 years agoplaybook: improve facts gathering
Guillaume Abrioux [Thu, 3 May 2018 16:41:16 +0000 (18:41 +0200)]
playbook: improve facts gathering

There is no need to gather facts in an O(N^2) way.
Only one node should gather the facts from the other nodes.

Fixes: #2553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 75733daf23d56008b246d8c05c5069303edd4197)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoMake sure the restart_mds_daemon script is created with the correct MDS name
Simone Caronni [Thu, 5 Apr 2018 14:14:23 +0000 (16:14 +0200)]
Make sure the restart_mds_daemon script is created with the correct MDS name

(cherry picked from commit b12bf62c36955d1e502552f8fddb03f44d7d6fc7)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocommon: enable Tools repo for rhcs clients
Sébastien Han [Tue, 8 May 2018 14:11:14 +0000 (07:11 -0700)]
common: enable Tools repo for rhcs clients

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574458
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 07ca91b5cb7e213545687b8a62c421ebf8dd741d)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocommon: copy iso files if rolling_update
Sébastien Han [Thu, 3 May 2018 14:54:53 +0000 (16:54 +0200)]
common: copy iso files if rolling_update

If we are in the middle of an update we want to get the new package
version being installed, so the task that copies the repo files should
not be skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572032
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4a186237e6fdc98f779c2e25985da4325b3b16cd)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoRevert "add .vscode/ to gitignore" v3.0.32
Sébastien Han [Fri, 27 Apr 2018 11:21:16 +0000 (13:21 +0200)]
Revert "add .vscode/ to gitignore"

This reverts commit ce67b05292e224d640738bf506ce873680ff9b97.

7 years agoadd .vscode/ to gitignore
Sébastien Han [Wed, 4 Apr 2018 14:23:54 +0000 (16:23 +0200)]
add .vscode/ to gitignore

I personally develop in VS Code and I have some preferences to save when
it comes to running the python unit tests, so ignoring this directory is
actually useful.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3c4319ca4b5355d69b2925e916420f86d29ee524)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoshrink-osd: ability to shrink NVMe drives
Sébastien Han [Fri, 20 Apr 2018 09:13:51 +0000 (11:13 +0200)]
shrink-osd: ability to shrink NVMe drives

Now, if the service name contains nvme, we know we need to remove the last
2 characters instead of 1.

If nvme, then osd_to_kill_disks is nvme0n1 and we need nvme0.
If ssd or hdd, then osd_to_kill_disks is sda1 and we need sda.
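The rule above can be sketched as follows (a simplification that, like the commit, assumes single-digit partition and namespace numbers):

```python
def parent_disk(dev):
    """Strip the partition suffix from an OSD device name.

    nvme0n1 -> nvme0  (drop the last 2 characters)
    sda1    -> sda    (drop the last character)
    """
    return dev[:-2] if "nvme" in dev else dev[:-1]

print(parent_disk("nvme0n1"), parent_disk("sda1"))  # → nvme0 sda
```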

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1561456
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 66c1ea8cd561fce6cfe5cdd1ecaa13411c824e3a)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agotox: use container latest tag for upgrades
Sébastien Han [Thu, 5 Apr 2018 08:28:51 +0000 (10:28 +0200)]
tox: use container latest tag for upgrades

Currently the tag-build-master-luminous-ubuntu-16.04 tag is not used anymore.
Also, 'latest' now points to CentOS, so we need to make that switch here
too.

We now have latest tags for each stable release, so let's use them and
point tox at them to deploy the right version.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 14eff6b571eb760e8afcdfefc063f1af06342809)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-defaults: fix ceph_uid fact on container deployments
Randy J. Martinez [Thu, 29 Mar 2018 04:17:02 +0000 (23:17 -0500)]
ceph-defaults: fix ceph_uid fact on container deployments

Red Hat is now using the tags [3, latest] for the image rhceph/rhceph-3-rhel7.
Because of this, the ceph_uid conditional passes for Debian
when 'ceph_docker_image_tag: latest' on RH deployments.
I've added an additional task to check for the rhceph image specifically,
and also updated the RH family task for the ceph/daemon [centos|fedora] tags.
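The disambiguation logic can be sketched like this (the uid values 167 for Red Hat based images and 64045 for Debian based ones follow ceph-ansible's conventions; the function itself is illustrative, not the actual task):

```python
def ceph_uid(image, tag):
    """Pick the ceph uid for a container image.

    rhceph images are Red Hat based regardless of tag, so check the image
    name first; only then fall back to inspecting the tag itself.
    """
    if "rhceph" in image or any(t in tag for t in ("centos", "fedora")):
        return 167   # Red Hat family
    return 64045     # Debian/Ubuntu family

print(ceph_uid("rhceph/rhceph-3-rhel7", "latest"))  # → 167
```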

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
(cherry picked from commit 127a643fd0ce4d66a5243b789ab0905e54e9d960)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agorhcs: re-add apt-pining
Sébastien Han [Tue, 17 Apr 2018 13:59:52 +0000 (15:59 +0200)]
rhcs: re-add apt-pining

When installing rhcs on Debian systems, the Red Hat repos must have the
highest priority so we avoid package conflicts and install the rhcs
version.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1565850
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a98885a71ec63ff129d7001301a0323bfaadad8a)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefaults: check only 1 time if there is a running cluster
Guillaume Abrioux [Mon, 9 Apr 2018 16:07:31 +0000 (18:07 +0200)]
defaults: check only 1 time if there is a running cluster

There is no need to check for a running cluster n*nodes times in
`ceph-defaults`, so let's add `run_once: true` to save some resources
and time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 899b0eb4514a9b1e6929dd5abf415195085c4e1d)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agosetup cephx keys when not nfs_obj_gw
Patrick Donnelly [Sat, 10 Mar 2018 19:27:10 +0000 (11:27 -0800)]
setup cephx keys when not nfs_obj_gw

Copy the admin key when configured as nfs_file_gw (but not nfs_obj_gw). Also,
copy/set up RGW related directories only when configured as nfs_obj_gw.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 7f91547304349199bf10a636b4e10ccaf20a4212)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years agocommon: add tools repo for iscsi gw
Sébastien Han [Thu, 12 Apr 2018 10:15:35 +0000 (12:15 +0200)]
common: add tools repo for iscsi gw

To install iscsi gw packages we need to enable the tools repo.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1547849
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 37117071ebb7ab3cf68b607b6760077a2b46a00d)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agonfs: ensure nfs-server server is stopped v3.0.31
Ali Maredia [Mon, 2 Apr 2018 17:47:31 +0000 (13:47 -0400)]
nfs: ensure nfs-server server is stopped

NFS-Ganesha cannot start if the nfs-server service
is running. This commit stops nfs-server in case it
is running on a (debian, redhat, suse) node before
the nfs-ganesha service starts up.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508506

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 01c58695fc344d65876b3acaea4f915f896401ac)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agomds: to support copy_admin_keyring
vasishta p shastry [Tue, 10 Apr 2018 12:39:43 +0000 (18:09 +0530)]
mds: to support copy_admin_keyring

(cherry picked from commit db3a5ce6d917e399236163f7de097f1b40a9a26c)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoFixed a typo (extra space)
vasishta p shastry [Tue, 10 Apr 2018 13:37:35 +0000 (19:07 +0530)]
Fixed a typo (extra space)

(cherry picked from commit 020e66c1b4374956a4bd8882d729eed65a3e3f90)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoosd: to support copy_admin_key
vasishta p shastry [Tue, 10 Apr 2018 13:21:50 +0000 (18:51 +0530)]
osd: to support copy_admin_key

(cherry picked from commit e1a1f81b6fdab41ac051cbf5f29eb101df3b50da)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agonfs: to support copy_admin_key - containerized
vasishta p shastry [Tue, 10 Apr 2018 12:37:11 +0000 (18:07 +0530)]
nfs: to support copy_admin_key - containerized

(cherry picked from commit 6b59416f7596d7c62c46b8a607f9a1eb9988689e)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodefaults: fix backward compatibility
Guillaume Abrioux [Mon, 9 Apr 2018 11:02:44 +0000 (13:02 +0200)]
defaults: fix backward compatibility

Backward compatibility with `ceph_mon_docker_interface` and
`ceph_mon_docker_subnet` was not working since there was no lookup on
`monitor_interface` and `public_network`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 66c4118dcd0c8e7a7081bce5c8d6ba7752b959fd)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocommon: upgrade/install ceph-test RPM first v3.0.30
Ken Dreyer [Thu, 5 Apr 2018 19:40:15 +0000 (13:40 -0600)]
common: upgrade/install ceph-test RPM first

Prior to this change, if a user had ceph-test-12.2.1 installed, and
upgraded to ceph v12.2.3 or newer, the RPM upgrade process would
fail.

The problem is that the ceph-test RPM did not depend on an exact version
of ceph-common until v12.2.3.

In Ceph v12.2.3, ceph-{osdomap,kvstore,monstore}-tool binaries moved
from ceph-test into ceph-base. When ceph-test is not yet up-to-date, Yum
encounters package conflicts between the older ceph-test and newer
ceph-base.

When all users have upgraded beyond Ceph < 12.2.3, this is no longer
relevant.

(cherry picked from commit 3752cc6f38dbf476845e975e6448225c0e103ad6)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoDeploying without managed monitors failed v3.0.29
Attila Fazekas [Wed, 4 Apr 2018 13:30:55 +0000 (15:30 +0200)]
Deploying without managed monitors failed

TripleO deployment failed when the monitors are not managed
by TripleO itself, with:
    FAILED! => {"msg": "list object has no element 0"}

The failing play item was introduced by
 f46217b69ae18317cb0c1cc3e391a0bca5767eb6 .

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1552327

Signed-off-by: Attila Fazekas <afazekas@redhat.com>
(cherry picked from commit ecd3563c2128553d4145a2f9c940ff31458c33b4)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-iscsi: fix certificates generation and distribution
Sébastien Han [Tue, 3 Apr 2018 13:20:06 +0000 (15:20 +0200)]
ceph-iscsi: fix certificates generation and distribution

Prior to this patch, the certificates were being generated on a single
node only (because of `run_once: true`). Thus certificates were not
distributed to all the gateway nodes.

This would require a second ansible run to work. This patch fixes the
creation and the keys' distribution on all the nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1540845
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f3caee84605e17f1fdfa4add634f0bf2c2cd510e)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agodo not delegate facts on client nodes
Guillaume Abrioux [Wed, 21 Mar 2018 18:01:51 +0000 (19:01 +0100)]
do not delegate facts on client nodes

This commit is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1550977

We iterate over all nodes on each node and we delegate the facts gathering.
This is high memory consuming when having a large number of nodes in the
inventory.
That way of gathering is not necessary for client nodes, so we can simply
gather local facts for these nodes.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5b73be254d249a23ac2eb2f86c4412ef296352a9)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-mds: delete duplicate tasks which cause multimds container deployments to fail.
Randy J. Martinez [Thu, 29 Mar 2018 00:15:19 +0000 (19:15 -0500)]
ceph-mds: delete duplicate tasks which cause multimds container deployments to fail.

This update resolves the error ['cephfs' is undefined] in multimds container deployments.
See roles/ceph-mon/tasks/create_mds_filesystems.yml: the same last two tasks are present there, and actually need to happen in that role since "{{ cephfs }}" gets defined in
roles/ceph-mon/defaults/main.yml, not in roles/ceph-mds/defaults/main.yml.

Signed-off-by: Randy J. Martinez <ramartin@redhat.com>
(cherry picked from commit ca572a11f1eb7ded5583c8d8b810a42db61cd98f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agocleanup osd.conf.j2 in ceph-osd
Ning Yao [Fri, 23 Mar 2018 15:48:16 +0000 (23:48 +0800)]
cleanup osd.conf.j2 in ceph-osd

osd crush location is set by ceph_crush in the library,
osd.conf.j2 is not used any more.

Signed-off-by: Ning Yao <yaoning@unitedstack.com>
(cherry picked from commit 691ddf534989b4d27dc41997630b3307436835ea)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-osd note that some scenarios use ceph-disk vs. ceph-volume
Alfredo Deza [Wed, 28 Mar 2018 20:40:04 +0000 (16:40 -0400)]
ceph-osd note that some scenarios use ceph-disk vs. ceph-volume

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 3fcf966803e35d7ba30e7c1b0ba78db94c664594)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years agoceph-defaults: set is_atomic variable
Andrew Schoen [Tue, 20 Mar 2018 19:13:28 +0000 (14:13 -0500)]
ceph-defaults: set is_atomic variable

This variable is needed for containerized clusters and is required by
the ceph-docker-common role. Typically the is_atomic variable is set in
site-docker.yml.sample, though, so if ceph-docker-common is used outside
of that playbook it needs to be set in another way. Moving the creation of
the variable inside this role means playbooks don't need to worry
about setting it.

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1558252

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 6cffbd5409353fc1ce05b3a4a6246d6ef244e731)

7 years agoFix config_template to consistently order sections
Andy McCrae [Fri, 16 Mar 2018 15:24:53 +0000 (15:24 +0000)]
Fix config_template to consistently order sections

In ec042219e64a321fa67fce0384af76eeb238c645 we added OrderedDict and
sorted to be able to preserve order for config_template k,v pairs inside
a section.

This patch adds a similar ordering for the sections themselves, which
could otherwise still change order and initiate handler restarts.

OrderedDict isn't needed because we use .items() to return a list that
can then be sorted().
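The effect of the fix can be sketched as follows: rendering sections and keys in sorted order makes the output deterministic, so two logically identical configs always render the same file and no spurious handler restart is triggered (the renderer below is a toy stand-in for config_template, not its actual code):

```python
def render(config):
    """Render {section: {key: value}} as INI text, sorting sections and
    the keys within each section for a stable, order-independent result."""
    lines = []
    for section, kv in sorted(config.items()):
        lines.append(f"[{section}]")
        lines.extend(f"{k} = {v}" for k, v in sorted(kv.items()))
    return "\n".join(lines)

# Same content, different insertion order: the rendered text is identical.
a = {"osd": {"osd memory target": "4294967296"}, "global": {"fsid": "abc"}}
b = {"global": {"fsid": "abc"}, "osd": {"osd memory target": "4294967296"}}
print(render(a) == render(b))  # → True
```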

(cherry picked from commit fe4ba9d1353abb49775d5541060a55919978f45f)

7 years agocommon: run updatedb task on debian systems only v3.0.28
Sébastien Han [Thu, 1 Mar 2018 16:33:33 +0000 (17:33 +0100)]
common: run updatedb task on debian systems only

The command doesn't exist on Red Hat systems so it's better to skip it
instead of ignoring the error.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit cb0f598965d0619dd4f44a8f991af539b67c6f38)

7 years agorgw: add cluster name option to the handler
Sébastien Han [Thu, 1 Mar 2018 15:50:06 +0000 (16:50 +0100)]
rgw: add cluster name option to the handler

If the cluster name is different than 'ceph', the command will fail so
we need to pass the cluster name.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7f19df81964c669f649d9f6eb5104022b421eea3)

7 years agoci: add copy_admin_key test to container scenario
Sébastien Han [Thu, 1 Mar 2018 15:47:37 +0000 (16:47 +0100)]
ci: add copy_admin_key test to container scenario

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fd94840a6ef130c6e142e9b5c5138bb11c621d37)

7 years agorgw: ability to copy ceph admin key on containerized
Sébastien Han [Thu, 1 Mar 2018 15:47:22 +0000 (16:47 +0100)]
rgw: ability to copy ceph admin key on containerized

If we now set copy_admin_key while running a containerized scenario, the
ceph admin key will be copied to the node.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9c85280602142fa1fb60c6f15c6d0c9e8c62d401)

7 years agorgw: run the handler on a mon host
Sébastien Han [Thu, 1 Mar 2018 15:46:01 +0000 (16:46 +0100)]
rgw: run the handler on a mon host

In case the admin key wasn't copied over to the node, this command would
fail. So it's safer to run it from a monitor directly.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 67f46d8ec362b7b8aacb91e009e528b5e62d48ac)