tests: add support for the 'ooo-collocation' scenario when testing against ceph dev
The group_vars/all file is not available in the 'ooo-collocation' scenario,
which makes `dev_setup.yml` fail because this path is hardcoded.
The idea here is to check whether the pattern 'ooo-collocation' is present in
the `change_dir` variable so we can set this path properly according to the
scenario being run.
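A minimal sketch of that check as an Ansible task; the task name and the fallback path are illustrative assumptions, not the repo's exact code:
```yaml
- name: set the path to group_vars according to the scenario being run
  set_fact:
    # '/usr/share/ceph-ansible/group_vars/all' is a hypothetical fallback;
    # only the 'ooo-collocation' check comes from the description above
    group_vars_path: "{{ '/usr/share/ceph-ansible/group_vars/all' if 'ooo-collocation' in change_dir else change_dir + '/group_vars/all' }}"
```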
tests: support update scenarios in test_rbd_mirror_is_up()
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes `ceph_stable_release` is still set to the value of the
original ceph release, so it never enters the right branch of the
condition and fails.
Since no latest-bis-jewel image exists, it uses latest-bis, which points to
ceph mimic. In our testing, using it for idempotency/handlers tests
means upgrading from jewel to mimic, which is not what we want to do.
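One possible way to make such a test release-aware is to ask the running cluster instead of trusting `ceph_stable_release`; this is a hedged sketch using testinfra's `host.check_output()`, not necessarily the repo's actual fix:
```python
def _get_running_release(host, docker_exec_cmd=""):
    # query the cluster itself, because `ceph_stable_release` still holds
    # the pre-update release during update scenarios
    cmd = "{} ceph --version".format(docker_exec_cmd).strip()
    # e.g. "ceph version 13.2.0 (79a1...) mimic (stable)" -> "mimic"
    return host.check_output(cmd).split()[-2]
```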
client: do not rely on copy_admin_key to import keys
Relying on `copy_admin_key` to import created keys on client nodes forces
us to copy the admin key on those nodes, which is not something we may
want.
We should use the fact `condition_copy_admin_key`, which will be set to
`True` when the delegated node is a mon, which means we can import keys
without having to deal with the admin keyring.
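A hedged illustration of the pattern; only the `condition_copy_admin_key` fact comes from the description above, the task body is an assumption:
```yaml
- name: import created cephx key(s)
  command: "ceph --cluster {{ cluster }} auth import -i /etc/ceph/{{ cluster }}.{{ item.name }}.keyring"
  with_items: "{{ keys }}"
  # the delegated node is a mon, so the keys can be imported there
  # without copying the admin keyring to the client nodes
  delegate_to: "{{ groups[mon_group_name][0] }}"
  when: condition_copy_admin_key | bool
```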
Andy McCrae [Thu, 12 Jul 2018 11:24:15 +0000 (12:24 +0100)]
Sync config_template with upstream for Ansible 2.6
The original_basename option in the copy module changed to
_original_basename in Ansible 2.6+; this PR resyncs the config_template
module so it works with both Ansible 2.6+ and earlier versions.
Additionally, this PR removes the _v1_config_template.py file, since
ceph-ansible no longer supports versions of Ansible before version 2,
and so we shouldn't continue to carry that code.
Closes: #2843
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
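A hedged sketch of how a module can accept both spellings (illustrative, not the actual config_template diff):
```python
# Ansible 2.6+ passes _original_basename, older releases pass
# original_basename; fall back from one to the other
original_basename = (module.params.get('_original_basename')
                     or module.params.get('original_basename'))
```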
We must check the generated fact `_disabled_ceph_mgr_modules` when
enabling a disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.
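A hedged sketch of the intended comparison; the task shape is an assumption, `ceph_mgr_modules` being the role's usual variable name:
```yaml
- name: enable ceph mgr module(s)
  command: "ceph --cluster {{ cluster }} mgr module enable {{ item }}"
  with_items: "{{ ceph_mgr_modules }}"
  # compare against the generated fact so the task isn't skipped
  # when the module really is disabled
  when: item in _disabled_ceph_mgr_modules
```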
On containerized deployments, if a mon is stopped, the socket is not
purged, which can cause failures when a cluster is redeployed after the
purge playbook has been run.
Typical error:
```
fatal: [osd0]: FAILED! => {}
MSG:
'dict object' has no attribute 'osd_pool_default_pg_num'
```
The fact is not set because of this earlier failure:
tests: fix `_get_osd_id_from_host()` in TestOSDs()
We must initialize the `children` variable in `_get_osd_id_from_host()`;
otherwise, if for any reason the deployment has failed and resulted in
an OSD host with no OSD registered, we won't enter the condition,
`children` is never set, and the function tries to return something
undefined.
Typical error:
```
E UnboundLocalError: local variable 'children' referenced before assignment
```
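A hedged sketch of the fix; the surrounding structure of `_get_osd_id_from_host()` is an assumption reconstructed from the error above:
```python
def _get_osd_id_from_host(self, node, osd_tree):
    # initialize to an empty list so the function always returns
    # something defined, even when the host has no OSD registered
    children = []
    for entry in osd_tree["nodes"]:
        if entry["name"] == node["vars"]["inventory_hostname"] and entry["type"] == "host":
            children = entry["children"]
    return children
```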
Remove zone from zonegroup and update period before deleting the zone to avoid inconsistent period information across other zones.
When you delete a zone without removing it from the zonegroup first, the
period update fails, since that command needs to load the zone and zonegroup
to be able to update the master. The period update fails with an error like
this:
```
radosgw-admin period update --commit
-1 Cannot find zone id= (name=), switching to local zonegroup configuration
-1 Cannot find zone id= (name=)
```
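A hedged sketch of the ordering described above, as a single Ansible task (the `rgw_zone`/`rgw_zonegroup` variable names are assumptions):
```yaml
- name: remove the zone from the zonegroup and commit the period before deleting the zone
  command: "{{ item }}"
  with_items:
    - "radosgw-admin zonegroup remove --rgw-zonegroup={{ rgw_zonegroup }} --rgw-zone={{ rgw_zone }}"
    - "radosgw-admin period update --commit"
    - "radosgw-admin zone delete --rgw-zone={{ rgw_zone }}"
```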
This part has changed in mimic and became:
```
"servicemap": {
"epoch": 2,
"modified": "2018-07-04 09:54:36.164786",
"services": {
"rbd-mirror": {
"daemons": {
"summary": "",
"14151": {
"start_epoch": 2,
"start_stamp": "2018-07-04 09:54:35.541272",
"gid": 14151,
"addr": "192.168.1.80:0/240942528",
"metadata": {
"arch": "x86_64",
"ceph_release": "mimic",
"ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
"ceph_version_short": "13.2.0",
"cpu": "Intel(R) Xeon(R) CPU X5650 @ 2.67GHz",
"distro": "centos",
"distro_description": "CentOS Linux 7 (Core)",
"distro_version": "7",
"hostname": "ceph-rbd-mirror0",
"id": "ceph-rbd-mirror0",
"instance_id": "14151",
"kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
"kernel_version": "3.10.0-862.2.3.el7.x86_64",
"mem_swap_kb": "1572860",
"mem_total_kb": "1015548",
"os": "Linux"
}
}
}
}
}
}
```
This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward
compatibility with `luminous`.
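A hedged sketch of a helper coping with both formats; `_get_rbd_mirror_daemon_ids` is a hypothetical name, not the test's actual code:
```python
def _get_rbd_mirror_daemon_ids(servicemap):
    # mimic nests the daemon entries next to a "summary" key inside
    # "daemons"; luminous has no "summary" key at all, so filtering
    # it out works for both releases
    daemons = servicemap["services"]["rbd-mirror"]["daemons"]
    return [daemon_id for daemon_id in daemons if daemon_id != "summary"]
```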
Sébastien Han [Thu, 5 Jul 2018 12:10:33 +0000 (14:10 +0200)]
ceph-config: do not log cluster log on container
The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:
```
2018-06-19 13:44:01.542990 7ff75b024700 1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory
```
So we now tell the mon not to log the cluster log on the filesystem.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 29 Jun 2018 10:10:16 +0000 (12:10 +0200)]
ceph-client: do not kill the dummy container
The container runs for 300 seconds, then dies and removes itself thanks to
the '--rm' option, so there is no point in removing it. Also, removing it
was causing failures under some circumstances.
Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 28 Jun 2018 07:54:24 +0000 (09:54 +0200)]
ceph-osd: trigger osd container restart on script change
The script ceph-osd-run.sh holds the config options used to start the
container; if one of these options is modified, we must restart the
container. This was not the case before because the 'notify' flag
wasn't present.
Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
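A hedged sketch of the pattern; the handler name and destination variable follow the role's usual conventions but are treated as assumptions here:
```yaml
- name: generate ceph-osd-run.sh
  template:
    src: ceph-osd-run.sh.j2
    dest: "{{ ceph_osd_docker_run_script_path }}/ceph-osd-run.sh"
    mode: "0744"
  # restart the OSD container whenever the generated script changes
  notify: restart ceph osds
```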
Sébastien Han [Thu, 28 Jun 2018 07:53:03 +0000 (09:53 +0200)]
systemd: remove changed_when: false
When using a module there is no need to apply this Ansible option. The
module handles idempotency on its own, so it decides whether or not the
task has changed during the execution.
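For illustration, a hedged sketch (the service name is an assumption):
```yaml
- name: make sure the ceph-mgr service is started and enabled
  systemd:
    name: "ceph-mgr@{{ ansible_hostname }}"
    state: started
    enabled: yes
  # no `changed_when: false` here: the systemd module reports
  # changed/unchanged correctly on its own
```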
George Shuklin [Mon, 25 Jun 2018 13:12:56 +0000 (16:12 +0300)]
Add a ceph_keyring_permissions variable to control permissions for
keyring files in /etc/ceph. The default value is the same as it was (0600),
but this variable allows the user to override it (e.g. set it to 0640).
Signed-off-by: George Shuklin <george.shuklin@gmail.com>
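For example, in group_vars (a usage sketch):
```yaml
# let the 'ceph' group read keyrings in /etc/ceph
ceph_keyring_permissions: '0640'
```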
Andy McCrae [Wed, 27 Jun 2018 13:05:44 +0000 (14:05 +0100)]
Fix package state for upgrades on SuSE/RHEL
During 226f80c22bf61a7e8f00f4cca5f35eda67280250 only Debian package
installs had the correct state set to ensure packages were upgraded when
the "upgrade_ceph_packages" var was set to true.
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
Sébastien Han [Wed, 27 Jun 2018 09:23:00 +0000 (11:23 +0200)]
mon: honour mon_docker_net_host option
--net=host was hardcoded in the startup line, so even though
mon_docker_net_host was set to False the net option would always be
activated.
mon_docker_net_host is set to True by default, so this commit does not
change the default behaviour.
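A hedged sketch of the template fix, as a fragment of the container startup script (the surrounding docker options are illustrative):
```
/usr/bin/docker run --rm \
  {% if mon_docker_net_host | bool -%}
  --net=host \
  {% endif -%}
  --name ceph-mon-{{ ansible_hostname }}
```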
tests: factorize docker tests using docker_exec_cmd logic
Avoid duplicating tests unnecessarily just because of the docker exec syntax.
Using the same logic as in the playbook with `docker_exec_cmd` allows us
to execute the same test in both containerized and non-containerized
environments.
The idea is to set a variable `docker_exec_cmd` to the
'docker exec <container-name>' string when containerized and
to '' when not containerized.
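A hedged sketch of the pattern in a testinfra test; the fixture layout is an assumption:
```python
import json

def test_cluster_is_healthy(node, host):
    # docker_exec_cmd is "docker exec ceph-mon-<hostname>" on containerized
    # scenarios and "" otherwise, so one command line covers both
    cmd = "{} ceph --cluster ceph -s --format json".format(
        node["docker_exec_cmd"]).strip()
    status = json.loads(host.check_output(cmd))
    assert status["health"]["status"] == "HEALTH_OK"
```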
These tests are skipped on bluestore OSD scenarios.
They were going to fail anyway, since they are run on mon nodes while
`devices` is defined in the inventory for each OSD node. It means
`num_devices * num_osd_hosts` returns `0`, so
the test expects to have 0 OSDs up.
The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks its own OSDs: if a node has 2
devices defined in the `devices` variable, it means we are checking for 2
OSDs to be up on that node; if each node has all its OSDs up, we can say
all OSDs are up.
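A hedged sketch of the per-node check; the JSON shape follows `ceph osd tree --format json`, the fixture layout is an assumption:
```python
import json

def test_all_osds_are_up(node, host):
    # each OSD node expects as many OSDs up as it has devices defined
    expected = len(node["vars"]["devices"])
    tree = json.loads(host.check_output(
        "sudo ceph --cluster ceph osd tree --format json"))
    nodes_by_id = {n["id"]: n for n in tree["nodes"]}
    host_entry = next(n for n in tree["nodes"]
                      if n["type"] == "host"
                      and n["name"] == node["vars"]["inventory_hostname"])
    up = [i for i in host_entry["children"] if nodes_by_id[i]["status"] == "up"]
    assert len(up) == expected
```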
tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined
Since the ooo_collocation scenario is supposed to be the same scenario as
the one tested by OSP, and it does not pass `rgw_create_pools`, the test
`test_docker_rgw_tuning_pools_are_set` fails:
```
> pools = node["vars"]["rgw_create_pools"]
E KeyError: 'rgw_create_pools'
```
Skipping this test when `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.
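A hedged sketch of the guard; the test body is elided/assumed:
```python
import pytest

def test_docker_rgw_tuning_pools_are_set(node, host):
    if "rgw_create_pools" not in node["vars"]:
        pytest.skip("rgw_create_pools not defined, nothing to check")
    pools = node["vars"]["rgw_create_pools"]
    # ... the tuning assertions only run when pools are defined
```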
tests: fix broken test when collocated daemons scenarios
At the moment, a lot of tests are skipped when daemons are collocated.
Our tests consider that a node belongs to only one group, while in
certain scenarios it can belong to multiple groups.
Also pinning to pytest 3.6.1 so we can use `request.node.iter_markers()`
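A hedged fragment of what `request.node.iter_markers()` enables, assuming it runs inside a test that receives pytest's `request` fixture; the group marker names are assumptions:
```python
# gather every group marker applied to the test node (pytest >= 3.6),
# instead of assuming the node carries exactly one group marker
marker_names = {marker.name for marker in request.node.iter_markers()}
if {"mons", "osds"} <= marker_names:
    # the node is collocated: run the checks relevant to both groups
    ...
```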
Christian Zunker [Wed, 20 Jun 2018 05:01:06 +0000 (07:01 +0200)]
reset failed count of ceph-mgr
Depending on your setup, ceph-mgr might get restarted multiple times.
When this is done too fast, systemd will prevent further restarts because of
the configured limits in the ceph-mgr systemd unit file.
Resetting the failure count prevents this problem. The reset is done before
the restart, so in case of a real problem during the restart it still fails.
Fixes: #2768
Signed-off-by: Christian Zunker <christian.zunker@codecentric.cloud>
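A hedged sketch of the reset; the task shape is an assumption, `systemctl reset-failed` being the standard way to clear the counter:
```yaml
- name: reset failed count for ceph-mgr
  command: "systemctl reset-failed ceph-mgr@{{ ansible_hostname }}"
  changed_when: false

# placed before the restart, so a real failure during the
# restart still surfaces
- name: restart ceph-mgr
  systemd:
    name: "ceph-mgr@{{ ansible_hostname }}"
    state: restarted
```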
This may be helpful to know why a test has been skipped.
From the pytest docs:
```
-r chars show extra test summary info as specified by chars
(f)ailed, (E)error, (s)skipped, (x)failed, (X)passed,
(p)passed, (P)passed with output, (a)all except pP.
Warnings are displayed at all times except when
--disable-warnings is set
```
client: try to kill dummy container only on first client node
The 'dummy' container is created only on the first client node; it means we
must seek to destroy this container only on that node, otherwise it
can cause failures like the following:
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}
Sébastien Han [Mon, 11 Jun 2018 12:51:58 +0000 (14:51 +0200)]
common: ability to enable/disable fw configuration
Prior to this patch, if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operator's consent.
Now you can enable or disable the firewall configuration by setting
configure_firewall to either true or false.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
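For example, in group_vars (a usage sketch):
```yaml
# opt out of the firewalld configuration entirely
configure_firewall: false
```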
tests: increase memory to 1024Mb for centos7_cluster scenario
We see more and more failures like `fatal: [mon0]: UNREACHABLE! => {}` in
the `centos7_cluster` scenario. Since we have 30GB of RAM on the hypervisors,
we can give the monitors a bit more RAM. By the way, nodes in the containerized
cluster testing scenario already have 1024MB of memory allocated.
validate: be more explicit with error msg when notario isn't installed
This error message may be confusing and needs to be more explicit about
where you have to install notario: indeed, people may think this library
must be installed on the configured nodes, while it must be installed on the
node where you run the playbook.
client: keyrings aren't created when single client node
Combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause a bug when the only
node being run is not the first in the group.
In a deployment with a single client node this might cause an issue, because
sometimes the keyring won't be created since the task could be definitively
skipped.
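A hedged sketch of the problematic combination (the task body is assumed):
```yaml
# `run_once: true` executes the task on the first host of the play,
# while the `when` clause only matches the first host of the client
# group; if the play only includes a later client node, the task is
# skipped everywhere and the keyrings are never created
- name: create cephx key(s)
  command: "ceph auth get-or-create {{ item.name }}"
  with_items: "{{ keys }}"
  run_once: true
  when: inventory_hostname == groups.get(client_group_name) | first
```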
Sébastien Han [Wed, 6 Jun 2018 06:41:46 +0000 (14:41 +0800)]
contrib: fix generate group_vars samples
For the ceph-iscsi-gw and ceph-rbd-mirror roles, the group names are (by
default) different from the role names, so we have to change the
script to generate the correct names.
Sébastien Han [Wed, 6 Jun 2018 04:07:33 +0000 (12:07 +0800)]
ceph-iscsi: rename group iscsi_gws
Let's try to avoid using dashes, as testinfra needs to be able to read
the groups.
Typically, with iscsi-gws we can't add a marker for these iscsi nodes;
using an underscore fixes the issue.
Sébastien Han [Fri, 23 Mar 2018 03:24:56 +0000 (11:24 +0800)]
ceph-iscsi: support for containerized deployment
We now have the ability to deploy a containerized version of ceph-iscsi.
The result is similar to the non-containerized version, you simply have
3 containers running for the following services:
* rbd-target-api
* rbd-target-gw
* tcmu-runner
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1508144
Signed-off-by: Sébastien Han <seb@redhat.com>
Andrew Schoen [Wed, 6 Jun 2018 19:59:47 +0000 (14:59 -0500)]
tests: increase ssh timeout and retries in ansible.cfg
We see quite a few failures in the CI related to testing nodes losing
their ssh connection. This modification allows Ansible to retry more times
and wait longer before timing out. This seems to mostly affect testing
scenarios that use a large number of testing nodes; the centos7_cluster
scenario specifically has 12 nodes and suffered from these failures
often.
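A hedged sketch of what such an ansible.cfg change can look like; the exact values are assumptions, not the commit's values:
```ini
[defaults]
# wait longer for connections before giving up
timeout = 60

[ssh_connection]
# retry the ssh connection a few times before failing the host
retries = 5
```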
A potential error if someone doesn't pass the mode in the `keys` dict for
client nodes:
```
fatal: [client2]: FAILED! => {}
MSG:
The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'
The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: get client cephx keys
^ here
exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'
```
Adding a default value avoids the deployment failing because of this.
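A hedged sketch using Jinja's `default` filter; the task body is illustrative, and the fallback to `ceph_keyring_permissions` is an assumption tying into the variable introduced earlier:
```yaml
- name: get client cephx keys
  copy:
    dest: "/etc/ceph/{{ cluster }}.{{ item.name }}.keyring"
    content: "{{ item.key }}"
    # a missing 'mode' entry no longer raises AnsibleUndefinedVariable
    mode: "{{ item.mode | default(ceph_keyring_permissions) }}"
  with_items: "{{ keys }}"
```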
Functional tests are broken when testing against the 'dev' release of ceph.
Adding a dummy value here makes it possible to run the ceph-ansible CI
against the dev ceph release.
Typical error:
```
> if request.node.get_marker("from_luminous") and ceph_release_num[ceph_stable_release] < ceph_release_num['luminous']:
E KeyError: 'dev'
```
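A hedged sketch of the dummy entry; the surrounding values follow Ceph's release numbering, and the value picked for 'dev' is an assumption:
```python
ceph_release_num = {
    'jewel': 10,
    'kraken': 11,
    'luminous': 12,
    'mimic': 13,
    # dummy value so `ceph_release_num[ceph_stable_release]` doesn't
    # raise a KeyError when testing against a dev release
    'dev': 99,
}
```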
Andrew Schoen [Mon, 4 Jun 2018 15:36:32 +0000 (10:36 -0500)]
ceph-common: move firewall checks after package installation
We need to do this because on dev or rhcs installs `ceph_stable_release`
is not mandatory, and the firewall check tasks contain a task that is
conditional based on the installed version of ceph. If we perform those
checks after package install, they will not fail on dev or rhcs
installs.
client: use dummy created container when there is no mon in inventory
The `docker_exec_cmd` fact set in the client role when there is no monitor
in the inventory is wrong: `ceph-client-{{ hostname }}` is never created, so
it will fail anyway.
41b4632 introduced a change in the functional tests.
Since the admin keyring isn't copied to rgw nodes anymore in the tests, let's
use the rgw keyring to run them.
Sébastien Han [Tue, 5 Jun 2018 03:56:55 +0000 (11:56 +0800)]
test: do not always copy admin key
The admin key must be copied to the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where they're being executed.
Now we only copy the key when running the shrink-osd scenario.
Refactor of 8704144e3157aa253fb7563fe701d9d434bf2f3e.
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated to a monitor node so we don't have to care
whether the admin keyring is present on the rgw node.
By the way, only one task is needed to create the pools; we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.
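A hedged sketch of the single delegated task; the `rgw_create_pools` dict shape is an assumption:
```yaml
- name: create rgw pools
  command: >
    {{ hostvars[groups[mon_group_name][0]]['docker_exec_cmd'] | default('') }}
    ceph --cluster {{ cluster }} osd pool create {{ item.key }} {{ item.value.pg_num }}
  with_dict: "{{ rgw_create_pools }}"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
```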
Sébastien Han [Mon, 4 Jun 2018 02:40:14 +0000 (10:40 +0800)]
ceph-common: add firewall rules for ceph-mgr
Prior to this commit, the firewall tasks were not opening the ceph-mgr
ports. This would lead to an unclean configuration, since the ceph-mgr
daemons cannot connect to the OSDs.
This commit opens the right ports on the ceph-mgr nodes to talk with the
OSDs.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1526400
Signed-off-by: Sébastien Han <seb@redhat.com>
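A hedged sketch using the firewalld module; the port range is the usual ceph daemon range and is treated as an assumption here:
```yaml
- name: open ceph-mgr ports
  firewalld:
    port: "6800-7300/tcp"
    zone: public
    permanent: true
    immediate: true
    state: enabled
  when: mgr_group_name in group_names
```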
Erwan Velu [Fri, 1 Jun 2018 16:53:10 +0000 (18:53 +0200)]
ceph-defaults: Enable local epel repository
During the tests, the remote epel repository generates a lot of
errors, leading to broken jobs (issue #2666).
This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl pointing to our local http mirror.
That should speed up the build process and also avoid the random errors
we face.
This patch is part of a patch series that tries to remove all possible yum failures.
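A hedged sketch of those steps; the mirror URL is a placeholder, not the real one:
```yaml
- name: install epel-release
  package:
    name: epel-release
    state: present

- name: remove the metalink from the epel repo file
  ini_file:
    path: /etc/yum.repos.d/epel.repo
    section: epel
    option: metalink
    state: absent

- name: enforce a baseurl pointing to the local mirror
  ini_file:
    path: /etc/yum.repos.d/epel.repo
    section: epel
    option: baseurl
    value: "http://local-mirror.example.com/epel/7/x86_64/"
```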