git-server-git.apps.pok.os.sepia.ceph.com Git

lv-create: add an example logfile_path config option in lv_vars.yml

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 131796f2750f1209a019ae75a500e6f1a1ab37f8)
Signed-off-by: Sébastien Han <seb@redhat.com>

tests: adds a testing scenario for lv-create and lv-teardown

Using an explicitly named testing environment name allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work as it doesn't really anything from the ceph cluster tests.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 810cc47892e53701485c540ff51c00c860ea0a00)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-teardown: fail silently if lv_vars.yml is not found

This allows user to opt out of using lv_vars.yml and load configuration
from other sources.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit b0bfc173510ec7d5da715c0048e633a8fe3d2a4d)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-teardown: set become: true at the playbook level

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 8424858b40bafebe3569b33279e4d8d824e2276b)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-create: fail silenty if lv_vars.yml is not found

If a user decides to to use the lv_vars.yml file then it should fail
silenty so that configuration can be picked up from other places.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e43eec57bb44bf5f7a10da8548ca22a8772c2d92)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-create: set become: true at the playbook level

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit fde47be13cc753218153b3dbc0db5a4daa752b21)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-create: use the template module to write log file

The copy module will not expand the template and render the variables
included, so we must use template.

Creating a temp file and using it locally means that you must run the
playbook with sudo privledges, which I don't think we want to require.
This introduces a logfile_path variable that the user can use to control
where the logfile is written to, defaulting to the cwd.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 35301b35af4e71713edb944eb54654b587710527)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/vars/lv_vars.yaml: minor fixes

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 909b38da829485b2ec56b61bf2b2fa0df02b0ed4)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: use tempfile to create logfile

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit f65f3ea89fdba98057172e32e1a43ee6370c04d9)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: add lvm_volumes to suggested paste

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 65fdad072386698ab9b6f7962107b6000f1f8378)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: copy without using a template file

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 50a6d8141ced888f9c1ce5e90f9461e6a101d5bc)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: don't use action to copy

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 186c4e11c7832eeb676409171d401a0ba2864f2a)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks: standardize variable usage with a space after brackets

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 9d43806df9fe44496dd48c2b7d3bef2e59d92365)
Signed-off-by: Sébastien Han <seb@redhat.com>

vars/lv_vars.yaml: remove journal_device

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit e0293de3e72e11f3aeba2d84b24bba1f7839ab57)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks: playbooks for creating LVs for bucket indexes and journals

These playbooks create and tear down logical
volumes for OSD data on HDDs and for a bucket index and
journals on 1 NVMe device.

Users should follow the guidelines set in var/lv_vars.yaml

After the lv-create.yml playbook is run, output is
sent to /tmp/logfile.txt for copy and paste into
osds.yml

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 1f018d861267a2bde7d8f2179d47d673bfcfb13f)
Signed-off-by: Sébastien Han <seb@redhat.com>

Revert "osd: generate device list for osd_auto_discovery on rolling_update"

This reverts commit e84f11e99ef42057cd1c3fbfab41ef66cda27302.

This commit was giving a new failure later during the rolling_update
process. Basically, this was modifying the list of devices and started
impacting the ceph-osd itself. The modification to accomodate the
osd_auto_discovery parameter should happen outside of the ceph-osd.

Also we are trying to not play ceph-osd role during the rolling_update
process so we can speed up the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3149b2564fb89f2352820d83be02c09f658bdf60)

rolling_update: register container osd units

Before running the upgrade, let's call systemd to collect unit names
instead of relaying on the device list. This is more accurate and fix
the osd_auto_discovery scenario too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit dad10e8f3f67f1e0c6a14ef3e0b1f51f90d9d962)

contrib: fix generate group_vars samples

For ceph-iscsi-gw and ceph-rbd-mirror roles the group_name are named
differently (by default) than the role name so we have to change the
script to generate the correct name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602327
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 315ab08b1604e655eee4b493eb2c1171a67df506)

Use /var/lib/ceph/osd folder to filter osd mount point

In some case, use may mount a partition to /var/lib/ceph, and umount
it will be failure and no need to do so too.

Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>
(cherry picked from commit 85cc61a6d9f23cc98a817ea988c8b50e6c55698f)

stable 3.1 igw: add api setting support

Port the parts of this upstream commit:

commit 91bf53ee932a6748c464bea762f8fb6f07f11347
Author: Sébastien Han <seb@redhat.com>
Date: Fri Mar 23 11:24:56 2018 +0800

ceph-iscsi: support for containerize deployment

that allows configuration of
API settings in roles/ceph-iscsi-gw/templates/iscsi-gateway.cfg.j2
using the iscsi-gws.yml.

This fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1613963

Signed-off-by: Mike Christie <mchristi@redhat.com>

stable 3.1 igw: enable and start rbd-target-api

Backport
https://github.com/ceph/ceph-ansible/pull/2984
to stable 3.1.

From upstream commit:

commit 1164cdc002cccb9dc1c6f10fb6b4370eafda3c4b
Author: Guillaume Abrioux <gabrioux@redhat.com>
Date: Thu Aug 2 11:58:47 2018 +0200

iscsigw: install ceph-iscsi-cli package

installs the cli package but does not start and enable the
rbd-target-api daemon needed for gwcli to communicate with the igw
nodes. This just enables and starts it.

This fixes Red Hat BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1613963.

Signed-off-by: Mike Christie <mchristi@redhat.com>

group_vars: resync missing options

resync group_vars file with the defaults/main.yml files.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2dd75a1e6ed6d8b323ef647cbb5cf1d2a2f90154)

fail if fqdn deployment attempted

fqdn configuration possibility caused a lot of trouble, it's adding a
lot of complexity because of multiple cases and the relation between
ceph-ansible and ceph-container. Moreover, there is no benefit for such
a feature.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

config: ensure rgw section has the correct name

the ceph.conf.j2 always assumes the hostname used to register the
radosgw in the servicemap is equivalent to `{{ ansible_hostname }}`
which returns the shortname form.

We need to detect which form of the hostname was used in case of already
deployed cluster and update the ceph.conf accordingly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f422efb1d6b56ce56a7d39a21736a471e4ed357c)

mgr: backward compatibility for module management

Follow up on 3abc253fecc91f29c90e23ae95e1b83f8ffd3de6

The structure had even changed within `luminous` release.
It was first:

```
{
    "enabled_modules": [
        "balancer",
        "dashboard",
        "restful",
        "status"
    ],
    "disabled_modules": [
        "influx",
        "localpool",
        "prometheus",
        "selftest",
        "zabbix"
    ]
}
```
Then it changed for:

```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      "balancer",
      "dashboard",
      "influx",
      "localpool",
      "prometheus",
      "restful",
      "selftest",
      "zabbix"
  ]
}
```

and finally:
```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      {
          "name": "balancer",
          "can_run": true,
          "error_string": ""
      },
      {
          "name": "dashboard",
          "can_run": true,
          "error_string": ""
      }
  ]
}
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 36942af6983d60666f3f8a1a06b352a440a6c0da)

tests: resync iscsigw group name with master

let's align the name of that group in stable-3.1 with master branch.

Not having the same group name on different branches is confusing and
make some nightlies job failing in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: fix a typo in testinfra for iscsigws and jewel scenario

group name for iscsi-gw nodes in testing is `iscsi-gws`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

osd: generate device list for osd_auto_discovery on rolling_update

rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that is builind the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However during an upgrade the OSD are already configured, they
have been prepared and have partitions so this task won't run and thus
the devices list will be empty, skipping the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true and build a list when:

* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e84f11e99ef42057cd1c3fbfab41ef66cda27302)

rolling_update: add role ceph-iscsi-gw

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e91648a7afab88e84aea64c6bb7627580d420466)

mon: fix calamari initialisation

If calamari is already installed and ceph has been upgraded to a higher
version the initialisation will fail later. So if we detect the
calamari-server is too old compare to ceph_rhcs_version we try to update
it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c9e24a90fca2271978a066e38dfadead88d8167)

rgw: remove useless condition

The include does not need a condition on containerized_deployment since
we are already in an include than has the same condition.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 5a89479abe759844eb59bac190105d9ba34ed0b1)

rgw: remove unused file

copy_configs.yml was not including and is a leftover so let's remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3bce117de2165b8cf3e47805bd80871da7361001)

rgw: ability to use ceph-ansible vars into containers

Since the container now simply reads the ceph.conf, we remove all the
unnecessary options.

Also this PR is the foundation to support multiple backend, such as the
new 'beast' from Ceph Mimic.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4d64dd468696ed98e4c63d07d9f39216c6a7d3cb)

# Conflicts:
# roles/ceph-rgw/tasks/docker/main.yml

common: upgrade/install ceph-test deb first

When we deploy a Jewel cluster on Ubuntu with ceph_test: True, we're
unable to upgrade that cluster to Luminous.

"apt-get install ceph-common" fails to upgrade to luminous if a jewel ceph-test package is installed:

  Some packages could not be installed. This may mean that you have
  requested an impossible situation or if you are using the unstable
  distribution that some required packages have not yet been created
  or been moved out of Incoming.
  The following information may help to resolve the situation:

  The following packages have unmet dependencies:
   ceph-base : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed
   ceph-mon : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed

In ceph-ansible master, we resolve this whole class of problem by
installing all the packages in one operation (see
b338fafd90bbe489726b92d703bf4bc29d1caf6d).

For the stable-3.1 branch, take a less-invasive approach, and upgrade
ceph-test prior to any other package. This matches the approach I took
for RPMs in 3752cc6f38dbf476845e975e6448225c0e103ad6, before we had the
better solution in b338fafd90bbe489726b92d703bf4bc29d1caf6d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610997
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>

Allow mgr bootstrap keyring to be defined

In environments where we wish to have manual/greater control over
how the bootstrap keyrings are used, we need to able to externally
define what the mgr keyring secret will be and have ceph-ansible
use it, instead of it being autogenerated

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610213
Signed-off-by: Graeme Gillies <ggillies@akamai.com>
(cherry picked from commit a46025820d363dc3e91c380fd6b60fb6152b998b)

Resync rhcs_edits.txt

We were missing an option so let's add it back.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 19518656a7966470842743da3c1c3bda0fa8c0f8)

test: remove osd_crush_location from shrink scenarios

This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 50be3fd9e8c0944cdddbd88bc8287e65765b0c63)

test: follow up on osd_crush_location for containers

This was fixed by
https://github.com/ceph/ceph-ansible/commit/578aa5c2d54a680912e4e015b6fb3dbbc94d4fd0
on non-container, we need to apply the same fix for containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 77d4023fbe2d8a57affb65ba05d1a987308a576e)

iscsigw: install ceph-iscsi-cli package

Install ceph-iscsi-cli in order to provide the `gwcli` command tool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602785
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1164cdc002cccb9dc1c6f10fb6b4370eafda3c4b)

Fix in regular expression matching OSD ID on non-contenerized
deployment.
restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do it the scripts loops the list of ceph-osd@ services in the
system. This commit fixes bug in the regular expression responsile for
extraction of OSDs - prior version uses `[0-9]{1,2}` expression
which is ignoring all OSDS which numbers are greater than 99 (thus
longer than 2 digits). Fix removed upper limit of digits in the number.
This problem existed in two places in the script.

Closes: #2964
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
(cherry picked from commit 52d9d406b107c4926b582905b3d442feabf1fafc)

defaults: backward compatibility with fqdn deployments

This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployment to be done with shortname, we
must keep backward compatibility with clusters already deployed with
fqdn configuration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a6ff6bbf8a7b4ba4ab5236eca93325d8ee61b1b)

rolling_update: set osd sortbitwise

upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set
sortnibblewise,
it stay stuck on "TASK [waiting for clean pgs...]" as RHCS 3 osds will
not start if nibblewise is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b3266c5be2f88210589cfa56a5fe0a5092f79ee6)

config: enforce socket name

This was introduced by
https://github.com/ceph/ceph/commit/59ee2e8d3b14511e8d07ef8325ac8ca96e051784
and made our socket checks impossible to run. The PID could be found,
but the cctid cannot.
This happens during upgrade to mimic and on cluster running on mimic.

So let's force the admin socket the way it was so we can properly check
for existing instances also the line $cluster-$name.$pid.$cctid.asok
is only needed when running multiple instances of the same daemon,
thing ceph-ansible cannot do at the time of writing

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ea9e60d48d6631631ac9294d4ef291f8d7a30d78)

tests: support update scenarios in test_rbd_mirror_is_up()

`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, it means it won't enter in the right part of the
condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d8281e50f12eb299e2b419befab41cb7a0f39de2)

igw: fix image removal during purge

We were not passing in the ceph conf info into the rbd image removal
command, so if the clustername was not the default igw purge would fail
due to the rbd rm command failing.

This just fixes the bug by passing in the ceph conf info which has the
clustername to use.

This fixes Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1601949

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d572a9a6020592607d98aa30cea75940428506b0)

igw: do not fail purge on rbd removal errors

Instead of failing the entire purge operation when the rbd command fails
just log an error. This will allow the higher level target and config
cleanup to complete, and the user only has to manually delete the rbd
images.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 6f72f96dadb7b38f065ccef3f0618a2897f8465f)

osd: do not remove expose_partition container

The container runs with --rm which means it will be deleted by Docker
when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2ca8c519066555e06a261d5dee3fb46ce5daad0b)

ceph-osds: backward compatibility with jewel for osp pools creation

If we want to be backward compatible with release prior to luminous, we
have to set the rule name accordingly to default values used in jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 053709da9730a826976f56a9c2c5e08e47722624)

rbd-mirror: bring back compatibility with jewel deployment

rbd-mirror can't start when deploying jewel because it needs admin
keyring.
Getting back this task brings backward compatibility for jewel
deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1ecbbbdcfa928a3ee7381b0dc2dcf0c460dfb549)

iscsigw: do not run common roles when deploying jewel

Let's not deploy common roles when iscsigw nodes for jewel deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a1ca2c8fd373527b58b68b81d0f6966b1d83adfa)

tests: leave an OSD node in default crush root

jewel used to create a default `rbd` pool in the default crush root
`default`, we need to have at least 1 osd to satisfy the PGs for this
created pool, otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 578aa5c2d54a680912e4e015b6fb3dbbc94d4fd0)

ceph ansible 3.1 igw: fix rbd-target-gw startup

The problem is rbd-target-gw needs the rbd pool to be created, keyring
to be copied over, and the iscsi-gateway.cfg to be setup before starting
the rbd-target-gw service.

In the master branch this is fixed by this commit:

    commit 91bf53ee932a6748c464bea762f8fb6f07f11347
    Author: Sébastien Han <seb@redhat.com>
    Date:   Fri Mar 23 11:24:56 2018 +0800

        ceph-iscsi: support for containerize deployment

where the needed setup tasks are done in common.yml which is done
before prerequisites.yml.

To avoid porting all those changes to 3.1 this patch just moves the
rbd-target-gw startup to configure_iscsi.yml after everything has
been setup.

This fixes red hat bz:

https://bugzilla.redhat.com/show_bug.cgi?id=1601325

Signed-off-by: Mike Christie <mchristi@redhat.com>

rgw: add more config option for civetweb frontend

In containerized deployments we now inherite from the
radosgw_civetweb_options options when bootstrapping the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e2ea5bac5111c7633640b66f2570dc83893bae7a)

Run creation of empty rados index object to first monitor

When distributing ceph-nfs role, creation of rados index object
fails as it assumes availability of client.admin locally.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1607970
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit e85e5ea781e7ae251b277c1fbca83877e3ebfd82)

tests: add mimic support in stable-3.1

Add mimic support in stable-3.1 branch so we can test it in nightlies CI
jobs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: do not deploy all daemons for shrink osds scenarios

Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
By the way, doing so will save some times.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b89cc1746f8652b67d95410ed80473d1a2c3d312)

shrink-osd: purge osd on containerized deployment

Prior to this commit we were only stopping the container, but now we
also purge the devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ce1dd8d2b3986d3bd08b4d73efd88b4de72fcc00)

tests: stop hardcoding ansible version

In addition to ceph/ceph-build#1082

Let's set the ansible version in each ceph-ansible branch's respective
requirements.txt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add latest-bis-jewel for jewel tests

since no latest-bis-jewel exists, it's using latest-bis which points to
ceph mimic. In our testing, using it for idempotency/handlers tests
means upgrading from jewel to mimic which is not what we want do.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 05852b03013d15f6f400fe6728c24b11f22b75de)

nfs: change default stable branch for nfs-ganesha repo

Since `V2.6-stable` is available and has packages for `mimic`, let's
update this default value accordingly so nfs nodes can be deployed with
mimic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1a626d3c615cb23216bff9281737334122e18b80)

client: do not rely on copy_admin_key to import keys

Relying on `copy_admin_key` to import created keys on client nodes makes
us obliged to copy admin key on those nodes which is not something we might
want.
We should use the fact `condition_copy_admin_key` which will be set to
`True` when the delegated node is a mon which means we can import keys
without taking care of admin keyring.

Fixes: #2867
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ef5fcd0b64ed1a0fe4ffb1750984d29599839a4)

mgr: fix condition to add modules to ceph-mgr

Follow up on #2784

We must check in the generated fact `_disabled_ceph_mgr_modules` to
enable disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ce5ac930c5b91621a46fc69ddd0dcafb2a24947d)

tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined

since ooo_collocation scenario is supposed to be the same scenario than the
one tested by OSP and they are not passing `rgw_create_pools` the test
`test_docker_rgw_tuning_pools_are_set` will fail:
```
> pools = node["vars"]["rgw_create_pools"]
E KeyError: 'rgw_create_pools'
```

skipping this test if `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1c3dae4a90816ac6503967779b7fd77ff84900b5)

tests: skip tests for node iscsi-gw when deploying jewel

CI is deploying a iscsigw node anyway but its not deployed let's skip
test accordingly

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2d560b562a2d439fbed34b3066de93dfdff650ff)

tests: refact test_all_*_osds_are_up_and_in

these tests are skipped on bluestore osds scenarios.
they were going to fail anyway since they are run on mon nodes and
`devices` is defined in inventory for each osd node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.

The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks their respective OSD to be UP, if an OSD has 2
devices defined in `devices` variable, it means we are checking for 2
OSD to be up on that node, if each node has all its OSD up, we can say
all OSD are up.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe79a5d24086fc61ae84a59bd61a053cddb62941)

tests: fix broken test when collocated daemons scenarios

At the moment, a lot of tests are skipped when daemons are collocated.
Our tests consider a node belong to only 1 group while it's possible for
certain scenario it can belong to multiple groups.

Also pinning to pytest 3.6.1 so we can use `request.node.iter_markers()`

Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d83b24d27121591dafcba8297288c6f3a7ede42e)

tests: fix `_get_osd_id_from_host()` in TestOSDs()

We must initialize `children` variable in `_get_osd_id_from_host()`,
otherwise, if for any reason the deployment has failed and result with
an osd host with no OSD registered, we won't enter in the condition,
therefore, `children` is never set and the function tries to return
something undefined.

Typical error:
```
E UnboundLocalError: local variable 'children' referenced before assignment
```

Fixes: #2860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a65ec231d7f76655caa964e1d9228ba7a910dea)

tests: refact test_all_*_osds_are_up_and_in

these tests are skipped on bluestore osds scenarios.
they were going to fail anyway since they are run on mon nodes and
`devices` is defined in inventory for each osd node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.

The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks their respective OSD to be UP, if an OSD has 2
devices defined in `devices` variable, it means we are checking for 2
OSD to be up on that node, if each node has all its OSD up, we can say
all OSD are up.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe79a5d24086fc61ae84a59bd61a053cddb62941)

tests: factorize docker tests using docker_exec_cmd logic

avoid duplicating test unnecessarily just because of docker exec syntax.
Using the same logic than in the playbook with `docker_exec_cmd` allow us
to execute the same test on both containerized and non containerized environment.

The idea is to set a variable `docker_exec_cmd` with the
'docker exec <container-name>' string when containerized and
set it to '' when non containerized.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2e57a56db2801818135ba85479fedfc00eae30c)

tests: add mimic support for test_rbd_mirror_is_up()

prior mimic, the data structure returned by `ceph -s -f json` used to
gather information about rbd-mirror daemons looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This part has changed from mimic and became:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`

(cherry picked from commit 09d795b5b737a05164772f5e3ba469577d605344)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

common: switch from docker module to docker_container

As of ansible 2.4, `docker` module has been removed (was deprecated
since ansible 2.1).
We must switch to `docker_container` instead.

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d0746e08586556b7d44ba4b20cd553a27860f30b)

mon: ensure socker is purged when mon is stopped

On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.

Typical error:

```
fatal: [osd0]: FAILED! => {}

MSG:

'dict object' has no attribute 'osd_pool_default_pg_num'
```

the fact is not set because of this previous failure earlier:

```
ok: [mon0] => {
    "changed": false,
    "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
    "delta": "0:00:00.217382",
    "end": "2018-07-09 22:25:53.155969",
    "failed_when_result": false,
    "rc": 22,
    "start": "2018-07-09 22:25:52.938587"
}

STDERR:

admin_socket: exception getting command descriptions: [Errno 111] Connection refused

MSG:

non-zero return code
```

This failure happens when the ceph-mon service is stopped, indeed, since
the socket isn't purged, it's a leftover which is confusing the process.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f54b3b4a7c1a7015def3a7987e3e9e426385251)

ceph-config: do not log cluster log on container

The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:

2018-06-19 13:44:01.542990 7ff75b024700 1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory

So we now tell the mon to not log cluster log on the filesystem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 713b9fcf9b825ba84b07781d05c967238ee96c14)

ceph-common: fix rhcs condition

We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fcf11ecc3567398f92b9f91e1a0749edb921131f)

mgr: fix enabling of mgr module on mimic

The data structure has slightly changed on mimic.

Prior to mimic, it used to be:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        "balancer",
        "dashboard",
        "influx",
        "localpool",
        "prometheus",
        "restful",
        "selftest",
        "zabbix"
    ]
}
```

From mimic it looks like this:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        {
            "name": "balancer",
            "can_run": true,
            "error_string": ""
        },
        {
            "name": "dashboard",
            "can_run": true,
            "error_string": ""
        }
    ]
}
```

This means we can't simply check if `item` is in `item in
_ceph_mgr_modules.disabled_modules`

the idea here is to use filter `map(attribute='name')` to build a list
when deploying mimic.

Fixes: #2766
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3abc253fecc91f29c90e23ae95e1b83f8ffd3de6)

ceph-client: do not kill the dummy container

The container runs for 300 sec, then dies and removes itself thanks to
the '--rm' option, so there is no point of removing it. Also this is
causing failure under some circonstances.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 63658c05c7609894f9e2b72c83ba985ea65b97ec)

ceph-defaults: add default application to pool

We now add a default 'rbd' application type to each pool we create. This
will remove the warning: " application not enabled on N pool(s) "

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 103c279c218ae654ec9ced29c1aef54eb8b59990)

ceph-mds: enable application pool

We now enable the application type 'cephfs' for each cephfs pools we
create.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a6294089679729b12022052381d1aa6cf9978df7)

Enable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
(cherry picked from commit 1d454b611f9ec5403a474fcb45a6333ca6d36715)

ceph-mon: Generate initial keyring

Minor fix so that initial keyring can be generated using python3.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
(cherry picked from commit a7b7735b6fd23985d24a492f1bf4c5be7f1961b2)

systemd: remove changed_when: false

When using a module there is no need to apply this Ansible option. The
module will handle the idempotency on its own. So the module decides
wether or not the task has changed during the execution.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f6239972716b013bafcb9313c9c83723615aa7d6)

# Conflicts:
# roles/ceph-iscsi-gw/tasks/container/containerized.yml

ceph-osd: trigger osd container restart on script change

The script ceph-osd-run.sh holds the config options to start the
container, if one of these options are modified we must restart the
container. This was not the case before becauase the 'notify' flag
wasn't present.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit abdb53e16a7f46ceebbf4de65ed1add04da0d543)

tests: reduce the amount of time we wait

This sleep 120 looks a bit long, let's reduce this to 30sec and see if
things go faster.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 081600842ff3758910109d9f636b54cb12a85ed9)

mon: honour mon_docker_net_host option

--net=host was hardcoded in the startup line so even though
mon_docker_net_host was set to False the net option would always be
activated.
mon_docker_net_host is set to True by default so this commit does not
change the behaviour.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 322e2de7d27c498a9d89b31b96729200bed56e19)

tests: add more nodes in ooo testing scenario

adding more node in this scenario could help to have a better coverage
so we can catch more potential bugs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 481c14455aa3bcd05cc1c190458148a4d516e991)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: fix *_has_correct_value tests

It might happen that the list of ips/hosts in following line (ceph.conf)
- `mon initial memebers = <hosts>`
- `mon host = <ips>`

are not ordered the same way depending on deployment.

This patch makes the tests looking for each ip or hostname in respective
lines.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f68936ca7e7f556c8d8cee8b2c4565a3c94f72f9)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

common: remove duplicate include of configure_firewall_rpm.yml

408ef69 has introduced a duplicated task, something went wrong with the
backport from 24ef47b (probably a conflict merge hasn't been solved properly).

It's better now to commit directly in stable-3.1 to definitely solve
this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

common: start firewalld if configure_firewall

Currently we expect that if configure_firewall is set to True to have
firewalld enabled and running. Let's enforce that.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bea4027f0c2b5beebf409a00e7b61e923f4fea0c)
Signed-off-by: Sébastien Han <seb@redhat.com>

mon/osd: bump container memory limit

As discussed with the cores, the current limits are too low and should
be bumped to higher value.
So now by default monitors get 3GB and OSDs get 5GB.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a9ed3579ae680949b4f53aee94003ca50d1ae721)
Signed-off-by: Sébastien Han <seb@redhat.com>

tests: keep same ceph release during handlers/idempotency test

since `latest` points to `mimic`, we need to force the test to keep the
same ceph release when testing anything else than `mimic`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 21894655a7eaf0db6c99a46955e0e8ebc59a83af)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-mds: do not enable multimds on jewel

Multiple active MDS became stable in Luminous.

Introduced-by: c8573fe0d745e4667b5d757433efec9dac0150bc
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 9ce81ae845510cb70eee76ceef4412aa8155713a)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

client: try to kill dummy container only on first client node

The 'dummy' container is created only on first client node, it means we
must seek to destroy this container only on this node, otherwise this
can cause failure like following :
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

```

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51cf3b7fa0211fbbfcbd8c4228dcd39d20f02e54)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.

If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
(cherry picked from commit 3a07568496f718ffe44077eac23d7397a17b3b09)
Signed-off-by: Sébastien Han <seb@redhat.com>

common: ability to enable/disable fw configuration

Prior to this patch if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operators's consent.
Now you can enable or disable the fw configuration by setting
configure_firewall to either true or false.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2e8412734a81077c2b11de316e08bbecbed2de96)

tests: set CEPH_DOCKER_IMAGE_TAG when ceph release is luminous

Since latest points to mimic for the ceph container images, we need to
set `CEPH_DOCKER_IMAGE_TAG` to `latest-luminous` when ceph release is
luminous

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a351b0872620e3bfca1a2ef5fb5235f35002b160)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: increase memory to 1024Mb for centos7_cluster scenario

we see more and more failure like `fatal: [mon0]: UNREACHABLE! => {}` in
`centos7_cluster` scenario, Since we have 30Gb RAM on hypervisors, we
can give monitors a bit more RAM. By the way, nodes on containerized cluster
testing scenario have already 1024Mb memory allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bbb869133563c3b0ddd0388727b894306b6b8b26)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

client: keyrings aren't created when single client node

combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause bug when the only
node being run is not the first in the group.

In a deployment with a single client node it might cause issue because
sometimes keyring won't be created since the task could be definitively
skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 090ecff94e341a674556c0f4f578caa73330a0f0)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: update ooo inventory hostfile

Update the inventory host for tripleo testing scenario so it's the same
parameters than in tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 28d21b4e9c13566855844e9d5da71e2c4ef80894)

client: add a default value for keyring file

Potential error if someone doesnt pass the mode in `keys` dict for
client nodes:

```
fatal: [client2]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'

The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: get client cephx keys
^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'

```

adding a default value will avoid the deployment failing for this.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8a653cacd56553926126d0b43d328af94bbd0337)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

client: use dummy created container when there is no mon in inventory

the `docker_exec_cmd` fact set in client role when there is no monitor
in inventory is wrong, `ceph-client-{{ hostname }}` is never created so
it will fail anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b156deb67a9e137962161829e008bcc32835fe8)
Signed-off-by: Sébastien Han <seb@redhat.com>