run rados cmd in container if containerized deployment

When ceph-nfs is deployed containerized and ceph-common is not
installed on the host the start_nfs task fails because the rados
command is missing on the host.

Run rados commands from a ceph container instead so that
they will succeed.

Signed-off-by: Tom Barron <tpb@dyncloud.net>
(cherry picked from commit bf8f589958450ce07ec19d01fb98176ab50ab71f)

roles: ceph-rgw: Enable the ceph-radosgw target

If the ceph-radosgw target is not enabled, then enabling the
ceph-radosgw@ service has no effect since nothing will pull
it on the next reboot. As such, we need to ensure that the
target is enabled.

Signed-off-by: Markos Chandras <mchandras@suse.de>
(cherry picked from commit 217f35dbdb5036274be4674e9b0be2127b8875d7)

Don't run client dummy container on non-x86_64 hosts

The dummy client container currently won't work on non-x86_64 hosts.
This PR creates a filtered client group that contains only hosts
that are x86_64 - which can then be the group to run the
dummy container against.

This is for the specific case of a containerized_deployment where
there is a mixture of non-x86_64 hosts and x86_64 hosts. As such
the filtered group will contain all hosts when running with
containerized_deployment: false.
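
As a rough illustration of the filtering described above (not the
playbook's actual tasks), the following Python sketch builds the filtered
group. The host records, the `architecture` key and the function name are
hypothetical and only serve the example.

```python
def filter_clients(clients, containerized_deployment):
    """Return the client hosts the dummy container may run on."""
    if not containerized_deployment:
        # Non-containerized runs keep every client host.
        return list(clients)
    # Containerized runs keep only the x86_64 hosts.
    return [host for host in clients if host["architecture"] == "x86_64"]


# Hypothetical inventory mixing architectures.
clients = [
    {"name": "client0", "architecture": "x86_64"},
    {"name": "client1", "architecture": "ppc64le"},
]

print([h["name"] for h in filter_clients(clients, True)])
print([h["name"] for h in filter_clients(clients, False)])
```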

Currently ppc64le is not supported for Ceph server components.

Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>
(cherry picked from commit 772e6b9be20ce82d3b8f9ffdf6b7bc4f6be842b8)

doc: remove old statement

We have been supporting multiple devices for journal in containerized
deployments for a while now and forgot about this.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622393
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 124fc727f472551ab2a14a8d6b9d3d54159a1b08)

remove warning for unsupported variables

As promised, these will go unsupported for 3.1 so let's actually remove
them :).

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622729
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 9ba670567e97b7ad16e6f623ae99a5ad3ee6d880)

sites: fix conditional

Same problem again... ceph_release_num[ceph_release] is only set in
ceph-docker-common/common roles so putting the condition on that role
will never work. Removing the condition.

The downside is that we will install packages and then skip the role on
the node.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1622210
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ae5ebeeb00214d9ea27929b4670c6de4ad27d829)

site-docker.yml: remove useless condition

If we play site-docker.yml, we are already in a
containerized_deployment. So the condition is not needed.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 30cfeb5427535cd8dc98370ee33205be3b67bde0)

ci: stop using different images on the same run

There is no point in using hosts running Atomic AND hosts running CentOS
in the same run. So let's run containerized scenarios on Atomic only.

This solves this error here:

```
fatal: [client2]: FAILED! => {
"failed": true
}

MSG:

The conditional check 'ceph_current_status.rc == 0' failed. The error was: error while evaluating conditional (ceph_current_status.rc == 0): 'dict object' has no attribute 'rc'

The error appears to have been in '/home/jenkins-build/build/workspace/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/roles/ceph-defaults/tasks/facts.yml': line 74, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: set_fact ceph_current_status (convert to json)
^ here
```

From https://2.jenkins.ceph.com/view/ceph-ansible-stable3.1/job/ceph-ansible-nightly-luminous-stable-3.1-ooo_collocation/37/consoleFull#1765217701b5dd38fa-a56e-4233-a5ca-584604e56e3a

What's happening here is that all the hosts except the clients are running Atomic, so here: https://github.com/ceph/ceph-ansible/blob/master/site-docker.yml.sample#L62
The condition will skip all the nodes except the clients, thus when running ceph-defaults, the task "is ceph running already?" is skipped but the task above needs the rc of the skipped task.
This is not an error from the playbook, it's a CI setup issue.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 7012835d2b1880e7a6ef9a224df456b2dd1024cc)

release-note: stable-3.1

stable-3.1 is approaching, so let's write our first release note.

Signed-off-by: Sébastien Han <seb@redhat.com>

defaults: fix rgw_hostname

A couple of things were wrong in the initial commit:

* ceph_release_num[ceph_release] >= ceph_release_num['luminous'] will
never work since the ceph_release fact is set in the roles after. So
either ceph-common or ceph-docker-common set it

* we can easily re-use the initial command to check if a cluster is
running, it's more elegant than running it twice.

* set the fact rgw_hostname on rgw nodes only

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1618678
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 6d7fa99ff74b3ec25d1a6010b1ddb25e00c123be)

rolling_upgrade: set sortbitwise properly

Running 'osd set sortbitwise' when we detect a version 12 of Ceph is
wrong. When OSDs are getting updated, even though the package is updated
they won't send their updated version (12) and will stick with 10 if the
command is not applied. So we have to check if OSDs are sending a version
10 and then run the command to unlock the OSDs.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2e6e885bb75156c74735a65c05b4757b031041bb)

iscsi group name preserve backward compatibility

Recently we renamed the group_name for iscsi to iscsigws where previously
it was named iscsi-gws. Existing deployments with a host file section
with iscsi-gws must continue to work.

This commit adds the old group name for backward compatibility. No error
from Ansible should be expected: if the host group is not found, nothing
is played.
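
The fallback can be pictured with the small Python sketch below. It is
only an illustration, not the playbook's implementation: the inventory
is modelled as a plain dict and the function name is hypothetical.

```python
def iscsi_hosts(inventory, group_names=("iscsigws", "iscsi-gws")):
    """Collect hosts from the new group name and the old one."""
    hosts = []
    for group in group_names:
        # A missing group simply contributes no hosts (no error).
        for host in inventory.get(group, []):
            if host not in hosts:
                hosts.append(host)
    return hosts


print(iscsi_hosts({"iscsi-gws": ["gw0", "gw1"]}))  # old-style inventory
print(iscsi_hosts({"mons": ["mon0"]}))             # no iscsi group at all
```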

Close: https://bugzilla.redhat.com/show_bug.cgi?id=1619167
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 77a3a682f358c8e9a40c5b50e980b5e9ec5f6d60)

osd: fix ceph_release

We need ceph_release in the condition, not ceph_stable_release

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1619255
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c70a5b1975c31992cdfa0a46a04bd9afbc1a806)

take-over-existing-cluster: do not call var_files

We were using var_files long ago when default variables were not in
ceph-defaults; now that the role exists this is not needed. Moreover,
having these two var files added:

- roles/ceph-defaults/defaults/main.yml
- group_vars/all.yml

will create collisions and override necessary variables.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1555305
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b7387068109a521796f8e423a61449541043f4e6)

roles: ceph-defaults: Delegate cluster information task to monitor node

Since commit f422efb1d6b56ce56a7d39a21736a471e4ed357 ("config: ensure
rgw section has the correct name") we observe the following failures in
new Ceph deployment with OpenStack-Ansible

fatal: [aio1_ceph-rgw_container-fc588f0a]: FAILED! => {"changed": false,
"cmd": "ceph --cluster ceph -s -f json", "msg": "[Errno 2] No such file
or directory"

This is because the task executes 'ceph' but at this point no package
installation has happened. Packages are normally installed in the
'ceph-common' role which runs after the 'ceph-defaults' one.

Since we are looking to obtain cluster information, the task should be
delegated to a monitor node similar to other tasks in that role

Signed-off-by: Markos Chandras <mchandras@suse.de>
(cherry picked from commit 37e50114dedf6a7aec0f1b2e1b9d2dd997a11d8e)

roles: ceph-defaults: Check if 'rgw' attribute exists for rgw_hostname

If there are no services on the cluster, then the 'rgw' attribute could
be missing and the task fails with the following error:

msg": "The task includes an option with an undefined variable.
The error was: 'dict object' has no attribute 'rgw'

We fix this by checking the existence of the 'rgw' attribute. If it's
missing, we skip the task since the role already contains code to set
a good default rgw_hostname.

Signed-off-by: Markos Chandras <mchandras@suse.de>
(cherry picked from commit 126e2e3f92475a17f9a04e1e792ee6eb69fbfab0)

mgr: improve/fix disabled modules check

Follow up on 36942af6983d60666f3f8a1a06b352a440a6c0da

"disabled_modules" is always a list, it's the items in the list that
can be dicts in mimic. Many ways to fix this, here's one.

Signed-off-by: Dardo D Kleiner <dardokleiner@gmail.com>
(cherry picked from commit f6519e4003404e10ae1f5e86298cffd4405591da)

lv-create: use copy instead of the template module

The copy module does in fact do variable interpolation so we do not need
to use the template module or keep a template in the source.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 04df3f0802c0bc903172314d05a38e869f0eee6a)
Signed-off-by: Sébastien Han <seb@redhat.com>

tests: cat the contents of lv-create.log in infra_lv_create

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit f5a4c8986982f277f6fd5bcd5b28c6099f655d79)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-create: add an example logfile_path config option in lv_vars.yml

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 131796f2750f1209a019ae75a500e6f1a1ab37f8)
Signed-off-by: Sébastien Han <seb@redhat.com>

tests: adds a testing scenario for lv-create and lv-teardown

Using an explicitly named testing environment allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work as it doesn't really need anything from the ceph cluster tests.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 810cc47892e53701485c540ff51c00c860ea0a00)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-teardown: fail silently if lv_vars.yml is not found

This allows users to opt out of using lv_vars.yml and load configuration
from other sources.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit b0bfc173510ec7d5da715c0048e633a8fe3d2a4d)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-teardown: set become: true at the playbook level

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 8424858b40bafebe3569b33279e4d8d824e2276b)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-create: fail silently if lv_vars.yml is not found

If a user decides not to use the lv_vars.yml file then it should fail
silently so that configuration can be picked up from other places.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit e43eec57bb44bf5f7a10da8548ca22a8772c2d92)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-create: set become: true at the playbook level

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit fde47be13cc753218153b3dbc0db5a4daa752b21)
Signed-off-by: Sébastien Han <seb@redhat.com>

lv-create: use the template module to write log file

The copy module will not expand the template and render the variables
included, so we must use template.

Creating a temp file and using it locally means that you must run the
playbook with sudo privileges, which I don't think we want to require.
This introduces a logfile_path variable that the user can use to control
where the logfile is written to, defaulting to the cwd.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 35301b35af4e71713edb944eb54654b587710527)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/vars/lv_vars.yaml: minor fixes

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 909b38da829485b2ec56b61bf2b2fa0df02b0ed4)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: use tempfile to create logfile

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit f65f3ea89fdba98057172e32e1a43ee6370c04d9)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: add lvm_volumes to suggested paste

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 65fdad072386698ab9b6f7962107b6000f1f8378)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: copy without using a template file

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 50a6d8141ced888f9c1ce5e90f9461e6a101d5bc)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks/lv-create.yml: don't use action to copy

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 186c4e11c7832eeb676409171d401a0ba2864f2a)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks: standardize variable usage with a space after brackets

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 9d43806df9fe44496dd48c2b7d3bef2e59d92365)
Signed-off-by: Sébastien Han <seb@redhat.com>

vars/lv_vars.yaml: remove journal_device

Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit e0293de3e72e11f3aeba2d84b24bba1f7839ab57)
Signed-off-by: Sébastien Han <seb@redhat.com>

infrastructure-playbooks: playbooks for creating LVs for bucket indexes and journals

These playbooks create and tear down logical
volumes for OSD data on HDDs and for a bucket index and
journals on 1 NVMe device.

Users should follow the guidelines set in vars/lv_vars.yaml

After the lv-create.yml playbook is run, output is
sent to /tmp/logfile.txt for copy and paste into
osds.yml

Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit 1f018d861267a2bde7d8f2179d47d673bfcfb13f)
Signed-off-by: Sébastien Han <seb@redhat.com>

Revert "osd: generate device list for osd_auto_discovery on rolling_update"

This reverts commit e84f11e99ef42057cd1c3fbfab41ef66cda27302.

This commit was giving a new failure later during the rolling_update
process. Basically, this was modifying the list of devices and started
impacting the ceph-osd role itself. The modification to accommodate the
osd_auto_discovery parameter should happen outside of ceph-osd.

Also we are trying to not play ceph-osd role during the rolling_update
process so we can speed up the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3149b2564fb89f2352820d83be02c09f658bdf60)

rolling_update: register container osd units

Before running the upgrade, let's call systemd to collect unit names
instead of relying on the device list. This is more accurate and fixes
the osd_auto_discovery scenario too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit dad10e8f3f67f1e0c6a14ef3e0b1f51f90d9d962)

contrib: fix generate group_vars samples

For ceph-iscsi-gw and ceph-rbd-mirror roles the group_name are named
differently (by default) than the role name so we have to change the
script to generate the correct name.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602327
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 315ab08b1604e655eee4b493eb2c1171a67df506)

Use /var/lib/ceph/osd folder to filter osd mount point

In some cases, a user may mount a partition at /var/lib/ceph; unmounting
it will fail, and there is no need to do so anyway.

Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>
(cherry picked from commit 85cc61a6d9f23cc98a817ea988c8b50e6c55698f)

stable 3.1 igw: add api setting support

Port the parts of this upstream commit:

commit 91bf53ee932a6748c464bea762f8fb6f07f11347
Author: Sébastien Han <seb@redhat.com>
Date: Fri Mar 23 11:24:56 2018 +0800

ceph-iscsi: support for containerize deployment

that allows configuration of
API settings in roles/ceph-iscsi-gw/templates/iscsi-gateway.cfg.j2
using the iscsi-gws.yml.

This fixes Red Hat BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1613963

Signed-off-by: Mike Christie <mchristi@redhat.com>

stable 3.1 igw: enable and start rbd-target-api

Backport
https://github.com/ceph/ceph-ansible/pull/2984
to stable 3.1.

From upstream commit:

commit 1164cdc002cccb9dc1c6f10fb6b4370eafda3c4b
Author: Guillaume Abrioux <gabrioux@redhat.com>
Date: Thu Aug 2 11:58:47 2018 +0200

iscsigw: install ceph-iscsi-cli package

installs the cli package but does not start and enable the
rbd-target-api daemon needed for gwcli to communicate with the igw
nodes. This just enables and starts it.

This fixes Red Hat BZ
https://bugzilla.redhat.com/show_bug.cgi?id=1613963.

Signed-off-by: Mike Christie <mchristi@redhat.com>

group_vars: resync missing options

resync group_vars file with the defaults/main.yml files.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2dd75a1e6ed6d8b323ef647cbb5cf1d2a2f90154)

fail if fqdn deployment attempted

The possibility of an fqdn configuration caused a lot of trouble: it adds
a lot of complexity because of multiple cases and the relation between
ceph-ansible and ceph-container. Moreover, there is no benefit to such
a feature.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

config: ensure rgw section has the correct name

the ceph.conf.j2 always assumes the hostname used to register the
radosgw in the servicemap is equivalent to `{{ ansible_hostname }}`
which returns the shortname form.

We need to detect which form of the hostname was used in case of already
deployed cluster and update the ceph.conf accordingly.
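
A minimal Python sketch of that detection follows, for illustration
only. It assumes the cluster status has already been parsed from JSON
and that rgw daemons are listed under `servicemap -> services -> rgw ->
daemons`; that layout, the function name and the sample data are
assumptions, not taken from the role. A status without any `rgw` entry
falls back to the short name.

```python
def pick_rgw_hostname(status, short_name, fqdn):
    """Return the hostname form already registered in the servicemap."""
    services = status.get("servicemap", {}).get("services", {})
    # .get() keeps this safe when no rgw has registered yet.
    daemons = services.get("rgw", {}).get("daemons", {})
    if fqdn in daemons:
        return fqdn
    return short_name


# Hypothetical status of a cluster deployed with the fqdn form.
status = {
    "servicemap": {
        "services": {"rgw": {"daemons": {"rgw0.example.com": {}}}}
    }
}

print(pick_rgw_hostname(status, "rgw0", "rgw0.example.com"))
print(pick_rgw_hostname({}, "rgw0", "rgw0.example.com"))
```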

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f422efb1d6b56ce56a7d39a21736a471e4ed357c)

mgr: backward compatibility for module management

Follow up on 3abc253fecc91f29c90e23ae95e1b83f8ffd3de6

The structure had even changed within `luminous` release.
It was first:

```
{
    "enabled_modules": [
        "balancer",
        "dashboard",
        "restful",
        "status"
    ],
    "disabled_modules": [
        "influx",
        "localpool",
        "prometheus",
        "selftest",
        "zabbix"
    ]
}
```
Then it changed for:

```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      "balancer",
      "dashboard",
      "influx",
      "localpool",
      "prometheus",
      "restful",
      "selftest",
      "zabbix"
  ]
}
```

and finally:
```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      {
          "name": "balancer",
          "can_run": true,
          "error_string": ""
      },
      {
          "name": "dashboard",
          "can_run": true,
          "error_string": ""
      }
  ]
}
```
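
One possible way to cope with these shapes, shown here as a hedged
Python sketch rather than the actual Ansible change, is to normalize
every entry to its name. The helper name `module_names` is hypothetical.

```python
def module_names(modules):
    """Return module names from a list of strings or of dicts with "name"."""
    return [m["name"] if isinstance(m, dict) else m for m in modules]


# Older luminous shape: plain strings.
old_style = ["balancer", "dashboard"]

# Latest shape: dicts describing each module.
new_style = [
    {"name": "balancer", "can_run": True, "error_string": ""},
    {"name": "dashboard", "can_run": True, "error_string": ""},
]

print(module_names(old_style))
print(module_names(new_style))
```

With both shapes reduced to a list of names, a membership test such as
`"dashboard" in module_names(...)` behaves the same across releases.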

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 36942af6983d60666f3f8a1a06b352a440a6c0da)

tests: resync iscsigw group name with master

Let's align the name of that group in stable-3.1 with the master branch.

Not having the same group name on different branches is confusing and
makes some nightly jobs fail in the CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: fix a typo in testinfra for iscsigws and jewel scenario

group name for iscsi-gw nodes in testing is `iscsi-gws`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

osd: generate device list for osd_auto_discovery on rolling_update

rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that is building the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However, during an upgrade the OSDs are already configured: they
have been prepared and have partitions, so this task won't run and thus
the devices list will be empty, skipping the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true and build a list when:

* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector
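
The conditions above can be sketched in Python as follows. This is an
illustration, not the role's task: it assumes each `ansible_devices`
entry carries `removable` and `sectors` as strings, and that
device-mapper/LVM devices can be recognised by a `dm-` name prefix. The
function name and sample data are hypothetical.

```python
def discover_devices(ansible_devices, osd_auto_discovery, rolling_update):
    """Build the device list used to restart OSDs during rolling_update."""
    if not (osd_auto_discovery and rolling_update and ansible_devices):
        return []
    devices = []
    for name, info in sorted(ansible_devices.items()):
        if name.startswith("dm-"):
            continue  # dm/lv devices are not part of the discovery
        if info.get("removable") != "0":
            continue  # skip removable media
        if int(info.get("sectors", "0")) <= 1:
            continue  # skip devices without more than 1 sector
        devices.append("/dev/" + name)
    return devices


# Hypothetical facts for one OSD host.
facts = {
    "sda": {"removable": "0", "sectors": "1000"},
    "dm-0": {"removable": "0", "sectors": "1000"},
    "sr0": {"removable": "1", "sectors": "2000"},
    "sdb": {"removable": "0", "sectors": "0"},
}

print(discover_devices(facts, True, True))
```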

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e84f11e99ef42057cd1c3fbfab41ef66cda27302)

rolling_update: add role ceph-iscsi-gw

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575829
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e91648a7afab88e84aea64c6bb7627580d420466)

mon: fix calamari initialisation

If calamari is already installed and ceph has been upgraded to a higher
version, the initialisation will fail later. So if we detect that
calamari-server is too old compared to ceph_rhcs_version, we try to
update it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4c9e24a90fca2271978a066e38dfadead88d8167)

rgw: remove useless condition

The include does not need a condition on containerized_deployment since
we are already in an include that has the same condition.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 5a89479abe759844eb59bac190105d9ba34ed0b1)

rgw: remove unused file

copy_configs.yml was not included and is a leftover, so let's remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3bce117de2165b8cf3e47805bd80871da7361001)

rgw: ability to use ceph-ansible vars into containers

Since the container now simply reads the ceph.conf, we remove all the
unnecessary options.

Also this PR is the foundation to support multiple backends, such as the
new 'beast' from Ceph Mimic.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4d64dd468696ed98e4c63d07d9f39216c6a7d3cb)

# Conflicts:
# roles/ceph-rgw/tasks/docker/main.yml

common: upgrade/install ceph-test deb first

When we deploy a Jewel cluster on Ubuntu with ceph_test: True, we're
unable to upgrade that cluster to Luminous.

"apt-get install ceph-common" fails to upgrade to luminous if a jewel ceph-test package is installed:

  Some packages could not be installed. This may mean that you have
  requested an impossible situation or if you are using the unstable
  distribution that some required packages have not yet been created
  or been moved out of Incoming.
  The following information may help to resolve the situation:

  The following packages have unmet dependencies:
   ceph-base : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed
   ceph-mon : Breaks: ceph-test (< 12.2.2-14) but 10.2.11-1xenial is to be installed

In ceph-ansible master, we resolve this whole class of problem by
installing all the packages in one operation (see
b338fafd90bbe489726b92d703bf4bc29d1caf6d).

For the stable-3.1 branch, take a less-invasive approach, and upgrade
ceph-test prior to any other package. This matches the approach I took
for RPMs in 3752cc6f38dbf476845e975e6448225c0e103ad6, before we had the
better solution in b338fafd90bbe489726b92d703bf4bc29d1caf6d.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610997
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>

Allow mgr bootstrap keyring to be defined

In environments where we wish to have manual/greater control over
how the bootstrap keyrings are used, we need to be able to externally
define what the mgr keyring secret will be and have ceph-ansible
use it, instead of it being autogenerated.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610213
Signed-off-by: Graeme Gillies <ggillies@akamai.com>
(cherry picked from commit a46025820d363dc3e91c380fd6b60fb6152b998b)

Resync rhcs_edits.txt

We were missing an option so let's add it back.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 19518656a7966470842743da3c1c3bda0fa8c0f8)

test: remove osd_crush_location from shrink scenarios

This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 50be3fd9e8c0944cdddbd88bc8287e65765b0c63)

test: follow up on osd_crush_location for containers

This was fixed by
https://github.com/ceph/ceph-ansible/commit/578aa5c2d54a680912e4e015b6fb3dbbc94d4fd0
on non-container, we need to apply the same fix for containers.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 77d4023fbe2d8a57affb65ba05d1a987308a576e)

iscsigw: install ceph-iscsi-cli package

Install ceph-iscsi-cli in order to provide the `gwcli` command tool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602785
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1164cdc002cccb9dc1c6f10fb6b4370eafda3c4b)

Fix regular expression matching OSD ID on non-containerized deployment

restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do so, the script loops over the list of ceph-osd@ services in
the system. This commit fixes a bug in the regular expression responsible
for extracting OSD IDs: the prior version used `[0-9]{1,2}`, which
ignored all OSDs whose IDs are greater than 99 (thus longer than 2
digits). The fix removes the upper limit on the number of digits.
This problem existed in two places in the script.
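
The effect of the digit limit can be reproduced with a short Python
sketch. The unit-name pattern around the digit class is an assumption
made for the example (the script's full expression is not quoted here);
only `[0-9]{1,2}` comes from the description above, and `[0-9]+` is one
way to drop the upper limit.

```python
import re

# Hypothetical patterns: only the digit class differs between them.
OLD = re.compile(r"ceph-osd@([0-9]{1,2})\.service")
NEW = re.compile(r"ceph-osd@([0-9]+)\.service")

units = ["ceph-osd@7.service", "ceph-osd@42.service", "ceph-osd@123.service"]


def osd_ids(pattern, unit_names):
    """Return the OSD IDs whose unit name fully matches the pattern."""
    ids = []
    for unit in unit_names:
        match = pattern.fullmatch(unit)
        if match:
            ids.append(match.group(1))
    return ids


print(osd_ids(OLD, units))  # OSD 123 is missed
print(osd_ids(NEW, units))  # all three OSDs are found
```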

Closes: #2964
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>
(cherry picked from commit 52d9d406b107c4926b582905b3d442feabf1fafc)

defaults: backward compatibility with fqdn deployments

This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployment to be done with shortname, we
must keep backward compatibility with clusters already deployed with
fqdn configuration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0a6ff6bbf8a7b4ba4ab5236eca93325d8ee61b1b)

rolling_update: set osd sortbitwise

Upgrading RHCS 2 -> RHCS 3 will fail if the cluster still has
sortnibblewise set: it stays stuck on
"TASK [waiting for clean pgs...]" as RHCS 3 OSDs will not start if
nibblewise is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b3266c5be2f88210589cfa56a5fe0a5092f79ee6)

config: enforce socket name

This was introduced by
https://github.com/ceph/ceph/commit/59ee2e8d3b14511e8d07ef8325ac8ca96e051784
and made our socket checks impossible to run. The PID could be found,
but the cctid cannot.
This happens during upgrade to mimic and on cluster running on mimic.

So let's force the admin socket name back to the way it was so we can
properly check for existing instances. The
$cluster-$name.$pid.$cctid.asok form is only needed when running multiple
instances of the same daemon, something ceph-ansible cannot do at the
time of writing.
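
To see why a fixed name matters for the checks, consider the Python
sketch below. It assumes the enforced form is `$cluster-$name.asok`
under `/var/run/ceph`; the directory, the function name and the daemon
name are assumptions made for illustration.

```python
def admin_socket_path(cluster, name, run_dir="/var/run/ceph"):
    """Return a socket path that depends only on cluster and daemon name."""
    return "{}/{}-{}.asok".format(run_dir, cluster, name)


# The path is predictable without knowing any runtime value such as a
# PID, so a check script can test for it directly.
print(admin_socket_path("ceph", "osd.3"))
```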

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ea9e60d48d6631631ac9294d4ef291f8d7a30d78)

tests: support update scenarios in test_rbd_mirror_is_up()

`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, it means it won't enter in the right part of the
condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d8281e50f12eb299e2b419befab41cb7a0f39de2)

igw: fix image removal during purge

We were not passing in the ceph conf info into the rbd image removal
command, so if the clustername was not the default igw purge would fail
due to the rbd rm command failing.

This just fixes the bug by passing in the ceph conf info which has the
clustername to use.

This fixes Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1601949

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit d572a9a6020592607d98aa30cea75940428506b0)

igw: do not fail purge on rbd removal errors

Instead of failing the entire purge operation when the rbd command fails
just log an error. This will allow the higher level target and config
cleanup to complete, and the user only has to manually delete the rbd
images.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 6f72f96dadb7b38f065ccef3f0618a2897f8465f)

osd: do not remove expose_partition container

The container runs with --rm which means it will be deleted by Docker
when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2ca8c519066555e06a261d5dee3fb46ce5daad0b)

ceph-osds: backward compatibility with jewel for osp pools creation

If we want to be backward compatible with releases prior to luminous, we
have to set the rule name according to the default values used in jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 053709da9730a826976f56a9c2c5e08e47722624)

rbd-mirror: bring back compatibility with jewel deployment

rbd-mirror can't start when deploying jewel because it needs admin
keyring.
Getting back this task brings backward compatibility for jewel
deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1ecbbbdcfa928a3ee7381b0dc2dcf0c460dfb549)

iscsigw: do not run common roles when deploying jewel

Let's not run common roles on iscsigw nodes for jewel deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a1ca2c8fd373527b58b68b81d0f6966b1d83adfa)

tests: leave an OSD node in default crush root

jewel used to create a default `rbd` pool in the default crush root
`default`, we need to have at least 1 osd to satisfy the PGs for this
created pool, otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 578aa5c2d54a680912e4e015b6fb3dbbc94d4fd0)

ceph ansible 3.1 igw: fix rbd-target-gw startup

The problem is rbd-target-gw needs the rbd pool to be created, keyring
to be copied over, and the iscsi-gateway.cfg to be setup before starting
the rbd-target-gw service.

In the master branch this is fixed by this commit:

    commit 91bf53ee932a6748c464bea762f8fb6f07f11347
    Author: Sébastien Han <seb@redhat.com>
    Date:   Fri Mar 23 11:24:56 2018 +0800

        ceph-iscsi: support for containerize deployment

where the needed setup tasks are done in common.yml which is done
before prerequisites.yml.

To avoid porting all those changes to 3.1 this patch just moves the
rbd-target-gw startup to configure_iscsi.yml after everything has
been setup.

This fixes red hat bz:

https://bugzilla.redhat.com/show_bug.cgi?id=1601325

Signed-off-by: Mike Christie <mchristi@redhat.com>

rgw: add more config option for civetweb frontend

In containerized deployments we now inherit from the
radosgw_civetweb_options option when bootstrapping the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit e2ea5bac5111c7633640b66f2570dc83893bae7a)

Run creation of empty rados index object to first monitor

When distributing ceph-nfs role, creation of rados index object
fails as it assumes availability of client.admin locally.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1607970
Signed-off-by: Giulio Fidente <gfidente@redhat.com>
(cherry picked from commit e85e5ea781e7ae251b277c1fbca83877e3ebfd82)

tests: add mimic support in stable-3.1

Add mimic support in stable-3.1 branch so we can test it in nightlies CI
jobs.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: do not deploy all daemons for shrink osds scenarios

Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
Doing so will also save some time.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b89cc1746f8652b67d95410ed80473d1a2c3d312)

shrink-osd: purge osd on containerized deployment

Prior to this commit we were only stopping the container, but now we
also purge the devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ce1dd8d2b3986d3bd08b4d73efd88b4de72fcc00)

tests: stop hardcoding ansible version

In addition to ceph/ceph-build#1082

Let's set the ansible version in each ceph-ansible branch's respective
requirements.txt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

tests: add latest-bis-jewel for jewel tests

Since no latest-bis-jewel exists, it's using latest-bis, which points to
ceph mimic. In our testing, using it for idempotency/handlers tests
means upgrading from jewel to mimic, which is not what we want to do.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 05852b03013d15f6f400fe6728c24b11f22b75de)

nfs: change default stable branch for nfs-ganesha repo

Since `V2.6-stable` is available and has packages for `mimic`, let's
update this default value accordingly so nfs nodes can be deployed with
mimic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1a626d3c615cb23216bff9281737334122e18b80)

client: do not rely on copy_admin_key to import keys

Relying on `copy_admin_key` to import created keys on client nodes
obliges us to copy the admin key to those nodes, which is not something
we might want.
We should use the fact `condition_copy_admin_key`, which is set to
`True` when the delegated node is a mon, meaning we can import keys
without taking care of the admin keyring.

Fixes: #2867
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ef5fcd0b64ed1a0fe4ffb1750984d29599839a4)

mgr: fix condition to add modules to ceph-mgr

Follow up on #2784

We must check the generated fact `_disabled_ceph_mgr_modules` to
enable a disabled mgr module.
Otherwise, this task will be skipped because it's not comparing against
the right list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ce5ac930c5b91621a46fc69ddd0dcafb2a24947d)

tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined

Since the ooo_collocation scenario is supposed to be the same scenario as
the one tested by OSP, and they are not passing `rgw_create_pools`, the
test `test_docker_rgw_tuning_pools_are_set` will fail:
```
> pools = node["vars"]["rgw_create_pools"]
E KeyError: 'rgw_create_pools'
```

Skipping this test if `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.
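
The skip condition can be sketched as below. This is only an illustration:
the helper name `should_skip_rgw_pools_test` is hypothetical, and `node` is
assumed to be a plain dict with a "vars" mapping, as in the traceback above.

```python
def should_skip_rgw_pools_test(node):
    # Hypothetical helper: True when the scenario did not pass
    # `rgw_create_pools`, in which case the lookup would raise KeyError.
    return "rgw_create_pools" not in node["vars"]


# Illustrative inputs, not taken from a real inventory.
node_without_pools = {"vars": {"cluster": "test"}}
node_with_pools = {"vars": {"rgw_create_pools": {"foo": {"pg_num": 17}}}}
```

In a pytest test, one would typically call `pytest.skip()` when this helper
returns `True`.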

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1c3dae4a90816ac6503967779b7fd77ff84900b5)

tests: skip tests for node iscsi-gw when deploying jewel

CI creates an iscsigw node anyway, but iscsi-gw is not deployed on it
when deploying jewel, so let's skip the tests accordingly.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2d560b562a2d439fbed34b3066de93dfdff650ff)

tests: refact test_all_*_osds_are_up_and_in

These tests are skipped on bluestore OSD scenarios.
They were going to fail anyway since they are run on mon nodes, while
`devices` is defined in the inventory for each OSD node. It means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects to have 0 OSDs up.

The idea here is to move these tests so they are run on OSD nodes.
Each OSD node checks that its own OSDs are up: if an OSD node has 2
devices defined in the `devices` variable, we check for 2 OSDs to be
up on that node. If each node has all its OSDs up, we can say all
OSDs are up.
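
The arithmetic behind this can be sketched as follows. Both function names
are hypothetical and the host variables are assumed to be plain dicts; this
is not the actual test code.

```python
def expected_osds_from_mon(mon_vars, num_osd_hosts):
    # On a mon node `devices` is not defined, so num_devices is 0 and the
    # product is 0 whatever the number of OSD hosts.
    num_devices = len(mon_vars.get("devices", []))
    return num_devices * num_osd_hosts


def expected_osds_on_node(osd_vars):
    # On an OSD node, the expectation is one OSD up per listed device.
    return len(osd_vars.get("devices", []))


# Illustrative host variables (assumed values).
mon_vars = {}
osd_vars = {"devices": ["/dev/sda", "/dev/sdb"]}
```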

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fe79a5d24086fc61ae84a59bd61a053cddb62941)

tests: fix broken test when collocated daemons scenarios

At the moment, a lot of tests are skipped when daemons are collocated.
Our tests consider that a node belongs to only 1 group, while in
certain scenarios it can belong to multiple groups.

Also pinning to pytest 3.6.1 so we can use `request.node.iter_markers()`

Co-Authored-by: Alfredo Deza <adeza@redhat.com>
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d83b24d27121591dafcba8297288c6f3a7ede42e)

tests: fix `_get_osd_id_from_host()` in TestOSDs()

We must initialize the `children` variable in `_get_osd_id_from_host()`.
Otherwise, if for any reason the deployment has failed and results in
an OSD host with no OSD registered, we won't enter the condition;
`children` is then never set and the function tries to return
something undefined.

Typical error:
```
E UnboundLocalError: local variable 'children' referenced before assignment
```
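
A minimal sketch of the failure mode and of the fix. The function names and
the simplified tree layout are assumptions made for illustration; they are
not the actual body of `_get_osd_id_from_host()`.

```python
def get_children_buggy(tree, host):
    # `children` is only bound inside the `if`; a host missing from the
    # tree leaves it unbound and the `return` raises UnboundLocalError.
    for node in tree["nodes"]:
        if node["name"] == host and node["type"] == "host":
            children = node["children"]
    return children


def get_children_fixed(tree, host):
    # Initialising `children` first guarantees a defined return value.
    children = []
    for node in tree["nodes"]:
        if node["name"] == host and node["type"] == "host":
            children = node["children"]
    return children


# Illustrative data: host "osd1" has no OSD registered in the tree.
tree = {"nodes": [{"name": "osd0", "type": "host", "children": [0, 1]}]}
```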

Fixes: #2860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9a65ec231d7f76655caa964e1d9228ba7a910dea)

tests: factorize docker tests using docker_exec_cmd logic

Avoid duplicating tests unnecessarily just because of docker exec syntax.
Using the same logic as in the playbook with `docker_exec_cmd` allows us
to execute the same test on both containerized and non-containerized environments.

The idea is to set a variable `docker_exec_cmd` with the
'docker exec <container-name>' string when containerized and
set it to '' when non containerized.
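
The prefix logic can be sketched like this. The helper `build_cmd` and its
parameters are hypothetical; only the `docker_exec_cmd` variable and the
'docker exec <container-name>' string come from the description above.

```python
def build_cmd(base_cmd, containerized, container_name=None):
    # 'docker exec <container-name>' when containerized, '' otherwise.
    if containerized:
        docker_exec_cmd = "docker exec {}".format(container_name)
    else:
        docker_exec_cmd = ""
    # strip() drops the leading space left by an empty prefix.
    return "{} {}".format(docker_exec_cmd, base_cmd).strip()
```

The same test body can then run `build_cmd("ceph -s", ...)` in both kinds of
environment without duplicating the test.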

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f2e57a56db2801818135ba85479fedfc00eae30c)

tests: add mimic support for test_rbd_mirror_is_up()

Prior to mimic, the data structure returned by `ceph -s -f json` and used
to gather information about rbd-mirror daemons looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This part changed in mimic and became:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`
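
One way to handle both layouts is sketched below. This is an assumption
about a possible approach, not the code of `test_rbd_mirror_is_up()`: it
ignores the daemon key (hostname on luminous, gid on mimic) and reads the
hostname from `metadata`, which is present in both outputs shown above.
The function name is hypothetical and the sample data is trimmed from those
outputs.

```python
def rbd_mirror_hostnames(status):
    """Return rbd-mirror daemon hostnames from parsed `ceph -s -f json`."""
    daemons = status["servicemap"]["services"]["rbd-mirror"]["daemons"]
    hostnames = []
    for value in daemons.values():
        # "summary" maps to a plain string, not to a daemon entry.
        if not isinstance(value, dict):
            continue
        hostnames.append(value["metadata"]["hostname"])
    return hostnames


luminous = {"servicemap": {"services": {"rbd-mirror": {"daemons": {
    "summary": "",
    "ceph-nano-luminous-faa32aebf00b": {
        "gid": 14107,
        "metadata": {"hostname": "ceph-nano-luminous-faa32aebf00b"},
    },
}}}}}

mimic = {"servicemap": {"services": {"rbd-mirror": {"daemons": {
    "summary": "",
    "14151": {
        "gid": 14151,
        "metadata": {"hostname": "ceph-rbd-mirror0", "id": "ceph-rbd-mirror0"},
    },
}}}}}
```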

(cherry picked from commit 09d795b5b737a05164772f5e3ba469577d605344)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

common: switch from docker module to docker_container

As of ansible 2.4, `docker` module has been removed (was deprecated
since ansible 2.1).
We must switch to `docker_container` instead.

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d0746e08586556b7d44ba4b20cd553a27860f30b)

mon: ensure socket is purged when mon is stopped

On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.

Typical error:

```
fatal: [osd0]: FAILED! => {}

MSG:

'dict object' has no attribute 'osd_pool_default_pg_num'
```

The fact is not set because of this earlier failure:

```
ok: [mon0] => {
    "changed": false,
    "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
    "delta": "0:00:00.217382",
    "end": "2018-07-09 22:25:53.155969",
    "failed_when_result": false,
    "rc": 22,
    "start": "2018-07-09 22:25:52.938587"
}

STDERR:

admin_socket: exception getting command descriptions: [Errno 111] Connection refused

MSG:

non-zero return code
```

This failure happens when the ceph-mon service is stopped: since the
socket isn't purged, the leftover socket confuses the process.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9f54b3b4a7c1a7015def3a7987e3e9e426385251)

ceph-config: do not log cluster log on container

The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:

2018-06-19 13:44:01.542990 7ff75b024700 1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory

So we now tell the mon to not log cluster log on the filesystem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 713b9fcf9b825ba84b07781d05c967238ee96c14)

ceph-common: fix rhcs condition

We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit fcf11ecc3567398f92b9f91e1a0749edb921131f)

mgr: fix enabling of mgr module on mimic

The data structure has slightly changed on mimic.

Prior to mimic, it used to be:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        "balancer",
        "dashboard",
        "influx",
        "localpool",
        "prometheus",
        "restful",
        "selftest",
        "zabbix"
    ]
}
```

From mimic it looks like this:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        {
            "name": "balancer",
            "can_run": true,
            "error_string": ""
        },
        {
            "name": "dashboard",
            "can_run": true,
            "error_string": ""
        }
    ]
}
```

This means we can't simply check if `item` is in
`_ceph_mgr_modules.disabled_modules`.

The idea here is to use the filter `map(attribute='name')` to build a list
of module names when deploying mimic.
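
For illustration, a Python equivalent of that normalisation. The helper name
is hypothetical, and unlike the playbook, which branches on the release,
this sketch checks the type of each entry.

```python
def disabled_module_names(mgr_modules):
    names = []
    for entry in mgr_modules["disabled_modules"]:
        # Mimic returns dicts with a "name" key; earlier releases return
        # plain strings.
        names.append(entry["name"] if isinstance(entry, dict) else entry)
    return names


# Sample data trimmed from the two outputs above.
pre_mimic = {"enabled_modules": ["status"],
             "disabled_modules": ["balancer", "dashboard"]}
mimic = {"enabled_modules": ["status"],
         "disabled_modules": [
             {"name": "balancer", "can_run": True, "error_string": ""},
             {"name": "dashboard", "can_run": True, "error_string": ""},
         ]}
```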

Fixes: #2766
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3abc253fecc91f29c90e23ae95e1b83f8ffd3de6)

ceph-client: do not kill the dummy container

The container runs for 300 sec, then dies and removes itself thanks to
the '--rm' option, so there is no point in removing it. Also, this is
causing failures under some circumstances.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 63658c05c7609894f9e2b72c83ba985ea65b97ec)

ceph-defaults: add default application to pool

We now add a default 'rbd' application type to each pool we create. This
will remove the warning: " application not enabled on N pool(s) "

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 103c279c218ae654ec9ced29c1aef54eb8b59990)

ceph-mds: enable application pool

We now enable the application type 'cephfs' for each cephfs pool we
create.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a6294089679729b12022052381d1aa6cf9978df7)

Enable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
(cherry picked from commit 1d454b611f9ec5403a474fcb45a6333ca6d36715)

ceph-mon: Generate initial keyring

Minor fix so that initial keyring can be generated using python3.

Signed-off-by: Ha Phan <thanhha.work@gmail.com>
(cherry picked from commit a7b7735b6fd23985d24a492f1bf4c5be7f1961b2)

systemd: remove changed_when: false

When using a module there is no need to apply this Ansible option. The
module will handle the idempotency on its own. So the module decides
whether or not the task has changed during the execution.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f6239972716b013bafcb9313c9c83723615aa7d6)

# Conflicts:
# roles/ceph-iscsi-gw/tasks/container/containerized.yml