git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

]> git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log

Andrew Schoen [Wed, 8 Aug 2018 22:12:30 +0000 (17:12 -0500)]

tests: adds a testing scenario for lv-create and lv-teardown

Using an explicitly named testing environment name allows us to have a
specific [testenv] block for this test. This greatly simplifies how it will
work as it doesn't really anything from the ceph cluster tests.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Wed, 8 Aug 2018 22:04:29 +0000 (17:04 -0500)]

lv-teardown: fail silently if lv_vars.yml is not found

This allows user to opt out of using lv_vars.yml and load configuration
from other sources.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Wed, 8 Aug 2018 22:04:07 +0000 (17:04 -0500)]

lv-teardown: set become: true at the playbook level

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Wed, 8 Aug 2018 21:49:34 +0000 (16:49 -0500)]

lv-create: fail silenty if lv_vars.yml is not found

If a user decides to to use the lv_vars.yml file then it should fail
silenty so that configuration can be picked up from other places.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Wed, 8 Aug 2018 21:48:42 +0000 (16:48 -0500)]

lv-create: set become: true at the playbook level

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Wed, 8 Aug 2018 21:43:55 +0000 (16:43 -0500)]

lv-create: use the template module to write log file

The copy module will not expand the template and render the variables
included, so we must use template.

Creating a temp file and using it locally means that you must run the
playbook with sudo privledges, which I don't think we want to require.
This introduces a logfile_path variable that the user can use to control
where the logfile is written to, defaulting to the cwd.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Neha Ojha [Tue, 7 Aug 2018 20:08:38 +0000 (20:08 +0000)]

infrastructure-playbooks/vars/lv_vars.yaml: minor fixes

Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Neha Ojha [Tue, 7 Aug 2018 16:54:29 +0000 (16:54 +0000)]

infrastructure-playbooks/lv-create.yml: use tempfile to create logfile

Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Neha Ojha [Mon, 6 Aug 2018 18:14:37 +0000 (18:14 +0000)]

infrastructure-playbooks/lv-create.yml: add lvm_volumes to suggested paste

Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Neha Ojha [Mon, 6 Aug 2018 17:40:58 +0000 (17:40 +0000)]

infrastructure-playbooks/lv-create.yml: copy without using a template file

Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Neha Ojha [Fri, 3 Aug 2018 20:32:58 +0000 (20:32 +0000)]

infrastructure-playbooks/lv-create.yml: don't use action to copy

Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Neha Ojha [Fri, 3 Aug 2018 20:08:31 +0000 (20:08 +0000)]

infrastructure-playbooks: standardize variable usage with a space after brackets

Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Neha Ojha [Fri, 3 Aug 2018 19:17:13 +0000 (19:17 +0000)]

vars/lv_vars.yaml: remove journal_device

Signed-off-by: Neha Ojha <nojha@redhat.com>

commit | commitdiff | tree

Ali Maredia [Tue, 24 Jul 2018 13:33:09 +0000 (13:33 +0000)]

infrastructure-playbooks: playbooks for creating LVs for bucket indexes and journals

These playbooks create and tear down logical
volumes for OSD data on HDDs and for a bucket index and
journals on 1 NVMe device.

Users should follow the guidelines set in var/lv_vars.yaml

After the lv-create.yml playbook is run, output is
sent to /tmp/logfile.txt for copy and paste into
osds.yml

Signed-off-by: Ali Maredia <amaredia@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 13 Aug 2018 13:59:25 +0000 (15:59 +0200)]

rolling_update: register container osd units

Before running the upgrade, let's call systemd to collect unit names
instead of relaying on the device list. This is more accurate and fix
the osd_auto_discovery scenario too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 13 Aug 2018 13:54:37 +0000 (15:54 +0200)]

Revert "osd: generate device list for osd_auto_discovery on rolling_update"

This reverts commit e84f11e99ef42057cd1c3fbfab41ef66cda27302.

This commit was giving a new failure later during the rolling_update
process. Basically, this was modifying the list of devices and started
impacting the ceph-osd itself. The modification to accomodate the
osd_auto_discovery parameter should happen outside of the ceph-osd.

Also we are trying to not play ceph-osd role during the rolling_update
process so we can speed up the upgrade.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Jeffrey Zhang [Mon, 13 Aug 2018 05:23:48 +0000 (13:23 +0800)]

Use /var/lib/ceph/osd folder to filter osd mount point

In some case, use may mount a partition to /var/lib/ceph, and umount
it will be failure and no need to do so too.

Signed-off-by: Jeffrey Zhang <zhang.lei.fly@gmail.com>

commit | commitdiff | tree

Markos Chandras [Mon, 13 Aug 2018 12:05:05 +0000 (15:05 +0300)]

roles: ceph-defaults: Set ceph_uid on SUSE distributions

The ceph_uid is also '167' on SUSE systems so extend the existing task.

Signed-off-by: Markos Chandras <mchandras@suse.de>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 7 Aug 2018 12:46:07 +0000 (14:46 +0200)]

mgr: backward compatibility for module management

Follow up on 3abc253fecc91f29c90e23ae95e1b83f8ffd3de6

The structure had even changed within `luminous` release.
It was first:

```
{
    "enabled_modules": [
        "balancer",
        "dashboard",
        "restful",
        "status"
    ],
    "disabled_modules": [
        "influx",
        "localpool",
        "prometheus",
        "selftest",
        "zabbix"
    ]
}
```
Then it changed for:

```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      "balancer",
      "dashboard",
      "influx",
      "localpool",
      "prometheus",
      "restful",
      "selftest",
      "zabbix"
  ]
}
```

and finally:
```
{
  "enabled_modules": [
      "status"
  ],
  "disabled_modules": [
      {
          "name": "balancer",
          "can_run": true,
          "error_string": ""
      },
      {
          "name": "dashboard",
          "can_run": true,
          "error_string": ""
      }
  ]
}
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 9 Aug 2018 09:23:07 +0000 (11:23 +0200)]

validate: fail if fqdn deployment attempted

fqdn configuration possibility caused a lot of trouble, it's adding a
lot of complexity because of multiple cases and the relation between
ceph-ansible and ceph-container. Moreover, there is no benefit for such
a feature.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 9 Aug 2018 09:03:32 +0000 (11:03 +0200)]

config: ensure rgw section has the correct name

the ceph.conf.j2 always assumes the hostname used to register the
radosgw in the servicemap is equivalent to `{{ ansible_hostname }}`
which returns the shortname form.

We need to detect which form of the hostname was used in case of already
deployed cluster and update the ceph.conf accordingly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1580408
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 9 Aug 2018 08:44:34 +0000 (10:44 +0200)]

config: clean template, remove useless conditions

there is no need to have all these conditions.

for instance, assuming `mds_group_name` is set to 'mdss':

  - `if groups[mds_group_name] is defined` checks if `'mdss'` is present in `{{ groups }}`

  - `if {{ mds_group_name }} in group_names` checks if the current node is part
  the group `'mdss'`

  - `if inventory_hostname in groups.get(mds_group_name, [])` checks if
  the current node is part of the group 'mdss'

The third condition is enough to cover the need of ensuring we are
running on a mds node.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 9 Aug 2018 09:29:13 +0000 (11:29 +0200)]

doc: update ansible supported version matrix.

Closes: #2989
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 10 Aug 2018 09:08:14 +0000 (11:08 +0200)]

mon: fix calamari initialisation

If calamari is already installed and ceph has been upgraded to a higher
version the initialisation will fail later. So if we detect the
calamari-server is too old compare to ceph_rhcs_version we try to update
it.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1601755
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Thu, 9 Aug 2018 15:40:16 +0000 (10:40 -0500)]

lvm: fix condition when selecting which scenario to run

devices and lvm_volumes will always be defined, so we need to instead
check it's length before deciding to run the scenario.

This fixes the failure here:
https://2.jenkins.ceph.com/job/ceph-ansible-prs-luminous-bluestore_lvm_osds/86/consoleFull#1667273050b5dd38fa-a56e-4233-a5ca-584604e56e3a

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Sébastien Han [Thu, 9 Aug 2018 13:18:34 +0000 (15:18 +0200)]

osd: generate device list for osd_auto_discovery on rolling_update

rolling_update relies on the list of devices when performing the restart
of the OSDs. The task that is builind the devices list out of the
ansible_devices dict only runs when there are no partitions on the
drives. However during an upgrade the OSD are already configured, they
have been prepared and have partitions so this task won't run and thus
the devices list will be empty, skipping the restart during
rolling_update. We now run the same task under different requirements
when rolling_update is true and build a list when:

* osd_auto_discovery is true
* rolling_update is true
* ansible_devices exists
* no dm/lv are part of the discovery
* the device is not removable
* the device has more than 1 sector

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1613626
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Tue, 7 Aug 2018 16:52:13 +0000 (11:52 -0500)]

updates group_vars/osds.yml.sample to inlude crush_device_class

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Tue, 7 Aug 2018 16:47:53 +0000 (11:47 -0500)]

ceph-volume: docs for "lvm batch" support

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Mon, 6 Aug 2018 21:20:07 +0000 (16:20 -0500)]

tests: adds crush_device_class to lvm-batch scenario

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Mon, 6 Aug 2018 20:14:53 +0000 (15:14 -0500)]

ceph-osd: adds crush_device_class config option

This is used with the lvm osd scenario. When using devices you need the
option to set the crush device class for all of the OSDs that are
created from those devices.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Andrew Schoen [Fri, 3 Aug 2018 16:15:58 +0000 (11:15 -0500)]

ceph-volume: implement the 'lvm batch' subcommand

This adds the action 'batch' to the ceph-volume module so that we can
run the new 'ceph-volume lvm batch' subcommand. A functional test is
also included.

If devices is defind and osd_scenario is lvm then the 'ceph-volume lvm
batch' command will be used to create the OSDs.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 27 Jul 2018 15:46:38 +0000 (17:46 +0200)]

rgw: ability to use ceph-ansible vars into containers

Since the container now simply reads the ceph.conf, we remove all the
unnecessary options.

Also this PR is the foundation to support multiple backend, such as the
new 'beast' from Ceph Mimic.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 7 Aug 2018 12:53:04 +0000 (14:53 +0200)]

rgw: remove unused file

copy_configs.yml was not including and is a leftover so let's remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 7 Aug 2018 12:15:23 +0000 (14:15 +0200)]

rgw: remove useless condition

The include does not need a condition on containerized_deployment since
we are already in an include than has the same condition.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Graeme Gillies [Mon, 30 Jul 2018 23:24:21 +0000 (09:24 +1000)]

Allow mgr bootstrap keyring to be defined

In environments where we wish to have manual/greater control over
how the bootstrap keyrings are used, we need to able to externally
define what the mgr keyring secret will be and have ceph-ansible
use it, instead of it being autogenerated

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610213
Signed-off-by: Graeme Gillies <ggillies@akamai.com>

commit | commitdiff | tree

Sébastien Han [Wed, 8 Aug 2018 13:51:34 +0000 (15:51 +0200)]

Resync rhcs_edits.txt

We were missing an option so let's add it back.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Christian Berendt [Tue, 7 Aug 2018 16:07:33 +0000 (18:07 +0200)]

docs: overall improvements

* PR -> pull request
* add 2018 to copyright
* add link to OFTC
* add missing ``
* add / remove some newlines
* ansible -> Ansible, ceph -> Ceph, docker -> Docker, jenkins -> Jenkins,
osd -> OSD, python -> Python
* fix reference syntax
* improve some titles

Signed-off-by: Christian Berendt <berendt@b1-systems.de>

commit | commitdiff | tree

Sébastien Han [Tue, 7 Aug 2018 11:38:36 +0000 (13:38 +0200)]

test: follow up on osd_crush_location for containers

This was fixed by
https://github.com/ceph/ceph-ansible/commit/578aa5c2d54a680912e4e015b6fb3dbbc94d4fd0
on non-container, we need to apply the same fix for containers.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Tue, 7 Aug 2018 11:36:44 +0000 (13:36 +0200)]

test: remove osd_crush_location from shrink scenarios

This is not needed since this is already covered by docker_cluster and
centos_cluster scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Christian Berendt [Tue, 7 Aug 2018 08:56:59 +0000 (10:56 +0200)]

docs: overall syntax improvements

* add some missing dots and ``
* add/remove line breaks
* consistent use of shell prompt in consoles outpus
* fix block indents Bearbeiten
* use code blocks

Signed-off-by: Christian Berendt <berendt@b1-systems.de>

commit | commitdiff | tree

Artur Fijalkowski [Thu, 2 Aug 2018 11:28:44 +0000 (13:28 +0200)]

Fix in regular expression matching OSD ID on non-contenerized
deployment.
restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do it the scripts loops the list of ceph-osd@ services in the
system. This commit fixes bug in the regular expression responsile for
extraction of OSDs - prior version uses `[0-9]{1,2}` expression
which is ignoring all OSDS which numbers are greater than 99 (thus
longer than 2 digits). Fix removed upper limit of digits in the number.
This problem existed in two places in the script.

Closes: #2964
Signed-off-by: Artur Fijalkowski <artur.fijalkowski@ing.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 2 Aug 2018 09:58:47 +0000 (11:58 +0200)]

iscsigw: install ceph-iscsi-cli package

Install ceph-iscsi-cli in order to provide the `gwcli` command tool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602785
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 31 Jul 2018 13:18:28 +0000 (15:18 +0200)]

defaults: backward compatibility with fqdn deployments

This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployment to be done with shortname, we
must keep backward compatibility with clusters already deployed with
fqdn configuration

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 30 Jul 2018 16:29:00 +0000 (18:29 +0200)]

config: enforce socket name

This was introduced by
https://github.com/ceph/ceph/commit/59ee2e8d3b14511e8d07ef8325ac8ca96e051784
and made our socket checks impossible to run. The PID could be found,
but the cctid cannot.
This happens during upgrade to mimic and on cluster running on mimic.

So let's force the admin socket the way it was so we can properly check
for existing instances also the line $cluster-$name.$pid.$cctid.asok
is only needed when running multiple instances of the same daemon,
thing ceph-ansible cannot do at the time of writing

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Mike Christie [Thu, 26 Jul 2018 18:52:44 +0000 (13:52 -0500)]

igw: do not fail purge on rbd removal errors

Instead of failing the entire purge operation when the rbd command fails
just log an error. This will allow the higher level target and config
cleanup to complete, and the user only has to manually delete the rbd
images.

Signed-off-by: Mike Christie <mchristi@redhat.com>

commit | commitdiff | tree

Mike Christie [Wed, 25 Jul 2018 18:13:17 +0000 (13:13 -0500)]

igw: fix image removal during purge

We were not passing in the ceph conf info into the rbd image removal
command, so if the clustername was not the default igw purge would fail
due to the rbd rm command failing.

This just fixes the bug by passing in the ceph conf info which has the
clustername to use.

This fixes Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1601949

Signed-off-by: Mike Christie <mchristi@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 27 Jul 2018 14:50:40 +0000 (16:50 +0200)]

restapi: disable it when ceph version > luminous

ceph-rest-api binary has been removed in mimic so we cannot deploy it
anymore. We just keep the role and the compability for existing users.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 27 Jul 2018 14:52:19 +0000 (16:52 +0200)]

osd: do not remove expose_partition container

The container runs with --rm which means it will be deleted by Docker
when exiting. Also 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Wed, 25 Jul 2018 14:39:35 +0000 (16:39 +0200)]

site: report ceph -s status at the end of the deployment

We now show the output of 'ceph -s'. Example output below:

TASK [display post install message] **********************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "  cluster:",
        "    id:     753212df-f32a-4cc9-a097-2db6fe89a251",
        "    health: HEALTH_OK",
        " ",
        "  services:",
        "    mon: 1 daemons, quorum ceph-nano-lul-faa32aebf00b",
        "    mgr: ceph-nano-lul-faa32aebf00b(active)",
        "    osd: 1 osds: 1 up, 1 in",
        " ",
        "  data:",
        "    pools:   4 pools, 32 pgs",
        "    objects: 224 objects, 2546 bytes",
        "    usage:   1027 MB used, 9212 MB / 10240 MB avail",
        "    pgs:     32 active+clean",
        " "
    ]
}

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1602910
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 26 Jul 2018 14:43:35 +0000 (16:43 +0200)]

tests: leave an OSD node in default crush root

jewel used to create a default `rbd` pool in the default crush root
`default`, we need to have at least 1 osd to satisfy the PGs for this
created pool, otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 26 Jul 2018 09:43:29 +0000 (11:43 +0200)]

iscsigw: do not run common roles when deploying jewel

Let's not deploy common roles when iscsigw nodes for jewel deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 25 Jul 2018 21:57:38 +0000 (23:57 +0200)]

rbd-mirror: bring back compatibility with jewel deployment

rbd-mirror can't start when deploying jewel because it needs admin
keyring.
Getting back this task brings backward compatibility for jewel
deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 25 Jul 2018 16:12:06 +0000 (18:12 +0200)]

ceph-osds: backward compatibility with jewel for osp pools creation

If we want to be backward compatible with release prior to luminous, we
have to set the rule name accordingly to default values used in jewel.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 26 Jul 2018 13:27:21 +0000 (15:27 +0200)]

client: fix an incorrect title in a task

This task would be run on both containerized *and* non containerized
deployment.
Let's have a proper title to avoid confusion.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Arata Notsu [Wed, 25 Jul 2018 06:17:33 +0000 (15:17 +0900)]

site.yml.sample: fix install python2

Check `systempython2.stat` instead of `systempython2.stat.exists`.

Without this change, in the case that python2 is not installed, the `stat`
task fails without defining `systempython2.stat`. It leads that the next
installation tasks fail because of undefined `systempython2.stat`.

An example error output (edited for readability):

```
TASK [check for python2] ***********************************************
Wednesday 25 July 2018  14:52:47 +0900 (0:00:00.182)       0:00:00.182 *
fatal: [ceph-osd1.vlan221.vtj]: FAILED! => {
"changed": false, "module_stderr": "/bin/sh: 1: /usr/bin/python: not
found\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 127}
...ignoring

TASK [install python2 for debian based systems] ************************
Wednesday 25 July 2018  14:51:00 +0900 (0:00:01.742)       0:00:01.926 *
fatal: [ceph-mon2]: FAILED! => {
"msg": "The conditional check 'systempython2.stat.exists is undefined or
systempython2.stat.exists == false' failed. The error was: error while
evaluating conditional (systempython2.stat.exists is undefined or
systempython2.stat.exists == false): 'dict object' has no attribute 'stat'
\n\n The error appears to have been in
'/Users/arata/git/ceph-ansible/site.yml.sample': line 36, column 7, but
may\n be elsewhere in the file depending on the exact syntax problem.\n\n
The offending line appears to be:\n\n\n
    - name: install python2 for debian based systems\n
      ^ here\n
"}
...ignoring
```

Fixes: #2930
Signed-off-by: Arata Notsu <arata776@gmail.com>

commit | commitdiff | tree

Sébastien Han [Tue, 24 Jul 2018 16:27:12 +0000 (18:27 +0200)]

rgw: add more config option for civetweb frontend

In containerized deployments we now inherite from the
radosgw_civetweb_options options when bootstrapping the container.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1582411
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Giulio Fidente [Tue, 24 Jul 2018 16:04:23 +0000 (18:04 +0200)]

Run creation of empty rados index object to first monitor

When distributing ceph-nfs role, creation of rados index object
fails as it assumes availability of client.admin locally.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1607970
Signed-off-by: Giulio Fidente <gfidente@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 24 Jul 2018 14:55:33 +0000 (16:55 +0200)]

main: update requirements.txt

update requirements.txt accordingly to the last ansible version tested
on master.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 23 Jul 2018 14:02:43 +0000 (16:02 +0200)]

validate: add checks for interfaces

Check if the interface provided:

* exists in the gathered facts (thus on the system)
* is active
* has an IP address (depending on ip_version )

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600227
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 23 Jul 2018 12:56:20 +0000 (14:56 +0200)]

rolling_update: set osd sortbitwise

upgrade RHCS 2 -> RHCS 3 will fail if cluster has still set
sortnibblewise,
it stay stuck on "TASK [waiting for clean pgs...]" as RHCS 3 osds will
not start if nibblewise is set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 24 Jul 2018 14:35:42 +0000 (16:35 +0200)]

tests: followup on b89cc1746f

Update network subnets in group_vars/all

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 23 Jul 2018 14:40:49 +0000 (16:40 +0200)]

tests: do not deploy all daemons for shrink osds scenarios

Let's create a dedicated environment for these scenarios, there is no
need to deploy everything.
By the way, doing so will save some times.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 19 Jul 2018 12:39:42 +0000 (14:39 +0200)]

tests: test master against ansible 2.6

Ansible 2.4 is currently end-of-life.
Ansible 2.5 will go end-of-life after Ansible 2.7 is released.

Fixes: #2901
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 19 Jul 2018 21:38:08 +0000 (23:38 +0200)]

tests: add support of 'ooo-collocation' scenario when testing against ceph dev

The group_vars/all file is not available on 'ooo-collocation' scenario,
it's making the `dev_setup.yml` failing because this path is hardcoded.

The idea here is to check if the pattern 'ooo-collocation' is present in
`change_dir` variable so we can set this path properly according to the
scenario being run.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 18 Jul 2018 09:07:49 +0000 (11:07 +0200)]

tests: support update scenarios in test_rbd_mirror_is_up()

`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes the `ceph_stable_release` is still set to the value of the
original ceph release, it means it won't enter in the right part of the
condition and fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 18 Jul 2018 15:38:09 +0000 (17:38 +0200)]

tests: stop hardcoding ansible version

In addition to ceph/ceph-build#1082

Let's set the ansible version in each ceph-ansible branch's respective
requirements.txt.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Wed, 18 Jul 2018 15:46:27 +0000 (17:46 +0200)]

validate: only run osd test on osd node

Do not run device validation on every hosts, only on OSD nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 18 Jul 2018 13:54:33 +0000 (15:54 +0200)]

docs: update doc

- stable-2.1 and stable-2.2 shouldn't be referenced anymore.
- add stable-3.1 branch reference
- update the differents ansible version supported

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Wed, 18 Jul 2018 14:20:47 +0000 (16:20 +0200)]

shrink-osd: purge osd on containerized deployment

Prior to this commit we were only stopping the container, but now we
also purge the devices.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572933
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Wed, 18 Jul 2018 13:59:22 +0000 (15:59 +0200)]

valide: improve device check

We know make sure that:

* devices are actually block special files
* length of dedicated_device is identical to devices

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 17 Jul 2018 08:47:28 +0000 (10:47 +0200)]

tests: add latest-bis-jewel for jewel tests

since no latest-bis-jewel exists, it's using latest-bis which points to
ceph mimic. In our testing, using it for idempotency/handlers tests
means upgrading from jewel to mimic which is not what we want do.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 13 Jul 2018 08:10:51 +0000 (10:10 +0200)]

nfs: change default stable branch for nfs-ganesha repo

Since `V2.6-stable` is available and has packages for `mimic`, let's
update this default value accordingly so nfs nodes can be deployed with
mimic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Fri, 1 Jun 2018 09:32:00 +0000 (11:32 +0200)]

tests: followup on #2656

34f70428 has introduced a fix using `command` module while this could
have been achieved by using `lvol` module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Thu, 12 Jul 2018 09:31:00 +0000 (11:31 +0200)]

validate: force ansible version

We currently only support Ansible 2.4.X so let's fail if the version is
different.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 11 Jul 2018 14:03:10 +0000 (16:03 +0200)]

client: do not rely on copy_admin_key to import keys

Relying on `copy_admin_key` to import created keys on client nodes makes
us obliged to copy admin key on those nodes which is not something we might
want.
We should use the fact `condition_copy_admin_key` which will be set to
`True` when the delegated node is a mon which means we can import keys
without taking care of admin keyring.

Fixes: #2867
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Andy McCrae [Thu, 12 Jul 2018 11:24:15 +0000 (12:24 +0100)]

Sync config_template with upstream for Ansible 2.6

The original_basename option in the copy module changed to be
_original_basename in Ansible 2.6+, this PR resyncs the config_template
module to allow this to work with both Ansible 2.6+ and before.

Additionally, this PR removes the _v1_config_template.py file, since
ceph-ansible no longer supports versions of Ansible before version 2,
and so we shouldn't continue to carry that code.

Closes: #2843
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>

commit | commitdiff | tree

Guillaume Abrioux [Wed, 11 Jul 2018 14:34:09 +0000 (16:34 +0200)]

mgr: fix condition to add modules to ceph-mgr

Follow up on #2784

We must check in the generated fact `_disabled_ceph_mgr_modules` to
enable disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600155
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Thu, 12 Jul 2018 12:10:15 +0000 (14:10 +0200)]

Update issue templates

Introduce templates for issues and feature requests.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Tue, 10 Jul 2018 09:56:17 +0000 (11:56 +0200)]

mon: ensure socker is purged when mon is stopped

On containerized deployment, if a mon is stopped, the socket is not
purged and can cause failure when a cluster is redeployed after the
purge playbook has been run.

Typical error:

```
fatal: [osd0]: FAILED! => {}

MSG:

'dict object' has no attribute 'osd_pool_default_pg_num'
```

the fact is not set because of this previous failure earlier:

```
ok: [mon0] => {
    "changed": false,
    "cmd": "docker exec ceph-mon-mon0 ceph --cluster test daemon mon.mon0 config get osd_pool_default_pg_num",
    "delta": "0:00:00.217382",
    "end": "2018-07-09 22:25:53.155969",
    "failed_when_result": false,
    "rc": 22,
    "start": "2018-07-09 22:25:52.938587"
}

STDERR:

admin_socket: exception getting command descriptions: [Errno 111] Connection refused

MSG:

non-zero return code
```

This failure happens when the ceph-mon service is stopped, indeed, since
the socket isn't purged, it's a leftover which is confusing the process.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 9 Jul 2018 13:50:52 +0000 (15:50 +0200)]

common: switch from docker module to docker_container

As of ansible 2.4, `docker` module has been removed (was deprecated
since ansible 2.1).
We must switch to `docker_container` instead.

See: https://docs.ansible.com/ansible/latest/modules/docker_module.html#docker-module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 9 Jul 2018 09:51:24 +0000 (11:51 +0200)]

tests: fix `_get_osd_id_from_host()` in TestOSDs()

We must initialize `children` variable in `_get_osd_id_from_host()`,
otherwise, if for any reason the deployment has failed and result with
an osd host with no OSD registered, we won't enter in the condition,
therefore, `children` is never set and the function tries to return
something undefined.

Typical error:
```
E UnboundLocalError: local variable 'children' referenced before assignment
```

Fixes: #2860
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Shilpa Jagannath [Tue, 3 Jul 2018 09:43:07 +0000 (15:13 +0530)]

Remove zone from zonegroup and update period before deleting the zone to avoid inconsistent period information across other zones.

When you delete a zone without removing from zonegroup, the period update would
fail since that command needs to load the zone and zonegroup to be able to
update the master. Period update would fail with an error like this:

radosgw-admin period update --commit
-1 Cannot find zone id= (name=), switching to local zonegroup configuration
-1 Cannot find zone id= (name=)

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 3 Jul 2017 12:52:07 +0000 (14:52 +0200)]

common: remove hdparm

As of Kraken, the journal code does not use the hdparm command anymore
so we can remove it from our package dependency list.

Fixes: https://github.com/ceph/ceph-ansible/issues/1402
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit f6910efa24389c264062963b2054c7cd29ffebb3)

commit | commitdiff | tree

Guillaume Abrioux [Fri, 6 Jul 2018 08:57:32 +0000 (10:57 +0200)]

tox: test mimic deployment

Let's try to deploy mimic.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit d2cdfd830c339738b43e6a4417ca5d41fe416f86)

commit | commitdiff | tree

Guillaume Abrioux [Thu, 31 May 2018 10:02:26 +0000 (12:02 +0200)]

tests: refact ci testing master

We should test ceph-ansible against the latest ansible stable version on
master.

This commit also remove the pinning to 1.7.1 version of testinfra
because ansible 2.5 requires a newer version.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Guillaume Abrioux [Thu, 5 Jul 2018 13:16:19 +0000 (15:16 +0200)]

tests: add mimic support for test_rbd_mirror_is_up()

prior mimic, the data structure returned by `ceph -s -f json` used to
gather information about rbd-mirror daemons looked like below:

```
  "servicemap": {
    "epoch": 8,
    "modified": "2018-07-05 13:21:06.207483",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "ceph-nano-luminous-faa32aebf00b": {
            "start_epoch": 8,
            "start_stamp": "2018-07-05 13:21:04.668450",
            "gid": 14107,
            "addr": "172.17.0.2:0/2229952892",
            "metadata": {
              "arch": "x86_64",
              "ceph_version": "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)",
              "cpu": "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-nano-luminous-faa32aebf00b",
              "instance_id": "14107",
              "kernel_description": "#1 SMP Wed Mar 14 15:12:16 UTC 2018",
              "kernel_version": "4.9.87-linuxkit-aufs",
              "mem_swap_kb": "1048572",
              "mem_total_kb": "2046652",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This part has changed from mimic and became:
```
  "servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
      "rbd-mirror": {
        "daemons": {
          "summary": "",
          "14151": {
            "start_epoch": 2,
            "start_stamp": "2018-07-04 09:54:35.541272",
            "gid": 14151,
            "addr": "192.168.1.80:0/240942528",
            "metadata": {
              "arch": "x86_64",
              "ceph_release": "mimic",
              "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
              "ceph_version_short": "13.2.0",
              "cpu": "Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz",
              "distro": "centos",
              "distro_description": "CentOS Linux 7 (Core)",
              "distro_version": "7",
              "hostname": "ceph-rbd-mirror0",
              "id": "ceph-rbd-mirror0",
              "instance_id": "14151",
              "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
              "kernel_version": "3.10.0-862.2.3.el7.x86_64",
              "mem_swap_kb": "1572860",
              "mem_total_kb": "1015548",
              "os": "Linux"
            }
          }
        }
      }
    }
  }
```

This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Sébastien Han [Thu, 5 Jul 2018 12:10:33 +0000 (14:10 +0200)]

ceph-config: do not log cluster log on container

The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:

2018-06-19 13:44:01.542990 7ff75b024700 1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory

So we now tell the mon to not log cluster log on the filesystem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Wed, 4 Jul 2018 14:39:33 +0000 (16:39 +0200)]

ceph-common: fix rhcs condition

We forgot to add mgr_group_name when checking for the mon repo, thus the
conditional on the next task was failing.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1598185
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Christian Berendt [Wed, 4 Jul 2018 08:35:20 +0000 (10:35 +0200)]

docs: make "OSD Configuration" a subsection

"OSD Configuration" should be part of the "Configuration and Usage" section.

Signed-off-by: Christian Berendt <berendt@b1-systems.de>

commit | commitdiff | tree

Christian Berendt [Wed, 4 Jul 2018 08:42:34 +0000 (10:42 +0200)]

docs: change github/Github to GitHub

Signed-off-by: Christian Berendt <berendt@b1-systems.de>

commit | commitdiff | tree

Christian Berendt [Wed, 4 Jul 2018 08:39:52 +0000 (10:39 +0200)]

docs: use apt instead of apt-get

Signed-off-by: Christian Berendt <berendt@b1-systems.de>

commit | commitdiff | tree

Guillaume Abrioux [Mon, 18 Jun 2018 15:26:21 +0000 (17:26 +0200)]

mgr: fix enabling of mgr module on mimic

The data structure has slightly changed on mimic.

Prior to mimic, it used to be:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        "balancer",
        "dashboard",
        "influx",
        "localpool",
        "prometheus",
        "restful",
        "selftest",
        "zabbix"
    ]
}
```

From mimic it looks like this:

```
{
    "enabled_modules": [
        "status"
    ],
    "disabled_modules": [
        {
            "name": "balancer",
            "can_run": true,
            "error_string": ""
        },
        {
            "name": "dashboard",
            "can_run": true,
            "error_string": ""
        }
    ]
}
```

This means we can't simply check if `item` is in `item in
_ceph_mgr_modules.disabled_modules`

the idea here is to use filter `map(attribute='name')` to build a list
when deploying mimic.

Fixes: #2766
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>

commit | commitdiff | tree

Vishal Kanaujia [Wed, 13 Jun 2018 10:14:52 +0000 (15:44 +0530)]

Rolling upgrades: Migrate to ceph-key module

This change moves ceph-mgr upgrades to using ceph-key library.
Fixes: #2758
Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>

commit | commitdiff | tree

Sébastien Han [Fri, 29 Jun 2018 10:10:16 +0000 (12:10 +0200)]

ceph-client: do not kill the dummy container

The container runs for 300 sec, then dies and removes itself thanks to
the '--rm' option, so there is no point of removing it. Also this is
causing failure under some circonstances.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Mon, 2 Jul 2018 14:07:10 +0000 (16:07 +0200)]

ci: remove DCO

We know a Signed-off-by check inside our pipeline so this bot is not
needed anymore.

Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 29 Jun 2018 09:48:01 +0000 (11:48 +0200)]

ceph-mds: enable application pool

We now enable the application type 'cephfs' for each cephfs pools we
create.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Sébastien Han [Fri, 29 Jun 2018 09:46:56 +0000 (11:46 +0200)]

ceph-defaults: add default application to pool

We now add a default 'rbd' application type to each pool we create. This
will remove the warning: " application not enabled on N pool(s) "

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590275
Signed-off-by: Sébastien Han <seb@redhat.com>

commit | commitdiff | tree

Vasu Kulkarni [Tue, 26 Jun 2018 21:41:14 +0000 (14:41 -0700)]

Enable monitor repo for mgr nodes and Tools repo for iscsi/nfs/clients

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>

commit | commitdiff | tree

Andy McCrae [Thu, 21 Jun 2018 09:28:27 +0000 (10:28 +0100)]

Sync config_template with upstream

Some fixes have gone into
git.openstack.org/openstack/ansible-config_template to deal with a few
bugs we have run into.

This PR brings the ceph-ansible config_template version up to the same
as the ansible-config_template openstack repo.

Closes: #2742
Signed-off-by: Andy McCrae <andy.mccrae@gmail.com>

commit | commitdiff | tree

Sébastien Han [Thu, 28 Jun 2018 07:54:24 +0000 (09:54 +0200)]

ceph-osd: trigger osd container restart on script change

The script ceph-osd-run.sh holds the config options to start the
container, if one of these options are modified we must restart the
container. This was not the case before becauase the 'notify' flag
wasn't present.

Closing: https://bugzilla.redhat.com/show_bug.cgi?id=1596061
Signed-off-by: Sébastien Han <seb@redhat.com>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom