ceph-ansible.git
7 years ago  client: try to kill dummy container only on first client node  v3.1.0rc9
Guillaume Abrioux [Wed, 13 Jun 2018 11:54:59 +0000 (13:54 +0200)]
client: try to kill dummy container only on first client node

The 'dummy' container is created only on the first client node, so we
must destroy this container only on that node; otherwise we can hit a
failure like the following:
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}

```
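
A minimal sketch of the guarded task (hypothetical task name, assuming the usual `client_group_name` group variable; not the actual role code):

```
- name: remove ceph-create-keys dummy container on the first client node only
  command: docker rm -f ceph-create-keys
  failed_when: false
  when: inventory_hostname == groups.get(client_group_name, []) | first
```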

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1590746
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 51cf3b7fa0211fbbfcbd8c4228dcd39d20f02e54)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.
Konstantin Shalygin [Fri, 8 Jun 2018 18:03:00 +0000 (01:03 +0700)]
ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.

If 'openstack_config' is false this task shouldn't be executed.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
(cherry picked from commit 3a07568496f718ffe44077eac23d7397a17b3b09)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  common: ability to enable/disable fw configuration
Sébastien Han [Mon, 11 Jun 2018 12:51:58 +0000 (14:51 +0200)]
common: ability to enable/disable fw configuration

Prior to this patch, if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operator's consent.
Now you can enable or disable the firewall configuration by setting
configure_firewall to either true or false.
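
A minimal sketch of how the toggle gates a firewalld task (hypothetical task; the `ceph-mon` firewalld service name is an assumption):

```
- name: open ceph-mon service in firewalld
  firewalld:
    service: ceph-mon
    state: enabled
    permanent: true
    immediate: true
  when: configure_firewall | bool
```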

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2e8412734a81077c2b11de316e08bbecbed2de96)

7 years ago  tests: set CEPH_DOCKER_IMAGE_TAG when ceph release is luminous
Guillaume Abrioux [Mon, 11 Jun 2018 13:40:54 +0000 (15:40 +0200)]
tests: set CEPH_DOCKER_IMAGE_TAG when ceph release is luminous

Since `latest` points to mimic for the ceph container images, we need to
set `CEPH_DOCKER_IMAGE_TAG` to `latest-luminous` when the ceph release
is luminous.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a351b0872620e3bfca1a2ef5fb5235f35002b160)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  tests: increase memory to 1024MB for centos7_cluster scenario
Guillaume Abrioux [Mon, 11 Jun 2018 08:49:39 +0000 (10:49 +0200)]
tests: increase memory to 1024MB for centos7_cluster scenario

We see more and more failures like `fatal: [mon0]: UNREACHABLE! => {}`
in the `centos7_cluster` scenario. Since we have 30GB of RAM on the
hypervisors, we can give the monitors a bit more RAM. Note that nodes in
the containerized cluster testing scenario already have 1024MB of memory
allocated.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit bbb869133563c3b0ddd0388727b894306b6b8b26)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  client: keyrings aren't created when single client node  v3.1.0rc8
Guillaume Abrioux [Fri, 8 Jun 2018 06:49:37 +0000 (08:49 +0200)]
client: keyrings aren't created when single client node

Combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` can misbehave when the only
node being run is not the first in the group.

In a deployment with a single client node, the keyring sometimes won't
be created because the task can end up being skipped entirely.
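
A hedged illustration of one way around the pitfall (not necessarily the actual fix): `run_once: true` executes on the first host of the play batch, which may not be the first host of the client group, so explicit delegation is safer. `client.example` is a hypothetical key name:

```
- name: create cephx keyring on the first client node
  command: "{{ docker_exec_cmd }} ceph auth get-or-create client.example"
  delegate_to: "{{ groups[client_group_name] | first }}"
  run_once: true
```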

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 090ecff94e341a674556c0f4f578caa73330a0f0)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  tests: update ooo inventory hostfile  v3.1.0rc7
Guillaume Abrioux [Thu, 7 Jun 2018 14:11:39 +0000 (16:11 +0200)]
tests: update ooo inventory hostfile

Update the inventory host file for the tripleo testing scenario so it
uses the same parameters as the tripleo CI.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 28d21b4e9c13566855844e9d5da71e2c4ef80894)

7 years ago  client: add a default value for keyring file
Guillaume Abrioux [Thu, 7 Jun 2018 13:49:03 +0000 (15:49 +0200)]
client: add a default value for keyring file

Potential error if someone doesn't pass the mode in the `keys` dict for
client nodes:

```
fatal: [client2]: FAILED! => {}

MSG:

The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'

The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: get client cephx keys
  ^ here

exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'

```

Adding a default value avoids failing the deployment because of this.
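
A minimal sketch of the defaulting, assuming each item in `keys` carries `name`, `key` and optionally `mode`; the `0600` fallback is an assumption, not necessarily the value chosen by the actual patch:

```
- name: get client cephx keys
  copy:
    dest: "/etc/ceph/{{ item.name }}.keyring"
    content: "{{ item.key }}"
    mode: "{{ item.mode | default('0600') }}"
  with_items: "{{ keys }}"
```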

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 8a653cacd56553926126d0b43d328af94bbd0337)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  client: use dummy created container when there is no mon in inventory
Guillaume Abrioux [Wed, 6 Jun 2018 11:59:26 +0000 (13:59 +0200)]
client: use dummy created container when there is no mon in inventory

The `docker_exec_cmd` fact set in the client role when there is no
monitor in the inventory is wrong: the `ceph-client-{{ hostname }}`
container is never created, so it will fail anyway.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b156deb67a9e137962161829e008bcc32835fe8)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  tests: improve mds tests
Guillaume Abrioux [Wed, 6 Jun 2018 19:56:38 +0000 (21:56 +0200)]
tests: improve mds tests

The expected number of mds daemons consists of the number of daemons
that are 'up' plus the number of daemons that are 'up:standby'.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c94ada69e80d7a1ddfbd2de2b13086d57a6fdfcd)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  osd: copy openstack keys over to all mon
Guillaume Abrioux [Wed, 6 Jun 2018 17:13:18 +0000 (19:13 +0200)]
osd: copy openstack keys over to all mon

When configuring openstack, the created keyrings aren't copied over to
all monitor nodes.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1588093
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 433ecc7cbcc1ac91cab509dabe5c647d58c18c7f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  rolling_update: fix facts gathering delegation
Guillaume Abrioux [Tue, 5 Jun 2018 14:30:12 +0000 (16:30 +0200)]
rolling_update: fix facts gathering delegation

This is a follow-up to what was done in #2560.
See #2560 and #2553 for details.

Closes: #2708
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 232a16d77ff1048a2d3c4aa743c44e864fa2b80b)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  test: do not always copy admin key
Sébastien Han [Tue, 5 Jun 2018 03:56:55 +0000 (11:56 +0800)]
test: do not always copy admin key

The admin key must be copied to the osd nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where they are executed.

Now we only copy the key when running the shrink-osd scenario.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 41b4632abca51b4f1ab052e8b47d0bebd2e838e8)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  change max_mds default to 1
Patrick Donnelly [Mon, 4 Jun 2018 19:58:57 +0000 (12:58 -0700)]
change max_mds default to 1

Otherwise, with the removal of mds_allow_multimds, the default of 3 will be set
on every new FS.

Introduced by: c8573fe0d745e4667b5d757433efec9dac0150bc

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 91f9da530f139cc6f378d1fc549870cbbc45d460)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  tests: fix rgw tests
Guillaume Abrioux [Tue, 5 Jun 2018 09:26:11 +0000 (11:26 +0200)]
tests: fix rgw tests

41b4632 introduced a change in the functional tests.
Since the admin keyring isn't copied to rgw nodes anymore in tests,
let's use the rgw keyring to run them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47276764f7576b7bf1f258db73f6e09aab77c3b9)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  rgw: refact rgw pools creation
Guillaume Abrioux [Fri, 1 Jun 2018 15:33:54 +0000 (17:33 +0200)]
rgw: refact rgw pools creation

Refactor of 8704144e3157aa253fb7563fe701d9d434bf2f3e.
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated to a monitor node so we don't have to care
whether the admin keyring is present on the rgw node.
Moreover, only one task is needed to create the pools; we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.
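
A hedged sketch of the single delegated task, assuming `rgw_create_pools` maps pool names to a `pg_num` and that `docker_exec_cmd` expands to the right `docker exec` prefix (or to nothing on non-containerized deployments):

```
- name: create rgw pools from a monitor node
  command: "{{ docker_exec_cmd }} ceph --cluster {{ cluster }} osd pool create {{ item.key }} {{ item.value.pg_num }}"
  with_dict: "{{ rgw_create_pools }}"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
  changed_when: false
```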

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1550281
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2cf06b515f3e173aa3720cf28e4149149d881941)

7 years ago  rgws: rename the create_pools variable to rgw_create_pools.
jtudelag [Thu, 31 May 2018 15:01:44 +0000 (17:01 +0200)]
rgws: rename the create_pools variable to rgw_create_pools.

Renamed to be consistent with the role (rgw) and to have a meaningful name.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 600e1e2c2680e8102f4ef17855d4bcd89d6ef733)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Adds RGWs pool creation to containerized installation.
jtudelag [Sun, 4 Mar 2018 22:06:48 +0000 (23:06 +0100)]
Adds RGWs pool creation to containerized installation.

The ceph command has to be executed from one of the monitor containers
when no admin keyring copy is present on the RGWs, so the task has to
be delegated.

Also adds a test to check proper RGW pool creation for Docker container
scenarios.

Signed-off-by: Jorge Tudela <jtudelag@redhat.com>
(cherry picked from commit 8704144e3157aa253fb7563fe701d9d434bf2f3e)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  tests: skip disabling fastest mirror detection on atomic host
Guillaume Abrioux [Tue, 5 Jun 2018 07:31:42 +0000 (09:31 +0200)]
tests: skip disabling fastest mirror detection on atomic host

There is no need to execute this task on atomic hosts.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f0cd4b065144843762b9deca667e05a1903b2121)

7 years ago  ceph-defaults: Enable local epel repository
Erwan Velu [Fri, 1 Jun 2018 16:53:10 +0000 (18:53 +0200)]
ceph-defaults: Enable local epel repository

During the tests, the remote epel repository generates a lot of errors
leading to broken jobs (issue #2666).

This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl pointing to our local http mirror.

That should speed up the build process and also avoid the random errors
we face.

This patch is part of a series that tries to remove all possible yum
failures.

Signed-off-by: Erwan Velu <erwan@redhat.com>
(cherry picked from commit 493f615eae3510021687e8cfc821364cc26a71ac)

7 years ago  Fix template reference for ganesha.conf
Andy McCrae [Mon, 19 Feb 2018 16:57:18 +0000 (16:57 +0000)]
Fix template reference for ganesha.conf

We can simply reference the template name since it exists within the
role that we are calling. We don't need to check the ANSIBLE_ROLE_PATH
or playbooks directory for the file.

Signed-off-by: Lionel Sausin <ls@initiatives.fr>
7 years ago  ceph-defaults: add the nautilus 14.x entry to ceph_release_num
Andrew Schoen [Thu, 31 May 2018 17:02:46 +0000 (12:02 -0500)]
ceph-defaults: add the nautilus 14.x entry to ceph_release_num

The first 14.x tag has been cut, so this entry needs to be added for
version detection to keep working on the master branch of ceph.

Fixes: https://github.com/ceph/ceph-ansible/issues/2671
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit c2423e2c48f68407e50ec075ec27510f2135f0fa)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  mons: move set_fact of openstack_keys in ceph-osd  v3.1.0rc6
Guillaume Abrioux [Fri, 1 Jun 2018 13:11:21 +0000 (15:11 +0200)]
mons: move set_fact of openstack_keys in ceph-osd

Since openstack_config.yml has been moved to `ceph-osd`, we must move
this `set_fact` into ceph-osd as well; otherwise the tasks in
`openstack_config.yml` using `openstack_keys` will actually use the
default value from `ceph-defaults`.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1585139
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit aae37b44f5f17d14034181d5777226d3a582b42d)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  osds: wait for osds to be up before creating pools
Guillaume Abrioux [Fri, 1 Jun 2018 08:38:46 +0000 (10:38 +0200)]
osds: wait for osds to be up before creating pools

This is a follow-up to #2628.
Even with the openstack pools creation moved later in the playbook,
there is still an issue because not all OSDs are UP when we try to
create pools.

Adding a task which waits for all OSDs to be UP with a `retries/until`
condition should definitively fix this issue.
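
A minimal sketch of such a gate; the JSON key names vary across ceph releases, so the parsing below is an assumption to adapt:

```
- name: wait for all osds to be up
  command: "{{ docker_exec_cmd }} ceph --cluster {{ cluster }} osd stat -f json"
  register: osd_stat
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
  retries: 60
  delay: 10
  until: >-
    (osd_stat.stdout | from_json).num_osds | int > 0 and
    (osd_stat.stdout | from_json).num_osds == (osd_stat.stdout | from_json).num_up_osds
  changed_when: false
```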

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9d5265fe11fb5c1d0058525e8508aba80a396a6b)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  Makefile: followup on #2585
Guillaume Abrioux [Thu, 31 May 2018 09:25:49 +0000 (11:25 +0200)]
Makefile: followup on #2585

Fix a typo in the `tag` target: double quotes are missing.

Without them, the `make tag` command fails like this:

```
if [[ "v3.0.35" ==  ]]; then \
            echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35"; \
            exit 1; \
        fi
/bin/sh: -c: line 0: unexpected argument `]]' to conditional binary operator
/bin/sh: -c: line 0: syntax error near `;'
/bin/sh: -c: line 0: `if [[ "v3.0.35" ==  ]]; then     echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35";     exit 1; fi'
make: *** [tag] Error 2
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0b67f42feb95594fb403908d61383dc25d6cd342)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  Makefile: add "make tag" command
Ken Dreyer [Thu, 10 May 2018 23:08:05 +0000 (17:08 -0600)]
Makefile: add "make tag" command

Add a new "make tag" command. This automates some common operations:

1) Automatically determine the next Git tag version number to create.
   For example:
   "3.2.0beta1" -> "3.2.0beta2"
   "3.2.0rc1" -> "3.2.0rc2"
   "3.2.0" -> "3.2.1"

2) Create the Git tag, and print instructions for the user to push it to
   GitHub.

3) Sanity check that HEAD is a stable-* branch or master (bail on
   everything else).

4) Sanity check that HEAD is not already tagged.

Note, we will still need to tag manually once each time we change the
format, for example when moving from tagging "betas" to tagging "rcs",
or "rcs" to "stable point releases".

Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fcea56849578bd47e65b130ab6884e0b96f9d89d)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  rgw: container add option to configure multi-site zone
Sébastien Han [Mon, 16 Apr 2018 13:57:23 +0000 (15:57 +0200)]
rgw: container add option to configure multi-site zone

You can now set RGW_ZONE and RGW_ZONEGROUP on each rgw host in your
inventory and assign them a value. Once the rgw container starts, it
will pick up the info and add itself to the right zone.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1551637
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1c084efb3cb7e48d96c9cbd6bd05ca4f93526853)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  mon: remove check on pg_num for cephfs_pools  v3.1.0rc5
Guillaume Abrioux [Wed, 30 May 2018 15:04:53 +0000 (17:04 +0200)]
mon: remove check on pg_num for cephfs_pools

It should have been backported from 29a9dff, but for better clarity I
think it's better to create a new commit for this.

c68126d6 aims to not make the `pgs` attribute mandatory for each element
of `cephfs_pools`. Therefore, we must remove the check in
`roles/ceph-mon/tasks/check_mandatory_vars.yml`.
This task was removed by 29a9dff, but I've chosen not to backport that
commit since it's part of a bunch of commits belonging to a PR
implementing the `ceph-validate` role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  mdss: do not make pg_num a mandatory param
Guillaume Abrioux [Tue, 29 May 2018 07:18:08 +0000 (09:18 +0200)]
mdss: do not make pg_num a mandatory param

When playing the ceph-mds role, the mon nodes have already set a fact
with the default pg num for osd pools, so we can simply default to this
value for the cephfs pools (the `cephfs_pools` variable).

At the moment the variable definition for `cephfs_pools` looks like:

```
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "" }
  - { name: "{{ cephfs_metadata }}", pgs: "" }
```

and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.

We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
leave users the possibility to override this value.
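
A hedged sketch of what the defaulting could look like, mirroring the hostvars lookup quoted above:

```
cephfs_pools:
  - name: "{{ cephfs_data }}"
    pgs: "{{ hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'] }}"
  - name: "{{ cephfs_metadata }}"
    pgs: "{{ hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num'] }}"
```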

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1581164
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c68126d6fdde8efbeb6e5c495a0147a34a5dfd0e)

7 years ago  tests: fix broken symlink
Guillaume Abrioux [Mon, 28 May 2018 08:30:42 +0000 (10:30 +0200)]
tests: fix broken symlink

`requirements2.5.txt` is pointing to `tests/requirements2.4.txt` while
it should point to `requirements2.4.txt` since they are in the same
directory.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6f489015e44f8c10ea0ad47b23bada6ff8351f68)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  osds: do not set docker_exec_cmd fact
Guillaume Abrioux [Wed, 30 May 2018 10:09:16 +0000 (12:09 +0200)]
osds: do not set docker_exec_cmd fact

In `ceph-osd` there is no need to set `docker_exec_cmd` since the only
place where this fact is used is `openstack_config.yml`, which delegates
all docker commands to a monitor node. This means we need the
`docker_exec_cmd` fact that refers to the `ceph-mon-*` containers, and
this fact is already set earlier in `ceph-defaults`.

Moreover, when collocating an OSD with a MON it fails because the
container `ceph-osd-{{ ansible_hostname }}` doesn't exist.

Removing this task allows collocating an OSD with a MON.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1584179
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 34e646e767955024c9b186f9eaa61f809fe45af0)

7 years ago  tests: resize root partition when atomic host
Guillaume Abrioux [Wed, 30 May 2018 07:17:09 +0000 (09:17 +0200)]
tests: resize root partition when atomic host

For some time now we have seen failures in the CI for containerized
scenarios because VMs are running out of space at some point.

The default in the images used is to have only 3GB for the root
partition, which isn't a lot.

Typical error seen:

```
STDERR:

failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```

Indeed, on the machine we can see:
```
Every 2.0s: df -h                                        Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```

The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.

```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
  Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
  Logical volume atomicos/root successfully resized.
```

```
-bash-4.2# xfs_growfs /
meta-data=/dev/mapper/atomicos-root isize=512    agcount=4, agsize=192000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=768000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 768000 to 2543616
```

```
-bash-4.2# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  9.7G  1.4G  8.4G  14% /
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 34f70428521ab30414ce8806c7e2967a7387ff00)

7 years ago  tests: avoid yum failures
Guillaume Abrioux [Mon, 28 May 2018 10:02:49 +0000 (12:02 +0200)]
tests: avoid yum failures

In the CI we frequently see failures like the following:

`Failure talking to yum: Cannot find a valid baseurl for repo:
base/7/x86_64`

It seems the fastest mirror detection is sometimes counterproductive and
leads yum to fail.

This fix has been added to `setup.yml`.
Until now this playbook was only played just before `testinfra`, but it
can also be used before running ceph-ansible so we can add some
provisioning tasks.
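
A hypothetical provisioning task for this, assuming the CentOS 7 fastestmirror plugin config path:

```
- name: disable yum fastest mirror detection
  lineinfile:
    path: /etc/yum/pluginconf.d/fastestmirror.conf
    regexp: '^enabled='
    line: 'enabled=0'
  become: true
```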

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Erwan Velu <evelu@redhat.com>
(cherry picked from commit 98cb6ed8f602d9c54b63c5381a17dbca75df6bc2)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  mds: move mds fs pools creation  v3.1.0rc4
Guillaume Abrioux [Fri, 25 May 2018 00:39:01 +0000 (02:39 +0200)]
mds: move mds fs pools creation

When collocating an mds on a monitor node, the cephfs creation fails
because `docker_exec_cmd` is reset to `ceph-mds-monXX`, which is
incorrect: we need to delegate the task to `ceph-mon-monXX`.
In addition, it wouldn't have worked anyway since the `ceph-mds-monXX`
container isn't started yet.

Moving the task earlier in the `ceph-mds` role fixes this issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 608ea947a9a2dcf35b4a9cf43c0ea0486c36dfb5)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  Add privilege escalation to iscsi purge tasks
Paul Cuzner [Fri, 25 May 2018 00:13:20 +0000 (12:13 +1200)]
Add privilege escalation to iscsi purge tasks

Without the escalation, invocation from non-root users will fail when
accessing the rados config object, or when attempting to log to
/var/log.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 2890b57cfc2e1ef9897a791ce60f4a5545011907)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  playbook: follow up on #2553
Guillaume Abrioux [Thu, 24 May 2018 13:07:56 +0000 (15:07 +0200)]
playbook: follow up on #2553

Since we fixed the `gather and delegate facts` task, this exception is
not needed anymore. It's a leftover that should be removed to save some
time when deploying a cluster with a large number of clients.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 828848017cefd981e14ca9e4690dd7d1320f0eef)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  ceph-defaults: move cephfs vars from the ceph-mon role
Andrew Schoen [Mon, 30 Apr 2018 19:21:12 +0000 (14:21 -0500)]
ceph-defaults: move cephfs vars from the ceph-mon role

We're doing this so we can validate these variables in the ceph-validate
role.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
(cherry picked from commit 1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4)

7 years ago  group_vars: resync group_vars
Sébastien Han [Wed, 23 May 2018 19:44:24 +0000 (12:44 -0700)]
group_vars: resync group_vars

The previous commit changed the content of roles/$ROLE/defaults/main.yml
so we have to regenerate the group_vars files.

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3c32280ca1093f6c3abe0038f524ee3b88dd3672)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  mdss: move cephfs pools creation in ceph-mds
Guillaume Abrioux [Wed, 23 May 2018 03:07:38 +0000 (05:07 +0200)]
mdss: move cephfs pools creation in ceph-mds

When deploying a large number of OSD nodes this can be an issue because
the protection check [1] won't pass, since it tries to create pools
before all OSDs are active.

The idea here is to move the cephfs pools creation into the `ceph-mds`
role.

[1] https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5673

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3a0e168a76beaf8fb43c6afa56c6cf2b634a8aa8)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  tests: move cephfs_pools variable
Guillaume Abrioux [Wed, 23 May 2018 02:59:37 +0000 (04:59 +0200)]
tests: move cephfs_pools variable

Let's move this variable to group_vars/all.yml in all testing scenarios,
in line with commit 1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4, so we
keep consistency between the playbook and the tests.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a10e73d78d07179ff20ea7cabc2f2ccd1b1b967f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  osds: move openstack pools creation in ceph-osd
Guillaume Abrioux [Tue, 22 May 2018 14:41:40 +0000 (16:41 +0200)]
osds: move openstack pools creation in ceph-osd

When deploying a large number of OSD nodes this can be an issue because
the protection check [1] won't pass, since it tries to create pools
before all OSDs are active.

The idea here is to move the openstack pools creation to the end of the
`ceph-osd` role.

[1] https://github.com/ceph/ceph/blob/e59258943bcfe3e52d40a59ff30df55e1e6a3865/src/mon/OSDMonitor.cc#L5673

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1578086
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 564a662baf10b9085a6da8c9152400914e310d15)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  defaults: resync sample files with actual defaults
Guillaume Abrioux [Tue, 22 May 2018 14:04:15 +0000 (16:04 +0200)]
defaults: resync sample files with actual defaults

6644dba5e3a46a5a8c1cf7e66b97f7b7d62e8e95 and
1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4 introduced some changes in the
defaults variables files, but it seems we forgot to regenerate the
sample files.
This commit resyncs the content of `all.yml.sample`, `mons.yml.sample`
and `rhcs.yml.sample`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f8260119cd920441fa0b8ae063b3d501899406f7)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  ceph-radosgw: disable NSS PKI db when SSL is disabled
Luigi Toscano [Tue, 22 May 2018 09:46:33 +0000 (11:46 +0200)]
ceph-radosgw: disable NSS PKI db when SSL is disabled

The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.

It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.

Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a122b7ce8ec6a5c27a70bc19589d177.
Now profiles drive the rgw keystone settings.

Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
(cherry picked from commit 43e96c1f98312734e2f12a1ea5ef29981e9072bd)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  rhcs: bump version to 3.0 for stable 3.1
Sébastien Han [Fri, 4 May 2018 23:41:49 +0000 (01:41 +0200)]
rhcs: bump version to 3.0 for stable 3.1

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bf9593bcedea6bd0220eeeb4029c6632f5a8e6f6)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Skip GPT header creation for lvm osd scenario
Vishal Kanaujia [Wed, 16 May 2018 09:58:31 +0000 (15:28 +0530)]
Skip GPT header creation for lvm osd scenario

LVM's lvcreate fails if the disk already has a GPT header. We currently
create a GPT header regardless of the OSD scenario. The fix is to skip
header creation for the lvm scenario.

fixes: https://github.com/ceph/ceph-ansible/issues/2592

Signed-off-by: Vishal Kanaujia <vishal.kanaujia@flipkart.com>
(cherry picked from commit ef5f52b1f36188c3cab40337640a816dec2542fa)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  rolling_update: fix get fsid for containers
Sébastien Han [Tue, 22 May 2018 23:52:40 +0000 (16:52 -0700)]
rolling_update: fix get fsid for containers

When running ansible2.4-update_docker_cluster there is an issue with the
"get current fsid" task. The current task only works for
non-containerized deployments but runs all the time (even for
containerized ones). This currently results in the following error:

TASK [get current fsid] ********************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-luminous-ansible2.4-update_docker_cluster/rolling_update.yml:214
Tuesday 22 May 2018  22:48:32 +0000 (0:00:02.615)       0:11:01.035 ***********
fatal: [mgr0 -> mon0]: FAILED! => {
    "changed": true,
    "cmd": [
        "ceph",
        "--cluster",
        "test",
        "fsid"
    ],
    "delta": "0:05:00.260674",
    "end": "2018-05-22 22:53:34.555743",
    "rc": 1,
    "start": "2018-05-22 22:48:34.295069"
}

STDERR:

2018-05-22 22:48:34.495651 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).connect protocol feature mismatch, my 83ffffffffffff < peer 481dff8eea4fffb missing 400000000000000
2018-05-22 22:48:34.495684 7f89482c6700  0 -- 192.168.17.10:0/1022712 >> 192.168.17.12:6789/0 pipe(0x7f8944067010 sd=4 :42654 s=1 pgs=0 cs=0 l=1 c=0x7f894405d510).fault

This is not really representative of the real error since the 'ceph' cli is available on that machine.
In other environments we would get something like "command not found: ceph".

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit da5b104098023194cfa607ebb0a861540d21346c)

7 years ago  Fix restarting OSDs twice during a rolling update.
Subhachandra Chandra [Fri, 16 Mar 2018 17:10:14 +0000 (10:10 -0700)]
Fix restarting OSDs twice during a rolling update.

During a rolling update, OSDs are restarted twice currently. Once, by the
handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks
in the rolling_update playbook. This change turns off restarts by the handler.
Further, the restart initiated by the rolling_update playbook is more
efficient as it restarts all the OSDs on a host as one operation and waits
for them to rejoin the cluster. The restart task in the handler restarts one
OSD at a time and waits for it to join the cluster.

(cherry picked from commit c7e269fcf5620a49909b880f57f5cbb988c27b07)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  switch: disable ceph-disk units
Sébastien Han [Wed, 16 May 2018 15:37:10 +0000 (17:37 +0200)]
switch: disable ceph-disk units

During the transition from jewel non-container to container, old ceph
units are disabled. ceph-disk units can still remain in some cases and
will appear as 'loaded failed'; this is not a problem, although
operators might not like to see these units failing. That's why we
remove them if we find them.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 49a47124859e6577fb99e6dd680c5244ccd6f38f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  purge_cluster: fix dmcrypt purge
Guillaume Abrioux [Fri, 18 May 2018 15:56:03 +0000 (17:56 +0200)]
purge_cluster: fix dmcrypt purge

dmcrypt devices aren't closed properly; therefore, redeploying after a
purge may fail.

Typical errors:

```
ceph-disk: Cannot discover filesystem type: device /dev/sdb1: Command
'/sbin/blkid' returned non-zero exit status 2
```

```
ceph-disk: Error: unable to read dm-crypt key:
/var/lib/ceph/osd-lockbox/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf:
/etc/ceph/dmcrypt-keys/c6e01af1-ed8c-4d40-8be7-7fc0b4e104cf.luks.key
```

Properly closing the dmcrypt devices allows redeploying without error.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9801bde4d4ce501208fc297d5cb0ab2e0aa28702)

7 years ago  purge_cluster: wipe all partitions
Guillaume Abrioux [Wed, 16 May 2018 15:34:38 +0000 (17:34 +0200)]
purge_cluster: wipe all partitions

In order to ensure there is no leftover after having purged a cluster,
we must wipe all partitions properly.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a9247c4de78dec8a63f17400deb8b06ce91e7267)

7 years ago  purge_cluster: fix bug when building device list
Guillaume Abrioux [Wed, 16 May 2018 14:04:25 +0000 (16:04 +0200)]
purge_cluster: fix bug when building device list

There are some leftovers on devices when purging osds because of an
invalid device list construction.

typical error:
```
changed: [osd3] => (item=/dev/sda sda1) => {
    "changed": true,
    "cmd": "# if the disk passed is a raw device AND the boot system disk\n if parted -s \"/dev/sda sda1\" print | grep -sq boot; then\n echo \"Looks like /dev/sda sda1 has a boot partition,\"\n echo \"if you want to delete specific partitions point to the partition instead of the raw device\"\n echo \"Do not use your system disk!\"\n exit 1\n fi\n echo sgdisk -Z \"/dev/sda sda1\"\n echo dd if=/dev/zero of=\"/dev/sda sda1\" bs=1M count=200\n echo udevadm settle --timeout=600",
    "delta": "0:00:00.015188",
    "end": "2018-05-16 12:41:40.408597",
    "item": "/dev/sda sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:40.393409"
}

STDOUT:

sgdisk -Z /dev/sda sda1
dd if=/dev/zero of=/dev/sda sda1 bs=1M count=200
udevadm settle --timeout=600

STDERR:

Error: Could not stat device /dev/sda sda1 - No such file or directory.
```

The devices list in the `resolve parent device` task isn't built
properly because the command used to resolve the parent device doesn't
return the expected output, e.g.:

```
changed: [osd3] => (item=/dev/sda1) => {
    "changed": true,
    "cmd": "echo /dev/$(lsblk -no pkname \"/dev/sda1\")",
    "delta": "0:00:00.013634",
    "end": "2018-05-16 12:41:09.068166",
    "item": "/dev/sda1",
    "rc": 0,
    "start": "2018-05-16 12:41:09.054532"
}

STDOUT:

/dev/sda sda1
```

For instance, it will result in a devices list like:
`['/dev/sda sda1', '/dev/sdb', '/dev/sdc sdc1']`
where we expect to have:
`['/dev/sda', '/dev/sdb', '/dev/sdc']`

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1492242
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 9cad113e2f22132d08208cd58462f11056c41305)

7 years ago  defaults: restart_osd_daemon unit spaces
Sébastien Han [Fri, 18 May 2018 12:43:57 +0000 (14:43 +0200)]
defaults: restart_osd_daemon unit spaces

Extra spaces in the `systemctl list-units` output can cause
restart_osd_daemon.sh to fail.

If more services are enabled on the node, the gap between "loaded" and
"active" contains more spaces than the single space expected by the
command [1].

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1573317
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2f43e9dab5f077276162069f449978ea97c2e9c0)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  Do nothing when mgr module is in good state
Michael Vollman [Thu, 17 May 2018 19:17:29 +0000 (15:17 -0400)]
Do nothing when mgr module is in good state

Check whether a mgr module is supposed to be disabled before disabling
it and whether it is already enabled before enabling it.
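
A minimal sketch of such an idempotency guard, assuming `ceph mgr module ls -f json` exposes an `enabled_modules` list (as on luminous) and a `ceph_mgr_modules` list variable:

```
- name: list enabled mgr modules
  command: "{{ docker_exec_cmd }} ceph --cluster {{ cluster }} mgr module ls -f json"
  register: mgr_modules
  changed_when: false

- name: enable mgr modules that are not already enabled
  command: "{{ docker_exec_cmd }} ceph --cluster {{ cluster }} mgr module enable {{ item }}"
  with_items: "{{ ceph_mgr_modules }}"
  when: item not in (mgr_modules.stdout | from_json)['enabled_modules']
```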

Signed-off-by: Michael Vollman <michael.b.vollman@gmail.com>
(cherry picked from commit ed050bf3f682e74d9453451276d10af8c6b5947f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  take-over: fix bug when trying to override variable
Guillaume Abrioux [Thu, 17 May 2018 15:29:20 +0000 (17:29 +0200)]
take-over: fix bug when trying to override variable

A customer has been facing an issue when trying to override
`monitor_interface` in the inventory host file.
In their use case, all nodes had the same interface name for
`monitor_interface` except one. Therefore, they tried to override this
variable for that node in the inventory host file, but the
take-over-existing-cluster playbook was failing when trying to generate
the new ceph.conf file because of an undefined variable.

Typical error:

```
fatal: [srvcto103cnodep01]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute u'ansible_bond0.15'"}
```

Including variables like this, `include_vars: group_vars/all.yml`,
prevents us from overriding anything in the inventory host file because
it overwrites everything you would have defined in the inventory.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1575915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 415dc0a29b10b28cbd047fe28eb4dd38419ea5dc)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  rolling_update: move osd flag section
Sébastien Han [Wed, 16 May 2018 14:02:41 +0000 (16:02 +0200)]
rolling_update: move osd flag section

During a minor update from one jewel version to a higher one (10.2.9 to
10.2.10 for example), osd flags don't get applied because they were set
in the mgr section, which is skipped in jewel since this daemon does
not exist.
Moving the set flag section after all the mons have been updated solves
that problem.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1548071
Co-authored-by: Tomas Petr <tpetr@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d80a871a078a175d0775e91df00baf625dc39725)

7 years ago  client: remove default value for pg_num in pools creation
Guillaume Abrioux [Thu, 3 May 2018 19:36:21 +0000 (21:36 +0200)]
client: remove default value for pg_num in pools creation

Trying to set the default value for pg_num to
`hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` will
break in the case of an external client nodes deployment.
The `pg_num` attribute should be mandatory and be tested in the future
`ceph-validate` role.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit f60b049ae53bbf54dd550587e84b986fef15fbe6)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  rolling_update: move mgr key creation  v3.1.0rc3
Sébastien Han [Thu, 10 May 2018 17:38:55 +0000 (10:38 -0700)]
rolling_update: move mgr key creation

Until all the mons have been updated to Luminous, there is no way to
create a key. So we should do the key creation in the mon role only if
we are not part of an update.
If we are, then the key creation is done after the mons upgrade to
Luminous.
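
A hedged sketch of the guard, assuming the `rolling_update` flag that the update playbook sets:

```
- name: create ceph mgr keyring(s)
  command: >
    {{ docker_exec_cmd }} ceph --cluster {{ cluster }}
    auth get-or-create mgr.{{ hostvars[item]['ansible_hostname'] }}
    mon 'allow profile mgr' osd 'allow *' mds 'allow *'
  with_items: "{{ groups.get(mgr_group_name, []) }}"
  when: not rolling_update | default(False)
```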

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 52fc8a0385a7bc58b8b33fc0c5e05db1a03c5c1f)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Revert "mon: fix mgr keyring creation when upgrading from jewel"
Sébastien Han [Thu, 10 May 2018 17:02:44 +0000 (10:02 -0700)]
Revert "mon: fix mgr keyring creation when upgrading from jewel"

This reverts commit 259fae931d77f056b7e1077b023710cfab1e5cca.

(cherry picked from commit e810fb217f1b78df4039ee50593b8c770fb70dde)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  rolling_update: fix dest path for mgr keys fetching
Guillaume Abrioux [Tue, 15 May 2018 09:41:26 +0000 (11:41 +0200)]
rolling_update: fix dest path for mgr keys fetching

The `ceph-mgr` role that is played later in the playbook fails because
the destination path for the fetched keys is wrong.
This patch fixes the destination path used in the `fetch ceph mgr
key(s)` task so there is no mismatch.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 1b4c3f292d8779158ea445a8c9a11c8ed26abe11)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  iscsi-gw: fix issue when trying to mask target
Guillaume Abrioux [Mon, 14 May 2018 15:39:25 +0000 (17:39 +0200)]
iscsi-gw: fix issue when trying to mask target

Trying to mask the target when `/etc/systemd/system/target.service`
doesn't exist seems to be a bug.
There is no need to mask a unit file which doesn't exist.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a145caf947aec64467150a007b7aafe57abe2891)

7 years ago  iscsi: add python-rtslib repository
Sébastien Han [Mon, 14 May 2018 07:21:48 +0000 (09:21 +0200)]
iscsi: add python-rtslib repository

Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c7c11b774f54078b32b652481145699dbbd79ff)

7 years ago  Allow os_tuning_params to overwrite fs.aio-max-nr
Andy McCrae [Thu, 10 May 2018 10:15:30 +0000 (11:15 +0100)]
Allow os_tuning_params to overwrite fs.aio-max-nr

The ordering of the fs.aio-max-nr setting (which is hard-coded to
1048576) means that if you set fs.aio-max-nr in os_tuning_params it
will effectively be ignored for bluestore scenarios.

To resolve this we should move the setting of fs.aio-max-nr above the
setting of os_tuning_params; this way the operator can define the value
of fs.aio-max-nr to be something other than 1048576 if they want to.

Additionally, we can make the sysctl settings happen in 1 task rather
than multiple.
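
A minimal sketch of collapsing the sysctl settings into one task; the variable layout mirrors `os_tuning_params` as a list of name/value dicts (an assumption). The default fs.aio-max-nr is applied first so any operator-supplied value wins:

```
- name: apply os tuning including fs.aio-max-nr
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    sysctl_set: true
  with_items: "{{ [{'name': 'fs.aio-max-nr', 'value': '1048576'}] + os_tuning_params }}"
```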

(cherry picked from commit 08a2b58d39a687e25436afdf3fda1591d3be8ca1)

7 years ago  adds missing state needed to upgrade nfs-ganesha
Gregory Meno [Wed, 9 May 2018 18:17:26 +0000 (11:17 -0700)]
adds missing state needed to upgrade nfs-ganesha

In the tasks for os_family Red Hat we were missing this state.

fixes: bz1575859
Signed-off-by: Gregory Meno <gmeno@redhat.com>
(cherry picked from commit 26f6a650425517216fb57c08e1a8bda39ddcf2b5)
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  mon: fix mgr keyring creation when upgrading from jewel  v3.1.0rc2
Guillaume Abrioux [Wed, 9 May 2018 12:42:27 +0000 (14:42 +0200)]
mon: fix mgr keyring creation when upgrading from jewel

On containerized deployments, when upgrading from jewel to luminous,
the mgr keyring creation fails because the command to create the mgr
keyring is executed in a container that is still running jewel (the
container is only restarted later to run the new image); therefore it
fails with a 'bad entity' error.

To get around this situation, we can delegate the command that creates
these keyrings to the first monitor when we are running the playbook on
the last monitor. That way we ensure we issue the command in a
container that has already been restarted with the new image.
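
A hedged sketch of the delegation, assuming mons are upgraded in inventory order so the first mon already runs the new image by the time the playbook reaches the last one:

```
- name: create ceph mgr keyring(s) when upgrading from jewel
  command: >
    {{ hostvars[groups[mon_group_name][0]]['docker_exec_cmd'] }}
    ceph --cluster {{ cluster }} auth get-or-create mgr.{{ ansible_hostname }}
    mon 'allow profile mgr' osd 'allow *' mds 'allow *'
  delegate_to: "{{ groups[mon_group_name][0] }}"
  when: inventory_hostname == groups[mon_group_name] | last
```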

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  osd: clean legacy syntax in ceph-osd-run.sh.j2
Guillaume Abrioux [Wed, 9 May 2018 01:10:30 +0000 (03:10 +0200)]
osd: clean legacy syntax in ceph-osd-run.sh.j2

Quick cleanup of a legacy syntax left over from e0a264c7e.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  Make sure the restart_mds_daemon script is created with the correct MDS name
Simone Caronni [Thu, 5 Apr 2018 14:14:23 +0000 (16:14 +0200)]
Make sure the restart_mds_daemon script is created with the correct MDS name

7 years ago  common: enable Tools repo for rhcs clients
Sébastien Han [Tue, 8 May 2018 14:11:14 +0000 (07:11 -0700)]
common: enable Tools repo for rhcs clients

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1574458
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Fix install of nfs-ganesha-ceph for Debian/SuSE  v3.1.0beta9
Andy McCrae [Thu, 22 Mar 2018 12:19:22 +0000 (12:19 +0000)]
Fix install of nfs-ganesha-ceph for Debian/SuSE

The Debian and SuSE installs for nfs-ganesha from the non-rhcs
repository require allow_unauthenticated for Debian, and
disable_gpg_check for SuSE. The nfs-ganesha-rgw package already does
this, but the nfs-ganesha-ceph package will fail to install because of
this same issue.

This PR moves the installations to happen when the appropriate flags are
set to True (nfs_obj_gw & nfs_file_gw), but does it per distro (one for
SuSE and one for Debian) so that the appropriate flag can be passed to
ignore the GPG check.

7 years ago  playbook: improve facts gathering
Guillaume Abrioux [Thu, 3 May 2018 16:41:16 +0000 (18:41 +0200)]
playbook: improve facts gathering

There is no need to gather facts in an O(N^2) way:
only one node should gather the facts from the other nodes.

Fixes: #2553
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  ceph-nfs: disable attribute caching
Ramana Raja [Thu, 3 May 2018 12:10:13 +0000 (17:40 +0530)]
ceph-nfs: disable attribute caching

When 'ceph_nfs_disable_caching' is set to True, disable attribute
caching done by Ganesha for all Ganesha exports.

Signed-off-by: Ramana Raja <rraja@redhat.com>
7 years ago  common: copy iso files if rolling_update
Sébastien Han [Thu, 3 May 2018 14:54:53 +0000 (16:54 +0200)]
common: copy iso files if rolling_update

If we are in the middle of an update, we want the new package version
to be installed, so the task that copies the repo files should not be
skipped.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1572032
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Move apt cache update to individual task per role
Andy McCrae [Thu, 26 Apr 2018 09:42:11 +0000 (10:42 +0100)]
Move apt cache update to individual task per role

The apt-cache update can fail due to transient issues related to the
action being a network operation. To reduce the impact of these
transient failures this patch adds a retry to the update_cache task.

However, the apt_repository tasks which would perform an apt_update
won't retry the apt_update on a failure in the same way, as such this PR
moves the apt_update into an individual task, once per role.

Finally, the apt_repository tasks no longer have a changed_when: false,
and the apt_cache update is only performed once per role, if the
repositories change. Otherwise the cache is updated on the "apt" install
tasks if the cache_timeout has been reached.

7 years ago  client: fix pool creation
Guillaume Abrioux [Mon, 30 Apr 2018 18:53:42 +0000 (20:53 +0200)]
client: fix pool creation

The value in `docker_exec_client_cmd` doesn't allow checking for
existing pools because it's set with a wrong value for the entrypoint
that is going to be used.
This means the check was going to fail anyway, even if the pools
actually exist.

Using jinja syntax to set `docker_exec_cmd` allows handling the case
where you don't have monitors in your inventory.
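
A minimal sketch of the jinja fallback; the dummy-container name follows the `ceph-create-keys` container mentioned earlier in this log, and the exact names are assumptions:

```
- name: set docker_exec_cmd depending on monitor presence
  set_fact:
    docker_exec_cmd: >-
      {% if groups.get(mon_group_name, []) | length > 0 -%}
      docker exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }}
      {%- else -%}
      docker exec ceph-create-keys
      {%- endif %}
```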

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  mon: change application pool support
Sébastien Han [Thu, 26 Apr 2018 17:55:48 +0000 (19:55 +0200)]
mon: change application pool support

If openstack_pools contains an application key, it will be used to
apply that application type to the pool.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1562220
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  check if pools already exist before creating them
Guillaume Abrioux [Fri, 27 Apr 2018 12:48:33 +0000 (14:48 +0200)]
check if pools already exist before creating them

Add a task to check if pools already exist before we create them.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  tests: update the type for the rule used in pools
Guillaume Abrioux [Wed, 25 Apr 2018 15:33:35 +0000 (17:33 +0200)]
tests: update the type for the rule used in pools

As of ceph 12.2.5, the `type` parameter is no longer a name but an id,
therefore an `int` is expected; otherwise pool creation will fail with
an error.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  switch: fix ceph_uid fact for osd
Guillaume Abrioux [Wed, 25 Apr 2018 12:20:35 +0000 (14:20 +0200)]
switch: fix ceph_uid fact for osd

In addition to b324c17, this commit fixes the ceph uid for the osd role
in the switch from non-containerized to containerized playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  switch: resolve device path so we can umount the osd data dir
Sébastien Han [Thu, 19 Apr 2018 12:45:03 +0000 (14:45 +0200)]
switch: resolve device path so we can umount the osd data dir

If we don't do this, umounting devices declared like this
/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001

will fail like:

umount: /dev/disk/by-id/ata-QEMU_HARDDISK_QM000011: mountpoint not found

Since we append '1' (partition 1), this won't work.
So we need to resolve the link to get something like /dev/sdb and then
append 1 to get /dev/sdb1.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
7 years ago  switch: fix ceph_uid fact
Sébastien Han [Thu, 19 Apr 2018 08:28:56 +0000 (10:28 +0200)]
switch: fix ceph_uid fact

The `latest` image is now centos-based, not ubuntu anymore, so the
condition was wrong.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Revert "add .vscode/ to gitignore"
Sébastien Han [Fri, 27 Apr 2018 11:19:25 +0000 (13:19 +0200)]
Revert "add .vscode/ to gitignore"

This reverts commit 3c4319ca4b5355d69b2925e916420f86d29ee524.

7 years ago  mon/client: honor key mode when copying it to other nodes  v3.1.0beta8
Sébastien Han [Mon, 23 Apr 2018 08:02:16 +0000 (10:02 +0200)]
mon/client: honor key mode when copying it to other nodes

The last mon creates the keys with a particular mode; while copying them
to the other mons (first and second) we must re-use the mode that was
set.

The same applies to the client node: the slurp preserves the initial
'item' so we can get the mode for the copy.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  ci: bump client nodes to 2
Sébastien Han [Mon, 23 Apr 2018 08:01:23 +0000 (10:01 +0200)]
ci: bump client nodes to 2

In order to test that the key distribution is correct we must have 2
client nodes.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  mon: remove redundant copy task
Sébastien Han [Mon, 23 Apr 2018 07:52:18 +0000 (09:52 +0200)]
mon: remove redundant copy task

We had the same task twice, and one of them was overriding the mode.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  mon/client: remove acl code
Sébastien Han [Fri, 20 Apr 2018 14:44:41 +0000 (16:44 +0200)]
mon/client: remove acl code

Applying ACL on the keyrings is not used anymore so let's remove this
code.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  mon/client: apply mode from ceph_key
Sébastien Han [Fri, 20 Apr 2018 14:37:05 +0000 (16:37 +0200)]
mon/client: apply mode from ceph_key

Do not use a dedicated task for this but use the ceph_key module
capability to set file mode.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  ceph_key: ability to apply a mode to a file
Sébastien Han [Fri, 20 Apr 2018 14:35:39 +0000 (16:35 +0200)]
ceph_key: ability to apply a mode to a file

You can now create keys and set the file mode on them. Use the 'mode'
parameter for that; the mode must be octal, e.g. 0644.
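
A usage sketch of the new parameter (hypothetical key name and caps):

```
- name: create client.example key readable by everyone
  ceph_key:
    name: client.example
    state: present
    caps:
      mon: "allow r"
      osd: "allow rwx pool=example"
    cluster: "{{ cluster }}"
    mode: "0644"
```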

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  add AArch64 to supported architecture
Di Xu [Mon, 23 Apr 2018 02:08:48 +0000 (10:08 +0800)]
add AArch64 to supported architecture

Works on the AArch64 platform.

7 years ago  mon: remove mgr key from ceph_config_keys
Sébastien Han [Thu, 19 Apr 2018 16:54:53 +0000 (18:54 +0200)]
mon: remove mgr key from ceph_config_keys

This key is created after the last mon is up, so there is no need to
try to push it from the first mon. The initial mon container does not
create the mgr key; ansible does, so this key will never exist there.
The key goes into the fetch dir once the last mon is up; then, when
ceph-mgr plays, it will try to get it from the fetch directory.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  mon: remove mon map from ceph_config_keys
Sébastien Han [Thu, 19 Apr 2018 16:40:16 +0000 (18:40 +0200)]
mon: remove mon map from ceph_config_keys

During the initial bootstrap of the first mon, the monmap file is
destroyed so it's not available and ansible will never find it.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  config_template: resync with upstream
Sébastien Han [Sat, 31 Mar 2018 10:43:42 +0000 (12:43 +0200)]
config_template: resync with upstream

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  ci: test ansible 2.5
Sébastien Han [Wed, 28 Mar 2018 19:52:40 +0000 (21:52 +0200)]
ci: test ansible 2.5

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Expose /var/run/ceph
Sébastien Han [Thu, 12 Apr 2018 13:52:30 +0000 (15:52 +0200)]
Expose /var/run/ceph

Useful for software that does data collection/monitoring, like
collectd, which can connect to the socket and retrieve information.

Even though the sockets are exposed now, I'm keeping the docker exec to
check the socket; this will allow newer versions of ceph-ansible to
work with older ones.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1563280
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  default: extend ceph_uid and gid
Sébastien Han [Fri, 13 Apr 2018 17:42:17 +0000 (19:42 +0200)]
default: extend ceph_uid and gid

We now have the ability to detect the uid/gid of the ceph user depending
on the distribution we are running on when doing non-container
deployments.
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  move create ceph initial directories to default
Sébastien Han [Fri, 13 Apr 2018 15:56:06 +0000 (17:56 +0200)]
move create ceph initial directories to default

This is needed for both non-container and container deployments.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  shrink-osd: ability to shrink NVMe drives
Sébastien Han [Fri, 20 Apr 2018 09:13:51 +0000 (11:13 +0200)]
shrink-osd: ability to shrink NVMe drives

Now if the service name contains nvme, we know we need to remove the
last 2 characters instead of 1:

If nvme, then osd_to_kill_disks is nvme0n1 and we need nvme0.
If ssd or hdd, then osd_to_kill_disks is sda1 and we need sda.
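
A hedged sketch of the trimming logic; the regexes and variable names are assumptions, and the real playbook may trim differently:

```
- name: compute parent devices from osd_to_kill_disks
  set_fact:
    parent_devices: >-
      {{ (parent_devices | default([])) +
         [ item | regex_replace(('n\d+$' if 'nvme' in item else '\d+$'), '') ] }}
  with_items: "{{ osd_to_kill_disks }}"
```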

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1561456
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  selinux: remove chcon calls
Sébastien Han [Tue, 17 Apr 2018 13:32:53 +0000 (15:32 +0200)]
selinux: remove chcon calls

We now bindmount with the :z option at the end of the -v command, so
this will basically run the exact same command as we used to run,
namely:

chcon -Rt svirt_sandbox_file_t /var/lib/ceph

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  client: add a --rm option to run the container
Sébastien Han [Tue, 17 Apr 2018 12:16:41 +0000 (14:16 +0200)]
client: add a --rm option to run the container

This fixes the case where the playbook died and never removed the
container. So now, once the container exits it will remove itself from
the container list.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  client: import the key in ceph if copy_admin_key is true  v3.1.0beta7
Sébastien Han [Wed, 18 Apr 2018 13:44:36 +0000 (15:44 +0200)]
client: import the key in ceph if copy_admin_key is true

If the user has set copy_admin_key to true we assume he/she wants to
import the key in Ceph and not only create the key on the filesystem.

Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  client: add quotes to the dict values
Sébastien Han [Wed, 18 Apr 2018 13:11:55 +0000 (15:11 +0200)]
client: add quotes to the dict values

ceph-authtool does not support raw arguments, so we have to quote the
caps declaration, i.e. allow 'bla bla' instead of allow bla bla.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
Signed-off-by: Sébastien Han <seb@redhat.com>
7 years ago  Add support for --diff in config_template
Andy McCrae [Wed, 21 Mar 2018 15:57:00 +0000 (15:57 +0000)]
Add support for --diff in config_template

Add support for the Ansible --diff mode in config_template. This will
show the before/after for config_template changes, in the same way as
the base copy and template modules do.

To utilise this, run your playbooks with "--diff --check".