git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
6 years ago gather-ceph-logs: fix logs list generation
Dimitri Savineau [Mon, 13 May 2019 14:12:42 +0000 (10:12 -0400)]
gather-ceph-logs: fix logs list generation

The shell module doesn't have a stdout_lines attribute. Instead of
using the shell module, we can use the find module.

Also add `become: false` to the local tmp directory creation,
otherwise we won't have sufficient permissions to fetch the files into
this directory.
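
A minimal sketch of the find-based approach described above (paths,
task names and the registered variable are illustrative, not the
playbook's actual content):

```
- name: create local temp directory
  file:
    path: /tmp/ceph-log-gathering
    state: directory
  delegate_to: localhost
  become: false

- name: generate the list of ceph logs
  find:
    paths: /var/log/ceph
    patterns: '*.log'
  register: ceph_log_files

- name: fetch the ceph logs
  fetch:
    src: "{{ item.path }}"
    dest: /tmp/ceph-log-gathering/
  with_items: "{{ ceph_log_files.files }}"
```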

Resolves: #3966

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-nfs: fixed condition for "stable repos specific tasks"
Bruceforce [Sun, 12 May 2019 09:40:05 +0000 (11:40 +0200)]
ceph-nfs: fixed condition for "stable repos specific tasks"

The old condition would resolve to
"when": "nfs_ganesha_stable - ceph_repository == 'community'"

now it is
"when": [
          "nfs_ganesha_stable",
          "ceph_repository == 'community'"
        ]
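
Rendered in task YAML, the fixed condition is a list of expressions
that are implicitly AND-ed (the task body and file name are
illustrative):

```
- name: include stable repos specific tasks
  include_tasks: stable_repos.yml
  when:
    - nfs_ganesha_stable
    - ceph_repository == 'community'
```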

Please backport to stable-4.0

Signed-off-by: Bruceforce <markus.greis@gmx.de>
6 years ago Update RHCS version with Nautilus
Dimitri Savineau [Fri, 10 May 2019 19:28:18 +0000 (15:28 -0400)]
Update RHCS version with Nautilus

RHCS 4 will be based on Nautilus and only usable on RHEL 8.
Update the default ceph_rhcs_version to 4 and update the RHCS
repositories to RHCS 4 on RHEL 8.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago Set the rgw_create_pools pools application to rgw
Kevin Coakley [Fri, 10 May 2019 13:32:00 +0000 (06:32 -0700)]
Set the rgw_create_pools pools application to rgw

Set the application to rgw for pools created from rgw_create_pools. On Ceph Nautilus the health is set to HEALTH_WARN with the message "application not enabled on X pool(s)" if an application isn't specified for a pool.
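
For an existing pool, the equivalent ceph CLI call would be along
these lines (pool name illustrative):

```
ceph osd pool application enable default.rgw.buckets.data rgw
```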

Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
6 years ago igw: Fix rolling update service ordering
Mike Christie [Thu, 9 May 2019 19:52:08 +0000 (14:52 -0500)]
igw: Fix rolling update service ordering

We must stop tcmu-runner after the other rbd-target-* services
because they may need to interact with tcmu-runner during shutdown.
There is also a bug in some kernels where IO can get stuck in the
kernel and by stopping rbd-target-* first we can make sure all IO is
flushed.
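
A sketch of the resulting stop ordering (the ordered service list is
the point; the task shape is illustrative):

```
- name: stop ceph-iscsi services, tcmu-runner last
  systemd:
    name: "{{ item }}"
    state: stopped
  with_items:
    - rbd-target-api
    - rbd-target-gw
    - tcmu-runner
```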

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1659611

Signed-off-by: Mike Christie <mchristi@redhat.com>
6 years ago ceph-rbd-mirror: refactor tasks/main.yml
Rishabh Dave [Wed, 24 Apr 2019 09:19:04 +0000 (14:49 +0530)]
ceph-rbd-mirror: refactor tasks/main.yml

Use blocks for similar tasks in main.yml, and move `when` keywords
before `block` keywords.
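
For instance, a group of tasks with `when` placed before `block` reads
like this (the task content is illustrative):

```
- name: non-container related tasks
  when: not containerized_deployment | bool
  block:
    - name: install rbd-mirror package
      package:
        name: rbd-mirror
        state: present
```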

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago ceph-mds: group similar tasks in create_mds_filesystem.yml
Rishabh Dave [Wed, 24 Apr 2019 09:08:15 +0000 (14:38 +0530)]
ceph-mds: group similar tasks in create_mds_filesystem.yml

Group similar tasks together using block keyword.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago tox: Refact lvm_osds scenario
Dimitri Savineau [Wed, 3 Apr 2019 20:22:47 +0000 (16:22 -0400)]
tox: Refact lvm_osds scenario

The current lvm_osds only tests filestore on one OSD node.
We also have bs_lvm_osds to test bluestore and encryption.
Let's use only one scenario to test filestore/bluestore, with and
without dmcrypt, on four OSD nodes.
Also use validate_dmcrypt_bool_value instead of types.boolean on
dmcrypt validation via notario.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago facts: fix external cluster bug
Guillaume Abrioux [Tue, 7 May 2019 14:42:49 +0000 (16:42 +0200)]
facts: fix external cluster bug

running an external ceph cluster deployment with (obviously) no
monitors defined in inventory breaks with an undefined error because
`_monitor_addresses` never gets defined.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707460
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago ceph-mgr: create keys for MGRs
Rishabh Dave [Thu, 2 May 2019 12:48:00 +0000 (08:48 -0400)]
ceph-mgr: create keys for MGRs

Add code in ceph-mgr for creating a keyring for the manager so that
managers can be deployed on a separate node too.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago allow adding a manager to a deployed cluster
Rishabh Dave [Sat, 9 Feb 2019 07:46:12 +0000 (13:16 +0530)]
allow adding a manager to a deployed cluster

Add a playbook that deploys manager on a new node and adds that node to
the already deployed Ceph cluster.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago remove infrastructure-playbooks/rgw-standalone.yml
Rishabh Dave [Tue, 7 May 2019 10:58:36 +0000 (16:28 +0530)]
remove infrastructure-playbooks/rgw-standalone.yml

We don't need infrastructure-playbooks/rgw-standalone.yml since
site.yml.sample and site-container.yml.sample can add a new RGW node to
an already deployed Ceph cluster.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago don't access other node's docker_exec_cmd variable
Rishabh Dave [Sun, 28 Apr 2019 16:42:45 +0000 (22:12 +0530)]
don't access other node's docker_exec_cmd variable

Except for some corner cases, it's not correct to access some other
node's copy of the variable docker_exec_cmd. Therefore replace
"hostvars[groups[mon_group_name][0]]['docker_exec_cmd']" by
"docker_exec_cmd".

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago allow adding a RGW to already deployed cluster
Rishabh Dave [Sun, 7 Apr 2019 06:36:31 +0000 (02:36 -0400)]
allow adding a RGW to already deployed cluster

Add a tox scenario that adds a new RGW node as part of an already
deployed Ceph cluster and deploys RGW there.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago Fix comment content
letterwuyu [Sun, 28 Apr 2019 09:56:29 +0000 (17:56 +0800)]
Fix comment content

Signed-off-by: lishuhao <letterwuyu@gmail.com>
6 years ago Fix check mode support
Gaudenz Steinlin [Mon, 6 May 2019 08:14:36 +0000 (10:14 +0200)]
Fix check mode support

Adds "check_mode: no" to commands which register cluster state in a
variable and don't modify anything. These commands have to run in order
to support running the playbook in check mode.
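
A sketch of such a read-only task (names illustrative):

```
- name: collect the osd tree
  command: ceph osd tree --format json
  register: osd_tree
  changed_when: false
  check_mode: no
```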

Signed-off-by: Gaudenz Steinlin <gaudenz.steinlin@cloudscale.ch>
6 years ago allow adding a RBD mirror to already deployed cluster
Rishabh Dave [Sun, 7 Apr 2019 06:14:05 +0000 (02:14 -0400)]
allow adding a RBD mirror to already deployed cluster

Add a tox scenario that adds a new RBD mirror node as part of an
already deployed Ceph cluster and deploys RBD mirror there.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1677431
Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago ansible: remove private and static attribute
Dimitri Savineau [Thu, 2 May 2019 13:57:19 +0000 (09:57 -0400)]
ansible: remove private and static attribute

This will be removed in ansible 2.8 and breaks the playbook execution
with this release.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-mds: Increase cpu limit to 4
Dimitri Savineau [Tue, 23 Apr 2019 19:54:38 +0000 (15:54 -0400)]
ceph-mds: Increase cpu limit to 4

In containerized deployments the default mds cpu quota is too low
for production environments.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695850
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-osd: Increase cpu limit to 4
Dimitri Savineau [Fri, 5 Apr 2019 13:45:28 +0000 (09:45 -0400)]
ceph-osd: Increase cpu limit to 4

In containerized deployments the default osd cpu quota is too low
for production environments using NVMe devices.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1695880
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago validate: check custom repository config options
Jugwan Eom [Mon, 21 Jan 2019 08:08:39 +0000 (08:08 +0000)]
validate: check custom repository config options

This adds missing configuration options when the 'custom' repository
is used.

Signed-off-by: Jugwan Eom <zugwan@gmail.com>
6 years ago ceph-iscsi: start tcmu-runner for non-container
Dimitri Savineau [Tue, 23 Apr 2019 14:08:30 +0000 (10:08 -0400)]
ceph-iscsi: start tcmu-runner for non-container

Only rbd-target-api and rbd-target-gw were started/enabled for
non-containerized deployments.
The issue doesn't happen with containerized setup.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests: group and parametrize tests
Dimitri Savineau [Thu, 18 Apr 2019 21:08:13 +0000 (17:08 -0400)]
tests: group and parametrize tests

Instead of creating a dedicated test and using the same testinfra
module, we can group them into a single test to avoid multiple ansible
connections and testinfra module executions.
This patch also adds the parametrize pytest decorator where possible.
Finally, fix some minor flake8 issues.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tox: Remove update scenario reference
Dimitri Savineau [Tue, 23 Apr 2019 20:33:46 +0000 (16:33 -0400)]
tox: Remove update scenario reference

The update scenario is now handled by the tox-update.ini file, so we
shouldn't have an update reference in the tox.ini file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago Update group_vars according to defaults
Dimitri Savineau [Tue, 23 Apr 2019 20:19:00 +0000 (16:19 -0400)]
Update group_vars according to defaults

b2f2426 didn't use the generate_group_vars_sample.sh script so we
currently have a difference between the content in group_vars and the
ceph-defaults/defaults directories.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago rolling_update: restart all ceph-iscsi services
Dimitri Savineau [Tue, 23 Apr 2019 18:58:37 +0000 (14:58 -0400)]
rolling_update: restart all ceph-iscsi services

Currently only rbd-target-gw service is restarted during an update.
We also need to restart tcmu-runner and rbd-target-api services
during the ceph iscsi upgrade.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1659611
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago validate: fix a typo
Guillaume Abrioux [Tue, 23 Apr 2019 14:04:27 +0000 (16:04 +0200)]
validate: fix a typo

5aa27794615e7d4521b1dbf1444b61388aacb852 introduced a typo.
This commit fixes it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago improve coding style
Rishabh Dave [Mon, 1 Apr 2019 15:46:15 +0000 (21:16 +0530)]
improve coding style

Keywords requiring only one item shouldn't express it by creating a
list with a single item.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago validate: fix notario error
Guillaume Abrioux [Tue, 23 Apr 2019 13:19:26 +0000 (15:19 +0200)]
validate: fix notario error

Typical error:

```
AttributeError: 'Invalid' object has no attribute 'message'
```

As of python 2.6, `BaseException.message` has been deprecated.
When using python3, it fails because it has been removed.

Let's use `str(error)` instead so we don't hit this error when using
python3.
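
A minimal illustration of the failure mode (the exception class here
is illustrative; notario raises its own `Invalid`):

```
try:
    raise ValueError("invalid value for 'dmcrypt'")
except ValueError as error:
    # error.message raises AttributeError on python3; str(error) works on 2 and 3
    print(str(error))
```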

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago Allow CephFS pool to be created with specific rule_name, erasure_profile just like rbd pools
Radu Toader [Thu, 18 Apr 2019 19:12:55 +0000 (22:12 +0300)]
Allow CephFS pool to be created with specific rule_name, erasure_profile just like rbd pools

Signed-off-by: Radu Toader <radu.m.toader@gmail.com>
6 years ago ceph-container-common: modify requirement flow
Dimitri Savineau [Tue, 16 Apr 2019 13:33:02 +0000 (09:33 -0400)]
ceph-container-common: modify requirement flow

Until now it was not possible to install a specific container package
because it was somehow hardcoded.
This patch allows overriding the container package name (docker.io
vs docker-ce) and refactors the package installation. This can be
achieved via the container_package_name variable.
Instead of using one task per distribution we can set the package and
service names in vars. This allows us to have a unified package task.
Also refactor the debian_prerequisites tasks because the content
was outdated.

https://docs.docker.com/install/linux/docker-ce/debian/
https://docs.docker.com/install/linux/docker-ce/ubuntu/

Resolves: #3609

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago doc: update index.rst with current information for stable-4.0
Florian Haas [Thu, 18 Apr 2019 13:59:11 +0000 (15:59 +0200)]
doc: update index.rst with current information for stable-4.0

With the stable-4.0 branch nearing release, update
docs/source/index.rst with current information about which Ceph
releases are supported, and which Ansible versions are required, for
each branch.

Signed-off-by: Florian Haas <florian@citynetwork.eu>
6 years ago mds: remove legacy task
Guillaume Abrioux [Thu, 18 Apr 2019 08:44:41 +0000 (10:44 +0200)]
mds: remove legacy task

this task has nothing to do in stable-4.0 and after.
Let's remove it since stable-4.0 and after aren't intended to deploy
luminous.

Closes: #3873
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago rgw: add cpuset support
Kyle Bader [Thu, 21 Mar 2019 18:54:34 +0000 (11:54 -0700)]
rgw: add cpuset support

1/ The OSD already supports cpuset to be used for containerized deployments
through the use of the ceph_osd_docker_cpuset_cpus variable. This adds similar
support to the RGW service for containerized deployments by setting a new
variable named ceph_rgw_docker_cpuset_cpus. Like the OSD, there are times where
using distinct cores has advantages over using the in-kernel CFS scheduler.

ceph_rgw_docker_cpuset_cpus accepts a comma-delimited set of CPU ids.

2/ Add support for specifying the --cpuset-mems variable to restrict the cgroup's
memory allocations to a particular NUMA node, which should typically correspond
with the cpu ids of that NUMA node that were provided with --cpuset-cpus. To
ensure the correct cpu ids are used, one can run `numactl --hardware` to list
the nodes and which cpu ids correspond to each.
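
A hedged group_vars sketch pinning RGW containers to the cores and
memory of NUMA node 0 (the cpuset-mems variable name follows the
cpuset-cpus pattern and is an assumption):

```
ceph_rgw_docker_cpuset_cpus: "0,2,4,6"  # cpu ids reported by `numactl --hardware` for node 0
ceph_rgw_docker_cpuset_mems: "0"        # assumption: maps to docker's --cpuset-mems flag
```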

Signed-off-by: Kyle Bader <kbader@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago ceph-mgr: Add extra module packages
Dimitri Savineau [Mon, 15 Apr 2019 16:15:49 +0000 (12:15 -0400)]
ceph-mgr: Add extra module packages

Since Nautilus there are mgr extra modules not present in the ceph-mgr
package but in dedicated packages.

Resolves: #3860

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago update: ensure tasks are executed on an upgraded mon
Guillaume Abrioux [Wed, 17 Apr 2019 12:02:06 +0000 (14:02 +0200)]
update: ensure tasks are executed on an upgraded mon

These tasks must be run from a monitor which is upgraded otherwise it
might fail.
See: https://tracker.ceph.com/issues/39355

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago update: ensure ceph command returns 0
Guillaume Abrioux [Wed, 17 Apr 2019 11:57:29 +0000 (13:57 +0200)]
update: ensure ceph command returns 0

these commands could return something other than 0.
Let's ensure all retries have been done before actually failing.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago update: set osd flags before upgrading any mon
Guillaume Abrioux [Wed, 17 Apr 2019 06:47:25 +0000 (08:47 +0200)]
update: set osd flags before upgrading any mon

Typical error:

```
failed: [mon0 -> mon2] (item=noout) => changed=true
  cmd:
  - ceph
  - --cluster
  - ceph
  - osd
  - set
  - noout
  delta: '0:00:00.293756'
  end: '2019-04-17 06:31:57.552386'
  item: noout
  msg: non-zero return code
  rc: 1
  start: '2019-04-17 06:31:57.258630'
  stderr: |-
    Traceback (most recent call last):
      File "/bin/ceph", line 1222, in <module>
        retval = main()
      File "/bin/ceph", line 1146, in main
        sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs
        cmd['sig'] = parse_funcsig(cmd['sig'])
      File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig
        raise JsonFormat(s)
    ceph_argparse.JsonFormat: unknown type CephBool
  stderr_lines:
  - 'Traceback (most recent call last):'
  - '  File "/bin/ceph", line 1222, in <module>'
  - '    retval = main()'
  - '  File "/bin/ceph", line 1146, in main'
  - '    sigdict = parse_json_funcsigs(outbuf.decode(''utf-8''), ''cli'')'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 788, in parse_json_funcsigs'
  - '    cmd[''sig''] = parse_funcsig(cmd[''sig''])'
  - '  File "/usr/lib/python2.7/site-packages/ceph_argparse.py", line 728, in parse_funcsig'
  - '    raise JsonFormat(s)'
  - 'ceph_argparse.JsonFormat: unknown type CephBool'
  stdout: ''
  stdout_lines: <omitted>
```

Having mixed versions of monitors seems to cause this error.
Moving these tasks before any monitor gets upgraded seems to be enough
to get around this issue.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago update: refact msgr2 migration
Guillaume Abrioux [Tue, 16 Apr 2019 08:31:44 +0000 (10:31 +0200)]
update: refact msgr2 migration

this commit refactors the msgr2 protocol introduction.

If it's a fresh install, let's go with v2 only.
If we upgrade to nautilus, we should go with v2+v1 syntax to ensure
nothing breaks.
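
The corresponding mon host syntax in ceph.conf looks roughly like this
(addresses illustrative):

```
# fresh Nautilus install: v2 only
mon host = [v2:192.168.1.10:3300]
# upgrade to Nautilus: v2+v1 so pre-Nautilus clients keep working
mon host = [v2:192.168.1.10:3300,v1:192.168.1.10:6789]
```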

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago rolling_update: ceph commands should use --cluster
Andrew Schoen [Thu, 28 Mar 2019 21:05:09 +0000 (16:05 -0500)]
rolling_update: ceph commands should use --cluster

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
6 years ago rolling_update: set num_osds to the number of running osds
Andrew Schoen [Thu, 28 Mar 2019 19:34:48 +0000 (14:34 -0500)]
rolling_update: set num_osds to the number of running osds

We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
6 years ago ceph-osd: do not run lvm batch tasks during update
Andrew Schoen [Thu, 28 Mar 2019 19:02:54 +0000 (14:02 -0500)]
ceph-osd: do not run lvm batch tasks during update

When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
6 years ago tests: adds the migrate_ceph_disk_to_ceph_volume scenario
Andrew Schoen [Wed, 27 Mar 2019 19:36:51 +0000 (14:36 -0500)]
tests: adds the migrate_ceph_disk_to_ceph_volume scenario

This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
6 years ago rolling_update: migrate ceph-disk osds to ceph-volume
Andrew Schoen [Tue, 19 Mar 2019 20:08:32 +0000 (15:08 -0500)]
rolling_update: migrate ceph-disk osds to ceph-volume

When upgrading to nautilus run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.
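
On each OSD node the migration boils down to these two commands (the
comments describe their usual side effects):

```
ceph-volume simple scan             # captures each running ceph-disk osd as json under /etc/ceph/osd/
ceph-volume simple activate --all   # enables systemd units so the osds come back after a reboot
```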

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
6 years ago ceph-iscsi-gw: Remove library directory
Dimitri Savineau [Wed, 17 Apr 2019 15:37:03 +0000 (11:37 -0400)]
ceph-iscsi-gw: Remove library directory

The library directory that contains the custom ceph modules is present
in the ceph-ansible root directory.
All igw_* modules are already present there so we don't need the ones
present in roles/ceph-iscsi-gw/library.
Also remove the associated spec file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago test_osds: remove scenario leftover
Dimitri Savineau [Tue, 16 Apr 2019 20:23:51 +0000 (16:23 -0400)]
test_osds: remove scenario leftover

Since there's only one scenario available we don't need lvm_scenario
and no_lvm_scenario.
Also add missing assert for ceph-volume tests.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago allow using ansible 2.8
Dimitri Savineau [Wed, 17 Apr 2019 14:22:59 +0000 (10:22 -0400)]
allow using ansible 2.8

Currently we only support ansible 2.7.
We plan to use 2.8 when it is released, so we have to support both
2.7 and 2.8.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1700548
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests/functional/setup: change mount options
Dimitri Savineau [Fri, 12 Apr 2019 14:46:20 +0000 (10:46 -0400)]
tests/functional/setup: change mount options

In the CI jobs we can change the mount options of the main partition
to avoid extra operations on disk.
Adding jmespath to tests/requirements.txt due to the json_query
filter usage.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago test_mons: test mon listening on port 3300
Dimitri Savineau [Tue, 16 Apr 2019 20:52:42 +0000 (16:52 -0400)]
test_mons: test mon listening on port 3300

Since nautilus and msgr2 the monitors also bind on port 3300 in
addition to 6789.
This patch updates test_mons to reflect that change.
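
A testinfra-style sketch of the updated check (test name illustrative):

```
def test_mon_listens_on_msgr_ports(host):
    for port in ("3300", "6789"):
        assert host.socket("tcp://%s" % port).is_listening
```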

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago defaults: refact package dependencies installation.
Guillaume Abrioux [Tue, 16 Apr 2019 07:58:52 +0000 (09:58 +0200)]
defaults: refact package dependencies installation.

Because 5c98e361df5241fbfa5bd0a2ae1317219b7e1244 could be seen as a
non-backward-compatible change, this commit reverts it and brings back
package dependencies installation support.
Let's just modify the default value instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago defaults: remove some package dependencies
Guillaume Abrioux [Mon, 15 Apr 2019 14:38:50 +0000 (16:38 +0200)]
defaults: remove some package dependencies

These packages aren't needed anymore.
They were needed for ceph-init-detect, but ceph-init-detect doesn't
exist anymore.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1683885
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago allow adding a monitor to a deployed cluster
Rishabh Dave [Thu, 8 Nov 2018 13:47:51 +0000 (08:47 -0500)]
allow adding a monitor to a deployed cluster

Add a playbook that deploys a new monitor on a new node, adds that node
to the Ceph cluster and the monitor to the quorum and updates the ceph
configuration file on OSD nodes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago check if mon daemon is installed before restarting it
Rishabh Dave [Sat, 6 Apr 2019 06:15:31 +0000 (02:15 -0400)]
check if mon daemon is installed before restarting it

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago mon: check if an initial monitor keyring already exists
Guillaume Abrioux [Wed, 30 Jan 2019 09:11:26 +0000 (10:11 +0100)]
mon: check if an initial monitor keyring already exists

When adding a new monitor, we must reuse the existing initial monitor
keyring. Otherwise, the new monitor will issue its 'mkfs' with a new
monitor keyring and it will result in a mismatch between them. The
new monitor will be unable to join the quorum in the end.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Rishabh Dave <ridave@redhat.com>
6 years ago purge-cluster: remove python-ceph-argparse package
Dimitri Savineau [Fri, 12 Apr 2019 19:30:35 +0000 (15:30 -0400)]
purge-cluster: remove python-ceph-argparse package

When using the purge-cluster playbook with nautilus, there's still the
python-ceph-argparse package installed on the host, preventing the
reinstallation of a ceph cluster with a different version (like
luminous or mimic).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago docs: Update ceph.conf supported section
Dimitri Savineau [Thu, 11 Apr 2019 21:09:04 +0000 (17:09 -0400)]
docs: Update ceph.conf supported section

[rgw] isn't a valid section.
[client.rgw.{instance_name}] should be used instead.

Resolves: #3841

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago switch-from-non-containerized: stop all osds
Dimitri Savineau [Thu, 11 Apr 2019 20:20:41 +0000 (16:20 -0400)]
switch-from-non-containerized: stop all osds

e6bfb84 introduced a regression in the switch from non-containerized
to container deployment.
We need to stop all previous OSD services. We just don't need the
ceph-disk pattern in the regex.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago purge: remove references to ceph-disk
Guillaume Abrioux [Thu, 11 Apr 2019 15:03:44 +0000 (17:03 +0200)]
purge: remove references to ceph-disk

as of stable-4.0, ceph-disk is no longer supported.
These tasks aren't needed anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago shrink-osd: remove legacy playbook
Guillaume Abrioux [Thu, 11 Apr 2019 15:01:39 +0000 (17:01 +0200)]
shrink-osd: remove legacy playbook

as of stable-4.0, ceph-disk is no longer supported.
Let's remove this legacy version of the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago switch_to_containers: remove ceph-disk references
Guillaume Abrioux [Thu, 11 Apr 2019 15:00:58 +0000 (17:00 +0200)]
switch_to_containers: remove ceph-disk references

as of stable-4.0, ceph-disk is no longer supported.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: remove legacy file
Guillaume Abrioux [Thu, 11 Apr 2019 14:51:03 +0000 (16:51 +0200)]
osd: remove legacy file

this file is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: pass osd_scenario value to lvm_setup.yml
Guillaume Abrioux [Thu, 11 Apr 2019 15:18:02 +0000 (17:18 +0200)]
tests: pass osd_scenario value to lvm_setup.yml

we must pass the value of osd_scenario from the stable-3.2 branch which
is used for the initial deployment.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: remove test_journal_collocation.py in OSD testing
Guillaume Abrioux [Thu, 11 Apr 2019 12:57:56 +0000 (14:57 +0200)]
tests: remove test_journal_collocation.py in OSD testing

this test is related to ceph-disk which is dropped as of stable-4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago resync sample file
Guillaume Abrioux [Thu, 11 Apr 2019 08:13:17 +0000 (10:13 +0200)]
resync sample file

d17b1b48b6 introduced a change that wasn't reflected in the sample files.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: remove ceph-disk scenarios files
Guillaume Abrioux [Thu, 11 Apr 2019 08:09:31 +0000 (10:09 +0200)]
osd: remove ceph-disk scenarios files

these files aren't needed anymore since we only use lvm scenario.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: remove dedicated_devices variable
Guillaume Abrioux [Thu, 11 Apr 2019 08:08:22 +0000 (10:08 +0200)]
osd: remove dedicated_devices variable

This variable was related to ceph-disk scenarios.
Since we are entirely dropping ceph-disk support as of stable-4.0, let's
remove this variable.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: remove variable osd_scenario
Guillaume Abrioux [Thu, 11 Apr 2019 08:01:15 +0000 (10:01 +0200)]
osd: remove variable osd_scenario

As of stable-4.0, the only valid scenario is `lvm`.
Thus, this makes this variable useless.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osd: remove legacy file
Guillaume Abrioux [Wed, 10 Apr 2019 11:33:57 +0000 (13:33 +0200)]
osd: remove legacy file

ceph_disk_cli_options_facts.yml is not used anymore, let's remove it.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago validate: only check device when they are devices
Sébastien Han [Fri, 12 Oct 2018 16:32:40 +0000 (18:32 +0200)]
validate: only check device when they are devices

We only validate the devices that are passed if there is a list of
devices to validate.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years ago plugin: validate.py do not check osd_scenario
Sébastien Han [Thu, 11 Oct 2018 16:01:10 +0000 (18:01 +0200)]
plugin: validate.py do not check osd_scenario

osd_scenario now defaults to lvm and should not be changed. So we don't
need to test it.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years ago plugin: validate lint
Sébastien Han [Thu, 11 Oct 2018 15:59:31 +0000 (17:59 +0200)]
plugin: validate lint

Make python linter happy.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years ago doc: update osd scenario
Sébastien Han [Wed, 10 Oct 2018 19:38:27 +0000 (15:38 -0400)]
doc: update osd scenario

This commit adds documentation for the lvm scenario and the deprecation
of the collocated and non-collocated scenarios.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years ago osd: default osd_scenario to lvm
Sébastien Han [Wed, 10 Oct 2018 19:17:38 +0000 (15:17 -0400)]
osd: default osd_scenario to lvm

osd_scenario has become obsolete and defaults to lvm. With lvm there is
no such thing as collocated and non-collocated.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years ago validate: print a message for old scenarios
Sébastien Han [Wed, 10 Oct 2018 19:16:43 +0000 (15:16 -0400)]
validate: print a message for old scenarios

ceph-disk is not supported anymore, so all the newly created OSDs will
be configured using ceph-volume.

Signed-off-by: Sébastien Han <seb@redhat.com>
6 years ago osd: remove ceph-disk support
Sébastien Han [Tue, 2 Oct 2018 21:54:57 +0000 (23:54 +0200)]
osd: remove ceph-disk support

We don't support the preparation of OSDs with ceph-disk; only
ceph-volume is supported. However, the start operation of OSDs is
still supported. So if you change a config option, the handlers will
be able to restart all the OSDs via their respective systemd unit files.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: Add debug to ceph-override.json
Dimitri Savineau [Tue, 9 Apr 2019 16:20:35 +0000 (12:20 -0400)]
tests: Add debug to ceph-override.json

It's useful to have logs in debug mode enabled in order to have
more information for developers.
Also reindent the json file.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests/functional: use ceph-override.json symlink
Dimitri Savineau [Tue, 9 Apr 2019 16:18:43 +0000 (12:18 -0400)]
tests/functional: use ceph-override.json symlink

We don't need to have multiple ceph-override.json copies. We
already have a symlink to all_daemons/ceph-override.json so
we can do it for all scenarios.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-mds: Set application pool to cephfs
Dimitri Savineau [Thu, 4 Apr 2019 13:33:05 +0000 (09:33 -0400)]
ceph-mds: Set application pool to cephfs

We don't need to use the cephfs variable for the application pool
name because it's always cephfs.
If the cephfs variable is set to something other than the default
value it will break the application pool task.

Resolves: #3790

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago update: fix undefined error when no mgr group is declared
Guillaume Abrioux [Thu, 11 Apr 2019 07:16:28 +0000 (09:16 +0200)]
update: fix undefined error when no mgr group is declared

if the mgr group isn't defined in inventory, that task will fail with
an undefined variable error.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago osds: allow passing devices by path
Guillaume Abrioux [Wed, 10 Apr 2019 15:16:21 +0000 (17:16 +0200)]
osds: allow passing devices by path

ceph-volume didn't work when the devices were passed by path.
Since it now supports this, let's allow this feature in ceph-ansible.

Closes: #3812
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago mon: remove useless delegate_to
Guillaume Abrioux [Tue, 9 Apr 2019 15:38:01 +0000 (17:38 +0200)]
mon: remove useless delegate_to

Let's use a condition to run this task only on the first mon.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago rgw: change default frontend on nautilus
Dimitri Savineau [Tue, 26 Feb 2019 14:16:37 +0000 (09:16 -0500)]
rgw: change default frontend on nautilus

As discussed in ceph/ceph#26599, beast is now the default frontend
for rados gateway with the nautilus release.
Add rgw_thread_pool_size variable with 512 as default value and keep
backward compatibility with num_threads option when using civetweb.
Update radosgw_civetweb_num_threads to reflect rgw_thread_pool_size
change.
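
Sketch of the resulting defaults (values taken from this message; the
frontend variable name is an assumption):

```
radosgw_frontend_type: beast  # assumption: the variable selecting the frontend
rgw_thread_pool_size: 512     # new default; maps to num_threads when civetweb is used
```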

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago container-common: Enable docker on boot for ubuntu
Dimitri Savineau [Mon, 1 Apr 2019 16:12:52 +0000 (12:12 -0400)]
container-common: Enable docker on boot for ubuntu

docker daemon is automatically started during package installation
but the service isn't enabled on boot.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago rolling_update: Remove ceph aliases
Dimitri Savineau [Fri, 15 Mar 2019 14:18:48 +0000 (10:18 -0400)]
rolling_update: Remove ceph aliases

ceph aliases have been introduced in stable-3.2 during the ceph
deployment. On master this has been removed but we don't handle
this removal in the upgrade from stable-3.2 to master via the
rolling_update playbook.
Also remove the task from purge-docker-cluster that was missed in
d9e7835.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago UCA: Uncomment UCA variables in defaults, fix consequent breakage
Matthew Vernon [Wed, 27 Mar 2019 13:34:47 +0000 (13:34 +0000)]
UCA: Uncomment UCA variables in defaults, fix consequent breakage

The Ubuntu Cloud Archive-related (UCA) defaults in
roles/ceph-defaults/defaults/main.yml were commented out, which means
if you set `ceph_repository` to "uca", you get undefined variable
errors, e.g.

```
The task includes an option with an undefined variable. The error was: 'ceph_stable_repo_uca' is undefined

The error appears to have been in '/nfs/users/nfs_m/mv3/software/ceph-ansible/roles/ceph-common/tasks/installs/debian_uca_repository.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- name: add ubuntu cloud archive repository
  ^ here

```

Unfortunately, uncommenting these results in some other breakage,
because further roles were written that use the fact of
`ceph_stable_release_uca` being defined as a proxy for "we're using
UCA", and so try to install packages from the bionic-updates/queens
release, for example, which doesn't work. So there are a few `apt` tasks
that need modifying to not use `ceph_stable_release_uca` unless
`ceph_origin` is `repository` and `ceph_repository` is `uca`.
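
That is, guarding the apt tasks along these lines (the repo line is
illustrative):

```
- name: add ubuntu cloud archive repository
  apt_repository:
    repo: "deb {{ ceph_stable_repo_uca }} {{ ceph_stable_release_uca }} main"
  when:
    - ceph_origin == 'repository'
    - ceph_repository == 'uca'
```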

Closes: #3475
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
6 years ago add_mdss: change number of processes for testing to 8
Rishabh Dave [Tue, 9 Apr 2019 09:15:57 +0000 (14:45 +0530)]
add_mdss: change number of processes for testing to 8

Run tests with 8 processes/cores since other scenarios do the same.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago tests: update ceph_release_num in conftest.py
Guillaume Abrioux [Tue, 9 Apr 2019 07:49:03 +0000 (09:49 +0200)]
tests: update ceph_release_num in conftest.py

add nautilus and octopus releases.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago ceph-facts: use last ipv6 address for mon/rgw
Dimitri Savineau [Fri, 5 Apr 2019 19:04:45 +0000 (15:04 -0400)]
ceph-facts: use last ipv6 address for mon/rgw

When using monitor_address_block or radosgw_address_block variables
to configure the mon/rgw address we're getting the first ip address
from the ansible facts present in that cidr.
When there's a VIP on that network the first filter could return the
wrong value.
This seems to affect only IPv6 setups because the VIP addresses are
added to the ansible facts at the beginning of the list. This is the
opposite (at the end) when using IPv4.
This causes the mon/rgw processes to bind on the VIP address.
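
A hedged sketch of picking the last matching address instead of the
first (the `ipaddr` filter usage is illustrative):

```
_monitor_address: "{{ ansible_all_ipv6_addresses | ipaddr(monitor_address_block) | last }}"
```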

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680155
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago ceph-rgw: Fix bad paths which depend on the clustername
François Lafont [Sat, 6 Apr 2019 09:44:03 +0000 (11:44 +0200)]
ceph-rgw: Fix bad paths which depend on the clustername

The path of the RGW environment file (in the /var/lib/ceph/radosgw/
directory) depends on the Ceph clustername. It was not taken into
account in the Ansible role `ceph-rgw`.

Signed-off-by: flaf <francois.lafont.1978@gmail.com>
6 years ago mgr: manage mgr modules when mgr and mon are collocated
Guillaume Abrioux [Mon, 8 Apr 2019 11:56:01 +0000 (13:56 +0200)]
mgr: manage mgr modules when mgr and mon are collocated

When mgrs are implicitly collocated on monitors (no mgrs in the mgrs
group), that include was skipped because of this condition:

`inventory_hostname == groups[mgr_group_name][0]`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago mgr: wait for all mgr to be available
Guillaume Abrioux [Mon, 8 Apr 2019 11:34:59 +0000 (13:34 +0200)]
mgr: wait for all mgr to be available

before managing mgr modules, we must ensure all mgrs are available,
otherwise we can hit a failure like the following:

```
stdout:Error ENOENT: all mgr daemons do not support module 'restful', pass --force to force enablement
```

It happens because not all mgrs are available yet when trying to
manage mgr modules.
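
A sketch of such a wait task (names and thresholds illustrative;
`ceph mgr dump` reports an `available` flag):

```
- name: wait for all mgr to be up
  command: "{{ docker_exec_cmd | default('') }} ceph --cluster {{ cluster }} mgr dump -f json"
  register: mgr_dump
  retries: 30
  delay: 5
  until: (mgr_dump.stdout | from_json).available | bool
  changed_when: false
```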

Closes: #3100
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: run switch_to_containers against nautilus
Guillaume Abrioux [Thu, 4 Apr 2019 12:41:46 +0000 (14:41 +0200)]
tests: run switch_to_containers against nautilus

even on master, force the release to be nautilus.
this scenario is failing because at multiple points it is
actually downgrading the ceph version.
It might happen that the latest-master image is older than what was
deployed in the first step of the scenario (the RPM deployment).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 41b0fa15ddf2a45402d17faa3bd1e817692fc1d2)

6 years ago allow adding a MDS to already deployed cluster
Rishabh Dave [Tue, 12 Feb 2019 03:15:44 +0000 (08:45 +0530)]
allow adding a MDS to already deployed cluster

Add a tox scenario that adds a new MDS node as part of an already
deployed Ceph cluster and deploys MDS there.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago add-osds: don't hardcode group names
Rishabh Dave [Thu, 28 Mar 2019 07:45:53 +0000 (13:15 +0530)]
add-osds: don't hardcode group names

Instead of hardcoding group names, import ceph-defaults earlier. Also,
rectify a minor mistake in vagrant_variables.yml for the containerized
version of add_osds.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
6 years ago rgw multisite: add more than 1 rgw to the master or secondary zone
Ali Maredia [Thu, 31 Jan 2019 20:43:21 +0000 (20:43 +0000)]
rgw multisite: add more than 1 rgw to the master or secondary zone

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1664869
Signed-off-by: Ali Maredia <amaredia@redhat.com>
6 years ago radosgw: Raise cpu limit to 8
Dimitri Savineau [Tue, 2 Apr 2019 14:39:42 +0000 (10:39 -0400)]
radosgw: Raise cpu limit to 8

In containerized deployments the default radosgw cpu quota is too low
for production environments.
This is causing performance degradation compared to bare-metal.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1680171
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
6 years ago tests: add back testinfra testing
Guillaume Abrioux [Thu, 4 Apr 2019 02:09:12 +0000 (04:09 +0200)]
tests: add back testinfra testing

136bfe0 removed testinfra testing on all scenarios except all_daemons.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago tests: pin pytest-xdist to 1.27.0
Guillaume Abrioux [Thu, 4 Apr 2019 02:01:01 +0000 (04:01 +0200)]
tests: pin pytest-xdist to 1.27.0

looks like newer versions of pytest-xdist require pytest>=4.4.0

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
6 years ago Check ceph_health_raw.stdout value as string during mon bootstrap
fpantano [Wed, 3 Apr 2019 16:35:10 +0000 (18:35 +0200)]
Check ceph_health_raw.stdout value as string during mon bootstrap

According to rdo testing https://review.rdoproject.org/r/#/c/18721
a check on the output of the ceph_health value is added to
allow the playbook to make several attempts (according to the
retry/delay variables) when waiting for the cluster quorum or
when the container bootstrap is not yet finished.
It avoids the failure of the command execution when it doesn't
receive a valid json object to decode (because the cluster is too
slow to bootstrap compared to ceph-ansible task execution).
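
A sketch of the retrying check (variable names and thresholds
illustrative):

```
- name: wait for the monitor to join the quorum
  command: "{{ docker_exec_cmd | default('') }} ceph -s --format json"
  register: ceph_health_raw
  retries: 10
  delay: 10
  until: >-
    ceph_health_raw.stdout != '' and
    (ceph_health_raw.stdout | from_json).quorum_names | length > 0
  changed_when: false
```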

Signed-off-by: fpantano <fpantano@redhat.com>
6 years ago tests: retry to fire up VMs on vagrant failure
Guillaume Abrioux [Tue, 2 Apr 2019 12:53:19 +0000 (14:53 +0200)]
tests: retry to fire up VMs on vagrant failure

Add a script to retry several times to fire up VMs to avoid vagrant
failures.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Co-authored-by: Andrew Schoen <aschoen@redhat.com>