git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
ceph-ansible.git
5 years ago | ceph-dashboard: copy TLS cert/key on monitor
Dimitri Savineau [Fri, 17 Jul 2020 14:38:02 +0000 (10:38 -0400)]
ceph-dashboard: copy TLS cert/key on monitor

The ceph-dashboard role is executed on the mgr nodes, so the TLS cert/key
files are copied to those nodes.
But the import of the cert/key files into the ceph configuration runs on
the monitor.

Closes: #5557
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm: add playbook
Dimitri Savineau [Fri, 10 Jul 2020 21:52:38 +0000 (17:52 -0400)]
cephadm: add playbook

This adds a new playbook for deploying ceph via cephadm.

This also adds a new dedicated tox file for CI purposes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: delegate task for orch apply
Dimitri Savineau [Wed, 15 Jul 2020 22:25:57 +0000 (18:25 -0400)]
cephadm-adopt: delegate task for orch apply

This is a partial revert of b38019e. We don't want to execute the whole
play on the monitor: if some groups are empty (like rgws or mdss), the
orchestrator commands would still be executed.
Instead, we should keep the real target group name at the play level and
delegate the orchestrator commands to the monitor. The whole play will
then be skipped if the group is empty.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: inform users about cephadm
Dimitri Savineau [Wed, 15 Jul 2020 19:21:25 +0000 (15:21 -0400)]
cephadm-adopt: inform users about cephadm

Print a message at the end of the playbook to inform users that they
don't have to use ceph-ansible playbooks anymore, as everything else
needs to be done via cephadm (day 2 operations).

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: refresh the service/daemon list
Dimitri Savineau [Wed, 15 Jul 2020 19:15:06 +0000 (15:15 -0400)]
cephadm-adopt: refresh the service/daemon list

When reporting the orchestrator service/daemon list at the end of the
playbook, we can use the --refresh option; otherwise the output could be
outdated.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | Revert "cephadm-adopt: remove the cephadm script"
Dimitri Savineau [Wed, 15 Jul 2020 19:14:23 +0000 (15:14 -0400)]
Revert "cephadm-adopt: remove the cephadm script"

This reverts commit c3bbc6b13cee5e566b277f3146e9e6bc4cec2f52.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | ceph_key: fix bug in 'info' feature
Guillaume Abrioux [Thu, 9 Jul 2020 14:24:15 +0000 (16:24 +0200)]
ceph_key: fix bug in 'info' feature

Fix the 'info' feature in the ceph_key.py module.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | cephadm-adopt: wait for monitor in quorum
Dimitri Savineau [Fri, 10 Jul 2020 21:41:32 +0000 (17:41 -0400)]
cephadm-adopt: wait for monitor in quorum

After adopting a monitor, we need to wait for that monitor to rejoin
the quorum before moving on to the next node.
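The wait can be sketched in Python as follows. This is a minimal sketch. It assumes the adopted monitor's name appears in the `quorum_names` list of `ceph quorum_status --format json` once the monitor has rejoined. The helper names and the `retries`/`delay` defaults are hypothetical. The polling loop only mirrors an Ansible `until`/`retries`/`delay` task and is not the playbook's code.

```python
import json
import time
from typing import Callable


def mon_in_quorum(quorum_status_json: str, mon_name: str) -> bool:
    """Return True if mon_name is listed in the "quorum_names" field."""
    status = json.loads(quorum_status_json)
    return mon_name in status.get("quorum_names", [])


def wait_for_quorum(fetch_status: Callable[[], str], mon_name: str,
                    retries: int = 30, delay: float = 10.0) -> bool:
    """Poll fetch_status() until mon_name is back in quorum.

    Hypothetical helper mirroring an Ansible until/retries/delay loop;
    the default values are illustrative only.
    """
    for attempt in range(retries):
        if mon_in_quorum(fetch_status(), mon_name):
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # no sleep after the last failed attempt
    return False
```

In a real run, `fetch_status` would wrap the quorum_status command executed on, or delegated to, a monitor.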

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: add osd flags during adoption
Dimitri Savineau [Fri, 10 Jul 2020 19:24:24 +0000 (15:24 -0400)]
cephadm-adopt: add osd flags during adoption

Like the rolling_update or switch2container playbooks, we need to set/unset
some osd flags before and after the OSD daemons adoption.
This also adds a task that waits for clean pgs at the end of each OSD node.
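Both pieces can be sketched in Python as follows. This is a minimal sketch under stated assumptions:

- The input has the JSON shape of `ceph pg stat --format json` given in the docstring.
- `noout` is only an example flag; the exact flag list the playbook sets is not reproduced here.
- Both helper names are hypothetical.

```python
import json
from typing import List


def osd_flag_cmd(action: str, flag: str, cluster: str = "ceph") -> List[str]:
    """Build `ceph osd set|unset <flag>` (e.g. noout around the adoption)."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return ["ceph", "--cluster", cluster, "osd", action, flag]


def all_pgs_active_clean(pg_stat_json: str) -> bool:
    """True when every PG is in a state starting with active+clean.

    Assumes the `ceph pg stat --format json` shape:
    {"pg_summary": {"num_pg_by_state": [{"name": ..., "num": ...}], "num_pgs": N}}
    """
    summary = json.loads(pg_stat_json)["pg_summary"]
    by_state = summary.get("num_pg_by_state", [])
    if not by_state:
        return False  # no PG stats reported yet: do not treat as clean
    clean = sum(s["num"] for s in by_state
                if s["name"].startswith("active+clean"))
    return clean == summary["num_pgs"]
```

States such as `active+clean+scrubbing` also count as clean here, because only the prefix is checked.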

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: add iscsi support
Dimitri Savineau [Fri, 10 Jul 2020 18:59:06 +0000 (14:59 -0400)]
cephadm-adopt: add iscsi support

iSCSI support has recently been added to cephadm.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: remove the cephadm script
Dimitri Savineau [Fri, 10 Jul 2020 18:45:51 +0000 (14:45 -0400)]
cephadm-adopt: remove the cephadm script

At the end of the process, we don't need the cephadm script anymore.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: show orchestrator status
Dimitri Savineau [Fri, 10 Jul 2020 18:13:15 +0000 (14:13 -0400)]
cephadm-adopt: show orchestrator status

At the end of the playbook we can show the orchestrator status like
we do with the ceph status in initial deployment.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: use placement parameter
Dimitri Savineau [Fri, 10 Jul 2020 14:42:02 +0000 (10:42 -0400)]
cephadm-adopt: use placement parameter

It's better to use the --placement parameter when using ceph orch apply
commands to avoid confusion in the parameters.
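The idea can be shown with a hypothetical helper. The placement (an optional count followed by host names, as in cephadm's placement syntax) is passed as a single `--placement` argument rather than as positional parameters. The helper name and the way the host list is obtained are assumptions, not the playbook's actual task.

```python
from typing import List, Optional


def orch_apply_cmd(service: str, hosts: List[str],
                   count: Optional[int] = None,
                   cluster: str = "ceph") -> List[str]:
    """Build `ceph orch apply <service> --placement="[count] host ..."`.

    Hypothetical helper: the whole placement spec is kept in one argument.
    """
    parts = ([str(count)] if count is not None else []) + list(hosts)
    return ["ceph", "--cluster", cluster, "orch", "apply", service,
            "--placement=" + " ".join(parts)]
```

Passed as an argv list (e.g. to `subprocess.run`), the placement string stays a single argument, so no extra shell quoting is needed.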

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: use custom dashboard images
Dimitri Savineau [Thu, 9 Jul 2020 22:38:17 +0000 (18:38 -0400)]
cephadm-adopt: use custom dashboard images

cephadm uses default values for the dashboard container images, which need
to be customized by ansible for upstream or downstream purposes.
This feature wasn't present when cephadm-adopt.yml was designed.
Also set the container_image_base variable for upgrade purposes.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: run orch apply from monitors
Dimitri Savineau [Thu, 9 Jul 2020 22:28:49 +0000 (18:28 -0400)]
cephadm-adopt: run orch apply from monitors

It looks like we can't run the ceph orch apply commands on nodes other
than monitors, even though it used to work in the past.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: don't fail on systemd reset-failed
Dimitri Savineau [Thu, 9 Jul 2020 15:23:33 +0000 (11:23 -0400)]
cephadm-adopt: don't fail on systemd reset-failed

If the systemd service exits successfully, then we don't need to reset
the failed state.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | cephadm-adopt: copy client.admin keyring
Dimitri Savineau [Thu, 9 Jul 2020 15:19:41 +0000 (11:19 -0400)]
cephadm-adopt: copy client.admin keyring

The ceph config assimilate-conf command requires the client.admin
keyring which isn't present on all nodes most of the time.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | tox: add cephadm_adopt scenario
Dimitri Savineau [Mon, 6 Jul 2020 18:27:50 +0000 (14:27 -0400)]
tox: add cephadm_adopt scenario

This adds an optional cephadm_adopt scenario which is based on
all_daemons.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | play: followup on cc0d969
Guillaume Abrioux [Thu, 9 Jul 2020 11:53:28 +0000 (13:53 +0200)]
play: followup on cc0d969

Remove two other occurrences of the 'iscsigws' pattern in the main playbook
that were missed in cc0d9697c554e459af11965fc2710e42abef4e13.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | rgw: set container memory limit to 4g
Guillaume Abrioux [Thu, 9 Jul 2020 11:07:32 +0000 (13:07 +0200)]
rgw: set container memory limit to 4g

This commit changes the container memory limit for rgw daemons to 4g.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707488
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | facts: refact `ceph_uid` fact
Guillaume Abrioux [Wed, 8 Jul 2020 13:49:47 +0000 (15:49 +0200)]
facts: refact `ceph_uid` fact

There's no need to set this fact with a `set_fact` task; we can achieve
this in `ceph-defaults`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | ceph_volume: fix regression
Guillaume Abrioux [Tue, 7 Jul 2020 23:04:10 +0000 (01:04 +0200)]
ceph_volume: fix regression

Do not skip zapping if osd_fsid is passed.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | tests: add docker hub authentication in jobs
Guillaume Abrioux [Tue, 7 Jul 2020 15:11:27 +0000 (17:11 +0200)]
tests: add docker hub authentication in jobs

This commit makes all jobs authenticate to docker hub in order to
avoid the rate limit.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | play: remove backward compatibility group name
Guillaume Abrioux [Wed, 8 Jul 2020 11:51:14 +0000 (13:51 +0200)]
play: remove backward compatibility group name

It's time to remove this old group name.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | ceph-nfs: change ganesha devel source
Dimitri Savineau [Thu, 2 Jul 2020 19:23:09 +0000 (15:23 -0400)]
ceph-nfs: change ganesha devel source

The download.nfs-ganesha.org source for nfs-ganesha on CentOS isn't
available anymore.
Let's switch back to shaman since we have builds available now.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | tests: remove nfs_ganesha_stable_branch variable
Dimitri Savineau [Mon, 6 Jul 2020 14:45:21 +0000 (10:45 -0400)]
tests: remove nfs_ganesha_stable_branch variable

We don't need to override this variable in the group_vars; we can use the
default value instead.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | tests: update nfs-ganesha to V3.3-stable
Guillaume Abrioux [Sun, 5 Jul 2020 14:54:36 +0000 (16:54 +0200)]
tests: update nfs-ganesha to V3.3-stable

Not really needed in master; this commit is intended to be backported to
the octopus branch.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | doc: add a note about deprecated branches
Guillaume Abrioux [Fri, 3 Jul 2020 05:14:57 +0000 (07:14 +0200)]
doc: add a note about deprecated branches

This commit adds a note about the `stable-3.0` and `stable-3.1` branches,
which are deprecated and not maintained anymore.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | doc: add a note about containerized deployments
Guillaume Abrioux [Fri, 3 Jul 2020 04:58:49 +0000 (06:58 +0200)]
doc: add a note about containerized deployments

This commit updates the documentation to add a note about containerized
deployments.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | doc: fix warning treated as an error
Guillaume Abrioux [Fri, 3 Jul 2020 07:14:13 +0000 (09:14 +0200)]
doc: fix warning treated as an error

Typical error:

```
Warning, treated as error:
/home/jenkins-build/build/workspace/ceph-ansible-docs-pull-requests/docs/source/day-2/upgrade.rst:2:Title underline too short.
```

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | ceph-defaults: update nfs-ganesha to 3.3
Dimitri Savineau [Thu, 2 Jul 2020 18:29:22 +0000 (14:29 -0400)]
ceph-defaults: update nfs-ganesha to 3.3

nfs-ganesha 3.3 is the latest 3.x release available for octopus so we
should update to this version.

https://download.ceph.com/nfs-ganesha/rpm-V3.3-stable/octopus

This will also match the version used in RHCS 5.

Ceph container already uses that version too.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | facts: explicitly disable facter and ohai
Dimitri Savineau [Tue, 30 Jun 2020 14:13:42 +0000 (10:13 -0400)]
facts: explicitly disable facter and ohai

By default, ansible gathers facts from facter and ohai if they are installed
on the remote nodes. Since we don't need them, let's exclude these facts
from our fact gathering.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | radosgw: remove INST_PORT environment variable
Dimitri Savineau [Thu, 2 Jul 2020 14:47:45 +0000 (10:47 -0400)]
radosgw: remove INST_PORT environment variable

This variable isn't consumed by the container so we can remove it.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | rgw: fix multi instances scaleout
Guillaume Abrioux [Wed, 1 Jul 2020 08:47:45 +0000 (10:47 +0200)]
rgw: fix multi instances scaleout

When rgw and osd are collocated, the current workflow prevents
scaling out the radosgw_num_instances parameter when rerunning the
playbook.

The environment file used in the rgw systemd template is rendered when
executing the `ceph-rgw` role. But during a new run of the playbook (in
order to scale out rgw instances), handlers are triggered from the `ceph-osd`
role, which runs before `ceph-rgw`. The handlers therefore try to start the
new rgw daemon whereas its corresponding environment file hasn't been
rendered yet, and fail as follows:

```
ceph-radosgw@rgw.ceph4osd3.rgw1.service failed to run 'start-pre' task: No such file or directory
```

This commit moves the tasks generating this file into the `ceph-config` role
so it is generated early.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851906
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | tests: enforce pytest-rerunfailures version
Guillaume Abrioux [Wed, 1 Jul 2020 16:22:00 +0000 (18:22 +0200)]
tests: enforce pytest-rerunfailures version

This commit pins the installed pytest-rerunfailures version to <9.0.

This is to avoid the following error:

```
ERROR: pytest-rerunfailures 9.0 has requirement pytest>=5.0, but you'll have pytest 4.6.11 which is incompatible.
```

The latest version of pytest-rerunfailures isn't compatible with the version
of pytest we are using.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | vagrant: update centos image to 8.2
Dimitri Savineau [Tue, 16 Jun 2020 19:31:08 +0000 (15:31 -0400)]
vagrant: update centos image to 8.2

CentOS 8.2 (2004) has been released, so we should switch to this image
when using vagrant.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | ceph-volume.py: add support for batch refactored code
Jan Fajerski [Fri, 26 Jun 2020 08:29:24 +0000 (10:29 +0200)]
ceph-volume.py: add support for batch refactored code

See https://github.com/ceph/ceph/pull/34740 for the batch changes.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
5 years ago | ceph-common: remove copr and sepia repositories
Dimitri Savineau [Wed, 17 Jun 2020 18:15:32 +0000 (14:15 -0400)]
ceph-common: remove copr and sepia repositories

All EL8 dependencies are now present on EPEL 8 so we don't need the
additional repositories that were only a temporary solution.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | rolling_update: add any_errors_fatal
Guillaume Abrioux [Mon, 29 Jun 2020 14:52:28 +0000 (16:52 +0200)]
rolling_update: add any_errors_fatal

If a failure occurs in ceph-validate, the upgrade playbook keeps running,
whereas we expect it to fail.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | Add container settings for Ubuntu 20 (the same as Ubuntu 18)
George Shuklin [Mon, 29 Jun 2020 13:01:07 +0000 (16:01 +0300)]
Add container settings for Ubuntu 20 (the same as Ubuntu 18)

Signed-off-by: George Shuklin <george.shuklin@gmail.com>
5 years ago | Add playbook for converting cluster to cephadm [5475/head]
Dimitri Savineau [Thu, 9 Apr 2020 21:50:54 +0000 (17:50 -0400)]
Add playbook for converting cluster to cephadm

The commit adds a new playbook for converting an existing ceph cluster
deployed by ceph-ansible to the cephadm orchestrator.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | dashboard: configure mgr backend before restart
Dimitri Savineau [Fri, 26 Jun 2020 17:28:04 +0000 (13:28 -0400)]
dashboard: configure mgr backend before restart

We need to set the mgr dashboard server ip address before restarting the
dashboard module; otherwise the dashboard module may try to bind to an
already used address.
We already do this configuration for the dashboard port value and ssl
setup so we should do the same for server address too.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1851455
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | Ansible tests are not filters
Jonathan Rosser [Thu, 18 Jun 2020 12:39:26 +0000 (13:39 +0100)]
Ansible tests are not filters

The use of "| success" and "| changed" is not valid syntax for modern
ansible releases; these are tests and must be written as "is success" and
"is changed".

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
5 years ago | Install python routes package as a dependency rather than directly
Jonathan Rosser [Thu, 18 Jun 2020 15:40:52 +0000 (16:40 +0100)]
Install python routes package as a dependency rather than directly

This is now a dependency of ceph-mgr, so it will be installed automatically
and does not need a specific task.

This change means that ceph-mgr installs correctly on Ubuntu Focal where
the python3-routes package is necessary.

Signed-off-by: Jonathan Rosser <jonathan.rosser@rd.bbc.co.uk>
5 years ago | dashboard: copy self-signed generated crt to mons
Guillaume Abrioux [Tue, 23 Jun 2020 09:11:06 +0000 (11:11 +0200)]
dashboard: copy self-signed generated crt to mons

This commit makes the playbook copy the self-signed generated certificate
to the monitors.
When mons and mgrs are deployed on dedicated nodes, the playbook fails
when trying to import the certificate and key files, since they are
generated on the mgrs whereas we try to import them from a monitor.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846995
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | podman: Add Type and PIDFile value to unit files
Dimitri Savineau [Mon, 22 Jun 2020 16:58:56 +0000 (12:58 -0400)]
podman: Add Type and PIDFile value to unit files

This changes the way we are running the podman containers via systemd.
They now run in detached mode with Type/PIDFile set.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1834974
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | ceph_volume: make zap function idempotent
Guillaume Abrioux [Fri, 19 Jun 2020 13:09:04 +0000 (15:09 +0200)]
ceph_volume: make zap function idempotent

This commit makes the zap function idempotent, especially when using
lvm_volumes variable.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1845668
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | docker: Add Requires on docker service
Dimitri Savineau [Mon, 22 Jun 2020 17:58:10 +0000 (13:58 -0400)]
docker: Add Requires on docker service

When using the docker container engine, the systemd unit files only
declare a dependency on the docker daemon via the After parameter.
But if docker is restarted on a live system, then the ceph systemd units
should wait for the docker daemon to be fully restarted.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1846830
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | docker2podman: make images pulling optional
Guillaume Abrioux [Mon, 22 Jun 2020 12:35:16 +0000 (14:35 +0200)]
docker2podman: make images pulling optional

This commit skips pulling the images if podman isn't installed
on the machine.

In the OSP context, podman is installed later in the workflow, which
means all `podman pull` commands would fail.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1849559
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | travis: use tests/requirements.txt [5967/head]
Dimitri Savineau [Fri, 19 Jun 2020 22:04:55 +0000 (18:04 -0400)]
travis: use tests/requirements.txt

Explicitly installing ansible-lint, pytest and pytest-cov via pip results
in a specific pytest version (4.3.1) which is not supported by pytest-cov
(2.10).
Because we already define a specific pytest version in the tests
requirements, we can install all the python dependencies from that
file and remove them from the pip install command.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | requirements: exclude ansible 2.9.10
Guillaume Abrioux [Fri, 19 Jun 2020 17:29:13 +0000 (19:29 +0200)]
requirements: exclude ansible 2.9.10

ansible 2.9.10 seems to have introduced a bug.

See https://github.com/ansible/ansible/issues/70168

This commit excludes this version from ceph-ansible requirements.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | docs: Add upgrade operation.
Dimitri Savineau [Mon, 25 May 2020 13:44:12 +0000 (09:44 -0400)]
docs: Add upgrade operation.

This commit adds a chapter about the ceph upgrade process.

Closes: #5393
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | ceph-osd: remove ceph-osd-run.sh script
Dimitri Savineau [Tue, 9 Jun 2020 19:19:57 +0000 (15:19 -0400)]
ceph-osd: remove ceph-osd-run.sh script

Since we only have one scenario since nautilus, we can just move
the container start command from ceph-osd-run.sh to the systemd unit
service.
As a result, the ceph-osd-run.sh.j2 template and the
ceph_osd_docker_run_script_path variable are removed.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | library/ceph_pool: set name parameter as required
Dimitri Savineau [Fri, 22 May 2020 20:11:20 +0000 (16:11 -0400)]
library/ceph_pool: set name parameter as required

The name parameter is required.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | debian/uca: remove the handler notification
Dimitri Savineau [Wed, 10 Jun 2020 14:32:53 +0000 (10:32 -0400)]
debian/uca: remove the handler notification

The "update apt cache" handler in the ceph-handler role was never called,
and the handler triggered after adding the uca repository doesn't exist at
all. Instead of using a handler, we can just set the update_cache
parameter to true like the other apt_repository tasks.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | switch_to_containers: don't set noup flag
Guillaume Abrioux [Tue, 16 Jun 2020 15:43:13 +0000 (17:43 +0200)]
switch_to_containers: don't set noup flag

We shouldn't set this flag when running switch_to_containers playbook.
Otherwise the playbook fails waiting for pgs to be clean.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843569
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | lvm_setup: lookup device from inventory, default to /dev/sd* names
Jan Fajerski [Mon, 10 Feb 2020 09:09:14 +0000 (10:09 +0100)]
lvm_setup: lookup device from inventory, default to /dev/sd* names

This fixes a long-standing failure in ceph-volume's lvm test suite.
Otherwise, the default behaviour should not change.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
5 years ago | container: inspect Id field instead of RepoDigests
Dimitri Savineau [Fri, 5 Jun 2020 20:42:20 +0000 (16:42 -0400)]
container: inspect Id field instead of RepoDigests

When a container image managed by podman isn't tagged anymore, the
RepoDigests field returned when inspecting the image doesn't contain any
value. This is different from the docker workflow, and it breaks the
ceph-ansible container upgrade when collocating multiple services and using
a non-fixed container tag (like latest or 4).

$ podman images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     957 MB
<none>                  <none>   011ee108bfc9   2 months ago   1.01 GB

$ podman inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:20cf789235e23ddaf38e109b391d1496bb88011239d16862c4c106d0e05fea9e"
$ podman inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
null

Because this field returns "null", the ansible task trying to
determine this value fails:

-----------------------------
fatal: [foo]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error
    was: None has no element 0

    The error appears to be in
    'roles/ceph-container-common/tasks/fetch_image.yml': line 137,
    column 3, but may be elsewhere in the file depending on the exact
    syntax problem.

    The offending line appears to be:

    - name: set_fact ceph_osd_image_repodigest_before_pulling
      ^ here
-----------------------------

We don't have this behaviour with docker.

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
docker.io/ceph/daemon   latest   680c9c0d38c3   8 days ago     928 MB
docker.io/ceph/daemon   <none>   011ee108bfc9   2 months ago   986 MB

$ docker inspect 680c9c0d38c3 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:45e6f28bb67c81b826acb64fad5c0da1cac3dffb41a88992fe4ca2be79575fa6"
$ docker inspect 011ee108bfc9 | jq .[0].RepoDigests[0]
"docker.io/ceph/daemon@sha256:b393a73309d72e43ca7d65cd3519036007947671e373eb59aa75a46185c52231"

Instead we should just get the Id field.
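The difference can be sketched in Python against trimmed, hypothetical `inspect` output (real Ids are full 64-character hashes). Reading the first RepoDigests entry needs a guard for the untagged case, whereas Id is always present. The helper names are hypothetical. The before/after comparison is an assumption about how the value is used, inferred from the `..._before_pulling` fact name.

```python
import json
from typing import Optional


def repo_digest(inspect_json: str) -> Optional[str]:
    """First RepoDigests entry, or None when none is reported (untagged image)."""
    image = json.loads(inspect_json)[0]
    digests = image.get("RepoDigests") or []  # covers both null and []
    return digests[0] if digests else None


def image_id(inspect_json: str) -> str:
    """The Id field, present for tagged and untagged images alike."""
    return json.loads(inspect_json)[0]["Id"]


def image_changed(before_json: str, after_json: str) -> bool:
    """Compare image Ids taken before and after pulling."""
    return image_id(before_json) != image_id(after_json)
```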

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1844496
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | switch_to_container: fix osd systemd regex
Dimitri Savineau [Thu, 4 Jun 2020 20:57:17 +0000 (16:57 -0400)]
switch_to_container: fix osd systemd regex

The systemd LOAD and ACTIVE fields could have more than one space between
both values.
This updates the systemd regex the same way we're using it in other
parts of the code.
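The whitespace-tolerant matching can be sketched with a Python regex. This is a minimal sketch; the pattern is a hypothetical illustration, not the exact regex used by the playbook. It assumes `systemctl list-units` style lines, where `\s+` accepts any amount of whitespace between the UNIT, LOAD and ACTIVE columns.

```python
import re
from typing import List

# Hypothetical pattern: \s+ tolerates one or more spaces between the
# UNIT, LOAD and ACTIVE columns of `systemctl list-units` output.
OSD_UNIT_RE = re.compile(r"ceph-osd@(\d+)\.service\s+loaded\s+active\b")


def active_osd_ids(list_units_output: str) -> List[int]:
    """Return the ids of ceph-osd@<id> units that are loaded and active."""
    return [int(m.group(1)) for m in OSD_UNIT_RE.finditer(list_units_output)]
```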

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1843500
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | rgw multisite: add master zone endpoints to zonegroup
Ali Maredia [Fri, 5 Jun 2020 21:21:27 +0000 (21:21 +0000)]
rgw multisite: add master zone endpoints to zonegroup

We were only adding the endpoints to the master zone but not to the
zonegroup.
This patch fixes the issue.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1839228
Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years ago | mergify: remove merge on skip ci
Dimitri Savineau [Tue, 9 Jun 2020 13:23:04 +0000 (09:23 -0400)]
mergify: remove merge on skip ci

This rule will probably never be applied, and at the moment it is
creating a cancelled job in the CI status.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | rgwloadbalancer undefined index variable
Ansible Deployment User [Tue, 26 May 2020 11:18:03 +0000 (13:18 +0200)]
rgwloadbalancer undefined index variable

The vrrp_instances variable is using a loop with index but the index_var
wasn't defined.
As a result, the fact task was failing on this undefined index variable.

The task includes an option with an undefined variable. The error was:
'index' is undefined

Closes: #5395
Signed-off-by: Florian Faltermeier <florian.faltermeier@uibk.ac.at>
5 years ago | ceph-nfs: add stable noarch repository
Dimitri Savineau [Fri, 15 May 2020 15:20:08 +0000 (11:20 -0400)]
ceph-nfs: add stable noarch repository

When using the stable nfs ganesha repository, we need to have both the
arch and noarch repositories enabled.
Currently the noarch repository is missing, which causes the
non-containerized deployment to fail.

Closes: #5375
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | switch_to_container: refact wait for pg check
Guillaume Abrioux [Fri, 15 May 2020 08:58:40 +0000 (10:58 +0200)]
switch_to_container: refact wait for pg check

There is no need to make this check with several steps.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | tests: report coverage status for unittests
Guillaume Abrioux [Tue, 12 May 2020 18:13:35 +0000 (20:13 +0200)]
tests: report coverage status for unittests

This commit adds pytest-cov usage in unittests

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | ceph_pool: add tests
Guillaume Abrioux [Tue, 12 May 2020 12:39:20 +0000 (14:39 +0200)]
ceph_pool: add tests

Add unit tests for ceph_pool module

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | ceph_pool: support setting application at pool creation
Guillaume Abrioux [Tue, 12 May 2020 12:33:36 +0000 (14:33 +0200)]
ceph_pool: support setting application at pool creation

This commit adds the required changes in order to support
setting the pool application at initial pool creation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | ceph_pool: refact exec_commands()
Guillaume Abrioux [Tue, 12 May 2020 11:54:29 +0000 (13:54 +0200)]
ceph_pool: refact exec_commands()

We never run multiple ceph commands at a time, so there's no need for this design.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | tests: update pools definitions
Guillaume Abrioux [Wed, 29 Apr 2020 23:23:20 +0000 (01:23 +0200)]
tests: update pools definitions

Setting attributes to an empty string is bad user input.
Also, remove the `rule_name` attribute when creating an erasure-coded pool
(this rule isn't intended for the erasure-coded pool type).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | common: introduce ceph_pool module calls
Guillaume Abrioux [Tue, 28 Apr 2020 16:08:59 +0000 (18:08 +0200)]
common: introduce ceph_pool module calls

This commit calls the `ceph_pool` module for creating ceph pools
everywhere it's needed in the playbook.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | library: add ceph_pool module
Guillaume Abrioux [Sat, 4 Apr 2020 01:44:05 +0000 (03:44 +0200)]
library: add ceph_pool module

This commit adds a new module `ceph_pool`

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | common: fix target_size_ratio task enablement
Guillaume Abrioux [Thu, 14 May 2020 09:00:12 +0000 (11:00 +0200)]
common: fix target_size_ratio task enablement

The condition on this task is wrong, we have to check whether
`target_size_ratio` is set in the pool definition instead.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | facts: always set ceph_run_cmd and ceph_admin_command
Guillaume Abrioux [Thu, 14 May 2020 09:06:41 +0000 (11:06 +0200)]
facts: always set ceph_run_cmd and ceph_admin_command

Always set these facts on monitor nodes, whether or not we run with
`--limit`. Otherwise, the playbook will fail when using `--limit` on nodes
where these facts are used in a task delegated to a monitor.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | tests/library: parametrize ceph_volume objectstore
Dimitri Savineau [Tue, 31 Mar 2020 20:51:55 +0000 (16:51 -0400)]
tests/library: parametrize ceph_volume objectstore

This adds the objectstore testing for both filestore and bluestore on
the ceph_volume module.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | tests/library: define container cmd once
Dimitri Savineau [Tue, 31 Mar 2020 20:31:40 +0000 (16:31 -0400)]
tests/library: define container cmd once

In containerized deployments, the ceph_volume module always uses
the same container command prefix for all actions.
Instead of duplicating this code in all container tests, we can define it
once.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | tests: force using the more recent build
Guillaume Abrioux [Thu, 14 May 2020 14:24:59 +0000 (16:24 +0200)]
tests: force using the more recent build

We should use `latest-master-devel` for the switch_to_containers job.
Otherwise we might actually downgrade the ceph version when the image
used is older than the rpm initially used for installing ceph.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | test: set sitepackages=false in tox
Guillaume Abrioux [Wed, 13 May 2020 15:49:07 +0000 (17:49 +0200)]
test: set sitepackages=false in tox

Otherwise it might try to use the system installed version of ansible
when there's one available.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago | docker2podman: manage dashboard nodes
Dimitri Savineau [Thu, 16 Apr 2020 16:17:12 +0000 (12:17 -0400)]
docker2podman: manage dashboard nodes

The dashboard nodes (alertmanager, grafana, node-exporter, and prometheus)
were not managed during the docker to podman migration.

This adds the systemd container template of those services to a dedicated
file (systemd.yml) in order to include it in the docker2podman playbook.

This also adds pulling the dashboard container images from docker into podman.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1829389
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago | docker2podman: pull images from docker daemon
Dimitri Savineau [Thu, 16 Apr 2020 15:30:11 +0000 (11:30 -0400)]
docker2podman: pull images from docker daemon

The docker2podman playbook only installs the podman package and updates
the systemd units with the right container_binary value.

We never pull the container image, so if a service is restarted then
the container image has to be pulled first before the service can start,
which could cause a longer downtime.

To avoid downloading the container image from the internet again, we can
just pull it from the local docker daemon.
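The pull step can be sketched as below. This is a minimal sketch that assumes podman's `docker-daemon:` transport, which reads an image from the local docker daemon's storage rather than from a registry. That transport requires an explicit tag or digest. The helper name and its tag check are hypothetical illustrations, not the playbook's task.

```python
from typing import List


def podman_pull_from_docker_cmd(image: str) -> List[str]:
    """Build `podman pull docker-daemon:<image>` (hypothetical helper).

    The docker-daemon: transport needs an explicit tag or digest, so an
    image reference without one is rejected up front.
    """
    last_component = image.rsplit("/", 1)[-1]  # ignore a registry host:port
    if "@" not in image and ":" not in last_component:
        raise ValueError("image must include a tag or digest, e.g. name:latest")
    return ["podman", "pull", "docker-daemon:" + image]
```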

The container_{binding,package,service}_name variables are removed
because they are only used in the ceph-container-engine role, which
isn't called in this playbook.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago rolling_update: fix rbdmirror group name
Dimitri Savineau [Thu, 30 Apr 2020 20:06:55 +0000 (16:06 -0400)]
rolling_update: fix rbdmirror group name

The rbdmirror group name was using the wrong variable definition.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago dashboard: allow disabling grafana api ssl verify
Dimitri Savineau [Tue, 28 Apr 2020 17:31:01 +0000 (13:31 -0400)]
dashboard: allow disabling grafana api ssl verify

When grafana uses an untrusted TLS certificate (such as a self-signed
one), the grafana dashboards update subcommand fails.
One solution is to trust the TLS certificate.
The other is to disable TLS verification on the grafana API.
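To illustrate what "disabling TLS verification" means for an HTTPS client, here is a short Python sketch using the standard `ssl` module. It is not how the Ceph dashboard module implements this option; the function name is hypothetical.

```python
import ssl


def grafana_api_context(ssl_verify: bool) -> ssl.SSLContext:
    """Return a client TLS context; ssl_verify=False accepts untrusted certificates."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED with hostname checking
    if not ssl_verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Such a context would be passed to, for example, `urllib.request.urlopen(url, context=...)`. Trusting the certificate instead keeps verification on and is the safer of the two options.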

Closes: #5324
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-nfs: bind mount ganesha log directory
Dimitri Savineau [Mon, 4 May 2020 22:39:05 +0000 (18:39 -0400)]
ceph-nfs: bind mount ganesha log directory

The ganesha log directory is currently only present in the container
and is not bind mounted on the host.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-validate: Expand templates in rgw_create_pools
Benoît Knecht [Mon, 11 May 2020 14:21:55 +0000 (16:21 +0200)]
ceph-validate: Expand templates in rgw_create_pools

Same fix as `ceph-rgw` for `rgw_create_pools` pool names that contain Jinja
templates.

See #5348 for details.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years ago ceph-rgw: Make sure pool name templates are expanded
Benoît Knecht [Mon, 11 May 2020 13:49:32 +0000 (15:49 +0200)]
ceph-rgw: Make sure pool name templates are expanded

It is common to set templated pool names in `rgw_create_pools`, e.g.

```yaml
rgw_create_pools:
  "{{ rgw_zone }}.rgw.buckets.index":
    pg_num: 16
    size: 3
    type: replicated
```

This worked fine with Ansible 2.8, but broke in Ansible 2.9 due to a change in
the way `with_dict` works [1].

This commit replaces the use of `with_dict` with

```yaml
loop: "{{ rgw_create_pools | dict2items }}"
```

which works as intended and expands the template in the pool name.
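As a rough illustration, this Python sketch approximates the output shape of Ansible's `dict2items` filter (one `key`/`value` mapping per entry, with optional `key_name`/`value_name`). The `render` helper and the `default` zone name are hypothetical stand-ins for the Jinja templating step that, per this commit, now expands the pool name.

```python
def dict2items(d: dict, key_name: str = "key", value_name: str = "value") -> list:
    """Approximate Ansible's dict2items filter: one {key, value} dict per entry."""
    return [{key_name: k, value_name: v} for k, v in d.items()]


def render(template: str, variables: dict) -> str:
    """Hypothetical stand-in for Jinja: substitute '{{ name }}' placeholders."""
    for name, value in variables.items():
        template = template.replace("{{ " + name + " }}", str(value))
    return template


rgw_create_pools = {
    "{{ rgw_zone }}.rgw.buckets.index": {"pg_num": 16, "size": 3, "type": "replicated"},
}

# Each loop item exposes item.key and item.value; render() simulates the
# templating that turns the templated key into the final pool name.
for item in dict2items(rgw_create_pools):
    print(render(item["key"], {"rgw_zone": "default"}), item["value"]["pg_num"])
```

In a task, the pool name would then be referenced as `item.key` and its settings as `item.value.pg_num` and so on.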

[1]: https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.9.html#loops

Closes #5348

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years ago docs: minor fixes to README-MULTISITE.md
Ali Maredia [Mon, 27 Apr 2020 22:04:58 +0000 (18:04 -0400)]
docs: minor fixes to README-MULTISITE.md

Number all of the hosts starting at 1 instead of 0,
and make some minor changes in scenario 3 to
remove an inconsistency.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years ago ceph-validate: Fix "fail on unsupported CentOS release"
Benoît Knecht [Fri, 8 May 2020 12:39:52 +0000 (14:39 +0200)]
ceph-validate: Fix "fail on unsupported CentOS release"

The `dashboard_enabled` condition used a `true` filter (which doesn't exist)
instead of the `bool` filter.
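For reference, here is a Python approximation of how Ansible's `bool` filter coerces values (the name `to_bool` mirrors Ansible's internal helper, but this is a sketch, not the library code). Jinja has no `true` filter, so the original condition could not be evaluated at all.

```python
def to_bool(value):
    """Approximate Ansible's `bool` filter."""
    if value is None or isinstance(value, bool):
        return value
    if isinstance(value, str):
        value = value.lower()
    # Only these spellings count as true; everything else is False.
    return value in ("yes", "on", "1", "true", 1)


for raw in ("True", "yes", "on", "1", 1, "false", "0", "no"):
    print(repr(raw), to_bool(raw))
```

This is why a condition like `dashboard_enabled | bool` behaves correctly whether the variable is set as a real boolean or as a string such as `"true"` from the command line.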

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
5 years ago ceph-rgw: use match instead of equalto from jinja2
Dimitri Savineau [Wed, 6 May 2020 17:32:18 +0000 (13:32 -0400)]
ceph-rgw: use match instead of equalto from jinja2

The '==' jinja2 test (or 'equalto') was introduced in jinja2
2.8.
On EL7, the jinja2 version is 2.7, so the test isn't available, causing
templating errors like:

The error was: TemplateRuntimeError: no test named '=='
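Note that `match` is not a drop-in equality check: Ansible implements the `match` test with Python's `re.match`, which is anchored only at the start of the string. The Python sketch below (helper names are hypothetical; the commit's exact expression isn't shown here) illustrates why an exact comparison written with `match` generally needs an end anchor and escaped input.

```python
import re


def match_test(value: str, pattern: str) -> bool:
    """Mimic Ansible's `match` test: re.match anchors at the start only."""
    return re.match(pattern, value) is not None


def equals_via_match(value: str, expected: str) -> bool:
    """Exact comparison as a regex: escape metacharacters, anchor both ends."""
    return match_test(value, "^" + re.escape(expected) + "$")


print(match_test("default.rgw.meta", "default"))        # True: prefix match
print(equals_via_match("default.rgw.meta", "default"))  # False
print(match_test("axb", "^a.b$"))                       # True: '.' is a wildcard
print(equals_via_match("axb", "a.b"))                   # False once escaped
```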

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1747206
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-nfs: fix internal ganesha deployment
Dimitri Savineau [Wed, 6 May 2020 13:31:34 +0000 (09:31 -0400)]
ceph-nfs: fix internal ganesha deployment

Since ea2b654d9 we run the rados command from the ganesha node instead
of the monitor nodes. Unfortunately, that node doesn't have the keyring
required to run the rados command because we don't import the right
keyring there.
This commit restores the internal ganesha deployment workflow as it was
before ea2b654d9 but keeps running the rados commands from the ganesha
node for external deployments until we have a better design.

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago ceph-nfs: fix keyring copy for external ganesha
Dimitri Savineau [Tue, 5 May 2020 14:46:14 +0000 (10:46 -0400)]
ceph-nfs: fix keyring copy for external ganesha

Fix the condition on the keyring copy task that prevented the ganesha
keyring from being created in the /var/lib/ceph directory.
Also ensure that the directory exists first.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831285
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago nfs: fix 2 typos
Guillaume Abrioux [Thu, 30 Apr 2020 14:21:14 +0000 (16:21 +0200)]
nfs: fix 2 typos

The condition is missing an index here, which makes the playbook fail.

Typical error:
```
The conditional check 'not item.get('skipped', False)' failed. The error was: error while evaluating conditional (not item.get('skipped', False)): 'list object' has no attribute 'get'",
```

This also adds the missing '/keyring' to the `exec_cmd_nfs` fact.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831342
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago ceph-facts: fix IPv6 _radosgw_address interface
Dimitri Savineau [Mon, 27 Apr 2020 20:01:24 +0000 (16:01 -0400)]
ceph-facts: fix IPv6 _radosgw_address interface

When using radosgw_interface with an IPv6 setup, the _radosgw_address
fact doesn't use square brackets, unlike the radosgw_address and
radosgw_address_block configurations.
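The bracketing rule can be sketched in Python with the standard `ipaddress` module, roughly what Ansible's `ipwrap` filter does. The function name is hypothetical, and unlike `ipwrap` this sketch raises `ValueError` on strings that are not IP addresses.

```python
import ipaddress


def wrap_ip(addr: str) -> str:
    """Return IPv6 addresses in square brackets and IPv4 addresses unchanged."""
    # Strip any existing brackets first so the function is idempotent.
    ip = ipaddress.ip_address(addr.strip("[]"))
    return f"[{ip}]" if ip.version == 6 else str(ip)


print(wrap_ip("192.168.0.10"))   # 192.168.0.10
print(wrap_ip("2001:db8::1"))    # [2001:db8::1]
print(wrap_ip("[2001:db8::1]"))  # [2001:db8::1]
```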

Closes: #5325
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago Refresh ceph dashboard user role
fmount [Fri, 10 Apr 2020 13:04:52 +0000 (15:04 +0200)]
Refresh ceph dashboard user role

This change allows the operator to refresh the
ceph dashboard admin role on multiple ceph-ansible
executions.
In the current state the role is set only when the
user is created, and there's no way to change it if
the user exists.
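The difference between the old and new behaviour can be modelled with a small in-memory Python sketch. The `users` dict and both helper names are hypothetical; the real change drives the Ceph dashboard's user management, not a Python dict.

```python
def ensure_user_old(users: dict, name: str, role: str) -> None:
    """Previous behaviour: the role is only applied when the user is first created."""
    if name not in users:
        users[name] = {"roles": [role]}


def ensure_user_refresh(users: dict, name: str, role: str) -> None:
    """New behaviour: create the user if missing, then always (re)apply the role."""
    users.setdefault(name, {"roles": []})
    users[name]["roles"] = [role]


old, new = {}, {}
for role in ("administrator", "read-only"):  # two consecutive runs
    ensure_user_old(old, "admin", role)
    ensure_user_refresh(new, "admin", role)
print(old["admin"]["roles"], new["admin"]["roles"])  # ['administrator'] ['read-only']
```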

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826002
Signed-off-by: fmount <fpantano@redhat.com>
5 years ago ceph-dashboard: fix mgr dashboard IPv6 fact
Dimitri Savineau [Thu, 23 Apr 2020 18:34:39 +0000 (14:34 -0400)]
ceph-dashboard: fix mgr dashboard IPv6 fact

15ed9ee introduced a regression for the mgr dashboard daemon using
IPv6 since the mgr dashboard configuration doesn't support brackets.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1827299
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago docs: fix multisite docs add endpoints var in rgw_instances section
Ali Maredia [Thu, 23 Apr 2020 15:12:13 +0000 (11:12 -0400)]
docs: fix multisite docs add endpoints var in rgw_instances section

+ Mention of this variable was missing in the original version.

+ Minor revisions around the concept of a secondary zone.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
5 years ago Readd CentOS 7 with conditions
Dimitri Savineau [Thu, 2 Apr 2020 19:58:11 +0000 (15:58 -0400)]
Readd CentOS 7 with conditions

The CentOS 7 distribution can still be used for deploying ceph if
  - it's a containerized deployment
  - it's a non-containerized deployment without the dashboard (due to
    missing python3 libraries).

The ceph_stable_redhat_distro variable has been removed because we can
rely on the ansible_distribution_major_version fact instead.

The copr el8 repository configuration is only applied for CentOS 8.

The ceph-mgr-dashboard package is only installed when the
dashboard_enabled variable is set to true.
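The conditions above can be summarised as a small Python sketch. The function names are hypothetical and only encode the rules stated in this commit message; `containerized_deployment`, `dashboard_enabled`, and `ansible_distribution_major_version` are the variable and fact names mentioned above (the fact is a string in Ansible).

```python
def centos7_allowed(containerized_deployment: bool, dashboard_enabled: bool) -> bool:
    """CentOS 7 is accepted if containerized, or non-containerized without the dashboard."""
    return containerized_deployment or not dashboard_enabled


def use_el8_copr_repo(ansible_distribution_major_version: str) -> bool:
    """The copr el8 repository configuration only applies to CentOS 8."""
    return int(ansible_distribution_major_version) == 8


for containerized in (True, False):
    for dashboard in (True, False):
        print(f"containerized={containerized} dashboard={dashboard} "
              f"allowed={centos7_allowed(containerized, dashboard)}")
```

Only the non-containerized deployment with the dashboard enabled is rejected on CentOS 7.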

Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
5 years ago tests: add back nfs testing on master
Guillaume Abrioux [Thu, 26 Mar 2020 07:21:25 +0000 (08:21 +0100)]
tests: add back nfs testing on master

This commit adds back nfs testing on master branch (containerized
scenario only).

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago mds: don't enable application pool on cephfs pools
Guillaume Abrioux [Tue, 21 Apr 2020 08:29:23 +0000 (10:29 +0200)]
mds: don't enable application pool on cephfs pools

This commit removes the task that enables the application on cephfs pools.

See: https://tracker.ceph.com/issues/43761

Fixes: #5278
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago typo: updating type check on rc
ianwatsonrh [Tue, 21 Apr 2020 13:14:46 +0000 (14:14 +0100)]
typo: updating type check on rc

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826884
Signed-off-by: ianwatsonrh <ianwatson@redhat.com>
5 years ago doc: add day-2 operations documentation
Guillaume Abrioux [Tue, 21 Apr 2020 07:50:27 +0000 (09:50 +0200)]
doc: add day-2 operations documentation

This commit is the first of a series describing all day-2 operations
that are possible via ceph-ansible using the set of playbooks provided in
the `infrastructure-playbooks` directory.

Fixes: #5061
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
5 years ago filestore-to-bluestore: fix py2 on skipped tasks
Dimitri Savineau [Mon, 20 Apr 2020 13:47:31 +0000 (09:47 -0400)]
filestore-to-bluestore: fix py2 on skipped tasks

When using variables registered by skipped tasks with the from_json
filter on python2, we need a default value; otherwise the task fails.

Unexpected templating type error occurred on
({{ (ceph_volume_lvm_list.stdout | from_json) }}): expected string or
buffer
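The idea behind the fix, supplying a default before parsing, can be sketched in Python. In Jinja this corresponds to placing a `default` filter ahead of `from_json`; the exact expression used by the commit isn't shown here. The helper name, the simplified skipped-result shape, and the sample stdout are hypothetical, not real ceph-volume output.

```python
import json


def parse_registered_stdout(result: dict, default: str = "{}"):
    """Parse a registered result's stdout as JSON, falling back to `default`.

    A skipped task registers a result without stdout, so a JSON string is
    substituted instead of handing a missing value to the parser.
    """
    return json.loads(result.get("stdout", default))


skipped = {"changed": False, "skipped": True}  # simplified skipped-task result
ran = {"changed": False, "stdout": '{"0": [{"type": "block"}]}'}
print(parse_registered_stdout(skipped))  # {}
print(parse_registered_stdout(ran))
```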

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1790472
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>