The systemd LOAD and ACTIVE fileds could have more than one space between
both values.
This update the systemd regex the same way we're using it in different
part of the code.
When using an untrusted TLS certificate (like self-signed) on grafana
then the grafana dashboards update subcommand will fail.
One solution could be to trust the TLS certificate.
The other one is to disable the TLS verification on the grafana API.
facts: always set ceph_run_cmd and ceph_admin_command
always set these facts on monitor nodes whatever we run with `--limit`.
Otherwise, playbook will fail when using `--limit` on nodes where these
facts are used on a delegated task to monitor.
The docker2podman playbook only installs the podman package and updates
the systemd units with the right container_binary value.
We never pull the container image so if one service is restarted then
the container image will be pulled first before the service can start
which could cause longer downstream.
To avoid to download the container image from internet again we can just
pull it from the local docker daemon.
The container_{binding,package,service}_name variables are removed
because they are only used in the ceph-container-engine role which
isn't call in this playbook.
Dimitri Savineau [Tue, 12 May 2020 15:38:47 +0000 (11:38 -0400)]
docker-to-podman: conditional docker commands
The docker commands should be based on the container_binary variable
otherwise running the playbook on a host without docker (like podman
only) will failed.
When using radosgw_interface and IPv6 setup then the _radosgw_address
fact doesn't use square brackets compared to the radosgw_address and
radosgw_address_block configuration.
This change allows the operator to refresh the
ceph dashboard admin role on multiple ceph-ansible
executions.
In the current state the role is set only when the
user is created, and there's no way to change it if
the user exists.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1826002 Signed-off-by: fmount <fpantano@redhat.com>
(cherry picked from commit 5eb363e0331caf5d0be8e1242d1e57f4f5045812)
ceph-rgw: use match instead of equalto from jinja2
The '==' jinja2 operator (or 'equalto') has been introduced in jinja2
2.8.
On EL7, jinja2 version is 2.7 so the operator isn't present creating
templating error like:
The error was: TemplateRuntimeError: no test named '=='
Since ea2b654d9 we're not running the rados command from the monitor
nodes but from the ganesha node. Unfortunately we don't have the
required keyring on that node to run the rados command as we don't
import the right keyring.
This commit restores the workflow for internal ganesha deployment like
before ea2b654d9 but keeps the rados commands from the ganesha node for
external deployment until we have a better design.
Fix the condition on the keyring copy task that prevent the ganesha
keyring to be created in the /var/lib/ceph directory.
Also ensure that the directory exists first.
This commit is the first of a serie in order to describe all day-2 operations
that are possible via ceph-ansible using a set of playbook provided in
`infrastructure-playbooks` directory.
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of
using `result|version_compare` use `result is version_compare`. This
feature will be removed in version 2.9. Deprecation warnings can be
disabled by setting deprecation_warnings=False in ansible.cfg.
Rename 'version_compare' to the function 'version'.
version_compose was renamed to version since ansible 2.5
Rishabh Dave [Tue, 7 Apr 2020 11:50:35 +0000 (17:20 +0530)]
library/ceph_volume: look for error messages in stderr
Error message were moved to from stdout in stderr here -
https://github.com/ceph/ceph/commit/b8d6dcbe9f803c96c0af68da54f1262e9b6a9e77#diff-20f7c578a4e69ec61a5869d706567a24R137.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1793542 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 4249d1e02d6da07466a4ddf1282cf4600a131773)
We were not testing the right ansible_distribution fact value for RHEL
distribution.
This commit also updates the minial RHEL version supported by RHCS.
Dimitri Savineau [Tue, 24 Mar 2020 18:28:51 +0000 (14:28 -0400)]
tests: update mgr dashboard socket listening test
Since 15ed9ee the ceph-mgr daemon binds on the IP address on the public
network instead of binding on all addresses.
This commit updates the testinfra code to reflect that change.
The dashboard SSO mgr module requires the saml python library to be
installed. This is only a valid scenario for RHCS deployment because
the saml python library isn't available in other classic repositories.
This package is present in RHCS Tools repository so we also need to
enable it on the mgr nodes.
With this change, the state `present` is enough to update a keyring.
If the keyring already exist, it will be updated if caps or secret
passed to the module are different.
If the keyring doen't exist, it will be created.
When `rule_name` isn't set in `crush_rules` the osd pool creation will
fail.
This commit adds a new fact `ceph_osd_pool_default_crush_rule_name` with
the default crush rule name.
osd: support changing default rule even when osd_crush_location isn't defined
Creating crush rules even with no crush hierarchy configuration is a
valid scenario so we shouldn't be bound to the first task result (which
configure crush hierarchy) to be able to add new crush rules.
Adding `--all` to the `systemctl list-units` command in order to get
*all* osds id on the node (including stoppped osds). Otherwise, it will
purge the cluster but there will be leftover after that.
Javier Pena [Wed, 23 Oct 2019 14:45:41 +0000 (16:45 +0200)]
Allow setting dist and mock configuration in Makefile
Curently, the dist and mock configurations are hardcoded in the
Makefile to be el8 and epel-7-x86_64, respectively. This commit
allows the user to override those settings using the DIST and
MOCK_CONFIG environment variables, falling back to the current
defaults if not set.
This provides additional flexibility when building the RPM directly
from the repository.
Dimitri Savineau [Fri, 27 Mar 2020 21:16:41 +0000 (17:16 -0400)]
ceph_volume: fix multiple db/wal devices
When using the lvm batch ceph-volume subcommand with dedicated devices
for bluestore (db/wal) then the list of devices is convert to a string
instead of being extended via an iterable.
This was working with only one dedicated device but starting with more
then the ceph_volume module fails.
Dimitri Savineau [Fri, 27 Mar 2020 19:53:07 +0000 (15:53 -0400)]
ceph-defaults: update container tag for nautilus
The latest Ceph stable release is now Octopus so the "latest" container
image tag is pointing to Octopus and not Nautilus anymore.
This commit updates the ceph_docker_image_tag with "latest-nautilus".
Dimitri Savineau [Tue, 24 Mar 2020 21:11:44 +0000 (17:11 -0400)]
ceph-facts: fix rgw_instances_all fact
The rgw_instances_all fact is supposed to be the list of all radosgw
instances from all rgw nodes.
But the fact is always using the local rgw_instances variable so this
won't work on multiple nodes.
Dimitri Savineau [Wed, 18 Mar 2020 14:53:40 +0000 (10:53 -0400)]
dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Dimitri Savineau [Mon, 16 Mar 2020 21:52:30 +0000 (17:52 -0400)]
ceph-defaults: update grafana container tag
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
Dimitri Savineau [Wed, 11 Mar 2020 02:41:27 +0000 (22:41 -0400)]
ceph-infra: open radosgw ports for multi instances
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
If the filestore configuration was using a dedicated journal with either
a partition or a LV/VG then we need to reuse this for bluestore DB.
When filestore is using a raw devices then we shouldn't destroy
everything (data + journal) but only data otherwise the journal
partition won't exist anymore.
Update this scenario due to recent changes. e361c0ff94de34dd0d627ffe6e6dd616f41c26e3 added more nodes in order to
cover code erasure pool creation in the CI. Therefore, we must add more
nodes in the 3.2 deployment too.
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.
Ali Maredia [Fri, 4 Oct 2019 19:31:25 +0000 (19:31 +0000)]
rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
Dimitri Savineau [Fri, 28 Feb 2020 16:37:14 +0000 (11:37 -0500)]
shrink-rbdmirror: fix presence after removal
We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.
Dimitri Savineau [Wed, 26 Feb 2020 16:03:32 +0000 (11:03 -0500)]
tox: update shrink scenario configuration
The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.
Dimitri Savineau [Wed, 26 Feb 2020 15:20:05 +0000 (10:20 -0500)]
shrink: don't use localhost node
The ceph-facts are running on localhost so if this node is using a
different OS/release that the ceph node we can have a mismatch between
docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.
Dimitri Savineau [Fri, 31 Jan 2020 15:42:10 +0000 (10:42 -0500)]
purge: stop rgw instances by iteration
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.
fatal: [rgw0]: FAILED! => changed=false
msg: 'This module does not currently support using glob patterns,
found ''*'' in service name: ceph-radosgw@*'
...ignoring
Instead we should iterate over the rgw_instances list.
It doesn't make sense to start validating configuration if the ansible
version isn't the good one.
This commit moves the check_system task as the first task in the
ceph-validate role.
The ansible version test tasks are moved at the top of this file.
Also moving the iscsi kernel tests from check_system to check_iscsi
file.
We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.