Dimitri Savineau [Wed, 18 Mar 2020 14:53:40 +0000 (10:53 -0400)]
dashboard: allow to set read-only admin user
This commit allows one to set the role for the admin user as read-only.
This can be controlled via the dashboard_admin_user_ro variable but the
default value is false for backward compatibility.
Dimitri Savineau [Tue, 17 Mar 2020 00:45:03 +0000 (20:45 -0400)]
ceph-defaults: add registry name on dashboard vars
We don't use the registry name when using the community dashboard
container images (grafana, prometheus, alertmanager & node exporter).
This commit adds the docker.io registry explicitly in the default
dashboard container image name values.
Dimitri Savineau [Mon, 16 Mar 2020 21:52:30 +0000 (17:52 -0400)]
ceph-defaults: update grafana container tag
Since 8e8aa73 we're using grafana 5.4.3 in RHCS 4.1 via [1].
We should also update the grafana container tag from docker.io when
using the community release.
Dimitri Savineau [Wed, 11 Mar 2020 02:41:27 +0000 (22:41 -0400)]
ceph-infra: open radosgw ports for multi instances
When using the radosgw multi instances configuration then the firewall
rules aren't adapted to that setup.
We only open the port according to the radosgw_frontend_port variable
so only the first radosgw instance port will be opened in the firewall
configuration.
We should instead iterate over the rgw_instances list.
Dimitri Savineau [Wed, 11 Mar 2020 00:50:55 +0000 (20:50 -0400)]
update osd pool set size command
Since [1] we can't use osd pool without replicas (size: 1) by default.
We now need to set the mon_allow_pool_size_one flag to true in the ceph
configuration and add the --yes-i-really-mean-it flag to the osd pool
set size cli.
This commit make that task retrying 5 times to start the service
firewalld to avoid failure like following:
```
TASK [ceph-infra : start firewalld] ********************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible-prs-centos-container-purge/roles/ceph-infra/tasks/configure_firewall.yml:22
Monday 09 March 2020 08:58:48 +0000 (0:00:00.963) 0:02:16.457 **********
fatal: [osd4]: FAILED! => changed=false
msg: |-
Unable to enable service firewalld: Created symlink from /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service to /usr/lib/systemd/system/firewalld.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/firewalld.service to /usr/lib/systemd/system/firewalld.service.
Failed to execute operation: Connection reset by peer
```
Nizamudeen [Fri, 6 Mar 2020 10:38:31 +0000 (16:08 +0530)]
doc: Fixed a minor typo in the document
In the Demo part of the Document, for both Vagrant and Bare metal the description was mentioned as "Deployment from scratch on bare metal machines".
Changed "bare metal" to "vagrant" for Vagrant section
Ali Maredia [Fri, 4 Oct 2019 19:31:25 +0000 (19:31 +0000)]
rgw multisite: enable more than 1 realm per cluster
Make it so that more than one realm, zonegroup,
or zone can be created during a run of the rgw
multisite ansible playbooks.
The rgw hosts now need to be grouped into zones
and realms in the inventory.
.yml files need to be created in group_vars
for the realms and zones. Sample yaml files
are available.
Also remove multsite destroy playbook
and add --cluster before radosgw-admin commands
remove manually added rgw_zone_endpoints var
and have ceph-ansible automatically add the
correct endpoints of all the rgws in a rgw_zone
from the information provided in that rgws hostvars.
This commit changes the value passed for the attribute 'rule_name' in
openstack_pools definition. It doesn't make sense to have emptry string
as passed value here.
Dimitri Savineau [Fri, 28 Feb 2020 16:37:14 +0000 (11:37 -0500)]
shrink-rbdmirror: fix presence after removal
We should add retry/delay to check the presence of the rbdmirror daemon
in the cluster status because the status takes some time to be updated.
Also the metadata.hostname isn't a good key to check because it doesn't
reflect the ansible_hostname fact. We should use metadata.id instead.
Dimitri Savineau [Wed, 26 Feb 2020 16:03:32 +0000 (11:03 -0500)]
tox: update shrink scenario configuration
The shrink scenarios don't need the docker variables (except for OSD).
Removing pytest for shrink-mgr.
Adding environment variables for xxx_to_kill ansible variable.
Dimitri Savineau [Wed, 26 Feb 2020 15:20:05 +0000 (10:20 -0500)]
shrink: don't use localhost node
The ceph-facts are running on localhost so if this node is using a
different OS/release that the ceph node we can have a mismatch between
docker/podman container binary.
This commit also reduces the scope of the ceph-facts role because we only
need the container_binary tasks.
Dimitri Savineau [Fri, 28 Feb 2020 14:42:44 +0000 (09:42 -0500)]
ceph-validate: add key format validation
If the user provides manually the key value for a specific keyring then
there's not valation on the content which could lead to unexpected
failures in the ceph_key module.
Dimitri Savineau [Fri, 31 Jan 2020 15:42:10 +0000 (10:42 -0500)]
purge: stop rgw instances by iteration
It looks like that the service module doesn't support wildcard anymore
for stopping/disabling multiple services.
fatal: [rgw0]: FAILED! => changed=false
msg: 'This module does not currently support using glob patterns,
found ''*'' in service name: ceph-radosgw@*'
...ignoring
Instead we should iterate over the rgw_instances list.
We need to call ceph-facts only for setting `container_binary`.
Since this task has been isolated we can use `tasks_from` to only execute the
needed task.
If the filestore configuration was using a dedicated journal with either
a partition or a LV/VG then we need to reuse this for bluestore DB.
When filestore is using a raw devices then we shouldn't destroy
everything (data + journal) but only data otherwise the journal
partition won't exist anymore.
Configure ceph dashboard backend and dashboard_frontend_vip
This change introduces a new set of tasks to configure the
ceph dashboard backend and listen just on the mgr related
subnet (and not on '*'). For the same reason the proper
server address is added in both prometheus and alertmanger
systemd units.
This patch also adds the "dashboard_frontend_vip" parameter
to make sure we're able to support the HA model when multiple
grafana instances are deployed.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1792230 Signed-off-by: Francesco Pantano <fpantano@redhat.com>
Dimitri Savineau [Mon, 17 Feb 2020 20:46:54 +0000 (15:46 -0500)]
ceph-dashboard: update create/get rgw user tasks
Since [1] if a rgw user already exists then the radosgw-admin user create
command will return an error instead of modifying the current user.
We were already doing separated tasks for create and get operation but
only for multisite configuration but it's not enough.
Instead we should do the get task first and depending on the result
execute the create.
This commit also adds missing run_once and delegate_to statement.
Sam Choraria [Tue, 3 Dec 2019 12:23:13 +0000 (12:23 +0000)]
ceph-rgw: allow SSL certificate content to supplied
Allow SSL certificate & key contents to be written to the path
specified by radosgw_frontend_ssl_certificate. This permits a
certificate to be deployed & renewal of expired certificates
through ceph-ansible.
Signed-off-by: Sam Choraria <sam.choraria@bbc.co.uk>
The ad7a5da commit introduced a regression when using TLS on haproxy
via the haproxy_frontend_ssl_certificate variable.
This cause the "stats socket" and the "tune.ssl.default-dh-param"
parameters to be on the same line resulting haproxy failing to start.
Dimitri Savineau [Thu, 16 Jan 2020 21:58:35 +0000 (16:58 -0500)]
purge-cluster: update package list to remove
We only support python3 so renaming all ceph python packages.
Some ceph packages were missing from the list (ceph-mon, ceph-osd or
rbd-mirror) or didn't exist anymore (ceph-fs-common, libcephfs1).
John Fulton [Thu, 6 Feb 2020 02:23:54 +0000 (21:23 -0500)]
The _filtered_clients list should intersect with ansible_play_batch
Client configuration with --limit fails without this patch
because certain tasks are only done to the first host in the
_filtered_clients list and it's likely that first host will
not be included in what's sepcified with --limit. To fix this
the _filtered_clients list should be built from all clients
in the inventory that are also in the running play.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1798781 Signed-off-by: John Fulton <fulton@redhat.com>
Using ceph_dev_branch and ceph_dev_sha1 for configuring ceph-iscsi
repositories from shaman doesn't make sense because the ceph devel
branches and sha1 aren't compatible with ceph-iscsi devel.
Instead we could rely on the master branch and the latest sha1.
Currently it's not possible to using a custom ceph branch/sha1 value
with iscsi setup otherwise the repository setup will fail.
Dimitri Savineau [Mon, 10 Feb 2020 16:06:48 +0000 (11:06 -0500)]
ceph-nfs: fix ceph_nfs_ceph_user variable
The ceph_nfs_ceph_user variable is a string for the ceph-nfs role but a
list in ceph-client role. 6a6785b introduced a confusion between both variable type in the ceph-nfs
role for external ceph with ganesha.
Dimitri Savineau [Mon, 10 Feb 2020 18:43:31 +0000 (13:43 -0500)]
ceph-{mon,osd}: move default crush variables
Since ed36a11 we move the crush rules creation code from the ceph-mon to
the ceph-osd role.
To keep the backward compatibility we kept the possibility to set the
crush variables on the mons side but we didn't move the default values.
As a result, when using crush_rule_config set to true and wanted to use
the default values for crush_rules then the crush rule ansible task
creation will fail.
"msg": "'ansible.vars.hostvars.HostVarsVars object' has no attribute
'crush_rules'"
This patch move the default crush variables from ceph-mon to ceph-osd
role but also use those default values when nothing is defined on the
mons side.
Dimitri Savineau [Wed, 12 Feb 2020 15:38:25 +0000 (10:38 -0500)]
ceph-grafana: fix grafana_{crt,key} condition
The grafana_{crt,key} aren't boolean variables but strings. The default
value is an empty string so we should do the conditional on the string
length instead of the bool filter
When using multiple grafana hosts then we push set the grafana and
prometheus URL and push the dashboard layout to a single node.
grafana_server_addrs is the list of all grafana nodes and used during
the ceph-dashboard role (on mgr/mon nodes).
grafana_server_addr is the current grafana node used during the
ceph-grafana and ceph-prometheus role (on grafana-server nodes).
We don't have the grafana_server_addr fact duplication code between
external vs collocated nodes.
switch_to_containers: increase health check values
This commit increases the default values for the following variable
consumed in switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
This also moves these variables in `ceph-defaults` role so the user can
set different values if needed.
Mike Christie [Tue, 28 Jan 2020 22:31:55 +0000 (16:31 -0600)]
iscsi: Fix crashes during rolling update
During a rolling update we will run the ceph iscsigw tasks that start
the daemons then run the configure_iscsi.yml tasks which can create
iscsi objects like targets, disks, clients, etc. The problem is that
once the daemons are started they will accept confifguration requests,
or may want to update the system themself. Those operations can then
conflict with the configure_iscsi.yml tasks that setup objects and we
can end up in crashes due to the kernel being in a unsupported state.
This could also happen during creation, but is less likely due to no
objects being setup yet, so there are no watchers or users accessing the
gws yet. The fix in this patch works for both update and initial setup.
Dimitri Savineau [Wed, 29 Jan 2020 03:31:04 +0000 (22:31 -0500)]
ceph-container-engine: lvm2 on OSD nodes only
Since de8f2a9 the lvm2 package installation has been moved from ceph-osd
role to ceph-container-engine role.
But the scope wasn't limited to the OSD nodes only.
This commit fixes this behaviour.
Dimitri Savineau [Tue, 28 Jan 2020 15:27:34 +0000 (10:27 -0500)]
ceph-defaults: remove rgw from ceph_conf_overrides
The [rgw] section in the ceph.conf file or via the ceph_conf_overrides
variable doesn't exist and has no effect.
To apply overrides to all radosgw instances we should use either the
[global] or [client] sections.
Overrides per radosgw instance should still use the
[client.rgw.{instance-name}] section.
Dimitri Savineau [Wed, 29 Jan 2020 02:34:24 +0000 (21:34 -0500)]
ceph-facts: fix _container_exec_cmd fact value
When using different name between the inventory_hostname and the
ansible_hostname then the _container_exec_cmd fact will get a wrong
value based on the inventory_hostname instead of the ansible_hostname.
This happens when the ceph cluster is already running (update/upgrade).
Later the container exec commands will fail because the container name
is wrong.
We should always set the _container_exec_cmd based on the
ansible_hostname fact.
If the playbook is used on a host running bluestore OSDs then the
osd_fsid_list won't be filled because the bluestore OSDs are reported
with 'type: block' via ceph-volume lvm list command but we are looking
for 'type: data' (filestore).
fix calls to `container_exec_cmd` in ceph-osd role
We must call `container_exec_cmd` from the right monitor node otherwise
the value of the fact might mistmatch between the delegated node and the
node being played.
Dimitri Savineau [Thu, 23 Jan 2020 21:58:14 +0000 (16:58 -0500)]
filestore-to-bluestore: skip bluestore osd nodes
If the OSD node is already using bluestore OSDs then we should skip
all the remaining tasks to avoid purging OSD for nothing.
Instead we warn the user.
Some ganesha packages do not create ganesha log directories
while it's expected to be created while changing it's permissions.
Additionally it's no much sense in doing that as a separate task,
so directory is created as correct permissions are set with creation of
the rest required directories.
Dimitri Savineau [Wed, 22 Jan 2020 19:45:38 +0000 (14:45 -0500)]
site-container: don't skip ceph-container-common
On HCI environment the OSD and Client nodes are collocated. Because we
aren't running the ceph-container-common role on the client nodes except
the first one (for keyring purpose) then the ceph-role execution fails
due to undefined variables.
because we are trying to run `ceph-config` on this node, it doesn't make
sense so we should simply run this play on all groups except
`[grafana-server]`.
Dimitri Savineau [Tue, 21 Jan 2020 21:37:10 +0000 (16:37 -0500)]
filestore-to-bluestore: fix osd_auto_discovery
When osd_auto_discovery is set then we need to refresh the
ansible_devices fact between after the filestore OSD purge
otherwise the devices fact won't be populated.
Also remove the gpt header on ceph_disk_osds_devices because
the devices is empty at this point for osd_auto_discovery.
Adding the bool filter when needed.
common: add a default value for ceph_directories_mode
Since this variable makes it possible to customize the mode for ceph
directories, let's make it a bit more explicit by adding a default value
in ceph-defaults.
Dimitri Savineau [Mon, 20 Jan 2020 21:40:58 +0000 (16:40 -0500)]
filestore-to-bluestore: --destroy with raw devices
We still need --destroy when using a raw device otherwise we won't be
able to recreate the lvm stack on that device with bluestore.
Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-bdc67a84-894a-4687-b43f-bcd76317580a /dev/sdd
stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
Unable to add physical volume '/dev/sdd' to volume group 'ceph-b7801d50-e827-4857-95ec-3291ad6f0151'
/dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes
Dimitri Savineau [Mon, 20 Jan 2020 16:24:08 +0000 (11:24 -0500)]
ceph-osd: set container objectstore env variables
Because we need to manage legacy ceph-disk based OSD with ceph-volume
then we need a way to know the osd_objectstore in the container.
This was done like this previously with ceph-disk so we should also
do it with ceph-volume.
Note that this won't have any impact for ceph-volume lvm based OSD.
Rename docker_env_args fact to container_env_args and move the container
condition on the include_tasks call.
Remove OSD_DMCRYPT env variable from the ceph-osd template because it's
now included in the container_env_args variable.