git.apps.os.sepia.ceph.com Git - ceph-ansible.git/log
Guillaume Abrioux [Thu, 24 Sep 2020 02:20:34 +0000 (04:20 +0200)]
ansible.cfg: remove cfg file in infrastructure-playbooks
There's no need to have a copy of this file in the infrastructure-playbooks
directory.
Playbooks in that directory can be run from the root dir of
ceph-ansible.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f906caa6dada4a88bfaca1ab9a42ed63358e8ae3 )
Guillaume Abrioux [Thu, 24 Sep 2020 01:51:56 +0000 (03:51 +0200)]
ansible.cfg: set force_valid_group_names param
As of 2.10, group names containing a dash are invalid.
However, setting this option still allows a dash in group names and
prevents this warning from showing up.
This may need to be addressed properly in a future ansible release.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1880476
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
6938ed13021201cc1cfcd3f3087849fed3157a69 )
Kefu Chai [Thu, 24 Sep 2020 16:46:30 +0000 (00:46 +0800)]
docs: update URLs to point to the RTD links
Fixes #5798
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit
f3a78371d9e1336595f5ce8ae7932a5f97004bbe )
Dmitriy Rabotyagov [Wed, 23 Sep 2020 13:06:33 +0000 (16:06 +0300)]
Remove libjemalloc1 installation task
The libjemalloc1 package is required neither for the ganesha dependency nor
for the package build process, so this task can simply be dropped.
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@ya.ru>
(cherry picked from commit
297532ca411dbdc6ec96258875058b323008abfe )
Dimitri Savineau [Wed, 23 Sep 2020 16:00:30 +0000 (12:00 -0400)]
library/ceph_key: set no_log on secret
We don't need to show this information during the module execution.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
a3f4e2b4d11d3185f4064be5ab2969f0df894ff2 )
Guillaume Abrioux [Wed, 8 Jul 2020 13:49:47 +0000 (15:49 +0200)]
facts: refact `ceph_uid` fact
There's no need to set this fact with a `set_fact`
We can achieve this in `ceph-defaults`
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
bcc673f66c22364766beb4b5ebb971bd3f517693 )
Dimitri Savineau [Fri, 18 Sep 2020 14:03:13 +0000 (10:03 -0400)]
container: quote registry password
When the registry password contains a quote, we get the following
error:
The error was: ValueError: No closing quotation
To fix this we need to use the quote filter.
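The failure can be reproduced outside Ansible. The sketch below uses Python's `shlex` module, on the assumption that the command string is split shell-style (the `ValueError: No closing quotation` message is the one `shlex.split` raises). The password value and the `login` command line are made up for illustration.

```python
import shlex

# Hypothetical password containing a single quote.
password = "pa'ss word"

# Unquoted: the stray quote opens a string that never closes.
try:
    shlex.split("login --password " + password)
    error = None
except ValueError as exc:
    error = str(exc)

# Quoted: shlex.quote escapes the value so it survives splitting intact.
args = shlex.split("login --password " + shlex.quote(password))
```

With quoting, the password comes back as a single argument, quote included.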
Close: https://bugzilla.redhat.com/show_bug.cgi?id=1880252
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
6dcfdf17d43635fcd0dc658c199702945a1228dd )
Guillaume Abrioux [Fri, 18 Sep 2020 07:09:57 +0000 (09:09 +0200)]
facts: fix 'set_fact rgw_instances with rgw multisite'
The current condition doesn't work: as soon as the first iteration is
done, the condition makes the next iterations skip, since `rgw_instances`
was already set by the first iteration.
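A minimal Python simulation of this failure mode follows. The commit message does not quote the actual condition, so the guard used here ("skip when `rgw_instances` is already defined") is an assumption, and the instance names are hypothetical.

```python
def collect(items, skip_once_defined):
    """Accumulate items into a fact, optionally guarded by 'is undefined'."""
    facts = {}
    for item in items:
        # Assumed guard: only run while rgw_instances is not yet defined.
        if skip_once_defined and "rgw_instances" in facts:
            continue  # every iteration after the first is skipped
        facts["rgw_instances"] = facts.get("rgw_instances", []) + [item]
    return facts["rgw_instances"]


instances = ["rgw0", "rgw1", "rgw2"]  # hypothetical instance names
broken = collect(instances, skip_once_defined=True)
fixed = collect(instances, skip_once_defined=False)
```

With the guard in place only the first item is recorded; without it all three are.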
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859872
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
ff19c1d851ebf0bd7bd8a744367e8893dc6103a8 )
Dimitri Savineau [Thu, 17 Sep 2020 18:11:22 +0000 (14:11 -0400)]
ceph-infra: include iscsi nodes for logrotate
The iscsi nodes aren't included in the logrotate condition.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
85643edfe382d34a77fabbd97a4d937b8b74d4e6 )
Guillaume Abrioux [Tue, 15 Sep 2020 07:48:31 +0000 (09:48 +0200)]
infra: support log rotation for tcmu-runner
This commit adds the log rotation support for tcmu-runner.
ceph-container related PR: ceph/ceph-container#1726
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1873915
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f576c02ff7b15c207b77b3f206a3213184b89889 )
Dimitri Savineau [Tue, 15 Sep 2020 00:13:13 +0000 (20:13 -0400)]
container: add optional http(s) proxy option
When using a http(s) proxy with either docker or podman we can rely on
the HTTP_PROXY, HTTPS_PROXY and NO_PROXY environment variables.
But with ansible, even if those variables are defined in a source file
then they aren't loaded during the container pull/login tasks.
This implements the http(s) proxy support with docker/podman.
Both implementations are different:
1/ docker doesn't rely on the environment variables with the CLI.
Those are needed by the docker daemon via systemd.
2/ podman uses the environment variables so we need to add them to
the login/pull tasks.
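The podman case amounts to passing the proxy variables explicitly in the environment of the pull/login command. A rough sketch of that idea in Python, where the helper name, the proxy URL and the base environment are all illustrative:

```python
import os


def proxy_environment(http_proxy=None, https_proxy=None, no_proxy=None, base=None):
    """Return a copy of the environment with only the defined proxy variables added."""
    env = dict(os.environ if base is None else base)
    pairs = (
        ("HTTP_PROXY", http_proxy),
        ("HTTPS_PROXY", https_proxy),
        ("NO_PROXY", no_proxy),
    )
    for name, value in pairs:
        if value:  # leave unset variables out entirely
            env[name] = value
    return env


env = proxy_environment(
    http_proxy="http://proxy.example.com:3128",  # hypothetical proxy
    no_proxy="localhost",
    base={"PATH": "/usr/bin"},
)
# The result could then be handed to a command runner, e.g.
# subprocess.run(["podman", "pull", image], env=env)
```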
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1876692
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
bda3581294c8f29eda598522c331a4c009243884 )
Dimitri Savineau [Tue, 15 Sep 2020 13:30:42 +0000 (09:30 -0400)]
ceph-prometheus: update pool stat counter
Since [1], the bytes_used pool counter in prometheus has been renamed
to stored.
Closes: #5781
[1] https://github.com/ceph/ceph/commit/71fe9149
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
e54b924eaf05a7223ec7525657d14e8892ce8957 )
Dimitri Savineau [Tue, 15 Sep 2020 13:59:06 +0000 (09:59 -0400)]
switch2container: chown symlink for devices
If the OSD directory is using symlinks for referencing devices (like
block, db, wal for bluestore and journal for filestore) then the chown
command could fail to change the owner:group on some systems.
$ ls -hl /var/lib/ceph/osd/ceph-0/
total 28K
lrwxrwxrwx 1 ceph ceph 92 Sep 15 01:53 block -> /dev/ceph-45113532-95ca-471b-bd75-51de46f1339c/osd-data-570a1aee-60c0-44c9-8036-ffed7d67a4e6
-rw------- 1 ceph ceph 37 Sep 15 01:53 ceph_fsid
-rw------- 1 ceph ceph 37 Sep 15 01:53 fsid
-rw------- 1 ceph ceph 55 Sep 15 01:53 keyring
-rw------- 1 ceph ceph 6 Sep 15 01:53 ready
-rw------- 1 ceph ceph 3 Sep 15 02:00 require_osd_release
-rw------- 1 ceph ceph 10 Sep 15 01:53 type
-rw------- 1 ceph ceph 2 Sep 15 01:53 whoami
$ find /var/lib/ceph/osd/ceph-0 -not -user 167 -execdir chown 167:167 {} +
chown: cannot dereference './block': Permission denied
$ find /var/lib/ceph/osd/ceph-0 -not -user 167
/var/lib/ceph/osd/ceph-0/block
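The transcript shows chown trying to dereference the symlink. The commit message does not say how the playbook was changed, so the following is only an illustration of the underlying distinction: changing ownership of a symlink's target versus the link itself (what `chown -h` does on the command line). It uses a dangling link in a temporary directory, which fails with a different error than the one above but for the same reason (the target is followed), and it assumes a Linux host.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as osd_dir:
    # Stand-in for the OSD 'block' symlink; the target does not exist here.
    link = os.path.join(osd_dir, "block")
    os.symlink("/nonexistent/device", link)
    uid, gid = os.getuid(), os.getgid()

    # Default behaviour follows the link and fails on the target.
    try:
        os.chown(link, uid, gid)
        deref_failed = False
    except OSError:
        deref_failed = True

    # Operating on the link itself succeeds.
    os.chown(link, uid, gid, follow_symlinks=False)
    link_owner = os.lstat(link).st_uid
```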
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
da4280e243f50114e1ae6455a46360012feb8f3d )
Dimitri Savineau [Tue, 15 Sep 2020 13:46:30 +0000 (09:46 -0400)]
switch2container: remove deb systemd units
When running the switch2container playbook on a Debian based system,
the systemd unit path isn't the same as on a Red Hat based system.
Because the systemd unit files aren't removed, the new container
systemd unit isn't taken into account.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
c1af69a7e79a5909903490028f7ae13e519c98e0 )
Guillaume Abrioux [Fri, 11 Sep 2020 15:30:33 +0000 (17:30 +0200)]
purge: remove potential socket leftover
This commit ensures we remove any socket left by ceph and the
`ceph-osd-run.sh` script.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861755
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
5e91e0f3e24da0492b6f5dd2bc808215b5066ddc )
Guillaume Abrioux [Mon, 14 Sep 2020 13:14:24 +0000 (15:14 +0200)]
tests: do not run node_exporter test on clients
We need to skip these tests on client nodes since we don't deploy
node_exporter on them anymore
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
5650a6d7d0a0e2b2fa0ceb080e7d582dc9ceb447 )
Dimitri Savineau [Fri, 11 Sep 2020 15:25:57 +0000 (11:25 -0400)]
node-exporter: exclude client nodes
We don't need to install node-exporter on client nodes because there are
no ceph services running on them.
This also makes sure we use the group name variables in the prometheus
service template instead of hardcoding the values.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
b105549ed858eb034d97f5fcad4890e17ee2ebfd )
Dimitri Savineau [Mon, 24 Aug 2020 20:05:57 +0000 (16:05 -0400)]
dashboard: use run_once at block level
Instead of using run_once: true on each task in a block section, we
can use the run_once statement at the block level.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
2c4af70abd0433beabaf3a7fe727014455e5ba22 )
Dimitri Savineau [Fri, 11 Sep 2020 13:34:05 +0000 (09:34 -0400)]
ceph_key: set state as optional
Most ansible modules with a state parameter default to the present
value (when available) instead of making it a mandatory option.
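In argument-spec terms, this means giving `state` a default instead of marking it required. The sketch below mirrors the shape of an Ansible `argument_spec` dictionary. The `name` option, the list of choices and the `resolve` helper are illustrative stand-ins, not the module's actual code or Ansible's own validation.

```python
# Shape of an argument_spec; only `state` is taken from the commit message.
argument_spec = {
    "name": {"type": "str", "required": True},
    "state": {
        "type": "str",
        "required": False,
        "default": "present",
        "choices": ["present", "absent"],  # illustrative subset
    },
}


def resolve(params, spec):
    """Simplified stand-in for the defaulting a module framework performs."""
    resolved = {}
    for key, opts in spec.items():
        if key in params:
            resolved[key] = params[key]
        elif opts.get("required"):
            raise ValueError("missing required option: " + key)
        else:
            resolved[key] = opts.get("default")
    return resolved


# A task that omits `state` now behaves as if `state: present` was given.
params = resolve({"name": "client.admin"}, argument_spec)
```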
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
abb4023d762305c368facd3fab5a5b7e3a839d66 )
Dimitri Savineau [Thu, 10 Sep 2020 00:44:54 +0000 (20:44 -0400)]
ceph_pool: set state as optional
Most ansible modules with a state parameter default to the present
value (when available) instead of making it a mandatory option.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
3a05aeb6cbcabef2e72ea6cdf1abe2f26be05eaa )
Dimitri Savineau [Mon, 14 Sep 2020 19:13:03 +0000 (15:13 -0400)]
tests/library: rename ceph_dashboard_user class
Rename the test class with the right information.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
3ba11c1434b9cad2c8276ed9baabdf23d9e30b36 )
Dimitri Savineau [Fri, 4 Sep 2020 18:49:07 +0000 (14:49 -0400)]
library: add ceph_dashboard_user module
This adds the ceph_dashboard_user ansible module for replacing the
command module usage with the ceph dashboard ac-user-xxx command.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
ee6f0547ba17ec5111dae0d25dda5aed5631aa60 )
Guillaume Abrioux [Mon, 17 Aug 2020 08:31:11 +0000 (10:31 +0200)]
facts: refact and optimize memory consumption
there's no need to run this task on all nodes.
This uses too much memory for nothing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1856981
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f0fe193d8ec48414447aa4a7d50b1a9859c71295 )
Dimitri Savineau [Wed, 9 Sep 2020 22:38:33 +0000 (18:38 -0400)]
ceph-rgw: use ceph_pool module
Since [1] we can use the ceph_pool module instead of using the command
module combined with ceph osd pool commands.
[1]
bddcb439ce1b46735946e9fd5d147bc6604bcda3
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
8dacbce68f84cb45005c35ded4c21f1d4e62df7d )
Dimitri Savineau [Thu, 10 Sep 2020 15:27:37 +0000 (11:27 -0400)]
container: run engine/common roles on first client
We already do this in the site-container.yml playbook because we don't
need docker/podman installed on all client nodes; having the
container image only on the first client node is enough.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
8ecbdc6ede7e26d053f87acde99986fddb0fe070 )
Dimitri Savineau [Thu, 10 Sep 2020 14:12:13 +0000 (10:12 -0400)]
ceph-facts: only get fsid when monitor are present
When running the rolling_update playbook with an inventory without
monitor nodes defined (like external scenario) then we can't retrieve
the cluster fsid from the running monitor.
In this scenario we have to pass this information manually (group_vars
or host_vars).
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1877426
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
f63022dfec701dadc28359b1a4978f8a7ab00e03 )
Dimitri Savineau [Tue, 8 Sep 2020 14:36:20 +0000 (10:36 -0400)]
tests: use grafana from quay.io
This changes the grafana container image registry from docker.io to
quay.io to avoid rate limiting.
This also adds the missing container image values for docker2podman
and podman scenarios.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
98c9afceb936753ebcf3f2c036b73fedbdfa6e3d )
Guillaume Abrioux [Fri, 4 Sep 2020 14:50:26 +0000 (16:50 +0200)]
tests: migrate to quay.ceph.io registry
in order to avoid docker.io rate limiting
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
2cbb7de3b2aadd25c8ec0bf0f1e434868a1e4611 )
Francesco Pantano [Mon, 7 Sep 2020 12:02:06 +0000 (14:02 +0200)]
Add --cluster option on ceph require-osd-release command
On DCN environments, or when multiple ceph clusters are configured,
we need to specify the cluster name before running the command or
the rolling_update playbook will fail during minor updates.
Closes: https://bugzilla.redhat.com/1876447
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit
cb64df30b687d95704bac76ed0b4f83dfc3ca992 )
Francesco Pantano [Tue, 8 Sep 2020 11:16:33 +0000 (13:16 +0200)]
Fix hosts field in rolling_update playbook when mds are processed
In the OSP context, during the rolling update the playbook fails
with the following error:
'''
ERROR! The field 'hosts' has an invalid value, which includes an
undefined variable. The error was: list object has no element 0
'''
This PR just changes the hosts field to provide a valid mons group
value.
Closes: https://bugzilla.redhat.com/1876803
Signed-off-by: Francesco Pantano <fpantano@redhat.com>
(cherry picked from commit
e65f9a5c720eeeef72b6eef59bb239e6ed04cdbe )
Niko Smeds [Thu, 5 Mar 2020 22:24:56 +0000 (14:24 -0800)]
Enable HAProxy backend checks for Ceph RGW
Add the `check` option to server definitions to enable basic HAProxy health
checks for Ceph RADOS gateway backends.
Currently traffic will be forwarded to unhealthy `radosgw.service` servers.
These changes resolve the issue.
Signed-off-by: Niko Smeds nikosmeds@gmail.com
(cherry picked from commit
a951c1a3f0a34e086964f52b0bbf7a8d89481aad )
Guillaume Abrioux [Fri, 21 Aug 2020 08:51:22 +0000 (10:51 +0200)]
rolling_update: remove 'ignore_errors'
There's no need to use `ignore_errors: true` on these tasks.
Using a loop on the task stopping mon daemons allows us to avoid
duplicating this task. The `ignore_errors` isn't needed here because the
task won't fail the playbook if one of the IDs doesn't exist (shortname vs. fqdn).
Using the right condition on the task starting the mgr daemon allows us
to avoid using an `ignore_errors: true` as well.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
cec994b973343e05252b4aa0f0be0dd0ee39405b )
Guillaume Abrioux [Tue, 4 Aug 2020 11:53:24 +0000 (13:53 +0200)]
ceph_key: refact the code and minor fixes
This commit refactors the code to remove a duplicate condition and it
makes the `state: absent` code idempotent
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
13e2311cbe78b7b51930a3a1210629bf036a20c5 )
Guillaume Abrioux [Wed, 15 Jul 2020 15:28:51 +0000 (17:28 +0200)]
tests: add more coverage for test_ceph_key
This commit adds more coverage regarding the testing of ceph_key module
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
27ca884d99c42b48ff2d918af27c58bc41483374 )
Guillaume Abrioux [Wed, 19 Aug 2020 21:33:51 +0000 (23:33 +0200)]
dashboard: refact admin user creation task
this commit splits this task in order to avoid using a `shell` module.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
54d3e9650f77466ae4207502e0a2da638d82954d )
Dimitri Savineau [Mon, 6 Jul 2020 15:04:13 +0000 (11:04 -0400)]
tests: reenable nfs-ganesha testing
This re-adds the nfs-ganesha testing in non containerized deployment.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
6c11695fbe52595e2860655ce3994d9b538fd4b6 )
George Shuklin [Mon, 13 Jul 2020 10:40:17 +0000 (13:40 +0300)]
Make 'disable ssl for dashboard task' idempotent.
This should reduce number of 'changed' tasks during convergence test.
Signed-off-by: George Shuklin <george.shuklin@gmail.com>
(cherry picked from commit
73d4bb6bd6b560de7f2b3042bdc7d17c901e815a )
Rafał Wądołowski [Thu, 20 Aug 2020 08:13:43 +0000 (10:13 +0200)]
Comment out ceph_custom_key
Since there is a check if ceph_custom_key is defined, there is no reason
to define it by default.
Signed-off-by: Rafał Wądołowski <rwadolowski@cloudferro.com>
(cherry picked from commit
55cd6e83e475ab9ad8d684b88da5325d869e9d1c )
Guillaume Abrioux [Tue, 11 Aug 2020 13:26:16 +0000 (15:26 +0200)]
tests: move erasure pool testing in lvm_osds
This commit moves the erasure pool creation testing from `all_daemons`
to `lvm_osds` so we can decrease the number of osd nodes we spawn, so the
OVH Jenkins slaves are less overwhelmed when an `all_daemons` based
scenario is being tested.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
8476beb5b1f673d8b0925293d9273041c99a9bac )
John Fulton [Tue, 18 Aug 2020 14:41:42 +0000 (10:41 -0400)]
Set default permission for prometheus config files
Regardless of the outcome of Ansible 2.9.12 issue 71200
we can set a default permission for these files.
Closes: https://github.com/ceph/ceph-ansible/issues/5677
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit
95dee6f1cad71cddb69f7bcddbd199ebcad45d8c )
Guillaume Abrioux [Tue, 18 Aug 2020 18:35:17 +0000 (20:35 +0200)]
shrink-mds: use mds_to_kill_hostname instead
When using fqdn in inventory host file, this task will fail because the
mds is registered with its shortname.
It means we must use `mds_to_kill_hostname` in this task.
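The mismatch is easy to see with plain strings. In this sketch the host names are hypothetical, and the string split only stands in for however the playbook derives `mds_to_kill_hostname` (the commit message does not say):

```python
def shortname(inventory_hostname):
    """Return the part of a host name before the first dot."""
    return inventory_hostname.split(".", 1)[0]


# Names under which the mds daemons are registered (hypothetical).
registered = {"mds0", "mds1"}

inventory_name = "mds0.example.com"  # fqdn as written in the inventory
found_by_fqdn = inventory_name in registered
found_by_short = shortname(inventory_name) in registered
```

Looking up the daemon by the inventory fqdn finds nothing, while the short name matches.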
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1869837
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
51c382677dfa5db8fc39ca9c3c4898e017f3c189 )
Guillaume Abrioux [Thu, 13 Aug 2020 18:37:11 +0000 (20:37 +0200)]
infra: only install logrotate on right nodes
For instance, there is no need to install logrotate on client nodes.
This also ensures logrotate is installed only for containerized
deployments since the packaging has an explicit dependency on logrotate
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
8ed11ea3ee9ea006ecf723c070fa18d3b318f580 )
Guillaume Abrioux [Tue, 18 Aug 2020 13:37:08 +0000 (15:37 +0200)]
travis: enforce ansible-lint 4.2.0
Let's pin to 4.2.0
(because of ansible/ansible-lint/issues/966)
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
04d77dcaebb52734a1c6d1838ecfa669bf8f3c67 )
Dimitri Savineau [Mon, 17 Aug 2020 17:55:47 +0000 (13:55 -0400)]
ceph-rgw: allow specifying crush rule on pool
We already support specifying a custom crush rule during pool creation
in ceph-osd role but not in ceph-rgw role.
This patch adds the missing code to implement this feature.
Note this is only available for replicated pools, not erasure pools. The rule
must also exist prior to the pool creation.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1855439
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
cb8f0237e1fe7b890d20d47b5d023a6c618cbd4c )
Dimitri Savineau [Mon, 17 Aug 2020 18:56:17 +0000 (14:56 -0400)]
container: don't install the engine on all clients
We only need the container engine to be installed on the first client
node in order to execute the pools/keys operations. We already use the
same workflow with the ceph-container-common role, which pulls the ceph
container image.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
9805589ef94230c67439787cb19ffa7e3d5f2b3d )
Guillaume Abrioux [Tue, 4 Aug 2020 15:29:41 +0000 (17:29 +0200)]
purge-cluster: use sysfs method for unmapping rbd devices
This way we keep consistency with purge-container-cluster.yml playbook.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f77fa6e2a4d4c2b4522582f53713c2e49fecbe12 )
Ali Maredia [Thu, 4 Jun 2020 21:00:16 +0000 (21:00 +0000)]
rgw: allow rgws to be concurrently with or without multisite
Allows rgws in a ceph cluster to be run with
multisite and without multisite at the same time.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
(cherry picked from commit
5c1f4b1a1eff8c77c4bdc816debbbc4043efc644 )
Guillaume Abrioux [Thu, 13 Aug 2020 13:29:28 +0000 (15:29 +0200)]
infra: add missing tag
This commit adds the missing `with_pkg` tag on the logrotate
installation task.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
e1cb385740b8b32600eda90910dcff20208f8945 )
Guillaume Abrioux [Thu, 6 Aug 2020 07:46:12 +0000 (09:46 +0200)]
purge: import ceph-defaults in purge osd play
Otherwise, `ceph_volume_debug` variable is undefined
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
33a544644a671c8b9ffd7c5e761276c1a1ac574d )
Guillaume Abrioux [Tue, 4 Aug 2020 23:47:04 +0000 (01:47 +0200)]
infra: add log rotation support (containers)
This commit adds the log rotation support via logrotate in containerized
deployments.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1848388
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f1aa6cea21ca5423bb0404eae6437a19eaae2653 )
Guillaume Abrioux [Wed, 5 Aug 2020 16:02:48 +0000 (18:02 +0200)]
common: don't enable debug log on ceph-volume calls by default
ceph-volume can generate large logs at some point.
debug logs by definition should be enabled only when debugging.
Let's make it customizable with a variable which is set to `False` by
default.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
448cc280b7919ac7d19d854a92e3ed367b361ccc )
Guillaume Abrioux [Fri, 7 Aug 2020 08:12:50 +0000 (10:12 +0200)]
nfs: do not copy rgw keyring when `nfs_obj_gw` is true
This keyring shouldn't be copied when `nfs_obj_gw` is `True` if the
cluster doesn't contain a rgw node, which can be the case given we are
using `nfs_obj_gw` instead of `nfs_file_gw` (cephfs vs. object). Otherwise
the deployment will fail trying to copy a key that doesn't exist.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
dd4b5b0328d585d62103a84e02ca728b588a50f3 )
raul [Mon, 3 Aug 2020 10:58:50 +0000 (12:58 +0200)]
rgw: support 1+ rgw instance in `radosgw_frontend_port`
Change the radosgw_frontend_port to take into account more than 1 RGW instance.
In its original form, `radosgw_frontend_port: radosgw_frontend_port | int`,
it configured the 8080 port on all instances. With the following modification,
`radosgw_frontend_port: radosgw_frontend_port | int + item|int`, the port
number is increased by 1 for each instance.
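The arithmetic can be sketched in a few lines of Python, assuming `item` is the zero-based index of the instance on the host (the commit message does not spell this out). The function name is illustrative.

```python
def frontend_ports(radosgw_frontend_port, instance_count):
    """Compute one port per instance: base port plus the instance index."""
    return [int(radosgw_frontend_port) + int(item) for item in range(instance_count)]


ports = frontend_ports("8080", 3)
```

Three instances starting from 8080 end up on 8080, 8081 and 8082 instead of all sharing 8080.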
Co-authored-by: Daniel Parkes <dparkes@redhat.com>
Signed-off-by: raul <rmahique@redhat.com>
(cherry picked from commit
110eaf5f9f8a2fe26993e2e663849a74531da9d2 )
Guillaume Abrioux [Wed, 12 Aug 2020 14:56:30 +0000 (16:56 +0200)]
tests: test iscsigw against stable build
This commit makes the CI use the stable build for testing iscsigw in
stable-5.0
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Benoît Knecht [Fri, 31 Jul 2020 06:11:31 +0000 (08:11 +0200)]
purge-cluster: check if rbdmap exists
When running `infrastructure-playbooks/purge-cluster.yml` twice, it fails the
second time on the `ensure rbd devices are unmapped` task, because `rbdmap`
isn't installed anymore at that point.
This commit adds a check that ensures `rbdmap` is available, and skips the
`ensure rbd devices are unmapped` task if it isn't.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit
a57fd7a0900e5c3d04e8b6c997c819d340565967 )
Kevin Coakley [Mon, 3 Aug 2020 17:03:34 +0000 (10:03 -0700)]
Remove ceph-radosgw.target when switching to containerize daemons
The task "remove old systemd unit file" under "switching from
non-containerized to containerized ceph rgw" only removes
the ceph-radosgw@.service file. The task should also remove
the ceph-radosgw.target file, like the "remove old systemd unit
files" tasks for the mons, mgrs, osds, etc, in order to clean up
all of the unused systemd unit files.
Signed-off-by: Kevin Coakley <kcoakley@sdsc.edu>
(cherry picked from commit
d19e6033b227c621b6c794db4f571151e5bbf9c4 )
Guillaume Abrioux [Wed, 22 Jul 2020 14:08:15 +0000 (16:08 +0200)]
shrink_osd: remove osd data directory
Otherwise it leaves an empty directory.
When shrinking and redeploying multiple OSDs you have no guarantee it
will reuse the same osd id.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
8933bfde33b8aa6ad6f0a0f29531699922d9bf75 )
Guillaume Abrioux [Tue, 21 Jul 2020 07:27:10 +0000 (09:27 +0200)]
tests: refact shrink_osd scenario
This adds more coverage on the shrink_osd scenario.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
7efea219d62792c599b9d66035395323334beeaa )
Guillaume Abrioux [Wed, 22 Jul 2020 09:38:55 +0000 (11:38 +0200)]
tox: split shrink_osd scenario
Let's split this scenario with a dedicated tox ini file.
This is for testing in two ways:
1/ shrinking OSDs one by one
2/ shrinking multiple OSDs with a single call of the playbook
ceph-build related PR: ceph/ceph-build#1629
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
78e4faf077e2710b8245acb3b63dd49f0875291a )
Benoît Knecht [Tue, 28 Jul 2020 11:47:26 +0000 (13:47 +0200)]
shrink-osd: various fixes
This handles missing /etc/ceph/osd, by ensuring we actually found files in
`/etc/ceph/osd` before trying to slurp their content.
This also adds a missing `| default(False)` to avoid the following error:
```
fatal: [ceph01]: FAILED! =>
msg: |-
The conditional check 'ceph_osd_data_json[item.2]['encrypted'] | bool' failed. The error was: error while evaluating conditional (ceph_osd_data_json[item.2]['encrypted'] | bool): 'dict object' has no attribute 'encrypted'
```
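The same pattern in Python terms: indexing a missing key fails, while a lookup with a fallback does not. The dictionary below is a made-up stand-in for one entry of `ceph_osd_data_json`.

```python
# Hypothetical OSD metadata without an 'encrypted' key.
osd_data = {"data": {"path": "/dev/sdb1"}, "type": "bluestore"}

# Direct access mirrors the failing conditional.
try:
    encrypted = bool(osd_data["encrypted"])
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A fallback value mirrors `| default(False)`.
encrypted = bool(osd_data.get("encrypted", False))
```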
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1862416
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit
fe8fbd3ee2d877c9ca3b08412a8b12f64a111c18 )
Dimitri Savineau [Wed, 5 Aug 2020 19:03:49 +0000 (15:03 -0400)]
pytest: register ceph_crash mark
Otherwise we see some pytest warning.
PytestUnknownMarkWarning: Unknown pytest.mark.ceph_crash - is this a typo?
You can register custom marks to avoid this warning - for details,
see https://docs.pytest.org/en/latest/mark.html
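One way to register a mark is from `conftest.py` with pytest's `pytest_configure` hook. Whether this commit uses that hook or an ini file is not stated here, so treat this as a sketch, and the description text after the colon as illustrative.

```python
# conftest.py
def pytest_configure(config):
    """Register the custom mark so pytest no longer warns about it."""
    config.addinivalue_line(
        "markers",
        "ceph_crash: tests for nodes running the ceph-crash daemon",
    )
```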
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
03d46202691514639ff10666a488169bf8b4d150 )
Guillaume Abrioux [Thu, 23 Jul 2020 19:12:46 +0000 (21:12 +0200)]
config: only add related rgw section
there's no need to add each rgw section on all rgw nodes.
With this commit, only related rgw section are rendered.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
0a581a6e6007812cdad935e4f65909b4306046b2 )
Dimitri Savineau [Thu, 30 Jul 2020 16:04:18 +0000 (12:04 -0400)]
dashboard: allow remote TLS cert/key copy
When using TLS on the ceph dashboard or grafana services, we can provide
the TLS certificate and key.
Those files should be present on the ansible controller and they will be
copied to the right node(s).
In some situations, the TLS certificate and key could already be present
on the target node and not on the ansible controller.
For this scenario, we just need to copy the files locally (on each remote
host).
This patch adds the dashboard_tls_external variable (false by default)
to allow users to achieve this scenario by setting this
variable to true.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1860815
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
0d0f1e71df33484d6619aeaa97eb21d7dfc0ea48 )
Dimitri Savineau [Wed, 29 Jul 2020 13:44:15 +0000 (09:44 -0400)]
rolling_update: restart mds after the upgrade
In addition to 155e2a2, the active mds daemon isn't stopped/started
correctly, unlike the other services, so that daemon doesn't come
back after the upgrade.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1861688
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
ec0a37a74ffbefcce42582c57c6726cc001f98ab )
Dimitri Savineau [Fri, 24 Jul 2020 15:21:54 +0000 (11:21 -0400)]
rolling_update: refact dashboard workflow
The dashboard upgrade workflow should follow the same process as the ceph
upgrade, otherwise any systemd unit modification won't be applied to the
monitoring/dashboard stack.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
a6209bd957e40c9c42370eab73ae99e73d53c8a4 )
Dimitri Savineau [Tue, 21 Jul 2020 18:51:20 +0000 (14:51 -0400)]
rolling_update: stop/start instead of restart
During the daemon upgrade we're
- stopping the service when it's not containerized
- running the daemon role
- starting the service when it's not containerized
- restarting the service when it's containerized
This implementation has multiple issues.
1/ We don't use the same service workflow when using containers
or baremetal.
2/ The explicit daemon start isn't required since we're already
doing this in the daemon role.
3/ Any non backward compatible changes in the systemd unit template (for
containerized deployment) won't work due to the restart usage.
This patch refactors the rolling_update playbook by using the same service
stop task for both containerized and baremetal deployment at the start
of the upgrade play.
It removes the explicit service start task because it's already included
in the dedicated role.
The service restart tasks for containerized deployment are also
removed.
Finally, this adds the missing service stop task for ceph crash upgrade
workflow.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1859173
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
155e2a23d54ea29ccbf5414cb93cdc748c516e79 )
Dimitri Savineau [Tue, 21 Jul 2020 19:22:26 +0000 (15:22 -0400)]
ceph-handler: remove iscsigws restart scripts
The iscsigws restart scripts for tcmu-runner and rbd-target-{api,gw}
services only call the systemctl restart command.
We don't really need to copy a shell script to do it when we can use
the ansible service module instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
cbe79428e687e383f9764668a56171e5582451be )
Dimitri Savineau [Tue, 21 Jul 2020 13:32:50 +0000 (09:32 -0400)]
podman: always remove container on start
In case of failure, the systemd ExecStop isn't executed so the container
isn't removed. After a reboot of a failed node, the container doesn't
start because the old container is still present in created state.
We should always try to remove the container in ExecStartPre for this
situation.
A normal reboot doesn't trigger this issue and this also doesn't affect
nodes running containers via docker.
This behaviour was introduced by d43769d.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1858865
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
47b7c00287f310ab38e442ba2a147e9f7faab1ee )
Dimitri Savineau [Tue, 21 Jul 2020 19:27:59 +0000 (15:27 -0400)]
ceph-facts: remove mds_name fact
The mds_name fact always gets the ansible_hostname value so we don't
need to have a dedicated fact for this and use the ansible_hostname fact
instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
4e84b4beedc54b94c0263212a926a1919c329b91 )
Dimitri Savineau [Tue, 21 Jul 2020 19:14:30 +0000 (15:14 -0400)]
ceph-handler: add missing condition on ceph-crash
The ceph-crash tasks present in the ceph-handler role don't need to be
executed on all nodes.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
18e3c7a0a2f5ff1f2482e519178a00cec0c81420 )
Guillaume Abrioux [Tue, 21 Jul 2020 18:27:28 +0000 (20:27 +0200)]
crash: rm container in ExecPreStart even with docker
We should ensure the container is removed in `ExecStartPre` even when
`{{ container_binary }}` is docker.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
39bb279a53deaf87053cae4014c7143780016536 )
Guillaume Abrioux [Fri, 3 Jul 2020 08:21:49 +0000 (10:21 +0200)]
ceph-crash: introduce new role ceph-crash
This commit introduces a new role `ceph-crash` in order to deploy
everything needed for the ceph-crash daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
9d2f2108e1c9b6ae42b3133bb9ac37d4765e5e07 )
Guillaume Abrioux [Wed, 22 Jul 2020 05:28:34 +0000 (07:28 +0200)]
tests: lvm_setup.yml, add carriage return
This commit adds crlf between each task.
It makes the playbook more readable.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
8ef9fb68bc975f92069f362775f4f281c3b03531 )
Guillaume Abrioux [Tue, 21 Jul 2020 23:51:20 +0000 (01:51 +0200)]
tests: (lvm_setup.yml), don't shrink lvol
When rerunning lvm_setup.yml on an existing cluster with OSDs already
deployed, it fails as follows:
```
fatal: [osd0]: FAILED! => changed=false
msg: Sorry, no shrinking of data-lv2 to 0 permitted.
```
because we are asking the `lvol` module to create a volume on a VG that
has no free extents left, with size extents = `100%FREE`.
The default behavior of `lvol` is to shrink the volume if the LV's current
size is greater than the requested size.
Given the requested size is calculated like this:
`size_requested = size_percent * this_vg['free'] / 100`
in our case, it is similar to:
`size_requested = 100 * 0 / 100` which basically means `0`
So the current LV size is much greater than the requested size, which
leads the module to attempt to shrink it to 0, which obviously isn't
allowed.
Adding `shrink: false` to the module calls fixes this issue.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
218f4ae361b53fd7632d17495ba54fb6e2afed10 )
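The arithmetic behind the lvm_setup.yml failure described above can be
sketched in a few lines of Python. This is an illustrative model only,
not the `lvol` module's code: the helper names `requested_size` and
`plan_resize` and the extent count 2559 are assumptions made for the
example. Only the formula and the `shrink` option come from the commit
message.

```python
def requested_size(size_percent, vg_free):
    """Size requested for `<size_percent>%FREE`, in extents."""
    return size_percent * vg_free / 100


def plan_resize(current_size, size_percent, vg_free, shrink=True):
    """Return the action a resize would take (hypothetical helper)."""
    wanted = requested_size(size_percent, vg_free)
    if wanted > current_size:
        return "extend"
    if wanted < current_size:
        # A shrink attempt is what triggers the error; with shrinking
        # disabled the existing LV is left untouched.
        return "shrink" if shrink else "keep"
    return "keep"


# On a rerun the VG has no free extents left, so 100%FREE means 0.
print(requested_size(100, 0))                    # 0.0
print(plan_resize(2559, 100, 0))                 # shrink
print(plan_resize(2559, 100, 0, shrink=False))   # keep
```

With `shrink=False` the model leaves the existing LV alone, which
mirrors the effect of adding `shrink: false` to the task.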
Guillaume Abrioux [Mon, 13 Jul 2020 07:42:25 +0000 (09:42 +0200)]
facts: fix broken facts when using --limit
This commit fixes these tasks when `--limit` is used.
It makes sure the fact is set on the right nodes even when the playbook
is run with `--limit`.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f8a951f50c6a64ab3d60a1bf66ca9d2db2f6bc35 )
Dimitri Savineau [Fri, 17 Jul 2020 14:38:02 +0000 (10:38 -0400)]
ceph-dashboard: copy TLS cert/key on monitor
The ceph-dashboard role is executed on the mgr nodes so the TLS cert/key
files are copied to those nodes.
But we are importing the cert/key files into the ceph configuration on
the monitor.
Closes: #5557
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
2b8ebf14574e927bfabd939cc6263eb27a65afb3 )
Dimitri Savineau [Mon, 20 Jul 2020 14:41:53 +0000 (10:41 -0400)]
cephadm: set the command as a fact
Set the cephadm cmd as a fact instead of rewriting the same command
over and over.
This also fixes an issue when using docker as the container engine
because the --docker cephadm parameter should be used before the
subcommand, not after.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
5ef965c4dc0dd88a28addbe238da6c4d31f1df21 )
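The argument ordering described in the cephadm commit above can be
sketched as follows. This is a minimal Python illustration, not the
playbook's implementation: the helper name `cephadm_cmd` is hypothetical
and `ls` is used only as an example subcommand. The point shown is that
`--docker` is placed before the subcommand and that the command prefix
is built in one place.

```python
def cephadm_cmd(container_binary, subcommand, *args):
    """Build a cephadm argument list with global options first."""
    cmd = ["cephadm"]
    if container_binary == "docker":
        # --docker goes before the subcommand, not after it.
        cmd.append("--docker")
    cmd.append(subcommand)
    cmd.extend(args)
    return cmd


print(cephadm_cmd("docker", "ls"))   # ['cephadm', '--docker', 'ls']
print(cephadm_cmd("podman", "ls"))   # ['cephadm', 'ls']
```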
Dimitri Savineau [Fri, 10 Jul 2020 21:52:38 +0000 (17:52 -0400)]
cephadm: add playbook
This adds a new playbook for deploying ceph via cephadm.
This also adds a new dedicated tox file for CI purpose.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
957903d561680d366493cb5d6fd36b19aa3fb539 )
Dimitri Savineau [Wed, 15 Jul 2020 22:25:57 +0000 (18:25 -0400)]
cephadm-adopt: delegate task for orch apply
This is a partial revert of
b38019e because we don't want to execute
the whole play on the monitor. Otherwise, if we have some empty group
like rgws or mdss, the orchestrator commands will still be
executed.
Instead we should keep the real target group name at play level and
delegate the orchestrator commands to the monitor. The whole play
will be skipped if the group is empty.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
95964949119f029849337d22189d1f99dad67535 )
Dimitri Savineau [Wed, 15 Jul 2020 19:21:25 +0000 (15:21 -0400)]
cephadm-adopt: inform users about cephadm
Print a message at the end of the playbook to inform users that they
don't have to use ceph-ansible playbooks anymore as everything else
needs to be done via cephadm (day 2 operation).
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
75ae1b7e90ffaada4e7263a427b87222fcec439e )
Dimitri Savineau [Wed, 15 Jul 2020 19:15:06 +0000 (15:15 -0400)]
cephadm-adopt: refresh the service/daemon list
When reporting the orchestrator service/daemon list at the end of the
playbook, we can use the --refresh option; otherwise the output could
be outdated.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
7164426456a1bb29b3054d109e83cacf1966bd42 )
Dimitri Savineau [Wed, 15 Jul 2020 19:14:23 +0000 (15:14 -0400)]
Revert "cephadm-adopt: remove the cephadm script"
This reverts commit
c3bbc6b13cee5e566b277f3146e9e6bc4cec2f52 .
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
ceac81cd24095066898a5c061b22deffe29ca05b )
Guillaume Abrioux [Thu, 9 Jul 2020 14:24:15 +0000 (16:24 +0200)]
ceph_key: fix bug in 'info' feature
Fix 'info' feature from ceph_key.py module
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
9417ecf0c5c05da2b12572ef36304edbe2cf0ae1 )
Dimitri Savineau [Fri, 10 Jul 2020 21:41:32 +0000 (17:41 -0400)]
cephadm-adopt: wait for monitor in quorum
After adopting a monitor we need to wait for that monitor to rejoin
the quorum before moving to the next node.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
0c3a2b72ff1efe7c5a1592b12bd2d6690b6096a7 )
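The wait-for-quorum step described above can be sketched as a small
polling loop. This is a Python illustration under stated assumptions,
not the playbook's task: the helper name `wait_for_quorum`, the retry
count and the delay are hypothetical, and `get_status` stands for any
callable returning the JSON text of `ceph quorum_status --format json`,
whose `quorum_names` field lists the monitors currently in quorum.

```python
import json
import time


def wait_for_quorum(mon_name, get_status, retries=30, delay=10):
    """Poll until mon_name appears in quorum_names; True on success."""
    for attempt in range(retries):
        status = json.loads(get_status())
        if mon_name in status.get("quorum_names", []):
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False


# Canned replies stand in for the real command output.
replies = iter([
    '{"quorum_names": ["mon1"]}',
    '{"quorum_names": ["mon0", "mon1"]}',
])
print(wait_for_quorum("mon0", lambda: next(replies), delay=0))  # True
```

Only once the function returns True would the adoption move on to the
next monitor node.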
Dimitri Savineau [Fri, 10 Jul 2020 19:24:24 +0000 (15:24 -0400)]
cephadm-adopt: add osd flags during adoption
Like rolling_update or switch2container playbooks, we need to set/unset
some osd flags before and after the OSD daemons adoption.
This also adds a task for waiting for clean pgs at the end of an OSD node.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
d3b3c8948ef47d79cb32c328eb8df275fe589ac1 )
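The set/unset pattern described above can be sketched as a Python
context manager. This is an illustration only: the commit message does
not name the flags, so `noout` and `nodeep-scrub` are assumptions, as
are the helper name `osd_flags` and the injected `run` callable, which
stands for whatever executes the `ceph osd set` / `ceph osd unset`
commands. Unsetting the flags even when the adoption fails is a design
choice of this sketch and may differ from the playbook.

```python
from contextlib import contextmanager


@contextmanager
def osd_flags(run, flags=("noout", "nodeep-scrub")):
    """Set OSD flags on entry and unset them on exit."""
    for flag in flags:
        run(["ceph", "osd", "set", flag])
    try:
        yield
    finally:
        for flag in flags:
            run(["ceph", "osd", "unset", flag])


# Record the commands instead of executing them.
calls = []
with osd_flags(calls.append):
    calls.append(["adopt", "osds"])  # placeholder for the adoption work

for call in calls:
    print(" ".join(call))
```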
Dimitri Savineau [Fri, 10 Jul 2020 18:59:06 +0000 (14:59 -0400)]
cephadm-adopt: add iscsi support
The iSCSI support has been added recently in cephadm.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
9fe26947113e076488781d84f2b83db98cf7c0ec )
Dimitri Savineau [Fri, 10 Jul 2020 18:45:51 +0000 (14:45 -0400)]
cephadm-adopt: remove the cephadm script
At the end of the process we don't need the cephadm script.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
c3bbc6b13cee5e566b277f3146e9e6bc4cec2f52 )
Dimitri Savineau [Fri, 10 Jul 2020 18:13:15 +0000 (14:13 -0400)]
cephadm-adopt: show orchestrator status
At the end of the playbook we can show the orchestrator status like
we do with the ceph status in initial deployment.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
381201a3943997e666845de638623d3933c2552b )
Dimitri Savineau [Fri, 10 Jul 2020 14:42:02 +0000 (10:42 -0400)]
cephadm-adopt: use placement parameter
It's better to use the --placement parameter when using ceph orch apply
commands to avoid confusion in the parameters.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
91a6c79e41f4e2ae70973f5b96871a28ebfd511f )
Dimitri Savineau [Thu, 9 Jul 2020 22:38:17 +0000 (18:38 -0400)]
cephadm-adopt: use custom dashboard images
cephadm uses default values for the dashboard container images, which
need to be customized by ansible for upstream or downstream purposes.
This feature wasn't present when cephadm-adopt.yml was designed.
Also set the container_image_base variable for upgrade purpose.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
f2d997396ea1a87e0304d37502d149570433ae63 )
Dimitri Savineau [Thu, 9 Jul 2020 22:28:49 +0000 (18:28 -0400)]
cephadm-adopt: run orch apply from monitors
It looks like we can't run the ceph orch apply commands on nodes other
than monitors even if it used to work in the past.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
b38019e3ca374419344b8a441c2a60e5053ddb31 )
Dimitri Savineau [Thu, 9 Jul 2020 15:23:33 +0000 (11:23 -0400)]
cephadm-adopt: don't fail on systemd reset-failed
If the systemd service exits successfully then we don't need to reset
the failed state.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
27efcbc0e5d216a3a211b930497769ec06526369 )
Dimitri Savineau [Thu, 9 Jul 2020 15:19:41 +0000 (11:19 -0400)]
cephadm-adopt: copy client.admin keyring
The ceph config assimilate-conf command requires the client.admin
keyring which isn't present on all nodes most of the time.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
fd36433826b73d145b37902dcfe3e8b6d473b909 )
Dimitri Savineau [Mon, 6 Jul 2020 18:27:50 +0000 (14:27 -0400)]
tox: add cephadm_adopt scenario
This adds an optional cephadm_adopt scenario which is based on
all_daemons.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
14eed639213056721879cd23718526b3a68bd439 )
Guillaume Abrioux [Thu, 9 Jul 2020 11:07:32 +0000 (13:07 +0200)]
rgw: set container memory limit to 4g
This commit changes the container memory limit for rgw daemons.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1707488
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
86edae724f98036ada845a138c7f586df395cd3a )
Guillaume Abrioux [Tue, 7 Jul 2020 15:11:27 +0000 (17:11 +0200)]
tests: add docker hub authentication in jobs
This commit makes all jobs authenticate to docker hub in order to
avoid the rate limit.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
40307f810c76f22b7152cb1f4113089a22a84274 )
Guillaume Abrioux [Tue, 7 Jul 2020 23:04:10 +0000 (01:04 +0200)]
ceph_volume: fix regression
do not skip zapping if osd_fsid is passed
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit
f402ab2b87813f0f9c3fba661a52f5afebc19723 )
Guillaume Abrioux [Tue, 7 Jul 2020 13:02:49 +0000 (15:02 +0200)]
doc: add stable-5.0 note in release section
This commit adds the missing stable-5.0 details about what is
supported in this branch regarding ceph/ansible.
Fixes: #5519
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
Dimitri Savineau [Thu, 2 Jul 2020 19:23:09 +0000 (15:23 -0400)]
ceph-nfs: change ganesha devel source
The download.nfs-ganesha.org source for nfs-ganesha on CentOS isn't
available anymore.
Let's switch back to shaman since we have builds available now.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
1438ca012069a1b977268359c8b0f2f5496fa4bd )
Dimitri Savineau [Mon, 6 Jul 2020 14:45:21 +0000 (10:45 -0400)]
tests: remove nfs_ganesha_stable_branch variable
We don't need to override this variable in the group_vars but use the
default value instead.
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
(cherry picked from commit
fc599ed9f5399d2f13bfe964d2aef3d0b1fc7154 )