Fix the regular expression matching OSD IDs on non-containerized
deployments.
restart_osd_daemon.sh is used to discover and restart all OSDs on a
host. To do so, the script loops over the list of ceph-osd@ services on the
system. This commit fixes a bug in the regular expression responsible for
extracting the OSD IDs: the prior version used the `[0-9]{1,2}` expression,
which ignores all OSDs whose IDs are greater than 99 (i.e. longer than
two digits). The fix removes the upper limit on the number of digits.
This problem existed in two places in the script.
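As a minimal shell illustration (not the script itself; the unit-name pattern is assumed for the example), the bounded quantifier truncates IDs with three or more digits while the unbounded one does not:
```
echo "ceph-osd@123.service" | grep -oE 'ceph-osd@[0-9]{1,2}'   # -> ceph-osd@12 (wrong)
echo "ceph-osd@123.service" | grep -oE 'ceph-osd@[0-9]+'       # -> ceph-osd@123
```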
defaults: backward compatibility with fqdn deployments
This commit ensures we are backward compatible with fqdn deployments.
Since ceph-container enforces deployments to be done with the short hostname, we
must keep backward compatibility with clusters already deployed with an
fqdn configuration.
Sébastien Han [Mon, 23 Jul 2018 12:56:20 +0000 (14:56 +0200)]
rolling_update: set osd sortbitwise
An upgrade from RHCS 2 to RHCS 3 will fail if the cluster still has
sortnibblewise set:
it stays stuck on "TASK [waiting for clean pgs...]" because RHCS 3 OSDs will
not start while nibblewise is set.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1600943 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit b3266c5be2f88210589cfa56a5fe0a5092f79ee6)
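For reference, the flag itself is set with a single monitor command (shown here from the shell; the playbook runs the equivalent as a task):
```
# Switch the cluster to sortbitwise ordering before upgrading the OSDs.
ceph osd set sortbitwise
```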
Sébastien Han [Mon, 30 Jul 2018 16:29:00 +0000 (18:29 +0200)]
config: enforce socket name
This was introduced by
https://github.com/ceph/ceph/commit/59ee2e8d3b14511e8d07ef8325ac8ca96e051784
and made our socket checks impossible to run. The PID could be found,
but the cctid could not.
This happens during upgrades to mimic and on clusters running mimic.
So let's force the admin socket back to the way it was, so we can properly check
for existing instances. The $cluster-$name.$pid.$cctid.asok form
is only needed when running multiple instances of the same daemon,
something ceph-ansible cannot do at the time of writing.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1610220 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ea9e60d48d6631631ac9294d4ef291f8d7a30d78)
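A hedged illustration of why the predictable name matters (the cluster and daemon names below are examples only): the socket check can derive the path without knowing the PID or cctid:
```
# With the enforced $cluster-$name.asok form the socket path is deterministic.
sock=/var/run/ceph/ceph-osd.0.asok
test -S "$sock" && ceph --admin-daemon "$sock" version
```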
tests: support update scenarios in test_rbd_mirror_is_up()
`test_rbd_mirror_is_up()` is failing on update scenarios because it
assumes `ceph_stable_release` is still set to the value of the
original Ceph release; as a result it doesn't enter the right branch of the
condition and fails.
Mike Christie [Wed, 25 Jul 2018 18:13:17 +0000 (13:13 -0500)]
igw: fix image removal during purge
We were not passing the ceph conf info to the rbd image removal
command, so if the cluster name was not the default, igw purge would fail
because the rbd rm command failed.
This just fixes the bug by passing in the ceph conf info, which carries the
cluster name to use.
This fixes Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1601949
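A hedged sketch of the kind of invocation involved (pool, image and cluster names are illustrative, not taken from the purge script):
```
# Pass the cluster name explicitly so rbd resolves the right conf/keyring
# even when the cluster is not called "ceph".
rbd --cluster mycluster rm rbd/ansible-disk-1
```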
Mike Christie [Thu, 26 Jul 2018 18:52:44 +0000 (13:52 -0500)]
igw: do not fail purge on rbd removal errors
Instead of failing the entire purge operation when the rbd command fails,
just log an error. This allows the higher-level target and config
cleanup to complete, and the user only has to delete the rbd
images manually.
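In shell terms the behaviour change amounts to something like the following sketch (the purge itself is driven by Ansible, not this snippet; names are illustrative):
```
# Log a warning on rbd failures instead of aborting the whole purge.
rbd --cluster mycluster rm rbd/ansible-disk-1 \
  || echo "WARNING: could not remove rbd/ansible-disk-1, delete it manually" >&2
```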
Sébastien Han [Fri, 27 Jul 2018 14:52:19 +0000 (16:52 +0200)]
osd: do not remove expose_partition container
The container runs with --rm, which means it will be deleted by Docker
when it exits. Also, 'docker rm -f' is not idempotent and returns 1 if the
container does not exist.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1609007 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2ca8c519066555e06a261d5dee3fb46ce5daad0b)
jewel used to create a default `rbd` pool in the default crush root
`default`; we need at least 1 OSD to satisfy the PGs of this
pool, otherwise the cluster will be in HEALTH_ERR state because
of `pgs stuck unclean`/`pgs stuck inactive`.
Mike Christie [Thu, 26 Jul 2018 18:30:36 +0000 (13:30 -0500)]
ceph ansible 3.1 igw: fix rbd-target-gw startup
The problem is that rbd-target-gw needs the rbd pool to be created, the keyring
to be copied over, and iscsi-gateway.cfg to be set up before the
rbd-target-gw service is started.
In the master branch this is fixed by this commit:
since no latest-bis-jewel exists, it's using latest-bis, which points to
ceph mimic. In our testing, using it for idempotency/handlers tests
means upgrading from jewel to mimic, which is not what we want to do.
client: do not rely on copy_admin_key to import keys
Relying on `copy_admin_key` to import the created keys on client nodes
forces us to copy the admin key to those nodes, which is not something we
necessarily want.
We should use the fact `condition_copy_admin_key`, which will be set to
`True` when the delegated node is a mon, which means we can import keys
without needing the admin keyring.
We must check the generated fact `_disabled_ceph_mgr_modules` to
enable a disabled mgr module.
Otherwise, this task will be skipped because it's not comparing the
right list.
tests: skip rgw_tuning_pools_are_set when rgw_create_pools is not defined
Since the ooo_collocation scenario is supposed to be the same scenario as the
one tested by OSP, and it does not pass `rgw_create_pools`, the test
`test_docker_rgw_tuning_pools_are_set` will fail:
```
> pools = node["vars"]["rgw_create_pools"]
E KeyError: 'rgw_create_pools'
```
Skipping this test if `node["vars"]["rgw_create_pools"]` is not defined
fixes this failure.
These tests are skipped on bluestore OSD scenarios.
They were going to fail anyway since they are run on mon nodes while
`devices` is defined in the inventory for each OSD node, which means
`num_devices * num_osd_hosts` returns `0`.
The result is that the test expects 0 OSDs to be up.
The idea here is to move these tests so they run on OSD nodes.
Each OSD node checks its own OSDs: if an OSD node has 2 devices defined
in the `devices` variable, it means we check for 2 OSDs to be up on that
node; if each node has all its OSDs up, we can say all OSDs are up.
tests: fix broken tests in collocated daemons scenarios
At the moment, a lot of tests are skipped when daemons are collocated.
Our tests assume a node belongs to only one group, while in some scenarios
it can belong to multiple groups.
Also pinning to pytest 3.6.1 so we can use `request.node.iter_markers()`.
tests: fix `_get_osd_id_from_host()` in TestOSDs()
We must initialize the `children` variable in `_get_osd_id_from_host()`;
otherwise, if for any reason the deployment has failed and results in
an OSD host with no OSD registered, we won't enter the condition,
`children` is never set, and the function tries to return
something undefined.
Typical error:
```
E UnboundLocalError: local variable 'children' referenced before assignment
```
tests: factorize docker tests using docker_exec_cmd logic
Avoid duplicating tests unnecessarily just because of the docker exec syntax.
Using the same logic as in the playbook with `docker_exec_cmd` allows us
to run the same test on both containerized and non-containerized environments.
The idea is to set a variable `docker_exec_cmd` to the
'docker exec <container-name>' string when containerized and
to '' when non-containerized.
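A minimal sketch of the pattern (the container name and command are examples; the tests build the prefix from testinfra variables):
```
# The same command template serves both deployment types: the prefix is either
# "docker exec <container-name>" or an empty string.
docker_exec_cmd="docker exec ceph-mon-mon0"   # containerized
# docker_exec_cmd=""                          # non containerized
$docker_exec_cmd ceph --cluster ceph -s
```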
This part has changed in mimic and became:
```
"servicemap": {
    "epoch": 2,
    "modified": "2018-07-04 09:54:36.164786",
    "services": {
        "rbd-mirror": {
            "daemons": {
                "summary": "",
                "14151": {
                    "start_epoch": 2,
                    "start_stamp": "2018-07-04 09:54:35.541272",
                    "gid": 14151,
                    "addr": "192.168.1.80:0/240942528",
                    "metadata": {
                        "arch": "x86_64",
                        "ceph_release": "mimic",
                        "ceph_version": "ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)",
                        "ceph_version_short": "13.2.0",
                        "cpu": "Intel(R) Xeon(R) CPU X5650 @ 2.67GHz",
                        "distro": "centos",
                        "distro_description": "CentOS Linux 7 (Core)",
                        "distro_version": "7",
                        "hostname": "ceph-rbd-mirror0",
                        "id": "ceph-rbd-mirror0",
                        "instance_id": "14151",
                        "kernel_description": "#1 SMP Wed May 9 18:05:47 UTC 2018",
                        "kernel_version": "3.10.0-862.2.3.el7.x86_64",
                        "mem_swap_kb": "1572860",
                        "mem_total_kb": "1015548",
                        "os": "Linux"
                    }
                }
            }
        }
    }
}
```
This patch modifies the function `test_rbd_mirror_is_up()` in
`test_rbd_mirror.py` so it works with `mimic` and keeps backward compatibility
with `luminous`.
On containerized deployments, if a mon is stopped, its socket is not
purged and can cause a failure when a cluster is redeployed after the
purge playbook has been run.
Typical error:
```
fatal: [osd0]: FAILED! => {}
MSG:
'dict object' has no attribute 'osd_pool_default_pg_num'
```
the fact is not set because of this earlier failure:
Sébastien Han [Thu, 5 Jul 2018 12:10:33 +0000 (14:10 +0200)]
ceph-config: do not log cluster log on container
The container image recently merged both cluster and mon log into a
single stream. Following this, we now see this warning coming from the
container image:
```
2018-06-19 13:44:01.542990 7ff75b024700 1 mon.vm02@1(peon).log
v57928205 unable to write to '/var/log/ceph/ceph.log' for channel
'cluster': (2) No such file or directory
```
So we now tell the mon not to write the cluster log to the filesystem.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591771 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 713b9fcf9b825ba84b07781d05c967238ee96c14)
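The knob involved is presumably the standard mon option shown below (a hedged illustration; the role may pass it on the container command line rather than through ceph.conf):
```
# Stop the monitor from writing the cluster log to a file;
# "mon cluster log to file" is the documented upstream option name.
cat >> /etc/ceph/ceph.conf <<'EOF'
[mon]
mon cluster log to file = false
EOF
```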
Sébastien Han [Fri, 29 Jun 2018 10:10:16 +0000 (12:10 +0200)]
ceph-client: do not kill the dummy container
The container runs for 300 seconds, then dies and removes itself thanks to
the '--rm' option, so there is no point in removing it. Also, doing so causes
failures under some circumstances.
Sébastien Han [Thu, 28 Jun 2018 07:53:03 +0000 (09:53 +0200)]
systemd: remove changed_when: false
When using a module there is no need to apply this Ansible option. The
module handles idempotency on its own, so the module decides
whether or not the task has changed during the execution.
Sébastien Han [Thu, 28 Jun 2018 07:54:24 +0000 (09:54 +0200)]
ceph-osd: trigger osd container restart on script change
The script ceph-osd-run.sh holds the config options used to start the
container; if one of these options is modified we must restart the
container. This was not the case before because the 'notify' flag
wasn't present.
Sébastien Han [Wed, 27 Jun 2018 09:23:00 +0000 (11:23 +0200)]
mon: honour mon_docker_net_host option
--net=host was hardcoded in the startup line, so even though
mon_docker_net_host was set to False the net option would always be
activated.
mon_docker_net_host is set to True by default, so this commit does not
change the behaviour.
Sébastien Han [Fri, 15 Jun 2018 19:53:47 +0000 (15:53 -0400)]
common: start firewalld if configure_firewall
Currently we expect that if configure_firewall is set to True, firewalld
is enabled and running. Let's enforce that.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bea4027f0c2b5beebf409a00e7b61e923f4fea0c) Signed-off-by: Sébastien Han <seb@redhat.com>
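The shell equivalent of what the role now enforces (the playbook does this through the systemd module rather than a raw command):
```
# Make sure firewalld is enabled at boot and running before any rules are added.
systemctl enable --now firewalld
```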
Sébastien Han [Fri, 15 Jun 2018 19:39:34 +0000 (15:39 -0400)]
mon/osd: bump container memory limit
As discussed with the cores, the current limits are too low and should
be bumped to higher values.
So now, by default, monitors get 3GB and OSDs get 5GB.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1591876 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a9ed3579ae680949b4f53aee94003ca50d1ae721) Signed-off-by: Sébastien Han <seb@redhat.com>
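For context, a container memory ceiling ends up as a standard Docker flag; a hedged sketch with the values from this commit (image name, container names and arguments are illustrative only):
```
# Memory limits applied to the daemon containers after this change.
docker run -d --name ceph-mon-mon0 --memory=3g ceph/daemon mon   # monitors: 3GB
docker run -d --name ceph-osd-0    --memory=5g ceph/daemon osd   # OSDs: 5GB
```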
client: try to kill dummy container only on first client node
The 'dummy' container is created only on the first client node, which means
we must destroy this container only on that node, otherwise it
can cause a failure like the following:
```
fatal: [192.168.24.8]: FAILED! => {"changed": false, "cmd": ["docker", "rm",
"-f", "ceph-create-keys"], "delta": "0:00:00.023692", "end": "2018-06-12
20:56:07.261278", "msg": "non-zero return code", "rc": 1, "start":
"2018-06-12 20:56:07.237586", "stderr": "Error response from daemon: No such
container: ceph-create-keys", "stderr_lines": ["Error response from daemon: No
such container: ceph-create-keys"], "stdout": "", "stdout_lines": []}
```
ceph-osd: set 'openstack_keys_tmp' only when 'openstack_config' is defined.
If 'openstack_config' is false this task shouldn't be executed.
Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
(cherry picked from commit 3a07568496f718ffe44077eac23d7397a17b3b09) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 11 Jun 2018 12:51:58 +0000 (14:51 +0200)]
common: ability to enable/disable fw configuration
Prior to this patch, if you were running on a Red Hat system,
ceph-ansible would try to configure firewalld for you without the
operator's consent.
Now you can enable or disable the fw configuration by setting
configure_firewall to either true or false.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1589146 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2e8412734a81077c2b11de316e08bbecbed2de96)
tests: increase memory to 1024Mb for centos7_cluster scenario
we see more and more failures like `fatal: [mon0]: UNREACHABLE! => {}` in
the `centos7_cluster` scenario. Since we have 30GB of RAM on the hypervisors, we
can give monitors a bit more RAM. Nodes in the containerized cluster
testing scenario already have 1024MB of memory allocated.
client: keyrings aren't created when single client node
Combining `run_once: true` with `inventory_hostname ==
groups.get(client_group_name) | first` might cause a bug when the only
node being run is not the first in the group.
In a deployment with a single client node it can cause an issue because
the keyring sometimes won't be created, since the task can end up being
skipped entirely.
Potential error if someone doesn't pass the mode in the `keys` dict for
client nodes:
```
fatal: [client2]: FAILED! => {}
MSG:
The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'mode'
The error appears to have been in '/home/guits/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml': line 117, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: get client cephx keys
^ here
exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
exception: 'dict object' has no attribute 'mode'
```
Adding a default value avoids the deployment failing because of this.
client: use dummy created container when there is no mon in inventory
the `docker_exec_cmd` fact set in the client role when there is no monitor
in the inventory is wrong: `ceph-client-{{ hostname }}` is never created, so
it will fail anyway.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 7b156deb67a9e137962161829e008bcc32835fe8) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 5 Jun 2018 03:56:55 +0000 (11:56 +0800)]
test: do not always copy admin key
The admin key must be copied to the OSD nodes only when we test the
shrink scenario. Shrink relies on ceph-disk commands that require the
admin key on the node where they are executed.
Now we only copy the key when running the shrink-osd scenario.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 41b4632abca51b4f1ab052e8b47d0bebd2e838e8) Signed-off-by: Sébastien Han <seb@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1583020 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 91f9da530f139cc6f378d1fc549870cbbc45d460) Signed-off-by: Sébastien Han <seb@redhat.com>
41b4632 introduced a change in the functional tests.
Since the admin keyring isn't copied to rgw nodes anymore in the tests, let's use
the rgw keyring to run them.
Refactor of 8704144e3157aa253fb7563fe701d9d434bf2f3e.
There is no need to have duplicated tasks for this. The rgw pools
creation should be delegated to a monitor node so we don't have to care
whether the admin keyring is present on the rgw node.
Besides, only one task is needed to create the pools; we just need to
use the `docker_exec_cmd` fact already defined in `ceph-defaults` to
achieve it.
Erwan Velu [Fri, 1 Jun 2018 16:53:10 +0000 (18:53 +0200)]
ceph-defaults: Enable local epel repository
During the tests, the remote epel repository generates a lot of
errors leading to broken jobs (issue #2666).
This patch is about using a local repository instead of a random one.
To achieve that, we make a preliminary install of epel-release, remove
the metalink and enforce a baseurl pointing to our local http mirror.
That should speed up the build process and also avoid the random errors
we face.
This patch is part of a patch series that tries to remove all possible yum failures.
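A rough shell equivalent of the steps described above (the mirror URL is a placeholder; the actual change lives in the test tooling):
```
# Install epel-release first, then pin the repo to a local mirror instead of
# letting yum pick a random remote mirror through the metalink.
yum install -y epel-release
sed -i -e 's/^metalink=/#metalink=/' \
       -e 's|^#baseurl=.*|baseurl=http://local-mirror.example.com/epel/7/$basearch/|' \
       /etc/yum.repos.d/epel.repo
```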
Andy McCrae [Mon, 19 Feb 2018 16:57:18 +0000 (16:57 +0000)]
Fix template reference for ganesha.conf
We can simply reference the template name since it exists within the
role that we are calling. We don't need to check the ANSIBLE_ROLE_PATH
or playbooks directory for the file.
Since openstack_config.yml has been moved to `ceph-osd`, we must move
this `set_fact` into ceph-osd as well, otherwise the tasks in
`openstack_config.yml` using `openstack_keys` will actually use the
default value from `ceph-defaults`.
osds: wait for osds to be up before creating pools
This is a follow-up on #2628.
Even with the openstack pools creation moved later in the playbook,
there is still an issue because OSDs are not all UP when we try to
create pools.
Adding a task which waits for all OSDs to be UP with a `retries/until`
condition should definitively fix this issue.
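A hedged shell sketch of the same wait loop (the playbook expresses this as an Ansible task with retries/until; the `ceph osd stat` output format assumed here is the luminous-era "N osds: M up, ..." line and may differ on other releases):
```
# Retry (up to 60 times) until the number of OSDs reported up equals the total.
tries=0
while :; do
  read -r total up <<< "$(ceph osd stat | sed -n 's/.*\b\([0-9]\{1,\}\) osds: \([0-9]\{1,\}\) up.*/\1 \2/p')"
  if [ -n "$total" ] && [ "$total" -eq "$up" ]; then
    break
  fi
  tries=$((tries + 1))
  [ "$tries" -ge 60 ] && { echo "OSDs never came up" >&2; exit 1; }
  sleep 10
done
```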
Fix a typo in the `tag` target: double quotes are missing.
Without them, the `make tag` command fails like this:
```
if [[ "v3.0.35" == ]]; then \
echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35"; \
exit 1; \
fi
/bin/sh: -c: line 0: unexpected argument `]]' to conditional binary operator
/bin/sh: -c: line 0: syntax error near `;'
/bin/sh: -c: line 0: `if [[ "v3.0.35" == ]]; then echo "e5f2df8 on stable-3.0 is already tagged as v3.0.35"; exit 1; fi'
make: *** [tag] Error 2
```
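A hedged illustration of the quoting issue (variable names are hypothetical, not taken from the Makefile): with an unquoted right-hand operand an empty expansion leaves `[[ ... == ]]`, a syntax error, whereas a quoted operand degrades to a comparison against the empty string:
```
new_tag="v3.0.35"
head_tag=""                                # e.g. HEAD is not tagged yet
if [[ "$new_tag" == "$head_tag" ]]; then   # the quotes keep the expression valid
  echo "already tagged"
  exit 1
fi
```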
Ken Dreyer [Thu, 10 May 2018 23:08:05 +0000 (17:08 -0600)]
Makefile: add "make tag" command
Add a new "make tag" command. This automates some common operations:
1) Automatically determine the next Git tag version number to create.
For example:
"3.2.0beta1" -> "3.2.0beta2"
"3.2.0rc1" -> "3.2.0rc2"
"3.2.0" -> "3.2.1"
2) Create the Git tag, and print instructions for the user to push it to
GitHub.
3) Sanity check that HEAD is a stable-* branch or master (bail on
everything else).
4) Sanity check that HEAD is not already tagged.
Note, we will still need to tag manually once each time we change the
format, for example when moving from tagging "betas" to tagging "rcs",
or "rcs" to "stable point releases".
Signed-off-by: Ken Dreyer <kdreyer@redhat.com> Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit fcea56849578bd47e65b130ab6884e0b96f9d89d) Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
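The version-bump rule in step 1 can be sketched in plain shell as follows (a hypothetical illustration of the numbering behaviour only, not the Makefile's actual implementation):
```
# Increment the trailing number of the latest tag, whatever suffix style it uses:
# 3.2.0beta1 -> 3.2.0beta2, 3.2.0rc1 -> 3.2.0rc2, 3.2.0 -> 3.2.1
last="3.2.0rc1"
num="${last##*[!0-9]}"        # trailing digits, e.g. "1"
prefix="${last%$num}"         # everything before them, e.g. "3.2.0rc"
echo "${prefix}$((num + 1))"  # -> 3.2.0rc2
```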
Sébastien Han [Mon, 16 Apr 2018 13:57:23 +0000 (15:57 +0200)]
rgw: container add option to configure multi-site zone
You can now use RGW_ZONE and RGW_ZONEGROUP on each rgw host in your
inventory and assign them a value. Once the rgw container starts, it will
pick up this info and add itself to the right zone.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1551637 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 1c084efb3cb7e48d96c9cbd6bd05ca4f93526853) Signed-off-by: Sébastien Han <seb@redhat.com>
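In container terms the two settings end up as environment variables on the rgw container, roughly like this (image name, zone names and the rest of the command line are illustrative):
```
# Each rgw host can carry its own zone/zonegroup; the container reads them at startup.
docker run -d --net=host \
  -e RGW_ZONE=us-east-1 \
  -e RGW_ZONEGROUP=us \
  ceph/daemon rgw
```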
It should have been backported from 29a9dff but for better clarity I
think it's better to create a new commit for this.
c68126d6 aims to make the `pgs` attribute no longer mandatory for each element of
`cephfs_pools`. Therefore, we must remove the check in
`roles/ceph-mon/tasks/check_mandatory_vars.yml`.
This task was removed by 29a9dff, but I've chosen not to backport
that commit since it's part of a bunch of commits belonging to a PR
implementing the `ceph-validate` role.
When playing the ceph-mds role, mon nodes have already set a fact with the default
pg num for osd pools, so we can simply default to this value for cephfs
pools (the `cephfs_pools` variable).
At the moment the variable definition for `cephfs_pools` looks like:
and we have a task in `ceph-validate` to ensure `pgs` has been set to a
valid value.
We could simply avoid this check by setting the default value of `pgs`
to `hostvars[groups[mon_group_name][0]]['osd_pool_default_pg_num']` and
leave users the possibility to override this value.
In `ceph-osd` there is no need to set `docker_exec_cmd` since the only
place where this fact is used is `openstack_config.yml`, which
delegates all docker commands to a monitor node. It means we need the
`docker_exec_cmd` fact that refers to the `ceph-mon-*`
containers, and this fact is already set earlier in `ceph-defaults`.
Besides, when collocating an OSD with a MON it fails because the container
`ceph-osd-{{ ansible_hostname }}` doesn't exist.
Removing this task allows collocating an OSD with a MON.
For a while now we can see failures in the CI for containerized
scenarios because VMs are running out of space at some point.
The default in the images used is to have only 3GB for the root partition,
which doesn't sound like a lot.
Typical error seen:
```
STDERR:
failed to register layer: Error processing tar file(exit status 1): open /usr/share/zoneinfo/Atlantic/Canary: no space left on device
```
Indeed, on the machine we can see:
```
Every 2.0s: df -h                                     Tue May 29 17:21:13 2018
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  3.0G  3.0G   14M 100% /
```
The idea here is to expand this partition with all the available space
remaining by issuing an `lvresize` followed by an `xfs_growfs`.
```
-bash-4.2# lvresize -l +100%FREE /dev/atomicos/root
Size of logical volume atomicos/root changed from <2.93 GiB (750 extents) to 9.70 GiB (2484 extents).
Logical volume atomicos/root successfully resized.
```
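For completeness, the filesystem grow that follows the `lvresize` shown above (the root filesystem on these images is XFS, so it is grown online against its mount point):
```
# Grow the XFS filesystem to fill the freshly extended logical volume.
xfs_growfs /
```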
In the CI we often see failures like the following:
`Failure talking to yum: Cannot find a valid baseurl for repo:
base/7/x86_64`
It seems the fastest-mirror detection is sometimes counterproductive and
leads yum to fail.
This fix has been added in `setup.yml`.
This playbook was until now used only just before playing `testinfra`,
but it could also be run before ceph-ansible, so we can add some
provisioning tasks to it.
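A hedged guess at the kind of change this implies (the actual task may differ): disabling yum's fastestmirror plugin so repository resolution no longer depends on the mirror-ranking step:
```
# Turn off the fastestmirror yum plugin on the CentOS 7 test nodes.
sed -i 's/^enabled=1/enabled=0/' /etc/yum/pluginconf.d/fastestmirror.conf
```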
When collocating mds on a monitor node, the cephfs creation will fail
because `docker_exec_cmd` is reset to `ceph-mds-monXX`, which is
incorrect because we need to delegate the task to `ceph-mon-monXX`.
In addition, it wouldn't have worked since the `ceph-mds-monXX` container
isn't started yet.
Moving the task earlier in the `ceph-mds` role fixes this issue.
Paul Cuzner [Fri, 25 May 2018 00:13:20 +0000 (12:13 +1200)]
Add privilege escalation to iscsi purge tasks
Without the escalation, invocation by non-root
users will fail when accessing the rados config
object, or when attempting to log to /var/log.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1549004 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 2890b57cfc2e1ef9897a791ce60f4a5545011907) Signed-off-by: Sébastien Han <seb@redhat.com>
Since we fixed the `gather and delegate facts` task, this exception is
not needed anymore. It's a leftover that should be removed to save some
time when deploying a cluster with a large number of clients.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 828848017cefd981e14ca9e4690dd7d1320f0eef) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 23 May 2018 19:44:24 +0000 (12:44 -0700)]
group_vars: resync group_vars
The previous commit changed the content of roles/$ROLE/default/main.yml,
so we have to regenerate the group_vars files.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 3c32280ca1093f6c3abe0038f524ee3b88dd3672) Signed-off-by: Sébastien Han <seb@redhat.com>
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.
The idea here is to move cephfs pool creation into the `ceph-mds` role.
Let's move this variable into group_vars/all.yml in all testing scenarios,
in line with commit 1f15a81c480f60bc82bfc3a1aec3fe136e6d3bc4, so
we keep consistency between the playbook and the tests.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit a10e73d78d07179ff20ea7cabc2f2ccd1b1b967f) Signed-off-by: Sébastien Han <seb@redhat.com>
When deploying a large number of OSD nodes it can be an issue because the
protection check [1] won't pass since it tries to create pools before all
OSDs are active.
The idea here is to move openstack pool creation to the end of the `ceph-osd` role.
Luigi Toscano [Tue, 22 May 2018 09:46:33 +0000 (11:46 +0200)]
ceph-radosgw: disable NSS PKI db when SSL is disabled
The NSS PKI database is needed only if radosgw_keystone_ssl
is explicitly set to true, otherwise the SSL integration is
not enabled.
It is worth noting that the PKI support was removed from Keystone
starting from the Ocata release, so some code paths should be
changed anyway.
Also, remove radosgw_keystone, which is not useful anymore.
This variable was used until fcba2c801a122b7ce8ec6a5c27a70bc19589d177.
Now profiles drive the setting of rgw keystone *.
Signed-off-by: Luigi Toscano <ltoscano@redhat.com>
(cherry picked from commit 43e96c1f98312734e2f12a1ea5ef29981e9072bd) Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 4 May 2018 23:41:49 +0000 (01:41 +0200)]
rhcs: bump version to 3.0 for stable 3.1
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1519835 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit bf9593bcedea6bd0220eeeb4029c6632f5a8e6f6) Signed-off-by: Sébastien Han <seb@redhat.com>
Vishal Kanaujia [Wed, 16 May 2018 09:58:31 +0000 (15:28 +0530)]
Skip GPT header creation for lvm osd scenario
The LVM lvcreate fails if the disk already has a GPT header.
We create GPT header regardless of OSD scenario. The fix is to
skip header creation for lvm scenario.
Sébastien Han [Tue, 22 May 2018 23:52:40 +0000 (16:52 -0700)]
rolling_update: fix get fsid for containers
When running ansible2.4-update_docker_cluster there is an issue with the
"get current fsid" task. The current task only works for
non-containerized deployments but runs all the time (even for
containerized ones). This currently results in the following error:
This is not really representative of the real error, since the 'ceph' cli is available on that machine.
On other environments we would get something like "command not found: ceph".
Fix restarting OSDs twice during a rolling update.
During a rolling update, OSDs are currently restarted twice: once by the
handler in roles/ceph-defaults/handlers/main.yml and a second time by tasks
in the rolling_update playbook. This change turns off restarts by the handler.
Further, the restart initiated by the rolling_update playbook is more
efficient, as it restarts all the OSDs on a host as one operation and waits
for them to rejoin the cluster. The restart task in the handler restarts one
OSD at a time and waits for it to join the cluster.
Sébastien Han [Wed, 16 May 2018 15:37:10 +0000 (17:37 +0200)]
switch: disable ceph-disk units
During the transition from jewel non-container to container, old ceph
units are disabled. ceph-disk can still remain in some cases and will
appear as 'loaded failed'; this is not a problem, although operators
might not like to see these units failing. That's why we remove them if
we find them.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1577846 Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 49a47124859e6577fb99e6dd680c5244ccd6f38f) Signed-off-by: Sébastien Han <seb@redhat.com>