David Waiting [Mon, 10 Dec 2018 14:54:18 +0000 (09:54 -0500)]
ensure at least one osd is up
The existing task checks that the number of OSDs is equal to the number of up OSDs before continuing.
The problem is that if none of the OSDs have been discovered yet, the task will exit immediately and subsequent pool creation will fail (num_osds = 0, num_up_osds = 0).
In this change, we also check that at least one OSD is present. In our testing, this results in the task correctly waiting for all OSDs to come up before continuing.
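As a rough illustration of the condition described above (not the actual Ansible task), the check can be sketched in Python. The function name `osds_ready` is hypothetical; only `num_osds` and `num_up_osds` come from the commit message.

```python
def osds_ready(num_osds: int, num_up_osds: int) -> bool:
    """Return True only when at least one OSD exists and every OSD is up.

    The old check was effectively `num_osds == num_up_osds`, which is
    also true when both values are 0 (no OSD discovered yet).
    """
    return num_osds > 0 and num_osds == num_up_osds
```

With this predicate, `osds_ready(0, 0)` is false, so a retry loop built on it keeps waiting instead of exiting immediately.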
Signed-off-by: David Waiting <david_waiting@comcast.com>
switch_to_containers: use ceph binary from container
use the ceph binary from the container instead of the host.
If the ceph CLI version isn't compatible between host and container
image, it can cause the CLI to hang.
instead of using `RuntimeDirectory` parameter in systemd unit files,
let's use a systemd `tmpfiles.d` to ensure `/run/ceph`.
Explanation:
`podman` doesn't create `/var/run/ceph` if it doesn't exist at the time
the container is run, whereas `docker` used to create it.
In case of `switch_to_containers` scenario, `/run/ceph` gets created by
a tmpfiles.d systemd file; when switching to containers, the systemd
unit file complains because `/run/ceph` already exists.
The better fix would be to ensure `/usr/lib/tmpfiles.d/ceph-common.conf`
is removed and rely only on the `RuntimeDirectory` parameter from the
systemd unit file. However, we come from a non-containerized environment
which is already running, so `/run/ceph` is already created. When
starting the unit to start the container, systemd will still complain,
and we can't simply remove the directory if daemons are collocated.
switch_to_containers: do not try to redeploy monitors
`ceph-mon` tries to redeploy monitors because it assumes it was not yet
deployed since `mon_socket_stat` and `ceph_mon_container_stat` are
undefined (indeed, we stop the daemon before calling `ceph-mon` in the
switch_to_containers playbook).
Rishabh Dave [Tue, 12 Feb 2019 06:55:13 +0000 (12:25 +0530)]
fix mistake in task that aborts when ntpd is chosen on Atomic
Since it's already confusing whether ntp_daemon_type should be "ntp" or
"ntpd", fix the mistake in the title of the task that aborts if
ntp_daemon_type is set to "ntpd" and the OS being used is Atomic.
Sébastien Han [Fri, 8 Feb 2019 15:05:20 +0000 (16:05 +0100)]
mon: do not hardcode ceph uid
167 is the ceph uid on Red Hat based systems, so trying to deploy a
monitor on Debian fails since the ceph user id on that system is 64045.
This commit uses the ceph_uid variable which contains the right uid
based on system/container detection.
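A minimal sketch of the idea, assuming a simple lookup keyed on OS family. The names `CEPH_UID_BY_OS_FAMILY` and `ceph_uid_for` are hypothetical, and the container detection mentioned above is deliberately left out; only the two uid values come from this commit message.

```python
# uid values quoted in the commit message; the mapping itself is illustrative.
CEPH_UID_BY_OS_FAMILY = {"RedHat": 167, "Debian": 64045}


def ceph_uid_for(os_family: str) -> int:
    """Look up the ceph uid instead of hardcoding 167 everywhere."""
    try:
        return CEPH_UID_BY_OS_FAMILY[os_family]
    except KeyError:
        raise ValueError("no known ceph uid for OS family %r" % os_family)
```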
Closes: https://github.com/ceph/ceph-ansible/issues/3589
Signed-off-by: Sébastien Han <seb@redhat.com>
Leah Neukirchen [Thu, 7 Feb 2019 17:09:21 +0000 (18:09 +0100)]
Fix uses of default(omit) with string concatenation
When {{omit}} is concatenated with another string, it expands to something
like __omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d.
However, in these use-cases we need an empty string.
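The effect can be imitated in plain Python. This is only a simulation of the behaviour described above, not Ansible's implementation: `OMIT` stands in for the placeholder string, an undefined variable is modeled as `None`, and `with_default` and `suffix` are hypothetical names.

```python
# Stand-in for the placeholder that {{ omit }} expands to.
OMIT = "__omit_place_holder__63eea0d96dd6ed867b95405e11d87dddf61f448d"


def with_default(value, fallback):
    """Imitate a default filter: use fallback when the value is undefined."""
    return fallback if value is None else value


suffix = None  # hypothetical undefined variable

# Concatenating with the omit placeholder leaks it into the result.
broken = "prefix" + with_default(suffix, OMIT)

# Defaulting to an empty string gives the intended value.
fixed = "prefix" + with_default(suffix, "")
```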
Typical error:
```
fatal: [iscsi-gw0]: FAILED! =>
msg: 'an error occurred while trying to read the file ''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'': [Errno 13] Permission denied: b''/home/guits/ceph-ansible/tests/functional/all_daemons/fetch/e5f4ab94-c099-4781-b592-dbd440a9d6f3/iscsi-gateway.key'''
```
`become: True` is not needed on the following task:
`copy crt file(s) to gateway nodes`,
since it's already set in the main playbook (site.yml/site-container.yml).
The files get generated in the 'fetch_directory' as the root user
because there is a 'delegate_to' and we run the playbook with
`become: True` (from the main playbook).
The idea here is to create the files as the ansible user so we can open
them later to copy them to the remote machine.
John Fulton [Thu, 31 Jan 2019 21:17:20 +0000 (16:17 -0500)]
Fix CNI error when net=host is not used in some podman calls
With 'podman version 1.0.0' on RHEL8 beta the 'get ceph version' and
'ceph monitor mkfs' commands fail [1] with "error configuring network
namespace for container Missing CNI default network".
When net=host is added these errors are resolved. net=host is used in
many other calls (grep -R net=host | wc -l --> 38).
When ceph-container-common notifies handlers because a new container
image has been pulled, ceph-handler throws an error because of
undefined variables, since they are set in the ceph-facts role.
- also add `--foreground` which seems to fix some issue we are facing when
using timeout with `podman`.
- use this fact in the `is ceph running already?` task.
John Fulton [Fri, 1 Feb 2019 13:32:14 +0000 (08:32 -0500)]
Make python print statements python3 compatible
The restart_osd_daemon.sh generated from the j2 template
contains a python call which uses 'print x' instead of
'print(x)'. Add the missing parentheses to make this call
compatible with both 2 and 3.
Also add parentheses to other python print calls found
in roles/ceph-client/defaults/main.yml and
infrastructure-playbooks/cluster-os-migration.yml.
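To illustrate why the parentheses matter, here is a small check, assuming it is run under Python 3. The helper `is_valid_python3` is hypothetical and is not part of the changed files.

```python
def is_valid_python3(source):
    """Return True if the snippet compiles under the running Python 3."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


statement_form = "print x"   # Python 2 only
call_form = "print(x)"       # accepted by both Python 2 and Python 3
```

Under Python 3, `is_valid_python3(statement_form)` is false while `is_valid_python3(call_form)` is true. Note that the single-argument form behaves the same on both versions; on Python 2, `print("a", "b")` would print a tuple instead.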
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721
Signed-off-by: John Fulton <fulton@redhat.com>
John Fulton [Wed, 30 Jan 2019 20:06:55 +0000 (15:06 -0500)]
Do not timeout podman/docker pull if timeout value is '0'
If the user sets "docker_pull_timeout: '0'", do not use the
timeout command when running podman/docker pull. Also, use
"timeout -s KILL"; without KILL, podman on RHEL8 beta does
not time out and the deployment can hang.
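The logic can be sketched as a small command builder. This is an illustration under assumptions, not the role's actual task: `pull_command` and its parameters are hypothetical names, and the image name in the usage note is a placeholder.

```python
def pull_command(runtime, image, pull_timeout):
    """Build the argv for an image pull, optionally wrapped in timeout(1).

    runtime is "podman" or "docker"; pull_timeout mirrors the
    docker_pull_timeout value, where '0' means "no timeout".
    """
    pull = [runtime, "pull", image]
    if str(pull_timeout) == "0":
        # No timeout requested: run the pull command directly.
        return pull
    # Send KILL on expiry so that the pull is reliably terminated.
    return ["timeout", "-s", "KILL", str(pull_timeout)] + pull
```

For example, `pull_command("podman", "example/image:tag", "0")` returns just the pull command, while any other value prefixes it with `timeout -s KILL <value>`.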
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1670625
Signed-off-by: John Fulton <fulton@redhat.com>
tests: do not play dev_setup on containerized deployment
Using the `!` mark in tox.ini doesn't work on a comma-separated list.
The idea here is to skip all containerized scenarios in dev_setup.yml and
use the `!` for the update scenario.
fix the wrong path used in various rgw testinfra tests.
set `1` as default value for `radosgw_num_instances`: if
`ansible_vars.get(radosgw_num_instances)` returns `None`, we can assume
there's only 1 instance since it's the default value in ceph-defaults.
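A minimal sketch of that fallback, assuming `ansible_vars` behaves like a dict. The helper name `rgw_instance_count` is hypothetical; the explicit `None` check also covers a key that is present but unset.

```python
def rgw_instance_count(ansible_vars):
    """Return the number of rgw instances, falling back to 1 when unset."""
    value = ansible_vars.get("radosgw_num_instances")
    # 1 is the default value in ceph-defaults, per the commit message.
    return 1 if value is None else int(value)
```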
This commit reorganizes the testing directory layout.
The idea is to have more consistency with the names of scenario and
their corresponding path, eg: non-container vs. container: each scenario
has a subdirectory for container deployment.
This commit refactors the way the environments are named by adding a
factor `{non_container,container}`. This avoids a lot of duplicate
definitions in tox.ini and brings more consistency.
tests: update default value for CEPH_STABLE_RELEASE
- update value for `CEPH_STABLE_RELEASE`: the next release will ship with
`nautilus`. This variable is used for stable branches only; this way, it
will be ready when the next stable version is released.
- test upgrade from mimic to ceph@master: don't run dev_setup.yml on update
scenario, and run it in [update] section so we update from mimic to
ceph@master.
- run lvm_setup.yml for all scenarios except `lvm_batch`
1ac94c048ff1d1385de2892d0ecef7879ec563e9 introduced support for
multiple rgw instances on a single host but did not implement
this feature in rolling_update.
config: make sure ceph_release is set for all client node
`ceph_release` is set in `ceph-container-common`, but this role is
played only on the first node for clients, which means ceph-config will
fail on all client nodes except the first one.
This commit ensures ceph_release is set for all client nodes.
Sébastien Han [Mon, 21 Jan 2019 11:08:56 +0000 (12:08 +0100)]
mon: enable msgr2
Enabling msgr2 style declaration for Nautilus and above. Prior releases
will keep the right syntax.
When upgrading from Mimic to Nautilus we must maintain something in the
form of:
Sébastien Han [Wed, 9 Jan 2019 12:23:07 +0000 (13:23 +0100)]
mon: ability to change mon listening port on container
You can now use 'ceph_mon_container_listen_port' to change the port the
monitor will listen on.
Setting the default to 3300 (assigned by IANA) since Nautilus introduces
the messenger2 transport protocol.
Sébastien Han [Mon, 21 Jan 2019 12:53:53 +0000 (13:53 +0100)]
Revert "mon: force peer addition"
This reverts commit ee08d1f89a588e878324141dd0f80c65058a377d, which was
mostly there to work around a bug in ceph@master. ceph@master is now
fixed, thanks to https://github.com/ceph/ceph/pull/25900, so this
workaround is reverted.
guihecheng [Fri, 9 Nov 2018 00:56:57 +0000 (08:56 +0800)]
rgw: add support for multiple rgw instances on a single host
With this, we can have multiple rgw instances on a single host
with a single run, and don't have to use rgw-standalone.yml, which does
not seem able to bind ports separately.
If you want to have multiple rgw instances, just change 'radosgw_instances'
to the number you want, which defaults to 1.
Not compatible with Multi-Site yet.