John Fulton [Thu, 31 Jan 2019 21:17:20 +0000 (16:17 -0500)]
Fix CNI error when net=host is not used in some podman calls
With 'podman version 1.0.0' on RHEL8 beta the 'get ceph version' and
'ceph monitor mkfs' commands fail [1] with "error configuring network
namespace for container Missing CNI default network".
When net=host is added, these errors are resolved. net=host is already used in
many other calls (grep -R net=host | wc -l --> 38).
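A minimal sketch of the kind of call that needed the flag; the task shape is illustrative and the registry/image variables are the usual ceph-ansible ones, not necessarily the exact code touched:

```yaml
- name: get ceph version
  command: >
    podman run --rm --net=host
    --entrypoint /usr/bin/ceph
    {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
    --version
  register: ceph_version_output
  changed_when: false
```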
When ceph-container-common notifies handlers because a new container
image has been pulled, ceph-handler throws an error because of
undefined variables, since those variables are set in the ceph-facts role.
- also add `--foreground`, which seems to fix an issue we are facing when
using timeout with `podman`.
- use this fact in the `is ceph running already?` task.
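A hedged sketch of what the wrapped call can look like; the timeout value, `container_exec_cmd` and the exact arguments are assumptions:

```yaml
- name: is ceph running already?
  command: "timeout --foreground -s KILL 7 {{ container_exec_cmd }} ceph --cluster {{ cluster }} fsid"
  register: ceph_current_status
  changed_when: false
  failed_when: false
```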
John Fulton [Fri, 1 Feb 2019 13:32:14 +0000 (08:32 -0500)]
Make python print statements python3 compatible
The restart_osd_daemon.sh generated from the j2 template
contains a python call which uses 'print x' instead of
'print(x)'. Add the missing parentheses to make this call
compatible with both Python 2 and 3.
Also add parentheses to other python print calls found
in roles/ceph-client/defaults/main.yml and
infrastructure-playbooks/cluster-os-migration.yml.
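As an illustration (the task and the expression are hypothetical, not the actual template content), an inline call like this is valid under both interpreters once the parentheses are there:

```yaml
- name: compute a value with an inline python call (illustrative only)
  command: python -c 'print(2 ** 10)'
  register: inline_python_result
  changed_when: false
```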
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1671721
Signed-off-by: John Fulton <fulton@redhat.com>
John Fulton [Wed, 30 Jan 2019 20:06:55 +0000 (15:06 -0500)]
Do not timeout podman/docker pull if timeout value is '0'
If the user sets "docker_pull_timeout: '0'" then do not use the
timeout command when running podman/docker pull. Also, use
"timeout -s KILL"; without KILL, podman on RHEL8 beta does
not time out and the deployment can hang.
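A sketch of the resulting logic; only `docker_pull_timeout` comes from this commit, the other variable names and the task shape are assumptions:

```yaml
- name: pull the container image
  command: >
    {% if docker_pull_timeout != '0' %}timeout -s KILL {{ docker_pull_timeout }}{% endif %}
    {{ container_binary | default('podman') }} pull
    {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
```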
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1670625
Signed-off-by: John Fulton <fulton@redhat.com>
tests: do not play dev_setup on containerized deployment
Using the `!` mark in tox.ini doesn't work on a comma-separated list.
The idea here is to skip all containerized scenarios in dev_setup.yml and
use the `!` only for the update scenario.
fix the wrong path used in various rgw testinfra tests.
set `1` as the default value for `radosgw_num_instances`: if
`ansible_vars.get(radosgw_num_instances)` returns `None`, we can assume
there's only 1 instance, since that's the default value in ceph-defaults.
This commit reorganizes the testing directory layout.
The idea is to have more consistency between the names of scenarios and
their corresponding paths, eg: non-container vs. container: each scenario
has a subdirectory for container deployment.
This commit also refactors the way the environments are named by adding a
`{non_container,container}` factor. This avoids a lot of duplicate
definitions in tox.ini and brings some consistency.
tests: update default value for CEPH_STABLE_RELEASE
- update value for `CEPH_STABLE_RELEASE`: the next release will ship with
`nautilus`. This variable is used for stable branches only; this way, it
will be ready when the next stable version is released.
- test upgrade from mimic to ceph@master: don't run dev_setup.yml on the
update scenario, and run it in the [update] section so we update from
mimic to ceph@master.
- run lvm_setup.yml for all scenarios except `lvm_batch`.
1ac94c048ff1d1385de2892d0ecef7879ec563e9 introduced support for
multiple rgw instances on a single host but missed implementing
this feature in rolling_update.
config: make sure ceph_release is set for all client nodes
`ceph_release` is set in `ceph-container-common` but this role is
played only on the first node for clients; this means ceph-config will fail
on all client nodes except the first one.
This commit ensures ceph_release is set for all client nodes.
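One way to express this, as a hedged sketch (the group lookup and the guard are assumptions, not necessarily the actual change):

```yaml
- name: make sure ceph_release is set on every client node
  set_fact:
    ceph_release: "{{ hostvars[groups[client_group_name][0]]['ceph_release'] }}"
  when: ceph_release is not defined
```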
Sébastien Han [Mon, 21 Jan 2019 11:08:56 +0000 (12:08 +0100)]
mon: enable msgr2
Enable msgr2-style declaration for Nautilus and above; prior releases
keep the existing syntax.
When upgrading from Mimic to Nautilus we must maintain a mon host
declaration that lists both the v2 and v1 addresses.
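For example, a hedged illustration using `ceph_conf_overrides` (the addresses are placeholders and the actual template change may differ):

```yaml
ceph_conf_overrides:
  global:
    # v2 on 3300, v1 on the legacy 6789 port
    mon host: "[v2:192.168.1.10:3300,v1:192.168.1.10:6789]"
```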
Sébastien Han [Wed, 9 Jan 2019 12:23:07 +0000 (13:23 +0100)]
mon: ability to change mon listening port on container
You can now use 'ceph_mon_container_listen_port' to change the port the
monitor will listen on.
The default is set to 3300 (the port assigned by IANA) since Nautilus
ships the messenger v2 transport protocol.
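For example, in group_vars (3300 is the stated default; the commented v1 fallback is an assumption):

```yaml
# keep the msgr2 default
ceph_mon_container_listen_port: 3300
# or, as an assumption, fall back to the legacy v1 port for older images
# ceph_mon_container_listen_port: 6789
```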
Sébastien Han [Mon, 21 Jan 2019 12:53:53 +0000 (13:53 +0100)]
Revert "mon: force peer addition"
This reverts commit ee08d1f89a588e878324141dd0f80c65058a377d which was
mostly there to work around a bug in ceph@master. Now that ceph@master is
fixed, revert this. Thanks to https://github.com/ceph/ceph/pull/25900
guihecheng [Fri, 9 Nov 2018 00:56:57 +0000 (08:56 +0800)]
rgw: add support for multiple rgw instances on a single host
With this, we can have multiple rgw instances on a single host
with a single run, without having to use rgw-standalone.yml, which does not
seem able to bind ports separately.
If you want multiple rgw instances, just change 'radosgw_instances'
to the number you want; it defaults to 1.
Not compatible with Multi-Site yet.
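A hedged example of the relevant group_vars; only `radosgw_instances` comes from this commit, the frontend port variable and the per-instance offset are assumptions:

```yaml
radosgw_instances: 2          # run two rgw daemons on each rgw host
radosgw_frontend_port: 8080   # assumption: instance N binds to this port + N so ports don't clash
```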
to avoid duplicating code in `site.yml.sample`, `site-docker.yml.sample`
and `setup.yml`, let's isolate this part of the code and simply include
it each time we need it.
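A minimal sketch of the pattern (the file name and the play layout are hypothetical):

```yaml
- name: shared pre-flight steps
  hosts: all
  tasks:
    - name: include the factored-out code
      include_tasks: common/shared_setup.yml
```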
This condition is useless and it's also creating issues we don't see in
our CI. ceph_release is set by either ceph-common or ceph-docker-common
so let's keep it this way.
Sébastien Han [Tue, 8 Jan 2019 17:36:14 +0000 (18:36 +0100)]
mon: force peer addition
Something changed with the introduction of msgr2 and we have to
add each node as a peer so the monitors can form a quorum. This might be
due to our CI environment, although adding this is completely harmless
and solves monitors not being able to form quorum.
It seems that the initial monitor map didn't contain the right
information about the peers (addresses like 0.0.0.0/0r1 for each rank).
Bruceforce [Thu, 3 Jan 2019 15:08:58 +0000 (16:08 +0100)]
nfs-ganesha: fixed nfs_ganesha_dev_apt_repo variable
The nfs_ganesha_dev_apt_repo variable was set incorrectly in the task
"fetch nfs-ganesha development repository".
Rishabh Dave [Wed, 12 Dec 2018 11:15:00 +0000 (16:45 +0530)]
ceph-infra: merge ntp_debian.yml and ntp_rpm.yml
Merge ntp_debian.yml and ntp_rpm.yml into one (the new file is called
setup_ntp.yml) since they are almost identical. Also avoid repetition
of the common setup step for ntpd and chronyd services.
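A hedged sketch of what the merged file can look like, using the distro-agnostic package module instead of per-OS task files; `ntp_daemon_type` and the package/service names are assumptions:

```yaml
- name: install the time daemon
  package:
    name: "{{ 'chrony' if ntp_daemon_type == 'chronyd' else 'ntp' }}"
    state: present

- name: enable and start the time daemon
  service:
    name: "{{ ntp_daemon_type }}"
    state: started
    enabled: true
```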
Rishabh Dave [Wed, 2 Jan 2019 11:34:49 +0000 (17:04 +0530)]
copy certificates as root user
Since the current user on the controller node might not have
permission to read the TLS certificate and related files, copy these
files to the Ceph nodes as the root user.
Fixes: https://github.com/ceph/ceph-ansible/issues/3465
Signed-off-by: Rishabh Dave <ridave@redhat.com>
update: do not enforce `serial: 1` on client nodes
There is no need to enforce `serial: 1` on client nodes.
Let's make it parameterizable by introducing a new *extra* variable
`client_update_batch`; if not filled, this will default to `{{
ansible_forks }}`.
NOTE: this is only usable as an extra variable passed with
`-e client_update_batch=<num>`
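A sketch of how the play header can consume it (the play layout is illustrative):

```yaml
- hosts: clients
  serial: "{{ client_update_batch | default(ansible_forks) }}"
```

It would then be passed on the command line, e.g. `ansible-playbook infrastructure-playbooks/rolling_update.yml -e client_update_batch=10`.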
Rishabh Dave [Mon, 17 Dec 2018 10:34:46 +0000 (16:04 +0530)]
set any_errors_fatal to true for all host sections
Add `any_errors_fatal: true` to all host sections in `site.yml.sample`
and `site-container.yml.sample` so that the playbook execution
stops immediately when an error occurs.
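For instance, each host section gains the flag (the host pattern and role shown are illustrative):

```yaml
- hosts: mons
  any_errors_fatal: true
  become: true
  roles:
    - ceph-mon
```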
Sébastien Han [Thu, 13 Dec 2018 10:38:23 +0000 (11:38 +0100)]
mon: remove ceph aliases for containers
These aliases have led to several issues, making people believe that the ceph
binaries are actually present on the host when running the command.
However, it wasn't explicit that the commands were only run inside a
container.
This has brought too much confusion, so we decided to remove them.
Closes: https://github.com/ceph/ceph-ansible/issues/3445
Signed-off-by: Sébastien Han <seb@redhat.com>