Sébastien Han [Thu, 2 Nov 2017 15:17:38 +0000 (16:17 +0100)]
osd: enhance backward compatibility
During the initial implementation of this 'old' thing we were hitting
this issue without noticing
https://github.com/moby/moby/issues/30341 and were blindly using --rm.
Now that this is fixed, the prepare container disappears and thus
activation fails.
I'm fixing this for old jewel images.
Also this fixes the machine reboot case where the docker logs are
purged. In the old scenario, we now store the log locally in the same
directory as the ceph-osd-run.sh script.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d4ed9a2064e503ac4a4fe978cb9e196ca9150272)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Thu, 5 Oct 2017 14:22:04 +0000 (16:22 +0200)]
ci: new osd scenarios
This commit adds new OSD scenarios. It aims to simplify the CI setup and
bring better coverage of the OSD scenarios.
We decided to differentiate between filestore and bluestore, thinking
ahead to when filestore won't be supported anymore.
So we now have two classes of tests:
* Filestore
* Bluestore
In each of those classes we have container and non-container.
Then for each we test the following:
* collocated
* collocated dmcrypt
* non-collocated
* non-collocated dmcrypt
* auto discovery collocated
* auto discovery collocated dmcrypt
This gives us a nice coverage and also reduces the footprint on the CI.
We are now up to 4 scenarios, each containing 6 OSD VMs.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit a53aa9e8b41606e2ff996f036a7a86679126cd92)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 23 Oct 2017 10:03:01 +0000 (12:03 +0200)]
Test ansible 2.4.1
We now test with Ansible 2.4. We had to change testinfra's version since
only recent versions work with 2.4. See:
https://github.com/philpep/testinfra/issues/249
Closes: https://github.com/ceph/ceph-ansible/issues/2087
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c4ad2477188c2d226a4ea2e0fa6693967d5b103c)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 27 Oct 2017 09:46:15 +0000 (11:46 +0200)]
default: remove dup variable
ceph_repository_type was declared multiple times. This commit fixes
this.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d2575c7f5e5520f6ee65c5007853b3248d2c7a10)
Signed-off-by: Sébastien Han <seb@redhat.com>
John Fulton [Wed, 25 Oct 2017 23:46:02 +0000 (23:46 +0000)]
Make acls and mode parameters of openstack_keys optional
Only chmod or setfacl the requested keyring(s) in the
openstack_keys data structure when the mode or acls keys
of that data structure exist.
Users may specify four permission combinations for the
keyring file(s): 1. only set ACL, 2. only set mode,
3. set neither mode nor ACL, 4. set mode and then ACL.
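The four combinations above can be sketched as a small decision function. This is only an illustrative Python sketch, not the Ansible tasks from the commit: the function name `permission_steps`, the `name` key and the example entries are hypothetical, and only the optional `mode` and `acls` keys come from the description above.

```python
def permission_steps(key):
    """Return the ordered permission operations for one keyring entry.

    `key` is a dict standing in for one item of the keys data structure.
    Only the optional "mode" and "acls" keys are inspected.
    """
    steps = []
    # Mode is handled first when present ("set mode and then ACL").
    if "mode" in key:
        steps.append(("chmod", key["mode"]))
    # Each requested ACL becomes one setfacl operation; none if absent.
    for acl in key.get("acls", []):
        steps.append(("setfacl", acl))
    return steps


# Hypothetical entries covering the four combinations.
only_acl = {"name": "client.a", "acls": ["u:glance:r--"]}
only_mode = {"name": "client.b", "mode": "0600"}
neither = {"name": "client.c"}
both = {"name": "client.d", "mode": "0640", "acls": ["u:nova:r--"]}
```

An entry with neither key yields an empty list, so no permission change is attempted for it.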
Sébastien Han [Thu, 26 Oct 2017 12:18:38 +0000 (14:18 +0200)]
purge: do not reboot by default
Rebooting servers is really intrusive and perhaps this is not what the
operator wants. So we disable the reboot by default now. Note that the
reboot might not happen all the time.
It can be enabled by running the purge playbook with -e
reboot_osd_node=True
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1505011
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 2837d0a22e258cee583f14e402a99d89c9a16cd6)
Signed-off-by: Sébastien Han <seb@redhat.com>
Andy McCrae [Mon, 23 Oct 2017 13:57:24 +0000 (14:57 +0100)]
Option to set TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
Use "ceph_tcmalloc_max_total_thread_cache" to set the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES value inside /etc/default/ceph for
Debian installs, or /etc/sysconfig/ceph for Red Hat/CentOS installs.
By default this is set to 0, so the default package value will be used.
If specified, this value will be changed to match the variable, and the
ceph osd services will be restarted.
Sébastien Han [Wed, 25 Oct 2017 13:45:37 +0000 (15:45 +0200)]
rgw/nfs: fix section duplication
Once and for all, hopefully...
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8670b45ef2cfcf35bac5d7f83b93099bfa1d9f9e)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Fri, 20 Oct 2017 13:15:38 +0000 (15:15 +0200)]
osd: bring backward compatibility with old Jewel images
There was a huge resync from luminous to jewel in ceph-docker:
https://github.com/ceph/ceph-docker/pull/797
This change brought a handy new function to discover partitions tied to
an OSD. This function doesn't exist in the old image, so the
ceph-osd-run.sh script breaks when trying to deploy a Jewel OSD with that
old Jewel image version.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 968ef04324e9064fcecfe88bc5464ad9c2673a13)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 18 Oct 2017 16:03:30 +0000 (18:03 +0200)]
all: backward compatibility between stable-2.2 and 3.0
stable-3.0 brought numerous changes in ceph-ansible variables. This PR
aims to maintain backward compatibility for someone upgrading from
stable-2.2 to stable-3.0 while keeping their group_vars untouched.
We will then determine the right options to make sure the upgrade works,
but we expect the new variables to be used.
We will drop this in the near future, maybe 3.1 or 3.2.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 4413511b6619e22007b7988ab9929d618e0dcd01)
Signed-off-by: Sébastien Han <seb@redhat.com>
upgrade: fix upgrade jewel to luminous for nfs nodes
nfs nodes can't be upgraded from jewel to luminous because the ceph-nfs
role is skipped due to the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
the package is upgraded in the `ceph-nfs` role, so `ceph_release` is
still set to the old version when the condition is evaluated. This means
the `when` can't be satisfied.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 982326373b9474231015639eac8fc52a3b0878a3)
Signed-off-by: Sébastien Han <seb@redhat.com>
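The chicken-and-egg problem described in the commit above can be illustrated with a short Python sketch. It is not the playbook code: the function name `role_runs` is hypothetical, and the mapping is an assumption based on the upstream major versions (Jewel 10, Kraken 11, Luminous 12).

```python
# Assumed mapping of release names to major version numbers.
ceph_release_num = {"jewel": 10, "kraken": 11, "luminous": 12}


def role_runs(ceph_release):
    """Mirror the `when` condition guarding the role."""
    return ceph_release_num[ceph_release] >= ceph_release_num["luminous"]


# Before the upgrade the detected release is still jewel, so the role that
# would perform the package upgrade is skipped.
skipped_before_upgrade = not role_runs("jewel")
runs_after_upgrade = role_runs("luminous")
```

Because the package upgrade happens inside the guarded role, the detected release never changes and the guard never passes on its own.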
upgrade: fix upgrade jewel to luminous for mgr nodes
mgr nodes can't be upgraded from jewel to luminous because the ceph-mgr
role is skipped due to the condition `when:
"ceph_release_num[ceph_release] >= ceph_release_num.luminous"`. Indeed,
the ceph-mgr package is upgraded in the `ceph-mgr` role, so
`ceph_release` is still set to the old version when the condition is
evaluated. This means the `when` can't be satisfied.
In Jewel, we don't use the bootstrap-rbd keyring for rbd-mirror nodes,
which results in a socket path/name that differs depending on which ceph
release you are deploying.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit c2850b11be8a69780eaceeb5bd5f3616979dd29a)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Tue, 17 Oct 2017 13:54:17 +0000 (15:54 +0200)]
defaults: fix handlers for collocation
When doing collocation the condition "inventory_hostname in play_hosts"
is breaking the restart workflow.
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 90b75185d5fc473b377fafced95d7b35a80896aa)
Signed-off-by: Sébastien Han <seb@redhat.com>
ceph-defaults: fix handlers that are always triggered
Handlers are always triggered in ceph-ansible because the ceph.conf file
is generated with a random order for the different key/value pairs
in sections.
In Python, a dict is not sorted. It means in our case each time we try
to generate the ceph.conf file it will be rendered with a random order,
since the mechanism behind it consists of rendering a file from a Python
dict of keys/values. Therefore, as a quick workaround, forcing this dict
to be sorted before rendering the configuration file ensures that it is
always rendered the same way.
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ec042219e64a321fa67fce0384af76eeb238c645)
Signed-off-by: Sébastien Han <seb@redhat.com>
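The workaround in the commit above, sorting the keys before rendering, can be sketched in Python as follows. This is a minimal illustration and not the ceph-ansible rendering code: `render_section` and the option values are hypothetical.

```python
def render_section(name, options):
    """Render one INI-style section with its keys in sorted order."""
    lines = ["[{}]".format(name)]
    # Sorting the keys makes the output independent of dict ordering.
    for key in sorted(options):
        lines.append("{} = {}".format(key, options[key]))
    return "\n".join(lines) + "\n"


# Same options, built in a different order.
a = {"osd_pool_default_size": 1, "fsid": "abc", "mon_host": "10.0.0.1"}
b = {"mon_host": "10.0.0.1", "fsid": "abc", "osd_pool_default_size": 1}

rendered_a = render_section("global", a)
rendered_b = render_section("global", b)
```

Both calls produce byte-identical text, so a "file changed" check would not fire and handlers would not be notified needlessly.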
Sébastien Han [Tue, 17 Oct 2017 09:49:41 +0000 (11:49 +0200)]
rpm: remove ability to install ceph community version
Downstream version of ceph-ansible could still trigger install from
upstream repo and import keys.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1503019
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit c72ddee2d9e93e72722004b109733a68ffd6b8d1)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Mon, 16 Oct 2017 12:15:43 +0000 (14:15 +0200)]
upgrade: support for rbd mirror and nfs
- Add upgrade support for rbd mirror and nfs daemons.
- Only works with systemd (remove sysvinit and upstart occurrences)
- A bit of cleanup
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d920d4839d029cc2eed4cb0556782a20f867ddcc)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 11 Oct 2017 16:29:34 +0000 (18:29 +0200)]
config: proper render ceph.conf when doing collocation
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit aa70b07ae20407b20ec3b71320d2148788d2742e)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 11 Oct 2017 11:21:37 +0000 (13:21 +0200)]
osd: rollback bindmount of /run/udev
This is causing unknown issues when trying to start a dmcrypt container.
Basically the container is stuck at mount opening the LUKS device. It
is still unknown why this is causing trouble but we need to move
forward. Also, this doesn't seem to help in any way to fix the race
condition we've seen.
Here is the log for dmcrypt:
cryptsetup 1.7.4 processing "cryptsetup --debug --verbose --key-file
key luksClose fbf8887d-8694-46ca-b9ff-be79a668e2a9"
Running command close.
Locking memory.
Installing SIGINT/SIGTERM handler.
Unblocking interruption on signal.
Allocating crypt device context by device fbf8887d-8694-46ca-b9ff-be79a668e2a9.
Initialising device-mapper backend library.
dm version [ opencount flush ] [16384] (*1)
dm versions [ opencount flush ] [16384] (*1)
Detected dm-crypt version 1.14.1, dm-ioctl version 4.35.0.
Device-mapper backend running with UDEV support enabled.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Releasing device-mapper backend.
Trying to open and read device /dev/sdc1 with direct-io.
Allocating crypt device /dev/sdc1 context.
Trying to open and read device /dev/sdc1 with direct-io.
Initialising device-mapper backend library.
dm table fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
securedata ] [16384] (*1)
Trying to open and read device /dev/sdc1 with direct-io.
Crypto backend (gcrypt 1.5.3) initialized in cryptsetup library
version 1.7.4.
Detected kernel Linux 3.10.0-693.el7.x86_64 x86_64.
Reading LUKS header of size 1024 from device /dev/sdc1
Key length 32, device size 1943016847 sectors, header size 2050
sectors.
Deactivating volume fbf8887d-8694-46ca-b9ff-be79a668e2a9.
dm status fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush ]
[16384] (*1)
Udev cookie 0xd4d14e4 (semid 32769) created
Udev cookie 0xd4d14e4 (semid 32769) incremented to 1
Udev cookie 0xd4d14e4 (semid 32769) incremented to 2
Udev cookie 0xd4d14e4 (semid 32769) assigned to REMOVE task(2) with
flags (0x0)
dm remove fbf8887d-8694-46ca-b9ff-be79a668e2a9 [ opencount flush
retryremove ] [16384] (*1)
fbf8887d-8694-46ca-b9ff-be79a668e2a9: Stacking NODE_DEL [verify_udev]
Udev cookie 0xd4d14e4 (semid 32769) decremented to 1
Udev cookie 0xd4d14e4 (semid 32769) waiting for zero
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit d0a9e57bfcf68e41e25a1b3868ded447d09f8199)
Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han [Wed, 11 Oct 2017 10:52:12 +0000 (12:52 +0200)]
purge-iscsi: fix group name
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1500281
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 85e13a864c1317849d7bf34441fa1f7b33939556)
Signed-off-by: Sébastien Han <seb@redhat.com>
In addition to c4dcdaa20, this commit adds the missing condition on
install tasks for debian_rhcs deployment. Without it, these tasks are
played on any kind of deployment.
Jan Provaznik [Tue, 10 Oct 2017 10:43:23 +0000 (12:43 +0200)]
Ceph-nfs dynamic exports fixes
* DBus on host should include ganesha service file
* to allow ganesha container to respond on DBus it needs to run
in --privileged mode (ganesha folks contacted to look at this)
* ceph_nfs_include_exports_dir variable replaced with more general
ceph_nfs_dynamic_exports
Sébastien Han [Tue, 10 Oct 2017 07:57:39 +0000 (09:57 +0200)]
purge: fix journal purge
Using a condition when osd_scenario == 'non-collocated' was wrong since
these partitions can also be collocated on a single device. Removing the
check allows these partitions to be purged.
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1499871
Signed-off-by: Sébastien Han <seb@redhat.com>
Make the `ceph-mgr` role handle the installation of the `ceph-mgr`
package itself, because it is complicated to manage depending on which
release we are going to install (`jewel vs. luminous`).
Sébastien Han [Sun, 8 Oct 2017 15:29:32 +0000 (17:29 +0200)]
ci: re-add osd_pool_default_size to 1 with the override
If we don't do this the client will create pools with a replica count of
3 since osd_pool_default_size was gone from ceph-override.json. This was
making switch_to_containers fail.
Sébastien Han [Sun, 8 Oct 2017 13:54:36 +0000 (15:54 +0200)]
infra: add independent purge-iscsi-gateways.yml
The current inclusion of purge-iscsi-gateways.yml in purge-cluster.yml
is not working well and is blocking the CI too. So remove it from
purge-cluster.yml and re-add the original purge-iscsi-gateways.yml.
Boris Ranto [Fri, 6 Oct 2017 20:54:34 +0000 (22:54 +0200)]
purge-cluster: Do not use shell for rm
The shell wildcard expansion of non-existing paths fails on zsh, making
the whole script fail. We can use the file module with with_fileglob
instead to avoid the problem.
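The idea behind the commit above, expanding the glob in the tool rather than in the shell so that an empty match is harmless, can be sketched in Python. This is an analogy and not the playbook task: `remove_matching` and the temporary files are hypothetical.

```python
import glob
import os
import tempfile


def remove_matching(pattern):
    """Delete every file matching `pattern` and return the removed paths."""
    removed = []
    # glob.glob returns an empty list when nothing matches, so the loop
    # simply never runs instead of failing on a non-existing path.
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    return removed


# Create two scratch files, then run the cleanup twice.
workdir = tempfile.mkdtemp()
for name in ("a.log", "b.log"):
    open(os.path.join(workdir, name), "w").close()

first = remove_matching(os.path.join(workdir, "*.log"))
second = remove_matching(os.path.join(workdir, "*.log"))
```

The second call finds nothing to delete and returns an empty list without raising, which is the behaviour the shell-based rm lacked under zsh.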
Sébastien Han [Fri, 6 Oct 2017 14:49:46 +0000 (16:49 +0200)]
use get to check stdout_lines
During the initial play, the docker command does not exist, so the
command has no stdout_lines. Using get allows us to fix this by
declaring a default array if the command fails.
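The `get` fallback described above behaves like Python's `dict.get` with a default. The sketch below is illustrative only: `output_lines` and the two result dicts are hypothetical stand-ins for a registered command result.

```python
def output_lines(result):
    """Return the command's output lines, or an empty list if there are none."""
    # A failed command may not carry a "stdout_lines" key at all, so fall
    # back to an empty list instead of raising a KeyError.
    return result.get("stdout_lines", [])


# Hypothetical results: docker missing on the first play, present later.
failed = {"rc": 2, "msg": "No such file or directory"}
succeeded = {"rc": 0, "stdout_lines": ["ceph-osd-sda", "ceph-osd-sdb"]}

lines_when_failed = output_lines(failed)
lines_when_ok = output_lines(succeeded)
```

Later tasks can then loop over the result unconditionally, since an empty list means zero iterations.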
Use an intermediate variable to build the final `dedicated_devices` list
to avoid duplicate entries in that array. (We need a 1:1 relation between
`dedicated_devices` and `devices` since we are using a `with_together`
later.)