David Galloway [Fri, 27 Apr 2018 16:37:02 +0000 (12:37 -0400)]
testnode: Redo LVM removal
For whatever reason, `dmsetup remove_all` fails on Bionic. As long as
there are leftover LVs or VGs, a PV must be linked to them. We can just
force-remove the physical volume, which wipes out the rest of the LVM
data.
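A minimal sketch of the idea above, not the playbook's actual task. It only builds the `pvremove` command lines; passing `--force` twice is what lets LVM remove a physical volume that still belongs to a volume group. The function name and the device paths are hypothetical.

```python
def build_pvremove_commands(devices):
    """Return one forced `pvremove` command (as an argv list) per device."""
    # --force given twice allows removing a PV that is still part of a VG;
    # --yes answers the confirmation prompt non-interactively.
    return [["pvremove", "--force", "--force", "--yes", dev] for dev in devices]


if __name__ == "__main__":
    # Hypothetical device list; the commands are printed, not executed.
    for argv in build_pvremove_commands(["/dev/sdb", "/dev/sdc"]):
        print(" ".join(argv))
```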
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Fri, 27 Apr 2018 15:53:57 +0000 (11:53 -0400)]
cobbler: Set packages to install for Bionic
- udev-discover is no more
- net-tools provides ifconfig which is used in rc.local
- ifupdown provides `ifdown` and `ifup` which are in rc.local
- python is required for ansible
- ntp isn't installed by default anymore apparently
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Tue, 24 Apr 2018 19:28:42 +0000 (15:28 -0400)]
testnode: Exclude dm devices from list of physical volumes
This was actually happening because when the playbook first runs, the
setup module is run and sees the device mapper devices. We zap them
later in the playbook but ansible doesn't know that. We could just
re-run the setup module but this method will instead guarantee we don't
use dm-* devices.
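A minimal sketch of the filter described above, assuming the candidate list is made of kernel device names such as `sda` or `dm-0` (as gathered facts typically report them). The function name is hypothetical and this is not the playbook's own expression.

```python
def non_dm_devices(device_names):
    """Drop device-mapper entries (dm-0, dm-1, ...) and return the rest, sorted."""
    return sorted(name for name in device_names if not name.startswith("dm-"))


if __name__ == "__main__":
    # Stale facts may still list dm-* devices that were zapped later on.
    print(non_dm_devices(["sdb", "dm-0", "sda", "dm-1"]))
```

Filtering by name means the result stays correct even when the facts were gathered before the device mapper devices were removed.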
Fixes: https://tracker.ceph.com/issues/23845
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Tue, 27 Mar 2018 15:20:02 +0000 (11:20 -0400)]
cobbler: Just output to ttyS1
This covers all baremetal types except mira. Cheetah/Cobbler templating
was breaking with commit
https://github.com/ceph/ceph-cm-ansible/pull/389/commits/de871c037f4ae227bed00933eee24f849a4551b0.
The problem is that if there is more than one kernel option type (like
console, ksdevice, etc.), $kernel_options gets expanded into a JSON
dictionary and breaks the templating.
David Galloway [Fri, 9 Mar 2018 21:51:25 +0000 (16:51 -0500)]
cobbler: Have rc.local output go to console
Usually if something goes wrong during the rc.local run, the machine
won't be reachable to debug over the network. Additionally, since we
reimage every machine before each job now, it's impossible to debug why
rc.local failed for a particular job. This outputs rc.local to the
tty specified in kernel_options so we can see the output in
`$hostname_reimage` run logs.
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Fri, 9 Mar 2018 19:58:08 +0000 (14:58 -0500)]
cobbler: Write exact /etc/default/grub
This fixes console output on Xenial and later. Prior to this, the
Plymouth boot screen would get loaded and "[37mUbuntu 16.04[-1;-1f[33m.
[37m. [37m. [37m." would get repeated to the console until the login
prompt shows up.
Writing our own file instead of finding and replacing variables makes
sure the settings are exactly what we want.
This snippet is only used on Debian-based distros. The default Cobbler
snippet is used on RPM-based distros.
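The difference between writing the whole file and patching individual variables can be sketched as follows. This is an illustration only: the function name is hypothetical and the setting values in the demo are assumptions, not the contents of the actual snippet.

```python
def render_grub_defaults(settings):
    """Render a complete /etc/default/grub body from a mapping of settings."""
    # Every line of the file comes from `settings`, so nothing left over
    # from the distro's default file can survive.
    lines = ['{}="{}"'.format(key, value) for key, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed example values; the real snippet may set different ones.
    print(render_grub_defaults({
        "GRUB_TIMEOUT": "5",
        "GRUB_TERMINAL": "console",
        "GRUB_CMDLINE_LINUX": "console=ttyS1,115200",
    }), end="")
```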
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Tue, 20 Mar 2018 15:22:53 +0000 (11:22 -0400)]
cobbler: Change method used to ping Cobbler host in rc.local
I've observed a *very* occasional race condition where dhclient
completes but the host can't ping Cobbler. Instead of timing out
waiting for one ping packet to return, we'll ping up to $attempts
times and then give up.
I'll paste an example of the race condition observed in the PR notes.
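The retry behaviour described above can be sketched like this. It is not the rc.local code itself: `wait_for_host`, `ping_once`, `delay` and `sleep` are hypothetical names, and the single ping is passed in as a callable so the loop can be exercised without a network.

```python
import time


def wait_for_host(ping_once, attempts=5, delay=1.0, sleep=time.sleep):
    """Call ping_once() up to `attempts` times; return True on first success."""
    for attempt in range(1, attempts + 1):
        if ping_once():
            return True
        if attempt < attempts:
            sleep(delay)  # brief pause before the next try
    return False  # every attempt failed, so give up
```

In rc.local terms, `ping_once` would wrap a single `ping -c 1` against the Cobbler host and report whether it exited successfully.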
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Mon, 26 Feb 2018 18:56:58 +0000 (13:56 -0500)]
pcp: Disable role for now
With the addition of RHEL to Sepia, teuthology will be running
cephlab.yml on unregistered RHEL testnodes. Since the PCP playbook gets run
before the testnodes playbook, RHEL systems in Sepia won't be registered
to our Satellite yet and PCP installation fails.
We're not currently using PCP so we can disable the role and save some
time and headache.
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Thu, 1 Feb 2018 20:42:36 +0000 (15:42 -0500)]
nameserver: Let records tasks coexist with DDNS
It takes about 3 minutes for ansible to compile all the zone files.
That was causing nsupdate/DDNS to overwrite any new records we wanted to
add or change before named could be reloaded.
This PR:
- Writes zone files to a temporary location
- Dumps pending DDNS changes into zone files
- Freezes DDNS zone files from updates
- Moves temporary zone files into place all at once
- Unfreezes DDNS zone files
This results in about a 3-second window during which DDNS updates are
refused, which isn't great, but we can at least update records while OVH
jobs are running now.
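The ordering of the steps listed above can be sketched as follows, assuming the zone files have already been written to a temporary location. The function and parameter names are hypothetical, the paths and zone name in the demo are made up, and the command runner and file mover are injected so the sequence can be checked without touching a real nameserver. `rndc sync`, `rndc freeze` and `rndc thaw` are the stock BIND control commands; whether the role uses exactly these invocations is an assumption.

```python
import os


def publish_zones(staged, ddns_zones, run, move=os.replace):
    """Swap staged zone files into place while dynamic updates are frozen.

    staged     -- dict mapping temporary file path -> final zone file path
    ddns_zones -- names of the zones that accept dynamic updates
    run        -- callable taking an argv list, e.g. subprocess.check_call
    move       -- callable(src, dst) used to move each file into place
    """
    for zone in ddns_zones:
        run(["rndc", "sync", "-clean", zone])  # dump pending DDNS changes
    for zone in ddns_zones:
        run(["rndc", "freeze", zone])          # refuse dynamic updates
    try:
        for src, dst in staged.items():
            move(src, dst)                     # the short refusal window
    finally:
        for zone in ddns_zones:
            run(["rndc", "thaw", zone])        # accept updates again


if __name__ == "__main__":
    # Dry run: print what would happen instead of doing it.
    publish_zones(
        {"/tmp/zones/example.zone": "/var/named/example.zone"},
        ["ddns.example"],
        run=lambda argv: print(" ".join(argv)),
        move=lambda src, dst: print("mv", src, dst),
    )
```

Doing all the slow work before the freeze is what keeps the refusal window down to the time needed to move the files.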
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Fri, 19 Jan 2018 20:31:03 +0000 (15:31 -0500)]
cobbler: Use MAC address specified in ansible inventory instead of eth0
I concede. Name it whatever you want, RHEL.
This will allow the OS to use "predictable naming" during anaconda
and after firstboot, preventing NIC names from switching as seen in
http://tracker.ceph.com/issues/22732 and
http://tracker.ceph.com/issues/22643
Signed-off-by: David Galloway <dgallowa@redhat.com>
David Galloway [Wed, 10 Jan 2018 21:17:55 +0000 (16:17 -0500)]
cobbler: Remove DHCP config for NICs if ifup fails in rc.local
An issue was discovered where rc.local bails if a testnode has multiple
NICs cabled but not every NIC has a DHCP reservation. For example,
some of the magnas have a second NIC that is cabled to a tagged port on
the switch so it can pass traffic via multiple VLANs.
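A minimal sketch of the fallback described in the subject line, with hypothetical names throughout. The real logic lives in rc.local; here the "bring the NIC up" and "remove its DHCP config" actions are passed in as callables so the control flow is visible on its own.

```python
def configure_nics(nics, bring_up, remove_config):
    """Try each NIC; keep those that come up and drop the config of the rest."""
    working = []
    for nic in nics:
        if bring_up(nic):
            working.append(nic)
        else:
            # No lease for this NIC: remove its DHCP config and carry on
            # instead of bailing out of the whole script.
            remove_config(nic)
    return working
```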
Fixes: http://tracker.ceph.com/issues/22651
Signed-off-by: David Galloway <dgallowa@redhat.com>