Loic Dachary [Thu, 7 Jan 2016 14:06:32 +0000 (15:06 +0100)]
Merge pull request #7001 from dachary/wip-14145-infernalis
infernalis: ceph-disk: use blkid instead of sgdisk -i
On CentOS 7.1 and other operating systems with a version of udev greater than or equal to 214,
running ceph-disk prepare triggered unexpected removal and addition of partitions on
the disk being prepared. That created problems ranging from the OSD not being activated
to failures because /dev/sdb1 does not exist although it should.
Loic Dachary [Wed, 6 Jan 2016 22:36:57 +0000 (23:36 +0100)]
tests: ceph-disk cryptsetup close must try harder
Similar to how it's done in dmcrypt_unmap in master ( 132e56615805cba0395898cf165b32b88600d633 ), the infernalis test helpers
that were deprecated by the addition of the deactivate / destroy
ceph-disk subcommands must try cryptsetup close a few times in some
contexts.
Loic Dachary [Fri, 18 Dec 2015 23:53:03 +0000 (00:53 +0100)]
ceph-disk: protect deactivate with activate lock
When ceph-disk prepares the disk, it triggers udev events and each of
them runs ceph-disk activate. If systemctl stop ceph-osd@2 happens while
there are still ceph-disk activate runs in flight, the systemctl stop may be
cancelled by the systemctl enable issued by one of the pending ceph-disk
activate.
This only matters in a test environment where disks are destroyed
shortly after they are activated.
src/ceph-disk: ceph-disk deactivate does not exist in ceph-disk
on infernalis. But the same feature is implemented in
ceph-test-disk.py for test purposes and has the same
problem. The patch is adapted to ceph-test-disk.py.
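A minimal sketch of the locking idea described above, not the actual patch: activate and deactivate both take the same exclusive advisory lock, so a stop cannot interleave with a pending activate. The helper names (exclusive_lock, deactivate, stop_osd) and the lock file path are hypothetical and chosen only for illustration.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on `path` while the block runs."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until any other process holding the lock releases it.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            yield fd
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def deactivate(lock_path, stop_osd):
    """Run `stop_osd` (a hypothetical callable) under the activate lock.

    Because activate is assumed to take the same lock, the stop only
    happens once no activate is in flight in another process.
    """
    with exclusive_lock(lock_path):
        return stop_osd()
```

The lock is advisory and per process, so it only serializes callers that all agree to take it.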
Loic Dachary [Wed, 6 Jan 2016 10:15:19 +0000 (11:15 +0100)]
ceph-disk: retry cryptsetup remove
Retry a cryptsetup remove ten times. After the ceph-osd terminates, the
device is released asynchronously and an attempt to cryptsetup remove
may fail because the device is still considered busy. Although cryptsetup
itself makes a few attempts before giving up, the number of attempts / the
duration of the attempts cannot be controlled with a cryptsetup option. The
workaround is to extend this window by running cryptsetup remove a few times.
If cryptsetup remove fails for a reason that is unrelated to timeout,
the error will be repeated a few times. There is no undesirable side
effect. It will not hide a problem.
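The retry described above could look roughly like the following sketch. It is an illustration under assumptions, not the code from the patch: the function name retry_command, the delay between attempts, and the injectable run / sleep parameters are all hypothetical.

```python
import subprocess
import time


def retry_command(args, attempts=10, delay=1.0,
                  run=subprocess.check_call, sleep=time.sleep):
    """Run `args`, retrying on a non-zero exit status.

    The last error is re-raised once all attempts are used up, so a
    failure unrelated to a busy device is still reported, only later.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return run(args)
        except subprocess.CalledProcessError as error:
            last_error = error
            if attempt + 1 < attempts:
                sleep(delay)  # give the kernel time to release the device
    raise last_error
```

A caller would use it as retry_command(["cryptsetup", "remove", name]), where name is the device mapper name being unmapped.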
Loic Dachary [Fri, 18 Dec 2015 16:03:21 +0000 (17:03 +0100)]
ceph-disk: use blkid instead of sgdisk -i
sgdisk -i 1 /dev/vdb opens /dev/vdb in write mode which indirectly
triggers a BLKRRPART ioctl from udev (starting version 214 and up) when
the device is closed (see below for the udev release note). The
implementation of this ioctl by the kernel (even old kernels) removes
all partitions and adds them again (similar to what partprobe does
explicitly).
The side effects of partitions disappearing while ceph-disk is running
are devastating.
sgdisk is replaced by blkid which only opens the device in read mode and
will not trigger this unexpected behavior.
The problem does not show on Ubuntu 14.04 because it is running udev <
214 but shows on CentOS 7 which is running udev >= 214.
git clone git://anonscm.debian.org/pkg-systemd/systemd.git
systemd/NEWS:
CHANGES WITH 214:
* As an experimental feature, udev now tries to lock the
disk device node (flock(LOCK_SH|LOCK_NB)) while it
executes events for the disk or any of its partitions.
Applications like partitioning programs can lock the
disk device node (flock(LOCK_EX)) and claim temporary
device ownership that way; udev will entirely skip all event
handling for this disk and its partitions. If the disk
was opened for writing, the close will trigger a partition
table rescan in udev's "watch" facility, and if needed
synthesize "change" events for the disk and all its partitions.
This is now unconditionally enabled, and if it turns out to
cause major problems, we might turn it on only for specific
devices, or might need to disable it entirely. Device Mapper
devices are excluded from this logic.
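As a rough illustration of the read-only approach, the sketch below queries a partition with blkid and parses its KEY=value output. It assumes the udev output format (blkid -o udev -p), and the function names are hypothetical; the exact invocation used by the patch may differ.

```python
import subprocess


def parse_blkid_udev(output):
    """Turn KEY=value lines, as printed by `blkid -o udev`, into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not KEY=value pairs
            info[key.strip()] = value.strip()
    return info


def partition_info(dev, run=subprocess.check_output):
    """Probe `dev` with blkid, which only opens the device for reading.

    `run` is injectable so the parsing can be exercised without a
    real block device.
    """
    out = run(["blkid", "-o", "udev", "-p", dev])
    if isinstance(out, bytes):
        out = out.decode("utf-8", "replace")
    return parse_blkid_udev(out)
```

With this shape, the partition UUID and type would be looked up as partition_info(dev).get("ID_PART_ENTRY_UUID") and partition_info(dev).get("ID_PART_ENTRY_TYPE"), without the write-mode open that triggers the rescan.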
Loic Dachary [Wed, 16 Dec 2015 14:57:03 +0000 (15:57 +0100)]
ceph-disk: dereference symlinks in destroy and zap
The behavior of partprobe or sgdisk may be subtly different if given a
symbolic link to a device instead of an actual device. The debug output
is also more confusing when the symlink shows instead of the device it
points to.
Always dereference the symlink before running destroy and zap.
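Dereferencing can be sketched with os.path.realpath, as below. The function name and the returned tuple are hypothetical; this only shows the idea of resolving the path once and using the resolved name everywhere afterwards, including in debug output.

```python
import os


def dereference(dev):
    """Return (resolved, was_symlink) for a device path.

    os.path.realpath follows every symlink in the path, so later
    commands and log messages all see the actual device node.
    """
    resolved = os.path.realpath(dev)
    return resolved, os.path.islink(dev)
```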
The default udevadm settle timeout of 120 seconds may be exceeded when
the disk is very slow, which can happen in cloud environments. Increase
it to 600 seconds instead.
The partprobe command may fail for the same reason but it does not have
a timeout parameter. Instead, try a few times before failing.
The udevadm settle calls guarding partprobe are not strictly necessary
because partprobe already does the same. However, partprobe does not
provide a way to control the timeout. Running one udevadm settle after
another is going to be a noop most of the time and will not add any delay.
It matters when the udevadm settle run by partprobe fails with a timeout,
because partprobe silently ignores the failure.
Conflicts:
qa/workunits/ceph-disk/ceph-disk-test.py:
trivial, because destroy/deactivate are not implemented
in infernalis. The existing destroy_osd function
has to be modified so the id returned by sh() does
not have a trailing newline.
Loic Dachary [Wed, 21 Oct 2015 22:21:49 +0000 (00:21 +0200)]
tests: ceph-disk workunit uses configobj
Instead of using augtool to modify the configuration file, use
configobj. It is also used by the install teuthology task. The .ini
lens (puppet lens really) is unable to read ini files created by
configobj.
Herve Rousseau [Fri, 6 Nov 2015 08:52:28 +0000 (09:52 +0100)]
rgw: fix reload on non Debian systems.
When using reload on non-Debian systems, /bin/sh's kill is used to send the HUP signal to the radosgw process.
This version of kill doesn't accept -SIGHUP as a valid signal, but -HUP does work.
Jason Dillaman [Tue, 7 Jul 2015 16:11:13 +0000 (12:11 -0400)]
WorkQueue: new PointerWQ base class for ContextWQ
The existing work queues do not properly function if added to a running
thread pool. librbd uses a singleton thread pool which requires
dynamically adding/removing work queues as images are opened and closed.
Jason Dillaman [Mon, 9 Nov 2015 16:22:24 +0000 (11:22 -0500)]
librbd: fixed deadlock while attempting to flush AIO requests
In-flight AIO requests might force a flush if a snapshot was created
out-of-band. The flush completion was previously invoked asynchronously,
potentially via the same thread worker handling the AIO request. This
resulted in the flush operation deadlocking since it can't complete.
Fixes: #13726
Backport: infernalis, hammer
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit bfeb90e5fe24347648c72345881fd3d932243c98)
Sage Weil [Fri, 23 Oct 2015 17:27:39 +0000 (13:27 -0400)]
osd: fix OSDService vs Objecter init order
This reverts c7d96a5ed1d2cb844622af29b13705b8f7be6be7, but still keeps
the Objecter init *after* we have authenticated. This way we don't
crash when we get mon messages like MOSDPGCreate, and we also don't
request maps we aren't prepared to handle.
Boris Ranto [Fri, 23 Oct 2015 14:39:16 +0000 (16:39 +0200)]
ceph.spec.in: We no longer need redhat-lsb-core
Drop the redhat-lsb-core dependency as it is no longer necessary on
fedora/rhel.
The other two init scripts do not use redhat-lsb-core either. The
init-ceph.in conditionally requires /lib/lsb/init-functions and does not
use any of the functions defined in that file (at least not directly).
The init-radosgw file includes /etc/rc.d/init.d/functions on non-debian
platforms instead of /lib/lsb/init-functions file so it does not require
redhat-lsb-core either.
Boris Ranto [Fri, 23 Oct 2015 13:31:27 +0000 (15:31 +0200)]
init-rbdmap: Rewrite to use logger + clean-up
This patch rewrites the init-rbdmap init script so that it uses logger
instead of the log_* functions. The patch also fixes various smaller
bugs like:
* MAP_RV was undefined if mapping already existed
* UMNT_RV and UMAP_RV were almost always empty (if they succeeded) ->
removed them
* use of continue instead of RET_OP in various places (RET_OP was not being
checked after the switch to logger messages)
* removed use of DESC (used only twice and only one occurrence actually
made sense)
Jason Dillaman [Wed, 21 Oct 2015 17:12:48 +0000 (13:12 -0400)]
librbd: potential assertion failure during cache read
It's possible for a cache read from a clone to trigger a writeback if a
previous read op determined the object doesn't exist in the clone,
followed by a cached write to the non-existent clone object, followed
by another read request to the same object. This causes the cache to
flush the pending writeback ops while not holding the owner lock.
Fixes: #13559
Backport: hammer
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Josh Durgin [Thu, 15 Oct 2015 01:28:33 +0000 (18:28 -0700)]
librbd: fix rebuild_object_map() when no object map exists
Enabling the object map feature and then attempting to rebuild it
results in an assert failure, since the number of objects was
accidentally passed to ObjectMap::aio_resize() instead of the size of
the image.
Jason Dillaman [Thu, 15 Oct 2015 04:15:54 +0000 (00:15 -0400)]
ceph_context: remove unsafe cast for singletons
It was previously assumed that a CephContext singleton would
inherit from CephContext::AssociatedSingletonObject, but it was
not enforced. This could result in undefined behavior when the
singleton is destroyed due to the implied virtual destructor.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 29 Sep 2015 18:13:46 +0000 (14:13 -0400)]
lttng: move tracepoint probes to dynamic libraries
LTTng-UST initializes itself at program load, which means it is
currently always enabled. This can lead to issues with SELinux
and AppArmor which might restrict access to the necessary device
files.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sage Weil [Tue, 13 Oct 2015 12:37:40 +0000 (08:37 -0400)]
debian/control: python-setuptools is a build dependency
cd ./ceph-detect-init ; python setup.py build
Traceback (most recent call last):
  File "setup.py", line 23, in <module>
    from setuptools import setup
ImportError: No module named setuptools