Loic Dachary [Thu, 5 Mar 2015 10:38:18 +0000 (11:38 +0100)]
install-deps.sh: strip | in the list of packages
Alternatives were introduced lately and the | needs to be stripped from
the list of packages to install otherwise apt-get will try to install
all packages.
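As a rough illustration of the stripping step, here is a Python sketch. It is not the code in install-deps.sh, which is a shell script; the helper name `strip_alternatives` and the sample dependency string in the usage note are made up for this example.

```python
import re

def strip_alternatives(depends):
    """Turn a Debian-style dependency string into a flat list of package names.

    Hypothetical helper for illustration only.
    """
    # Drop version constraints such as "(>= 1.0)".
    no_versions = re.sub(r"\([^)]*\)", "", depends)
    # Treat "," and the alternative separator "|" as plain separators, so
    # "|" never reaches the package manager as if it were a package name.
    return [name for name in re.split(r"[,|\s]+", no_versions) if name]
```

Calling `strip_alternatives("libfoo-dev (>= 1.0), libbar-dev | libbaz-dev")` yields the three package names with no `|` token left in the list.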
Ken Dreyer [Wed, 4 Mar 2015 22:01:34 +0000 (15:01 -0700)]
ceph.spec.in: loosen ceph-test's dependencies
In Debian, the ceph-test package can be installed with any version of
ceph-common.
Prior to this commit, in RHEL, we were much more strict about which
version of the dependencies we required. We depended directly on
librados2/librbd1/libcephfs1 instead of ceph-common, and we also required
the specific versions of these libraries to match the version of
ceph-test.
For testing Ceph, it is nice to have the ability to upgrade the
librados2/librbd1/libcephfs1 libraries on a host without also having to
upgrade the ceph-test package.
Remove the version number requirements, and change the dependencies from
librados2/librbd1/libcephfs1 to simply "ceph-common". That will make
/etc/ceph/ and /var/log/ceph present for the tests.
Jason Dillaman [Tue, 3 Mar 2015 02:14:21 +0000 (21:14 -0500)]
librbd: delay completion of AioRequest::read_from_parent
If the object map is enabled, it's possible for a read request to
instantly complete due to the skipped librados operations. Now
AioRequest will block the completion of read_from_parent requests
to prevent the possibility of the parent image being closed while
the read_from_parent method invocation is in-progress.
Fixes: #10968
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 3 Mar 2015 02:07:01 +0000 (21:07 -0500)]
librbd: allow AioCompletions to be blocked
Blocked AioCompletions will not fire their callback until unblocked.
This is an expansion / replacement of the previous 'building' flag
used to block completions while additional requests were added to the
completion.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
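The blocking behaviour described above can be sketched in Python as follows. This is a single-threaded toy model, not librbd's AioCompletion; the class and method names are assumptions chosen for the example, and a real implementation would need locking.

```python
class Completion:
    """Toy completion whose callback is deferred while it is blocked."""

    def __init__(self, callback):
        self._callback = callback
        self._blockers = 0    # number of outstanding block() calls
        self._done = False    # the underlying work has finished
        self._fired = False   # the callback has already run

    def block(self):
        self._blockers += 1

    def unblock(self):
        self._blockers -= 1
        self._maybe_fire()

    def complete(self):
        self._done = True
        self._maybe_fire()

    def _maybe_fire(self):
        # Fire exactly once, and only when finished and no longer blocked.
        if self._done and self._blockers == 0 and not self._fired:
            self._fired = True
            self._callback()
```

A caller that blocks the completion, lets the work finish, and only then unblocks will see the callback run at the unblock, not at the completion.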
Jason Dillaman [Mon, 2 Mar 2015 13:13:55 +0000 (08:13 -0500)]
librbd: handle possible aio_read return error code
AioRead and CopyupRequest were not properly handling possible
error codes from aio_read. They now correctly free the completion
and invoke the callback context.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Loic Dachary [Wed, 25 Feb 2015 14:32:50 +0000 (15:32 +0100)]
mon: ignore crushtool validation if too long
The crushtool is aborted if it takes more than mon lease seconds. Since
the monitor blocks while running it, this is mandatory; otherwise the
monitor will be considered down and new elections triggered.
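The general pattern of running an external command under a deadline can be sketched in Python as below. This is not the monitor's C++ code; the function name `run_with_deadline` and the stand-in slow command are assumptions for illustration, and the caller is assumed to treat a `None` result as "validation skipped".

```python
import subprocess
import sys

def run_with_deadline(cmd, timeout_seconds):
    """Run cmd; return its exit code, or None if it exceeded the deadline."""
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return None

# Stand-in for a command that takes too long (hypothetical, for demonstration).
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_deadline(slow, 0.5))
```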
Loic Dachary [Mon, 2 Mar 2015 10:14:18 +0000 (11:14 +0100)]
mon: do not hardwire crushtool command line
Make crushtool a configuration value that defaults to crushtool and
allow it to be injected. It helps with testing: the command can be
replaced with another that misbehaves in various ways.
Jason Dillaman [Mon, 2 Mar 2015 22:39:20 +0000 (17:39 -0500)]
rbd: fixed formatted output of rbd image features
All feature flags were being displayed when using JSON/XML
formatted output. Now use the same formatting routine for
plain/JSON/XML output for features and flags.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Fri, 27 Feb 2015 14:46:55 +0000 (09:46 -0500)]
librbd: flush pending AIO after acquiring lock
There was a potential race condition between a delayed AIO
operation waiting on acquiring a lock and a snap_create
flushing all pending IO. Since snap_create owned md_lock, the
delayed AIO would not be allowed to complete -- deadlocking the
flush.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Fri, 27 Feb 2015 04:39:10 +0000 (23:39 -0500)]
librbd: hold snap_lock between clipping IO and registering AIO
In the case where concurrent IO is occurring when a trim resize
operation is initiated, hold the snap_lock between clipping the
IO operation and registering the pending op. That allows the
resize state machine to properly flush all operations issued
before the clip region was updated.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Loic Dachary [Fri, 27 Feb 2015 15:15:57 +0000 (16:15 +0100)]
mon: do not pollute directory with csv files from crushtool
The --output-csv option to crushtool will create files in the current
directory of the monitor. The only reason for using it is because
crushtool requires at least one option for display. Relax this
constraint in crushtool and remove the option from the call made by the
monitor to validate a new crushmap.
include/util.h: initialize ceph_data_stats to zero
We decode this struct on the monitor. Although at the moment there are no
reports of any weird behavior by not initializing it, let's avoid it
completely by setting member values to zero -- just in case and because
it's a good policy.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
mon: mon_types.h: initialize LevelDBStoreStats and avoid craziness
On a mixed-version cluster, say firefly and dumpling, the first round of
data health checks could end up with crazy values being reported for
data usage/availability for dumpling monitors.
This would be caused by dumpling not supporting reporting of store
stats, and by not assuming values as zero on decoding we would end up
decoding trash.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
Ken Dreyer [Fri, 27 Feb 2015 17:32:37 +0000 (10:32 -0700)]
Add GPLv2 text file
Most of the ceph tree is LGPLv2.1, but there are some files that are
under the full GPLv2.
Add a copy of the GNU General Public License (version 2) to the
distribution. This file was copied verbatim from
https://www.gnu.org/licenses/gpl-2.0.txt
Jason Dillaman [Thu, 26 Feb 2015 21:58:07 +0000 (16:58 -0500)]
librbd: C_SaferCond memory leak
Unlike the other Context derived classes, C_SaferCond is not
a suicidal object which deletes itself. Swap heap allocations
of C_SaferCond to stack-based allocations as a result.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Greg Farnum [Thu, 26 Feb 2015 23:20:11 +0000 (15:20 -0800)]
ceph-fuse: test dentry invalidation options and fail out if we fail
We identify the Linux kernel version and based on that either expect to
be able to invalidate dentries effectively, or expect to be able to remount
the ceph-fuse mountpoint. Test it using the Client functions and callbacks by
spinning off a thread to invoke the test that is separate from the main
FUSE loop.
Most unfortunately, there doesn't seem to be a good interface to tell
FUSE to shut down if we need to do that. See
http://fuse.996288.n3.nabble.com/libfuse-exiting-fuse-session-loop-td10686.html
I tried changing our signal invocation or attempting a simple action on
the mount point but those were ineffectual at terminating the remaining
processes; fusermount actually gets rid of them all.
Greg Farnum [Thu, 26 Feb 2015 23:12:47 +0000 (15:12 -0800)]
Client: support using dentry invalidation callbacks on older kernels
This brings back a few small code chunks that were removed in 0827bb79ea5127e6763f6e904dfa1a3266046ffb. We check the kernel version,
and if it is less than 3.18 we use these dentry invalidation callbacks
instead of the remount callback. This should resolve a number of
issues with racing against remount, including #10916, and lets
unprivileged users on older kernels run even if they can't apply
options on mount (#10542).
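The version check can be illustrated with a small Python sketch. The actual client is C++; the function names here are hypothetical. Comparing (major, minor) tuples avoids the pitfall of comparing release strings, where "3.2" would sort after "3.18".

```python
import re

def kernel_version(release):
    """Extract (major, minor) from a release string such as '3.13.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))

def use_dentry_invalidation(release):
    # Kernels older than 3.18 use the dentry invalidation callbacks;
    # 3.18 and newer use the remount callback instead.
    return kernel_version(release) < (3, 18)
```

On a Linux host, the release string could be taken from `platform.release()`.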
Jason Dillaman [Thu, 26 Feb 2015 17:00:41 +0000 (12:00 -0500)]
qa/workunits/rbd/copy.sh: explicitly choose the image format
The rbd CLI now utilizes the rbd_default_format configuration
setting, therefore the copy test now needs to tell rbd which image
format it is expecting to create.
Fixes: #10961
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Yan, Zheng [Wed, 25 Feb 2015 07:27:59 +0000 (15:27 +0800)]
client: re-send requests before composing the cap reconnect message
After commit 419800fe (client: re-send request when MDS enters reconnecting
stage), cephfs client can send both unsafe requests and normal requests when
MDS is in reconnecting stage. Normal requests can have embedded cap releases,
and the client code encodes these embedded cap releases after composing the cap
reconnect message. This causes the client to silently drop some caps. The fix
is to re-send requests (which add embedded cap releases) before composing the
cap reconnect message.
Josh Durgin [Tue, 24 Feb 2015 22:31:27 +0000 (14:31 -0800)]
test_librbd: close ioctx after imagectx
There's no need to explicitly close the ioctx. Doing so may cause
problems when the Images using it are destroyed afterwards. Just let
normal cleanup at the end of the block take care of it in the correct
order.
Josh Durgin [Tue, 24 Feb 2015 04:28:38 +0000 (20:28 -0800)]
librbd: make ImageCtx->object_map always present
This simplifies locking by obviating the NULL checks. We no longer
need md_lock to protect these accesses. We can use object_map_lock
instead, to make sure no one reads an object map while it's being
updated.
Keep track of whether the object map is enabled for a given snapshot
internally. In each public method, check this state, and automatically
set it correctly when refreshing the object map. During snapshot
removal, unconditionally try to remove the object map object, to
protect against bugs leaking objects, and to be consistent with image
removal.
Jason Dillaman [Wed, 25 Feb 2015 17:00:26 +0000 (12:00 -0500)]
librbd: restart async requests if lock owner doesn't report progress
Detect the case of a crashed lock owner by waiting for up to 30 seconds
for an async request progress message from the leader. If a progress
message isn't received, restart the request (and possibly take ownership
of the lock).
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
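The wait-then-restart idea can be sketched in Python with a timed wait on an event. This is only an illustration, not the ImageWatcher code; the class name `ProgressWatch` and its methods are assumptions, and the 30 second default mirrors the figure in the message above.

```python
import threading

class ProgressWatch:
    """Waits for a progress notification, with a deadline."""

    def __init__(self, timeout_seconds=30.0):
        self._timeout = timeout_seconds
        self._progress = threading.Event()

    def notify_progress(self):
        # Called when a progress message arrives from the lock owner.
        self._progress.set()

    def wait_for_progress(self):
        """Return True if progress arrived in time, False if the caller
        should restart the request."""
        arrived = self._progress.wait(self._timeout)
        self._progress.clear()
        return arrived
```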
Jason Dillaman [Wed, 25 Feb 2015 04:35:31 +0000 (23:35 -0500)]
librbd: replace Finisher/SafeTimer use with facade
Replace the two Context threading classes used within
ImageWatcher with a facade to orchestrate the scheduling
and canceling of Context task callbacks.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 24 Feb 2015 19:33:44 +0000 (14:33 -0500)]
librbd: cancel in-progress maint operations before releasing lock
Ensure that all in-flight maintenance operations (resize, flatten) are
not running when the exclusive lock is released. The lock will be
released when transitioning to a snapshot, closing the image, or
cooperatively when another client requests the lock.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 24 Feb 2015 17:53:45 +0000 (12:53 -0500)]
librbd: flush context potentially completing too early
If the async operation associated with a flush request completes,
only complete the flush contexts if no previous operations are
still in flight. Otherwise, move the flush contexts to an older
in-flight async operation.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
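The hand-off of flush contexts to an older operation can be modelled with the Python sketch below. It is a single-threaded toy, not the librbd implementation; `AsyncOpTracker` and its method names are hypothetical.

```python
class AsyncOpTracker:
    """Tracks in-flight operations in start order, oldest first."""

    def __init__(self):
        self._in_flight = []        # op ids, oldest first
        self._flush_callbacks = {}  # op id -> callbacks waiting on that op

    def start(self, op_id):
        self._in_flight.append(op_id)
        self._flush_callbacks[op_id] = []

    def flush(self, callback):
        """Run callback once every operation currently in flight has finished."""
        if not self._in_flight:
            callback()
        else:
            # Attach to the newest operation in flight.
            self._flush_callbacks[self._in_flight[-1]].append(callback)

    def finish(self, op_id):
        index = self._in_flight.index(op_id)
        callbacks = self._flush_callbacks.pop(op_id)
        del self._in_flight[index]
        if index == 0:
            # No older operation is still running: the flush is really done.
            for callback in callbacks:
                callback()
        else:
            # Hand the callbacks to the nearest older operation still in flight.
            older = self._in_flight[index - 1]
            self._flush_callbacks[older].extend(callbacks)
```

If operations 1 and 2 are started, a flush is requested, and operation 2 finishes first, the flush callback moves to operation 1 and only runs once operation 1 finishes.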
Josh Durgin [Tue, 24 Feb 2015 03:50:55 +0000 (19:50 -0800)]
librbd: take ImageCtx->snap_lock for write in add_snap()
add_snap() updates the ImageCtx snapshot metadata in memory, as well
as reading the flags as part of the object map snapshot. Both of these
require holding snap_lock.
Josh Durgin [Tue, 24 Feb 2015 03:49:12 +0000 (19:49 -0800)]
librbd: use snap_lock to protect ImageCtx->flags
This is another step towards eliminating md_lock from the writeback
path. Almost all the places that use ImageCtx->flags already use
snap_lock, so there's no need to create a new lock. For the others,
add a helper, test_flags() that acquires the lock, similar to
test_features().
This also makes sure we look up flags of the snapshot we're operating
on, instead of those for head.
Josh Durgin [Tue, 24 Feb 2015 02:46:26 +0000 (18:46 -0800)]
librbd: add and use a test_features() helper
This gets the appropriate locks, and checks the currently open
snapshot instead of head. Looking up features by snap_id prepares us
for future addition or removal of e.g. an object map throughout the
life of an image.
Josh Durgin [Tue, 24 Feb 2015 02:44:05 +0000 (18:44 -0800)]
librbd: use ImageCtx->snap_lock for ImageCtx->features
This was being protected by md_lock, but that has become too coarse
since it is used to prevent writes from proceeding while flushing
caches for a snapshot. With the addition of ObjectMap and
ImageWatcher, writeback could try to acquire md_lock again, leading to
a deadlock.
Greg Farnum [Fri, 13 Feb 2015 03:23:43 +0000 (19:23 -0800)]
fuse: do not invoke ll_register_callbacks() on finalize
We were passing in a NULL data structure, probably in an attempt to
let things clean up -- but our implementation just returns with a NULL
pass-in value, so drop it for clarity.
Boris Ranto [Wed, 7 Jan 2015 09:26:49 +0000 (10:26 +0100)]
Split python-ceph to appropriate python-* packages
python-ceph contains various header files/bindings for several
libraries. This patch creates *-devel packages for all the
libraries separately and provides the compatibility layer for
the split.
Jason Dillaman [Tue, 24 Feb 2015 14:25:14 +0000 (09:25 -0500)]
tests: speed up Python RBD random data generation
The RBD large_write test case was taking multiple minutes to
run under a Fedora 21 VM. Replaced the million+ random number
generator calls with a single call to os.urandom. The test
now completes within seconds.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
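The difference between the two approaches looks roughly like this in Python 3. This is a sketch and not the test's actual code; the function names are made up for the comparison.

```python
import os
import random

def random_bytes_slow(size):
    # One random-number call per byte: size calls in total.
    return bytes(random.randint(0, 255) for _ in range(size))

def random_bytes_fast(size):
    # A single call returns all size bytes at once.
    return os.urandom(size)
```

Both return a bytes object of the requested length; the second replaces a million-plus generator calls with one call for a buffer of that size.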