Yan, Zheng [Wed, 25 Feb 2015 07:27:59 +0000 (15:27 +0800)]
client: re-send requests before composing the cap reconnect message
After commit 419800fe (client: re-send request when MDS enters reconnecting
stage), the cephfs client can send both unsafe requests and normal requests
when the MDS is in the reconnecting stage. Normal requests can have embedded
cap releases, and the client code encodes these embedded cap releases after
composing the cap reconnect message. This causes the client to silently drop
some caps. The fix is to re-send requests (which add embedded cap releases)
before composing the cap reconnect message.
Josh Durgin [Tue, 24 Feb 2015 22:31:27 +0000 (14:31 -0800)]
test_librbd: close ioctx after imagectx
There's no need to explicitly close the ioctx. Doing so may cause
problems when the Images using it are destroyed afterwards. Just let
normal cleanup at the end of the block take care of it in the correct
order.
Josh Durgin [Tue, 24 Feb 2015 04:28:38 +0000 (20:28 -0800)]
librbd: make ImageCtx->object_map always present
This simplifies locking by obviating the NULL checks. We no longer
need md_lock to protect these accesses. We can use object_map_lock
instead, to make sure no one reads an object map while it's being
updated.
Keep track of whether the object map is enabled for a given snapshot
internally. In each public method, check this state, and automatically
set it correctly when refreshing the object map. During snapshot
removal, unconditionally try to remove the object map object, to
protect against bugs leaking objects, and to be consistent with image
removal.
Jason Dillaman [Wed, 25 Feb 2015 17:00:26 +0000 (12:00 -0500)]
librbd: restart async requests if lock owner doesn't report progress
Detect the case of a crashed lock owner by waiting for up to 30 seconds
for an async request progress message from the leader. If a progress
message isn't received, restart the request (and possibly take ownership
of the lock).
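The timeout-and-restart logic above can be sketched as follows. This is a minimal, hypothetical illustration rather than librbd's implementation: the `ProgressWatchdog` class and its methods are assumed names, and time is passed in explicitly so the 30-second rule can be checked without sleeping.

```cpp
#include <chrono>
#include <cstdint>
#include <map>

// Hypothetical watchdog (not librbd's actual code): remembers when each async
// request last heard from the lock owner and flags requests that have gone
// quiet for longer than the timeout.
class ProgressWatchdog {
public:
  using Clock = std::chrono::steady_clock;

  explicit ProgressWatchdog(Clock::duration timeout = std::chrono::seconds(30))
      : timeout_(timeout) {}

  // The request was forwarded to the lock owner.
  void on_request_sent(uint64_t id, Clock::time_point now) {
    last_progress_[id] = now;
  }

  // A progress message arrived for a request we are tracking.
  void on_progress(uint64_t id, Clock::time_point now) {
    auto it = last_progress_.find(id);
    if (it != last_progress_.end()) {
      it->second = now;
    }
  }

  // The request finished; stop tracking it.
  void on_complete(uint64_t id) { last_progress_.erase(id); }

  // True if no progress was seen within the timeout; the caller would then
  // restart the request (and possibly try to take the lock itself).
  bool should_restart(uint64_t id, Clock::time_point now) const {
    auto it = last_progress_.find(id);
    return it != last_progress_.end() && now - it->second >= timeout_;
  }

private:
  Clock::duration timeout_;
  std::map<uint64_t, Clock::time_point> last_progress_;
};
```

Each progress message resets the clock, so a slow but live owner is not mistaken for a crashed one.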
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 25 Feb 2015 04:35:31 +0000 (23:35 -0500)]
librbd: replace Finisher/SafeTimer use with facade
Replace the two Context threading classes used within
ImageWatcher with a facade to orchestrate the scheduling
and canceling of Context task callbacks.
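A facade of this kind can be illustrated with a small class. The sketch assumes keyed tasks and simulates execution with a `run_pending()` call; `TaskFacade` and its methods are hypothetical names, not the interface ImageWatcher actually uses.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical facade (illustrative names, not librbd's API). Callers only
// queue and cancel keyed tasks; whether a task would run on a finisher
// thread or a timer is hidden behind this interface. Execution is simulated
// by run_pending() so the sketch stays single-threaded and deterministic.
class TaskFacade {
public:
  using Task = std::function<void()>;

  // At most one pending task per key; re-queuing replaces the old one.
  void queue(const std::string& key, Task task) {
    pending_[key] = std::move(task);
  }

  // Returns true if a pending task was removed before it could run.
  bool cancel(const std::string& key) { return pending_.erase(key) > 0; }

  // Stand-in for the finisher thread or timer firing.
  void run_pending() {
    std::map<std::string, Task> tasks;
    tasks.swap(pending_);
    for (auto& entry : tasks) {
      entry.second();
    }
  }

private:
  std::map<std::string, Task> pending_;
};
```

Keying tasks lets a caller cancel, say, a pending lock-request retry without knowing which thread or timer would have run it.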
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 24 Feb 2015 19:33:44 +0000 (14:33 -0500)]
librbd: cancel in-progress maint operations before releasing lock
Ensure that no maintenance operations (resize, flatten) are still in
flight when the exclusive lock is released. The lock will be
released when transitioning to a snapshot, closing the image, or
cooperatively when another client requests the lock.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 24 Feb 2015 17:53:45 +0000 (12:53 -0500)]
librbd: flush context potentially completing too early
If the async operation associated with a flush request completes,
only complete the flush contexts if no previous operations are
still in flight. Otherwise, move the flush contexts to an older
in-flight async operation.
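The hand-off rule for flush contexts can be sketched as below, assuming operations are ordered by a monotonically increasing id. `AsyncOpTracker` is a hypothetical name; the real bookkeeping in librbd may differ.

```cpp
#include <cstdint>
#include <functional>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Hypothetical tracker (not librbd's code). Ops get increasing ids, so the
// map is ordered oldest-first.
class AsyncOpTracker {
public:
  using Callback = std::function<void()>;

  uint64_t start_op() {
    uint64_t id = next_id_++;
    in_flight_[id];  // empty list of flush contexts
    return id;
  }

  // A flush waits on the newest in-flight op, or completes immediately.
  void flush(Callback cb) {
    if (in_flight_.empty()) {
      cb();
      return;
    }
    in_flight_.rbegin()->second.push_back(std::move(cb));
  }

  void finish_op(uint64_t id) {
    auto it = in_flight_.find(id);
    if (it == in_flight_.end()) {
      return;
    }
    std::vector<Callback> contexts = std::move(it->second);
    auto next = in_flight_.erase(it);
    if (next != in_flight_.begin()) {
      // An older op is still in flight: the flush is not done yet, so hand
      // its contexts to the nearest older op instead of completing them.
      auto& older = std::prev(next)->second;
      for (auto& cb : contexts) {
        older.push_back(std::move(cb));
      }
      return;
    }
    for (auto& cb : contexts) {
      cb();
    }
  }

private:
  uint64_t next_id_ = 0;
  std::map<uint64_t, std::vector<Callback>> in_flight_;
};
```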
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Josh Durgin [Tue, 24 Feb 2015 03:50:55 +0000 (19:50 -0800)]
librbd: take ImageCtx->snap_lock for write in add_snap()
add_snap() updates the ImageCtx snapshot metadata in memory, as well
as reading the flags as part of the object map snapshot. Both of these
require holding snap_lock.
Josh Durgin [Tue, 24 Feb 2015 03:49:12 +0000 (19:49 -0800)]
librbd: use snap_lock to protect ImageCtx->flags
This is another step towards eliminating md_lock from the writeback
path. Almost all the places that use ImageCtx->flags already use
snap_lock, so there's no need to create a new lock. For the others,
add a helper, test_flags(), that acquires the lock, similar to
test_features().
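A helper in the spirit of test_flags() might look like the following sketch. It assumes a C++17 `std::shared_mutex` in place of Ceph's own lock type, and `ImageCtxSketch`, `flags_by_snap`, and the `HEAD` sentinel are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <shared_mutex>

// Hypothetical, simplified image context (not librbd's ImageCtx).
struct ImageCtxSketch {
  static constexpr uint64_t HEAD = ~uint64_t(0);  // hypothetical id for head

  mutable std::shared_mutex snap_lock;
  uint64_t snap_id = HEAD;                      // currently open snapshot
  std::map<uint64_t, uint64_t> flags_by_snap;   // protected by snap_lock

  // Caller must already hold snap_lock (shared or exclusive). Looks up the
  // flags of the open snapshot rather than those of head.
  bool test_flags_locked(uint64_t flags) const {
    auto it = flags_by_snap.find(snap_id);
    return it != flags_by_snap.end() && (it->second & flags) == flags;
  }

  // For callers that do not hold snap_lock: takes it for read.
  bool test_flags(uint64_t flags) const {
    std::shared_lock<std::shared_mutex> lock(snap_lock);
    return test_flags_locked(flags);
  }
};
```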
This also makes sure we look up flags of the snapshot we're operating
on, instead of those for head.
Josh Durgin [Tue, 24 Feb 2015 02:46:26 +0000 (18:46 -0800)]
librbd: add and use a test_features() helper
This gets the appropriate locks, and checks the currently open
snapshot instead of head. Looking up features by snap_id prepares us
for future addition or removal of e.g. an object map throughout the
life of an image.
Josh Durgin [Tue, 24 Feb 2015 02:44:05 +0000 (18:44 -0800)]
librbd: use ImageCtx->snap_lock for ImageCtx->features
This was being protected by md_lock, but that has become too coarse
since it is used to prevent writes from proceeding while flushing
caches for a snapshot. With the addition of ObjectMap and
ImageWatcher, writeback could try to acquire md_lock again, leading to
a deadlock.
Boris Ranto [Wed, 7 Jan 2015 09:26:49 +0000 (10:26 +0100)]
Split python-ceph to appropriate python-* packages
python-ceph contains various header files/bindings for several
libraries; this patch creates *-devel packages for all the
libraries separately and provides the compatibility layer for
the split.
Jason Dillaman [Tue, 24 Feb 2015 14:25:14 +0000 (09:25 -0500)]
tests: speed up Python RBD random data generation
The RBD large_write test cases were taking multiple minutes to
run under a Fedora 21 VM. Replaced the million+ random number
generator calls with a single call to os.urandom. The test
now completes within seconds.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 24 Feb 2015 01:09:56 +0000 (20:09 -0500)]
tests: fix potential race conditions in test_ImageWatcher
The tests were sending invalid responses back to ImageWatchers
(missing the result code), which had the potential to allow the
lock to be acquired sooner than the test was expecting since
ImageWatcher would assume the lack of a response code meant no
clients owned the exclusive lock and would retry as fast as
possible.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 24 Feb 2015 00:45:03 +0000 (19:45 -0500)]
osdc: watch error callback invoked on cancelled context
The C_DoWatchError context did not verify whether or not the
watch was cancelled prior to invoking the callback. This
resulted in sporadic crashes when reconnect errors bubbled
up to destroyed objects.
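The missing check can be illustrated with a simplified, hypothetical context. `WatchState` and `DoWatchErrorSketch` are assumed names standing in for the Objecter's types, and the sketch omits the locking a real implementation would need.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in (not the Objecter's real type). A real
// implementation would guard `cancelled` with a lock; this sketch is
// single-threaded.
struct WatchState {
  bool cancelled = false;
  std::function<void(int)> on_error;
};

// Analogue of a deferred "watch error" context: it may run after the watch
// has been cancelled and its owner destroyed.
class DoWatchErrorSketch {
public:
  DoWatchErrorSketch(std::shared_ptr<WatchState> watch, int err)
      : watch_(std::move(watch)), err_(err) {}

  void finish() {
    // The check the fix adds: never call back into a cancelled watch.
    if (watch_->cancelled || !watch_->on_error) {
      return;
    }
    watch_->on_error(err_);
  }

private:
  std::shared_ptr<WatchState> watch_;
  int err_;
};
```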
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sage Weil [Wed, 18 Feb 2015 22:53:04 +0000 (14:53 -0800)]
osd,mon: explicitly specify OSD features in MOSDBoot
We are using the connection features to populate the features field in the
OSDMap, but this is the *intersection* of mon and osd features, not the
osd features. Fix this by explicitly specifying the features in
MOSDBoot.
Fixes: #10911
Backport: giant, firefly
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 16 Feb 2015 22:18:40 +0000 (14:18 -0800)]
osd: do not proxy reads unless all OSDs proxy features too
Specifically, the object_copy_data_t encoding changed such that the reply
encoding is dependent on features; if we proxy such a read to an old
OSD it will use *our* features to encode instead of the original OSD's.
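The gating rule amounts to a bitmask test on the features shared by all up OSDs. This is a minimal sketch: `FEATURE_PROXY_FEATURES` and its bit position are hypothetical placeholders, not Ceph's actual feature constant.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical feature bit; Ceph defines its own feature constants and this
// value is illustrative only.
constexpr uint64_t FEATURE_PROXY_FEATURES = uint64_t(1) << 49;

// Features every up OSD supports: the bitwise AND across all of them.
uint64_t common_features(const std::vector<uint64_t>& up_osd_features) {
  if (up_osd_features.empty()) {
    return 0;
  }
  uint64_t common = ~uint64_t(0);
  for (uint64_t f : up_osd_features) {
    common &= f;
  }
  return common;
}

// Proxy reads only when every up OSD advertises the (hypothetical) bit, so
// no old OSD can encode a reply using the proxying OSD's features.
bool can_proxy_read(const std::vector<uint64_t>& up_osd_features) {
  return (common_features(up_osd_features) & FEATURE_PROXY_FEATURES) != 0;
}
```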
Sage Weil [Mon, 16 Feb 2015 17:30:39 +0000 (09:30 -0800)]
osd/OSDMap: cache get_up_osd_features
This method is O(n) and called from a few places for each IO operation.
Cache the value since it does not change over the lifetime of a single
epoch. Invalidate on apply_incremental() and decode.
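A per-epoch cache with explicit invalidation can be sketched like this. `OsdMapSketch` is a hypothetical simplification of OSDMap that keeps only an up flag and a feature mask per OSD; it assumes, as described above, that the value only changes when the map itself changes.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for OSDMap (not Ceph's actual class).
class OsdMapSketch {
public:
  struct OsdInfo {
    bool up;
    uint64_t features;
  };

  explicit OsdMapSketch(std::vector<OsdInfo> osds) : osds_(std::move(osds)) {}

  // O(n) on the first call after a change, O(1) afterwards.
  uint64_t get_up_osd_features() const {
    if (!cached_valid_) {
      uint64_t features = ~uint64_t(0);  // start with all bits set
      bool any_up = false;
      for (const auto& osd : osds_) {
        if (osd.up) {
          features &= osd.features;
          any_up = true;
        }
      }
      cached_features_ = any_up ? features : 0;
      cached_valid_ = true;
      ++recompute_count_;
    }
    return cached_features_;
  }

  // Any change to the map (here, a toy "incremental") invalidates the cache.
  void apply_incremental(std::size_t osd, bool up, uint64_t features) {
    osds_.at(osd) = {up, features};
    cached_valid_ = false;
  }

  int recompute_count() const { return recompute_count_; }

private:
  std::vector<OsdInfo> osds_;
  mutable bool cached_valid_ = false;
  mutable uint64_t cached_features_ = 0;
  mutable int recompute_count_ = 0;
};
```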
Jason Dillaman [Mon, 23 Feb 2015 17:16:39 +0000 (12:16 -0500)]
librbd: fixed snap create race conditions
Since the post-snap create header update runs asynchronously
in a finalizer callback, it's possible that the snapshot
is not immediately visible. Also, if a proxied snap create
message is replayed, it's possible for the client to receive
an EEXIST error.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Added a unique client id to announcement messages so that duplicate
lock release / acquired / requested messages can be detected and
ignored by the client. Also fixed an issue processing the result
code for async operations.
Fixes: #10898
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Fri, 20 Feb 2015 17:50:26 +0000 (12:50 -0500)]
rbd: disable RBD exclusive locking by default
Utilize the existing rbd_default_features config option to
control whether or not to enable RBD exclusive locking and
object map features by default. Also added a new option to
the rbd cli to specify the image features when creating images.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 19 Feb 2015 20:38:32 +0000 (15:38 -0500)]
osdc: pass fadvise op flags to WritebackHandler read requests
librbd was previously attempting to cast the provided Context to
retrieve the fadvise flags. To eliminate the unsafe cast, now
the fadvise flags are directly passed to the WritebackHandler::read
callback.
Fixes: #10914
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sage Weil [Thu, 12 Feb 2015 22:16:53 +0000 (14:16 -0800)]
osd/OSDMap: include pg_temp count in summary
It is useful to know how big the pg_temp map is. Strictly speaking
this is part of the OSDMap so I'm including it here. It looks like
this:
osdmap e25: 3 osds: 3 up, 3 in; 1 remapped pgs
It might be more user-friendly to put it in a line with the pgmap
somewhere (where other pg counts are included), but it doesn't quite
fit there either. So sticking with where it lives in the data
structure!
Samuel Just [Tue, 17 Feb 2015 18:08:01 +0000 (10:08 -0800)]
PG: compensate for bug 10780 on older peers
Previously, there was a harmless bug where we didn't fill in the
last_epoch_started field for a peer whose last_backfill line we are
resetting. It's no longer harmless since we use that field as the
activation epoch, so if the peer is missing the MIN_SIZE feature bit,
we fill in the last_epoch_started value that the old code meant to fill in.
It was possible for ImageWatcher to attempt to re-acquire held locks
via context callbacks. This issue affected resizing/flattening when
no work was required and rescheduling a watch upon two successive
failures.
Fixes: #10899
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Boris Ranto [Wed, 7 Jan 2015 09:00:21 +0000 (10:00 +0100)]
ceph.spec: split ceph-devel to appropriate *-devel packages
ceph-devel contains various header files/bindings for several
libraries; this patch creates *-devel packages for all the libraries
separately and provides the compatibility layer for the split.
http://tracker.ceph.com/issues/10884
Refs: #10884
Signed-off-by: Boris Ranto <branto@redhat.com>
Amended by Ken Dreyer <kdreyer@redhat.com> to add version numbers to the
Obsoletes, add Obsoletes to the libradosstriper1-devel and
libcephfs_jni1-devel subpackages, adjust the librados documentation, and
add the Redmine issue number to this commit log.
Jason Dillaman [Sat, 14 Feb 2015 06:24:44 +0000 (01:24 -0500)]
librbd: enforce write ordering with snapshot
The md_lock is now held for reading when scheduling write/discards.
Since snap_create now holds the lock for writing and flushes all
pending IO, write/discard operations will now be consistent for a
given request across objects.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Sat, 7 Feb 2015 14:13:10 +0000 (09:13 -0500)]
librbd: use separate files for snapshot object maps
Instead of relying on the built-in object snapshot support,
create a separate object map object for each image snapshot.
This will allow a future repair utility to rebuild the object
map for an image's snapshots.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Mapped IoCtx::write_full to existing test method used by the
ObjectWriteOperation::write_full API method. Also added missing
cls_log implementation for debugging.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>