Haomai Wang [Tue, 6 Jan 2015 09:18:43 +0000 (17:18 +0800)]
AsyncConnection: Fix mark_down race condition
Previously, if the caller of mark_down was itself running in an event
thread callback, it blocked waiting for a wakeup. Meanwhile, the event
thread expected to signal the blocked thread might also want to mark_down
a connection owned by the already blocked thread, resulting in a deadlock.
As a tradeoff, introduce a lock on file_events so that creating or deleting
a file_event no longer goes through a callback. We no longer need to wait
for that callback.
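A minimal sketch of the idea, assuming a toy event table rather than Ceph's actual EventCenter: the table is guarded by a lock, so any thread (including one running inside a callback) can add or remove an entry directly instead of queueing the change and waiting for the owning thread. The class and method names below are hypothetical.

```python
import threading


class EventCenter:
    """Toy event table (hypothetical); not Ceph's EventCenter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._file_events = {}  # fd -> callback

    def create_file_event(self, fd, callback):
        # Any thread may register directly; no hand-off to the event thread.
        with self._lock:
            self._file_events[fd] = callback

    def delete_file_event(self, fd):
        # Any thread may unregister directly; nothing to wait for.
        with self._lock:
            self._file_events.pop(fd, None)

    def process_event(self, fd):
        # Look up under the lock, then run the callback outside it so the
        # callback itself may create or delete events without deadlocking.
        with self._lock:
            callback = self._file_events.get(fd)
        if callback is not None:
            callback(fd)
```

Running the callback outside the lock is a design choice of this sketch; it lets a callback on one center delete events on another center, or on its own, without blocking.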
If binding to an IP address fails, delay and retry.
This happens mainly on IPv6 deployments. Due to DAD (Duplicate Address Detection)
or SLAAC, the IPv6 address may not yet be available when the daemons start.
Monitor daemons try to bind to a static IPv6 address; if that address is not
yet available, the monitor fails to start.
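The delay-and-retry behaviour described above can be sketched as follows. This is an illustration only, not the Ceph implementation; the function name, the retry count and the delay are assumptions chosen for the example.

```python
import socket
import time


def bind_with_retry(sock, address, retries=3, delay=0.5, sleep=time.sleep):
    """Call sock.bind(address); on OSError wait `delay` seconds and retry.

    `retries` and `delay` are illustrative values, not Ceph settings.
    Returns the number of retries that were needed.
    """
    for attempt in range(retries + 1):
        try:
            sock.bind(address)
            return attempt
        except OSError:
            if attempt == retries:
                raise  # give up after the last allowed retry
            sleep(delay)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    bind_with_retry(s, ("127.0.0.1", 0))
    s.close()
```

The `sleep` parameter exists only so the wait can be replaced in tests; a caller would normally leave it at its default.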
Haomai Wang [Sun, 7 Dec 2014 16:28:11 +0000 (00:28 +0800)]
AsyncMessenger: Using EventCenter instead of poll for bind
AsyncMessenger no longer needs an extra thread. The bound (listening)
socket is treated as a normal socket, and a randomly chosen Worker thread
is dispatched to handle its accept events.
Haomai Wang [Sun, 7 Dec 2014 15:00:06 +0000 (23:00 +0800)]
AsyncMessenger: Bind async thread to special cpu core
Now, 2-4 async op threads can fully meet an OSD's network demand with an
SSD backend. So we can bind this limited number of threads to specific
cores, which can improve async event loop performance because most
structures and methods will be processed within a single thread.
Loic Dachary [Thu, 15 Jan 2015 12:23:47 +0000 (13:23 +0100)]
tests: adapt to new json-pretty format
The json-pretty format was modified for readability and now includes
additional newlines / spaces. Either switch to json to avoid dealing
with space changes or modify the expected output to include them.
Loic Dachary [Thu, 15 Jan 2015 11:28:12 +0000 (12:28 +0100)]
common: restore format fallback semantic
When Formatter::create replaced new_formatter, the handling of an
invalid format was also incorrectly changed. When an invalid format (for
instance "plain") was specified, new_formatter returned a NULL pointer
which was sometimes handled by creating a json-pretty formatter and
sometimes differently.
A new Formatter::create prototype with a fallback argument is added. The
fallback is used if it is not the empty string and the requested format is
not known. This prototype is used at the call sites where a NULL return
from new_formatter was previously replaced by a json-pretty formatter.
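The fallback rule described above can be sketched like this. It is a simplified model that returns a format name instead of a formatter object; the function name, the default value and the set of known formats are assumptions for illustration, not the Formatter::create signature.

```python
# Illustrative subset of format names; the real list is an assumption here.
KNOWN_FORMATS = {"json", "json-pretty", "xml", "xml-pretty"}


def create_formatter(fmt, default="json-pretty", fallback=""):
    """Return the format name that would be instantiated, or None.

    - an empty `fmt` selects `default`
    - an unknown format selects `fallback` when one is given
    - otherwise None is returned, like the old NULL pointer
    """
    chosen = fmt or default
    if chosen in KNOWN_FORMATS:
        return chosen
    if fallback:
        # Retry once with the fallback and no further fallback.
        return create_formatter(fallback, default="", fallback="")
    return None
```

With this rule, `create_formatter("plain")` yields None, while `create_formatter("plain", fallback="json-pretty")` yields "json-pretty".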
John Spray [Wed, 14 Jan 2015 10:35:53 +0000 (10:35 +0000)]
mds: handle heartbeat_reset during shutdown
Because any thread might grab mds_lock and call heartbeat_reset
immediately after a call to suicide() completes, this needs
to be handled as a special case where we tolerate MDS::hb having
already been destroyed.
Fixes: #10382
Signed-off-by: John Spray <john.spray@redhat.com>
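A minimal sketch of the special case, assuming a toy daemon object rather than the real MDS class: after suicide() has destroyed the heartbeat handle, a late heartbeat_reset() call finds nothing to reset and returns quietly. All names below are hypothetical stand-ins.

```python
import threading


class Daemon:
    """Toy stand-in for a daemon with a heartbeat handle (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.hb = {"resets": 0}  # stand-in for the heartbeat handle

    def suicide(self):
        # Shutdown destroys the heartbeat handle.
        with self.lock:
            self.hb = None

    def heartbeat_reset(self):
        # Any thread may get here right after suicide() completes, so a
        # missing handle is tolerated instead of being dereferenced.
        with self.lock:
            if self.hb is None:
                return False
            self.hb["resets"] += 1
            return True
```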
Jason Dillaman [Tue, 13 Jan 2015 04:17:50 +0000 (23:17 -0500)]
librbd: flush pending AIO requests under all existing flush scenarios
AIO requests that are waiting on the image lock should be flushed
during all existing RBD flush scenarios. A few flush cases were
missed in the original implementation.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 3 Nov 2014 21:51:06 +0000 (16:51 -0500)]
librbd: differentiate between R/O vs R/W RBD features
The new RBD exclusive lock feature should be treated as a
feature that is only applied when the image is opened in
R/W mode.
Older clients will need to handle the updated
cls_rbd::get_features method in order to properly determine
the incompatible features for an image depending on the
current mode.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
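The mode-dependent check can be sketched with feature bitmasks. This is an illustration under assumptions: the constant names and bit values are placeholders, and the helper functions are hypothetical rather than the cls_rbd::get_features interface.

```python
# Placeholder feature bits for illustration only.
FEATURE_LAYERING = 1 << 0
FEATURE_EXCLUSIVE_LOCK = 1 << 2

# Features that only matter when the image is opened read/write.
RW_ONLY_FEATURES = FEATURE_EXCLUSIVE_LOCK


def incompatible_features(image_features, read_only):
    """Features a client must understand to open the image in this mode."""
    if read_only:
        return image_features & ~RW_ONLY_FEATURES
    return image_features


def can_open(image_features, client_supported, read_only):
    """True if the client supports every feature required in this mode."""
    required = incompatible_features(image_features, read_only)
    return (required & ~client_supported) == 0
```

Under these assumptions, a client that only understands layering can still open a lock-enabled image read-only, but not read/write.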
Jason Dillaman [Sun, 16 Nov 2014 19:20:42 +0000 (14:20 -0500)]
librbd: Add convenience library to support unit tests
Unit tests need access to private librbd symbols that are no
longer exported from librbd.so. A new librbd_internal
convenience library was created to allow access.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 8 Oct 2014 12:41:53 +0000 (08:41 -0400)]
librbd: Integrate librbd with new exclusive lock feature
Operations that update the image now require the exclusive lock
if the feature is enabled. AIO write and discard operations will
automatically request the exclusive lock from the current leader
to support live-migration.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Matt Richards [Tue, 13 Jan 2015 00:59:42 +0000 (16:59 -0800)]
librados: bump rados version number
As a follow-on to 49d114f1fff90e5c0f206725a5eb82c0ba329376,
increment the "extra" version field so clients can easily
determine if they have a version of librados that properly
translates C API operation flags.
Signed-off-by: Matthew Richards <mattjrichards@gmail.com>
Sage Weil [Mon, 12 Jan 2015 22:00:21 +0000 (14:00 -0800)]
osd: enable filestore_extsize by default
Note that this will only get used if the kernel is new enough; if it is
older than 3.5 the option will get disabled and extsize will not be used
even if the option is set to true.
Sage Weil [Mon, 12 Jan 2015 21:59:39 +0000 (13:59 -0800)]
os/FileStore: verify kernel is new enough before using extsize ioctl
Old kernels have an XFS bug that exposes uninitialized data when the
extsize hint is set and only partially written. This is fixed by Linux
commit aff3a9edb7080f69f07fe76a8bd089b3dfa4cb5d, documented in XFS bug
http://oss.sgi.com/bugzilla/show_bug.cgi?id=874, and tested by XFS
test xfs/229 to prevent regressions.
Notably, the original bug affects kernel 3.2, which is widely deployed with
Ubuntu 12.04 (precise).
Backport: giant, firefly
Signed-off-by: Sage Weil <sage@redhat.com>
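A rough sketch of the kernel gate described in the two extsize commits above, assuming the kernel release is available as a string such as "3.2.0-23-generic". The function names are hypothetical; only the 3.5 threshold comes from the commit messages.

```python
import re


def kernel_at_least(release, major, minor):
    """Compare the leading "major.minor" of a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        return False  # unrecognized format: stay on the safe side
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


def use_extsize(option_enabled, release):
    """The option only takes effect on kernels that are 3.5 or newer."""
    return option_enabled and kernel_at_least(release, 3, 5)
```

On a live system the release string could come from `platform.release()`; treating an unparseable string as too old is a choice made for this sketch.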
John Spray [Mon, 5 Jan 2015 19:34:57 +0000 (19:34 +0000)]
mon: implement `fs reset`
This is for use in CephFS disaster recovery. When
the metadata pool has been forcibly reset to a single-MDS
metadata tree, we would like to reset the MDSMap to match.
Sage Weil [Mon, 17 Nov 2014 20:46:51 +0000 (12:46 -0800)]
osd/ReplicatedPG: drop unnecessary cache_mode checks
This currently enumerates all cache modes except none, and we don't
arrive in this function when caching is disabled. And creating a whiteout
is not cache_mode dependent. Simplify!