Greg Farnum [Mon, 19 Aug 2013 17:29:49 +0000 (10:29 -0700)]
librados: synchronous commands should return on commit instead of ack
This is unlikely to be noticed by anybody, but it is a big change. Document
in the PendingReleaseNotes and bump up the librados minor version number
to 68.
Greg Farnum [Mon, 19 Aug 2013 17:21:16 +0000 (10:21 -0700)]
mon: make MonMap error message about unspecified monitors less specific.
The error message helpfully references the -m and -c CLI options for
specifying monitors, but this code can be invoked from non-core librados
client applications so that's unfortunately not kosher. Remove the
reference.
Sage Weil [Sat, 17 Aug 2013 21:30:37 +0000 (14:30 -0700)]
objclass: move cls_log into class_api.cc
Not sure why but this seems to resolve a linking problem when loading
classes:
2013-08-17 13:28:19.015776 7fb2bcffa700 0 _load_class could not open class /usr/lib/rados-classes/libcls_hello.so (dlopen failed): /usr/lib/rados-classes/libcls_hello.so: undefined symbol: cls_log
2013-08-17 13:28:19.015786 7fb2bcffa700 -1 osd.4 12 class hello open got (5) Input/output error
Sage Weil [Sat, 17 Aug 2013 05:08:00 +0000 (22:08 -0700)]
mds: create only one ESubtreeMap during fs creation
Previously we would create an empty ESubtreeMap when we opened the log
segment and then immediately journal a second one that created the root
and mdsdir. More importantly, for the second ESubtreeMap, we would not
wait for it to commit before requesting the ACTIVE state, leading to
#4894.
Instead, break start_new_segment() into two steps: one that creates the
in-memory LogSegment tracking structure, and one that journals the
ESubtreeMap. Open things early and write the (one) ESubtreeMap at the
end of boot_create().. and then wait for it.
Fixes: #4894 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Sage Weil [Sat, 17 Aug 2013 00:59:11 +0000 (17:59 -0700)]
ceph-post-file: single command to upload a file to cephdrop
Use sftp to upload to a directory that only this user and ceph devs can
access.
Distribute an ssh key to connect to the account. This will let us revoke
the key in the future if we feel the need. Also distribute a known_hosts
file so that users have some confidence that they are connecting to the
real ceph drop account and not some third party.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Thu, 15 Aug 2013 23:19:21 +0000 (16:19 -0700)]
osd: enforce RD, WR flags for class methods
Class methods are marked with RD and WR to help the OSD decide when we need
to flush objects or require certain permissions. Ensure that methods do
not step outside their advertised capabilities by keeping a counter of rd
and wr ops we perform in do_osd_ops() and making sure that class methods,
and any ops the indirectly call, do not break the rules.
Sage Weil [Thu, 15 Aug 2013 22:22:41 +0000 (15:22 -0700)]
cls_rbd: remove old assign_bid method
This method is problematic because it both writes/mutates and returns data,
which means that an untimely client disconnect or peering event will result
in a success to the client with no payload.
It has not been used since v0.52 (18054ba46fe2779d8df8b1a0d69ec93ca6a66c34)
which is pre-bobtail; so this change breaks compatibility with pre-bobtail
librbd clients (at least for image creation).
Sage Weil [Thu, 15 Aug 2013 22:06:38 +0000 (15:06 -0700)]
osd: do not return data payload for successful writes
We were somewhat inadvertantly returning a data payload for write
operations. This was a side-effect of the OpContext::ops field being a
reference to MOSDOp::ops: the return data would end up there, and then
the MOSDOpReply ctor would copy it.
Fix this by breaking the ref, and making the do_op() logic also claim
return result data for error values (so that errors can return data to the
caller).
Sage Weil [Thu, 15 Aug 2013 21:35:28 +0000 (14:35 -0700)]
common/Preforker: shut up warning
common/Preforker.h: In member function 'void Preforker::daemonize()':
common/Preforker.h:97:40: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
Sage Weil [Thu, 15 Aug 2013 21:36:57 +0000 (14:36 -0700)]
config: fix stringification of config values
The std::copy construct leaves a trailing separator character, which breaks
parsing for booleans (among other things) and probably mangles everything
else too.
Backport: dumpling Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Thu, 15 Aug 2013 20:42:50 +0000 (13:42 -0700)]
config: fix stringification of config values
The std::copy construct leaves a trailing separator character, which breaks
parsing for booleans (among other things) and probably mangles everything
else too.
Backport: dumpling Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
Scott Devoid [Thu, 1 Aug 2013 16:20:27 +0000 (11:20 -0500)]
Document unstable nature of CephFS
- Add note to docs indicating that CephFS is not recommended for
production datasets.
- Add note to docs indicating that running CephFS with multiple MDS
servers is not currently recommended.
This fixes issue #5797 http://tracker.ceph.com/issues/5797
Yan, Zheng [Wed, 7 Aug 2013 06:38:22 +0000 (14:38 +0800)]
store: Add (experimental) ZFS parallel journal support
This patch adds ZFS parallel journal support. It uses libzfs provided by
zfsonlinux to access ZFS' functionalities. To enable ZFS parallel journal
support, compile ceph by:
./configure --with-libzfs LIBZFS_CFLAGS="-I<libzfs header> -I<libspl header>"
make
Add following line to osd section of ceph.conf
filestore zfs_snap = 1
Note: ZFS (no mater parallel journal is enabled or not) does not support
direct IO. To use it as backend FS for OSD, you need to add following line
to osd section of ceph.conf
Josh Durgin [Wed, 14 Aug 2013 22:28:19 +0000 (15:28 -0700)]
rados.py: fix Rados() backwards compatibility
Previously it had no name parameter, so the default will be used by
old clients. However, if an old client set rados_id, a new check that
both rados_id and name are set would result in an error. Fix this by
only applying the default names after the check, and add tests of this
behavior.
Fixes: #5970 Reported-by: Michael Morgan <mmorgan@dca.net> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage.weil@inktank.com>
Sage Weil [Tue, 13 Aug 2013 19:52:41 +0000 (12:52 -0700)]
librados: fix async aio completion wakeup
For aio flush, we register a wait on the most recent write. The write
completion code, however, was *only* waking the waiter if they were waiting
on that write, without regard to previous writes (completed or not).
For example, we might have 6 and 7 outstanding and wait on 7. If they
finish in order all is well, but if 7 finishes first we do the flush
completion early. Similarly, if we
Josh Durgin [Tue, 13 Aug 2013 02:17:09 +0000 (19:17 -0700)]
librados: fix locking for AioCompletionImpl refcounting
Add an already-locked helper so that C_Aio{Safe,Complete} can
increment the reference count when their caller holds the
lock. C_AioCompleteAndSafe's caller is not holding the lock, so call
regular get() to ensure no racing updates can occur.
This eliminates all direct manipulations of AioCompletionImpl->ref,
and makes the necessary locking clear.
The only place C_AioCompleteAndSafe is used is in handling
aio_flush_async(). This could cause a missing completion.
Refs: #5919 Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Tested-by: Oliver Francke <Oliver.Francke@filoo.de>
Loic Dachary [Mon, 12 Aug 2013 13:20:57 +0000 (15:20 +0200)]
remove racy test assertions
Do not assert before the loop waiting for the thread to complete the
expected side effect. The whole point of the loop is to make sure
there is no window of opportunity for a race condition and asserting
before it means taking a useless risk. If run enough times, it will
happen.
Sage Weil [Tue, 13 Aug 2013 23:12:08 +0000 (16:12 -0700)]
types: pretty_si_t
Similar to si_t, but leaves a space between the numbers and the units. In
the degenerate case (no M, K, etc. modifier) there's simply a trailing
space. For example,