Loic Dachary [Mon, 30 Dec 2013 11:26:20 +0000 (12:26 +0100)]
ceph-disk: prepare --data-dir must not override files
ceph-disk does nothing when given a device that is already prepared. If
given a directory that already contains a successfully prepared OSD,
however, it will overwrite it.
Instead of overwriting the files in the OSD data directory, return
immediately if the magic file exists. The magic file is now created last
so that it accurately reflects the success of the OSD preparation.
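The ordering described above can be sketched as follows. This is only an illustration, not the ceph-disk code: the function name prepare_dir, the marker file name "magic" and the file contents are assumptions made for the example.

```python
import os
import tempfile

MAGIC = "magic"  # hypothetical name for the marker file


def prepare_dir(path, files):
    """Populate a data directory; return False if it was already prepared.

    `files` maps file names to text content (illustrative only).
    """
    magic_path = os.path.join(path, MAGIC)
    if os.path.exists(magic_path):
        # A previous preparation completed: leave every file untouched.
        return False
    for name, content in files.items():
        with open(os.path.join(path, name), "w") as f:
            f.write(content)
    # Written last, so its presence implies the other files were written.
    with open(magic_path, "w") as f:
        f.write("prepared\n")
    return True


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    print(prepare_dir(d, {"fsid": "first"}))   # first run writes the files
    print(prepare_dir(d, {"fsid": "second"}))  # second run returns early
```

Because the marker is written after everything else, an interrupted preparation leaves no marker and a later run will redo the work.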
Loic Dachary [Sun, 29 Dec 2013 12:14:14 +0000 (13:14 +0100)]
mon: make ceph-mon --mkfs idempotent
A mon is considered to exist if the mon-data directory exists and is not
empty. If ceph-mon --mkfs is run twice, it will succeed the
second time around and display an informative message.
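The existence test and the idempotent behavior can be sketched as below. The function names and the placeholder file written into the directory are assumptions for illustration; the real ceph-mon is C++ and creates a monitor store rather than a single file.

```python
import os
import tempfile


def mon_exists(mon_data):
    """A mon exists if its data directory exists and is not empty."""
    return os.path.isdir(mon_data) and bool(os.listdir(mon_data))


def mkfs(mon_data):
    """Create the mon data once; succeed without changes on later runs."""
    if mon_exists(mon_data):
        # Informative outcome, not an error: nothing is modified.
        return "exists"
    os.makedirs(mon_data, exist_ok=True)
    # Hypothetical placeholder standing in for the real monitor store.
    with open(os.path.join(mon_data, "store.placeholder"), "w") as f:
        f.write("")
    return "created"


if __name__ == "__main__":
    d = tempfile.mkdtemp()  # an existing but empty directory
    print(mkfs(d))
    print(mkfs(d))
```

An existing but empty directory is treated as "no mon yet", matching the definition in the commit message.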
Noah Watkins [Mon, 30 Dec 2013 20:56:32 +0000 (12:56 -0800)]
make: conditionally build filestore backends
The btrfs and zfs backends are wrapped in if __linux__ and if
WITH_ZFS, respectively, so building them unconditionally results in
empty object files and the associated warnings. This change builds them
under the same conditions.
According to https://code.google.com/p/googletest/source/detail?r=446
the use of unnamed types (in this case the protection flag enums from
librbd/parent_types.h) as template parameters (in this case the gtest
macros) is not valid C++ before C++0x.
As suggested there, converting the enum into an int with integral
promotion via the unary plus operator solves the problem.
Noah Watkins [Mon, 30 Dec 2013 20:14:02 +0000 (12:14 -0800)]
make: avoid symbol exporting for C++ libs on non-Linux
This removes export-symbol-regex for installed libraries with C++
interfaces on non-Linux where the hidden symbols are not resolved. This
is a temporary fix.
See ceph-devel topic "Shared library symbol visibility" for discussion
about a permanent solution.
Noah Watkins [Mon, 30 Dec 2013 20:10:53 +0000 (12:10 -0800)]
make: add top-level libcommon dependency
On OSX there is consistently a problem resolving pipe_cloexec and other
symbols through indirect libtool dependencies (in the example below,
libglobal depends on libcommon). This makes the dependency top-level for
most executables.
CXXLD ceph_test_timers
Undefined symbols for architecture x86_64:
"_pipe_cloexec", referenced from:
AdminSocket::create_shutdown_pipe(int*, int*) in libglobal.a(admin_socket.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Rutger ter Borg [Wed, 27 Nov 2013 08:49:00 +0000 (00:49 -0800)]
librados: read into user's bufferlist for aio_read
* The 'buf' argument to read() used to be passed into
AioCompletionImpl, and the results would be copied back after
reading. This is replaced with the creation of a static buffer of
that buf.
* The pbl argument in AioCompletionImpl is removed.
The patch is tested against an application using librados. I've
assumed that 'pbl' in
aio_read( ...., pbl, )
is allocated by the user. It may even speed things up: a buffer copy
is prevented.
Loic Dachary [Mon, 30 Dec 2013 09:30:51 +0000 (10:30 +0100)]
common: evaluate --show-config* after CEPH_ARGS
The content of CEPH_ARGS is appended to the list of arguments. When
--show-config or --show-config-value is also set, it should be evaluated
after all arguments are parsed to accurately reflect the value that
would be visible to the program.
It failed to do so because the action for --show-config* was carried out
immediately. It is postponed until all options are parsed instead.
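The deferred evaluation can be sketched with a toy parser. This is not the Ceph argument parser: the function name is hypothetical, and the sketch assumes every option takes exactly one value.

```python
import shlex


def show_config_value(argv, ceph_args=""):
    """Return the value requested by --show-config-value, or None.

    The contents of CEPH_ARGS (passed here as `ceph_args`) are appended
    to the command line, and the request is answered only after every
    option has been parsed.
    """
    args = list(argv) + shlex.split(ceph_args)
    conf = {}
    requested = None  # remembered, not acted upon immediately
    it = iter(args)
    for arg in it:
        if not arg.startswith("--"):
            continue
        value = next(it, None)
        if value is None:
            raise ValueError("option %s requires an argument" % arg)
        if arg == "--show-config-value":
            requested = value
        else:
            conf[arg[2:].replace("-", "_")] = value
    # Only now, with all options applied, is the request evaluated.
    return conf.get(requested) if requested is not None else None


if __name__ == "__main__":
    print(show_config_value(["--show-config-value", "fsid"], "--fsid abc"))
```

Had the lookup happened as soon as --show-config-value was seen, the value appended from CEPH_ARGS would not have been visible yet.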
Loic Dachary [Mon, 30 Dec 2013 09:09:41 +0000 (10:09 +0100)]
vstart: set fsid in [global]
If not set, commands that rely on --show-config-value fsid or something
equivalent will fail. ceph-disk does, for instance, and setting the fsid
in CEPH_ARGS won't help because it will be appended after
--show-config-value.
Noah Watkins [Sun, 29 Dec 2013 18:01:38 +0000 (10:01 -0800)]
gtest: disable tr1/tuple
Not all compilers support tr1/tuple. This forces libgtest to use
an internal implementation of tuple. Alternatively, the newer 1.6
version of gtest may correctly handle this case automatically.
Noah Watkins [Sun, 29 Dec 2013 18:32:22 +0000 (10:32 -0800)]
c++11: fix std::lock naming conflicts
Unfortunately, 'using namespace std;' is in pretty widespread use in the Ceph
tree, so variables named 'lock' conflict with C++11's std::lock and need to be
renamed.
Example error output:
test/streamtest.cc:37:19: error: reference to 'lock' is ambiguous
Mutex::Locker l(lock);
^
test/streamtest.cc:32:7: note: candidate found by name lookup is 'lock'
Mutex lock("streamtest.cc lock");
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/mutex:346:1: note: candidate found by name lookup is 'std::__1::lock'
lock(_L0& __l0, _L1& __l1)
Noah Watkins [Sun, 29 Dec 2013 21:10:02 +0000 (13:10 -0800)]
kvstore: only build on linux
There are several non-standard errno values used. There is still work to
do on addressing errno portability in Ceph, and this disables kvstore on
non-Linux platforms until that work is complete.
Adds a ceph_spinlock_t implementation that will use pthread_spinlock_t
if available, and otherwise falls back to pthread_mutex_t. Note that this
spinlock is not intended to be used in process-shared memory.
Switches implementation in:
ceph_context
SimpleMessenger
atomic_t
Only ceph_context initialized its spinlock with PTHREAD_PROCESS_SHARED.
However, there does not appear to be any instance in which CephContext
is allocated in shared memory, so it can use the default process-private
behavior.
Haomai Wang [Sat, 28 Dec 2013 09:57:43 +0000 (17:57 +0800)]
Fix rbd bench-write improper behavior
"rbd bench-write" issues all write operations at the same offset at the same
time. This produces unrepresentative performance results from this command.
Fixes: #7066
Co-Author: Rongze Zhu <rongze@unitedstack.com>
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
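One possible way to avoid concurrent writes landing on the same offset is sketched below. This is an assumption about the shape of such a fix, not the code from the commit; the function name and parameters are hypothetical.

```python
def write_offsets(io_size, io_threads, image_size):
    """Give each concurrently issued write its own starting offset.

    Offsets are spaced io_size apart and wrap at image_size, so the
    in-flight writes are distinct as long as the image is large enough.
    """
    return [(i * io_size) % image_size for i in range(io_threads)]


if __name__ == "__main__":
    print(write_offsets(4096, 4, 1 << 20))
```

With 4 writes of 4096 bytes in flight on a 1 MiB image, the offsets are 0, 4096, 8192 and 12288 rather than four copies of the same value.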
Ilya Dryomov [Fri, 27 Dec 2013 17:40:59 +0000 (19:40 +0200)]
rbd: expose options available to rbd map
Add a -o / --options option, which allows users to specify
rbd-specific and generic ceph client and osd options available at
mapping time in a comma-separated list (similar to mount(8) mount
options).
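A comma-separated option list in the style of mount(8) can be split as sketched below. The function name and the option names used in the example ("flag", "key") are placeholders, not actual rbd map options.

```python
def parse_map_options(spec):
    """Split 'a,b=1' into {'a': True, 'b': '1'}.

    Bare names become boolean flags; name=value pairs keep the value
    as a string. Empty items (e.g. from a trailing comma) are ignored.
    """
    opts = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


if __name__ == "__main__":
    # Placeholder option names, for illustration only.
    print(parse_map_options("flag,key=1"))
```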
Josh Durgin [Fri, 27 Dec 2013 01:38:52 +0000 (17:38 -0800)]
librbd: call user completion after incrementing perfcounters
The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.
The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.
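The ordering can be modelled in a few lines. The class names follow the commit message, but the methods and fields are hypothetical and the model is single-threaded; it only shows why the perfcounter update must precede the user callback.

```python
class ImageCtx:
    """Stand-in for an open image holding perfcounters (illustrative)."""

    def __init__(self):
        self.is_open = True
        self.completed_ops = 0

    def inc_perfcounter(self):
        if not self.is_open:
            raise RuntimeError("perfcounters accessed after close")
        self.completed_ops += 1

    def close(self):
        self.is_open = False


class AioCompletion:
    def __init__(self, ictx, user_callback):
        self.ictx = ictx
        self.user_callback = user_callback
        self.released = False

    def complete(self):
        # Last access to the image context.
        self.ictx.inc_perfcounter()
        # The user may close the image from inside the callback.
        self.user_callback()
        # Remaining cleanup only touches the completion itself.
        self.released = True


if __name__ == "__main__":
    ictx = ImageCtx()
    AioCompletion(ictx, ictx.close).complete()
    print(ictx.completed_ops, ictx.is_open)
```

If the two first steps of complete() were swapped, a callback that closes the image would make the counter update fail in this model.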
Loic Dachary [Thu, 26 Dec 2013 11:23:50 +0000 (12:23 +0100)]
osd: create default ruleset for erasure pools
The ruleset --osd_pool_default_crush_erasure_ruleset is created to be
suitable for erasure coded pools when OSDMap::build_simple is required
to build the default OSD map of a new cluster.
Loic Dachary [Thu, 26 Dec 2013 10:20:41 +0000 (11:20 +0100)]
osd: use CrushWrapper::add_simple_ruleset
Replace the manually crafted ruleset in OSDMap::build_simple_crush_map*
with calls to add_simple_ruleset. The generated rulesets do not have the
same behavior, but that presumably does not cause any backward
compatibility problem because they are only created when a new cluster
is being initialized.
The prototypes of OSDMap::build_simple* are modified to allow for a
return code and display of a human readable error message.
The --osd-min-rep and --osd-max-rep configuration options are removed:
they were only used in the code that was removed.
Loic Dachary [Wed, 25 Dec 2013 12:19:56 +0000 (13:19 +0100)]
osd: build_simple creates a single rule
The three rules created by build_simple are identical. They are replaced
by a single rule named replicated_rule which is set to be used by the
data, rbd and metadata pools.
Instead of hardcoding the ruleset number to zero, it is read from
osd_pool_default_crush_ruleset which defaults to zero.
The CEPH_DEFAULT_CRUSH_REPLICATED_RULESET enum is moved from osd_type.h to
config.h because it may be needed when osd_type.h is not included.
Loic Dachary [Thu, 26 Dec 2013 23:10:55 +0000 (00:10 +0100)]
crush: set min_rep and max_rep depending on mode
This assumes firstn is for replicated pools and indep is for erasure
coded pools. This is a strong constraint but it is unlikely to make the
resulting ruleset unfit to be used in most cases.
Loic Dachary [Thu, 26 Dec 2013 07:39:52 +0000 (08:39 +0100)]
qa: remove osd pool create erasure tests
Creating an erasure pool will crash the OSD because OSD::_make_pg
asserts if the type is not replicated. The tests related to erasure
coded pool creation are removed from qa/workunits/cephtool/test.sh.
The osd-create-pool.sh unit test covers the cases removed from test.sh
more extensively. The intent is to check the interactions with the MON
only, therefore it does not run an OSD and the absence of erasure code
placement group backend implementation is not an issue.
Loic Dachary [Wed, 25 Dec 2013 12:30:34 +0000 (13:30 +0100)]
mon: MDS data and metadata pool numbers are hardcoded
The MDS assumes pools 0 and 1 are suitable for data and metadata
respectively. Instead of relying on the CEPH_DATA_RULE and
CEPH_METADATA_RULE constants that only match by chance, set a hardcoded
value specific to MDS to reduce the fragility of the hardcoded
assumption.
Haomai Wang [Thu, 26 Dec 2013 03:20:52 +0000 (11:20 +0800)]
Fix WBThrottle thread disappear problem
The new ceph_osd.cc code does the ObjectStore init work before
global_init_daemonize(), and the WBThrottle thread is created when the
ObjectStore is constructed. So after daemon(), the WBThrottle thread does
not exist in the new process, which results in a deadlock.
When "cur_ios", a member of WBThrottle, hits the hard limit, there are two
ways to decrease it. The first is the WBThrottle thread, which is dead after
daemonizing; the other is SyncThread. SyncThread blocks at op_tp.pause()
because the threads in op_tp (the thread pool) are blocked at
wbthrottle.throttle(FileStore::doop). So no thread can continue processing
jobs in the filestore layer and all threads are waiting.
Ilya Dryomov [Wed, 25 Dec 2013 19:41:16 +0000 (21:41 +0200)]
ceph_argparse: kill _daemon versions of argparse calls
Commit c76bbc2e6df1, which introduced _daemon versions of some of the
argparse calls, also changed the behaviour of non-_daemon versions.
The change resulted in incorrect error messages, e.g.
$ ./rbd create b0 --size
Option --size requires an argument.
The users of _daemon versions were added in commit be801f6c506d and
removed in commit f26bd55e57f1, so just kill the _daemon versions and
restore the old behaviour. (This effectively reverts commit c76bbc2e6df1.)