Ilya Dryomov [Fri, 27 Dec 2013 17:40:59 +0000 (19:40 +0200)]
rbd: expose options available to rbd map
Add a -o / --options option, which would allow users to specify
rbd-specific and generic ceph client and osd options available at
mapping time in a comma-separated list (similar to mount(8) mount
options).
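
As a rough illustration of the mount(8)-style syntax described above, here is a minimal Python sketch that splits a comma-separated option string into flags and key=value pairs. It is not the rbd implementation; the function name parse_map_options and the option names some_flag and some_key are placeholders invented for the example.

```python
def parse_map_options(spec):
    """Split a mount(8)-style option string into a dict.

    Bare flags map to True; key=value entries map to the value string.
    Hypothetical helper, not part of rbd.
    """
    options = {}
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


if __name__ == "__main__":
    # Placeholder option names, for illustration only.
    print(parse_map_options("some_flag,some_key=10"))
```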
Josh Durgin [Fri, 27 Dec 2013 01:38:52 +0000 (17:38 -0800)]
librbd: call user completion after incrementing perfcounters
The perfcounters (and the ictx) are only valid while the image is
still open. If the librbd user gets the callback for its last I/O,
then closes the image, the ictx and its perfcounters will be
invalid. If the AioCompletion object has not run the rest of its
complete() method yet, it will access these now-invalid addresses,
possibly leading to a crash.
The AioCompletion object is independent of the ictx and does not
access it again after incrementing perfcounters, so avoid this race by
calling the user's callback after this step. The AioCompletion object
will be cleaned up by the rest of complete_request(), independent of
the ImageCtx.
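
The ordering argument above can be illustrated with a small single-threaded Python stand-in. This is only a sketch under simplifying assumptions: the Image class and complete() function are hypothetical and do not mirror librbd's types, and the real race involves separate threads.

```python
class Image:
    """Hypothetical stand-in for an open image and its perf counters."""

    def __init__(self):
        self.is_open = True
        self.completed_ops = 0

    def count_completion(self):
        # Counters are only valid while the image is open.
        if not self.is_open:
            raise RuntimeError("perf counters used after close")
        self.completed_ops += 1

    def close(self):
        self.is_open = False


def complete(image, user_callback):
    """Touch image state first, then hand control to the user."""
    image.count_completion()  # last access to image state
    user_callback()           # the user may close the image here


if __name__ == "__main__":
    img = Image()
    complete(img, img.close)  # safe: counters were updated before close
    print(img.completed_ops, img.is_open)
```

Swapping the two lines in complete() would raise in this sketch, which is the single-threaded analogue of the use-after-close described in the commit message.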
Loic Dachary [Thu, 26 Dec 2013 11:23:50 +0000 (12:23 +0100)]
osd: create default ruleset for erasure pools
The ruleset --osd_pool_default_crush_erasure_ruleset is created to be
suitable for erasure coded pools when OSDMap::build_simple is required
to build the default OSD map of a new cluster.
Loic Dachary [Thu, 26 Dec 2013 10:20:41 +0000 (11:20 +0100)]
osd: use CrushWrapper::add_simple_ruleset
Replace the manually crafted ruleset in OSDMap::build_simple_crush_map*
with calls to add_simple_ruleset. The generated rulesets do not have the
same behavior, but that presumably does not cause any backward
compatibility problem because they are only created when a new cluster
is being initialized.
The prototypes of OSDMap::build_simple* are modified to allow for a
return code and display of a human readable error message.
The --osd-min-rep and --osd-max-rep configuration options are removed:
they were only used in the code that was removed.
Loic Dachary [Wed, 25 Dec 2013 12:19:56 +0000 (13:19 +0100)]
osd: build_simple creates a single rule
The three rules created by build_simple are identical. They are replaced
by a single rule named replicated_rule which is set to be used by the
data, rbd and metadata pools.
Instead of hardcoding the ruleset number to zero, it is read from
osd_pool_default_crush_ruleset which defaults to zero.
The CEPH_DEFAULT_CRUSH_REPLICATED_RULESET enum is moved from osd_type.h to
config.h because it may be needed when osd_type.h is not included.
Loic Dachary [Thu, 26 Dec 2013 23:10:55 +0000 (00:10 +0100)]
crush: set min_rep and max_rep depending on mode
Assume that firstn is for replica and indep is for erasure. This is a
strong constraint but it is unlikely to make the resulting ruleset unfit
to be used in most cases.
Loic Dachary [Thu, 26 Dec 2013 07:39:52 +0000 (08:39 +0100)]
qa: remove osd pool create erasure tests
Creating an erasure pool will crash the OSD because OSD::_make_pg
asserts if the type is not replicated. The tests related to erasure
coded pool creation are removed from qa/workunits/cephtool/test.sh.
The osd-create-pool.sh unit test covers the cases removed from test.sh
more extensively. The intent is to check the interactions with the MON
only, therefore it does not run an OSD and the absence of erasure code
placement group backend implementation is not an issue.
Loic Dachary [Wed, 25 Dec 2013 12:30:34 +0000 (13:30 +0100)]
mon: MDS data and metadata pool numbers are hardcoded
The MDS assumes pool 0 and 1 are suitable for data and metadata
respectively. Instead of relying on the CEPH_DATA_RULE and
CEPH_METADATA_RULE constants that only match by chance, set a hardcoded
value specific to MDS to reduce the fragility of the hardcoded
assumption.
Haomai Wang [Thu, 26 Dec 2013 03:20:52 +0000 (11:20 +0800)]
Fix WBThrottle thread disappear problem
The new ceph_osd.cc code does the ObjectStore init work before
global_init_daemonize(), and the WBThrottle thread is created when the
ObjectStore is constructed. So after daemon(), the WBThrottle thread does not
exist in the new process, which results in a deadlock.
When "cur_ios", which is a member of WBThrottle, hits the hard limit, there are
two ways to decrease it. The first is the WBThrottle thread, which is dead
after daemonizing; the other is SyncThread. SyncThread will block at
op_tp.pause() because the threads in op_tp (the thread pool) are blocked at
wbthrottle.throttle(FileStore::doop). So no thread continues to process jobs
in the FileStore layer and all threads are waiting.
Ilya Dryomov [Wed, 25 Dec 2013 19:41:16 +0000 (21:41 +0200)]
ceph_argparse: kill _daemon versions of argparse calls
Commit c76bbc2e6df1, which introduced _daemon versions of some of the
argparse calls, also changed the behaviour of non-_daemon versions.
The change resulted in incorrect error messages, e.g.
$ ./rbd create b0 --size
Option --size requires an argument.
The users of _daemon versions were added in commit be801f6c506d and
removed in commit f26bd55e57f1, so just kill the _daemon versions and
restore the old behaviour. (This effectively reverts commit c76bbc2e6df1.)
Sage Weil [Mon, 23 Dec 2013 21:14:43 +0000 (13:14 -0800)]
librados: mark old get_version() as deprecated
Use the newly-discovered (for me) deprecated attribute to mark the old
get_version() method and point users toward get_version64(). And fix a
couple of users in the kvstore code!
Loic Dachary [Mon, 23 Dec 2013 20:44:38 +0000 (21:44 +0100)]
vstart/stop: do not loop forever on kill
stop.sh may be unable to stop a process for reasons unrelated to
vstart.sh, for instance because apache runs independently. Instead of
trying forever, try twice in a row (which should be enough 99% of the
time), then try three more times, sleeping one second between each try,
which should be more than enough.
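
The bounded retry policy described above (two immediate attempts, then three more with a one-second pause before each) can be sketched as follows. This is an illustrative Python version, not the shell code in stop.sh; stop_with_retries and its parameters are hypothetical names.

```python
import time


def stop_with_retries(try_stop, quick_tries=2, slow_tries=3,
                      delay=1.0, sleep=time.sleep):
    """Call try_stop() a bounded number of times.

    try_stop is any callable returning True once the process is gone.
    Returns True on success, False if every attempt failed.
    """
    for _ in range(quick_tries):
        if try_stop():
            return True
    for _ in range(slow_tries):
        sleep(delay)  # give the process time to exit
        if try_stop():
            return True
    return False  # give up instead of looping forever
```

The sleep parameter is injectable so the policy can be exercised without actually waiting.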
Ilya Dryomov [Mon, 23 Dec 2013 16:12:56 +0000 (18:12 +0200)]
crush: use kernel-doc consistently
kernel-doc syntax is "@arg: desc", not "@param arg desc". In addition,
these comments are usually placed around function definitions instead
of function declarations. Follow these guidelines to shrink the diff.
Ilya Dryomov [Mon, 23 Dec 2013 16:12:56 +0000 (18:12 +0200)]
crush/mapper: unsigned -> unsigned int
Kernel implementation is located in net/, and use of "unsigned int" is
preferred to bare "unsigned" in net tree (as proven by several net/
cleanups). Follow this guideline to shrink the diff.
Loic Dachary [Mon, 23 Dec 2013 12:10:18 +0000 (13:10 +0100)]
mon: use kill instead of pkill in osd-pool-create
The --pidfile option of pkill is not supported by all versions. Use kill
instead for compatibility. Instead of looping on ":", loop on "sleep 1" so
that an infinite loop is slower at filling the disk.
We are relying on connection features to track OSD supported
features. However, we were not forwarding connection features
when we forwarded a message from a peon to the leader. That
was breaking the OSD feature tracking.
Fixes: 7051
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Loic Dachary [Fri, 20 Dec 2013 19:39:21 +0000 (20:39 +0100)]
mon: unit test for osd pool create
It is inconvenient to run such tests in the
qa/workunits/cephtool/test.sh because they require that the mon is
restarted to test errors in the format of the default erasure code
properties and check the appropriate error message is output.
osd-pool-create.sh runs a single mon from sources using command
line options and a temporary directory, the same way vstart.sh does, only
more lightweight.
Loic Dachary [Sun, 22 Dec 2013 22:37:08 +0000 (23:37 +0100)]
mon: erasure code pool properties defaults
If no properties are set when creating an erasure coded pool, default to
using the jerasure plugin with the cauchy_good technique, which is the
fastest.
The defaults are set with osd_pool_default_erasure_code_properties.
The erasure code plugins are loaded from the directory specified in the
erasure-code-directory property. Unlike the other properties, it
will most commonly be the same throughout the cluster. The default is
set to /usr/lib/ceph/erasure-code with
osd_pool_default_erasure_code_directory.
Loic Dachary [Sat, 21 Dec 2013 12:58:44 +0000 (13:58 +0100)]
common: implement get_str_map to parse key/values
It is capable of parsing json or key=value pairs. The prototype is made
to look like get_str_list. The implementation is in common + include and
uses .h. It will probably be moved to common and use .hpp instead, along
with str_list.{cc,h}.
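
To make the "json or key=value pairs" behavior concrete, here is a minimal Python sketch of such a parser. It is not the C++ get_str_map; it assumes that input starting with "{" is a JSON object, that plain-text pairs are separated by whitespace, and that a key without "=" maps to an empty string. All of these are assumptions made for the example.

```python
import json


def get_str_map(text):
    """Parse a JSON object or whitespace-separated key=value pairs
    into a dict of strings (illustrative sketch only)."""
    text = text.strip()
    if text.startswith("{"):
        parsed = json.loads(text)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
        # Keep string values as-is; render other JSON values as text.
        return {k: v if isinstance(v, str) else json.dumps(v)
                for k, v in parsed.items()}
    result = {}
    for token in text.split():
        key, _, value = token.partition("=")
        result[key] = value  # a bare key maps to ""
    return result
```

For example, get_str_map("plugin=example k=2") and get_str_map('{"plugin": "example", "k": "2"}') both yield the same dictionary in this sketch.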
Loic Dachary [Sat, 21 Dec 2013 14:49:19 +0000 (15:49 +0100)]
mon: osd create pool must fail on incompatible type
When osd create pool is called twice on the same pool, it will succeed
because the pool already exists. However, if a different type is
specified, it must fail.
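
The rule above (repeating a create is harmless, but repeating it with a different type is an error) can be sketched in Python. The names create_pool and PoolTypeMismatch and the in-memory dict are hypothetical; the monitor's actual code and error reporting differ.

```python
class PoolTypeMismatch(Exception):
    """Raised when a pool exists with a different type (hypothetical)."""


def create_pool(pools, name, pool_type):
    """Create a pool in the pools dict, idempotently.

    Returns "created" for a new pool and "exists" when the pool is
    already present with the same type.
    """
    existing = pools.get(name)
    if existing is None:
        pools[name] = pool_type
        return "created"
    if existing != pool_type:
        raise PoolTypeMismatch(
            "pool %r already exists with type %r, not %r"
            % (name, existing, pool_type))
    return "exists"
```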
Loic Dachary [Fri, 20 Dec 2013 16:05:45 +0000 (17:05 +0100)]
packaging: erasure-code plugins go in /usr/lib/ceph
Install the plugins in /usr/lib/ceph/erasure-code instead of
/usr/lib/erasure-code to comply with FHS: "Applications may use a
single subdirectory under /usr/lib."
Loic Dachary [Sun, 22 Dec 2013 17:26:42 +0000 (18:26 +0100)]
mon: s/rep/replicated/ in pool create prototype
The test is updated to remove unnecessary asserts. Since all combinations
of properties and pool type are allowed, there is no way to statically
check the validity of the arguments.
Sage Weil [Sun, 22 Dec 2013 17:00:43 +0000 (09:00 -0800)]
rgw: add -ldl for mongoose
/usr/bin/ld: mongoose/mongoose.o: undefined reference to symbol 'dlsym@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libdl.so.2: error adding symbols: DSO missing from command line
error: collect2: ld returned 1 exit status