Loic Dachary [Mon, 23 Dec 2013 20:44:38 +0000 (21:44 +0100)]
vstart/stop: do not loop forever on kill
It may be the case that stop.sh can't stop a process for reasons
unrelated to vstart.sh, for instance because apache runs independently.
Instead of trying forever, try twice in a row (which should be enough
in 99% of cases), then try three more times, sleeping one second
between tries, which should be more than enough.
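The bounded retry policy described above could be sketched as follows.
This is an illustration in Python, not the stop.sh code: the name
stop_process, its parameters and the choice of SIGTERM are assumptions.

```python
import os
import signal
import time


def stop_process(pid, kill=os.kill, sleep=time.sleep,
                 fast_tries=2, slow_tries=3, delay=1.0):
    """Signal pid a bounded number of times; return True once it is gone.

    Hypothetical helper: the first `fast_tries` attempts run back to back,
    each of the following `slow_tries` attempts is preceded by a sleep.
    """
    for attempt in range(fast_tries + slow_tries):
        if attempt >= fast_tries:
            sleep(delay)
        try:
            kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            return True   # no such process: it has stopped
        except PermissionError:
            return False  # not ours to kill, retrying will not help
    return False          # give up instead of looping forever
```

The kill and sleep callables are injectable only so the policy can be
exercised without touching real processes.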
Ilya Dryomov [Mon, 23 Dec 2013 16:12:56 +0000 (18:12 +0200)]
crush: use kernel-doc consistently
kernel-doc syntax is "@arg: desc", not "@param arg desc". In addition,
these comments are usually placed around function definitions instead
of function declarations. Follow these guidelines to shrink the diff.
Ilya Dryomov [Mon, 23 Dec 2013 16:12:56 +0000 (18:12 +0200)]
crush/mapper: unsigned -> unsigned int
Kernel implementation is located in net/, and use of "unsigned int" is
preferred to bare "unsigned" in net tree (as proven by several net/
cleanups). Follow this guideline to shrink the diff.
Loic Dachary [Mon, 23 Dec 2013 12:10:18 +0000 (13:10 +0100)]
mon: use kill instead of pkill in osd-pool-create
The --pidfile option of pkill is not supported by all versions. Use kill
instead for compatibility. Instead of looping on ":", loop on "sleep 1"
so that an infinite loop is slower at filling the disk.
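Reading the pid from the pidfile and signalling it directly, which is
what replaces pkill --pidfile, could look like the sketch below. It is
written in Python for illustration only; the helper name
kill_from_pidfile and its parameters are assumptions, and the actual
test script is a shell script.

```python
import os
import signal


def kill_from_pidfile(path, sig=signal.SIGTERM, kill=os.kill):
    """Read a pid from `path` and send it `sig`; return the pid.

    Hypothetical helper: assumes the pidfile holds a single integer.
    """
    with open(path) as pidfile:
        pid = int(pidfile.read().strip())
    kill(pid, sig)
    return pid
```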
We are relying on connection features to track OSD supported
features. However, we were not forwarding connection features
when we forwarded a message from a peon to the leader. That
was breaking the OSD feature tracking.
Fixes: 7051 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Loic Dachary [Fri, 20 Dec 2013 19:39:21 +0000 (20:39 +0100)]
mon: unit test for osd pool create
It is inconvenient to run such tests in the
qa/workunits/cephtool/test.sh because they require that the mon is
restarted to test errors in the format of the default erasure code
properties and check the appropriate error message is output.
osd-pool-create.sh runs a single mon from sources using command
line options and a temporary directory, the same way vstart.sh does but
lightweight.
Loic Dachary [Sun, 22 Dec 2013 22:37:08 +0000 (23:37 +0100)]
mon: erasure code pool properties defaults
If no properties are set when creating an erasure coded pool, default to
using the jerasure plugin with the cauchy_good technique which is the
fastest.
The defaults are set with osd_pool_default_erasure_code_properties.
The erasure code plugins are loaded from the directory specified in the
erasure-code-directory property. Contrary to the other properties it
will most commonly be the same throughout the cluster. The default is
set to /usr/lib/ceph/erasure-code with
osd_pool_default_erasure_code_directory
Loic Dachary [Sat, 21 Dec 2013 12:58:44 +0000 (13:58 +0100)]
common: implement get_str_map to parse key/values
It is capable of parsing json or key=value pairs. The prototype is made
to look like get_str_list. The implementation is in common + include and
uses .h. It will probably be moved to common and use .hpp instead, along
with str_list.{cc,h}.
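The two accepted input forms can be illustrated with a small Python
sketch. It is not the C++ implementation: the name parse_str_map, the
whitespace separator, and the handling of bare keys and of JSON input
that is not an object are assumptions made for the example.

```python
import json


def parse_str_map(text):
    """Parse a JSON object or whitespace-separated key=value pairs.

    Hypothetical helper: values are always returned as strings.
    """
    try:
        parsed = json.loads(text)
    except ValueError:
        parsed = None  # not JSON, fall back to key=value parsing
    if isinstance(parsed, dict):
        return {str(key): str(value) for key, value in parsed.items()}
    result = {}
    for token in text.split():
        key, _, value = token.partition("=")
        result[key] = value  # assumption: a bare key maps to ""
    return result
```

Both parse_str_map('{"k": "2", "m": "1"}') and parse_str_map("k=2 m=1")
return the same dictionary.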
Loic Dachary [Sat, 21 Dec 2013 14:49:19 +0000 (15:49 +0100)]
mon: osd create pool must fail on incompatible type
When osd create pool is called twice on the same pool, it will succeed
because the pool already exists. However, if a different type is
specified, it must fail.
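The intended behaviour can be modelled with a short Python sketch. The
function name create_pool, the dictionary standing in for the pool
table, and the -EINVAL return value are assumptions for illustration,
not the monitor code.

```python
import errno


def create_pool(pools, name, pool_type):
    """Create `name` in the `pools` dict; return 0 or a negative errno.

    Hypothetical model: creating an existing pool again succeeds only
    when the requested type matches the existing one.
    """
    existing_type = pools.get(name)
    if existing_type is None:
        pools[name] = pool_type
        return 0
    if existing_type != pool_type:
        return -errno.EINVAL  # assumption: the real error code may differ
    return 0  # same name, same type: nothing to do
```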
Loic Dachary [Fri, 20 Dec 2013 16:05:45 +0000 (17:05 +0100)]
packaging: erasure-code plugins go in /usr/lib/ceph
Install the plugins in /usr/lib/ceph/erasure-code instead of
/usr/lib/erasure-code to comply with FHS : "Applications may use a
single subdirectory under /usr/lib."
Loic Dachary [Sun, 22 Dec 2013 17:26:42 +0000 (18:26 +0100)]
mon: s/rep/replicated/ in pool create prototype
The test is updated to remove unnecessary asserts. Since all combinations
of properties and pool type are allowed, there is no way to statically
check the validity of the arguments.
Sage Weil [Sun, 22 Dec 2013 17:00:43 +0000 (09:00 -0800)]
rgw: add -ldl for mongoose
/usr/bin/ld: mongoose/mongoose.o: undefined reference to symbol 'dlsym@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libdl.so.2: error adding symbols: DSO missing from command line
error: collect2: ld returned 1 exit status
Noah Watkins [Sat, 21 Dec 2013 19:08:59 +0000 (13:08 -0600)]
linux_version: build on all platforms
This Linux version check is used in FileJournal to check write
caching behavior. This is a temporary fix that will result in the
failure path and a warning about write caching being turned on until
methods for OSX/FreeBSD/Windows can be found to obtain the same
information.
Noah Watkins [Sat, 21 Dec 2013 19:03:05 +0000 (13:03 -0600)]
make: add libcommon for missing symbols
On OSX without linking in libcommon at the end of these make targets
there is a missing reference to pipe_cloexec, even though the dependency
is present indirectly through libglobal.
valloc conflicts with an existing call, and none of these macros are
actually used in buffer.h. The DARWIN check isn't valid either since
this is an installed header and that depends on acconfig.h
Loic Dachary [Thu, 12 Dec 2013 22:14:02 +0000 (23:14 +0100)]
osd: erasure code benchmark workunit
Display benchmark results for the default erasure code plugins, in a tab
separated CSV file. The first two columns contain the elapsed seconds
and the amount of KB that were coded or decoded, for a given combination
of parameters displayed in the following fields.
seconds  KB  plugin   k  m  work.   iter.  size  eras.
1.2      10  example  2  1  encode  10     1024  0
0.5      10  example  2  1  decode  10     1024  1
It can be used as input for a human readable report. It is also intended
to be used to show if a given version of an erasure code plugin performs
better than another.
The last column (not shown above for brevity) is the exact command
that was run to produce the result so it can be copied and pasted to
reproduce the result or to profile.
Only the jerasure techniques mentioned in
https://www.usenix.org/legacy/events/fast09/tech/full_papers/plank/plank_html/
are benchmarked, the others are assumed to be less interesting.
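As an example of using the output as input for a report, the Python
sketch below reads the tab separated rows and computes KB per second.
The function name throughput is an assumption, and the column names are
taken from the header shown above; the workunit itself does not include
this code.

```python
import csv
import io

# Column names as they appear in the header row of the benchmark output.
COLUMNS = ["seconds", "KB", "plugin", "k", "m", "work.", "iter.", "size", "eras."]


def throughput(tsv_text):
    """Return a list of (plugin, workload, KB per second), one per data row."""
    results = []
    for fields in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not fields or fields[0] == "seconds":
            continue  # skip blank lines and the header row
        # zip stops at the shortest input, so a trailing command column is ignored
        row = dict(zip(COLUMNS, fields))
        rate = float(row["KB"]) / float(row["seconds"])
        results.append((row["plugin"], row["work."], rate))
    return results
```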
Loic Dachary [Fri, 13 Dec 2013 23:41:03 +0000 (00:41 +0100)]
osd: set erasure code packet size default to 2048
As shown in
https://www.usenix.org/legacy/events/fast09/tech/full_papers/plank/plank_html/
under "Impact of the Packet Size", the optimal packet size is on the
order of 1k rather than the current default of 8. Benchmarks are
required to find the actual optimum.
Loic Dachary [Fri, 13 Dec 2013 13:07:37 +0000 (14:07 +0100)]
osd: better performances for the erasure code example
The XOR based example is ten times slower than it could be because it uses
the buffer::ptr[] operator. Use a temporary char * instead. It performs
as well as jerasure Reed Solomon when decoding with a single erasure:
Loic Dachary [Thu, 12 Dec 2013 13:03:26 +0000 (14:03 +0100)]
osd: conditionally disable dlclose of erasure code plugins
When profiling, tools such as valgrind --tool=callgrind require that the
dynamically loaded libraries are not dlclosed so they can collect usage
information.
The public ErasureCodePluginRegistry::disable_dlclose boolean is introduced
for this purpose.
David Zafman [Thu, 19 Dec 2013 22:37:28 +0000 (14:37 -0800)]
osd: Fix assert which doesn't apply when compat_mode on
Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit edaec9a8361396bd4c12814c16610669694b5b6c)
David Zafman [Tue, 17 Dec 2013 06:08:07 +0000 (22:08 -0800)]
Add backward compatible acting set until all OSDs updated
Add configuration variable to override compatible acting set handling.
Later we'll check the osdmap that all OSDs are updated to use new acting sets.
Fixes: #6990 Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 19cff890eb6083eefdb7b709773313b2c8acbcea)
Sage Weil [Thu, 19 Dec 2013 21:12:20 +0000 (13:12 -0800)]
osd/ReplicatedPG: drop RepGather::ondone callback
We kick the blocked contexts in the completion path of process_copy_chunk(),
after we have taken the RWWRITE obc lock. There is no need to delay the
unblocking until the RepGather finishes.
This also fixes a leak: the ondone wasn't getting cleaned up if a peering
interval change happens and the repgather is applied early in on_change().
Sage Weil [Sat, 14 Dec 2013 00:39:02 +0000 (16:39 -0800)]
osd/ReplicatedPG: EBUSY on cache-evict when watchers are present
Linger operations will follow the object to the cache pool when the pool
overlay process is set. If we evict the object, the object_info_t will
go away along with the watch state and confusing things will happen.
Prevent that from happening by returning EBUSY when you try to evict a
watched object.
Note that you *can* flush a watched object, and the dirty flag will be
cleared. But you still can't evict it.
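The rule above can be modelled with a small Python sketch. The function
names cache_evict and cache_flush and the dictionary layout are
assumptions for illustration; they do not mirror the ReplicatedPG code.

```python
import errno


def cache_evict(cache, name):
    """Evict `name` from the `cache` dict unless it is being watched."""
    obj = cache[name]
    if obj["watchers"]:
        return -errno.EBUSY  # evicting would drop the watch state
    del cache[name]
    return 0


def cache_flush(cache, name):
    """Flushing a watched object is allowed; it only clears the dirty flag."""
    cache[name]["dirty"] = False
    return 0
```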
Sage Weil [Fri, 13 Dec 2013 21:41:58 +0000 (13:41 -0800)]
osd/ReplicatedPG: fix locking for promote
After we get the copy-from data and unblock the obc, we still need to take
the RWWRITE lock on the object for the duration of the repop while we
actually apply the change locally.
Sage Weil [Thu, 12 Dec 2013 23:40:41 +0000 (15:40 -0800)]
osd/ReplicatedPG: fix user_version preservation for copy_from
In the process of fixing this for flush, we break promote, so we need to
adjust them both here. Basic strategy: do not set user_modify, but handle
the user_version explicitly in the callbacks.
For copy_from, we don't have a clean way to pass the result through to
finish_copyfrom in do_osd_ops; do so by putting it in user_at_version. (If
we were to call finish_copyfrom directly from the callback this might
be simpler, but let's not go there right now.)