We are relying on connection features to track OSD supported
features. However, we were not forwarding connection features
when we forwarded a message from a peon to the leader. That
was breaking the OSD feature tracking.
Fixes: 7051
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Loic Dachary [Fri, 20 Dec 2013 19:39:21 +0000 (20:39 +0100)]
mon: unit test for osd pool create
It is inconvenient to run such tests in
qa/workunits/cephtool/test.sh because they require restarting the mon
to test errors in the format of the default erasure code properties and
to check that the appropriate error message is output.
osd-pool-create.sh runs a single mon from sources using command
line options and a temporary directory, the same way vstart.sh does but
in a more lightweight way.
Loic Dachary [Sun, 22 Dec 2013 22:37:08 +0000 (23:37 +0100)]
mon: erasure code pool properties defaults
If no properties are set when creating an erasure coded pool, default to
using the jerasure plugin with the cauchy_good technique which is the
fastest.
The defaults are set with osd_pool_default_erasure_code_properties.
The erasure code plugins are loaded from the directory specified in the
erasure-code-directory property. Contrary to the other properties it
will most commonly be the same throughout the cluster. The default is
set to /usr/lib/ceph/erasure-code with
osd_pool_default_erasure_code_directory.
Loic Dachary [Sat, 21 Dec 2013 12:58:44 +0000 (13:58 +0100)]
common: implement get_str_map to parse key/values
It can parse either JSON or key=value pairs. The prototype is modeled
on get_str_list. The implementation lives in common + include and
uses .h. It will probably be moved to common and use .hpp instead, along
with str_list.{cc,h}.
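As a rough illustration of the two accepted input forms, here is a minimal Python sketch. The function name parse_str_map and its handling of a key without a value are assumptions made for this example; it is not the get_str_map implementation.

```python
import json


def parse_str_map(text):
    """Parse a JSON object or whitespace-separated key=value pairs.

    Hypothetical helper for illustration only; not Ceph's get_str_map.
    All keys and values are returned as strings.
    """
    text = text.strip()
    if text.startswith("{"):
        # JSON form: {"k": "2", "m": "1"}
        data = json.loads(text)
        return {str(key): str(value) for key, value in data.items()}
    result = {}
    for token in text.split():
        # key=value form; a bare key maps to an empty string (assumption)
        key, _, value = token.partition("=")
        result[key] = value
    return result
```

Both forms yield the same map, so a caller does not need to know which one the user supplied.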
Loic Dachary [Sat, 21 Dec 2013 14:49:19 +0000 (15:49 +0100)]
mon: osd create pool must fail on incompatible type
When osd create pool is called twice on the same pool, it will succeed
because the pool already exists. However, if a different type is
specified, it must fail.
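The intended behavior can be sketched in Python as follows. The create_pool function, the dict standing in for the pool table, and the choice of EINVAL as the error code are assumptions for illustration, not the monitor code.

```python
import errno


def create_pool(pools, name, pool_type):
    """Create a pool, tolerating a repeated identical request.

    `pools` is a plain dict mapping pool name to pool type; it stands in
    for the monitor's pool table (illustrative assumption).
    Returns 0 on success and a negative errno on failure.
    """
    existing = pools.get(name)
    if existing is None:
        pools[name] = pool_type
        return 0
    if existing != pool_type:
        # Same name, different type: must not silently succeed.
        return -errno.EINVAL
    # Same name, same type: the pool already exists, report success.
    return 0
```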
Loic Dachary [Fri, 20 Dec 2013 16:05:45 +0000 (17:05 +0100)]
packaging: erasure-code plugins go in /usr/lib/ceph
Install the plugins in /usr/lib/ceph/erasure-code instead of
/usr/lib/erasure-code to comply with the FHS: "Applications may use a
single subdirectory under /usr/lib."
Loic Dachary [Sun, 22 Dec 2013 17:26:42 +0000 (18:26 +0100)]
mon: s/rep/replicated/ in pool create prototype
The test is updated to remove unnecessary asserts. Since all combinations
of properties and pool type are allowed, there is no way to statically
check the validity of the arguments.
Sage Weil [Sun, 22 Dec 2013 17:00:43 +0000 (09:00 -0800)]
rgw: add -ldl for mongoose
/usr/bin/ld: mongoose/mongoose.o: undefined reference to symbol 'dlsym@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libdl.so.2: error adding symbols: DSO missing from command line
error: collect2: ld returned 1 exit status
Noah Watkins [Sat, 21 Dec 2013 19:08:59 +0000 (13:08 -0600)]
linux_version: build on all platforms
This Linux version check is used in FileJournal to determine write
caching behavior. This is a temporary fix: on other platforms the check
takes the failure path and warns that write caching is turned on, until
methods can be found to obtain the same information on
OSX/FreeBSD/Windows.
Noah Watkins [Sat, 21 Dec 2013 19:03:05 +0000 (13:03 -0600)]
make: add libcommon for missing symbols
On OSX without linking in libcommon at the end of these make targets
there is a missing reference to pipe_cloexec, even though the dependency
is present indirectly through libglobal.
valloc conflicts with an existing call, and none of these macros are
actually used in buffer.h. The DARWIN check isn't valid either since
this is an installed header and that depends on acconfig.h
Loic Dachary [Thu, 12 Dec 2013 22:14:02 +0000 (23:14 +0100)]
osd: erasure code benchmark workunit
Display benchmark results for the default erasure code plugins, in a tab
separated CSV file. The first two columns contain the seconds and the amount of KB
that were coded or decoded, for a given combination of parameters
displayed in the following fields.
seconds  KB  plugin   k  m  work.   iter.  size  eras.
1.2      10  example  2  1  encode  10     1024  0
0.5      10  example  2  1  decode  10     1024  1
It can be used as input for a human readable report. It is also intended
to be used to show if a given version of an erasure code plugin performs
better than another.
The last column (not shown above for brevity) is the exact command
that was run to produce the result so it can be copied and pasted to
reproduce it or to profile.
Only the jerasure techniques mentioned in
https://www.usenix.org/legacy/events/fast09/tech/full_papers/plank/plank_html/
are benchmarked, the others are assumed to be less interesting.
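A report built from this output could start from something like the Python sketch below. The expanded column names (workload, iterations, erasures) are guesses at the abbreviated headers, and the throughput function is hypothetical; only the two sample rows come from the message above.

```python
import csv
import io

# Column names are an assumption based on the abbreviated header shown above.
COLUMNS = ["seconds", "KB", "plugin", "k", "m",
           "workload", "iterations", "size", "erasures"]


def throughput(report):
    """Return (plugin, workload, KB per second) for each tab-separated row."""
    results = []
    for row in csv.reader(io.StringIO(report), delimiter="\t"):
        if not row:
            continue
        # zip() ignores any trailing column, such as the command that was run.
        record = dict(zip(COLUMNS, row))
        kb_per_second = float(record["KB"]) / float(record["seconds"])
        results.append((record["plugin"], record["workload"], kb_per_second))
    return results


# The two example rows from the commit message, tab separated.
SAMPLE = ("1.2\t10\texample\t2\t1\tencode\t10\t1024\t0\n"
          "0.5\t10\texample\t2\t1\tdecode\t10\t1024\t1\n")
```

Calling throughput(SAMPLE) gives one tuple per row, which can then be sorted or compared between plugin versions.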
Loic Dachary [Fri, 13 Dec 2013 23:41:03 +0000 (00:41 +0100)]
osd: set erasure code packet size default to 2048
As shown in
https://www.usenix.org/legacy/events/fast09/tech/full_papers/plank/plank_html/
under "Impact of the Packet Size", the optimal packet size is on the order of
1k rather than the current default of 8. Benchmarks are required to find
the actual optimum.
Loic Dachary [Fri, 13 Dec 2013 13:07:37 +0000 (14:07 +0100)]
osd: better performance for the erasure code example
The XOR based example is ten times slower than it could be because it
uses the buffer::ptr[] operator. Use a temporary char * instead. It performs
as well as jerasure Reed Solomon when decoding with a single erasure.
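For background, the idea behind an XOR based code with two data chunks and one coding chunk can be sketched in Python as below. This is a generic illustration of XOR parity, not the example plugin's code, and the function names are hypothetical.

```python
def xor_encode(chunk_a, chunk_b):
    """Return the parity chunk for two equally sized data chunks."""
    if len(chunk_a) != len(chunk_b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(chunk_a, chunk_b))


def xor_decode(surviving, parity):
    """Recover the single missing data chunk from the other chunk and parity.

    XOR is its own inverse, so decoding reuses the encoding step.
    """
    return xor_encode(surviving, parity)
```

Losing either data chunk is recoverable, since a ^ (a ^ b) == b and b ^ (a ^ b) == a.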
Loic Dachary [Thu, 12 Dec 2013 13:03:26 +0000 (14:03 +0100)]
osd: conditionally disable dlclose of erasure code plugins
When profiling, tools such as valgrind --tool=callgrind require that the
dynamically loaded libraries are not dlclosed so they can collect usage
information.
The public ErasureCodePluginRegistry::disable_dlclose boolean is introduced
for this purpose.
Alexandre Oliva [Thu, 19 Dec 2013 16:09:46 +0000 (08:09 -0800)]
mds: fix Resetter locking
ceph-mds --reset-journal didn't work; it would deadlock waiting for
the osdmap. Comparing the init code in the Dumper (that worked) with
that in the Resetter (that didn't), I noticed the lock had to be
released before waiting for the osdmap.
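The general pattern (release the lock before blocking on something that another thread can only deliver while holding that lock) can be sketched in Python. The names wait_for_map and deliver_map are hypothetical, and this is not the Resetter code, only an illustration of the deadlock and its fix.

```python
import threading


def wait_for_map(lock, map_ready, timeout=1.0):
    """Wait for the map while temporarily giving up the lock.

    The caller must hold `lock`. Waiting with the lock held would block
    deliver_map() forever, which is the deadlock being avoided.
    """
    lock.release()
    try:
        return map_ready.wait(timeout)
    finally:
        # Re-take the lock so the caller's locking assumptions still hold.
        lock.acquire()


def deliver_map(lock, map_ready):
    """Simulates the thread that delivers the map; it needs the lock."""
    with lock:
        map_ready.set()
```

With the release in place, the delivering thread can take the lock, signal the event, and let the waiter proceed.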
Now the resetter works. However, both the resetter and the dumper
fail an assertion after they've performed their task; I didn't look
into it:
../../src/msg/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 7fdc188d27c0 time 2013-12-19 04:48:16.930895
../../src/msg/SimpleMessenger.cc: 230: FAILED assert(!cleared)
ceph version 0.72.1-6-g6bca44e (6bca44ec129d11f1c4f38357db8ae435616f2c7c)
1: (SimpleMessenger::reaper()+0x706) [0x880da6]
2: (SimpleMessenger::wait()+0x36f) [0x88180f]
3: (Resetter::reset()+0x714) [0x56e664]
4: (main()+0x1359) [0x562769]
5: (__libc_start_main()+0xf5) [0x3632e21b45]
6: /l/tmp/build/ceph/build/src/ceph-mds() [0x564e49]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-12-19 04:48:16.934093 7fdc188d27c0 -1 ../../src/msg/SimpleMessenger.cc: In function 'void SimpleMessenger::reaper()' thread 7fdc188d27c0 time 2013-12-19 04:48:16.930895
../../src/msg/SimpleMessenger.cc: 230: FAILED assert(!cleared)
osd: OSD: reflect OSDMap EC flag being set by setting on-disk feature
If OSDMap has the EC feature set, then update our superblock to
reflect as such, making our on-disk format incompatible with previous
OSDs without EC support.
Fixes: 6028
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Add the osd's features to the osd's extra info field in the OSDMap
so we can track which OSDs are able to deal with Erasure Codes.
This will allow us to decide whether or not we are ready to set EC
whenever the user asks us to set EC on a pool -- which shall be
handled by a subsequent commit.
Fixes: 6028
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
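The kind of readiness check this tracking enables could look like the Python sketch below. The FEATURE_EC bit value, the dict of per-OSD feature masks, and the function name are all assumptions for illustration; the actual check is left to the subsequent commit mentioned above.

```python
# Hypothetical feature bit; the real value is defined by Ceph, not here.
FEATURE_EC = 1 << 0


def all_osds_support(osd_features, feature):
    """Return True if every known OSD advertises `feature`.

    `osd_features` maps an OSD id to its feature bitmask, standing in for
    the extra info tracked in the OSDMap (illustrative assumption).
    An empty map is treated as "not ready".
    """
    if not osd_features:
        return False
    return all((mask & feature) == feature for mask in osd_features.values())
```

A single OSD lacking the bit is enough to refuse setting EC on a pool.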
Loic Dachary [Wed, 18 Dec 2013 16:16:08 +0000 (17:16 +0100)]
crush: silence error messages in unit tests
The error messages are intentional when error conditions are
created. They would create false positives in the gitbuilder parser when
the string "error" is found.
The --debug-crush flag is detected to allow the caller to reset the
verbosity level.
Sage Weil [Tue, 17 Dec 2013 17:28:43 +0000 (09:28 -0800)]
mon: warn if crush has non-optimal tunables
Allow warning to be disabled via ceph.conf. Link to the docs from the
warning detail. Add a section to the docs specifically about what to do
about the warning.
Laurent Barbe [Wed, 18 Dec 2013 13:20:24 +0000 (14:20 +0100)]
upstart: add rbdmap script
Upstart script for mapping and unmapping rbd devices based on the /etc/ceph/rbdmap file.
It does not mount or unmount filesystems; that should be handled by the _netdev option in fstab.