Loic Dachary [Thu, 29 Aug 2013 11:31:10 +0000 (13:31 +0200)]
ErasureCodeJerasure: unit test common to all techniques
A typed unit test is defined and must run regardless of the technique.
When a new technique is derived from ErasureCodeJerasure, it is added
to the JerasureTypes typedef and the test will validate that:
* it provides reasonable defaults for the technique specific
parameters
* it modifies the k, m and w to reasonable defaults depending
on the imposed constraints ( for instance Liber8tion requires
that w == 8 but the test sets it to 7 )
* the encoding of K=2, M=2 produces 4 chunks, the first two
of which contain the original buffer data, showing that the
code is systematic
* decoding when all 4 chunks are available indeed retrieves
the original buffer content
* decoding when the two data chunks are missing indeed
retrieves the original buffer content
Loic Dachary [Thu, 29 Aug 2013 10:58:53 +0000 (12:58 +0200)]
ErasureCodeJerasure: base class for jerasure ErasureCodeInterface
The ErasureCodeJerasure class is derived from ErasureCodeInterface and
is meant to be derived to implement each jerasure technique (
Reed-Solomon, Cauchy ... ).
The parameters K ( number of data chunks ), M ( number of coding chunks
) and W ( word size ) are data members common to all techniques. The
technique data member is expected to be set to a string describing the
technique for debugging purposes.
minimum_to_decode_with_cost ignores the cost and calls minimum_to_decode.
minimum_to_decode returns the first K chunks or an error if there are
not enough. Since all codes are systematic, when all chunks are
available returning the first K allows for concatenation and is the best
choice.
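The selection rule above can be sketched as follows. This is a minimal
stand-alone illustration, not the ErasureCodeJerasure code: the function
name, its signature and the -EIO error value are assumptions made for
the example.

```cpp
#include <cassert>
#include <cerrno>
#include <set>

// Hypothetical helper: keep the first k available chunk indices, or fail
// when fewer than k chunks are available. The -EIO return value is an
// assumption; the commit message only says "an error".
int minimum_to_decode_sketch(unsigned k,
                             const std::set<int> &available,
                             std::set<int> *minimum)
{
  assert(minimum != nullptr);
  minimum->clear();
  if (available.size() < k)
    return -EIO;  // not enough chunks to rebuild the object
  // std::set iterates in ascending order, so the data chunks (0..k-1)
  // are picked before any coding chunk whenever they are available.
  std::set<int>::const_iterator it = available.begin();
  for (unsigned i = 0; i < k; ++i, ++it)
    minimum->insert(*it);
  return 0;
}
```

With k=2 and chunks {0, 1, 2, 3} available, the sketch selects {0, 1},
which are the two data chunks and can be concatenated directly.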
The encode method converts bufferlist into char* as expected by the
jerasure functions. The padding of the incoming buffer depends on the
technique and is computed by the pad_in_length method. Encoding is done
with the jerasure_encode method.
The decode method converts the char* returned by the jerasure functions
into bufferlists to be consumed by the caller. The decoding is done by
the jerasure_decode method.
The to_int convenience method is used to convert parameters. The
is_prime convenience method will be used by some techniques to validate
parameters.
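For illustration, the two convenience helpers could look like the sketch
below. The names, signatures and the fallback-to-default behaviour are
assumptions; the commit message does not describe how invalid values are
handled.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper: read an integer parameter, falling back to a
// default when the key is absent or the value is not a clean decimal
// number (assumed behaviour).
int to_int_sketch(const std::string &name,
                  const std::map<std::string, std::string> &parameters,
                  int default_value)
{
  std::map<std::string, std::string>::const_iterator it =
    parameters.find(name);
  if (it == parameters.end() || it->second.empty())
    return default_value;
  char *end = nullptr;
  long value = std::strtol(it->second.c_str(), &end, 10);
  if (*end != '\0')
    return default_value;  // trailing garbage such as "7x"
  return static_cast<int>(value);
}

// Hypothetical helper: primality test by trial division, which is
// sufficient for small parameters such as a word size.
bool is_prime_sketch(int value)
{
  if (value < 2)
    return false;
  for (long long d = 2; d * d <= value; ++d)
    if (value % d == 0)
      return false;
  return true;
}
```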
Immediately after creating an ErasureCodeJerasure derived object, the
init method must be called. It will call the parse method to interpret
the parameters required by the technique and set the k, m and w data
members. The prepare method is expected to compute the matrix ( and
schedule if necessary ) and store it in a data member. The init method
will be called while holding the ErasureCodePluginRegistry mutex. The
encode and decode methods will not be protected by a mutex and may be
called by different threads for the benefit of different placement
groups. They will not have any side effect on the object.
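The init / parse / prepare sequence can be pictured with the sketch
below. All class names, the "k", "m" and "w" parameter keys and the
defaults are hypothetical and only illustrate the call order; the
fixed word size mimics a technique that supports a single value of w.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

typedef std::map<std::string, std::string> ParametersSketch;

// Hypothetical base class: init() runs parse() then prepare(). After
// that, encode/decode style methods would only read the object.
class TechniqueSketch {
public:
  int k, m, w;
  bool prepared;
  TechniqueSketch() : k(0), m(0), w(0), prepared(false) {}
  virtual ~TechniqueSketch() {}

  void init(const ParametersSketch &parameters) {
    parse(parameters);
    prepare();
  }

protected:
  virtual void parse(const ParametersSketch &parameters) = 0;
  virtual void prepare() = 0;

  static int get(const ParametersSketch &p, const std::string &name,
                 int fallback) {
    ParametersSketch::const_iterator it = p.find(name);
    return it == p.end() ? fallback : std::atoi(it->second.c_str());
  }
};

// Hypothetical technique that only supports w == 8.
class FixedWordSketch : public TechniqueSketch {
protected:
  void parse(const ParametersSketch &parameters) override {
    k = get(parameters, "k", 2);
    m = get(parameters, "m", 2);
    w = get(parameters, "w", 8);
    if (w != 8)
      w = 8;  // revert an unsupported word size to the default
  }
  void prepare() override {
    assert(k > 0 && m > 0);
    // A real technique would compute and store its matrix here.
    prepared = true;
  }
};
```

Because all state is written inside init, methods called afterwards can
be const and need no locking of their own in this sketch.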
Loic Dachary [Fri, 23 Aug 2013 20:22:08 +0000 (22:22 +0200)]
ErasureCodeJerasure: import jerasure-1.2A
The files are copied verbatim from
http://web.eecs.utk.edu/~plank/plank/papers/Jerasure-1.2A.tar and a
section is added to the top level COPYING file to reflect the BSD
license.
Loic Dachary [Wed, 28 Aug 2013 15:29:18 +0000 (17:29 +0200)]
ErasureCodePlugin: plugin registry tests and example
libec_example.la is a fully functional plugin based on
ErasureCodeExample to test the ErasureCodePlugin abstract
interface. It is dynamically loaded to test the
ErasureCodePluginRegistry implementation.
Although the plugin is built in the test directory, it will be
installed. noinst_LTLIBRARIES won't build the shared library, only the
static version which is not suitable for testing.
Loic Dachary [Wed, 28 Aug 2013 13:57:54 +0000 (15:57 +0200)]
ErasureCodePlugin: plugin registry
An ErasureCodePluginRegistry singleton holds all erasure plugin objects
derived from ErasureCodePlugin, together with their dlopen(2) handles,
for the lifetime of the OSD. Both are cleaned up by the destructor.
The registry has a single entry point ( method factory ) and should
be used as follows:
If the plugin requested ( "jerasure" in the example above ) is not
found in the *plugins* data member, the load method is called and will:
* dlopen(parameters["erasure-code-directory"] + "jerasure")
* f = dlsym("__erasure_code_init")
* f("jerasure")
* check that it registered "jerasure"
The plugin is expected to do something like
instance.add(plugin_name, new ErasureCodePluginJerasure());
to register itself.
The factory method is protected with a Mutex to avoid race
conditions when using the same plugin from two threads.
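The lookup, load and register sequence can be sketched without any
dynamic loading. In the example below every name is hypothetical, the
Loader callback stands in for the dlopen / dlsym / init call steps, and
factory returns the plugin itself rather than an ErasureCodeInterface
object. It is a simplified model, not the registry implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct PluginSketch {
  std::string name;
  explicit PluginSketch(const std::string &n) : name(n) {}
};

class RegistrySketch {
public:
  // Stand-in for dlopen + dlsym("__erasure_code_init") + calling it.
  // A well behaved loader calls add() before returning 0.
  typedef std::function<int(RegistrySketch &, const std::string &)> Loader;

  explicit RegistrySketch(Loader loader) : loader_(loader) {}

  // Called by the loader while factory() holds the lock, so it must
  // not take the (non-recursive) mutex again.
  int add(const std::string &name, std::unique_ptr<PluginSketch> plugin) {
    if (plugins_.count(name))
      return -EEXIST;
    plugins_[name] = std::move(plugin);
    return 0;
  }

  // Single entry point: returns the plugin, loading it on first use.
  PluginSketch *factory(const std::string &name) {
    std::lock_guard<std::mutex> guard(lock_);
    if (plugins_.find(name) == plugins_.end()) {
      if (loader_(*this, name) != 0)
        return nullptr;  // load failed
      if (plugins_.find(name) == plugins_.end())
        return nullptr;  // loaded, but it did not register itself
    }
    return plugins_[name].get();
  }

private:
  std::mutex lock_;
  Loader loader_;
  std::map<std::string, std::unique_ptr<PluginSketch> > plugins_;
};
```

Holding the lock across the whole lookup-and-load step is what prevents
two threads from loading the same plugin twice in this model.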
The erasure_codelib_LTLIBRARIES variable is added to the Makefile.
Plugins are expected to add themselves to it and are installed
in $(libdir)/erasure-code.
Loic Dachary [Wed, 28 Aug 2013 13:46:34 +0000 (15:46 +0200)]
ErasureCodePlugin: plugin interface
When dynamically loaded, a plugin is expected to define
int __erasure_code_init(char *plugin_name);
When called, it is responsible for registering an ErasureCodePlugin
derived object that provides a factory method from which the concrete
implementation of the ErasureCodeInterface object can be generated:
Loic Dachary [Mon, 19 Aug 2013 17:15:07 +0000 (19:15 +0200)]
ErasureCode: example implementation : K=2 M=1
An erasure code implementation designed for tests. Although it is fully
functional and could be used on actual data, it is mainly provided for
testing purposes. It splits data in two, computes an XOR parity and
can sustain the loss of one chunk.
The constructor will usleep(3) for parameters["usleep"] microseconds
so that the caller can create race conditions.
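The split-and-XOR scheme described above can be sketched as follows.
This is a stand-alone illustration using std::string buffers instead of
bufferlists; the function names and the zero padding of odd-sized input
are assumptions, not taken from ErasureCodeExample.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split the input into two equal data chunks (zero padded if the size
// is odd) and append one parity chunk: chunks[2] = chunks[0] ^ chunks[1].
std::vector<std::string> xor_encode_sketch(const std::string &in)
{
  std::size_t chunk = (in.size() + 1) / 2;
  std::string padded = in;
  padded.resize(2 * chunk, '\0');
  std::vector<std::string> chunks(3);
  chunks[0] = padded.substr(0, chunk);
  chunks[1] = padded.substr(chunk, chunk);
  chunks[2].resize(chunk);
  for (std::size_t i = 0; i < chunk; ++i)
    chunks[2][i] = static_cast<char>(chunks[0][i] ^ chunks[1][i]);
  return chunks;
}

// Rebuild chunk number `lost` (0, 1 or 2) from the two other chunks.
// XOR is its own inverse, so the same operation recovers any of them.
std::string xor_recover_sketch(const std::vector<std::string> &chunks,
                               std::size_t lost)
{
  assert(chunks.size() == 3 && lost < 3);
  const std::string &a = chunks[(lost + 1) % 3];
  const std::string &b = chunks[(lost + 2) % 3];
  assert(a.size() == b.size());
  std::string out(a.size(), '\0');
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = static_cast<char>(a[i] ^ b[i]);
  return out;
}
```

The content of the lost chunk is never read, so the sketch tolerates
exactly one missing chunk, matching K=2 M=1.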
Loic Dachary [Mon, 19 Aug 2013 16:56:56 +0000 (18:56 +0200)]
ErasureCode: abstract interface
The erasure coded pool relies on this abstract interface to encode and
decode the chunks stored in the OSD. It has been designed to be
generic enough to accommodate the libraries and algorithms that are
most likely to be used. It does not claim to be universal.
- In "includes", inttypes.h conflicted with the system header of the
same name. This caused random build errors on some systems and under
some conditions. Renaming it.
- Add fallback definitions of the PRI*64 macros when int_types.h does
not define them (which, unfortunately, can happen on some systems).
Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
Sage Weil [Thu, 5 Sep 2013 04:29:11 +0000 (21:29 -0700)]
common/crc32c_intel_fast: avoid reading partial trailing word
The optimized intel code reads in word-sized chunks, knowing that the
allocator will only hand out memory in word-sized increments. This makes
valgrind unhappy. Whitelisting doesn't work because for some reason there
is no caller context (probably because of some interaction with yasm?).
Instead, just use the baseline code for the last few bytes. This should
not be significant.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
cleanup: passing context to NamedState for ceph_clock
This makes the constructor call on the subclasses explicit, and passes
the cct to the NamedState constructor. This cct is used by ceph_clock
to set enter_time.
Removes the last reference to g_ceph_context from libosd.
Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
Roald van Loon [Wed, 28 Aug 2013 14:56:23 +0000 (16:56 +0200)]
cleanup: removing globals from common/obj_bencher
This file is in common/ but can't be included in libcommon.la because of
this reference. Remove it and make the binaries that call it (rados,
rest-bench) pass the correct CephContext.
Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
Roald van Loon [Wed, 28 Aug 2013 10:11:08 +0000 (12:11 +0200)]
cleanup: removed last references to globals from client
- There were some refs in SyntheticClient to g_(ceph_context|conf), I
replaced them with client->cct and client->cct->_conf.
- There were some refs in fuse_ll to g_conf, also replaced them with
client->cct or cfuse->client->cct where applicable.
This makes everything in src/client completely independent from globals.
Signed-off-by: Roald J. van Loon <roaldvanloon@gmail.com>
Yehuda Sadeh [Fri, 23 Aug 2013 22:39:20 +0000 (15:39 -0700)]
rgw: flush pending data when completing multipart part upload
Fixes: #6111
Backport: dumpling
When completing the part upload we need to flush any data that we
aggregated but have not flushed yet. Earlier code didn't have to deal
with this because multipart uploads never had pending data.
We now call the regular atomic data completion function, which takes
care of it.
When posting an object it is possible to provide a key
name that refers to the original filename, however we
need to verify that in the end we don't end up with an
empty object name.
Yehuda Sadeh [Thu, 22 Aug 2013 00:22:46 +0000 (17:22 -0700)]
rgw: OPTIONS request doesn't need to read object info
This is a bucket-only operation, so we shouldn't look at the
object. The object may not exist, and we would then respond with a
Not Exists response, which is not what we want.
Sage Weil [Wed, 28 Aug 2013 22:04:16 +0000 (15:04 -0700)]
osd: initial COPY_FROM (not viable for large objects)
Initial pass at COPY_FROM implementation. This uses COPY_GET to read an
object from another OSD and write it locally. It chunks the read but
accumulates it all in-memory and commits it at once, so it is only suitable
for smaller objects.
Sage Weil [Mon, 26 Aug 2013 23:24:16 +0000 (16:24 -0700)]
objecter, librados: add COPY_FROM operation
This operation will copy an entire object (data, attrs, omap)
atomically. If the src_version does not match the source object, or
the source object is updated while the copy is in progress, we will
fail with a suitable error code. By atomic we mean that it will either
copy the object in its entirety or fail (and require no cleanup).
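The all-or-nothing behaviour can be pictured with a toy in-memory
model. Nothing below is the objecter or librados API: the types, the
function name and the bool return value are hypothetical, and the
destination simply inherits the source version in this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory object: data, attrs and omap plus a version.
struct ObjectSketch {
  std::string data;
  std::map<std::string, std::string> attrs;
  std::map<std::string, std::string> omap;
  std::uint64_t version;
  ObjectSketch() : version(0) {}
};

typedef std::map<std::string, ObjectSketch> StoreSketch;

// Copy src to dst only if src exists and still has the expected
// version. Returns false on failure (the real error code is not
// specified here) and leaves the store untouched in that case.
bool copy_from_sketch(StoreSketch &store, const std::string &src,
                      std::uint64_t src_version, const std::string &dst)
{
  StoreSketch::const_iterator it = store.find(src);
  if (it == store.end())
    return false;
  if (it->second.version != src_version)
    return false;  // the source changed since the caller looked at it
  // Stage the complete object first, then publish it in one step.
  ObjectSketch staged = it->second;
  store[dst] = staged;
  return true;
}
```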
Add to C++ librados API only for now.
Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts: