Samuel Just [Fri, 3 Jun 2016 00:13:09 +0000 (17:13 -0700)]
src/: remove all direct comparisons to get_max()
get_max() now returns a special singleton type from which hobject_t's
can be assigned and constructed, but which cannot be directly compared.
This patch also cleans up all such uses to use is_max() instead.
This should prevent some issues like 16113 by preventing us from
checking for max-ness by comparing against a sentinel value. The more
complete fix will be to make all fields of hobject_t private and enforce
a canonical max() representation that way. That patch will be hard to
backport, however, so we'll settle for this for now.
Fixes: http://tracker.ceph.com/issues/16113 Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Fri, 3 Jun 2016 00:36:21 +0000 (17:36 -0700)]
hobject: compensate for non-canonical hobject_t::get_max() encodings
This closes a loop-hole that could allow a non-canonical in memory
hobject_t::get_max() object which would return true for is_max(), but
false for *this == hobject_t::get_max().
Fixes: http://tracker.ceph.com/issues/16113 Signed-off-by: Samuel Just <sjust@redhat.com>
Kefu Chai [Wed, 1 Jun 2016 05:38:05 +0000 (13:38 +0800)]
cmake: install cython modules
* fix CYTHON_ADD_MODULE() macro. because python_add_module() offered by
FindPythonLibs.cmake creates a target with name of ${name}, which conflicts
with existing targets like "rbd" or "rados". so we can not reuse the
name in ${name}.pyx. and instead, we should specify the target name
explicitly.
* add distutils_install_cython_module() function to build and install
cython modules.
* we can split build and install of cython module, but the install phase
always tries to build the module. so keep it this way. will look at it
later on.
* move the variables initializations into the Distutils.cmake module.
Kefu Chai [Sat, 28 May 2016 09:09:37 +0000 (17:09 +0800)]
cmake: add the autoconf path variables back
* partially revert 7a602ec.
* the directory variables created by automake, like "prefix", "bindir",
and "libdir", are used for generating configuration_file() for substitution,
and they should have the same names with ones from autotools.
* also fix the ${pkglibdir}, it should be the ${libdir}/${PACKAGE}. so
the plugins are not installed into ${libdir}, in which the installed shared
objects are supposed to be shared with other applications.
* install shared libraries into ${CMAKE_INSTALL_LIBDIR} instead of
${prefix}/lib. this complies to what ceph.spec.in requires:
ceph.spec.in expects the shared libraries to be installed into
${_libdir}, and ${_libdir} is /usr/lib64 on an amd64 machine.
Sage Weil [Mon, 23 May 2016 19:03:18 +0000 (15:03 -0400)]
os/bluestore: track new compression config options
Class-wide Compressor, compression mode, and options. For now these are
global, although later we'll do them per-Collection so they can be pool-
specific.
Sage Weil [Mon, 23 May 2016 19:01:40 +0000 (15:01 -0400)]
os/bluestore/bluestore_types: add length to the compression_header_t
Snappy fails to decompress if there are extra zeros in the input buffer.
So, store the length explicitly in the header to avoid feeding them into
the decompressor.
Sage Weil [Fri, 20 May 2016 14:30:43 +0000 (10:30 -0400)]
os/bluestore: avoid passing overlapping allocated/released sets to fm
BitmapFreelistManager doesn't like overlapping allocated+released sets
when the debug option is enabled, because it does a read to verify the
op is valid and that may not have been applied to the kv store yet.
This makes bluestore ObjectStore/StoreTest.SimpleCloneTest/2 pass with
bluestore_clone_cow = false and bluestore_freelist_type = bitmap.
Sage Weil [Thu, 19 May 2016 15:57:43 +0000 (11:57 -0400)]
os/bluestore/bluestore_types: blob_t: add tracking for released extents
We reference count which parts of the blob are used (by lextents), but
currently we only release our space back to the system when all references
go away. That is a problem if the blob is large (say, 4MB), and we, say,
truncate off most (but not all) of it.
Unfortunately, we can't simply deallocate anything that doesn't have a
reference, because the logical refs are on byte boundaries, and allocation
happens in larger units (min_alloc_size). A one byte logical punch_hole
might be responsible for the release of a larger block of storage.
To resolve this, we keep track of which portions of the blob have been
released by poisoning the offset in the extents vector. We expect that
this vector will be almost always short, so we do not bother with a
indexed structure, since iterating a blob offset to determine if it is
still allocated is likely faster.
Sage Weil [Thu, 19 May 2016 12:47:44 +0000 (08:47 -0400)]
os/bluestore: only direct write into unused blob space
We can only do a direct write into an already-allocated blob once, if that
range hasn't yet been used. Once it has been used, it is much to complex
to keep track of when all references to it have committed to disk before
reusing it, so we don't try to handle that case at all.
Since the range has never been used, we can assert that there are no
references to it.
Sage Weil [Thu, 19 May 2016 12:45:00 +0000 (08:45 -0400)]
os/bluestore: mark used range on partial blob writes
- writing into unreferenced blob space
- wal blob writes
both need to update the blob used map. The full blob writes generates
blobs that are always full, so no change is needed there. New partial
blob creations need to indicate which parts aren't yet used.
Sage Weil [Thu, 19 May 2016 12:43:49 +0000 (08:43 -0400)]
os/bluestore/bluestore_types: add blob_t unused
Keep track of which ranges of this blob have *never* been used. We do
this as a negative so that the common case of a fully-written blob is an
empty set.
Sage Weil [Thu, 19 May 2016 10:25:23 +0000 (06:25 -0400)]
os/bluestore: defer csum calcuations sometimes
When we are doing a partial chunk overwrite, we need to defer the csum_data
update. Otherwise, another write in the same transaction might need to
read part of the chunk, not find the data in the buffer cache, read it
from disk, and fail the csum check.
This patch defers the calculation until after we've build the transaction
and are about to commit to the kv store.