git.apps.os.sepia.ceph.com Git - ceph.git/log
9 years ago hobject: enforce max canonical value 9432/head
Samuel Just [Fri, 3 Jun 2016 00:38:05 +0000 (17:38 -0700)]
hobject: enforce max canonical value

Signed-off-by: Samuel Just <sjust@redhat.com>
9 years ago src/: remove all direct comparisons to get_max()
Samuel Just [Fri, 3 Jun 2016 00:13:09 +0000 (17:13 -0700)]
src/: remove all direct comparisons to get_max()

get_max() now returns a special singleton type from which hobject_t's
can be assigned and constructed, but which cannot be directly compared.

This patch also cleans up all such uses to use is_max() instead.

This should prevent some issues like 16113 by preventing us from
checking for max-ness by comparing against a sentinel value.  The more
complete fix will be to make all fields of hobject_t private and enforce
a canonical max() representation that way.  That patch will be hard to
backport, however, so we'll settle for this for now.

Fixes: http://tracker.ceph.com/issues/16113
Signed-off-by: Samuel Just <sjust@redhat.com>
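
The idea in the commit above can be illustrated with a minimal sketch. The types `hobj` and `max_tag` below are hypothetical, simplified stand-ins (not the real `hobject_t` definition): the value returned by `get_max()` can be assigned or used for construction, but comparing against it is rejected at compile time, so callers have to use `is_max()`.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the singleton type returned by get_max().
struct max_tag {};

// Simplified, illustrative object id; the real hobject_t has more fields.
class hobj {
  std::string name_;
  bool max_ = false;

public:
  hobj() = default;
  explicit hobj(std::string name) : name_(std::move(name)) {}

  // Construction and assignment from the max singleton are allowed.
  hobj(max_tag) : max_(true) {}
  hobj& operator=(max_tag) {
    name_.clear();
    max_ = true;
    return *this;
  }

  static max_tag get_max() { return max_tag{}; }

  // Max-ness is tested through a predicate, never against a sentinel value.
  bool is_max() const { return max_; }

  friend bool operator==(const hobj& a, const hobj& b) {
    return a.max_ == b.max_ && a.name_ == b.name_;
  }
  // Direct comparison with the singleton is deleted, so an expression such as
  // `obj == hobj::get_max()` does not compile.
  friend bool operator==(const hobj&, max_tag) = delete;
  friend bool operator==(max_tag, const hobj&) = delete;
};
```

With this shape, `if (obj == hobj::get_max())` becomes a build error, while `if (obj.is_max())` keeps working.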
9 years ago PG::replica_scrub: don't adjust pool on max object
Samuel Just [Fri, 3 Jun 2016 00:39:09 +0000 (17:39 -0700)]
PG::replica_scrub: don't adjust pool on max object

Signed-off-by: Samuel Just <sjust@redhat.com>
9 years ago hobject: compensate for non-canonical hobject_t::get_max() encodings
Samuel Just [Fri, 3 Jun 2016 00:36:21 +0000 (17:36 -0700)]
hobject: compensate for non-canonical hobject_t::get_max() encodings

This closes a loophole that could allow a non-canonical in-memory
hobject_t::get_max() object which would return true for is_max(), but
false for *this == hobject_t::get_max().

Fixes: http://tracker.ceph.com/issues/16113
Signed-off-by: Samuel Just <sjust@redhat.com>
9 years ago Merge pull request #9084 from dzafman/wip-dz-misc
Samuel Just [Wed, 1 Jun 2016 21:16:32 +0000 (14:16 -0700)]
Merge pull request #9084 from dzafman/wip-dz-misc

Wip dz misc

Reviewed-by: Samuel Just <sjust@redhat.com>
9 years ago Merge pull request #9426 from linuxbox2/cmake-mds
Casey Bodley [Wed, 1 Jun 2016 18:07:59 +0000 (14:07 -0400)]
Merge pull request #9426 from linuxbox2/cmake-mds

cmake: change libmds back to a static library

Reviewed-by: Casey Bodley <cbodley@redhat.com>
9 years ago cmake: restore static linkage (libmds) 9426/head
Matt Benjamin [Wed, 1 Jun 2016 17:25:19 +0000 (13:25 -0400)]
cmake: restore static linkage (libmds)

Required by ceph-mds.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
9 years ago Merge pull request #9385 from ceph/wip-cmake-kefu
Matt Benjamin [Wed, 1 Jun 2016 16:42:52 +0000 (12:42 -0400)]
Merge pull request #9385 from ceph/wip-cmake-kefu

cmake: more fixes

fixes make install workflow

9 years ago Merge pull request #9228 from liewegas/wip-bluestore-write
Sage Weil [Wed, 1 Jun 2016 16:29:49 +0000 (12:29 -0400)]
Merge pull request #9228 from liewegas/wip-bluestore-write

os/bluestore: new write path (checksums and compression)

Reviewed-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago Merge pull request #9106 from SUSE/wip-15869
Nathan Cutler [Wed, 1 Jun 2016 16:14:38 +0000 (18:14 +0200)]
Merge pull request #9106 from SUSE/wip-15869

rpm: unconditionally set ceph user's primary group to ceph (SUSE)

Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
9 years ago cmake: install cython modules 9385/head
Kefu Chai [Wed, 1 Jun 2016 05:38:05 +0000 (13:38 +0800)]
cmake: install cython modules

* fix the CYTHON_ADD_MODULE() macro: python_add_module(), offered by
  FindPythonLibs.cmake, creates a target named ${name}, which conflicts
  with existing targets like "rbd" or "rados". so we cannot reuse the
  name in ${name}.pyx; instead, we specify the target name explicitly.
* add distutils_install_cython_module() function to build and install
  cython modules.
* we can split build and install of cython module, but the install phase
  always tries to build the module. so keep it this way. will look at it
  later on.
* move the variables initializations into the Distutils.cmake module.

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: install compressor plugins into ${pkglibdir}/compressor
Kefu Chai [Wed, 1 Jun 2016 03:25:11 +0000 (11:25 +0800)]
cmake: install compressor plugins into ${pkglibdir}/compressor

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: install erasure plugins into ${pkglibdir}/erasure-code
Kefu Chai [Wed, 1 Jun 2016 03:14:49 +0000 (11:14 +0800)]
cmake: install erasure plugins into ${pkglibdir}/erasure-code

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: remove duplicated file from ceph-osd
Kefu Chai [Sat, 28 May 2016 21:19:23 +0000 (05:19 +0800)]
cmake: remove duplicated file from ceph-osd

objclass/class_api.cc is already included in libosd

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: fix dependencies on tracing headers
Kefu Chai [Sat, 28 May 2016 16:23:22 +0000 (00:23 +0800)]
cmake: fix dependencies on tracing headers

group the header dependencies by their tp .so, so the traced target can
depend on them by the name of ${name}-tp.

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: libradosstriper's OUTPUT_NAME should be radosstriper
Kefu Chai [Sat, 28 May 2016 16:22:31 +0000 (00:22 +0800)]
cmake: libradosstriper's OUTPUT_NAME should be radosstriper

and s/libradosstriper/radosstriper/ otherwise the created .so
filename would be liblibradosstriper.so with the default prefix.

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: install the artifacts the packaging requires
Kefu Chai [Sat, 28 May 2016 07:42:33 +0000 (15:42 +0800)]
cmake: install the artifacts the packaging requires

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: install init script to etc/init.d
Kefu Chai [Sat, 28 May 2016 13:59:14 +0000 (21:59 +0800)]
cmake: install init script to etc/init.d

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: add the autoconf path variables back
Kefu Chai [Sat, 28 May 2016 09:09:37 +0000 (17:09 +0800)]
cmake: add the autoconf path variables back

* partially revert 7a602ec.
* the directory variables created by automake, like "prefix", "bindir",
  and "libdir", are used for generating configuration_file() for substitution,
  and they should have the same names as the ones from autotools.
* also fix ${pkglibdir}: it should be ${libdir}/${PACKAGE}, so the
  plugins are not installed into ${libdir}, whose installed shared
  objects are supposed to be shared with other applications.
* install shared libraries into ${CMAKE_INSTALL_LIBDIR} instead of
  ${prefix}/lib. this complies with what ceph.spec.in requires:
  ceph.spec.in expects the shared libraries to be installed into
  ${_libdir}, and ${_libdir} is /usr/lib64 on an amd64 machine.

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: install ceph-post-file
Kefu Chai [Sat, 28 May 2016 09:03:08 +0000 (17:03 +0800)]
cmake: install ceph-post-file

and related pubkey for sftp

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: add ceph-brag
Kefu Chai [Sat, 28 May 2016 08:53:36 +0000 (16:53 +0800)]
cmake: add ceph-brag

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: project name should be "ceph"
Kefu Chai [Sat, 28 May 2016 08:38:09 +0000 (16:38 +0800)]
cmake: project name should be "ceph"

so it would be easier to figure out paths. also,
CMAKE_INSTALL_DOCDIR is composed using the PROJECT_NAME.

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: install ceph-{detect-init,disk}
Kefu Chai [Sat, 28 May 2016 07:44:36 +0000 (15:44 +0800)]
cmake: install ceph-{detect-init,disk}

add a cmake module named Distutils.cmake for setting up python modules
using setup.py.

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: add ceph_rgw_{json,multi}parser
Kefu Chai [Sat, 28 May 2016 08:18:54 +0000 (16:18 +0800)]
cmake: add ceph_rgw_{json,multi}parser

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: compile and install ceph-bluefs-tool
Kefu Chai [Sat, 28 May 2016 08:18:31 +0000 (16:18 +0800)]
cmake: compile and install ceph-bluefs-tool

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: s/ceph_test_xattr_bench/ceph_xattr_bench/
Kefu Chai [Sat, 28 May 2016 07:44:17 +0000 (15:44 +0800)]
cmake: s/ceph_test_xattr_bench/ceph_xattr_bench/

to match with automake and packager

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: rename ceph-psim to ceph_psim
Kefu Chai [Sat, 28 May 2016 07:43:34 +0000 (15:43 +0800)]
cmake: rename ceph-psim to ceph_psim

to match with automake and packager

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago cmake: fix bash_completion install path
Kefu Chai [Sat, 28 May 2016 05:20:35 +0000 (13:20 +0800)]
cmake: fix bash_completion install path

Signed-off-by: Kefu Chai <kchai@redhat.com>
9 years ago os/bluestore: fsck: check for dup overlay keys 9228/head
Sage Weil [Tue, 31 May 2016 19:26:14 +0000 (15:26 -0400)]
os/bluestore: fsck: check for dup overlay keys

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: fsck: use common helper to verify blobs and refs
Sage Weil [Tue, 31 May 2016 19:17:51 +0000 (15:17 -0400)]
os/bluestore: fsck: use common helper to verify blobs and refs

The checks are the same (or should be--we had missed a few).

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: add FLAG_NOCACHE flag; do not cache unbuffered writes
Sage Weil [Tue, 31 May 2016 18:18:31 +0000 (14:18 -0400)]
os/bluestore: add FLAG_NOCACHE flag; do not cache unbuffered writes

Add a Buffer flag to mark that a buffer should not be cached once it is
stable.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: do not use buffered bdev in write path
Sage Weil [Tue, 31 May 2016 17:49:34 +0000 (13:49 -0400)]
os/bluestore: do not use buffered bdev in write path

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: do not rely on bdev buffered reads in read path
Sage Weil [Tue, 31 May 2016 17:48:13 +0000 (13:48 -0400)]
os/bluestore: do not rely on bdev buffered reads in read path

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: no buffered write in wal path
Sage Weil [Tue, 31 May 2016 17:47:57 +0000 (13:47 -0400)]
os/bluestore: no buffered write in wal path

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: populate buffer cache on read
Sage Weil [Tue, 31 May 2016 17:43:25 +0000 (13:43 -0400)]
os/bluestore: populate buffer cache on read

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: keep intrusive_list of WRITING buffers
Sage Weil [Tue, 31 May 2016 17:35:10 +0000 (13:35 -0400)]
os/bluestore: keep intrusive_list of WRITING buffers

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: simple per-collection lru for buffers
Sage Weil [Tue, 31 May 2016 18:35:29 +0000 (14:35 -0400)]
os/bluestore: simple per-collection lru for buffers

Size these using a global config.  This is only a starting point--we'll
obviously have to rework this to share memory across collections.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: use bufferptr for csum_data
Sage Weil [Tue, 31 May 2016 16:40:16 +0000 (12:40 -0400)]
os/bluestore: use bufferptr for csum_data

encode/decode of vector<char> is not optimized.  Bufferptr is a more
natural type here anyway.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago ceph_test_objectstore: Adds a test case for compression stuff verification (incomplete)
Igor Fedotov [Fri, 27 May 2016 17:05:21 +0000 (20:05 +0300)]
ceph_test_objectstore: Adds a test case for compression stuff verification (incomplete)

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: Fixes configuration observation.
Igor Fedotov [Fri, 27 May 2016 17:04:56 +0000 (20:04 +0300)]
os/bluestore: Fixes configuration observation.

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: Cosmetic fixes in bluestore logging
Igor Fedotov [Tue, 24 May 2016 13:41:48 +0000 (16:41 +0300)]
os/bluestore: Cosmetic fixes in bluestore logging

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: Enables cow for cloning at bluestore for store test
Igor Fedotov [Tue, 24 May 2016 12:55:41 +0000 (15:55 +0300)]
os/bluestore: Enables cow for cloning at bluestore for store test

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: Fixes Bnode serialization/deserialization and removes legacy Bnode...
Igor Fedotov [Tue, 24 May 2016 12:52:15 +0000 (15:52 +0300)]
os/bluestore: Fixes Bnode serialization/deserialization and removes legacy Bnode::ref_map

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago ceph_test_objectstore: extends SimpleObjectTest with the case where write happens...
Igor Fedotov [Fri, 27 May 2016 14:19:59 +0000 (17:19 +0300)]
ceph_test_objectstore: extends SimpleObjectTest with the case where write happens for neighboring csum blocks to verify for potential alignment issue

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: Removes legacy block_size retrieval
Igor Fedotov [Fri, 27 May 2016 12:46:02 +0000 (15:46 +0300)]
os/bluestore: Removes legacy block_size retrieval

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: use WriteContext and do_alloc_write for _do_write_small
Sage Weil [Thu, 26 May 2016 15:11:55 +0000 (11:11 -0400)]
os/bluestore: use WriteContext and do_alloc_write for _do_write_small

Kill some mostly-duplicated code

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: consolidate WriteContext items into a write_item
Sage Weil [Thu, 26 May 2016 15:10:31 +0000 (11:10 -0400)]
os/bluestore: consolidate WriteContext items into a write_item

Also include b_off in there.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: avoid unnecessary write_onode calls
Sage Weil [Thu, 26 May 2016 14:02:44 +0000 (10:02 -0400)]
os/bluestore: avoid unnecessary write_onode calls

_wctx_finish callers always write the onode; we only need to worry about
our changes to the bnode.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: drop unused _pad_* methods
Sage Weil [Tue, 31 May 2016 18:33:55 +0000 (14:33 -0400)]
os/bluestore: drop unused _pad_* methods

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: drop unused _pad_zeros args
Sage Weil [Tue, 31 May 2016 18:33:46 +0000 (14:33 -0400)]
os/bluestore: drop unused _pad_zeros args

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: fix offset skew check
Sage Weil [Thu, 26 May 2016 12:59:30 +0000 (08:59 -0400)]
os/bluestore: fix offset skew check

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: ~0x -> ~
Sage Weil [Mon, 23 May 2016 19:16:38 +0000 (15:16 -0400)]
os/bluestore: ~0x -> ~

e.g., 0x432da000~1000 instead of 0x432da000~0x1000

I think it's sufficiently clear that the value after ~ should have the same
base as the first bit, and it's easier to read.  And less text.

Signed-off-by: Sage Weil <sage@redhat.com>
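
As an illustration of the log format described in the commit above, here is a minimal sketch. The helper name `format_extent` is hypothetical and not taken from the Ceph sources; it prints an offset~length pair with the `0x` prefix written once and the length left in the same hex base.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Format an (offset, length) pair as "0x<offset>~<length>". Both values are
// printed in hex; the "0x" prefix appears only before the offset.
std::string format_extent(uint64_t offset, uint64_t length) {
  std::ostringstream out;
  // std::hex is sticky, so it also applies to the length that follows.
  out << "0x" << std::hex << offset << "~" << length;
  return out.str();
}
```

For the values in the commit message, `format_extent(0x432da000, 0x1000)` yields `0x432da000~1000`.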
9 years ago compressor: Extends decompressor interface to be able to provide compressed data...
Igor Fedotov [Tue, 24 May 2016 17:05:14 +0000 (20:05 +0300)]
compressor: Extends decompressor interface to be able to provide compressed data length.

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: compress on write
Sage Weil [Mon, 23 May 2016 19:04:46 +0000 (15:04 -0400)]
os/bluestore: compress on write

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: do not partially deallocate compressed blobs
Sage Weil [Mon, 23 May 2016 19:04:31 +0000 (15:04 -0400)]
os/bluestore: do not partially deallocate compressed blobs

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: _do_write_big: limit size of blobs based on compression mode
Sage Weil [Mon, 23 May 2016 19:04:11 +0000 (15:04 -0400)]
os/bluestore: _do_write_big: limit size of blobs based on compression mode

We may want to compress in smaller chunks based on hints/policy.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: track new compression config options
Sage Weil [Mon, 23 May 2016 19:03:18 +0000 (15:03 -0400)]
os/bluestore: track new compression config options

Class-wide Compressor, compression mode, and options.  For now these are
global, although later we'll do them per-Collection so they can be pool-
specific.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore/bluestore_types: add length to the compression_header_t
Sage Weil [Mon, 23 May 2016 19:01:40 +0000 (15:01 -0400)]
os/bluestore/bluestore_types: add length to the compression_header_t

Snappy fails to decompress if there are extra zeros in the input buffer.
So, store the length explicitly in the header to avoid feeding them into
the decompressor.

Signed-off-by: Sage Weil <sage@redhat.com>
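
The reasoning in the commit above can be sketched as follows. This is only an illustration under stated assumptions: the 5-byte header layout, `pack`, and `unpack` are hypothetical and do not reflect the real `compression_header_t` encoding, and no actual compressor is called. The point is that a payload zero-padded up to the block size can be trimmed back to its exact length before it reaches the decompressor.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical header layout: 1 type byte + 4 length bytes (little-endian).
constexpr size_t kHeaderSize = 5;

// Prepend the header to the compressed payload, then zero-pad the result up
// to a multiple of block_size, as a block-aligned write would.
std::vector<uint8_t> pack(uint8_t type, const std::vector<uint8_t>& payload,
                          size_t block_size) {
  const uint32_t len = static_cast<uint32_t>(payload.size());
  std::vector<uint8_t> out;
  out.push_back(type);
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<uint8_t>((len >> (8 * i)) & 0xff));
  out.insert(out.end(), payload.begin(), payload.end());
  const size_t padded = (out.size() + block_size - 1) / block_size * block_size;
  out.resize(padded, 0);
  return out;
}

// Return exactly the bytes that should be handed to the decompressor,
// excluding any trailing padding.
std::vector<uint8_t> unpack(const std::vector<uint8_t>& stored) {
  if (stored.size() < kHeaderSize) throw std::runtime_error("short blob");
  uint32_t len = 0;
  for (int i = 0; i < 4; ++i)
    len |= static_cast<uint32_t>(stored[1 + i]) << (8 * i);
  if (stored.size() - kHeaderSize < len) throw std::runtime_error("bad length");
  const auto begin = stored.begin() + kHeaderSize;
  return std::vector<uint8_t>(begin, begin + len);
}
```

Without the stored length, the reader could not tell zero padding apart from payload bytes that happen to be zero.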
9 years ago os/bluestore: fix BufferSpace::read()
Sage Weil [Mon, 23 May 2016 18:59:18 +0000 (14:59 -0400)]
os/bluestore: fix BufferSpace::read()

- we weren't reading from 'clean' buffers
- restructured loop a bit chasing another bug (but it ended up being
  in the caller)

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago librados: add COMPRESSIBLE and INCOMPRESSIBLE alloc hints
Sage Weil [Fri, 20 May 2016 18:26:33 +0000 (14:26 -0400)]
librados: add COMPRESSIBLE and INCOMPRESSIBLE alloc hints

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago compressor: add a get_type() method to Compressor interface
Sage Weil [Fri, 20 May 2016 18:18:52 +0000 (14:18 -0400)]
compressor: add a get_type() method to Compressor interface

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: fix _do_read cached vs read result assembly
Sage Weil [Mon, 23 May 2016 19:00:37 +0000 (15:00 -0400)]
os/bluestore: fix _do_read cached vs read result assembly

We weren't handling the case of

 read block 0~300
 cache block 100~100

where the result is read(head) + cached + read(tail). Restructure the
loop to handle this.

Signed-off-by: Sage Weil <sage@redhat.com>
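
The head + cached + tail case from the commit above can be sketched like this. `segment` and `plan_read` are hypothetical names used for illustration, and the cache is modelled as a simple offset-to-length map rather than BlueStore's real buffer cache; the function only plans which ranges come from the cache and which need a device read.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

struct segment {
  uint64_t offset;
  uint64_t length;
  bool cached;  // true: served from cache; false: must be read from disk
};

// `cache` maps the start offset of each cached, non-overlapping buffer to
// its length. Returns the ordered pieces that make up [offset, offset+length).
std::vector<segment> plan_read(uint64_t offset, uint64_t length,
                               const std::map<uint64_t, uint64_t>& cache) {
  std::vector<segment> plan;
  uint64_t pos = offset;
  const uint64_t end = offset + length;
  for (const auto& entry : cache) {
    const uint64_t c_off = entry.first;
    const uint64_t c_end = entry.first + entry.second;
    if (c_end <= pos) continue;  // cached buffer lies entirely before pos
    if (c_off >= end) break;     // cached buffer lies after the request
    if (c_off > pos) {
      plan.push_back({pos, c_off - pos, false});  // gap before the buffer
      pos = c_off;
    }
    const uint64_t take = std::min(c_end, end) - pos;
    plan.push_back({pos, take, true});
    pos += take;
  }
  if (pos < end) plan.push_back({pos, end - pos, false});  // tail
  return plan;
}
```

For a read of 0~300 with 100~100 cached, the plan is three segments: a disk read of 0~100, the cached 100~100, and a disk read of 200~100.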
9 years ago os/bluestore: fix _do_read read out of buffer cache
Sage Weil [Mon, 23 May 2016 18:59:36 +0000 (14:59 -0400)]
os/bluestore: fix _do_read read out of buffer cache

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: fix up _set_csum helper
Sage Weil [Fri, 20 May 2016 18:23:55 +0000 (14:23 -0400)]
os/bluestore: fix up _set_csum helper

- make it thread-safe
- call during mount

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/store_test: Fixes dump_mismatch_bl to avoid assert on lengths mismatch. Starts...
Igor Fedotov [Fri, 20 May 2016 14:59:37 +0000 (17:59 +0300)]
os/store_test: Fixes dump_mismatch_bl to avoid assert on lengths mismatch. Starts using it for BufferCacheTest

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: use bdev_block_size instead of min_alloc_size for allocators
Sage Weil [Fri, 20 May 2016 19:25:39 +0000 (15:25 -0400)]
os/bluestore: use bdev_block_size instead of min_alloc_size for allocators

min_alloc_size is more dynamic; we just need the block size unit here.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: min_alloc_size options for different media types
Ramesh Chander [Fri, 20 May 2016 17:05:15 +0000 (10:05 -0700)]
os/bluestore: min_alloc_size options for different media types

Signed-off-by: Ramesh Chander <Ramesh.Chander@sandisk.com>
9 years ago os/bluestore: Fixes duplicate blob move when cloning
Igor Fedotov [Fri, 20 May 2016 16:41:34 +0000 (19:41 +0300)]
os/bluestore: Fixes duplicate blob move when cloning

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: avoid passing overlapping allocated/released sets to fm
Sage Weil [Fri, 20 May 2016 14:30:43 +0000 (10:30 -0400)]
os/bluestore: avoid passing overlapping allocated/released sets to fm

BitmapFreelistManager doesn't like overlapping allocated+released sets
when the debug option is enabled, because it does a read to verify the
op is valid and that may not have been applied to the kv store yet.

This makes bluestore ObjectStore/StoreTest.SimpleCloneTest/2 pass with
bluestore_clone_cow = false and bluestore_freelist_type = bitmap.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore/BitmapFreelistManager: drop newline on hex dumps
Sage Weil [Fri, 20 May 2016 14:29:04 +0000 (10:29 -0400)]
os/bluestore/BitmapFreelistManager: drop newline on hex dumps

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago buffer: add no-newline hexdump option
Sage Weil [Fri, 20 May 2016 14:28:52 +0000 (10:28 -0400)]
buffer: add no-newline hexdump option

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore/BitmapFreelistManager: use hex
Sage Weil [Fri, 20 May 2016 14:08:28 +0000 (10:08 -0400)]
os/bluestore/BitmapFreelistManager: use hex

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: drop warning
Sage Weil [Fri, 20 May 2016 13:59:05 +0000 (09:59 -0400)]
os/bluestore: drop warning

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago ceph_test_objectstore: fix BufferCacheReadTest
Sage Weil [Fri, 20 May 2016 13:11:10 +0000 (09:11 -0400)]
ceph_test_objectstore: fix BufferCacheReadTest

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: _dump_onode crcs in hex
Sage Weil [Thu, 19 May 2016 20:29:19 +0000 (16:29 -0400)]
os/bluestore: _dump_onode crcs in hex

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: remove obsolete tail cache
Sage Weil [Thu, 19 May 2016 16:54:39 +0000 (12:54 -0400)]
os/bluestore: remove obsolete tail cache

The buffer cache will cover this in a much more general way.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: Fixes improper length calculation in BufferSpace::read + adds simplifie...
Igor Fedotov [Thu, 19 May 2016 14:08:41 +0000 (17:08 +0300)]
os/bluestore: Fixes improper length calculation in BufferSpace::read + adds simplified test case to highlight an issue for append to existing blob

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: drop min_alloc_size locals
Sage Weil [Thu, 19 May 2016 16:12:17 +0000 (12:12 -0400)]
os/bluestore: drop min_alloc_size locals

We have this in the class, now.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: fix min_alloc_size global
Sage Weil [Fri, 20 May 2016 13:09:35 +0000 (09:09 -0400)]
os/bluestore: fix min_alloc_size global

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: release partial extents
Sage Weil [Thu, 19 May 2016 16:00:08 +0000 (12:00 -0400)]
os/bluestore: release partial extents

Use the blob put_ref helper so that we can deallocate blobs partially
(instead of always waiting until they are completely unused).

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: only write into a blob region that is allocated
Sage Weil [Thu, 19 May 2016 15:58:54 +0000 (11:58 -0400)]
os/bluestore: only write into a blob region that is allocated

We're only worried about direct writes and wal overwrites; the other write
paths are to freshly allocated blobs.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore/bluestore_types: blob_t: add tracking for released extents
Sage Weil [Thu, 19 May 2016 15:57:43 +0000 (11:57 -0400)]
os/bluestore/bluestore_types: blob_t: add tracking for released extents

We reference count which parts of the blob are used (by lextents), but
currently we only release our space back to the system when all references
go away.  That is a problem if the blob is large (say, 4MB), and we, say,
truncate off most (but not all) of it.

Unfortunately, we can't simply deallocate anything that doesn't have a
reference, because the logical refs are on byte boundaries, and allocation
happens in larger units (min_alloc_size).  A one byte logical punch_hole
might be responsible for the release of a larger block of storage.

To resolve this, we keep track of which portions of the blob have been
released by poisoning the offset in the extents vector.  We expect that
this vector will be almost always short, so we do not bother with an
indexed structure, since iterating a blob offset to determine if it is
still allocated is likely faster.

Signed-off-by: Sage Weil <sage@redhat.com>
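
A minimal sketch of the scheme described in the commit above, under simplifying assumptions: `pextent`, `blob`, and `kInvalidOffset` are hypothetical stand-ins rather than the real `pextent_t` and `blob_t`, and `release()` frees a whole extent instead of working in min_alloc_size units. It shows a released extent keeping its length but having its offset poisoned, with lookups done by a linear walk over the short vector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical poison value marking an extent whose storage was released.
constexpr uint64_t kInvalidOffset = ~0ull;

struct pextent {
  uint64_t offset;  // device offset, or kInvalidOffset once released
  uint32_t length;  // kept after release so later blob offsets still map
  bool is_valid() const { return offset != kInvalidOffset; }
};

struct blob {
  std::vector<pextent> extents;  // back to back, in blob-offset order

  // Linear walk: find the extent covering blob offset b_off and report
  // whether it is still allocated.
  bool is_allocated(uint64_t b_off) const {
    for (const auto& e : extents) {
      if (b_off < e.length) return e.is_valid();
      b_off -= e.length;
    }
    return false;  // past the end of the blob
  }

  // Poison the extent covering b_off. Returns false if nothing was released.
  bool release(uint64_t b_off) {
    for (auto& e : extents) {
      if (b_off < e.length) {
        if (!e.is_valid()) return false;
        e.offset = kInvalidOffset;
        return true;
      }
      b_off -= e.length;
    }
    return false;
  }
};
```

Because the poisoned entry keeps its length, blob offsets that fall in later extents resolve exactly as they did before the release.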
9 years ago os/bluestore/bluestore_types: add poison offset to pextent_t
Sage Weil [Thu, 19 May 2016 15:13:36 +0000 (11:13 -0400)]
os/bluestore/bluestore_types: add poison offset to pextent_t

This is a "magic" offset that we can use to indicate an invalid extent
(vs, say, an extent at offset 0 that might clobber real data if it were
used).

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: remove dead _txc_release
Sage Weil [Thu, 19 May 2016 12:52:23 +0000 (08:52 -0400)]
os/bluestore: remove dead _txc_release

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: only direct write into unused blob space
Sage Weil [Thu, 19 May 2016 12:47:44 +0000 (08:47 -0400)]
os/bluestore: only direct write into unused blob space

We can only do a direct write into an already-allocated blob once, if that
range hasn't yet been used.  Once it has been used, it is much too complex
to keep track of when all references to it have committed to disk before
reusing it, so we don't try to handle that case at all.

Since the range has never been used, we can assert that there are no
references to it.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: mark used range on partial blob writes
Sage Weil [Thu, 19 May 2016 12:45:00 +0000 (08:45 -0400)]
os/bluestore: mark used range on partial blob writes

- writing into unreferenced blob space
- wal blob writes

both need to update the blob used map.  Full blob writes generate
blobs that are always full, so no change is needed there.  New partial
blob creations need to indicate which parts aren't yet used.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore/bluestore_types: add blob_t unused
Sage Weil [Thu, 19 May 2016 12:43:49 +0000 (08:43 -0400)]
os/bluestore/bluestore_types: add blob_t unused

Keep track of which ranges of this blob have *never* been used.  We do
this as a negative so that the common case of a fully-written blob is an
empty set.

Signed-off-by: Sage Weil <sage@redhat.com>
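
The "track it as a negative" choice in the commit above can be illustrated with a small sketch. `unused_map` is a hypothetical class, not the real `blob_t` member; it stores only the never-written ranges, so a fully written blob carries an empty set.

```cpp
#include <cstdint>
#include <map>

class unused_map {
  std::map<uint64_t, uint64_t> ranges_;  // start -> length, non-overlapping

public:
  // Record a never-written range. The caller must not add overlapping ranges.
  void add_unused(uint64_t off, uint64_t len) {
    if (len) ranges_[off] = len;
  }

  bool empty() const { return ranges_.empty(); }

  // True only if all of [off, off+len) lies inside one unused range.
  bool is_unused(uint64_t off, uint64_t len) const {
    auto it = ranges_.upper_bound(off);
    if (it == ranges_.begin()) return false;
    --it;
    return off + len <= it->first + it->second;
  }

  // Remove [off, off+len) from the unused set, keeping any remainders.
  void mark_used(uint64_t off, uint64_t len) {
    const uint64_t end = off + len;
    auto it = ranges_.upper_bound(off);
    if (it != ranges_.begin()) --it;
    while (it != ranges_.end() && it->first < end) {
      const uint64_t r_off = it->first;
      const uint64_t r_end = it->first + it->second;
      if (r_end <= off) {
        ++it;
        continue;
      }
      it = ranges_.erase(it);
      if (r_off < off) ranges_[r_off] = off - r_off;  // left remainder
      if (r_end > end) ranges_[end] = r_end - end;    // right remainder
    }
  }
};
```

A direct write would be allowed only when `is_unused()` holds for the target range, followed by `mark_used()` once the write has been issued.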
9 years ago unittest_bluestore_types: benchmark different csum methods
Sage Weil [Thu, 19 May 2016 11:55:03 +0000 (07:55 -0400)]
unittest_bluestore_types: benchmark different csum methods

crc32c wins on my laptop.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago unittest_bluestore_types: run csum tests on all algorithms
Sage Weil [Thu, 19 May 2016 11:40:33 +0000 (07:40 -0400)]
unittest_bluestore_types: run csum tests on all algorithms

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore/bluestore_types: blob_t: add xxhash64
Sage Weil [Thu, 19 May 2016 11:40:13 +0000 (07:40 -0400)]
os/bluestore/bluestore_types: blob_t: add xxhash64

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago common/Checksummer: add xxhash64
Sage Weil [Thu, 19 May 2016 11:19:36 +0000 (07:19 -0400)]
common/Checksummer: add xxhash64

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: drop old Checksummer
Sage Weil [Thu, 19 May 2016 10:55:43 +0000 (06:55 -0400)]
os/bluestore: drop old Checksummer

blob_t uses it directly via the static methods.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: use blob_t csum methods
Sage Weil [Thu, 19 May 2016 10:53:29 +0000 (06:53 -0400)]
os/bluestore: use blob_t csum methods

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore/bluestore_types: simpler {calc,verify}_csum methods
Sage Weil [Thu, 19 May 2016 10:51:09 +0000 (06:51 -0400)]
os/bluestore/bluestore_types: simpler {calc,verify}_csum methods

This keeps the CSUM_* definitions local to blob_t, and avoids passing
arguments around.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: defer csum calculations sometimes
Sage Weil [Thu, 19 May 2016 10:25:23 +0000 (06:25 -0400)]
os/bluestore: defer csum calculations sometimes

When we are doing a partial chunk overwrite, we need to defer the csum_data
update.  Otherwise, another write in the same transaction might need to
read part of the chunk, not find the data in the buffer cache, read it
from disk, and fail the csum check.

This patch defers the calculation until after we've built the transaction
and are about to commit to the kv store.

Signed-off-by: Sage Weil <sage@redhat.com>
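
The deferral described in the commit above can be sketched as follows. Everything here is an assumption made for illustration: `csum_blob` is a hypothetical class, `toy_csum` is a byte sum standing in for a real checksum such as crc32c, and the blob is held in memory. A partial chunk overwrite leaves the stored checksum untouched and only records the chunk; `finish()` recomputes the recorded chunks once the transaction has been built.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy checksum (byte sum), chosen only for brevity.
inline uint32_t toy_csum(const uint8_t* p, size_t n) {
  uint32_t s = 0;
  for (size_t i = 0; i < n; ++i) s += p[i];
  return s;
}

class csum_blob {
  size_t chunk_;
  std::vector<uint8_t> data_;
  std::vector<uint32_t> csums_;
  std::set<size_t> deferred_;  // chunks whose checksum is stale until finish()

  void recalc(size_t c) { csums_[c] = toy_csum(&data_[c * chunk_], chunk_); }

public:
  csum_blob(size_t chunks, size_t chunk_size)
      : chunk_(chunk_size), data_(chunks * chunk_size, 0), csums_(chunks, 0) {}

  // Overwrite [off, off+len). The caller guarantees the range is in bounds.
  void write(size_t off, const uint8_t* src, size_t len) {
    if (len == 0) return;
    std::copy(src, src + len, data_.begin() + off);
    for (size_t c = off / chunk_; c <= (off + len - 1) / chunk_; ++c) {
      const bool full = off <= c * chunk_ && (c + 1) * chunk_ <= off + len;
      if (full) {
        recalc(c);           // whole chunk replaced: safe to update now
        deferred_.erase(c);
      } else {
        deferred_.insert(c);  // partial overwrite: update at commit time
      }
    }
  }

  // Called once the transaction is built, just before commit.
  void finish() {
    for (size_t c : deferred_) recalc(c);
    deferred_.clear();
  }

  uint32_t csum(size_t c) const { return csums_[c]; }
  size_t pending() const { return deferred_.size(); }
};
```

Until `finish()` runs, `csum()` for a partially overwritten chunk still matches the old contents, which is what a later read from disk within the same transaction would verify against.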
9 years ago doc/dev/bluestore: update based on Igor's feedback
Sage Weil [Thu, 19 May 2016 10:25:43 +0000 (06:25 -0400)]
doc/dev/bluestore: update based on Igor's feedback

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago os/bluestore: Fixes some issues when using Buffer Cache from _do_read and improves...
Igor Fedotov [Tue, 17 May 2016 15:22:15 +0000 (18:22 +0300)]
os/bluestore: Fixes some issues when using Buffer Cache from _do_read and improves test coverage

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago os/bluestore: Fixes invalid assert in Buffer::truncate
Igor Fedotov [Tue, 17 May 2016 14:58:11 +0000 (17:58 +0300)]
os/bluestore: Fixes invalid assert in Buffer::truncate

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago test/objectstore: Adds trivial test case to verify buffer cache use in bluestore
Igor Fedotov [Mon, 16 May 2016 17:55:55 +0000 (20:55 +0300)]
test/objectstore: Adds trivial test case to verify buffer cache use in bluestore

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>
9 years ago Adds cached buffer processing for _do_read
Igor Fedotov [Mon, 16 May 2016 16:53:37 +0000 (19:53 +0300)]
Adds cached buffer processing for _do_read

Signed-off-by: Igor Fedotov <ifedotov@mirantis.com>