David Zafman [Tue, 18 Nov 2014 21:00:15 +0000 (13:00 -0800)]
ceph_objectstore_tool: Add feature called set-allow-sharded-objects
Uses --op set-allow-sharded-objects option
This operation is rejected if the target OSD's osdmap contains at
least one OSD that does not support ERASURE CODES.
Prompt the user that they could import if sharded state allowed
Prompt the user to use new feature if sharded state found inconsistent
Loic Dachary [Thu, 13 Nov 2014 16:32:14 +0000 (17:32 +0100)]
tests: ceph_objectstore_tool.py replace stop.sh with init-ceph
stop.sh stops all ceph-* processes. Use the init-ceph script
instead to selectively kill the daemons run by the vstart.sh cluster
used for ceph_objectstore_tool.
Loic Dachary [Thu, 13 Nov 2014 16:27:01 +0000 (17:27 +0100)]
tests: ceph_objectstore_tool.py run faster by default
By default use only a small number of objects to speed up the tests. If
the argument "big" is given, use a large number of objects as it may
help find some problems.
Loic Dachary [Thu, 13 Nov 2014 16:21:48 +0000 (17:21 +0100)]
tests: ceph_objectstore_tool.py run mon and osd on specific port
By default vstart.sh runs MDS daemons, but they are not needed for the
tests, so only run mon and osd instead. Instead of using the default
vstart.sh port, which may conflict with an already running vstart.sh,
set CEPH_PORT=7400, which is not used by any other test run with make check.
David Zafman [Wed, 12 Nov 2014 23:22:04 +0000 (15:22 -0800)]
ceph_objectstore_tool: Fixes to make import work again
The is_pg() call now returns true even for pgs pending removal. Fix the
broken finish_remove_pgs() by removing the is_pg() check.
Need to add create_collection() to the initial transaction on import
Fixes: #10090
Signed-off-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 5ce09198bf475e5c3a2df26232fa04ba9912b103)
David Zafman [Wed, 20 Aug 2014 08:33:45 +0000 (01:33 -0700)]
ceph_objectstore_tool: Bug fixes and test improvements
ceph_objectstore_tool:
Fix bugs in the way collection_list_partial() was being called
which caused objects to be seen over and over again.
Unit test:
Fix get_objs() to walk pg tree for pg with sub-directories
Create more objects to test object listing code
Limit number of larger objects
Limit number of objects which get attributes and omaps
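The log does not say exactly how collection_list_partial() was being misused. One common way a partial-listing loop revisits the same objects is failing to carry the returned cursor into the next call. The sketch below illustrates that general pattern only; `list_partial` and `list_all` are hypothetical helpers over a plain Python list, not Ceph's ObjectStore API.

```python
def list_partial(objects, start, max_count):
    """Return up to max_count objects beginning at index start, plus the next cursor."""
    batch = objects[start:start + max_count]
    next_cursor = start + len(batch)
    return batch, next_cursor


def list_all(objects, max_count=2):
    """Walk the whole collection, feeding each returned cursor into the next call."""
    seen = []
    cursor = 0
    while True:
        # Reusing the old cursor here instead of the returned one would
        # return the same batch again on every iteration.
        batch, cursor = list_partial(objects, cursor, max_count)
        if not batch:
            break
        seen.extend(batch)
    return seen


listing = list_all(["a", "b", "c", "d", "e"])
```

Each object appears exactly once in `listing` because the loop always resumes from the cursor the previous call returned.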
David Zafman [Wed, 30 Jul 2014 19:39:49 +0000 (12:39 -0700)]
Complete replacement of ceph_filestore_tool and ceph_filestore_dump
with unified ceph_objectstore_tool
Move list-lost-objects and fix-lost-objects features from
ceph_filestore_tool to ceph_objectstore_tool as list-lost, fix-lost
Change --type to --op for info, log, export...operations
Add --type for the ObjectStore type (defaults to filestore)
Change --filestore-path to --data-path
Update installation, Makefile.am, and .gitignore
Fix and rename test case to match
Add some additional invalid option checks
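As a rough illustration of the renamed options, the sketch below builds an argument list using only the option names this commit message mentions (--data-path, --type, --op). The helper name and the path are hypothetical, and the real tool may require further arguments that this log does not list.

```python
def objectstore_tool_args(data_path, op, store_type="filestore"):
    """Build an argv list from the option names mentioned in the commit message."""
    return [
        "ceph_objectstore_tool",
        "--data-path", data_path,   # formerly --filestore-path
        "--type", store_type,       # ObjectStore type, defaults to filestore
        "--op", op,                 # formerly --type (info, log, export, ...)
    ]


args = objectstore_tool_args("/path/to/osd", "info")
```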
David Zafman [Wed, 14 May 2014 19:42:21 +0000 (12:42 -0700)]
common,ceph_filestore_dump: Add ability for utilities to suppress library dout output
Suppress dout output with CODE_ENVIRONMENT_UTILITY_NODOUT
ceph_filestore_dump turns on dout output if --debug specified
When CODE_ENVIRONMENT_UTILITY_NODOUT is used, output can still be enabled with --log-to-stderr --err-to-stderr
Jason Dillaman [Mon, 15 Dec 2014 15:53:53 +0000 (10:53 -0500)]
librbd: complete all pending aio ops prior to closing image
It was possible for an image to be closed while aio operations
were still outstanding. Now all aio operations are tracked and
completed before the image is closed.
Fixes: #10299
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
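The idea of tracking outstanding aio operations and draining them before close can be sketched with a counter and a condition variable. This is a minimal illustration under assumed names (`Image`, `aio_start`, `aio_complete`); it is not librbd's implementation or API.

```python
import threading


class Image:
    """Hypothetical image handle that tracks in-flight async operations."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0
        self.closed = False
        self.completed = []

    def aio_start(self, name):
        with self._cond:
            if self.closed:
                raise RuntimeError("image is closed")
            self._pending += 1
        return name

    def aio_complete(self, name):
        with self._cond:
            self.completed.append(name)
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def close(self):
        with self._cond:
            # Block until every tracked operation has completed.
            while self._pending > 0:
                self._cond.wait()
            self.closed = True


image = Image()
ops = [image.aio_start(f"write-{i}") for i in range(3)]
workers = [threading.Thread(target=image.aio_complete, args=(op,)) for op in ops]
for worker in workers:
    worker.start()
image.close()  # returns only after all three completions have run
for worker in workers:
    worker.join()
```

Because `close()` waits on the pending counter, no completion can arrive after the image is marked closed.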
Yan, Zheng [Mon, 13 Oct 2014 03:34:18 +0000 (11:34 +0800)]
client: use finisher to abort MDS request
When a request is interrupted, libfuse first locks an internal mutex,
then calls the interrupt callback. libfuse needs to lock the same mutex
when unregistering the interrupt callback. We unregister the interrupt
callback while client_lock is held, so we can't acquire client_lock in
the interrupt callback.
This commit introduces two new types of setfilelock request. Unlike the
setfilelock (UNLOCK) request, these two new types of setfilelock request
do not drop locks that have already been acquired; they only interrupt
blocked setfilelock requests.
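The lock-ordering problem described above is typically avoided by having the callback hand the work to another thread instead of taking the lock itself. The sketch below shows that deferral pattern with a small hypothetical `Finisher` worker; the names `abort_request` and `interrupt_callback` are illustrative and do not reproduce the Ceph client code.

```python
import queue
import threading

client_lock = threading.Lock()
aborted = []


class Finisher:
    """Hypothetical worker thread that runs queued callables in order."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def submit(self, fn):
        self._queue.put(fn)

    def stop(self):
        self._queue.put(None)  # sentinel: finish queued work, then exit
        self._thread.join()

    def _run(self):
        while True:
            fn = self._queue.get()
            if fn is None:
                return
            fn()


def abort_request(request_id):
    # Runs on the finisher thread, where taking client_lock is safe.
    with client_lock:
        aborted.append(request_id)


def interrupt_callback(finisher, request_id):
    # Called with the library's internal mutex held: never take client_lock here.
    finisher.submit(lambda: abort_request(request_id))


finisher = Finisher()
with client_lock:
    # Even while client_lock is held, the callback returns immediately.
    interrupt_callback(finisher, 42)
finisher.stop()
```

The callback only enqueues work, so it never waits on `client_lock` while the library's mutex is held, and the two locks are never taken in conflicting order.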
Yan, Zheng [Thu, 9 Oct 2014 01:42:08 +0000 (09:42 +0800)]
client: register callback for fuse interrupt
libfuse allows a program to register a callback for interrupts. When a
file system operation is interrupted, the fuse kernel driver sends an
interrupt request to libfuse. libfuse calls the interrupt callback when
it receives an interrupt request.
Sage Weil [Fri, 16 Jan 2015 17:02:28 +0000 (09:02 -0800)]
crush/builder: fix warnings
crush/builder.c: In function 'crush_remove_list_bucket_item':
crush/builder.c:977:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (weight < bucket->h.weight)
^
crush/builder.c: In function 'crush_remove_tree_bucket_item':
crush/builder.c:1031:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (weight < bucket->h.weight)
^
Loic Dachary [Thu, 16 Oct 2014 00:02:58 +0000 (17:02 -0700)]
crush: improve constness of CrushWrapper methods
A number of CrushWrapper get methods or predicates were not const
because they need to transparently maintain the rmaps. Make the rmaps
mutable and update the constness of the methods to match what the caller
would expect.
Currently in CrushWrapper, the member "struct crush_map *crush" is public,
so callers can break encapsulation and manipulate the crush structure directly.
This is poor encapsulation and leads to inconsistencies when code mixes the
CrushWrapper API with the crush C API. A simple example:
1. Some code uses crush_add_rule (C API) to add a rule, which does not set the have_rmap flag to false in CrushWrapper.
2. Other code using CrushWrapper then tries to look up the newly added rule by name and gets -ENOENT.
This patch makes CrushWrapper::crush private, together with the three reverse maps (type_rmap, name_rmap, rule_name_rmap),
and changes the code that accessed CrushWrapper::crush so that it compiles.
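The stale reverse-map scenario in the two numbered steps can be reproduced in miniature. The sketch below is a hypothetical Python stand-in (`CrushMap`, `Wrapper`, and their methods are invented for illustration); it mimics only the have_rmap caching behaviour described in the message, not the actual CrushWrapper code.

```python
import errno


class CrushMap:
    """Stand-in for the raw C structure: just a list of rule names."""

    def __init__(self):
        self.rules = []


class Wrapper:
    """Hypothetical wrapper that lazily builds a name -> id reverse map."""

    def __init__(self):
        self.crush = CrushMap()      # public, so callers can bypass the wrapper
        self._have_rmap = False
        self._rule_name_rmap = {}

    def add_rule(self, name):
        self.crush.rules.append(name)
        self._have_rmap = False      # invalidate the cached reverse map

    def get_rule_id(self, name):
        if not self._have_rmap:
            self._rule_name_rmap = {n: i for i, n in enumerate(self.crush.rules)}
            self._have_rmap = True
        return self._rule_name_rmap.get(name, -errno.ENOENT)


wrapper = Wrapper()
wrapper.add_rule("replicated")
first = wrapper.get_rule_id("replicated")   # builds the reverse map
wrapper.crush.rules.append("erasure")       # direct edit: flag is not reset
stale = wrapper.get_rule_id("erasure")      # cached map misses the new rule
wrapper.add_rule("third")                   # going through the wrapper invalidates
fresh = wrapper.get_rule_id("erasure")
```

Here `stale` is `-errno.ENOENT` even though the rule exists, which is the inconsistency that keeping the raw structure private rules out.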
Sage Weil [Fri, 5 Dec 2014 23:55:24 +0000 (15:55 -0800)]
crush: set straw_calc_version=1 for default+optimal; do not touch for presets
When using the presets for compatibility (i.e., based on version), do not
touch the straw behavior, as it does not affect mapping or compatibility.
However, make a point of setting it by default and for optimal.
For most users, this means that they will not see any change unless they
explicitly enable the new behavior, or switch to default or optimal
tunables. The idea is that if they touched it, they shouldn't be
too surprised by the subsequent data movement.
Sage Weil [Wed, 3 Dec 2014 00:33:11 +0000 (16:33 -0800)]
crush: fix crush_calc_straw() scalers when there are duplicate weights
The straw bucket was originally tested with uniform weights and with a
few more complicated patterns, like a stair step (1,2,3,4,5,6,7,8,9). And
it worked!
However, it does not behave with a pattern like
1, 2, 2, 3, 3, 4, 4
Strangely, it does behave with
1, 1, 2, 2, 3, 3, 4, 4
and more usefully it does behave with
1, 2, 2.001, 3, 3.001, 4, 4.001
That is, the logic that explicitly copes with weights that are duplicates
is broken.
The fix is to simply remove the special handling for duplicate weights --
it isn't necessary and doesn't work correctly anyway.
Add a test that compares the mapping result of [1, 2, 2, 3, 3, ...] with
[1, 2, 2.001, 3, 3.001, ...] and verifies that the difference is small.
With the fix, we get .00012, whereas the original implementation gets
.015.
Note that this changes the straw bucket scalar *precalculated* values that
are encoded with the map, and only when the admin opts into the new behavior.
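The kind of comparison the test makes, measuring how many inputs map differently under two nearly identical weight lists, can be sketched as follows. This sketch does not implement crush_calc_straw() or the straw bucket; `choose` is a stand-in weighted selection (largest ln(u)/w wins) chosen only because its behaviour is easy to reason about, and all function names are hypothetical.

```python
import hashlib
import math


def draw(x, item):
    """Deterministic pseudo-random value in (0, 1] for an (input, item) pair."""
    digest = hashlib.sha256(f"{x}:{item}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)


def choose(x, weights):
    """Stand-in weighted pick: the item with the largest ln(u)/w wins."""
    best, best_score = None, -math.inf
    for item, weight in enumerate(weights):
        if weight <= 0:
            continue  # zero-weight items never win
        score = math.log(draw(x, item)) / weight
        if score > best_score:
            best, best_score = item, score
    return best


def mapping_difference(a, b, inputs=10000):
    """Fraction of inputs that map to different items under the two weight lists."""
    changed = sum(1 for x in range(inputs) if choose(x, a) != choose(x, b))
    return changed / inputs


exact = [1, 2, 2, 3, 3, 4, 4]
nudged = [1, 2, 2.001, 3, 3.001, 4, 4.001]
diff = mapping_difference(exact, nudged)
```

A well-behaved selection should give a small `diff` for such a tiny perturbation; a large value would indicate that duplicate weights are being treated specially.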
Sage Weil [Tue, 2 Dec 2014 22:50:21 +0000 (14:50 -0800)]
crush: fix distortion of straw scalers by 0-weight items
The presence of a 0-weight item in a straw bucket should have no effect
on the placement of other items. Add a test validating that and fix
crush_calc_straw() to fix the distortion.
Note that this affects the *precalculation* of the straw bucket inputs and
does not affect the actual mapping process given a compiled or encoded
CRUSH map, and only when straw_calc_version == 1 (i.e., the admin opted in
to the new behavior).