Yehuda Sadeh [Fri, 27 Mar 2015 23:32:48 +0000 (16:32 -0700)]
rgw: generate new tag for object when setting object attrs
Fixes: #11256
Backport: firefly, hammer
Beforehand we were reusing the object's tag, which is problematic as
this tag is used for bucket index updates, and we might be clobbering a
racing update (like object removal).
Jason Dillaman [Mon, 15 Dec 2014 15:53:53 +0000 (10:53 -0500)]
librbd: complete all pending aio ops prior to closing image
It was possible for an image to be closed while aio operations
were still outstanding. Now all aio operations are tracked and
completed before the image is closed.
Fixes: #10299
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
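A minimal sketch of the "track, then drain before close" idea described in this commit, written in Python for brevity. The class and function names (AioTracker, demo_close) are hypothetical and do not mirror librbd's actual implementation.

```python
import threading


class AioTracker:
    """Counts outstanding async operations and lets close() wait for them."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def start_op(self):
        with self._cond:
            self._pending += 1

    def finish_op(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def wait_for_all(self):
        # Block until every started operation has finished.
        with self._cond:
            self._cond.wait_for(lambda: self._pending == 0)


def demo_close():
    tracker = AioTracker()
    completed = []

    def aio_write(n):
        completed.append(n)      # stand-in for the real I/O completion
        tracker.finish_op()

    for n in range(3):
        tracker.start_op()
        threading.Timer(0.01, aio_write, args=(n,)).start()

    tracker.wait_for_all()       # "close" only proceeds past this point
    return sorted(completed)
```

Under these assumptions, a close path would call wait_for_all() before tearing down the image state.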
Yan, Zheng [Mon, 13 Oct 2014 03:34:18 +0000 (11:34 +0800)]
client: use finisher to abort MDS request
When a request is interrupted, libfuse first locks an internal mutex,
then calls the interrupt callback. libfuse needs to lock the same mutex
when unregistering the interrupt callback. We unregister the interrupt
callback while client_lock is locked, so we can't acquire client_lock
in the interrupt callback.
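The lock-ordering constraint above can be illustrated with a small Python sketch. It assumes a "finisher" is simply a worker thread that runs queued callbacks. Finisher, enqueue, and demo_interrupt are illustrative names, not Ceph's API.

```python
import queue
import threading


class Finisher:
    """Runs queued callbacks on a dedicated thread."""

    def __init__(self):
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, fn):
        self._q.put(fn)

    def _run(self):
        while True:
            fn = self._q.get()
            if fn is None:       # sentinel: stop the worker
                break
            fn()

    def stop(self):
        self._q.put(None)
        self._thread.join()


def demo_interrupt():
    client_lock = threading.Lock()
    finisher = Finisher()
    aborted = []

    def abort_request():
        with client_lock:        # safe: runs on the finisher thread
            aborted.append("req-1")

    def interrupt_callback():
        # Must not take client_lock here; only hand the work off.
        finisher.enqueue(abort_request)

    with client_lock:            # the caller already holds client_lock
        interrupt_callback()     # returns immediately, no deadlock
        seen_while_locked = list(aborted)

    finisher.stop()              # drains the queue, then joins
    return seen_while_locked, aborted
```

The abort only runs once client_lock has been released, which is the point of deferring it.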
This commit introduces two new types of setfilelock request. Unlike
the setfilelock (UNLOCK) request, these two new types do not drop locks
that have already been acquired; they only interrupt a blocked
setfilelock request.
Yan, Zheng [Thu, 9 Oct 2014 01:42:08 +0000 (09:42 +0800)]
client: register callback for fuse interrupt
libfuse allows a program to register a callback for interrupts. When a
file system operation is interrupted, the fuse kernel driver sends an
interrupt request to libfuse. libfuse calls the interrupt callback when
it receives the interrupt request.
Sage Weil [Fri, 16 Jan 2015 17:02:28 +0000 (09:02 -0800)]
crush/builder: fix warnings
crush/builder.c: In function 'crush_remove_list_bucket_item':
crush/builder.c:977:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (weight < bucket->h.weight)
^
crush/builder.c: In function 'crush_remove_tree_bucket_item':
crush/builder.c:1031:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (weight < bucket->h.weight)
^
Loic Dachary [Thu, 16 Oct 2014 00:02:58 +0000 (17:02 -0700)]
crush: improve constness of CrushWrapper methods
A number of CrushWrapper get methods or predicates were not const
because they need to maintain transparently the rmaps. Make the rmaps
mutable and update the constness of the methods to match what the caller
would expect.
Currently in CrushWrapper, the member "struct crush_map *crush" is public,
so callers can break the encapsulation and manipulate the crush structure
directly. This is bad practice and leads to inconsistency when code mixes
the CrushWrapper API with the crush C API. A simple example:
1. Some code uses crush_add_rule (C API) to add a rule, which does not set the have_rmap flag to false in CrushWrapper.
2. Other code using CrushWrapper then tries to look up the newly added rule by name and gets -ENOENT.
This patch makes CrushWrapper::crush private, together with the three reverse maps (type_rmap, name_rmap, rule_name_rmap),
and changes the code that accessed CrushWrapper::crush so that it still compiles.
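The stale reverse-map scenario in the two steps above can be reproduced with a toy Python model. RuleMap and its members are hypothetical stand-ins. Only the have_rmap flag and the -ENOENT result come from the commit message.

```python
import errno


class RuleMap:
    """Toy wrapper that caches a name -> id reverse map."""

    def __init__(self):
        self._rules = {}             # rule id -> name (the "raw" structure)
        self._rule_name_rmap = {}    # name -> rule id (cached reverse map)
        self._have_rmap = False

    def add_rule(self, rule_id, name):
        self._rules[rule_id] = name
        self._have_rmap = False      # invalidate the cached reverse map

    def _build_rmap(self):
        if not self._have_rmap:
            self._rule_name_rmap = {n: i for i, n in self._rules.items()}
            self._have_rmap = True

    def get_rule_id(self, name):
        self._build_rmap()
        return self._rule_name_rmap.get(name, -errno.ENOENT)


def demo_stale_rmap():
    m = RuleMap()
    m.add_rule(0, "data")
    first = m.get_rule_id("data")        # builds the reverse map

    m._rules[1] = "metadata"             # bypasses the wrapper, flag untouched
    stale = m.get_rule_id("metadata")    # reverse map is stale

    m.add_rule(2, "rbd")                 # going through the wrapper invalidates
    fresh = m.get_rule_id("metadata")
    return first, stale, fresh
```

Keeping the raw structure private means every mutation has to pass through a method that invalidates the cache.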
Sage Weil [Fri, 5 Dec 2014 23:55:24 +0000 (15:55 -0800)]
crush: set straw_calc_version=1 for default+optimal; do not touch for presets
When using the presets for compatibility (i.e., based on version), do not
touch the straw behavior, as it does not affect mapping or compatibility.
However, make a point of setting it by default and for optimal.
For most users, this means that they will not see any change unless they
explicitly enable the new behavior, or switch to default or optimal
tunables. The idea is that if they touched it, they shouldn't be
too surprised by the subsequent data movement.
Sage Weil [Wed, 3 Dec 2014 00:33:11 +0000 (16:33 -0800)]
crush: fix crush_calc_straw() scalers when there are duplicate weights
The straw bucket was originally tested with uniform weights and with a
few more complicated patterns, like a stair step (1,2,3,4,5,6,7,8,9). And
it worked!
However, it does not behave with a pattern like
1, 2, 2, 3, 3, 4, 4
Strangely, it does behave with
1, 1, 2, 2, 3, 3, 4, 4
and more usefully it does behave with
1, 2, 2.001, 3, 3.001, 4, 4.001
That is, the logic that explicitly copes with weights that are duplicates
is broken.
The fix is to simply remove the special handling for duplicate weights --
it isn't necessary and doesn't work correctly anyway.
Add a test that compares the mapping result of [1, 2, 2, 3, 3, ...] with
[1, 2, 2.001, 3, 3.001, ...] and verifies that the difference is small.
With the fix, we get .00012, whereas the original implementation gets
.015.
Note that this changes the straw bucket scalar *precalculated* values that
are encoded with the map, and only when the admin opts into the new behavior.
Sage Weil [Tue, 2 Dec 2014 22:50:21 +0000 (14:50 -0800)]
crush: fix distortion of straw scalers by 0-weight items
The presence of a 0-weight item in a straw bucket should have no effect
on the placement of other items. Add a test validating that and fix
crush_calc_straw() to fix the distortion.
Note that this affects the *precalculation* of the straw bucket inputs and
does not affect the actual mapping process given a compiled or encoded
CRUSH map, and only when straw_calc_version == 1 (i.e., the admin opted in
to the new behavior).
Rongze Zhu [Fri, 10 Oct 2014 11:18:00 +0000 (19:18 +0800)]
crush: fix incorrect use of adjust_item_weight method
The adjust_item_weight method adjusts the item's weight in every bucket
that contains the item. If osd.0 is in both host=fake01 and host=fake02
and we execute "ceph osd crush osd.0 10 host=fake01", it adjusts not
only fake01's weight but also fake02's weight.
This patch adds the adjust_item_weightf_in_loc method and fixes the
remove_item, _remove_item_under, update_item, insert_item, and
detach_bucket methods.
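The difference between the two behaviours can be sketched in Python on a plain dict of buckets. The helper names below are simplified illustrations, not the CrushWrapper methods themselves, and the bucket layout is an assumption made for the example.

```python
def adjust_item_weight_everywhere(buckets, item, weight):
    """Old behaviour: change the item's weight in every bucket holding it."""
    changed = 0
    for items in buckets.values():
        if item in items:
            items[item] = weight
            changed += 1
    return changed


def adjust_item_weight_in_loc(buckets, item, weight, loc):
    """New behaviour: change the weight only in the named bucket."""
    items = buckets.get(loc, {})
    if item not in items:
        return 0
    items[item] = weight
    return 1


def demo_adjust():
    buckets = {"fake01": {"osd.0": 1.0}, "fake02": {"osd.0": 1.0}}
    adjust_item_weight_in_loc(buckets, "osd.0", 10.0, "fake01")
    scoped = (buckets["fake01"]["osd.0"], buckets["fake02"]["osd.0"])

    adjust_item_weight_everywhere(buckets, "osd.0", 5.0)
    everywhere = (buckets["fake01"]["osd.0"], buckets["fake02"]["osd.0"])
    return scoped, everywhere
```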
Sage Weil [Thu, 13 Nov 2014 18:59:22 +0000 (10:59 -0800)]
crush/CrushWrapper: fix detach_bucket
In commit 9850227d2f0ca2f692a154de2c14a0a08e751f08 we changed the call that
changed the weight of all instances of item to one that explicitly
changes it in the parent bucket, but parent_id may not be valid at the
call site. Move this into the conditional block to fix.
Greg Farnum [Fri, 6 Feb 2015 05:12:17 +0000 (21:12 -0800)]
fsync-tester: print info about PATH and locations of lsof lookup
We're seeing the lsof invocation fail (as not found) in testing and nobody can
identify why. Since attempting to reproduce the issue has not worked, this
patch will gather data from a genuinely in-vitro location.
Sage Weil [Tue, 2 Dec 2014 18:08:18 +0000 (10:08 -0800)]
crush: recalculate straw scalers during a reweight
The crushtool --reweight function triggers a fresh calculation of bucket
weights so that they are always the sum of the item weights. In the
straw bucket case, the weights were updated but the corresponding straw
scalers were not being recalculated. The result is that there was no
effect on placement in adjusted buckets until the next time a bucket item's
weight was adjusted.
Rongze Zhu [Mon, 10 Nov 2014 16:13:42 +0000 (00:13 +0800)]
crush: fix tree bucket functions
Node weights are incorrect when a tree bucket is constructed. The tree
bucket also doesn't store item ids in its items array, so the tree
bucket does not work correctly. This patch fixes both bugs and adds a
simple test for the tree bucket.
Sage Weil [Tue, 16 Dec 2014 01:04:32 +0000 (17:04 -0800)]
osd: handle no-op write with snapshot case
If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc and lock and later crash on an assert when the obc
is destroyed:
0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
The fix is to not recreate the snapdir if it already exists.
Jason Dillaman [Mon, 27 Oct 2014 18:47:19 +0000 (14:47 -0400)]
osdc: Constrain max number of in-flight read requests
Constrain the number of in-flight RADOS read requests to the
cache size. This reduces the chance of the cache memory
ballooning during certain scenarios like copy-up which can
invoke many concurrent read requests.
Fixes: #9854
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 068d68850d09dfcaccc5a3ce85c80b2f6d808ea9)
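One common way to cap in-flight requests is a counting semaphore. The Python sketch below shows that technique. It expresses the cap as a number of requests, which is an assumption (the commit ties the limit to the cache size), and BoundedReader is a hypothetical name.

```python
import threading
import time


class BoundedReader:
    """Allows at most max_in_flight concurrent reads."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0                # highest concurrency observed

    def read(self, fetch):
        with self._slots:            # blocks while the cap is reached
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            try:
                return fetch()
            finally:
                with self._lock:
                    self._in_flight -= 1


def demo_reads(n_requests=8, max_in_flight=2):
    reader = BoundedReader(max_in_flight)
    results = []

    def worker(i):
        def fetch():
            time.sleep(0.01)         # stand-in for a RADOS read
            return i
        results.append(reader.read(fetch))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), reader.peak
```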
Jason Dillaman [Mon, 19 Jan 2015 15:28:56 +0000 (10:28 -0500)]
librbd: gracefully handle deleted/renamed pools
snap_unprotect and list_children both attempt to scan all
pools. If a pool is deleted or renamed during the scan,
the methods would previously return -ENOENT. Both methods
have been modified to more gracefully handle this condition.
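The commit does not spell out what "more gracefully" means. One plausible reading, sketched here in Python, is to skip a pool that vanishes mid-scan instead of failing the whole operation. scan_pools and open_pool are hypothetical names.

```python
import errno


def scan_pools(pool_names, open_pool):
    """Collect children from every pool, skipping pools that disappeared."""
    children = []
    for name in pool_names:
        try:
            children.extend(open_pool(name))
        except OSError as e:
            if e.errno == errno.ENOENT:
                continue             # pool deleted or renamed during the scan
            raise                    # any other error is still fatal
    return children


def demo_scan():
    pools = {"rbd": ["child-a"], "images": ["child-b", "child-c"]}

    def open_pool(name):
        if name not in pools:
            raise OSError(errno.ENOENT, "no such pool", name)
        return pools[name]

    # "volumes" was listed but removed before we reached it.
    return scan_pools(["rbd", "volumes", "images"], open_pool)
```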
Josh Durgin [Wed, 14 Jan 2015 23:01:38 +0000 (15:01 -0800)]
qa: ignore duplicates in rados ls
These can happen with split or with state changes due to reordering
results within the hash range requested. It's easy enough to filter
them out at this stage.
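Filtering duplicates from a listing while preserving order takes only a few lines. This Python sketch is illustrative and is not the code used in the qa suite.

```python
def unique_objects(listing):
    """Return object names in first-seen order with duplicates dropped."""
    seen = set()
    out = []
    for name in listing:
        if name not in seen:
            seen.add(name)
            out.append(name)
    return out
```

For example, unique_objects(["a", "b", "a", "c", "b"]) returns ["a", "b", "c"].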
Yehuda Sadeh [Sat, 13 Dec 2014 01:07:30 +0000 (17:07 -0800)]
rgw: use s->bucket_attrs instead of trying to read obj attrs
Fixes: #10307
Backport: firefly, giant
This is needed, since we can't really read the bucket attrs by trying to
read the bucket entry point attrs. We already have the bucket attrs
anyway, use these.
Yehuda Sadeh [Wed, 5 Nov 2014 21:40:55 +0000 (13:40 -0800)]
rgw: remove swift user manifest (DLO) hash calculation
Fixes: #9973
Backport: firefly, giant
Previously we were iterating through the parts, creating hash of the
parts etags (as S3 does for multipart uploads). However, swift just
calculates the etag for the empty manifest object.
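A rough Python comparison of the two etag schemes. The "-N" part-count suffix follows the commonly observed S3 multipart convention and is an assumption here; both function names are hypothetical.

```python
import hashlib


def s3_multipart_style_etag(parts):
    """MD5 over the concatenated binary MD5 digests of each part."""
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


def empty_manifest_etag():
    """MD5 of a zero-length body, i.e. the empty manifest object itself."""
    return hashlib.md5(b"").hexdigest()
```

Here empty_manifest_etag() is the well-known MD5 of empty input, d41d8cd98f00b204e9800998ecf8427e, regardless of what the parts contain.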
Lei Dong [Mon, 27 Oct 2014 02:29:48 +0000 (10:29 +0800)]
fix can not disable max_size quota
Currently, if we enable quota and set max_size = -1, it doesn’t
mean max_size is unlimited as expected. Instead, objects of any
size are rejected with “QuotaExceeded”.
The root cause is that the function rgw_rounded_kb, which converts
max_size to max_size_kb, returns 0 for -1 because it takes an unsigned
int but we pass an int to it. A simple fix is to check max_size before
it’s rounded to max_size_kb.
Test case:
1 enable and set quota:
radosgw-admin quota enable --uid={user_id} --quota-scope=user
radosgw-admin quota set --quota-scope=user --uid={user_id}\
--max-objects=100 --max-size=-1
2 upload any object with non-zero length
it returns 403 with “QuotaExceeded” without the fix, and 200 with the fix applied.
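The wraparound can be emulated in Python. The rounding formula (bytes + 1023) / 1024 and the 64-bit width are assumptions made for illustration. The commit only states that the function takes an unsigned value and returns 0 for -1. Both helper names are hypothetical.

```python
U64_MASK = (1 << 64) - 1


def rounded_kb_unsigned(size_bytes):
    """Round bytes up to KB using emulated unsigned 64-bit arithmetic."""
    as_unsigned = size_bytes & U64_MASK          # -1 becomes 2**64 - 1
    return ((as_unsigned + 1023) & U64_MASK) // 1024


def max_size_kb(max_size):
    """Check for 'unlimited' before rounding, as the fix describes."""
    if max_size < 0:
        return -1                                # keep the unlimited marker
    return rounded_kb_unsigned(max_size)
```

Under these assumptions, rounded_kb_unsigned(-1) wraps around to 0, so every non-empty upload exceeds the quota, while max_size_kb(-1) keeps -1.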
Fixes: #5595
Backport: dumpling, firefly
We need to update the bucket index when updating object attrs, otherwise
we're missing meta changes that need to be registered. It also
solves the issue of the bucket index not knowing about object acl changes,
although this one still requires some more work.
Yehuda Sadeh [Wed, 5 Nov 2014 21:28:02 +0000 (13:28 -0800)]
rgw: send back ETag on S3 object copy
Fixes: #9479
Backport: firefly, giant
We didn't send the etag back correctly. Original code assumed the etag
resided in the attrs, but attrs only contained request attrs.
Yehuda Sadeh [Wed, 7 Jan 2015 21:56:14 +0000 (13:56 -0800)]
rgw: index swift keys appropriately
Fixes: #10471
Backport: firefly, giant
We need to index the swift keys by the full uid:subuser when decoding
the json representation, to keep it in line with how we store it when
creating it through other mechanisms.
Yehuda Sadeh [Thu, 20 Nov 2014 18:36:05 +0000 (10:36 -0800)]
rgw-admin: create subuser if needed when creating user
Fixes: #10103
Backport: firefly, giant
This turned up after fixing #9973. Earlier we also didn't create the
subuser in this case, but we didn't really read the subuser info when it
was authenticating. Now we do that as required, so we end up failing the
authentication. This only applies to cases where a subuser was created
using 'user create', and not the 'subuser create' command.
Sage Weil [Thu, 8 Jan 2015 19:17:03 +0000 (11:17 -0800)]
osd: requeue PG when we skip handling a peering event
If we don't handle the event, we need to put the PG back into the peering
queue or else the event won't get processed until the next event is
queued, at which point we'll be processing events with a delay.
The queue_null is not necessary (and is a waste of effort) because the
event is still in pg->peering_queue and the PG is queued.
Loic Dachary [Fri, 9 Jan 2015 00:32:17 +0000 (01:32 +0100)]
Merge pull request #3127 from ktdreyer/firefly-no-epoch
Revert "ceph.spec.: add epoch"
Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>
Loic Dachary [Fri, 9 Jan 2015 00:28:11 +0000 (01:28 +0100)]
Merge pull request #3220 from ceph/wip-mon-backports.firefly
mon: backports for #9987 against firefly
Reviewed-by: Joao Eduardo Luis <joao@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>