Sage Weil [Mon, 12 Jan 2015 01:28:04 +0000 (17:28 -0800)]
osd: requeue blocked op before flush it was blocked on
If we have request A (say, cache-flush) that blocks things, and then
request B that gets blocked on it, and we have an interval change, then we
need to requeue B first, then A, so that the resulting queue will keep
A before B and preserve the order.
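The ordering argument can be checked with a small sketch. It assumes requeueing pushes each op onto the front of the queue; the names `requeue_front`, `A`, `B` and `C` are illustrative only and are not taken from the OSD code.

```python
from collections import deque

def requeue_front(queue, ops):
    """Push ops back onto the head of the queue, one at a time, in the order given."""
    for op in ops:
        queue.appendleft(op)

# Hypothetical state at the interval change: A (the flush) blocks B,
# and C is an unrelated op that is still waiting in the queue.
good = deque(["C"])
requeue_front(good, ["B", "A"])   # requeue B first, then A
good_order = list(good)           # A ends up ahead of B

bad = deque(["C"])
requeue_front(bad, ["A", "B"])    # the reversed requeue order
bad_order = list(bad)             # B ends up ahead of A
```

Because each requeue lands at the head, the op requeued last is dispatched first, which is why B has to go back before A.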
Loic Dachary [Wed, 17 Dec 2014 15:06:55 +0000 (16:06 +0100)]
crush: set_choose_tries = 100 for erasure code rulesets
It is common for people to try to map 9 OSDs in a Ceph cluster that has
only 9 OSDs in total. The default number of tries (50) frequently leads to
bad mappings for this use case. Raising it to 100 makes no significant
difference in CPU cost, as tested manually by running crushtool on one
million mappings.
Samuel Just [Fri, 5 Dec 2014 23:29:52 +0000 (15:29 -0800)]
osd_types: op_queue_age_hist and fs_perf_stat should be in osd_stat_t::operator==
Fixes: 10259
Backport: giant, firefly, dumpling
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 1ac17c0a662e6079c2c57edde2b4dc947f547f57)
mon: PGMonitor: available size 0 if no osds on pool's ruleset
get_rule_avail() may return a value < 0, which we were using blindly on
the assumption that it would never be negative. We would end up with
weird values if the ruleset had no osds.
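A minimal sketch of the failure mode, assuming the negative return value was being stored in a 64-bit unsigned field. `as_uint64` and `pool_avail` are hypothetical helpers written for illustration, not functions from PGMonitor.

```python
UINT64_MASK = (1 << 64) - 1

def as_uint64(value):
    """Mimic storing a signed value into a 64-bit unsigned variable."""
    return value & UINT64_MASK

def pool_avail(rule_avail):
    """Hypothetical helper: report 0 when the rule lookup yields a negative value."""
    if rule_avail < 0:
        return 0
    return as_uint64(rule_avail)

blind = as_uint64(-1)   # what a blind conversion of -1 would produce
fixed = pool_avail(-1)  # the checked version reports an available size of 0
```

The blind conversion turns -1 into the largest 64-bit value, which is the kind of weird number the commit describes.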
Sage Weil [Tue, 2 Dec 2014 02:15:59 +0000 (18:15 -0800)]
osd: tolerate sessionless con in fast dispatch path
We can now get a session cleared from a Connection at any time. Change
the assert to an if in ms_fast_dispatch to cope. It's pretty rare, but it
can happen, especially with delay injection. In particular, a racing
thread can call mark_down() on us.
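The change from an assert to an if can be sketched as follows. `Connection`, `fast_dispatch` and the `handled` list are stand-ins invented for this example. What the real code does with an op that arrives without a session is not stated here, so this sketch simply declines to handle it.

```python
class Connection:
    """Hypothetical stand-in for a messenger connection with an optional session."""
    def __init__(self, session=None):
        self.session = session

def fast_dispatch(con, op, handled):
    """Handle op only if the connection still has a session attached."""
    session = con.session        # may have been cleared by a racing mark_down()
    if session is None:          # previously this condition would have tripped an assert
        return False
    handled.append((session, op))
    return True

handled = []
with_session = fast_dispatch(Connection("s1"), "op1", handled)
without_session = fast_dispatch(Connection(), "op2", handled)
```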
Greg Farnum [Fri, 6 Feb 2015 05:12:17 +0000 (21:12 -0800)]
fsync-tester: print info about PATH and locations of lsof lookup
We're seeing the lsof invocation fail (as not found) in testing and nobody can
identify why. Since attempting to reproduce the issue has not worked, this
patch will gather data from a genuinely in-vitro location.
Jason Dillaman [Mon, 27 Oct 2014 18:47:19 +0000 (14:47 -0400)]
osdc: Constrain max number of in-flight read requests
Constrain the number of in-flight RADOS read requests to the
cache size. This reduces the chance of the cache memory
ballooning during certain scenarios like copy-up which can
invoke many concurrent read requests.
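One way to picture the constraint is a byte budget equal to the cache size: reads are issued while the budget allows, and the rest wait until earlier reads complete. This is only a sketch under that assumption. `ReadThrottle` and its methods are hypothetical and do not reflect the actual osdc interfaces.

```python
from collections import deque

class ReadThrottle:
    """Hypothetical throttle: keep in-flight read bytes within cache_size."""
    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.in_flight = 0          # bytes currently being read
        self.pending = deque()      # reads waiting for budget
        self.issued = []            # names of reads that were sent

    def submit(self, name, length):
        self.pending.append((name, length))
        self._issue()

    def complete(self, length):
        self.in_flight -= length
        self._issue()

    def _issue(self):
        # Always let one read through when idle so an oversized read cannot stall forever.
        while self.pending and (
            self.in_flight == 0
            or self.in_flight + self.pending[0][1] <= self.cache_size
        ):
            name, length = self.pending.popleft()
            self.in_flight += length
            self.issued.append(name)

throttle = ReadThrottle(cache_size=8)
for read_name in ("r1", "r2", "r3"):
    throttle.submit(read_name, 4)
issued_before = list(throttle.issued)   # r3 has to wait for budget
throttle.complete(4)
issued_after = list(throttle.issued)    # r3 is issued once r1 completes
```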
Fixes: #9854
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This commit ensures that we check that the timestamp of an S3 request is
within the acceptable grace time of radosgw.
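A sketch of such a check, assuming the request timestamp is compared with the server clock in both directions. The 15-minute grace value and the name `within_grace` are assumptions made for this example and are not taken from the commit.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=15)  # assumed value, not taken from the source

def within_grace(request_time, now, grace=GRACE):
    """True if the request timestamp is no more than grace away from now."""
    return abs(now - request_time) <= grace

now = datetime(2014, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = within_grace(now - timedelta(minutes=5), now)
stale = within_grace(now - timedelta(hours=1), now)
future = within_grace(now + timedelta(hours=1), now)
```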
Addresses some failures in #10062
Fixes: #10062
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
(cherry picked from commit 4b35ae067fef9f97b886afe112d662c61c564365)
Lei Dong [Mon, 27 Oct 2014 02:29:48 +0000 (10:29 +0800)]
fix can not disable max_size quota
Currently, if we enable quota and set max_size = -1, it doesn’t
mean max_size is unlimited as expected. Instead, it means that no
object of any size can be uploaded, because of “QuotaExceeded”.
The root cause is that the function rgw_rounded_kb, which converts
max_size to max_size_kb, returns 0 for -1 because it takes an unsigned
int but we pass an int to it. A simple fix is to check max_size before
it’s rounded to max_size_kb.
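The arithmetic can be reproduced in a short sketch. It assumes the rounding is done as (bytes + 1023) / 1024 in 64-bit unsigned arithmetic. `rounded_kb`, `max_size_kb_fixed` and the -1 "unlimited" marker are illustrative names and values, not the actual rgw code.

```python
UINT64_MASK = (1 << 64) - 1

def rounded_kb(size_bytes):
    """Round up to KB using 64-bit unsigned arithmetic (assumed formula)."""
    size = size_bytes & UINT64_MASK               # a signed int passed as unsigned
    return ((size + 1023) & UINT64_MASK) // 1024  # the addition wraps around for -1

def max_size_kb_fixed(max_size):
    """Check the sign before rounding; a negative max_size means unlimited."""
    if max_size < 0:
        return -1  # hypothetical "unlimited" marker
    return rounded_kb(max_size)

buggy = rounded_kb(-1)          # wraps to 1022 bytes, which rounds down to 0 KB
fixed = max_size_kb_fixed(-1)   # stays "unlimited"
```

With a limit of 0 KB, any non-empty upload exceeds the quota, which matches the 403 described in the commit.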
Test case:
1. Enable quota and set it:
   radosgw-admin quota enable --uid={user_id} --quota-scope=user
   radosgw-admin quota set --quota-scope=user --uid={user_id} \
       --max-objects=100 --max-size=-1
2. Upload any object with non-zero length.
Without the fix, the upload returns 403 with “QuotaExceeded”; with the
fix applied, it returns 200.
Fixes: #5595
Backport: dumpling, firefly
We need to update the bucket index when updating object attrs, otherwise
we're missing meta changes that need to be registered. It also
solves the issue of the bucket index not knowing about object acl
changes, although this one still requires some more work.
Yehuda Sadeh [Wed, 5 Nov 2014 21:28:02 +0000 (13:28 -0800)]
rgw: send back ETag on S3 object copy
Fixes: #9479
Backport: firefly, giant
We didn't send the etag back correctly. Original code assumed the etag
resided in the attrs, but attrs only contained request attrs.
Yehuda Sadeh [Wed, 5 Nov 2014 21:40:55 +0000 (13:40 -0800)]
rgw: remove swift user manifest (DLO) hash calculation
Fixes: #9973
Backport: firefly, giant
Previously we were iterating through the parts, creating a hash of the
parts' etags (as S3 does for multipart uploads). However, Swift just
calculates the etag for the empty manifest object.
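The difference between the two calculations can be sketched with hashlib. The S3-style function follows the commonly observed multipart format (MD5 over the concatenated binary part digests, plus a part-count suffix). Both function names are hypothetical and neither is the rgw implementation.

```python
import hashlib

def s3_multipart_style_etag(part_etags):
    """MD5 over the parts' binary MD5 digests, suffixed with the part count."""
    digest = hashlib.md5(b"".join(bytes.fromhex(e) for e in part_etags))
    return "%s-%d" % (digest.hexdigest(), len(part_etags))

def swift_dlo_etag(manifest_body=b""):
    """Etag of the manifest object itself, which is empty for a DLO."""
    return hashlib.md5(manifest_body).hexdigest()

parts = [hashlib.md5(b"part one").hexdigest(),
         hashlib.md5(b"part two").hexdigest()]
old_style = s3_multipart_style_etag(parts)  # depends on the parts
new_style = swift_dlo_etag()                # constant: the MD5 of empty input
```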
Yehuda Sadeh [Thu, 20 Nov 2014 18:36:05 +0000 (10:36 -0800)]
rgw-admin: create subuser if needed when creating user
Fixes: #10103
Backport: firefly, giant
This turned up after fixing #9973. Earlier we also didn't create the
subuser in this case, but we didn't really read the subuser info when it
was authenticating. Now we do that as required, so we end up failing the
authentication. This only applies to cases where a subuser was created
using 'user create', and not the 'subuser create' command.
Yehuda Sadeh [Sat, 13 Dec 2014 01:07:30 +0000 (17:07 -0800)]
rgw: use s->bucket_attrs instead of trying to read obj attrs
Fixes: #10307
Backport: firefly, giant
This is needed, since we can't really read the bucket attrs by trying to
read the bucket entry point attrs. We already have the bucket attrs
anyway, use these.
Petr Machata [Thu, 29 Jan 2015 17:15:02 +0000 (10:15 -0700)]
support Boost 1.57.0
Sometime after 1.55, boost introduced a forward declaration of
operator<< in optional.hpp. In 1.55 and earlier, when << was used
without the _io having been included, what got dumped was an implicit
bool conversion.
http://tracker.ceph.com/issues/10688
Refs: #10688
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 85717394c33137eb703a7b88608ec9cf3287f67a)
Sage Weil [Tue, 16 Dec 2014 01:04:32 +0000 (17:04 -0800)]
osd: handle no-op write with snapshot case
If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc and lock and later crash on an assert when the obc
is destroyed:
0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
The fix is to not recreate the snapdir if it already exists.
John Spray [Wed, 14 Jan 2015 10:35:53 +0000 (10:35 +0000)]
mds: handle heartbeat_reset during shutdown
Because any thread might grab mds_lock and call heartbeat_reset
immediately after a call to suicide() completes, this needs
to be handled as a special case where we tolerate MDS::hb having
already been destroyed.
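The special case amounts to a guard on the heartbeat handle. In this sketch the `MDS` class and its dict-based `hb` are placeholders for illustration, and locking is omitted.

```python
class MDS:
    """Hypothetical stand-in; hb represents the heartbeat handle."""
    def __init__(self):
        self.hb = {"resets": 0}

    def suicide(self):
        self.hb = None             # the heartbeat handle is destroyed on shutdown

    def heartbeat_reset(self):
        if self.hb is None:        # called after suicide(): tolerate and return
            return False
        self.hb["resets"] += 1
        return True

mds = MDS()
before_shutdown = mds.heartbeat_reset()
mds.suicide()
after_shutdown = mds.heartbeat_reset()
```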
Fixes: #10382
Signed-off-by: John Spray <john.spray@redhat.com>
Jason Dillaman [Mon, 15 Dec 2014 15:53:53 +0000 (10:53 -0500)]
librbd: complete all pending aio ops prior to closing image
It was possible for an image to be closed while aio operations
were still outstanding. Now all aio operations are tracked and
completed before the image is closed.
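A minimal sketch of the tracking idea, assuming a counter of outstanding operations guarded by a condition variable. The `Image` class and its method names are invented for this example and are not the librbd API.

```python
import threading

class Image:
    """Hypothetical image handle that counts outstanding aio operations."""
    def __init__(self):
        self._cond = threading.Condition()
        self.pending = 0
        self.closed = False

    def aio_start(self):
        with self._cond:
            self.pending += 1

    def aio_complete(self):
        with self._cond:
            self.pending -= 1
            if self.pending == 0:
                self._cond.notify_all()

    def close(self):
        with self._cond:
            while self.pending > 0:   # wait until every tracked op has completed
                self._cond.wait()
            self.closed = True

image = Image()
image.aio_start()
image.aio_start()
# Complete both ops from other threads shortly after close() starts waiting.
timers = [threading.Timer(0.05, image.aio_complete) for _ in range(2)]
for t in timers:
    t.start()
image.close()
```

Here `close()` returns only after both completions have run, so no operation outlives the image.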
Fixes: #10299
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 19 Jan 2015 15:28:56 +0000 (10:28 -0500)]
librbd: gracefully handle deleted/renamed pools
snap_unprotect and list_children both attempt to scan all
pools. If a pool is deleted or renamed during the scan,
the methods would previously return -ENOENT. Both methods
have been modified to more gracefully handle this condition.
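One plausible reading of "gracefully" is that a pool which disappears during the scan is skipped instead of failing the whole call. The sketch below assumes that behaviour. `scan_pools`, `open_pool` and the sample data are hypothetical and are not the librbd implementation.

```python
import errno

def scan_pools(pool_names, open_pool):
    """Collect children from every pool, skipping pools that no longer exist."""
    children = []
    for name in pool_names:
        try:
            found = open_pool(name)
        except OSError as e:
            if e.errno == errno.ENOENT:
                continue          # pool was deleted or renamed during the scan
            raise                 # any other error is still reported
        children.extend(found)
    return children

pools = {"rbd": ["child1"], "images": ["child2", "child3"]}

def open_pool(name):
    """Hypothetical lookup that fails with ENOENT for a missing pool."""
    if name not in pools:
        raise OSError(errno.ENOENT, "pool not found: " + name)
    return pools[name]

result = scan_pools(["rbd", "deleted-pool", "images"], open_pool)
```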
Fixes: #10270
Backport: giant, firefly
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Josh Durgin [Wed, 14 Jan 2015 23:01:38 +0000 (15:01 -0800)]
qa: ignore duplicates in rados ls
These can happen with split or with state changes due to reordering
results within the hash range requested. It's easy enough to filter
them out at this stage.
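Filtering at this stage can be as simple as remembering which names have been seen. This is a generic sketch, not the actual qa script; `unique_in_order` and the sample listing are made up for illustration.

```python
def unique_in_order(names):
    """Drop repeated object names while keeping the first occurrence of each."""
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result

listing = ["obj1", "obj2", "obj1", "obj3", "obj2"]  # e.g. duplicates from a split
filtered = unique_in_order(listing)
```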
Sage Weil [Tue, 14 Oct 2014 19:41:48 +0000 (12:41 -0700)]
msg/simple: do not stop_and_wait on mark_down
We originally blocked in mark_down for fast dispatch threads
to complete to avoid various races in the code. Most of these
were in the OSD itself, where we were not prepared to get
messages on connections that had no attached session. Since
then, the OSD checks have been cleaned up to handle this.
There were other races we were worried about too, but the
details have been lost in the depths of time.
Instead, take the other route: make mark_down never block on
dispatch. This lets us remove the special case that
was added in order to cope with fast dispatch calling
mark_down on itself.
Now, the only stop_and_wait() user is the shutdown sequence.
Sage Weil [Fri, 31 Oct 2014 23:25:09 +0000 (16:25 -0700)]
msg/Pipe: inject delay in stop_and_wait
Inject a delay in stop_and_wait. This will mostly affect the connection
race Pipe takeover code which currently calls stop_and_wait while holding
the msgr->lock. This should make it easier for a racing fast_dispatch
method to get stuck on a call that (indirectly) needs the msgr lock.
See #9921.
Greg Farnum [Tue, 28 Oct 2014 23:45:43 +0000 (16:45 -0700)]
SimpleMessenger: Pipe: do not block on takeover while holding global lock
We previously were able to cause deadlocks:
1) Existing pipe is fast_dispatching
2) Replacement incoming pipe is accepted
*) blocks on stop_and_wait() of existing Pipe
3) External things are blocked on SimpleMessenger::lock() while
blocking completion of the fast dispatch.
To resolve this, if we detect that an existing Pipe we want to take over is
in the process of fast dispatching, we unlock our locks and wait on it to
finish. Then we go back to the lookup step and retry.
The effect of this should be safe:
1) We are not making any changes to the existing Pipe in new ways
2) We have not registered the new Pipe anywhere
3) We have not sent back any replies based on Messenger state to
the remote endpoint.
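The unlock, wait and retry pattern described above can be sketched with a condition variable, whose wait releases the lock until it is notified. `Messenger`, `Pipe`, `accept` and `fast_dispatch_done` are hypothetical names, and the sketch leaves out everything else the real takeover code does.

```python
import threading

class Pipe:
    """Hypothetical pipe that only records whether it is fast dispatching."""
    def __init__(self):
        self.in_fast_dispatch = False

class Messenger:
    def __init__(self):
        self.cond = threading.Condition(threading.Lock())
        self.pipes = {}

    def accept(self, peer, new_pipe):
        """Register new_pipe for peer, waiting out any in-progress fast dispatch."""
        with self.cond:
            while True:
                existing = self.pipes.get(peer)   # the lookup step
                if existing is not None and existing.in_fast_dispatch:
                    self.cond.wait()              # drops the lock while waiting
                    continue                      # go back to the lookup and retry
                self.pipes[peer] = new_pipe       # nothing was registered before this point
                return existing

    def fast_dispatch_done(self, pipe):
        with self.cond:
            pipe.in_fast_dispatch = False
            self.cond.notify_all()

msgr = Messenger()
old_pipe, new_pipe = Pipe(), Pipe()
old_pipe.in_fast_dispatch = True
msgr.pipes["peer"] = old_pipe
# Another thread finishes the fast dispatch a little later.
threading.Timer(0.05, msgr.fast_dispatch_done, args=[old_pipe]).start()
replaced = msgr.accept("peer", new_pipe)
```

Because the lock is released while waiting, other threads that need it can make progress, which is what breaks the deadlock cycle in the scenario above.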