Samuel Just [Fri, 14 Nov 2014 23:44:20 +0000 (15:44 -0800)]
PG: always clear_primary_state when leaving Primary
Otherwise, entries from the log collection process might leak into the next
epoch, where we might end up choosing a different authoritative log. In this
case, it resulted in us not rolling back to log entries on one of the replicas
prior to trying to recover an affected object, due to the peer_missing not
being cleared.
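A hedged sketch of the intent, with stand-in types (the real PG code
clears considerably more state than shown):

    #include <map>
    #include <set>

    // Hypothetical simplification of the fix (#10059): when this OSD stops
    // being primary for the PG, wipe state derived from the previous
    // peering round so the next epoch cannot act on stale per-peer data.
    struct PG {
      bool primary = false;
      std::map<int, int> peer_missing;   // stub for the per-peer missing sets
      std::set<int> peer_log_requested;  // stub: peers already asked for logs

      void clear_primary_state() {
        peer_missing.clear();            // stale entries here broke rollback
        peer_log_requested.clear();
      }

      void on_role_change(bool now_primary) {
        if (primary && !now_primary)
          clear_primary_state();         // always clear when leaving Primary
        primary = now_primary;
      }
    };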
Fixes: #10059
Backport: giant, firefly, dumpling
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit c87bde64dfccb5d6ee2877cc74c66fc064b1bcd7)
Sage Weil [Thu, 5 Feb 2015 11:07:50 +0000 (03:07 -0800)]
mon: ignore osd failures from before up_from
If the failure was generated for an instance of the OSD prior to when
it came up, ignore it.
This probably causes a fair bit of unnecessary flapping in the wild...
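A minimal sketch of the check, using hypothetical names (the real
OSDMonitor reads up_from out of the current osdmap):

    #include <cstdint>

    // Stub standing in for the osdmap's per-OSD boot epoch.
    struct OSDMapStub {
      uint32_t boot_epoch[8] = {};
      uint32_t up_from(int osd) const { return boot_epoch[osd]; }
    };

    // A failure observed in an epoch before the target OSD's current
    // incarnation booted refers to a previous instance of that OSD, so
    // the monitor should drop it rather than mark the OSD down again.
    bool ignore_failure(const OSDMapStub& map, int target_osd,
                        uint32_t failed_epoch) {
      return failed_epoch < map.up_from(target_osd);
    }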
Backport: giant, firefly
Fixes: #10762
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 400ac237d35d0d1d53f240fea87e8483c0e2a7f5)
Sage Weil [Mon, 12 Jan 2015 01:28:04 +0000 (17:28 -0800)]
osd: requeue blocked op before flush it was blocked on
If we have request A (say, cache-flush) that blocks things, and then
request B that gets blocked on it, and we have an interval change, then we
need to requeue B first, then A, so that the resulting queue will keep
A before B and preserve the order.
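A hedged sketch of why the order matters (requeueing pushes onto the
front of the queue, so ops go back in reverse of their intended order):

    #include <deque>
    #include <string>

    // Hypothetical simplification: requeue the blocked op B first, then
    // the blocking (e.g. cache-flush) op A, leaving the queue as [A, B]
    // so A still runs before B after the interval change.
    void requeue_on_interval_change(std::deque<std::string>& opq) {
      opq.push_front("B");  // the op that was blocked on A
      opq.push_front("A");  // the blocker, which must stay first
    }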
Loic Dachary [Wed, 17 Dec 2014 15:06:55 +0000 (16:06 +0100)]
crush: set_choose_tries = 100 for erasure code rulesets
It is common for people to try to map 9 OSDs out of a ceph cluster with
9 OSDs total. The default number of tries (50) frequently leads to bad
mappings for this use case. Raising it to 100 makes no significant CPU
performance difference, as tested manually by running crushtool on one
million mappings.
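For illustration, an erasure-code ruleset with the raised limit looks
roughly like this (the rule body is a generic example, not the exact
generated default):

    rule ecpool {
            ruleset 1
            type erasure
            min_size 3
            max_size 9
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default
            step chooseleaf indep 0 type host
            step emit
    }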
Samuel Just [Fri, 5 Dec 2014 23:29:52 +0000 (15:29 -0800)]
osd_types: op_queue_age_hist and fs_perf_stat should be in osd_stat_t::operator==
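Without those fields, two osd_stat_t values differing only in the op
queue age histogram or filestore perf counters compared as equal, so
stat updates could be skipped. A hedged sketch of the corrected
comparison (the surrounding fields and types are stand-ins):

    struct osd_stat_t {
      long kb = 0, kb_used = 0;     // stand-ins for the real members
      int op_queue_age_hist = 0;    // stub for the real histogram type
      int fs_perf_stat = 0;         // stub for the real perf-counter struct

      bool operator==(const osd_stat_t& o) const {
        return kb == o.kb && kb_used == o.kb_used &&
               op_queue_age_hist == o.op_queue_age_hist &&  // was missing
               fs_perf_stat == o.fs_perf_stat;              // was missing
      }
    };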
Fixes: #10259
Backport: giant, firefly, dumpling
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 1ac17c0a662e6079c2c57edde2b4dc947f547f57)
mon: PGMonitor: available size 0 if no osds on pool's ruleset
get_rule_avail() may return < 0, but we were using the result blindly,
assuming it would always be an unsigned value. We would end up with
weird values if the ruleset had no osds.
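A minimal sketch of the fix, with stubbed signatures (assumed, not the
actual PGMonitor code):

    #include <cstdint>

    // Stub: may legitimately return < 0 when the ruleset maps to no OSDs.
    int64_t get_rule_avail(int /*ruleset*/) { return -1; }

    // Clamp negative results to 0 before the value lands in an unsigned
    // "available bytes" field, instead of letting it wrap to a huge number.
    uint64_t pool_avail(int ruleset) {
      int64_t avail = get_rule_avail(ruleset);
      if (avail < 0)
        avail = 0;  // no OSDs under the pool's ruleset -> 0 available
      return static_cast<uint64_t>(avail);
    }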
Sage Weil [Tue, 2 Dec 2014 02:15:59 +0000 (18:15 -0800)]
osd: tolerate sessionless con in fast dispatch path
We can now get a session cleared from a Connection at any time. Change
the assert to an if in ms_fast_dispatch to cope. It's pretty rare, but it
can happen, especially with delay injection. In particular, a racing
thread can call mark_down() on us.
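A hedged sketch of the change, with stand-in types (the real messenger
types differ):

    #include <memory>

    struct Session { /* per-connection state */ };
    struct Connection {
      std::shared_ptr<Session> priv;  // can be cleared by mark_down() anytime
    };
    struct Message { Connection* con = nullptr; };

    // The old code asserted the session was present; since a racing
    // mark_down() can clear it, drop the message instead of asserting.
    void ms_fast_dispatch(Message* m) {
      std::shared_ptr<Session> s = m->con->priv;  // was: assert(s != nullptr)
      if (!s)
        return;  // sessionless con: rare, but legal now
      // ... normal fast dispatch using s ...
    }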
Greg Farnum [Fri, 6 Feb 2015 05:12:17 +0000 (21:12 -0800)]
fsync-tester: print info about PATH and locations of lsof lookup
We're seeing the lsof invocation fail (as not found) in testing and nobody can
identify why. Since attempting to reproduce the issue has not worked, this
patch will gather data from a genuinely in-vitro location.
Jason Dillaman [Mon, 27 Oct 2014 18:47:19 +0000 (14:47 -0400)]
osdc: Constrain max number of in-flight read requests
Constrain the number of in-flight RADOS read requests to the
cache size. This reduces the chance of the cache memory
ballooning during certain scenarios like copy-up which can
invoke many concurrent read requests.
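A hypothetical sketch of such a constraint as a byte-budget throttle
(names and structure assumed; the actual cache code differs):

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    // Each RADOS read reserves its length before being issued, so total
    // in-flight read bytes can never exceed the configured cache size.
    class ReadThrottle {
      std::mutex m;
      std::condition_variable cv;
      uint64_t in_flight = 0;
      const uint64_t limit;  // e.g. the configured rbd cache size
    public:
      explicit ReadThrottle(uint64_t cache_size) : limit(cache_size) {}
      void get(uint64_t len) {          // blocks until budget is free
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [&] { return in_flight + len <= limit; });
        in_flight += len;
      }
      void put(uint64_t len) {          // called from the read completion
        { std::lock_guard<std::mutex> l(m); in_flight -= len; }
        cv.notify_all();
      }
    };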
Fixes: #9854
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This commit ensures that the timestamp of an S3 request is within the
acceptable grace time of radosgw.
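A minimal sketch of the check (function and parameter names are
illustrative, not the actual rgw code or option):

    #include <cstdlib>
    #include <ctime>

    // Accept the request only if its Date header is within the configured
    // grace period of the gateway's own clock, in either direction.
    bool request_time_ok(time_t req_time, time_t now, long grace_secs) {
      long skew = static_cast<long>(now - req_time);
      return std::labs(skew) <= grace_secs;
    }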
Addresses some failures in #10062
Fixes: #10062
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
(cherry picked from commit 4b35ae067fef9f97b886afe112d662c61c564365)
Lei Dong [Mon, 27 Oct 2014 02:29:48 +0000 (10:29 +0800)]
fix can not disable max_size quota
Currently if we enable quota and set max_size = -1, it doesn't mean
max_size is unlimited as expected. Instead, it means that an object of
any size is not allowed to be uploaded, because of "QuotaExceeded".
The root cause is that the function rgw_rounded_kb, which converts
max_size to max_size_kb, returns 0 for -1 because it takes an unsigned
int but we pass an int to it. A simple fix is to check max_size before
it's rounded to max_size_kb.
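A hypothetical sketch of the bug and the fix (stubbed signatures, not
the actual rgw code):

    #include <cstdint>

    // Rounding happens in unsigned arithmetic, so -1 wraps and rounds to
    // 0 ("nothing allowed") instead of staying "unlimited".
    uint64_t rgw_rounded_kb(uint64_t size) { return (size + 1023) / 1024; }

    // The fix: test the signed value first and only round genuine limits.
    int64_t max_size_kb_from(int64_t max_size) {
      if (max_size < 0)
        return -1;  // preserve "unlimited"
      return static_cast<int64_t>(
          rgw_rounded_kb(static_cast<uint64_t>(max_size)));
    }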
Test case:
1. enable and set quota:
radosgw-admin quota enable --uid={user_id} --quota-scope=user
radosgw-admin quota set --quota-scope=user --uid={user_id} \
--max-objects=100 --max-size=-1
2. upload any object with non-zero length:
without the fix it will return 403 with "QuotaExceeded"; with the fix
applied it returns 200.
Fixes: #5595
Backport: dumpling, firefly
We need to update the bucket index when updating object attrs;
otherwise we're missing meta changes that need to be registered. This
also solves the issue of the bucket index not knowing about object acl
changes, although that one still requires some more work.
Yehuda Sadeh [Wed, 5 Nov 2014 21:28:02 +0000 (13:28 -0800)]
rgw: send back ETag on S3 object copy
Fixes: #9479
Backport: firefly, giant
We didn't send the etag back correctly. The original code assumed the
etag resided in the attrs, but the attrs only contained the request
attrs.
Yehuda Sadeh [Wed, 5 Nov 2014 21:40:55 +0000 (13:40 -0800)]
rgw: remove swift user manifest (DLO) hash calculation
Fixes: #9973
Backport: firefly, giant
Previously we were iterating through the parts, creating a hash of the
parts' etags (as S3 does for multipart uploads). However, swift just
calculates the etag of the empty manifest object.
Yehuda Sadeh [Thu, 20 Nov 2014 18:36:05 +0000 (10:36 -0800)]
rgw-admin: create subuser if needed when creating user
Fixes: #10103
Backport: firefly, giant
This turned up after fixing #9973. Earlier we also didn't create the
subuser in this case, but we didn't actually read the subuser info
during authentication. Now we do that as required, so we end up failing
the authentication. This only applies to cases where a subuser was
created using 'user create', and not via the 'subuser create' command.
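For illustration, the affected path is the one-step form, e.g. (flags
shown are standard radosgw-admin options; the exact invocation is an
assumption):

    radosgw-admin user create --uid=johndoe --display-name="John Doe" \
        --subuser=johndoe:swift --key-type=swift --gen-secret

as opposed to running a separate 'subuser create' afterwards.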
Yehuda Sadeh [Sat, 13 Dec 2014 01:07:30 +0000 (17:07 -0800)]
rgw: use s->bucket_attrs instead of trying to read obj attrs
Fixes: #10307
Backport: firefly, giant
This is needed, since we can't really read the bucket attrs by trying
to read the bucket entry point attrs. We already have the bucket attrs
anyway; use those.
Petr Machata [Thu, 29 Jan 2015 17:15:02 +0000 (10:15 -0700)]
support Boost 1.57.0
Sometime after 1.55, boost introduced a forward declaration of
operator<< in optional.hpp. In 1.55 and earlier, when << was used
without the _io header having been included, what got dumped was an
implicit bool conversion.
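A hedged illustration of the build-side effect: with the newer Boost,
streaming an optional without the io header no longer silently falls
back to the bool conversion, so code must include it explicitly:

    #include <boost/optional.hpp>
    #include <boost/optional/optional_io.hpp>  // defines operator<< for optional
    #include <iostream>

    int main() {
      boost::optional<int> opt = 42;
      std::cout << opt << std::endl;  // streams the contained value
      return 0;
    }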
http://tracker.ceph.com/issues/10688
Refs: #10688
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 85717394c33137eb703a7b88608ec9cf3287f67a)
Sage Weil [Tue, 16 Dec 2014 01:04:32 +0000 (17:04 -0800)]
osd: handle no-op write with snapshot case
If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc and lock and later crash on an assert when the obc
is destroyed:
0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())
The fix is to not recreate the snapdir if it already exists.
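A hypothetical simplification of the fix (stand-in types; the real OSD
code differs):

    struct Transaction { /* pending object-store ops */ };

    // Create the snapdir object only when it does not already exist;
    // unconditionally re-creating it leaked an obc and its lock and
    // later tripped the rwstate.empty() assert.
    void maybe_create_snapdir(bool& snapdir_exists, Transaction& t) {
      if (snapdir_exists)
        return;        // never re-create an existing snapdir
      (void)t;         // ... queue the snapdir creation in t ...
      snapdir_exists = true;
    }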
John Spray [Wed, 14 Jan 2015 10:35:53 +0000 (10:35 +0000)]
mds: handle heartbeat_reset during shutdown
Because any thread might grab mds_lock and call heartbeat_reset
immediately after a call to suicide() completes, this needs
to be handled as a special case where we tolerate MDS::hb having
already been destroyed.
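A hedged sketch of the special case (stand-in types; the real MDS code
differs):

    struct HeartbeatHandle { /* opaque */ };

    struct MDS {
      HeartbeatHandle* hb = nullptr;  // destroyed and nulled by suicide()

      void heartbeat_reset() {        // caller already holds mds_lock
        if (!hb)
          return;                     // suicide() already ran: tolerate it
        // ... touch the heartbeat timestamp via hb ...
      }
    };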
Fixes: #10382
Signed-off-by: John Spray <john.spray@redhat.com>
Jason Dillaman [Mon, 15 Dec 2014 15:53:53 +0000 (10:53 -0500)]
librbd: complete all pending aio ops prior to closing image
It was possible for an image to be closed while aio operations
were still outstanding. Now all aio operations are tracked and
completed before the image is closed.
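A hypothetical sketch of the tracking (assumed structure, not the
actual librbd code):

    #include <condition_variable>
    #include <mutex>

    // Each aio op bumps a counter on start and drops it on completion;
    // close blocks until the counter drains, so no completion can fire
    // against a closed image.
    class PendingAioTracker {
      std::mutex m;
      std::condition_variable cv;
      int pending = 0;
    public:
      void start_op()  { std::lock_guard<std::mutex> l(m); ++pending; }
      void finish_op() {
        { std::lock_guard<std::mutex> l(m); --pending; }
        cv.notify_all();
      }
      void wait_for_all() {  // called from image close
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [&] { return pending == 0; });
      }
    };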
Fixes: #10299
Backport: giant, firefly, dumpling
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 19 Jan 2015 15:28:56 +0000 (10:28 -0500)]
librbd: gracefully handle deleted/renamed pools
snap_unprotect and list_children both attempt to scan all
pools. If a pool is deleted or renamed during the scan,
the methods would previously return -ENOENT. Both methods
have been modified to more gracefully handle this condition.
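A hedged sketch of the handling (stubbed pool-open call; the real code
works on rados ioctxs):

    #include <cerrno>
    #include <string>
    #include <vector>

    // Stub: opening a pool that was deleted or renamed mid-scan fails
    // with -ENOENT.
    int open_pool(const std::string&) { return 0; }

    // Skip a vanished pool and keep scanning instead of failing the
    // whole operation with -ENOENT.
    int scan_pools(const std::vector<std::string>& pools) {
      for (const std::string& p : pools) {
        int r = open_pool(p);
        if (r == -ENOENT)
          continue;    // pool vanished during the scan: not fatal
        if (r < 0)
          return r;    // real errors still propagate
        // ... examine this pool (list children / check snapshots) ...
      }
      return 0;
    }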
Fixes: #10270
Backport: giant, firefly
Signed-off-by: Jason Dillaman <dillaman@redhat.com>