Conflicts:
src/common/Throttle.cc
src/common/Throttle.h
src/test/common/Throttle.cc: in jewel, we don't have perfcounters
for throttles, nor did we have backport14.h back then, so we need to
resolve these conflicts by removing the perfcounter-related change in
Throttle.cc, adding the make_unique helper to test/common/Throttle.cc,
and adding scope_guard to Throttle.cc.
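For readers unfamiliar with the two helpers mentioned in the conflict note, the following is a minimal C++11 sketch of what such helpers typically look like. It is not the content of backport14.h or of the actual backport; the names make_unique_compat, ScopeGuard, and make_scope_guard_compat are hypothetical.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for std::make_unique (C++14) on a C++11 toolchain.
// Only the single-object form is shown; array forms are omitted.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique_compat(Args&&... args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Minimal scope guard: runs the stored callable when the guard is destroyed.
template <typename F>
class ScopeGuard {
public:
  explicit ScopeGuard(F f) : fn_(std::move(f)), active_(true) {}
  // Moving transfers responsibility; the moved-from guard does nothing.
  ScopeGuard(ScopeGuard&& other)
    : fn_(std::move(other.fn_)), active_(other.active_) {
    other.active_ = false;
  }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;
  ~ScopeGuard() { if (active_) fn_(); }

private:
  F fn_;
  bool active_;
};

// Deduces the callable type so callers can write `auto g = ...`.
template <typename F>
ScopeGuard<F> make_scope_guard_compat(F f) {
  return ScopeGuard<F>(std::move(f));
}
```

A guard created with `auto g = make_scope_guard_compat([&] { ... });` runs its callable on every exit path of the enclosing scope, which is the property a throttle release on early return relies on.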
Orit Wasserman [Sun, 21 Jan 2018 10:11:34 +0000 (12:11 +0200)]
rgw: resharding needs to set back the bucket ACL after link
Jewel only fix. New implementation of resharding in Luminous. Fixes: http://tracker.ceph.com/issues/22703 Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Huan Zhang [Fri, 24 Jun 2016 03:27:53 +0000 (11:27 +0800)]
rbd discard return -EINVAL if len > MAX_INT32
rbd discard uses 'int' to return the discarded length, but the 'len' the
user passes is 'uint64'. In some cases the return value is truncated
to a negative value, which signals that the discard failed. Return -EINVAL
if len > MAX_INT32 to indicate that only len <= MAX_INT32 is supported.
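The length check described above can be sketched as follows. This is an illustration only, not the librbd code: discard_checked is a hypothetical wrapper, and the actual discard work is replaced by simply echoing the length back.

```cpp
#include <cerrno>
#include <cstdint>
#include <limits>

// Hypothetical wrapper (not the librbd entry point).
// Returns the number of bytes discarded, or a negative errno value.
int discard_checked(uint64_t offset, uint64_t len) {
  // An int result cannot represent lengths above INT32_MAX, so reject
  // them up front instead of returning a truncated (possibly negative) value.
  if (len > static_cast<uint64_t>(std::numeric_limits<int32_t>::max())) {
    return -EINVAL;
  }
  (void)offset;  // the offset is irrelevant to the check shown here
  // Placeholder for the real discard; the cast is safe after the check.
  return static_cast<int>(len);
}
```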
Jason Dillaman [Wed, 15 Nov 2017 14:09:15 +0000 (09:09 -0500)]
librbd: prevent overflow of discard API result code
Prevent discard/writesame lengths larger than 2GB.
Fixes: http://tracker.ceph.com/issues/21966 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 3effd324db181e625665be33b5c6529dca723cc5) Signed-off-by: Nathan Cutler <ncutler@suse.com>
Conflicts:
PendingReleaseNotes (adapted for jewel)
src/librbd/librbd.cc (no writesame in jewel)
Li Wang [Wed, 1 Nov 2017 09:21:29 +0000 (09:21 +0000)]
rbd-nbd: fix unused nbd device search bug in container
In some container scenarios, the host may choose to
map a specific nbd device, for example /dev/nbd6, into the
container. In that case, the nbd devices available in the
container are not numbered from 0, and the current unused
nbd device search function returns no result.
This patch fixes it.
Fixes: http://tracker.ceph.com/issues/22012 Signed-off-by: Li Wang <laurence.liwang@gmail.com> Reviewed-by: Yunchuan Wen <yunchuan.wen@kylin-cloud.com>
(cherry picked from commit be0f9581f9727187ca03232e0b368e7da7a60609)
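The idea behind the fix can be illustrated with a small sketch: search over the device indices that are actually present rather than assuming they start at 0 and are contiguous. This is not the rbd-nbd patch; find_unused_nbd, the set of present indices, and the in_use predicate are all hypothetical.

```cpp
#include <functional>
#include <set>

// Returns the index of the first device that is present and not in use,
// or -1 if there is none. `present` holds the indices of the /dev/nbdN
// nodes visible in the container (hypothetical input).
int find_unused_nbd(const std::set<int>& present,
                    const std::function<bool(int)>& in_use) {
  for (int idx : present) {  // std::set iterates in ascending order
    if (!in_use(idx)) {
      return idx;
    }
  }
  return -1;
}
```

With only /dev/nbd6 mapped into the container, `present` would be `{6}` and the search still finds it, whereas a search that gives up when index 0 is missing would not.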
Jason Dillaman [Fri, 27 Oct 2017 20:45:54 +0000 (16:45 -0400)]
cls/journal: ensure tags are properly expired
Previously, if only the local image was using the journal or if
a disconnected peer was attached, the tag entries could not be
expired even if unreferenced.
Fixes: http://tracker.ceph.com/issues/21960 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 19fa1c7f5b2809e9a223b7b196dfc031e97a5dcd)
Matt Benjamin [Tue, 3 Oct 2017 21:48:29 +0000 (17:48 -0400)]
rgw: release cls lock if taken in RGWCompleteMultipart
Follows Casey's proposal to conditionally release the lock in
::complete(), in order to avoid duplicated code in various early
return cases.
Fixes: http://tracker.ceph.com/issues/21596 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 704f793f08a02760d23eb5778b738bb07be0e7cf)
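The pattern of releasing a lock conditionally in a single completion step, rather than before each early return, can be sketched like this. FakeLock and MultipartOpSketch are hypothetical stand-ins and do not reflect the RGWCompleteMultipart code.

```cpp
// Hypothetical lock that records how often it was released.
struct FakeLock {
  bool held = false;
  int releases = 0;
  void acquire() { held = true; }
  void release() { held = false; ++releases; }
};

class MultipartOpSketch {
public:
  explicit MultipartOpSketch(FakeLock& lock)
    : lock_(lock), took_lock_(false) {}

  // May return early on error; note there is no unlock on any return path.
  int execute(bool fail_early) {
    lock_.acquire();
    took_lock_ = true;
    if (fail_early) {
      return -1;
    }
    return 0;
  }

  // Single place that releases the lock, and only if it was taken.
  void complete() {
    if (took_lock_) {
      lock_.release();
      took_lock_ = false;
    }
  }

private:
  FakeLock& lock_;
  bool took_lock_;
};
```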
Conflicts:
qa/tasks/ceph_manager.py (bring in a053ce091e1aa910a1d01aec489203500e67efe5
which has to be cherry-picked manually anyway, because it pre-dates the
ceph-qa-suite move)
Ning Yao [Thu, 7 Sep 2017 10:52:55 +0000 (10:52 +0000)]
test: fix misc fiemap testing
1) Different filesystems allocate extents differently.
Therefore, even if we write 4000 extents,
the filesystem may not really allocate 4000 extents.
2) kstore always returns [0, xxx] even if the offset is non-zero. Therefore,
the whole non-zero offset FiemapHoles test should be skipped.
3) Enable the fiemap test for filestore, bluestore, and memstore again.
Fixes: http://tracker.ceph.com/issues/21716 Signed-off-by: Ning Yao <yaoning@unitedstack.com> Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 87f33376d977962ab7438c46873ea9b6292390d1)
Conflicts:
src/test/objectstore/store_test.cc (master does not set
filestore_op_thread_suicide_timeout; trivial resolution)
Casey Bodley [Thu, 5 Oct 2017 20:39:30 +0000 (16:39 -0400)]
rgw: RGWUser::init no longer overwrites user_id
if an admin op specifies a user_id and does not find a user with that
id, but does find a user based on a later field (email, access key,
etc), RGWUser::user_id will be overwritten with the existing user's id.
when this happens on 'radosgw-admin user create', RGWUser::execute_add()
will modify that existing user, instead of trying to create a new user
with the given user_id (and failing due to the conflicting email,
access key, etc).
by preserving the original user_id (when specified), this uid conflict
is detected in RGWUser::check_op() and a "user id mismatch" error is
returned.
Conflicts:
src/tools/ceph_objectstore_tool.cc (in jewel, ::encode() takes only two
arguments, while in luminous/master it takes a third which we omit
here)
Matt Benjamin [Fri, 19 Jan 2018 18:05:27 +0000 (13:05 -0500)]
rgw_file: alternate fix deadlock on lru eviction
This change is an alternate fix for two problems found and fixed
by Yao Zongyou <yaozongyou@vip.qq.com>.
The deadlock can be avoided just by not taking the lock in the recycle
case, which always holds the lock already.
We'd like to handle the invalidation of the insert iterator by the
recycle-path unlink as a condition, in order to preserve the cached
insertion point optimization we get in the common case. (The
original behavior was, indeed, incorrect.)
Based on feedback from Yao, removed the RGWFileHandle dtor version
of the unlink check, which I think happened twice.
Xuehan Xu [Sat, 6 Jan 2018 02:40:33 +0000 (10:40 +0800)]
common: compute SimpleLRU's size with contents.size() instead of lru.size()
Because libstdc++ versions earlier than 5 implement list::size() as an O(n) operation,
this is needed to avoid performance regressions in various ceph components.
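To show why the choice of container matters, here is a minimal LRU sketch that keeps a list for recency order and a hash map for contents, and sizes itself from the map. It is not Ceph's SimpleLRU; TinyLRU and its members are hypothetical names.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class TinyLRU {
public:
  explicit TinyLRU(std::size_t max_size) : max_size_(max_size) {}

  // Insert or replace a key and mark it most recently used.
  void add(const K& key, const V& value) {
    auto it = contents_.find(key);
    if (it != contents_.end()) {
      lru_.erase(it->second.second);
      contents_.erase(it);
    }
    lru_.push_front(key);
    contents_.emplace(key, std::make_pair(value, lru_.begin()));
    trim();
  }

  // Copy the value into *out and mark the key most recently used.
  bool lookup(const K& key, V* out) {
    auto it = contents_.find(key);
    if (it == contents_.end()) {
      return false;
    }
    // splice keeps the stored iterator valid while moving the node.
    lru_.splice(lru_.begin(), lru_, it->second.second);
    *out = it->second.first;
    return true;
  }

  // The map's size() is O(1); list::size() may be O(n) on old libstdc++.
  std::size_t size() const { return contents_.size(); }

private:
  // Evict least recently used entries until within the size limit.
  void trim() {
    while (contents_.size() > max_size_) {
      contents_.erase(lru_.back());
      lru_.pop_back();
    }
  }

  std::size_t max_size_;
  std::list<K> lru_;
  std::unordered_map<K, std::pair<V, typename std::list<K>::iterator>> contents_;
};
```

Because trim() runs on every add(), a size check that walks the list would turn each insertion into an O(n) operation, which is the regression the commit is guarding against.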
Yan, Zheng [Thu, 11 Jan 2018 09:50:22 +0000 (17:50 +0800)]
client: fix cap revoke race
If caps are being revoked by the auth MDS, don't consider them as
issued even if they are still issued by a non-auth MDS. The non-auth
MDS should also be revoking/exporting these caps; the client just
hasn't received the cap revoke/export message yet.
The race I encountered is: when caps are being exported to a new MDS, the
client receives the cap import message and the cap revoke message from the
new MDS, then receives the cap export message from the old MDS. When
the client receives the cap revoke message from the new MDS, the revoking
caps are still issued by the old MDS, so the client does nothing.
Later, when the cap export message is received, the client removes
the caps issued by the old MDS. (Another way to fix the race is
to call ceph_check_caps() in handle_cap_export().)
Josh Durgin [Thu, 11 Jan 2018 02:39:28 +0000 (21:39 -0500)]
config: lower default omap entries recovered at once
For large omap DBs, reading 64k leads to heartbeat timeouts. There
are numerous callchains leading to this recovery step, many of which
do not have heartbeat handles, so for an easily backported version
just change the default number of entries read. DBs approaching 100GB
may require an even lower setting, but this should be good enough for
most clusters, without sacrificing recovery speed.
Casey Bodley [Mon, 18 Dec 2017 16:42:21 +0000 (11:42 -0500)]
rgw: don't log EBUSY errors in 'sync error list'
these temporary errors get retried automatically, so no admin
intervention is required. logging them only serves to waste space in
omap and obscure the more serious sync errors
Josh Durgin [Sat, 4 Jun 2016 01:46:15 +0000 (18:46 -0700)]
HashIndex: randomize split threshold by a configurable amount
Store a random value up to the filestore_split_rand_factor for each
collection when it is created or apply-layout-settings is run. This
should help distribute the load of splitting directories across a
longer period of time.
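A rough sketch of the idea follows: a random value is picked once per collection and stored, then folded into the split threshold so that collections do not all split at the same object count. The struct, the function names, the inclusive upper bound, and the exact shape of the threshold formula are assumptions made for illustration, not a copy of HashIndex.

```cpp
#include <random>

// Hypothetical per-collection settings; split_rand_factor is chosen once
// and stored, not re-rolled on every check.
struct SplitSettings {
  int merge_threshold;
  int split_multiple;
  unsigned split_rand_factor;
};

// Pick a value in [0, max_factor] (the inclusive bound is an assumption).
unsigned pick_rand_factor(unsigned max_factor, std::mt19937& rng) {
  std::uniform_int_distribution<unsigned> dist(0, max_factor);
  return dist(rng);
}

// Assumed formula shape: the stored random value widens the multiplier.
unsigned split_threshold(const SplitSettings& s) {
  int merge = s.merge_threshold < 0 ? -s.merge_threshold : s.merge_threshold;
  return static_cast<unsigned>(merge) * 16u *
         (static_cast<unsigned>(s.split_multiple) + s.split_rand_factor);
}
```

Under these assumptions, two collections with the same base settings but stored factors of 0 and 3 would split at 320 and 800 objects respectively (for merge_threshold 10 and split_multiple 2), which spreads the splitting work over time.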
In cls_timeindex_list(), even though `to_index` has expired for a timespan, the marker is set for a subsequent index during the time boundary check.
This marker is then returned to RGWObjectExpirer::process_single_shard(), where this out_marker is trimmed from the respective shard,
resulting in a lost removal hint and a leaked object.
Jason Dillaman [Wed, 27 Sep 2017 13:40:08 +0000 (09:40 -0400)]
librbd: hold cache_lock while clearing cache nonexistence flags
When transitioning from a snapshot that had an associated parent
to a snapshot where the parent was flattened and removed, the cache
was being referenced without holding the required lock.
Fixes: http://tracker.ceph.com/issues/21558 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 16ef97830cde30efb96f7aee69834b3a5c2d5248)
rgw: data sync: set num_shards when building full maps
When radosgw-admin data sync init is called on a cluster, the next run
of rgw crashes because num_shards, which is later referenced in
ListBucketIndexesCR, isn't set when ListBucketIndexesCR is processed.
Set sync_info.num_shards correctly to handle this case.
Adam C. Emerson [Wed, 20 Dec 2017 22:06:32 +0000 (17:06 -0500)]
rgw: Plumb refresh logic into object cache
Now when we force a refetch of bucket info it will actually go to the
OSD rather than simply using the objects in the object cache.
Fixes: http://tracker.ceph.com/issues/22517 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit d997f657750faf920170843e62deacab70008d8b) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Adam C. Emerson [Tue, 19 Dec 2017 21:47:09 +0000 (16:47 -0500)]
rgw: Add expiration in the object cache
We had it in the chained caches, but it doesn't do much good if
they just fetch objects out of the object cache.
Fixes: http://tracker.ceph.com/issues/22517 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 82a7e6ca31b416a7f0e41b5fda4c403d1d6be947) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
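The expiration behaviour described in the commit above can be sketched with a small cache whose entries carry a timestamp and are treated as misses once they are older than a time-to-live. This is an illustration only, not the RGW object cache; ExpiringCache and its interface are hypothetical, and the current time is passed in explicitly to keep the sketch deterministic.

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

class ExpiringCache {
public:
  using Clock = std::chrono::steady_clock;

  explicit ExpiringCache(Clock::duration ttl) : ttl_(ttl) {}

  // Store or overwrite an entry, stamping it with the given time.
  void put(const std::string& name, const std::string& data,
           Clock::time_point now) {
    entries_[name] = Entry{data, now};
  }

  // Returns true and fills *out only if the entry exists and is fresh.
  // Expired entries are dropped, so the caller must refetch from the source.
  bool get(const std::string& name, std::string* out, Clock::time_point now) {
    auto it = entries_.find(name);
    if (it == entries_.end()) {
      return false;
    }
    if (now - it->second.stored_at >= ttl_) {
      entries_.erase(it);
      return false;
    }
    *out = it->second.data;
    return true;
  }

private:
  struct Entry {
    std::string data;
    Clock::time_point stored_at;
  };

  Clock::duration ttl_;
  std::unordered_map<std::string, Entry> entries_;
};
```

Without a check like this in the lower-level cache, an expiry in a cache layered on top has little effect, since the refetch would be served from the stale lower-level entry.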