Conflicts:
src/common/Throttle.cc
src/common/Throttle.h
src/test/common/Throttle.cc: in jewel, we don't have perfcounter
for throttles, nor do we have backport14.h, so these conflicts are
resolved by removing the perfcounter-related changes in Throttle.cc,
adding the make_unique helper to test/common/Throttle.cc, and adding
scope_guard to Throttle.cc.
Orit Wasserman [Sun, 21 Jan 2018 10:11:34 +0000 (12:11 +0200)]
rgw: resharding needs to set back the bucket ACL after link
Jewel only fix. New implementation of resharding in Luminous. Fixes: http://tracker.ceph.com/issues/22703 Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Huan Zhang [Fri, 24 Jun 2016 03:27:53 +0000 (11:27 +0800)]
rbd discard return -EINVAL if len > MAX_INT32
rbd discard uses 'int' to return the discarded length, but the 'len' the
user passes is a 'uint64'. In some cases the return value is truncated
to a negative value, which signals that the discard failed. Return -EINVAL
if len > MAX_INT32 to indicate that only len <= MAX_INT32 is supported.
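A minimal C++ sketch of the length guard described above. `discard_sketch` and `do_discard_sketch` are hypothetical names standing in for the real discard path, not the librbd API; the only point illustrated is rejecting lengths that do not fit in the int return value.

```cpp
#include <cerrno>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the work done by the real discard path: it
// pretends every requested byte was discarded and reports that count.
static int64_t do_discard_sketch(uint64_t /*off*/, uint64_t len) {
  return static_cast<int64_t>(len);
}

// The call reports the discarded length as an int, so a len above INT32_MAX
// could come back truncated (possibly negative, i.e. "failed").
// Reject such lengths up front instead.
int discard_sketch(uint64_t off, uint64_t len) {
  if (len > static_cast<uint64_t>(std::numeric_limits<int32_t>::max())) {
    return -EINVAL;
  }
  return static_cast<int>(do_discard_sketch(off, len));
}
```

With the guard, a 2 GiB request (2147483648 bytes) returns -EINVAL instead of a truncated value that callers could misread as an error code.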
Jason Dillaman [Wed, 15 Nov 2017 14:09:15 +0000 (09:09 -0500)]
librbd: prevent overflow of discard API result code
Prevent discard/writesame lengths larger than 2GB.
Fixes: http://tracker.ceph.com/issues/21966 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 3effd324db181e625665be33b5c6529dca723cc5) Signed-off-by: Nathan Cutler <ncutler@suse.com>
Conflicts:
PendingReleaseNotes (adapted for jewel)
src/librbd/librbd.cc (no writesame in jewel)
Sage Weil [Thu, 24 Aug 2017 21:56:13 +0000 (17:56 -0400)]
osd: subscribe to new osdmap while waiting_for_healthy
If we are sitting around waiting until we are able to ping our "up" peers,
we need to be sure that our notion of "up" is still correct and we're not
just stuck on an old, stale OSDMap.
Li Wang [Wed, 1 Nov 2017 09:21:29 +0000 (09:21 +0000)]
rbd-nbd: fix unused nbd device search bug in container
In some container scenarios, the host may choose to
map a specific nbd device, for example /dev/nbd6, into the
container. In that case, the nbd devices available in the
container are not numbered from 0, so the current unused
nbd device search function returns no result.
This patch fixes it.
Fixes: http://tracker.ceph.com/issues/22012 Signed-off-by: Li Wang <laurence.liwang@gmail.com> Reviewed-by: Yunchuan Wen <yunchuan.wen@kylin-cloud.com>
(cherry picked from commit be0f9581f9727187ca03232e0b368e7da7a60609)
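The difference between the two search strategies can be sketched in C++. This is a simplified model, not the rbd-nbd code: `NbdDevices`, `find_unused_naive` and `find_unused` are hypothetical, and the map stands in for whatever device enumeration the real tool performs.

```cpp
#include <map>
#include <string>

// Hypothetical model: key = N for each /dev/nbdN visible in the container,
// value = true if that device is already in use.
using NbdDevices = std::map<int, bool>;

// Naive search: assumes devices are numbered contiguously from 0 and gives
// up at the first missing index, so it finds nothing if only /dev/nbd6 exists.
std::string find_unused_naive(const NbdDevices& devs) {
  for (int i = 0;; ++i) {
    auto it = devs.find(i);
    if (it == devs.end()) return "";  // device missing: stop searching
    if (!it->second) return "/dev/nbd" + std::to_string(i);
  }
}

// Container-safe search: iterate over the devices that actually exist,
// whatever their numbers, and return the first one that is not in use.
std::string find_unused(const NbdDevices& devs) {
  for (const auto& [index, in_use] : devs) {
    if (!in_use) return "/dev/nbd" + std::to_string(index);
  }
  return "";  // every visible device is busy
}
```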
Jason Dillaman [Fri, 27 Oct 2017 20:45:54 +0000 (16:45 -0400)]
cls/journal: ensure tags are properly expired
Previously, if only the local image was using the journal or if
a disconnected peer was attached, the tag entries could not be
expired even if unreferenced.
Fixes: http://tracker.ceph.com/issues/21960 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 19fa1c7f5b2809e9a223b7b196dfc031e97a5dcd)
Matt Benjamin [Tue, 3 Oct 2017 21:48:29 +0000 (17:48 -0400)]
rgw: release cls lock if taken in RGWCompleteMultipart
Follows Casey's proposal to conditionally release the lock in
::complete(), in order to avoid duplicated code in various early
return cases.
Fixes: http://tracker.ceph.com/issues/21596 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 704f793f08a02760d23eb5778b738bb07be0e7cf)
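The "release in ::complete()" pattern can be sketched as below, assuming a simplified lock object. `ClsLockSketch` and `CompleteMultipartSketch` are hypothetical and only model the control flow: execute() records that it took the lock, may return early from several places, and complete() is the single release point.

```cpp
#include <cerrno>

// Hypothetical stand-in for a cls lock; just records whether it is held.
struct ClsLockSketch {
  bool held = false;
  int lock() { held = true; return 0; }
  void unlock() { held = false; }
};

struct CompleteMultipartSketch {
  ClsLockSketch& lk;
  bool lock_taken = false;  // set only once the lock is actually acquired
  int op_ret = 0;

  explicit CompleteMultipartSketch(ClsLockSketch& l) : lk(l) {}

  void execute(bool fail_before_lock, bool fail_after_lock) {
    if (fail_before_lock) { op_ret = -EINVAL; return; }  // lock never taken
    op_ret = lk.lock();
    if (op_ret < 0) return;
    lock_taken = true;
    if (fail_after_lock) { op_ret = -EIO; return; }  // no unlock needed here
    // ... assemble the uploaded parts ...
  }

  // Runs after execute() on every path, so the unlock lives in one place.
  void complete() {
    if (lock_taken) {
      lk.unlock();
      lock_taken = false;
    }
  }
};
```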
Conflicts:
qa/tasks/ceph_manager.py (bring in a053ce091e1aa910a1d01aec489203500e67efe5
which has to be cherry-picked manually anyway, because it pre-dates the
ceph-qa-suite move)
Ning Yao [Thu, 7 Sep 2017 10:52:55 +0000 (10:52 +0000)]
test: fix misc fiemap testing
1) Different filesystems allocate extents differently. Therefore,
even if 4000 extents are written, the filesystem may not actually
allocate 4000 extents.
2) kstore always returns [0, xxx] even if offset != 0. Therefore,
the whole non-zero-offset FiemapHoles test should be skipped.
3) Enable the fiemap test for filestore, bluestore and memstore again.
Fixes: http://tracker.ceph.com/issues/21716 Signed-off-by: Ning Yao <yaoning@unitedstack.com> Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 87f33376d977962ab7438c46873ea9b6292390d1)
Conflicts:
src/test/objectstore/store_test.cc (master does not set
filestore_op_thread_suicide_timeout; trivial resolution)
Casey Bodley [Thu, 5 Oct 2017 20:39:30 +0000 (16:39 -0400)]
rgw: RGWUser::init no longer overwrites user_id
If an admin op specifies a user_id and does not find a user with that
id, but does find a user based on a later field (email, access key,
etc.), RGWUser::user_id will be overwritten with the existing user's id.
When this happens on 'radosgw-admin user create', RGWUser::execute_add()
will modify that existing user instead of trying to create a new user
with the given user_id (and failing due to the conflicting email,
access key, etc.).
By preserving the original user_id (when specified), this uid conflict
is detected in RGWUser::check_op() and a "user id mismatch" error is
returned.
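A simplified C++ model of the fix, assuming users are looked up first by uid and then by email. `UserStore`, `UserOpSketch`, `init_sketch` and `check_op_sketch` are hypothetical and do not reproduce RGWUser's real fields; they show why keeping the requested user_id separate from the matched uid lets the mismatch surface.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified user store: uid -> email.
using UserStore = std::map<std::string, std::string>;

struct UserOpSketch {
  std::string user_id;    // uid requested by the admin op (may be empty)
  std::string email;      // a "later field" used for lookup
  std::string found_uid;  // uid of whichever existing user matched
};

// init(): look up by uid, then by email. The requested user_id is never
// overwritten; the matched uid is recorded separately.
void init_sketch(const UserStore& store, UserOpSketch& op) {
  if (!op.user_id.empty() && store.count(op.user_id)) {
    op.found_uid = op.user_id;
    return;
  }
  for (const auto& [uid, mail] : store) {
    if (!op.email.empty() && mail == op.email) {
      op.found_uid = uid;  // matched via a later field
      return;
    }
  }
}

// check_op(): a match on a different uid is a conflict, not an update.
std::string check_op_sketch(const UserOpSketch& op) {
  if (!op.user_id.empty() && !op.found_uid.empty() &&
      op.found_uid != op.user_id) {
    return "user id mismatch";
  }
  return "";
}
```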
Conflicts:
src/tools/ceph_objectstore_tool.cc (in jewel, ::encode() takes only two
arguments, while in luminous/master it takes a third which we omit
here)
Matt Benjamin [Fri, 19 Jan 2018 18:05:27 +0000 (13:05 -0500)]
rgw_file: alternate fix deadlock on lru eviction
This change is an alternate fix for two problems found and fixed
by Yao Zongyou <yaozongyou@vip.qq.com>.
The deadlock can be avoided simply by not taking the lock in the recycle
case, which invariably already holds it.
We'd like to handle the invalidation of the insert iterator by the
recycle-path unlink as a condition, in order to preserve the cached
insertion-point optimization we get in the common case. (The
original behavior was, indeed, incorrect.)
Based on feedback from Yao, removed the RGWFileHandle dtor version
of the unlink check, which I think happened twice.
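The conditional-locking half of this fix can be sketched with a plain (non-recursive) mutex. `HandleTableSketch` and `FLAG_LOCKED` are hypothetical names, and the sketch does not model the insert-iterator handling; it only shows how a caller that already holds the lock avoids re-acquiring it.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical flag meaning "caller already holds the table lock".
constexpr uint32_t FLAG_LOCKED = 0x1;

struct HandleTableSketch {
  std::mutex mtx;  // non-recursive: locking it twice on one thread deadlocks
  std::map<std::string, int> handles;

  // Remove a handle, taking the lock only if the caller does not hold it.
  void unlink(const std::string& name, uint32_t flags) {
    std::unique_lock<std::mutex> guard(mtx, std::defer_lock);
    if (!(flags & FLAG_LOCKED)) guard.lock();
    handles.erase(name);
  }  // guard unlocks only if it actually locked

  // Recycle path: evicts while holding the lock, so it passes FLAG_LOCKED.
  void recycle(const std::string& victim) {
    std::lock_guard<std::mutex> l(mtx);
    unlink(victim, FLAG_LOCKED);
  }
};
```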
Xuehan Xu [Sat, 6 Jan 2018 02:40:33 +0000 (10:40 +0800)]
common: compute SimpleLRU's size with contents.size() instead of lru.size()
As libstdc++ earlier than version 5 implements list::size() as an O(n) operation,
this is needed to avoid regressing the performance of various Ceph components.
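A minimal LRU sketch with the same two-container layout (a list for recency order plus a map for lookup) shows why the map's size is the cheaper, equivalent answer: both containers always hold the same entries. `SimpleLRUSketch` is a hypothetical simplification, not Ceph's SimpleLRU.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class SimpleLRUSketch {
  using Entry = std::pair<K, V>;
  std::list<Entry> lru;  // front = most recently used
  std::unordered_map<K, typename std::list<Entry>::iterator> contents;
  std::size_t max_size;

 public:
  explicit SimpleLRUSketch(std::size_t max) : max_size(max) {}

  // Constant time: unordered_map::size() is O(1), while std::list::size()
  // walks the list in libstdc++ before GCC 5 (and with the old ABI).
  std::size_t size() const { return contents.size(); }

  void add(const K& key, const V& value) {
    auto it = contents.find(key);
    if (it != contents.end()) {
      lru.erase(it->second);
      contents.erase(it);
    }
    lru.emplace_front(key, value);
    contents[key] = lru.begin();
    while (size() > max_size) {  // evict from the cold end
      contents.erase(lru.back().first);
      lru.pop_back();
    }
  }

  bool lookup(const K& key, V* out) {
    auto it = contents.find(key);
    if (it == contents.end()) return false;
    lru.splice(lru.begin(), lru, it->second);  // mark as most recently used
    *out = it->second->second;
    return true;
  }
};
```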
Yan, Zheng [Thu, 11 Jan 2018 09:50:22 +0000 (17:50 +0800)]
client: fix cap revoke race
If caps are being revoked by the auth MDS, don't consider them
issued even if they are still issued by a non-auth MDS. The non-auth
MDS should also be revoking/exporting these caps; the client just
hasn't received the cap revoke/export message yet.
The race I encountered is: when caps are being exported to a new MDS,
the client receives the cap import message and the cap revoke message
from the new MDS, then receives the cap export message from the old MDS.
When the client receives the cap revoke message from the new MDS, the
revoking caps are still issued by the old MDS, so the client does nothing.
Later, when the cap export message is received, the client removes
the caps issued by the old MDS. (Another way to fix the race is
calling ceph_check_caps() in handle_cap_export().)
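The rule "bits the auth MDS is revoking do not count as issued" reduces to bitmask arithmetic. The sketch below assumes each cap tracks `issued` and `implemented` masks and that revoking = implemented & ~issued; `CapSketch` and `caps_issued_sketch` are hypothetical and not the client's actual code.

```cpp
#include <map>

// Hypothetical, simplified cap state for one MDS rank.
struct CapSketch {
  int issued = 0;       // caps this MDS currently grants
  int implemented = 0;  // caps the client may still hold (issued | revoking)
};

// Union of caps issued by all MDSs, minus anything the auth MDS is revoking:
// a non-auth MDS still listing such a bit should be revoking/exporting it too.
int caps_issued_sketch(const std::map<int, CapSketch>& caps, int auth_mds) {
  int issued = 0;
  for (const auto& [mds, cap] : caps) issued |= cap.issued;
  auto auth = caps.find(auth_mds);
  if (auth != caps.end()) {
    int revoking = auth->second.implemented & ~auth->second.issued;
    issued &= ~revoking;
  }
  return issued;
}
```

In the race above, the old MDS still issues bits 0b11 while the new auth MDS is revoking 0b10, so the client treats only 0b01 as issued and acts on the revoke.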
Josh Durgin [Thu, 11 Jan 2018 02:39:28 +0000 (21:39 -0500)]
config: lower default omap entries recovered at once
For large omap DBs, reading 64k entries at once leads to heartbeat
timeouts. There are numerous call chains leading to this recovery step,
many of which do not have heartbeat handles, so for an easily backported
version just change the default number of entries read. DBs approaching
100GB may require an even lower setting, but this should be good enough
for most clusters, without sacrificing recovery speed.
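Bounding "entries read at once" amounts to copying the omap in fixed-size chunks and resuming after the last key copied. The C++ sketch below assumes an in-memory map; `read_omap_chunk`, `recover_omap` and `max_entries` are hypothetical, with `max_entries` standing in for the per-chunk default lowered by this change.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

using Omap = std::map<std::string, std::string>;

// Return at most max_entries pairs, starting after *after (or at the first
// key when after is empty). *done is set once src is exhausted.
Omap read_omap_chunk(const Omap& src, const std::optional<std::string>& after,
                     std::size_t max_entries, bool* done) {
  Omap chunk;
  auto it = after ? src.upper_bound(*after) : src.begin();
  for (; it != src.end() && chunk.size() < max_entries; ++it) {
    chunk.insert(*it);
  }
  *done = (it == src.end());
  return chunk;
}

// Copy src into dst one bounded chunk at a time; returns the chunk count.
std::size_t recover_omap(const Omap& src, Omap& dst, std::size_t max_entries) {
  std::optional<std::string> cursor;  // empty = start from the beginning
  std::size_t chunks = 0;
  bool done = false;
  while (!done) {
    Omap chunk = read_omap_chunk(src, cursor, max_entries, &done);
    if (chunk.empty()) break;  // nothing left (or max_entries == 0)
    cursor = chunk.rbegin()->first;  // resume after the last key copied
    dst.insert(chunk.begin(), chunk.end());
    ++chunks;
  }
  return chunks;
}
```

A smaller chunk means more round trips but less work per step, which is the trade-off the commit makes to stay under the heartbeat timeout.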
Casey Bodley [Mon, 18 Dec 2017 16:42:21 +0000 (11:42 -0500)]
rgw: don't log EBUSY errors in 'sync error list'
These temporary errors get retried automatically, so no admin
intervention is required. Logging them only wastes space in
omap and obscures the more serious sync errors.
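The filtering rule fits in a few lines of C++. `SyncErrorLogSketch` is a hypothetical stand-in for the persistent error list, not the rgw sync code; it only shows transient -EBUSY results being skipped while other failures are still recorded.

```cpp
#include <cerrno>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the persistent 'sync error list'.
struct SyncErrorLogSketch {
  std::vector<std::pair<int, std::string>> entries;

  // Record a sync failure unless it is -EBUSY, which sync retries on its
  // own and which needs no admin attention.
  void log_error(int retcode, const std::string& what) {
    if (retcode == -EBUSY) return;
    entries.emplace_back(retcode, what);
  }
};
```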
Josh Durgin [Sat, 4 Jun 2016 01:46:15 +0000 (18:46 -0700)]
HashIndex: randomize split threshold by a configurable amount
Store a random value up to the filestore_split_rand_factor for each
collection when it is created or apply-layout-settings is run. This
should help distribute the load of splitting directories across a
longer period of time.
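A C++ sketch of the idea, assuming the stored random value is simply added to an unrandomized split limit. `CollectionSketch`, `assign_split_rand`, `should_split` and `base_threshold` are hypothetical; only `filestore_split_rand_factor` comes from the text, modelled here as `rand_factor`.

```cpp
#include <cstdint>
#include <random>

// Hypothetical per-collection state: the random offset is chosen once and
// stored, so it stays stable across restarts of the sketch's "OSD".
struct CollectionSketch {
  uint32_t split_rand = 0;
};

// Pick an offset uniformly in [0, rand_factor] at creation time (or when
// layout settings are re-applied).
void assign_split_rand(CollectionSketch& c, uint32_t rand_factor,
                       std::mt19937& rng) {
  std::uniform_int_distribution<uint32_t> dist(0, rand_factor);
  c.split_rand = dist(rng);
}

// A directory splits once it holds more than base_threshold + split_rand
// objects, so collections filled at the same rate split at different times.
bool should_split(const CollectionSketch& c, uint32_t base_threshold,
                  uint32_t num_objects) {
  return num_objects > base_threshold + c.split_rand;
}
```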
In cls_timeindex_list(), even when an entry lies past `to_index` (the end of the requested timespan), the marker is set to that subsequent index during the time boundary check.
This marker is then returned to RGWObjectExpirer::process_single_shard(), which trims the shard up to this out_marker,
resulting in a lost removal hint and a leaked object.
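The intended marker semantics can be sketched in C++: the marker advances only for entries actually returned, so trimming up to it never removes an unprocessed hint. `TimeIndex`, `list_sketch` and `trim_sketch` are hypothetical simplifications of the cls/expirer pair, assuming keys sort by zero-padded timestamp.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical index: key = zero-padded "<timestamp>_<object>", so keys sort
// by time; value = the removal hint.
using TimeIndex = std::map<std::string, std::string>;

struct ListResult {
  std::vector<std::string> hints;
  std::string out_marker;  // key of the last entry handed to the caller
  bool truncated = false;
};

// List hints with key <= to_key, at most max per call. Hitting the time
// boundary stops the scan WITHOUT moving the marker onto that entry.
ListResult list_sketch(const TimeIndex& idx, const std::string& from_key,
                       const std::string& to_key, std::size_t max) {
  ListResult r;
  for (auto it = idx.lower_bound(from_key); it != idx.end(); ++it) {
    if (it->first > to_key) break;  // past the timespan: keep the marker
    if (r.hints.size() == max) { r.truncated = true; break; }
    r.hints.push_back(it->second);
    r.out_marker = it->first;
  }
  return r;
}

// Trim everything up to and including marker after processing a batch.
void trim_sketch(TimeIndex& idx, const std::string& marker) {
  idx.erase(idx.begin(), idx.upper_bound(marker));
}
```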