git.apps.os.sepia.ceph.com Git - ceph.git/log
6 years agoqa: add test that builds example librados programs 24538/head
Nathan Cutler [Thu, 19 Jul 2018 15:59:04 +0000 (17:59 +0200)]
qa: add test that builds example librados programs

Fixes: http://tracker.ceph.com/issues/15100
Signed-off-by: Nathan Cutler <ncutler@suse.com>
(cherry picked from commit c46c890d0241972cee10260f071f65b4beedf92c)

6 years agoMerge pull request #24396 from smithfarm/wip-26932-luminous
Yuri Weinstein [Wed, 10 Oct 2018 18:53:24 +0000 (14:53 -0400)]
Merge pull request #24396 from smithfarm/wip-26932-luminous

luminous: osd: scrub livelock

Reviewed-by: Neha Ojha <nojha@redhat.com>
6 years agoMerge pull request #24479 from neha-ojha/wip-36347-luminous
Neha Ojha [Tue, 9 Oct 2018 01:01:58 +0000 (18:01 -0700)]
Merge pull request #24479 from neha-ojha/wip-36347-luminous

qa/suites/rados/upgrade/jewel-x-singleton: exclude python3-rados, python3-cephfs

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
6 years agoMerge PR #24403 into luminous
Patrick Donnelly [Mon, 8 Oct 2018 20:26:07 +0000 (13:26 -0700)]
Merge PR #24403 into luminous

* refs/pull/24403/head:
qa: add timeout to cleaning up workunit sandbox
qa: cleanup workunit dir for each unit
qa: add timeout to kclient umount
qa: do not cleanup sandbox on error
qa: use default timeout in fs workunits
qa: use sudo to cleanup workspace
qa: cleanup parallel execution of fsstress
qa/workunit: implement cleanup option

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agoqa/suites/rados/upgrade/jewel-x-singleton: exclude python3-rados, python3-cephfs 24479/head
Neha Ojha [Mon, 8 Oct 2018 19:12:39 +0000 (15:12 -0400)]
qa/suites/rados/upgrade/jewel-x-singleton: exclude python3-rados, python3-cephfs

This fix goes directly into the luminous branch since these packages do not need
to be installed on jewel, when upgrading to luminous.

Fixes: https://tracker.ceph.com/issues/36347
Signed-off-by: Neha Ojha <nojha@redhat.com>
6 years agoMerge pull request #24410 from smithfarm/wip-36196-luminous
Yuri Weinstein [Fri, 5 Oct 2018 20:57:44 +0000 (13:57 -0700)]
Merge pull request #24410 from smithfarm/wip-36196-luminous

luminous: mds: internal op missing events time 'throttled', 'all_read', 'dispatched'

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agoMerge pull request #24421 from vshankar/wip-35937
Yuri Weinstein [Fri, 5 Oct 2018 20:57:18 +0000 (13:57 -0700)]
Merge pull request #24421 from vshankar/wip-35937

luminous: mds: track average session uptime

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agoosd: vary tick interval +/- 5% to avoid scrub livelocks 24396/head
Sage Weil [Thu, 9 Aug 2018 13:33:42 +0000 (08:33 -0500)]
osd: vary tick interval +/- 5% to avoid scrub livelocks

If you have two pgs that need to scrub on two OSDs, each the primary
for one pg and the replica for the other, you can end up in a livelock:

- both osds locally reserve a scrub slot
- both osds send a scrub schedule request
- both scrub requests are rejected
- both osds wait exactly 1 second
- repeat

Seems a bit unlikely, but I've seen test cases where it goes on for more than
an hour.

Fixes: http://tracker.ceph.com/issues/26890
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2011377c379c9d53a3a0a693a7874fc330278898)

Conflicts:
src/osd/OSD.cc
- luminous does not have src/include/random.h; use #include <random>
  instead, seeding with whoami so each OSD gets a different series
  of pseudo-random numbers
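
The jitter described in this commit can be illustrated with a small sketch. This is not the code from src/osd/OSD.cc; it is a Python illustration that assumes a 1-second base interval (taken from "wait exactly 1 second" above) and uses a hypothetical helper name, make_tick_interval. Seeding with the OSD id mirrors the backport note about seeding with whoami.

```python
import random

# Assumption: base tick of 1 second, as implied by the commit message.
OSD_TICK_INTERVAL = 1.0


def make_tick_interval(whoami, base=OSD_TICK_INTERVAL):
    """Return a callable that yields tick delays within +/- 5% of base.

    Hypothetical helper: each OSD seeds its own generator with its id,
    so two OSDs do not draw the same series of delays.
    """
    rng = random.Random(whoami)

    def next_interval():
        return base * rng.uniform(0.95, 1.05)

    return next_interval


# Two OSDs that previously retried in lockstep now drift apart.
osd0 = make_tick_interval(0)
osd1 = make_tick_interval(1)
delays0 = [osd0() for _ in range(5)]
delays1 = [osd1() for _ in range(5)]
```

Because the two series differ, the reservation attempts of the two OSDs stop colliding on every tick, which is what breaks the livelock.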

6 years agoMerge pull request #23483 from pdvian/wip-26840-luminous
Yuri Weinstein [Fri, 5 Oct 2018 20:12:15 +0000 (13:12 -0700)]
Merge pull request #23483 from pdvian/wip-26840-luminous

luminous: librados application's symbol could conflict with the libceph-common

Reviewed-by: Kefu Chai <kchai@redhat.com>
6 years agoMerge pull request #24405 from dillaman/wip-36143-luminous
Yuri Weinstein [Fri, 5 Oct 2018 20:11:45 +0000 (13:11 -0700)]
Merge pull request #24405 from dillaman/wip-36143-luminous

luminous: librbd: blacklisted client might not notice it lost the lock

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
6 years agoMerge pull request #24415 from dillaman/wip-36224-luminous
Yuri Weinstein [Fri, 5 Oct 2018 20:11:15 +0000 (13:11 -0700)]
Merge pull request #24415 from dillaman/wip-36224-luminous

luminous: librbd: object map improperly flagged as invalidated

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
6 years agoMerge pull request #24419 from pdvian/wip-36157-luminous
Yuri Weinstein [Fri, 5 Oct 2018 20:10:38 +0000 (13:10 -0700)]
Merge pull request #24419 from pdvian/wip-36157-luminous

luminous: msg: ceph_abort() when there are enough accepter errors in msg server

Reviewed-by: Kefu Chai <kchai@redhat.com>
6 years agoMerge pull request #24424 from theanalyst/wip-luminous-36311
Yuri Weinstein [Fri, 5 Oct 2018 20:09:56 +0000 (13:09 -0700)]
Merge pull request #24424 from theanalyst/wip-luminous-36311

luminous: multi-site: object name should be urlencoded when we put it into ES

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoosd: tick at OSD_TICK_INTERVAL, not heartbeat interval
Sage Weil [Thu, 9 Aug 2018 13:22:05 +0000 (08:22 -0500)]
osd: tick at OSD_TICK_INTERVAL, not heartbeat interval

Heartbeat interval is not related!

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 7d76458354661f7575c4a2cae251a9b828513580)

6 years agoqa: add timeout to cleaning up workunit sandbox 24403/head
Patrick Donnelly [Sun, 30 Sep 2018 00:37:12 +0000 (17:37 -0700)]
qa: add timeout to cleaning up workunit sandbox

If there is a bug preventing rm from completing, the workunit will get stuck.

Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 3a10d74f3aa4901dd9edffc0061992073ae67085)
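
The idea of bounding a cleanup command with a timeout can be sketched as follows. This is not the qa workunit code; run_with_timeout is a hypothetical helper, and a sleeping child process stands in for an rm that never completes.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv; return True on success, False on failure or timeout."""
    try:
        subprocess.run(argv, check=True, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return False
    except subprocess.CalledProcessError:
        return False


# A child that sleeps far longer than the timeout stands in for a hung `rm`.
hung_finished = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], 0.5
)
quick_finished = run_with_timeout([sys.executable, "-c", "pass"], 30)
```

With the timeout in place the caller regains control after half a second instead of waiting on the stuck child indefinitely.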

6 years agoqa: cleanup workunit dir for each unit
Patrick Donnelly [Mon, 24 Sep 2018 18:29:10 +0000 (11:29 -0700)]
qa: cleanup workunit dir for each unit

This was wrongly dropped and moved to the finalizer.

Introduced-by: de824f74dd8ac909e47335ccd53d7a085e388e41
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 70844f3f55004024a747854013a1efb409705d81)

6 years agoqa: add timeout to kclient umount
Patrick Donnelly [Sun, 30 Sep 2018 00:34:37 +0000 (17:34 -0700)]
qa: add timeout to kclient umount

Otherwise QA sits forever waiting for the kclient to umount when there is a
problem.

Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 7a64eb9dfb908a1a8e5d2b0dcaa7ca9df52a9ab1)

6 years agoqa: do not cleanup sandbox on error
Patrick Donnelly [Wed, 26 Sep 2018 14:38:58 +0000 (07:38 -0700)]
qa: do not cleanup sandbox on error

Otherwise the command will hang if the mount is broken.

Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit d4b8f94cf8d95ebb277b550fc6ebc3468052a39c)

6 years agoqa: use default timeout in fs workunits
Patrick Donnelly [Mon, 1 Oct 2018 01:10:05 +0000 (18:10 -0700)]
qa: use default timeout in fs workunits

Six hours is unnecessarily long.

Fixes: http://tracker.ceph.com/issues/36184
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit bdd2ddcfd862b65dfd73bc1ea09b0ad07040d445)

6 years agoqa: use sudo to cleanup workspace
Patrick Donnelly [Mon, 24 Sep 2018 18:02:49 +0000 (11:02 -0700)]
qa: use sudo to cleanup workspace

Files in scratch_tmp may not be owned by ubuntu.

Fixes: http://tracker.ceph.com/issues/36165
Introduced-by: de824f74dd8ac909e47335ccd53d7a085e388e41
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 1eaf78a75498d0f739b40bf310d036c851465fad)

6 years agoqa: cleanup parallel execution of fsstress
Patrick Donnelly [Tue, 18 Sep 2018 21:57:05 +0000 (14:57 -0700)]
qa: cleanup parallel execution of fsstress

Two instances of fsstress clobber each other. Just build it in the local sandbox.

Fixes: http://tracker.ceph.com/issues/24177
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit de824f74dd8ac909e47335ccd53d7a085e388e41)

6 years agoqa/workunit: implement cleanup option
Nathan Cutler [Wed, 3 Oct 2018 19:13:11 +0000 (21:13 +0200)]
qa/workunit: implement cleanup option

This is a partial backport of 91942df5a690809ed872f5aa8c35b56e8048e485
just to get the workunit.py changes.

Signed-off-by: Nathan Cutler <ncutler@suse.com>
6 years agoMerge pull request #24327 from smithfarm/wip-24478-luminous
Yuri Weinstein [Thu, 4 Oct 2018 21:49:20 +0000 (14:49 -0700)]
Merge pull request #24327 from smithfarm/wip-24478-luminous

luminous: read object attrs failed at EC recovery

Reviewed-by: David Zafman <dzafman@redhat.com>
6 years agoMerge pull request #24395 from smithfarm/wip-25145-luminous
Yuri Weinstein [Thu, 4 Oct 2018 21:48:34 +0000 (14:48 -0700)]
Merge pull request #24395 from smithfarm/wip-25145-luminous

luminous: mon: Automatically set expected_num_objects for new pools with >=100 PGs per OSD

Reviewed-by: Neha Ojha <nojha@redhat.com>
6 years agoMerge pull request #24397 from smithfarm/wip-36137-luminous
Yuri Weinstein [Thu, 4 Oct 2018 21:48:02 +0000 (14:48 -0700)]
Merge pull request #24397 from smithfarm/wip-36137-luminous

luminous: rgw: multisite: update index segfault on shutdown/realm reload

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24398 from smithfarm/wip-36202-luminous
Yuri Weinstein [Thu, 4 Oct 2018 21:47:37 +0000 (14:47 -0700)]
Merge pull request #24398 from smithfarm/wip-36202-luminous

luminous: multisite: intermittent test_bucket_index_log_trim failures

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24393 from smithfarm/wip-23998-luminous
Yuri Weinstein [Thu, 4 Oct 2018 21:15:45 +0000 (14:15 -0700)]
Merge pull request #24393 from smithfarm/wip-23998-luminous

luminous: osd/EC: slow/hung ops in multimds suite test

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agoMerge PR #24375 into luminous
Patrick Donnelly [Thu, 4 Oct 2018 20:26:26 +0000 (13:26 -0700)]
Merge PR #24375 into luminous

* refs/pull/24375/head:
mds: use monotonic waits in Beacon

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agoMerge pull request #24391 from smithfarm/wip-24630-luminous
Yuri Weinstein [Thu, 4 Oct 2018 20:07:50 +0000 (13:07 -0700)]
Merge pull request #24391 from smithfarm/wip-24630-luminous

luminous: cls/rgw: don't assert in decode_list_index_key()

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24387 from pdvian/wip-36126-luminous
Yuri Weinstein [Thu, 4 Oct 2018 20:07:12 +0000 (13:07 -0700)]
Merge pull request #24387 from pdvian/wip-36126-luminous

luminous: msg/async: clean up local buffers on dispatch

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Ricardo Dias <rdias@suse.com>
6 years agoMerge pull request #24389 from pdvian/wip-36128-luminous
Yuri Weinstein [Thu, 4 Oct 2018 20:05:54 +0000 (13:05 -0700)]
Merge pull request #24389 from pdvian/wip-36128-luminous

luminous: rgw: abort_bucket_multiparts() ignores individual NoSuchUpload errors

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24316 from smithfarm/wip-26979-luminous
Yuri Weinstein [Thu, 4 Oct 2018 20:02:11 +0000 (13:02 -0700)]
Merge pull request #24316 from smithfarm/wip-26979-luminous

luminous: multisite: intermittent failures in test_bucket_sync_disable_enable

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24317 from smithfarm/wip-35703-luminous
Yuri Weinstein [Thu, 4 Oct 2018 20:01:46 +0000 (13:01 -0700)]
Merge pull request #24317 from smithfarm/wip-35703-luminous

luminous: multisite: out of order updates to sync status markers

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24318 from smithfarm/wip-35980-luminous
Yuri Weinstein [Thu, 4 Oct 2018 20:01:22 +0000 (13:01 -0700)]
Merge pull request #24318 from smithfarm/wip-35980-luminous

luminous: multisite: data sync error repo processing does not back off on empty

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24361 from pdvian/wip-36124-luminous
Yuri Weinstein [Thu, 4 Oct 2018 20:00:53 +0000 (13:00 -0700)]
Merge pull request #24361 from pdvian/wip-36124-luminous

luminous: rgw: fix chunked-encoding for chunks >1MiB

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years agoMerge pull request #24123 from pdvian/wip-35713-luminous
Yuri Weinstein [Thu, 4 Oct 2018 15:23:53 +0000 (08:23 -0700)]
Merge pull request #24123 from pdvian/wip-35713-luminous

luminous: librbd: ensure exclusive lock acquired when removing sync point snaps…

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Mykola Golub <mgolub@mirantis.com>
6 years agoMerge pull request #24320 from smithfarm/wip-36119-luminous
Yuri Weinstein [Thu, 4 Oct 2018 15:23:00 +0000 (08:23 -0700)]
Merge pull request #24320 from smithfarm/wip-36119-luminous

luminous: [rbd-mirror] failed assertion when updating mirror status

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
6 years agoMerge pull request #24390 from smithfarm/wip-24946-luminous
Yuri Weinstein [Thu, 4 Oct 2018 15:21:26 +0000 (08:21 -0700)]
Merge pull request #24390 from smithfarm/wip-24946-luminous

luminous: librbd: image create request should validate data pool for self-managed snapshot support

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
6 years agolibrbd: keep IO blocked until after snapshot object map created 24415/head
Jason Dillaman [Mon, 24 Sep 2018 19:07:15 +0000 (15:07 -0400)]
librbd: keep IO blocked until after snapshot object map created

The IO was being unblocked before object map was created, allowing
a potential copyup request to fail to update a still-to-be-created
object map.

Fixes: http://tracker.ceph.com/issues/24516
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 1e874403bf861cb8b74261308d8b73434cf90341)

Conflicts:
src/librbd/object_map/SnapshotCreateRequest.cc: trivial resolution
src/librbd/operation/SnapshotCreateRequest.cc: trivial resolution

6 years agolibrbd: do not invalidate object map if update races with copyup
Jason Dillaman [Mon, 24 Sep 2018 18:45:09 +0000 (14:45 -0400)]
librbd: do not invalidate object map if update races with copyup

The copyup state machine needs to iterate over all object maps to update
the existence for the object. If a snapshot is being removed concurrently,
it's possible to invalidate the object map for the image.

Fixes: http://tracker.ceph.com/issues/24516
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 5a1cb469879157297ab456261f9335d8b855684f)

Conflicts:
src/librbd/ObjectMap.cc: trivial resolution
src/librbd/ObjectMap.h: trivial resolution
src/librbd/deep_copy/ObjectCopyRequest.cc: moved to rbd-mirror image sync
src/librbd/io/CopyupRequest.cc: trivial resolution
src/test/librbd/deep_copy/test_mock_ObjectCopyRequest.cc: moved to rbd-mirror image sync
src/test/librbd/test_mock_ObjectMap.cc: trivial resolution

6 years agolibrbd: do not invalidate object map when attempting to delete non-existent snapshot
Jason Dillaman [Fri, 14 Sep 2018 15:46:13 +0000 (11:46 -0400)]
librbd: do not invalidate object map when attempting to delete non-existent snapshot

If duplicate snapshot remove requests are received by the lock owner from a peer
client, the first request will remove the object map. If the second request
arrives while the first is in-progress, it will again attempt to remove the
object map but fail to load it since it's already been deleted. This incorrectly
results in the next object map being flagged as invalid.

Fixes: http://tracker.ceph.com/issues/24516
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 0a31c55ea83d85da88c7586c9a8fa8d6ec6618a7)

Conflicts:
src/librbd/object_map/SnapshotRemoveRequest.cc: trivial resolution

6 years agorgw: url_encode key name and instance in es sync module 24424/head
Chang Liu [Mon, 5 Mar 2018 07:46:43 +0000 (15:46 +0800)]
rgw: url_encode key name and instance in es sync module

Some objects whose names contain spaces or other special characters
can't be synced to ES correctly. We need to url_encode the key when
we send an HTTP request to ES.

Fixes: tracker.ceph.com/issues/23216
Signed-off-by: Chang Liu <liuchang0812@gmail.com>
(cherry picked from commit 13978bb28b7be809033bf24550b21ed2713ddc9b)
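
A minimal sketch of the encoding step, in Python rather than the rgw C++ code. The path layout and the helper name es_doc_path are hypothetical; only the idea of percent-encoding the key name and instance before building the request path comes from the commit message.

```python
from urllib.parse import quote


def es_doc_path(index, key_name, instance=""):
    """Build a request path with the key name and instance percent-encoded.

    Hypothetical layout: /<index>/object/<name>[:<instance>]
    """
    doc_id = quote(key_name, safe="")
    if instance:
        doc_id += ":" + quote(instance, safe="")
    return "/" + index + "/object/" + doc_id


path = es_doc_path("rgw-index", "my file+1.txt")
```

Passing safe="" makes quote() encode "/" as well, so a key containing slashes cannot change the shape of the path.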

6 years agomds: include session uptime when displaying session list 24421/head
Venky Shankar [Mon, 30 Jul 2018 05:47:02 +0000 (01:47 -0400)]
mds: include session uptime when displaying session list

Fixes: http://tracker.ceph.com/issues/35937
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit b23a204cdde2bc5f34304cca3f1bac3496cf7a41)

6 years agomds: track average uptime of sessions
Venky Shankar [Tue, 24 Jul 2018 03:47:02 +0000 (23:47 -0400)]
mds: track average uptime of sessions

Average session age math improvements by Patrick.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit d2627b98d0c1477d664d00384ef033d323b26957)

Conflicts:
        src/mds/SessionMap.h

6 years agomsg: ceph_abort() when there are enough accepter errors in msg server 24419/head
root [Mon, 30 Jul 2018 01:29:48 +0000 (21:29 -0400)]
msg: ceph_abort() when there are enough accepter errors in msg server
In some extreme cases (we have hit one in our production cluster), when the Accepter thread breaks out, new clients cannot connect to the OSD. Because the existing heartbeat connections are already established, other OSDs cannot detect the failure and notify the monitor to mark the failed OSD down.
With this patch, when there are enough abnormal communication errors, we just ceph_abort() so that the OSD goes down quickly and other OSDs can notify the monitor to mark the failed OSD down.
Signed-off-by: penglaiyxy@gmail.com <penglaiyxy@gmail.com>
(cherry picked from commit 00e0ab407b2e9659d9121be1217e95c8117c411e)

Conflicts:
src/common/legacy_config_opts.h : Resolved for ms_max_accept_failures
src/common/options.cc : Resolved for ms_max_accept_failures
src/msg/async/AsyncMessenger.cc : Resolved in accept
src/msg/simple/Accepter.cc : Resolved in entry
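
The failure-counting idea can be sketched like this. It is an illustration, not the Accepter code: the class name and the on_abort callback are hypothetical, and both the reset-on-success behaviour and the exact threshold comparison are assumptions, since the log does not state them. Only the option name ms_max_accept_failures comes from the conflict notes above.

```python
class AcceptFailureGuard:
    """Count consecutive accept() failures and abort past a threshold."""

    def __init__(self, max_accept_failures, on_abort):
        # max_accept_failures plays the role of ms_max_accept_failures.
        self.max_accept_failures = max_accept_failures
        self.on_abort = on_abort  # stand-in for ceph_abort()
        self.failures = 0

    def record(self, accepted):
        if accepted:
            # Assumption: a successful accept resets the counter.
            self.failures = 0
            return
        self.failures += 1
        if self.failures > self.max_accept_failures:
            self.on_abort()
```

Aborting the whole daemon is deliberate here: once the process exits, its heartbeat connections drop and peers can report it to the monitor.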

6 years agolibrbd: converted object map snapshot remove state machine to new style
Jason Dillaman [Fri, 14 Sep 2018 15:21:28 +0000 (11:21 -0400)]
librbd: converted object map snapshot remove state machine to new style

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 58770188ab57a53b786cf616ccfbf6acfcdc115a)

Conflicts:
src/librbd/object_map/SnapshotRemoveRequest.cc: trivial resolution
src/librbd/object_map/SnapshotRemoveRequest.h: trivial resolution

6 years agolibrbd: test_flags helper should require snap id parameter
Jason Dillaman [Fri, 14 Sep 2018 13:59:35 +0000 (09:59 -0400)]
librbd: test_flags helper should require snap id parameter

The HEAD and snapshots have potentially different flag states
since object maps get invalidated per revision.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 862082792d9c2ff23823e46937b7de9a42830cfd)

Conflicts:
src/librbd/ObjectMap.cc: trivial resolution
src/librbd/operation/SnapshotRemoveRequest.cc: trivial resolution
src/test/librbd/test_DeepCopy.cc: DNE
src/test/librbd/test_Migration.cc: DNE

6 years agomds/MDCache: fix mds internal op missing events time 24410/head
Yanhu Cao [Wed, 19 Sep 2018 02:32:48 +0000 (10:32 +0800)]
mds/MDCache: fix mds internal op missing events time

Fixes: http://tracker.ceph.com/issues/36114
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
(cherry picked from commit bd6ae6f4e29ac79e5e07373f52099338e6ab5416)

6 years agoMerge pull request #23877 from smithfarm/wip-24842-luminous
Yuri Weinstein [Wed, 3 Oct 2018 19:49:51 +0000 (12:49 -0700)]
Merge pull request #23877 from smithfarm/wip-24842-luminous

luminous: qa: move mds/client config to qa from teuthology ceph.conf.template

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agoMerge pull request #24086 from batrick/i35976
Yuri Weinstein [Wed, 3 Oct 2018 19:46:27 +0000 (12:46 -0700)]
Merge pull request #24086 from batrick/i35976

luminous: mds: configurable timeout for client eviction

Reviewed-by: Venky Shankar <vshankar@redhat.com>

6 years agoMerge pull request #24376 from smithfarm/wip-35939-luminous
Yuri Weinstein [Wed, 3 Oct 2018 19:45:22 +0000 (12:45 -0700)]
Merge pull request #24376 from smithfarm/wip-35939-luminous

luminous: client: statfs inode count odd

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agoMerge pull request #24378 from smithfarm/wip-36135-luminous
Yuri Weinstein [Wed, 3 Oct 2018 19:44:19 +0000 (12:44 -0700)]
Merge pull request #24378 from smithfarm/wip-36135-luminous

luminous: mds: rctime may go back

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years agomds: use monotonic waits in Beacon 24375/head
Patrick Donnelly [Fri, 17 Aug 2018 22:03:56 +0000 (15:03 -0700)]
mds: use monotonic waits in Beacon

This guarantees that the sender thread cannot be disrupted by system clock
changes. This commit also simplifies the sender thread by manually managing the
thread and avoiding unnecessary context creation.

Fixes: http://tracker.ceph.com/issues/26962
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit a5fc29b95281c6ca58c9177c665c379846beb4b3)

Conflicts:
src/mds/Beacon.cc
- g_conf->foo instead of g_conf()->foo
- boost::string_view instead of std::string_view
- always specify template type std::unique_lock<std::mutex>
src/mds/Beacon.h
- time::min() instead of clock::zero()
- always specify template type std::unique_lock<std::mutex>
- std::chrono::seconds instead of "1s" in std::chrono_literals namespace
  (which is a C++14ism)
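
What "monotonic waits" buys can be shown with a short sketch. This is Python, not the Beacon code, and wait_monotonic is a hypothetical helper: the deadline is computed from a monotonic clock, so stepping the wall clock forward or backward cannot shorten or lengthen the wait.

```python
import threading
import time


def wait_monotonic(event, interval_s):
    """Wait for event up to interval_s seconds, measured monotonically.

    Returns True if the event was set, False if the interval elapsed.
    """
    deadline = time.monotonic() + interval_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        if event.wait(remaining):
            return True


wake = threading.Event()
timed_out = not wait_monotonic(wake, 0.05)
wake.set()
woken = wait_monotonic(wake, 10)
```

A sender loop built on this would call wait_monotonic() between beacons and exit early when the event is set at shutdown.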

6 years agolibrbd: use the correct error code when the exclusive lock isn't locked 24405/head
Jason Dillaman [Thu, 6 Sep 2018 21:08:12 +0000 (17:08 -0400)]
librbd: use the correct error code when the exclusive lock isn't locked

If the client is currently blacklisted, use -EBLACKLISTED, otherwise
use -EROFS.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit e8eee15518facf562adf1aaba02d3a9523cdd2c3)

Conflicts:
src/librbd/ExclusiveLock.cc: trivial resolution
src/librbd/image/RemoveRequest.cc: trivial resolution
src/test/rbd_mirror/image_deleter/test_mock_SnapshotPurgeRequest.cc: DNE
src/tools/rbd_mirror/image_deleter/SnapshotPurgeRequest.cc: DNE
src/tools/rbd_mirror/image_deleter/SnapshotPurgeRequest.h: DNE
src/librbd/DeepCopyRequest.cc: (see below)
src/librbd/deep_copy/ObjectCopyRequest.cc: (see below)
src/librbd/deep_copy/ObjectCopyRequest.h: (see below)
src/librbd/deep_copy/SetHeadRequest.cc: (see below)
src/librbd/deep_copy/SetHeadRequest.h: (see below)
src/librbd/deep_copy/SnapshotCopyRequest.cc: (see below)
src/librbd/deep_copy/SnapshotCopyRequest.h: (see below)
src/librbd/deep_copy/SnapshotCreateRequest.cc: (see below)
src/librbd/deep_copy/SnapshotCreateRequest.h: (see below)
src/test/librbd/deep_copy/test_mock_ObjectCopyRequest.cc: (see below)
src/test/librbd/deep_copy/test_mock_SetHeadRequest.cc: (see below)
src/test/librbd/deep_copy/test_mock_SnapshotCopyRequest.cc: (see below)
src/test/librbd/deep_copy/test_mock_SnapshotCreateRequest.cc: (see below)
src/test/librbd/test_mock_DeepCopyRequest.cc
- deep-copy related files were originally derived from rbd-mirror
  equivalents. Similar modifications were made to the associated
  rbd-mirror files.

6 years agoMerge pull request #24382 from alfredodeza/luminous-rm36247
Alfredo Deza [Wed, 3 Oct 2018 15:27:26 +0000 (11:27 -0400)]
Merge pull request #24382 from alfredodeza/luminous-rm36247

luminous ceph-volume: skip processing devices that don't exist when scanning system disks

Reviewed-by: Andrew Schoen <aschoen@redhat.com>
6 years agolibrbd: helper to retrieve the correct error code for read-only op
Jason Dillaman [Thu, 6 Sep 2018 21:15:50 +0000 (17:15 -0400)]
librbd: helper to retrieve the correct error code for read-only op

When the exclusive lock is unlocked, the error code should be
-EBLACKLISTED when the client is blacklisted, otherwise -EROFS.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit a84fbb2565fb603ea809487d920461d14442d188)
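
The selection rule in this message is small enough to sketch directly. The function name is hypothetical, and mapping EBLACKLISTED onto ESHUTDOWN is an assumption about how Ceph defines that constant; the rule itself (blacklisted gives -EBLACKLISTED, otherwise -EROFS) is from the commit message.

```python
import errno

# Assumption: EBLACKLISTED is treated as an alias of ESHUTDOWN.
EBLACKLISTED = errno.ESHUTDOWN


def read_only_error_code(is_blacklisted):
    """Error code for an op rejected because the exclusive lock is not held."""
    return -EBLACKLISTED if is_blacklisted else -errno.EROFS
```

Returning a distinct code for the blacklisted case lets callers tell a fenced client apart from an image that is merely read-only.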

6 years agolibrbd: reacquire lock should properly handle failed watcher
Jason Dillaman [Thu, 6 Sep 2018 17:38:17 +0000 (13:38 -0400)]
librbd: reacquire lock should properly handle failed watcher

If the watch has been lost, assume the lock has been lost but attempt
to reacquire it if and when the watch is re-established.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 2057d99f451e3007d4fd05a88faa968319d0ba90)

Conflicts:
src/librbd/ManagedLock.cc: trivial resolution

6 years agolibrbd: assume lock is unlocked if blacklisted or object deleted
Jason Dillaman [Thu, 30 Aug 2018 19:12:27 +0000 (15:12 -0400)]
librbd: assume lock is unlocked if blacklisted or object deleted

This will ensure that it's possible to potentially re-acquire the
lock should the blacklist expire before the image is closed.

Fixes: http://tracker.ceph.com/issues/34534
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 60064f68f5dd2bbf5fbab95564fa522335091f4a)

6 years agolibrbd: watcher should internally track blacklisted state
Jason Dillaman [Thu, 6 Sep 2018 13:44:59 +0000 (09:44 -0400)]
librbd: watcher should internally track blacklisted state

Since it will periodically attempt to re-acquire the watch,
it will know when the RADOS client has been blacklisted and
when the blacklist has been removed.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 9ea94f284061849e452dd61c8f89ecca18642b0d)

Conflicts:
src/librbd/Watcher.cc: trivial resolution
src/test/librbd/mock/MockImageWatcher.h: trivial resolution

6 years agolibrbd: attempt to recover lost image watcher upon all failures
Jason Dillaman [Thu, 30 Aug 2018 20:51:10 +0000 (16:51 -0400)]
librbd: attempt to recover lost image watcher upon all failures

For example, if an image is blacklisted and the blacklist eventually
expires, the image should recover its watch.

Fixes: http://tracker.ceph.com/issues/34534
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 23b7447f6be87a14f84664f29431d2fdd2af4512)

Conflicts:
src/librbd/watcher/RewatchRequest.cc: trivial resolution
src/test/librbd/CMakeLists.txt: trivial resolution
src/test/librbd/test_mock_Watcher.cc: trivial resolution

6 years agorbd-mirror: attempt to re-acquire leader lock if watcher recovered
Jason Dillaman [Thu, 31 May 2018 18:09:30 +0000 (14:09 -0400)]
rbd-mirror: attempt to re-acquire leader lock if watcher recovered

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 69645f5433ce48281d3c6b70d979356c7ede2f88)
(cherry picked from commit a44e583fda52edceb0b20d78f1683a14d0e00f7b)

6 years agolibrbd: ensure managed lock can shut down if stuck waiting for register
Jason Dillaman [Thu, 31 May 2018 18:04:19 +0000 (14:04 -0400)]
librbd: ensure managed lock can shut down if stuck waiting for register

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit cb6712b0d9d5bccadb23a0e011eef05cf4d92280)
(cherry picked from commit f1c0bda32f0d15c0b0808ec1ef4ccfaee8d177b0)

6 years agolibrbd: fix rbd close race with rewatch
Song Shun [Tue, 10 Apr 2018 05:41:18 +0000 (13:41 +0800)]
librbd: fix rbd close race with rewatch

  fix rbd close race with rewatch

Signed-off-by: Song Shun <song.shun3@zte.com.cn>
(cherry picked from commit 8b833a293eac54fd3d38f12660d856ecc310d805)

Conflicts:
src/librbd/Watcher.cc: trivial resolution

6 years agolibrbd: potential race in RewatchRequest when resetting watch_handle
Mykola Golub [Tue, 13 Feb 2018 12:20:09 +0000 (14:20 +0200)]
librbd: potential race in RewatchRequest when resetting watch_handle

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit f5c02adfdbf5d9da0186fd494ee33c469445be83)

6 years agorgw: remove BucketChangeObserver from data sync thread 24398/head
Casey Bodley [Thu, 20 Sep 2018 15:37:06 +0000 (11:37 -0400)]
rgw: remove BucketChangeObserver from data sync thread

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit f05db89637d280505321708683182f0f2c886208)

Conflicts:
src/rgw/rgw_data_sync.cc
src/rgw/rgw_data_sync.h
- argument lists are different in luminous, compared to master

6 years agorgw: add BucketChangeObserver to RGWDataChangesLog
Casey Bodley [Thu, 20 Sep 2018 15:34:42 +0000 (11:34 -0400)]
rgw: add BucketChangeObserver to RGWDataChangesLog

this means that BucketTrimManager will track active buckets based on
local changes, rather than changes in remote datalogs or error repos

Fixes: http://tracker.ceph.com/issues/36034
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit f3c258c49ff6899433e742b10554c83413d64a8a)

6 years agorgw: async sync_object and remove_object does not access coroutine memory 24397/head
Tianshan Qu [Mon, 10 Sep 2018 10:00:45 +0000 (18:00 +0800)]
rgw: async sync_object and remove_object does not access coroutine memory

Fixes: http://tracker.ceph.com/issues/35905
Signed-off-by: Tianshan Qu <tianshan@xsky.com>
(cherry picked from commit 2d38306e9333772a21ffdc9d92838e3b6b5c3148)

6 years agomon/OSDMonitor: Warn if missing expected_num_objects 24395/head
Douglas Fuller [Fri, 29 Jun 2018 17:55:31 +0000 (13:55 -0400)]
mon/OSDMonitor: Warn if missing expected_num_objects

When creating a pool on filestore, warn if the user appears to be
creating a pool to store a large number of objects but omitted the
expected_num_objects parameter. Create the pool anyway.

Fixes: http://tracker.ceph.com/issues/24687
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
(cherry picked from commit 69fb2293c4d38012e7c4781aaa39a47596125bbb)

6 years agomon/OSDMonitor: Warn when expected_num_objects will have no effect
Douglas Fuller [Thu, 28 Jun 2018 15:21:38 +0000 (11:21 -0400)]
mon/OSDMonitor: Warn when expected_num_objects will have no effect

The expected_num_objects argument to ceph osd pool create is
only effective on filestore pools when merging is disabled
(filestore_merge_threshold < 0). Warn and disallow pool creation
in this situation.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
(cherry picked from commit 4c108a50e5f74a56965d49687a8c817f4a5ce42b)

6 years agolibrbd: validate data pool for self-managed snapshot support 24390/head
Mykola Golub [Wed, 27 Jun 2018 14:18:24 +0000 (17:18 +0300)]
librbd: validate data pool for self-managed snapshot support

Fixes: https://tracker.ceph.com/issues/24675
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 08ea7d62ba6eedf614d72ff9d33f2e6a1c0b81fe)

Conflicts:
src/librbd/image/CreateRequest.cc
src/librbd/image/CreateRequest.h
- luminous uses m_ioctx where master has m_io_ctx
- luminous IoCtx does not have get_namespace/set_namespace
- in luminous, the CreateRequest state machine is a little different from the
one in master (see the diagram in CreateRequest.h). In luminous the next
state after VALIDATE_DATA_POOL is CREATE_ID_OBJECT, so call
create_id_object() instead of add_image_to_directory().

6 years agomon/MDSMonitor: no_reply on MMDSLoadTargets 24393/head
Sage Weil [Wed, 2 May 2018 19:48:31 +0000 (14:48 -0500)]
mon/MDSMonitor: no_reply on MMDSLoadTargets

If we don't note that we don't reply then we don't close out the routed
mon request and the op will appear as slow on the forwarding mon.

Fixes: http://tracker.ceph.com/issues/23769
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b462b59065424520170956581e72e16481b16f0a)

Conflicts:
src/mon/MDSMonitor.cc
- luminous has "ignore:" instead of "done:" in
MDSMonitor::preprocess_offload_targets() and this part of the backport
was already done unintentionally in
7bbc0a7b1670d99e42149fd3a25c24600314ca94

6 years ago cls/rgw: don't assert in decode_list_index_key() 24391/head
Yehuda Sadeh [Wed, 6 Jun 2018 17:00:47 +0000 (10:00 -0700)]
cls/rgw: don't assert in decode_list_index_key()

Fixes: http://tracker.ceph.com/issues/24117
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit e71fba0c6f5b3b0572b5136840cf0ed3c9186569)

Conflicts:
src/cls/rgw/cls_rgw.cc
- use original non-C++17 version of escape_str() function in luminous

6 years ago rgw: abort_bucket_multiparts() ignores individual NoSuchUpload errors 24389/head
Casey Bodley [Fri, 14 Sep 2018 18:56:23 +0000 (14:56 -0400)]
rgw: abort_bucket_multiparts() ignores individual NoSuchUpload errors

if the bucket index lists multipart meta objects that don't actually
exist in rados, this error prevents the bucket from being deleted

Fixes: http://tracker.ceph.com/issues/35986
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 764d6a8599eb5ea5a6382fea57e4b28f97e26d93)

Conflicts:
src/rgw/rgw_multi.cc : Resolved in abort_bucket_multiparts
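
The behaviour described above, skipping uploads that are listed in the index but no longer exist, might look roughly like the loop below. This is a sketch and not the code in rgw_multi.cc: `kErrNoSuchUpload` is a made-up error value, and `abort_one` is a hypothetical callback standing in for whatever aborts a single upload.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical error value standing in for RGW's NoSuchUpload error.
constexpr int kErrNoSuchUpload = -1001;

// Aborts every listed upload. `abort_one` (hypothetical) returns 0 on success
// or a negative error code.
int abort_all_uploads(const std::vector<std::string>& upload_ids,
                      const std::function<int(const std::string&)>& abort_one) {
  assert(abort_one);
  for (const auto& id : upload_ids) {
    const int r = abort_one(id);
    if (r == kErrNoSuchUpload) {
      continue;  // index entry without a backing object: skip it
    }
    if (r < 0) {
      return r;  // any other error still stops the cleanup
    }
  }
  return 0;
}
```

The design point is that only the one "already gone" error is tolerated, so a stale index entry no longer blocks bucket deletion while real failures are still reported.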

6 years ago msg/async: clean up local buffers on dispatch 24387/head
Greg Farnum [Fri, 14 Sep 2018 17:58:49 +0000 (10:58 -0700)]
msg/async: clean up local buffers on dispatch

The AsyncConnection keeps local (member variable) bufferlists of incoming
messages before they're placed into the Message's front/data/middle buffers.
Previously these were reset only when a new Message is being received, which
means in steady state we store a full Message for every Connection even if
it's inactive!

Instead we obviously want to drop our local references to Message state
once it's been dispatched, so that it can go away.

Fixes: http://tracker.ceph.com/issues/35987
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
(cherry picked from commit 47ed036753223c44c7bf66c64d4a4adfe7267c0a)
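
A minimal sketch of the idea in this commit message, using plain `std::string` in place of bufferlists. The `Message` and `Connection` types here are simplified stand-ins, not the AsyncConnection classes; the point is only that the staging buffers are handed over and emptied at dispatch time instead of lingering until the next message arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Simplified stand-in for a received message.
struct Message {
  std::string front, middle, data;
};

// Simplified stand-in for a connection with local staging buffers.
struct Connection {
  std::string front, middle, data;

  std::size_t buffered_bytes() const {
    return front.size() + middle.size() + data.size();
  }

  // Hand the staged buffers to the message and drop local state right away.
  Message dispatch() {
    Message m{std::move(front), std::move(middle), std::move(data)};
    // Moved-from strings are valid but unspecified, so clear them explicitly.
    front.clear();
    middle.clear();
    data.clear();
    assert(buffered_bytes() == 0);
    return m;
  }
};
```

After `dispatch()` returns, an idle connection in this sketch holds no message payload, which is the steady-state property the commit is after.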

6 years ago Merge pull request #24347 from pdvian/wip-35981-luminous
Yuri Weinstein [Tue, 2 Oct 2018 21:23:43 +0000 (14:23 -0700)]
Merge pull request #24347 from pdvian/wip-35981-luminous

luminous: ceph-disk: compatibility fix for python 3

Reviewed-by: Nathan Cutler <ncutler@suse.com>
6 years ago Merge pull request #24311 from batrick/i35838
Yuri Weinstein [Tue, 2 Oct 2018 21:02:26 +0000 (14:02 -0700)]
Merge pull request #24311 from batrick/i35838

luminous: mds: use monotonic clock for beacon message timekeeping

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years ago Merge pull request #24323 from smithfarm/wip-36133-luminous
Yuri Weinstein [Tue, 2 Oct 2018 21:01:40 +0000 (14:01 -0700)]
Merge pull request #24323 from smithfarm/wip-36133-luminous

luminous: client: update ctime when modifying file content

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years ago Merge pull request #24328 from smithfarm/wip-24912-luminous
Yuri Weinstein [Tue, 2 Oct 2018 21:01:14 +0000 (14:01 -0700)]
Merge pull request #24328 from smithfarm/wip-24912-luminous

luminous: qa: multifs requires 4 mds but gets only 2

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years ago qa: remove check using method from master 24086/head
Patrick Donnelly [Tue, 2 Oct 2018 21:01:06 +0000 (14:01 -0700)]
qa: remove check using method from master

It is not essential that we check this, and it breaks tests in Luminous.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
6 years ago Merge pull request #24329 from smithfarm/wip-32103-luminous
Yuri Weinstein [Tue, 2 Oct 2018 21:00:44 +0000 (14:00 -0700)]
Merge pull request #24329 from smithfarm/wip-32103-luminous

luminous: mds: allows client to create .. and . dirents

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
6 years ago ceph-volume util.disk when there are no devices mapped, skip to the next one 24382/head
Alfredo Deza [Tue, 2 Oct 2018 15:18:44 +0000 (11:18 -0400)]
ceph-volume util.disk when there are no devices mapped, skip to the next one

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 9b0f472abadde26fce2a603fca5c466ebb770d4a)

6 years ago ceph-volume tests.util verify devices that don't exist don't break get_devices
Alfredo Deza [Tue, 2 Oct 2018 15:18:22 +0000 (11:18 -0400)]
ceph-volume tests.util verify devices that don't exist don't break get_devices

Signed-off-by: Alfredo Deza <adeza@redhat.com>
(cherry picked from commit 0176c1965e6ed532b38dfee03af0e9e3a85149d6)

6 years ago Merge pull request #24136 from gregsfortytwo/wip-luminous-make-check
Yuri Weinstein [Tue, 2 Oct 2018 20:09:14 +0000 (13:09 -0700)]
Merge pull request #24136 from gregsfortytwo/wip-luminous-make-check

luminous: build/ops: rpm: selinux-policy fixes

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
6 years ago Merge pull request #24342 from tchaikov/lumious-21769
Yuri Weinstein [Tue, 2 Oct 2018 20:08:01 +0000 (13:08 -0700)]
Merge pull request #24342 from tchaikov/lumious-21769

luminous: osd/ECBackend: don't get result code of subchunk-read overwritten

Reviewed-by: Nathan Cutler <ncutler@suse.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
6 years ago include: add utime_t ctor for ceph::coarse_real_time 24318/head
Casey Bodley [Tue, 2 Oct 2018 19:31:09 +0000 (15:31 -0400)]
include: add utime_t ctor for ceph::coarse_real_time

this change differs from the upstream commit 61fb24883e812c11016acea0654f6aef7ddab1f7
because the upstream commit relied on the std::void_t template, which isn't
available in c++11. as a workaround, just add another explicit constructor for
coarse_real_time

Signed-off-by: Casey Bodley <cbodley@redhat.com>
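
The workaround described above, one explicit constructor per time_point type instead of a single template constrained with std::void_t, can be illustrated in C++11 as follows. Everything in this sketch is hypothetical: `fine_clock`, `coarse_clock`, and `timestamp` are stand-ins and do not reproduce utime_t or the ceph clock types.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Two hypothetical clocks whose time_point types are distinct.
struct fine_clock {
  using duration = std::chrono::nanoseconds;
  using time_point = std::chrono::time_point<fine_clock, duration>;
};
struct coarse_clock {
  using duration = std::chrono::nanoseconds;
  using time_point = std::chrono::time_point<coarse_clock, duration>;
};

// Hypothetical seconds/nanoseconds pair, loosely in the spirit of utime_t.
struct timestamp {
  std::int64_t sec;
  std::int32_t nsec;

  // One explicit constructor per clock: no SFINAE or void_t needed.
  explicit timestamp(const fine_clock::time_point& t) {
    set(t.time_since_epoch());
  }
  explicit timestamp(const coarse_clock::time_point& t) {
    set(t.time_since_epoch());
  }

 private:
  void set(std::chrono::nanoseconds d) {
    assert(d >= std::chrono::nanoseconds::zero());
    const auto s = std::chrono::duration_cast<std::chrono::seconds>(d);
    sec = s.count();
    nsec = static_cast<std::int32_t>((d - s).count());
  }
};
```

The trade-off is some duplication for each supported clock in exchange for code that compiles under C++11.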
6 years ago rgw: data sync respects error_retry_time for backoff on error_repo
Casey Bodley [Tue, 14 Aug 2018 15:16:16 +0000 (11:16 -0400)]
rgw: data sync respects error_retry_time for backoff on error_repo

don't restart processing the error_repo until error_retry_time. when
data sync is otherwise idle, don't sleep past error_retry_time

Fixes: http://tracker.ceph.com/issues/26938
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit eb655323781ce4d23d6983aa5164d9dc367497e9)

Conflicts:
src/rgw/rgw_data_sync.cc

6 years ago mds: prevent rctime from going back 24378/head
Yan, Zheng [Tue, 11 Sep 2018 02:52:47 +0000 (10:52 +0800)]
mds: prevent rctime from going back

Fixes: http://tracker.ceph.com/issues/35916
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
(cherry picked from commit 920ef964311a61fcc6c0d6671b77ffe98522863d)

Conflicts:
src/mds/Server.cc
- luminous does not increment or decrement pi.inode.rstat.rsnaps

6 years ago client: set f_files to the total number of files in the filesystem 24376/head
Rishabh Dave [Mon, 30 Jul 2018 05:15:08 +0000 (05:15 +0000)]
client: set f_files to the total number of files in the filesystem

Fixes: http://tracker.ceph.com/issues/24849
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 39467a2c95815a495d75a9ced119975bfe62616c)

Conflicts:
src/client/Client.cc

6 years ago common: adding missing ceph::coarse_real_clock helpers
Casey Bodley [Tue, 14 Aug 2018 15:12:48 +0000 (11:12 -0400)]
common: adding missing ceph::coarse_real_clock helpers

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 11cd4254ff645a306442f88356e8ac3d493c9a3d)

6 years ago rgw: data sync uses coarse clock for error_retry_time
Casey Bodley [Tue, 14 Aug 2018 15:11:22 +0000 (11:11 -0400)]
rgw: data sync uses coarse clock for error_retry_time

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 233ee9cf291194f3d8d291a5e4632612612b7731)

6 years ago qa: fix kcephfs/recovery suite 23877/head
Nathan Cutler [Tue, 2 Oct 2018 16:44:09 +0000 (18:44 +0200)]
qa: fix kcephfs/recovery suite

This is a luminous-only commit.

Signed-off-by: Nathan Cutler <ncutler@suse.com>
6 years ago Merge pull request #24242 from jonsger/luminous-backport-pr#23596
Yuri Weinstein [Tue, 2 Oct 2018 15:40:49 +0000 (08:40 -0700)]
Merge pull request #24242 from jonsger/luminous-backport-pr#23596

luminous: rgw: incremental data sync uses truncated flag to detect end of listing

Reviewed-by: Casey Bodley <cbodley@redhat.com>
6 years ago Merge pull request #24358 from alfredodeza/luminous-rm36249
Andrew Schoen [Tue, 2 Oct 2018 11:30:59 +0000 (06:30 -0500)]
Merge pull request #24358 from alfredodeza/luminous-rm36249

luminous: ceph-volume: activate option --auto-detect-objectstore respects --no-systemd

Reviewed-by: Andrew Schoen <aschoen@redhat.com>
6 years ago rgw: fix chunked-encoding for chunks >1MiB 24361/head
Robin H. Johnson [Fri, 14 Sep 2018 21:23:49 +0000 (14:23 -0700)]
rgw: fix chunked-encoding for chunks >1MiB

For HTTP responses sent with chunked-encoding, and greater than 1MiB in
size, the chunk-size field was being printed incorrectly.

Specifically, the chunk-size field was being sent with a mangled or
missing trailer of '\r\n'.

This bug manifested as HTTP clients being unable to read the response:
Chrome generates ERR_INCOMPLETE_CHUNKED_ENCODING
Python/boto generates httplib.LineTooLong: got more than 65536 bytes when reading chunk size

The wrong variable was being used to determine the size of the buffer
used for the chunk-size field.

Fix it by using the correct variable, and rename the variables to
clearly reflect their purpose.

Prior to PR#23940, this would only have been seen in some Swift
operations. PR#23940 changed some S3 operations to also use chunked
encoding to get responses sent faster, and made the bug easier to
detect. It was initially reported for a ListBucket call with a high
max-keys argument.

Backport: luminous, mimic
Reference: https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
Reference: https://github.com/ceph/ceph/pull/23940
Fixes: http://tracker.ceph.com/issues/35990
Signed-off-by: Robin H. Johnson <rjohnson@digitalocean.com>
(cherry picked from commit 3b864482d6aef2efe0b03be70ea83c38f7a6d99b)
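
For reference, a chunk in HTTP/1.1 chunked transfer coding is the payload length in hexadecimal, CRLF, the payload, and CRLF. The sketch below shows one way to format the chunk-size field with a buffer sized for the field itself, which is the kind of sizing mistake the commit describes. It is not the RGW code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats the chunk-size line: hex length followed by CRLF.
std::string chunk_size_line(std::size_t chunk_len) {
  // Two hex digits per byte of size_t, plus CRLF, plus the terminating NUL.
  char header[sizeof(chunk_len) * 2 + 3];
  // Bound the write by the header buffer's own size, not the payload's.
  const int n = std::snprintf(header, sizeof(header), "%zx\r\n", chunk_len);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(header));
  return std::string(header, static_cast<std::size_t>(n));
}

// Encodes one chunk: size line, payload, trailing CRLF.
std::string encode_chunk(const std::string& payload) {
  return chunk_size_line(payload.size()) + payload + "\r\n";
}
```

A 1 MiB payload (1048576 bytes) yields the size line `100000` followed by CRLF, with the CRLF intact regardless of payload size.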

6 years ago qa: add qa helper methods from master
Patrick Donnelly [Fri, 28 Sep 2018 21:50:20 +0000 (14:50 -0700)]
qa: add qa helper methods from master

For Luminous. This is needed by tests in this branch.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
6 years ago qa: whitelist cap revoke warning
Patrick Donnelly [Sat, 25 Aug 2018 19:42:26 +0000 (12:42 -0700)]
qa: whitelist cap revoke warning

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 4367de377e68102f3c17c8dd85321c221d06d9dd)

6 years ago doc: document cap revoke non-responders client eviction
Venky Shankar [Mon, 6 Aug 2018 07:39:11 +0000 (03:39 -0400)]
doc: document cap revoke non-responders client eviction

Fixes: http://tracker.ceph.com/issues/25188
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 28a52d6fa14425fc877a69055dabe4e7c00f6b14)

6 years ago test: validate client eviction for cap revoke non-responders
Venky Shankar [Mon, 6 Aug 2018 03:37:41 +0000 (23:37 -0400)]
test: validate client eviction for cap revoke non-responders

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit c0b1dacc9f9c9b5af07a1b83a0adb53d001c2b79)

6 years ago mds: add counter for tracking cap non-responding clients
Venky Shankar [Mon, 6 Aug 2018 07:20:35 +0000 (03:20 -0400)]
mds: add counter for tracking cap non-responding clients

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 8f2de92712a98568b0d07a795f1158868caae550)

Conflicts:
src/mds/Server.cc
src/mds/Server.h

6 years ago mds: evict clients that do not respond to cap revoke by MDS
Venky Shankar [Mon, 6 Aug 2018 03:37:18 +0000 (23:37 -0400)]
mds: evict clients that do not respond to cap revoke by MDS

By default, preserve old behaviour. When configured with a non-default
value, evict clients that have not responded to cap revoke by MDS
within the configured number of seconds.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 4cf7815cdcd8efbbb981ef45b3eabee387b4de21)

Conflicts:
src/common/options.cc
src/mds/MDSDaemon.cc
src/mds/MDSRank.h
src/mds/Server.cc
src/mds/Server.h
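
The eviction policy described in this commit message can be sketched as a simple selection over per-client state. This is not the MDS implementation: `ClientRevokeState` and `clients_to_evict` are hypothetical names, and the sketch assumes that a timeout of 0 is the default and means "never evict".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-client view: how long the oldest unanswered cap revoke has
// been pending, in seconds (0 if nothing is pending).
struct ClientRevokeState {
  std::string client_id;
  double oldest_pending_revoke_age;
};

// Returns the clients to evict. Assumption: timeout_secs == 0 is the default
// and disables eviction, which preserves the old behaviour.
std::vector<std::string> clients_to_evict(
    const std::vector<ClientRevokeState>& clients, double timeout_secs) {
  assert(timeout_secs >= 0);
  std::vector<std::string> out;
  if (timeout_secs == 0) {
    return out;
  }
  for (const auto& c : clients) {
    if (c.oldest_pending_revoke_age > timeout_secs) {
      out.push_back(c.client_id);
    }
  }
  return out;
}
```

With a 60 second timeout, a client whose oldest revoke has been pending for 200 seconds is selected, while clients at 10 seconds or with nothing pending are left alone.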