Patrick Donnelly [Mon, 28 Jan 2019 23:48:38 +0000 (15:48 -0800)]
mds: simplify recall warnings
Instead of a timeout and complicated decisions about whether the client is
releasing caps in an expeditious fashion, just use a DecayCounter that tracks
the number of caps we've recalled. This counter is decremented whenever the
client releases caps. If the counter passes a threshold, then we raise the
warning.
Similar reworking is done for the steady-state recall of client caps. Another
release DecayCounter is added so we can tell when the client is not releasing
any more caps.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
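A minimal sketch of the recall-warning idea described above, not the actual MDS code. The names `DecayingCounter`, `RecallTracker`, `on_recall`, `on_release` and `should_warn` are hypothetical, and so are the exponential half-life decay and the explicit `now` parameter (seconds), chosen here to keep the example deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical counter whose value halves every half_life_s seconds.
class DecayingCounter {
 public:
  explicit DecayingCounter(double half_life_s) : half_life_(half_life_s) {}

  // Apply the pending decay, then add delta (clamped so it never goes negative).
  double hit(double now, double delta) {
    decay(now);
    value_ = std::max(0.0, value_ + delta);
    return value_;
  }

  // Current value after applying the pending decay.
  double get(double now) {
    decay(now);
    return value_;
  }

 private:
  void decay(double now) {
    if (now > last_) {
      value_ *= std::exp2(-(now - last_) / half_life_);
      last_ = now;
    }
  }

  double half_life_;
  double value_ = 0.0;
  double last_ = 0.0;
};

// Hypothetical per-session tracker: recalls raise the counter, releases
// lower it, and a warning is due once the counter passes the threshold.
struct RecallTracker {
  DecayingCounter recalled;
  double warn_threshold;

  RecallTracker(double half_life_s, double threshold)
      : recalled(half_life_s), warn_threshold(threshold) {}

  void on_recall(double now, double ncaps) { recalled.hit(now, ncaps); }
  void on_release(double now, double ncaps) { recalled.hit(now, -ncaps); }
  bool should_warn(double now) { return recalled.get(now) > warn_threshold; }
};
```

With a threshold of 1000, recalling 1500 caps puts the session into the warning state; releasing 600 of them, or simply waiting one half-life, clears it again.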
Patrick Donnelly [Thu, 24 Jan 2019 22:23:08 +0000 (14:23 -0800)]
mds: limit maximum number of caps held by session
This is to prevent unsustainable situations where a client has so many
outstanding caps that a linear traversal/operation on the session's caps takes
unacceptable amounts of time.
Fixes: http://tracker.ceph.com/issues/38022
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 19 Jan 2019 00:18:59 +0000 (16:18 -0800)]
mds: add throttle for trimming MDCache
This is necessary when the MDS cache size decreases by a significant amount.
For example, when stopping a large MDS or when the operator makes a large cache
size reduction.
Fixes: http://tracker.ceph.com/issues/37723
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Tue, 29 Jan 2019 09:18:21 +0000 (17:18 +0800)]
osd/HitSet: mark copy ctor of HitSet::Params noexcept
to be returned using seastar::future<...>, the value type should satisfy
std::is_nothrow_constructible<T>.
with this change, pg_pool_t will be nothrow_constructible, and hence
can be returned using seastar::future<pg_pool_t>. otherwise
std::is_nothrow_constructible<pg_pool_t>::value would be false.
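As a rough illustration of why a `noexcept` copy constructor on a member matters, the sketch below uses hypothetical stand-in types (`ThrowingParams`, `NoexceptParams`, `Pool`), not the real `HitSet::Params` or `pg_pool_t`. It checks `std::is_nothrow_move_constructible`, on the assumption that this is the relevant trait when a value is moved into a future; a user-declared copy constructor suppresses the implicit move constructor, so moves fall back to the copy constructor and inherit its exception specification.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a params type whose copy ctor is not marked noexcept.
struct ThrowingParams {
  ThrowingParams() = default;
  ThrowingParams(const ThrowingParams&) {}
};

// Same type with the copy ctor marked noexcept.
struct NoexceptParams {
  NoexceptParams() = default;
  NoexceptParams(const NoexceptParams&) noexcept {}
};

// Stand-in for an enclosing type that holds the params as a member.
template <typename P>
struct Pool {
  P params;
  int size = 0;
};

// The enclosing type is nothrow-movable only if every member is.
static_assert(!std::is_nothrow_move_constructible<Pool<ThrowingParams>>::value,
              "a potentially throwing member copy ctor poisons the enclosing type");
static_assert(std::is_nothrow_move_constructible<Pool<NoexceptParams>>::value,
              "a noexcept member copy ctor makes the enclosing type nothrow-movable");
```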
get_all_versions() is documented as a lower-level api that doesn't
handle paging, and suggests list_versions() instead. also caches the
results to avoid listing each bucket twice
Sage Weil [Mon, 28 Jan 2019 11:50:59 +0000 (05:50 -0600)]
Merge PR #24546 into master
* refs/pull/24546/head:
msg/async/ProtocolV2: clear dispatch throttle on connection stop
msg/async: fix should_use_msgr2 behavior (including monc)
msg/async/AsyncMessenger: clear need_addr *after* we set our new addr
msg/async/ProtocolV2: fix handling for v2 client connection with v1 addr
ceph_test_msgr: do not connect_to on the client side
msg/async: do not connect from server
msg/async: do not use peer to addr detection; use getsockname()
msg/async/ProtocolV2: always send non-empty addrvec for self
msg/async: never fill out port in myaddr if we didn't bind
ceph_test_msgr: use v2 addrs for simplemessenger
msg/async: msgr2: don't force write event on every message received
msg/async/ProtocolV2: be forgiving in server identity check
msg/async/ProtocolV2: fault if we connect to the wrong peer
msg/async: msgr2: clean cookie if connection failed in ACCEPT_SESSION
msg/async/ProtocolV2: do not bump connect_seq for fault during ACCEPTING_SESSION
msg/async: msgr2: don't send SESSION_RETRY_GLOBAL in handle_existing_connection
msg/async: msgr2: organizing log messages
msg/async: msgr2: fix connection fault when replacing
msg/async: msgr2: fix replacing race handling
msg/async: msgr2: fix connection race when existing connection is newer
msg/async: msgr2: assign recv_stamp in handle_message
msg/async: msgr2: fix peer_addrs discovery
msg/async: msgr2: keep authorizer bufferlist across reconnects
msg/async: msgr2: fix connection secret problems for WITH_SEASTAR builds
msg/async: msgr2: send keepalive on connection race winner
msg/async: msgr2: fix client address learning
msg/async: msgr2: fix keepalive_ack message
msg/async: msgr2: do not force updating rotating keys inline
msg/async: msgr2: fix mark_down vs accept race
msg/async: msgr2: unregister con from accept vs mark_down race
auth/cephx/CephxSessionHandler: use connection_secret for encryption
msg,cephx: establish a unique connection_secret for every connection
msg/async: msgr2: use sha256_digest_t to print signature hex strings
types.h,rgw: merge sha*_digest_t definitions
msg/async: msgr2: close connection when no authorizer is given
msg/async: msgr2: formatting fixes
msg/async: msgr2: send client v2 address when only v1 address is defined
msg/async: msgr2: add payload length to banner
msg/async: msgr2: check protocol state after fast dispatch
msg/async: msgr2: reduce log level for sending messages event
msg/async: msgr2: call verify authorizer when CEPH_AUTH_NONE is used
msg/async: msgr2: store peer entity name in the protocol
msg/async: msgr2: apply sign/encrypt to messages data payload
msg/async: msgr2: encryption/decryption of frames
cephx: added encrypt/decrypt bufferlist method to session handler
msg/async: msgr2: refactored the frame structures
cephx: add sign bufferlist method
options: msgr2 enable/disable signing and encrytion options
msg/async: msgr2: cephx authentication
msg/async: msgr2: implement reconnect
msg/async: msgr2: fault handling
msg/async: msgr2: messange exchange phase
msg/async: msgr2: message flow handshake
msg/async: msgr2: authentication phase
msg/async: msgr2: exchange peer_type in banner phase
test/msgr: cloned test_msgr test for testing msgr2 protocol
msg/async: msgr2: banner exchange
msg/async: asyncconnection: update the source address info
msg/async: move base class Protocol its own source file
Sage Weil [Sat, 26 Jan 2019 23:01:14 +0000 (17:01 -0600)]
msg/async/AsyncMessenger: clear need_addr *after* we set our new addr
We check need_addr at the top without a lock held, so we need to be sure
we finished our work before we clear it, or else when there are two racing
threads the first will get the lock and clear the value and the second
will do nothing and see the unlearned value before the first finishes.
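The ordering described above can be sketched as follows. This is an illustrative model only: the class `AddrLearner` and its members are hypothetical, and the use of `std::atomic` and `std::mutex` is an assumption rather than a description of how AsyncMessenger is actually written. The point is that the flag checked on the unlocked fast path is cleared only after the address has been stored.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical model of "learn our address once, clear the flag last".
class AddrLearner {
 public:
  // Returns true if this call was the one that set the address.
  bool learned_addr(const std::string& addr) {
    // Unlocked fast path: skip if the address is already known.
    if (!need_addr_.load(std::memory_order_acquire)) return false;

    std::lock_guard<std::mutex> l(lock_);
    // Re-check under the lock: another thread may have won the race.
    if (!need_addr_.load(std::memory_order_relaxed)) return false;

    my_addr_ = addr;                                     // finish the work first
    need_addr_.store(false, std::memory_order_release);  // clear the flag last
    return true;
  }

  bool need_addr() const { return need_addr_.load(std::memory_order_acquire); }

  std::string my_addr() const {
    std::lock_guard<std::mutex> l(lock_);
    return my_addr_;
  }

 private:
  mutable std::mutex lock_;
  std::atomic<bool> need_addr_{true};
  std::string my_addr_;
};
```

If the flag were cleared before `my_addr_` was assigned, a second thread could pass the fast-path check, return early, and then read the still-empty address.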
Sage Weil [Sat, 26 Jan 2019 07:28:13 +0000 (01:28 -0600)]
msg/async/ProtocolV2: fix handling for v2 client connection with v1 addr
Switch it to be v2. Reject the case where the client sends an addrvec, though;
that should only happen for clients that did_bind, and they should only connect to
v2 if they have a v2 bound addr.
Sage Weil [Fri, 25 Jan 2019 21:21:45 +0000 (15:21 -0600)]
msg/async: do not connect from server
We could have a fault on the server side of a non-lossy connection while
we have outgoing data queued. Since we are a server,
we cannot connect; we should just go into standby and wait for the other
end to reconnect, or for someone to mark us down.
This fixes a failure reproduced by Messenger/MessengerTest.SyntheticInjectTest/0
where it would assert(!policy.server) in the connect code.
Sage Weil [Fri, 25 Jan 2019 21:14:56 +0000 (15:14 -0600)]
msg/async: do not use peer to addr detection; use getsockname()
Instead of relying on the peer to tell us what address we are connecting from,
look at how our local socket is bound, and use that address.
This removes the possibility for error because we will infer our address
locally and that will be the one place it is decided; the server will just
use our value. As things were previously, we had to make the local and
remote inference match, which was fragile.
This does take away the client's ability to discover whether it is traversing
NAT to reach the server and to learn its public/external address. I
don't think anybody has ever tested this, so it probably didn't even work,
and I've never heard it come up as a requirement.
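For reference, querying the locally bound address of a socket with POSIX `getsockname()` looks roughly like the sketch below. The helper names `local_addr_of` and `make_loopback_socket` and the `LocalAddr` struct are hypothetical, the example is IPv4-only, and it uses a UDP socket bound to loopback purely so that it runs without a peer; it is not the messenger's actual code.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <string>

struct LocalAddr {
  std::string ip;
  uint16_t port = 0;
  bool ok = false;
};

// Ask the kernel which local address and port the socket is bound to.
LocalAddr local_addr_of(int fd) {
  LocalAddr out;
  sockaddr_in sa{};
  socklen_t len = sizeof(sa);
  if (::getsockname(fd, reinterpret_cast<sockaddr*>(&sa), &len) != 0 ||
      sa.sin_family != AF_INET) {
    return out;
  }
  char buf[INET_ADDRSTRLEN] = {0};
  if (::inet_ntop(AF_INET, &sa.sin_addr, buf, sizeof(buf)) == nullptr) {
    return out;
  }
  out.ip = buf;
  out.port = ntohs(sa.sin_port);
  out.ok = true;
  return out;
}

// Demo helper: a UDP socket bound to 127.0.0.1 on a kernel-chosen port.
int make_loopback_socket() {
  int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) return -1;
  sockaddr_in sa{};
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  sa.sin_port = 0;  // let the kernel pick an ephemeral port
  if (::bind(fd, reinterpret_cast<const sockaddr*>(&sa), sizeof(sa)) != 0) {
    ::close(fd);
    return -1;
  }
  return fd;
}
```

Because the address comes from the local socket, it reflects what the host itself sees; behind NAT it will not match the address the server observes, which is the trade-off noted above.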
tl;dr: this change addresses the failures of "make check" runs on arm64
builders when they try to build `mgr-dashboard-test-venv` target.
long story: without this change, we will fail to pull in
setuptools >= 36, and as a result pip will fail to import
`setuptools.build_meta` in `pip/_vendor/pep517/_in_process.py`, and
a `BackendUnavailable` exception will be thrown by `_call_hook()` in
`pip/_vendor/pep517/wrappers.py`. since the issue addressed by 30ce5e55
has been fixed since setuptools 36.0.1, we should be safe to
upgrade to the latest setuptools now.
Sage Weil [Fri, 25 Jan 2019 20:14:55 +0000 (14:14 -0600)]
msg/async/ProtocolV2: always send non-empty addrvec for self
If we don't know our address yet, send the peer a 0.0.0.0 or :: address with an empty
port and a populated nonce. That way the peer can infer our final addr the same way
we do from learned_addr.
Sage Weil [Fri, 25 Jan 2019 20:12:27 +0000 (14:12 -0600)]
msg/async: never fill out port in myaddr if we didn't bind
If we are a client and didn't bind, then we should not fill in the port for our
address. The one the peer sent us is just the random port our outgoing connection
happened to land on!
Patrick Donnelly [Fri, 25 Jan 2019 18:52:47 +0000 (10:52 -0800)]
Merge PR #25254 into master
* refs/pull/25254/head:
mds: optimize resuming stale caps
client: avoid unnecessary wakeup when handling RENEWCAPS
client: set cap->wanted when adding new cap
client: don't wakeup cap waiters twice when mds recovered
mds: optimize revoking stale caps
mds: put notable caps at the front of session's caps list
mds: track if client has writeable range in Capability
mds: add session pointer to Capability
client: skip updating 'wanted' caps if caps are already issued
client: sync 'retain caps' logical from kernel client
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Yan, Zheng [Mon, 10 Dec 2018 03:37:32 +0000 (11:37 +0800)]
mds: optimize resuming stale caps
If client doesn't want any cap, there is no need to re-issue stale
caps.
A special case is when the client wants some caps, but skipped updating
'wanted'. In this case, the client needs to update 'wanted' when the stale
session gets renewed.