Changcheng Liu [Mon, 26 Aug 2019 08:57:28 +0000 (16:57 +0800)]
msg/async/rdma: change rdma_event_channel to be non-blocking
rdma_event_channel is blocking by default. If there's no event
in the event channel, rdma_get_cm_event could block forever.
This is not how an "asynchronous" messenger should behave.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
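A minimal sketch of the idea, assuming only POSIX: librdmacm's `struct rdma_event_channel` exposes its descriptor as `fd`, and once `O_NONBLOCK` is set on it with `fcntl`, `rdma_get_cm_event` is expected to fail with errno `EAGAIN` instead of blocking when no event is pending. So that the sketch runs without RDMA hardware, a pipe stands in for the event channel. `set_nonblock` and `poll_once` are hypothetical helper names, not the messenger's code.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Add O_NONBLOCK to a descriptor while preserving its other status flags.
// With librdmacm, this would be applied to the event channel's fd member.
inline int set_nonblock(int fd) {
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0)
    return -errno;
  if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
    return -errno;
  return 0;
}

// Hypothetical helper: try to consume one pending event byte.
// Returns 1 if a byte was read, 0 if nothing is pending (the caller goes
// back to its event loop instead of hanging), or a negative errno value.
inline int poll_once(int fd, char* out) {
  ssize_t n = read(fd, out, 1);
  if (n == 1)
    return 1;
  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return 0;
  return n < 0 ? -errno : -EPIPE;  // n == 0: writer closed (EOF)
}
```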
1. log every type of asynchronous event
2. Deal with IBV_EVENT_QP_LAST_WQE_REACHED logging
The QueuePair is switched into IBV_QPS_ERR before posting the
Beacon WR. For SRQ, all the SQ/WRs on that QP will be flushed
into the CQ and result in IBV_EVENT_QP_LAST_WQE_REACHED.
This is the expected scenario, so it need not be treated as an error
with lderr logging.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Tue, 27 Aug 2019 10:57:35 +0000 (18:57 +0800)]
msg/async/rdma: refine handle_rx_handle log under WCE failure case
1. ibv_wc:status IBV_WC_SUCCESS
keep the same logic
2. ibv_wc:status IBV_WC_WR_FLUSH_ERR
1) After the Beacon is posted into the SQ, all the outstanding RQ/WRs will
be flushed into the CQ with IBV_WC_WR_FLUSH_ERR status. This is expected
and needs no special logging.
2) For other cases that trigger IBV_WC_WR_FLUSH_ERR, track
more info such as the remote QueuePair number and the local QP state.
3. ibv_wc:status others
same logic, but log more info
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
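A minimal sketch of the receive-side decision described above. It assumes a local `WcStatus` enum as a stand-in for the relevant `enum ibv_wc_status` values (so libibverbs is not needed) and a hypothetical `beacon_posted` flag telling whether the Beacon has already been posted for the QP; `RxAction` and `classify_rx` are illustrative names, not the actual handle_rx_handle code.

```cpp
// Stand-in for the relevant enum ibv_wc_status values.
enum class WcStatus { Success, WrFlushErr, Other };

// How the receive path treats a work completion (hypothetical names).
enum class RxAction {
  Process,        // IBV_WC_SUCCESS: keep the existing handling
  SilentDrop,     // flush after the Beacon was posted: expected, no special log
  LogWithQpInfo,  // unexpected flush: log the peer qpn and local QP state
  LogError        // any other status: existing logic plus extra context
};

// beacon_posted: whether the Beacon WR was already posted for this QP.
inline RxAction classify_rx(WcStatus status, bool beacon_posted) {
  switch (status) {
    case WcStatus::Success:
      return RxAction::Process;
    case WcStatus::WrFlushErr:
      return beacon_posted ? RxAction::SilentDrop : RxAction::LogWithQpInfo;
    default:
      return RxAction::LogError;
  }
}
```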
Changcheng Liu [Tue, 27 Aug 2019 10:45:21 +0000 (18:45 +0800)]
msg/async/rdma: refine handle_tx_handle log under WCE fail case
1. ibv_wc:status IBV_WC_RETRY_EXC_ERR
Logging possible reasons:
1) Responder ACK timeout
2) Responder QueuePair in bad state
3) Disconnected
2. ibv_wc:status IBV_WC_WR_FLUSH_ERR
1) After the QP is switched into the error state, all the outstanding SQ/WRs
will be flushed into the CQ with IBV_WC_WR_FLUSH_ERR status. This is
expected and needs no special logging.
2) For other cases that trigger IBV_WC_WR_FLUSH_ERR, track
more info such as the remote QueuePair number and the local QP state.
3. ibv_wc:status others
same logic, but log more info
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
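The send-side counterpart can be sketched as a function that builds the log line, again with a local `TxWcStatus` enum standing in for `enum ibv_wc_status`. `tx_error_log` and its `qp_in_error` parameter (true when the messenger itself moved the QP into the error state) are hypothetical; an empty result means no special logging is wanted.

```cpp
#include <cstdint>
#include <string>

// Stand-in for the relevant enum ibv_wc_status values.
enum class TxWcStatus { Success, RetryExcErr, WrFlushErr, Other };

// Build a log line for a send completion (hypothetical helper).
inline std::string tx_error_log(TxWcStatus status, bool qp_in_error,
                                uint32_t local_qpn, uint32_t peer_qpn) {
  const std::string qp_info = " local_qpn=" + std::to_string(local_qpn) +
                              " peer_qpn=" + std::to_string(peer_qpn);
  switch (status) {
    case TxWcStatus::Success:
      return "";
    case TxWcStatus::RetryExcErr:
      // Possible reasons listed in the commit message above.
      return "retry exceeded (responder ACK timeout, responder QP in bad "
             "state, or disconnected);" + qp_info;
    case TxWcStatus::WrFlushErr:
      if (qp_in_error)
        return "";  // expected: we moved the QP to error ourselves
      return "unexpected WR flush;" + qp_info;
    default:
      return "send completion error;" + qp_info;
  }
}
```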
Changcheng Liu [Tue, 27 Aug 2019 09:17:28 +0000 (17:17 +0800)]
msg/async/rdma: get local/peer qpn from RDMAConnectedSocketImpl
When the remote QP is destroyed, the QP is disconnected and the local QP
is transitioned into the error state. Asynchronous events or
completion events can then be triggered, so the qpn needs to be obtained
through the RDMAConnectedSocketImpl object.
Add get_local/peer_qpn to get the qpn from class RDMAConnectedSocketImpl.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Roman Penyaev [Tue, 27 Aug 2019 09:12:01 +0000 (17:12 +0800)]
msg/async/rdma: remove redundant code
1. remove async_handler
1) async_handler is never scheduled (it would have to be scheduled by
center->dispatch_event_external).
2) async_handler wraps handle_async_event, which is already called
in RDMADispatcher::polling.
So all async_handler-related code is removed.
2. fault won't run to_dead, so remove the commented-out code
Roman Penyaev [Fri, 30 Aug 2019 06:59:02 +0000 (14:59 +0800)]
msg/async/rdma: no need to audit inflight SQ WQEs
Beacon is used to detect that SQ WQEs have been drained. There's no need
to use tx_wr_inflight to check whether SQ WQEs have been drained
before destroying the QueuePair.
Roman Penyaev [Tue, 27 Aug 2019 08:44:27 +0000 (16:44 +0800)]
msg/async/rdma: use special Beacon to detect SQ WRs drained
Switch the QueuePair to the error state, then post the Beacon WR to the
send queue. All outstanding WQEs will be flushed to the CQ.
In the CQ, check the completion queue elements to detect that the SQ WRs have
been drained before destroying the QueuePair.
We don't post another Beacon WR to the RQ if SRQ is not used/supported,
because the QueuePair can be destroyed only once all
flushed WRs have been polled from the CQ.
Refer to page 474 of the spec below:
InfiniBand™ Architecture Specification Volume 1, Release 1.3
link: https://cw.infinibandta.org/document/dl/7859
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
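The Beacon technique can be modeled in a few lines. This is a simplified in-memory simulation, not verbs code: `SimQueuePair` flushes every outstanding SQ WR and then the Beacon into a CQ in posting order, relying on completions for a single work queue being reported in order. The reserved `kBeaconWrId` value and all names are assumptions of this sketch. Seeing the Beacon completion means every earlier SQ WR has already been polled, which is why a separate tx_wr_inflight counter is unnecessary. The RQ side without SRQ is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical completion record: the WR id and whether it was flushed.
struct Completion {
  uint64_t wr_id;
  bool flushed;
};

// Reserved wr_id marking the Beacon WR (value chosen for this sketch only).
constexpr uint64_t kBeaconWrId = ~uint64_t{0};

class SimQueuePair {
 public:
  void post_send(uint64_t wr_id) { sq_.push_back(wr_id); }

  // Move the QP to error and post the Beacon: every outstanding SQ WR,
  // followed by the Beacon itself, is flushed to the CQ in posting order.
  void to_error_and_post_beacon(std::deque<Completion>& cq) {
    sq_.push_back(kBeaconWrId);
    while (!sq_.empty()) {
      cq.push_back({sq_.front(), true});
      sq_.pop_front();
    }
  }

 private:
  std::deque<uint64_t> sq_;
};

// Poll the CQ until the Beacon shows up. Returns the number of ordinary
// flushed WRs reaped; *drained is set once the Beacon completion is seen.
inline size_t poll_until_beacon(std::deque<Completion>& cq, bool* drained) {
  size_t reaped = 0;
  *drained = false;
  while (!cq.empty()) {
    Completion c = cq.front();
    cq.pop_front();
    if (c.wr_id == kBeaconWrId) {
      *drained = true;  // all earlier SQ WRs were polled: safe to destroy
      break;
    }
    ++reaped;
  }
  return reaped;
}
```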
Changcheng Liu [Tue, 20 Aug 2019 10:02:43 +0000 (18:02 +0800)]
msg/async/rdma: use QueuePair as args to get more info
ibv_qp is a member of class QueuePair. QueuePair has other fields
that post_chunks_to_rq needs in order to check for
different hardware features, e.g. SRQ/iWARP/RoCE.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Tue, 27 Aug 2019 07:39:26 +0000 (15:39 +0800)]
msg/async/rdma: refactor QP state switch & ib_cm_meta_t transaction
1. Implement the 3 functions below in class QueuePair to switch QP state
1) int modify_qp_to_error(void);
2) int modify_qp_to_rts(void);
3) int modify_qp_to_rtr(void);
3. All connection metadata are members of class QueuePair.
So send/recv the connection metadata directly in send/recv_cm_meta, i.e.
change the send/recv_cm_meta API so that it no longer takes the cm_meta_data parameter.
4. RDMAConnectedSocketImpl needs members to track peer_qpn and local_qpn.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Tue, 27 Aug 2019 05:36:29 +0000 (13:36 +0800)]
msg/async/rdma: implement connection management data in QueuePair
1. It's better to use QueuePair to manage the connection management
metadata.
2. This patch prepares for using separate functions to modify the QP to
the RTR & RTS states.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Tue, 27 Aug 2019 03:48:52 +0000 (11:48 +0800)]
msg/async/rdma: implement send/recv in QueuePair
send/recv is used to transact connection management metadata.
QueuePair is the object that holds the metadata, so use QueuePair to
transact the metadata.
1. rename send_msg to send_cm_meta
2. rename recv_msg to recv_cm_meta
3. move send/recv_cm_meta to QueuePair class scope
4. change code to adapt to the above changes
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Tue, 27 Aug 2019 03:00:16 +0000 (11:00 +0800)]
msg/async/rdma: implement modify_qp_to_init to init qp
For IB/RoCE, the QP needs to go through "reset->init->ready to receive
->ready to send" for normal operation in most cases.
This patch implements a separate function to initialize the qp.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
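A minimal model of the QP state sequence named above, assuming only the normal path reset->init->RTR->RTS plus a forced move to the error state. `QpStateModel` and the local `QpState` enum are hypothetical and only borrow the modify_qp_to_* names for readability; real QPs accept additional transitions through `ibv_modify_qp`, which this sketch simply rejects with -EINVAL.

```cpp
#include <cerrno>

// Subset of QP states, named after enum ibv_qp_state (local stand-in).
enum class QpState { Reset, Init, Rtr, Rts, Error };

// Hypothetical model of the init/RTR/RTS/error transitions.
class QpStateModel {
 public:
  QpState state() const { return state_; }
  int modify_qp_to_init() { return step(QpState::Reset, QpState::Init); }
  int modify_qp_to_rtr() { return step(QpState::Init, QpState::Rtr); }
  int modify_qp_to_rts() { return step(QpState::Rtr, QpState::Rts); }
  // Any state may be forced into Error (e.g. before posting the Beacon).
  int modify_qp_to_error() {
    state_ = QpState::Error;
    return 0;
  }

 private:
  // Allow exactly one forward step of the normal bring-up sequence.
  int step(QpState from, QpState to) {
    if (state_ != from)
      return -EINVAL;
    state_ = to;
    return 0;
  }
  QpState state_ = QpState::Reset;
};
```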
Changcheng Liu [Tue, 27 Aug 2019 02:38:15 +0000 (10:38 +0800)]
msg/async/rdma: rename type name IBSYNMsg to ib_cm_meta_t
IBSYNMsg is responsible for tracking IB connection management
metadata.
1. rename IBSYNMsg to be ib_cm_meta_t.
2. rename IBSYNMsg::qpn to be IBSYNMsg::local_qpn
3. rename peer_msg to peer_cm_meta & my_msg to local_cm_meta
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Yan Jun [Mon, 26 Aug 2019 10:34:50 +0000 (18:34 +0800)]
ec/jerasure: save the default m=2 of reed_sol_r6_op in profile
save the default value of m to the profile so that it can be dumped
by the command 'ceph osd erasure-code-profile get xxx', which is more
useful and user-friendly.
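A sketch of the intended behavior, assuming the profile is a plain string-to-string map (as Ceph's `ErasureCodeProfile` typedef is). `save_default_m` is a hypothetical helper, not the jerasure plugin's code: it records `m=2` only for the reed_sol_r6_op technique and only when the user did not set m, so the value shows up when the profile is dumped.

```cpp
#include <map>
#include <string>

using ErasureCodeProfile = std::map<std::string, std::string>;

// Hypothetical helper: reed_sol_r6_op is RAID6-style, so m defaults to 2.
// Write that default into the profile if it is missing.
inline void save_default_m(ErasureCodeProfile& profile) {
  auto t = profile.find("technique");
  if (t != profile.end() && t->second == "reed_sol_r6_op" &&
      profile.find("m") == profile.end())
    profile["m"] = "2";
}
```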
Currently, always-on modules are not marked as enabled in the WebUI and can be disabled. This PR fixes that.
Note: this PR does NOT implement code that prevents a developer from trying to disable an always-on module through the REST API; the Mgr Python extension will throw an appropriate exception in that case.
This PR also:
* Removes old code fragments from a previous, now obsolete, Mgr Module management UI.
* Cleans up code in BaseMgrModule.
osd/MissingLoc.cc: do not rely on missing_loc_sources only
In 624ade487ea4aeaf988cc1767e0b293f76addd5b, we relied on missing_loc_sources
to check for strays and remove an OSD from missing_loc. However, it is
possible that missing_loc_sources is empty while there are still OSDs
present in missing_loc. Since the aim is just to remove a stray OSD from
missing_loc, we do not need to rely on missing_loc_sources. We still
remove any stray OSD that is present in missing_loc_sources.
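A simplified sketch of the fix, assuming `missing_loc` is reduced to a map from object name to the set of OSD ids holding it (the real structure uses Ceph's object and shard types). `remove_stray_osds` is hypothetical; the point it illustrates is that strays are removed from every location set even when `missing_loc_sources` is empty.

```cpp
#include <map>
#include <set>
#include <string>

// Simplified stand-in: object name -> OSDs known to have a copy.
using MissingLocMap = std::map<std::string, std::set<int>>;

// Remove stray OSDs from every object's location set without consulting
// missing_loc_sources first, then prune them from missing_loc_sources too.
inline void remove_stray_osds(const std::set<int>& strays,
                              MissingLocMap& missing_loc,
                              std::set<int>& missing_loc_sources) {
  for (auto& entry : missing_loc)
    for (int osd : strays)
      entry.second.erase(osd);
  for (int osd : strays)
    missing_loc_sources.erase(osd);
}
```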
install-deps.sh: install `python*-devel` for python*rpm-macros
in 087ea813, we installed '*rpm-macros' for the macros, so that we have
access to the latest python packaging related macros for preparing the
build dependencies.
but we could run into https://bugs.centos.org/view.php?id=16379 if
we already have an old version of python-devel installed, as the newer
version of python-rpm-macros conflicts with it.
it was a chicken-and-egg problem, as we don't know the exact names of
the *rpm-macros packages; that's why we chose to install all of them. but
we have to upgrade the existing python-devel package to resolve the
conflict, and since there is no python3-devel in RHEL7/CentOS7
(what they have is python36-devel), we would have to hardwire
`%{python3_pkgversion}` to "36" even before we have access to this
macro, and upgrade the python36-devel package beforehand. but this
renders installing the rpm-macro package less useful -- we intend to
use the macro offered by the package to figure out "36".
as a workaround, we pretend that we know the "main" version of python3
in the current RHEL/CentOS, and always install python36-devel for
python-rpm-macros, as the former requires the latter.
once python3*-devel on all builders has been upgraded, it will be safe
to install '*rpm-macros' again without installing python36-devel first.
by then, we could revert this change, or continue installing
python36-devel until the distro bumps the "main" python version up to 3.7.
Sage Weil [Fri, 6 Sep 2019 02:24:38 +0000 (21:24 -0500)]
qa/tasks/ceph: restart: stop osd, mark down, then start
If we stop, start, and then mark down, we are likely to end up marking
the *new* instance down, which is noisy (it generates a cluster warning
message) and inefficient.
so that we can reuse run-make.sh to build the artifacts used by tests
other than "make check", for instance, dashboard's E2E test and
crimson's performance test.
immutable_object_cache: Endian fix for HeaderHelper
struct HeaderHelper is used to overlay buffer list data, which
was generated via encode, and therefore holds integer data in
fixed little-endian format.
Thus, HeaderHelper needs to use ceph_le32 instead of uint32_t
when accessing the len field.
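The overlay problem can be illustrated with a minimal little-endian field type in the spirit of ceph_le32 (this is not Ceph's implementation). It stores raw bytes in little-endian order and converts on access, so the same read yields the right value on any host. `le32_field`, `HeaderOverlay` and `read_len` are hypothetical names; `memcpy` is used instead of casting the buffer pointer to sidestep aliasing issues.

```cpp
#include <cstdint>
#include <cstring>

// Fixed little-endian 32-bit field: byte 0 is always the least significant.
struct le32_field {
  unsigned char b[4];
  operator uint32_t() const {
    return uint32_t(b[0]) | uint32_t(b[1]) << 8 |
           uint32_t(b[2]) << 16 | uint32_t(b[3]) << 24;
  }
  le32_field& operator=(uint32_t v) {
    for (int i = 0; i < 4; ++i)
      b[i] = static_cast<unsigned char>(v >> (8 * i));
    return *this;
  }
};

// Hypothetical header layout, as produced by encode().
struct HeaderOverlay {
  le32_field len;
};
static_assert(sizeof(HeaderOverlay) == 4, "overlay must match wire size");

// Read the len field from an encoded buffer; correct on LE and BE hosts.
inline uint32_t read_len(const unsigned char* buf) {
  HeaderOverlay h;
  std::memcpy(&h, buf, sizeof h);
  return h.len;
}
```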
struct PGTempMap in osd/OSDMap.h tracks a number of int32_t pointers
pointing into a buffer list. But that list was generated via encode,
which stores the int members in little-endian order, so they appear
byte-swapped when read through plain int32_t pointers on big-endian hosts.
Fixed by using ceph_le32 pointers instead.
Use ceph_le16/32/64 instead of __le16/32/64 (which are no-ops outside
of kernel code).
Note that this updates only those uses of __le16/32/64 which are
part of data structures that are serialized to disk/network
(i.e. Transaction::Op and Transaction::TransactionData).
Also note that code in this file performs combined operations on
little-endian values (in particular ++, +=, and |=) which are not
supported by the ceph_le16/32/64 classes, and are therefore replaced
by more primitive operations.
Use ceph_le16/32/64 instead of __le16/32/64 (which are no-ops outside
of kernel code).
Note that I've also changed cephx_calc_client_server_challenge to
use ceph_le64 instead of manually byte-swapping via mswab. (This
is a functional no-op, but it seems more consistent to use the ceph_le types
throughout.)
Use ceph_le16/32/64 instead of __le16/32/64 (which are no-ops outside
of kernel code).
Note that I've changed the Alg::calc routines to return the
init_value_t type instead of value_t, to avoid having to
introduce explicit byte-swapping calls to init_le16/32/64 in
many places. (This way, the byte-swapping happens implicitly
at the very end, where the init_value_t value is assigned via
a value_t pointer.)
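A sketch of this design choice, with a toy polynomial hash standing in for the real checksum algorithms (it is NOT crc32c or xxhash). Here `init_value_t` is a native `uint32_t` and `value_t` is a hypothetical little-endian storage type; `calc` stays in native byte order, and the only conversion happens when its result is assigned through a `value_t` pointer in the hypothetical `calc_blocks`.

```cpp
#include <cstddef>
#include <cstdint>

// Little-endian 32-bit storage (stand-in for a value_t such as ceph_le32):
// assigning a native integer writes its bytes in LE order.
struct le32_val {
  unsigned char b[4];
  le32_val& operator=(uint32_t v) {
    for (int i = 0; i < 4; ++i)
      b[i] = static_cast<unsigned char>(v >> (8 * i));
    return *this;
  }
};

using init_value_t = uint32_t;  // native type used while computing
using value_t = le32_val;       // on-disk type

// Toy hash that works entirely in native byte order and returns
// init_value_t, so no explicit byte swaps are needed inside it.
inline init_value_t calc(init_value_t init, const unsigned char* p,
                         size_t len) {
  for (size_t i = 0; i < len; ++i)
    init = init * 31u + p[i];
  return init;
}

// One checksum per block (block must be > 0), stored through a value_t
// pointer: the byte-order conversion happens only in this assignment.
inline void calc_blocks(const unsigned char* data, size_t len, size_t block,
                        value_t* out) {
  for (size_t off = 0, i = 0; off < len; off += block, ++i) {
    size_t n = (len - off < block) ? len - off : block;
    out[i] = calc(0xffffffffu, data + off, n);
  }
}
```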
include: Fix new-style encoding routines on big-endian
The new-style encoding routines (denc.h) are broken on big-endian
systems. While there is a lot of infrastructure in place to
recognize data types that need to be byte-swapped during encoding
and decoding on big-endian systems, nothing is actually ever swapped.
Fixed by using ceph_le16/32/64 instead of __le16/32/64 (which are
no-ops outside of kernel code).
include: Simplify usage of init_le16/32/64 routines
These routines currently just return plain __u16/32/64. This patch
changes them to return ceph_le16/32/64 types instead. This has a
number of benefits, in particular it allows the routines to now be
used to directly initialize variables of ceph_le16/32/64 type, as
one would expect from the names of those routines.
This doesn't make much of a difference in the current code base,
but it simplifies future patches to fix endian issues.
include: Endian fix for shared kernel/user headers
Endian swapping is done differently in kernel space vs. user space,
but a few header files are shared between those two use cases.
Current code attempts to handle this by re-defining __le16/32/64
before pulling those headers into user space, but this is not
consistently done: when ceph_fs.h is included via types.h, the
redefinition happens, but when ceph_fs.h is directly included,
the redefinition does not happen.
Fix this by performing the same redefinition directly *in* those
shared headers, when included by user space.
Note that the redefines were also in effect for rbd_types.h,
which is *not* shared with the kernel, so in that file I'm simply
replacing __le16/32/64 with ceph_le16/32/64 in-line.
Also note that it is no longer possible to include any of
the three files using the redefined macros in C code (as they
make use of C++ features). However, this currently happens in
exactly one file, src/mds/locks.c, which only uses a few CEPH_CAP_...
constants from the header. To fix this, I've simply duplicated
those definitions, which are unchangeable ABI constants anyway.