root [Mon, 28 Dec 2020 12:41:15 +0000 (18:11 +0530)]
qa: Kafka task files for bucket notification tests
This commit consists of 3 things:
1. Files required for setting up a new directory (in order to run the task in teuthology)
2. Kafka task file
3. The new files containing tests and their infrastructure, separating bucket notification tests from pubsub tests
Sage Weil [Wed, 3 Mar 2021 13:38:20 +0000 (08:38 -0500)]
Merge PR #39654 into master
* refs/pull/39654/head:
common/options: drop ms_async_max_op_threads
msg/async: drop Stack::num_workers
msg/async: s/num_workers/workers.size()/
msg/async: use range-based loop in NetworkStack
msg/async: do not pass worker id to Stack::spawn_worker()
async/Stack: pass Worker* to NetworkStack::add_thread()
async/rdma: do not reference worker id in RDMAStack::spawn_worker()
async/dpdk: do not use worker id when creating worker
async/PosixStack: do not reference worker id in ctor
async/rdma: initialize worker in RDMAStack::create_worker()
async/rdma: move RDMAStack::create_worker() to .cc
Reviewed-by: luo runbing <luo.runbing@zte.com.cn>
Reviewed-by: Haomai Wang <haomai@xsky.com>
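The refactor in this series can be pictured with a small sketch. The `Worker` and `NetworkStack` types below are hypothetical, heavily simplified stand-ins (no threads, no event centers), and `worker_count()` and `all_started()` are invented accessors. The sketch shows only three things: the count is derived from `workers.size()` instead of a separate `num_workers`, iteration is range-based, and `add_thread()` receives a `Worker*` rather than a worker id.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified stand-ins; the real Worker and
// NetworkStack in msg/async carry far more state.
struct Worker {
  bool started = false;
};

class NetworkStack {
 public:
  explicit NetworkStack(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      // No worker id is passed when creating a worker.
      workers.push_back(std::make_unique<Worker>());
    }
  }

  // Hypothetical accessor: the count comes from workers.size()
  // instead of a separately stored num_workers member.
  std::size_t worker_count() const { return workers.size(); }

  void start() {
    // Range-based loop instead of indexing 0..num_workers-1.
    for (auto& w : workers) {
      add_thread(w.get());
    }
  }

  bool all_started() const {
    for (const auto& w : workers) {
      if (!w->started) {
        return false;
      }
    }
    return true;
  }

 private:
  // Receives the Worker* directly rather than an index into `workers`.
  // In this sketch it only flips a flag; no thread is spawned.
  void add_thread(Worker* w) { w->started = true; }

  std::vector<std::unique_ptr<Worker>> workers;
};
```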
Kefu Chai [Wed, 3 Mar 2021 03:39:36 +0000 (11:39 +0800)]
crimson/mon: keep a copy of sent MMonCommand messages
As per Yingxin Cheng:
> The send process can be asynchronous (there is a conn.out_q, or if the
> underlying socket lives in a different core in the m:n model to be
> ordered there). If users really want to reuse a message, they must be
> careful not to modify it, because doing so may modify the pending
> messages.
>
> I think the best way is to copy the message if users want to resend it,
> and keep the ceph_assert(!msg->get_seq()). It may look good to reuse a
> message under lossy policy, but correctness then depends on the user not
> modifying it in place.
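The "copy before resend" rule can be illustrated with a minimal sketch. This is not crimson's messenger. `Message`, `Connection`, `out_q` and `copy_for_resend()` are hypothetical stand-ins, and a `std::deque` models the asynchronous out queue. The sent original keeps its sequence number and stays untouched in the queue. The copy starts with `seq == 0`, so the `!msg->get_seq()` assertion still holds.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <string>

// Hypothetical stand-in for a Ceph Message; only the sequence number matters here.
struct Message {
  std::string payload;
  uint64_t seq = 0;  // assigned when the message is queued for sending
  uint64_t get_seq() const { return seq; }
};
using MessageRef = std::shared_ptr<Message>;

// Hypothetical connection with an asynchronous out queue (cf. conn.out_q).
class Connection {
 public:
  void send(MessageRef msg) {
    // A message that already carries a seq may still be pending in out_q,
    // so handing it to send() again is a bug.
    assert(!msg->get_seq());
    msg->seq = ++last_seq;
    out_q.push_back(std::move(msg));
  }
  const std::deque<MessageRef>& queued() const { return out_q; }

 private:
  uint64_t last_seq = 0;
  std::deque<MessageRef> out_q;
};

// Hypothetical helper: resend by copying, leaving the pending original intact.
MessageRef copy_for_resend(const MessageRef& sent) {
  auto copy = std::make_shared<Message>(*sent);
  copy->seq = 0;  // a fresh message from the messenger's point of view
  return copy;
}
```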
Zac Dover [Tue, 2 Mar 2021 18:16:27 +0000 (04:16 +1000)]
doc/cephadm: add prompts to adoption.rst
This PR formats the bash prompts. It also formats the
bash output so that it appears in the correct (easily
copy-and-pasteable) format. This PR will be followed by
a grammar-improving PR, but this PR is just a
formatting PR.
Sage Weil [Tue, 2 Mar 2021 16:32:12 +0000 (11:32 -0500)]
Merge PR #39739 into master
* refs/pull/39739/head:
cephadm: set CEPH_USE_RANDOM_NONCE if using --init
msg/Messenger: use random nonce if CEPH_USE_RANDOM_NONCE or pid == 1
Revert "Merge PR #39482 into master"
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Merge pull request #38915 from Daniel-Pivonka/clientsoktostop
mgr/cephadm: add ok-to-stop functions for ceph client services
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
crimson/monc: close() active_con before destructing it on resets.
`ProtocolV2` expects `AuthClient` implementations to withstand
calling `get_auth_request()` and `handle_auth_reply_more()` even
if `handle_auth_done()` had been already called. This is because
a network fault may happen on e.g. `AuthSignatureFrame` which is
put on the wire after the `AuthDone` handling.
`crimson::mon::Client` deals with that by returning `auth::error`
from both `get_auth_request()` and `handle_auth_reply_more()` as
the preceding invocation of `handle_auth_done()` had already
cleared `pending_conns` (and set `active_con`). This leads to
`abort_in_close()` and finally to dispatching `ms_handle_reset()`
on `mon::Client`, which is fine in general; however, the current
implementation then destroys `active_con` without closing it first.
One consequence is a broken `mon::Connection::reply` promise; another
is a missed `mark_down()` call.
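The fix amounts to ordering: close, then destroy. The sketch below shows that ordering under stated assumptions. It uses `std::promise` in place of seastar's promise. `MonConnection` and `MonClient` are hypothetical simplifications, and the global counter is an invented stand-in for the messenger's `mark_down()`. If the connection were destroyed without `close()`, a waiter on the reply future would see a broken promise, much like the `seastar::broken_promise` warning in the log below.

```cpp
#include <cassert>
#include <future>
#include <memory>

// Hypothetical counter standing in for calls to the messenger's mark_down().
int g_mark_down_calls = 0;

// Hypothetical, simplified model of crimson::mon::Connection; `reply`
// would normally be fulfilled when the auth exchange finishes.
class MonConnection {
 public:
  std::future<bool> get_reply_future() { return reply.get_future(); }

  void close() {
    if (closed) {
      return;
    }
    closed = true;
    // Resolve the outstanding promise; destroying it unresolved would
    // leave any waiter with a broken promise instead.
    reply.set_value(false);
    ++g_mark_down_calls;  // stands in for conn->mark_down()
  }

 private:
  std::promise<bool> reply;
  bool closed = false;
};

class MonClient {
 public:
  std::unique_ptr<MonConnection> active_con = std::make_unique<MonConnection>();

  void ms_handle_reset() {
    if (active_con) {
      active_con->close();  // close first ...
      active_con.reset();   // ... then destroy
    }
  }
};
```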
```
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] GOT AuthDoneFrame: gid=4121, con_mode=secure, payload_len=995
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] WRITE AuthSignatureFrame: signature=60caf49e5a6cf3cc39c4160cb9d09032db5f794e29655dc0124cf5f42b7546fb
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - authenticated_encrypt_update plaintext.length()=80 buffer.length()=80
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - authenticated_encrypt_final buffer.length()=96 final_len=0
INFO 2021-03-01 18:10:50,489 [shard 0] monc - found mon.noname-a
INFO 2021-03-01 18:10:50,489 [shard 0] monc - sending auth(proto 2 2 bytes epoch 0) v1
INFO 2021-03-01 18:10:50,489 [shard 0] monc - waiting
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] GOT AuthSignatureFrame: signature=ea04f1318cf76808414a853ed37fd232ae886bef036cb4248079c6cba89d669a
DEBUG 2021-03-01 18:10:50,490 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] WRITE ClientIdentFrame: addrs=v2:172.21.15.110:6800/33954, target=v2:172.21.15.110:3300/0, gid=0, gs=1, features_supported=4540138303579357183, features_required=576460752303432193, flags=1, cookie=923190458014536120
...
INFO 2021-03-01 18:10:50,490 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] execute_connecting(): fault at CONNECTING, going to WAIT -- std::system_error (error crimson::net:4, read eof)
...
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@0 >> mon.? v2:172.21.15.110:3300/0] GOT HelloFrame: my_type=mon peer_addr=v2:172.21.15.110:63960/0
INFO 2021-03-01 18:10:50,690 [shard 0] monc - get_auth_request(con=[osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0], auth_method=0)
ERROR 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] get_initial_auth_request returned crimson::auth::error (unknown connection)
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] closing: reset yes, replace no
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] TRIGGER CLOSING, was CONNECTING
...
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] write_event: dropped
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] execute_connecting(): protocol aborted at CLOSING -- std::system_error (error crimson::net:6, protocol aborted)
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] closing: reset yes, replace no
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] TRIGGER CLOSING, was CONNECTING
WARN 2021-03-01 18:10:50,690 [shard 0] osd - ms_handle_reset
WARN 2021-03-01 18:10:50,690 [shard 0] monc - active conn reset v2:172.21.15.110:3300/0
INFO 2021-03-01 18:10:50,690 [shard 0] monc - reopen_session to mon.-1
WARN 2021-03-01 18:10:50,690 [shard 0] monc - mon.0 does not have an addr compatible with me
INFO 2021-03-01 18:10:50,690 [shard 0] monc - connecting to mon.1
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] ProtocolV2::start_connect(): peer_addr=v2:172.21.15.110:3300/0, peer_name=mon.?, cc=14512795460730278364 policy(lossy=true, server=false, standby=false, resetcheck=false)
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] TRIGGER CONNECTING, was NONE
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] UPDATE: gs=3 for connect
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] write_event: delay ...
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] write_event: dropped
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] execute_connecting(): protocol aborted at CLOSING -- std::system_error (error crimson::net:6, protocol aborted)
...
"
WARN 2021-03-01 18:10:50,690 [shard 0] seastar - Exceptional future ignored: seastar::broken_promise (broken promise), backtrace: 0x146f364
0x146f6e1
0x146fb01
0x135c2fe
0x135c481
0x6ee079
0x137db87
0x137def2
0x13ab085
0x1347b27
0x6619f5
/lib64/libc.so.6+0x237b2
0x6b217d
--------
N7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_5asyncIZZ4mainENKUlvE_clEvEUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSC_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSD_DpOSG_EUlvE1_Lb0EEEZNS5_17then_wrapped_nrvoIS5_SU_EENSA_ISD_E4typeEOT0_EUlOS3_RSU_ONS_12future_stateINS1_9monostateEEEE_vEE
```
This commit changes the RGWStoreManager to return a RGWStore* rather
than a RGWRadosStore*. This is the thread that unravels the rest of the
Zipper work, removing hard-coded uses of the RGWRados* classes.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
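The point of returning the abstract type is that callers only see the generic interface. The sketch below is a minimal illustration of that idea and is not RGW's code. The class names mirror the commit message, but `get_name()` and `make_store()` are invented for illustration. The sketch also uses `std::unique_ptr` for ownership, whereas the commit speaks of raw pointers.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Abstract store interface; get_name() is a hypothetical method.
class RGWStore {
 public:
  virtual ~RGWStore() = default;
  virtual std::string get_name() const = 0;
};

// Concrete RADOS-backed store, hidden behind the interface.
class RGWRadosStore : public RGWStore {
 public:
  std::string get_name() const override { return "rados"; }
};

class RGWStoreManager {
 public:
  // Hypothetical factory: callers receive the abstract RGWStore, so
  // nothing outside this function hard-codes the RADOS implementation.
  static std::unique_ptr<RGWStore> make_store() {
    return std::make_unique<RGWRadosStore>();
  }
};
```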
Kefu Chai [Tue, 2 Mar 2021 09:52:36 +0000 (17:52 +0800)]
crimson/mon: resend mon command when session established
This behavior matches that of `MonClient::_resend_mon_commands()`. So
far the only user that sends mon commands in crimson is
`OSD::_add_me_to_crush()`, but there is still a (rare) chance that the connected
monitor cannot be reached when we send the command to it. In that case,
we should retry when the connection is re-established.
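The retry logic can be sketched as follows, assuming commands are tracked by transaction id until a reply arrives. `MonCommand`, `MonCommandClient`, `handle_reply()` and the `wire` vector are hypothetical names, not crimson's actual members. When a session is established, every command still awaiting a reply is sent again, in tid order. Each resend pushes a copy, in line with the earlier "copy before resend" advice.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of pending mon command bookkeeping.
struct MonCommand {
  uint64_t tid;
  std::string cmd;
};

class MonCommandClient {
 public:
  uint64_t send_command(std::string cmd) {
    const uint64_t tid = ++last_tid;
    pending.emplace(tid, MonCommand{tid, std::move(cmd)});
    if (connected) {
      wire.push_back(pending.at(tid));  // send a copy; keep the original
    }
    return tid;
  }

  // A reply retires the command, so it will not be resent.
  void handle_reply(uint64_t tid) { pending.erase(tid); }

  void handle_reset() { connected = false; }

  // Analogous in spirit to MonClient::_resend_mon_commands(): resend
  // everything still awaiting a reply once the session is up.
  void on_session_established() {
    connected = true;
    for (const auto& entry : pending) {
      wire.push_back(entry.second);
    }
  }

  std::vector<MonCommand> wire;  // stands in for messages put on the wire

 private:
  bool connected = false;
  uint64_t last_tid = 0;
  std::map<uint64_t, MonCommand> pending;  // ordered by tid
};
```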
Kefu Chai [Tue, 2 Mar 2021 08:18:20 +0000 (16:18 +0800)]
crimson/mon: pending_messages should not be empty if active_conn
We always send all pending_messages and clear the queue when establishing a
connection to the mon, so there is no need to check for them when calling
`send_message()`.
Kefu Chai [Tue, 2 Mar 2021 08:07:25 +0000 (16:07 +0800)]
crimson/mon: check for active_con before calling send_pendings()
Before this change, we guarded the `send_pendings()` call only in
`Client::send_message()`; after this change, all of the
`send_pendings()` calls are guarded with this check.
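The last two commits can be read together as one invariant: messages queue up only while there is no active connection, and the queue is drained as soon as one exists. The sketch below is a hypothetical simplification in which `active_con` is reduced to a bool and the wire to a vector of strings. It folds the `active_con` check into `send_pendings()` itself, which has the same effect as guarding each call site.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical simplification of crimson::mon::Client's send path.
class Client {
 public:
  void send_message(std::string msg) {
    if (active_con) {
      // Invariant: pending messages were flushed when the session came
      // up, so nothing can be queued ahead of this one.
      assert(pending_messages.empty());
      wire.push_back(std::move(msg));
    } else {
      pending_messages.push_back(std::move(msg));
    }
  }

  void on_session_established() {
    active_con = true;
    send_pendings();
  }

  void on_reset() { active_con = false; }

  std::vector<std::string> wire;              // messages actually sent
  std::deque<std::string> pending_messages;   // queued while disconnected

 private:
  void send_pendings() {
    if (!active_con) {
      return;  // nothing to send on
    }
    while (!pending_messages.empty()) {
      wire.push_back(std::move(pending_messages.front()));
      pending_messages.pop_front();
    }
  }

  bool active_con = false;
};
```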
crimson/monc: renew subscriptions when reopening a session.
Lack of this feature was the root cause of an issue in
teuthology testing in which a socket failure injection
happened exactly during `mon_subscribe`; after the OSD
reconnected, the message was not resent and the entire
boot process froze.
```
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - [osd.2(client) v2:172.21.15.204:6804/33459@57376 >> mon.0 v2:172.21.15.204:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15)
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=80 buffer.length()=80
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_final buffer.length()=96 final_len=0
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=48 buffer.length()=48
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=16 buffer.length()=64
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_final buffer.length()=80 final_len=0
INFO 2021-02-25 11:42:53,758 [shard 0] ms - [osd.2(client) v2:172.21.15.204:6804/33459@57376 >> mon.0 v2:172.21.15.204:3300/0] execute_ready(): fault at READY on lossy channel, going to CLOSING -- std::system_error (error crimson::net:4, read eof)
```
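For the subscription-renewal commit, a minimal sketch of the idea may help, assuming subscriptions are remembered as a map from name (e.g. "osdmap") to start epoch. `MonSubscriber` and its members are hypothetical and do not reproduce crimson's actual subscription class or message types. Because reopening a session re-sends every remembered subscription, a `mon_subscribe` lost to a fault, as in the log above, is sent again.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of subscription tracking.
class MonSubscriber {
 public:
  // Remember what we want (name -> start epoch) and send it if possible.
  void want(const std::string& what, unsigned start) {
    subs[what] = start;
    if (connected) {
      send_subscribe();
    }
  }

  void on_reset() { connected = false; }

  // Reopening a session renews every outstanding subscription.
  void reopen_session() {
    connected = true;
    if (!subs.empty()) {
      send_subscribe();
    }
  }

  // Each element stands in for one mon_subscribe message put on the wire.
  std::vector<std::map<std::string, unsigned>> sent;

 private:
  void send_subscribe() { sent.push_back(subs); }

  bool connected = false;
  std::map<std::string, unsigned> subs;
};
```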
Zac Dover [Sun, 28 Feb 2021 12:13:39 +0000 (22:13 +1000)]
doc/cephadm: rewrite "install cephadm"
This PR breaks the "Deploying a New Ceph Cluster"
section into several sub-sections, so that each sub-section
pertains to only one subject. I've also added some explanatory
text that puts the instructions into better context than
before.
Zac Dover [Mon, 1 Mar 2021 14:01:05 +0000 (00:01 +1000)]
doc/cephadm: rewrite "b.strap a new cluster"
This PR rewrites the section "Bootstrap A New
Cluster" in the Cephadm Guide, in the Install
Chapter. I've broken this section up into what
seem to me to be the topics that the content
naturally divides into.
Sage Weil [Sat, 27 Feb 2021 20:45:47 +0000 (15:45 -0500)]
msg/Messenger: use random nonce if CEPH_USE_RANDOM_NONCE or pid == 1
If we are in a container, then we do not have a unique pid, and need to
use a random nonce. We normally detect this if our pid is 1, but that
doesn't work when we have an init process; we'll (probably?) have a small
pid (in my tests, the OSDs were getting pid 7).
To be safe, also check for an environment variable set by cephadm.
This avoids problems that arise when we don't have a unique address.
Fixes: https://tracker.ceph.com/issues/49534
Signed-off-by: Sage Weil <sage@newdream.net>
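The nonce decision described above fits in a few lines. The sketch below uses the POSIX `getpid()` and the C `std::getenv()`. `should_use_random_nonce()` and `pick_nonce()` are hypothetical helper names, not the Messenger's actual functions. The sketch treats the mere presence of `CEPH_USE_RANDOM_NONCE` as the signal; whether the real code inspects the variable's value is not stated in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <random>
#include <unistd.h>

// Hypothetical helper: is the pid unusable as a unique nonce?
// Inputs are parameters so the rule can be exercised in isolation.
bool should_use_random_nonce(pid_t pid, const char* env_value) {
  // pid 1: we are the container's init process, so the pid is not unique.
  // env var: set by cephadm, e.g. when --init puts another process at pid 1.
  return pid == 1 || env_value != nullptr;
}

// Hypothetical helper: pid as nonce, unless that is unsafe.
uint64_t pick_nonce() {
  const pid_t pid = getpid();
  if (should_use_random_nonce(pid, std::getenv("CEPH_USE_RANDOM_NONCE"))) {
    std::random_device rd;
    // Combine two random draws into one 64-bit value.
    return (static_cast<uint64_t>(rd()) << 32) | rd();
  }
  return static_cast<uint64_t>(pid);
}
```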
crimson/monc: drop misleading comment about Connection::reply.
Before this commit, `crimson::mon::Connection::reply`
was commented as being specific to ProtocolV1.
However, the code suggests this member also participates
in V2-related paths like `Connection::renew_tickets()`,
from which `do_auth()` is called. `do_auth()` generates
`MAuth`, which causes a monitor to send `MAuthReply`.
The `Connection::reply` synchronizes sending the auth
request with response handling.
At the moment crimson offers only partial and buggy
support for ProtocolV1. By the time crimson enters production,
V1 will be long obsolete, so spending time on improving
it doesn't seem to be a sound investment, while offering
a half-baked feature can be worse than lacking it entirely.
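The synchronization role of `Connection::reply` can be sketched with a promise/future pair. This is an illustration under assumptions. `std::promise` stands in for seastar's promise, and `MonAuthConnection`, `handle_auth_reply()`, the integer result and the `outbox` vector are hypothetical. `do_auth()` queues an `MAuth` and returns a future that resolves only when the matching `MAuthReply` is handled. The V2-side `renew_tickets()` goes through the same pair.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <vector>

// Hypothetical model of the do_auth()/reply handshake.
class MonAuthConnection {
 public:
  // Queue an MAuth and return a future fulfilled by the MAuthReply.
  std::future<int> do_auth() {
    reply = std::promise<int>();  // fresh promise for each auth round
    outbox.push_back("MAuth");
    return reply.get_future();
  }

  // V2 path: renewing tickets goes through the same do_auth()/reply pair.
  std::future<int> renew_tickets() { return do_auth(); }

  // Called when MAuthReply arrives; `result` is a hypothetical status code.
  void handle_auth_reply(int result) { reply.set_value(result); }

  std::vector<std::string> outbox;  // stands in for messages put on the wire

 private:
  std::promise<int> reply;
};
```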