Marcus Watts [Wed, 3 Feb 2021 19:26:46 +0000 (14:26 -0500)]
rgw/kms/kmip - document configuration for a new feature: kmip kms
I've written up a brief description of using kmip
with ceph. Major features:
* ceph configuration.
* making keys with a "paste-in" python script.
* pointers to PyKMIP and IBM SKLM.
Marcus Watts [Thu, 12 Nov 2020 03:38:18 +0000 (22:38 -0500)]
rgw/kms/kmip - rgw / kmip test integration.
s3tests needs to know key names in order to run kms tests.
It seems desirable to have s3tests default to discovering
the names that were created by the pykmip task, and that
if there is more than one rgw connected to more than one
pykmip, that names belonging to the appropriate pykmip
instance should be used.
This logic does the following:
  rgw task: save the pykmip role name.
  s3tests task: set kms_key (and kms_keyid2) to the first available
  of these, in order of priority:
    1. s3tests client task property ['kms_key'] (or ['kms_key2'])
    2. first (second) secret created in the matching pykmip instance
    3. testkey-1 (testkey-2)
For case 2, names from the secrets have an initial "token-" stripped from them.
The assumption here is that rgw is being run with a setting such as
rgw crypt kmip kms key template: pykmip-$keyid
therefore "pykmip-" will be prefixed back onto the key before use.
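The priority order described in this commit can be sketched roughly as
follows. This is a minimal illustration, not the teuthology code: the
function name `pick_kms_key`, its parameters, and the plain list of
secret names are hypothetical. Only the property names, the "token-"
prefix, and the testkey-N defaults come from the description above.

```python
def pick_kms_key(client_props, pykmip_secrets, index=0, strip_prefix="token-"):
    """Return the key name s3tests should use for slot `index` (0 or 1).

    Hypothetical helper: names and argument shapes are assumptions.
    """
    prop = "kms_key" if index == 0 else "kms_key2"

    # 1. An explicit s3tests client task property wins.
    if client_props.get(prop):
        return client_props[prop]

    # 2. Otherwise use the first (second) secret created in the matching
    #    pykmip instance, with the initial prefix stripped.
    if index < len(pykmip_secrets):
        name = pykmip_secrets[index]
        if name.startswith(strip_prefix):
            name = name[len(strip_prefix):]
        return name

    # 3. Fall back to the fixed default names.
    return "testkey-%d" % (index + 1)
```

For example, with no client property set and secrets
["token-a", "token-b"], slot 1 resolves to "b"; rgw's key template
then adds its own prefix before the key is used.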
Marcus Watts [Thu, 29 Oct 2020 16:04:36 +0000 (12:04 -0400)]
rgw/kms/kmip - correct documentation.
The pykmip task should be after ceph, and before rgw.
kmip needs ssl certs in order to function correctly.
Because the openssl_keys task has an indeterminate
order of execution, it is best to create the ca as
a separate task. The ca can be shared with rgw, but
real life deployments of kmip are likely to have their
own CA.
In order to create kmip secrets, a client certificate
is necessary, so one must be supplied to the pykmip task.
Marcus Watts [Thu, 29 Oct 2020 03:40:58 +0000 (23:40 -0400)]
rgw/kms/kmip - pykmip.py needs to make keys too.
The logic to deploy pykmip in teuthology was not complete.
The necessary logic to add kmip keys was missing.
Existing logic for other key service providers could use REST-based
protocols directly from the teuthology host. For kmip, it is necessary
to use a special protocol, and it is more convenient to run this directly
on the pykmip server.
Marcus Watts [Tue, 27 Oct 2020 21:16:14 +0000 (17:16 -0400)]
rgw/kms/kmip - pykmip.py should actually run pykmip.
The logic to deploy pykmip in teuthology was not complete.
While it deployed all the code and certs to run pykmip,
it didn't actually run it. This commit fixes that.
Marcus Watts [Fri, 23 Oct 2020 23:07:09 +0000 (19:07 -0400)]
rgw/kms/kmip - python3 changes for testing.
python3 requires different imports and there's a different
way to get at the first element in a view.
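As an illustration of the "first element in a view" point: in python3,
dict.keys() returns a view object that cannot be indexed. A minimal
sketch, using a made-up dictionary rather than the real test data:

```python
# Hypothetical data, standing in for whatever mapping the test code holds.
secrets = {"token-key-1": "0123", "token-key-2": "4567"}

# python2 allowed secrets.keys()[0]; in python3 keys() returns a view,
# which does not support indexing, so take the first item via an iterator.
first = next(iter(secrets.keys()))

# Equivalent, but builds a full list before picking the first element.
first_via_list = list(secrets.keys())[0]
```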
This is to match changes introduced in the rest of ceph in these
commits: 24e7acc261a4d7258ea7fdcd
Marcus Watts [Sun, 16 Feb 2020 02:08:29 +0000 (21:08 -0500)]
kmip: first pass at implementation logic.
This implements SSE-KMS for the radosgw using kmip.
This uses symmetric raw keys with a name attribute in kmip,
thereby providing the same functionality as the "kv" key store
in hashicorp vault.
Nathan Cutler [Tue, 2 Mar 2021 16:00:53 +0000 (17:00 +0100)]
rpm: limit build jobs by system memory on SUSE
43b441f9a3bc907c17d52385251001ffcd5d3ff9 removed a bunch of code which the SUSE
builds were relying on to avoid OOM. This commit brings back that code in
a much-streamlined form: the SUSE-specific %limit_build macro.
This also has the advantage of not breaking the build on older RPMs which only
know about %_smp_mflags, and not the newer %_smp_build_ncpus etc. macros.
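The general idea of capping build jobs by available memory can be
shown with a little arithmetic. This is only a sketch of the concept,
not the %limit_build macro itself: the function name and the
per-job memory figure used in the example are assumptions.

```python
def limit_build_jobs(ncpus, mem_total_mb, mem_per_job_mb):
    """Cap parallel build jobs so each job has mem_per_job_mb available."""
    by_memory = mem_total_mb // mem_per_job_mb
    # Never drop below one job, and never exceed the CPU count.
    return max(1, min(ncpus, by_memory))
```

With 16 CPUs, 8192 MB of RAM and an assumed 2500 MB per job,
limit_build_jobs(16, 8192, 2500) yields 3 jobs instead of 16.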
Sage Weil [Wed, 3 Mar 2021 13:38:20 +0000 (08:38 -0500)]
Merge PR #39654 into master
* refs/pull/39654/head:
common/options: drop ms_async_max_op_threads
msg/async: drop Stack::num_workers
msg/async: s/num_workers/workers.size()/
msg/async: use range-based loop in NetworkStack
msg/async: do not pass worker id to Stack::spawn_worker()
async/Stack: pass Worker* to NetworkStack::add_thread()
async/rdma: do not reference worker id in RDMAStack::spawn_worker()
async/dpdk: do not use worker id when creating worker
async/PosixStack: do not reference worker id in ctor
async/rdma: initialize worker in RDMAStack::create_worker()
async/rdma: move RDMAStack::create_worker() to .cc
Reviewed-by: luo runbing <luo.runbing@zte.com.cn>
Reviewed-by: Haomai Wang <haomai@xsky.com>
Kefu Chai [Wed, 3 Mar 2021 03:39:36 +0000 (11:39 +0800)]
crimson/mon: keep a copy of sent MMonCommand messages
as per Yingxin Cheng,
> The send process can be asynchronous (there is a conn.out_q, or if the
> underlying socket lives in a different core in the m:n model to be
> ordered there). If user really wants to reuse a message, they must be
> careful not to modify it because it may result in modifying the pending
> messages.
>
> I think the best way is to copy the message if user want to resend it,
> and keep the ceph_assert(!msg->get_seq()). It may looks good to reuse a
> message under lossy policy, but the correctness is now up to user not to
> modify it inplace.
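The copy-before-resend approach from the quote can be modelled in a
few lines. This is a Python analogy only; the real code is C++ in
crimson, and every class and method name below is hypothetical.

```python
import copy


class Message:
    def __init__(self, payload):
        self.payload = payload
        self.seq = 0  # assigned by the connection when the message is queued


class Connection:
    def __init__(self):
        self.out_q = []  # messages may sit here while the send is pending
        self._next_seq = 1

    def send(self, msg):
        # Mirrors the idea of ceph_assert(!msg->get_seq()): a message that
        # has already been queued must not be queued again.
        assert msg.seq == 0
        msg.seq = self._next_seq
        self._next_seq += 1
        self.out_q.append(msg)


class Client:
    def __init__(self):
        self.sent_commands = []  # pristine copies, never handed to a connection

    def send_command(self, conn, msg):
        self.sent_commands.append(copy.deepcopy(msg))
        conn.send(msg)

    def resend_commands(self, conn):
        # Send fresh copies so the kept originals stay unmodified.
        for kept in self.sent_commands:
            conn.send(copy.deepcopy(kept))
```

Because only copies ever reach a connection, the kept messages keep a
sequence number of zero and can be resent any number of times.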
Zac Dover [Tue, 2 Mar 2021 18:16:27 +0000 (04:16 +1000)]
doc/cephadm: add prompts to adoption.rst
This PR formats the bash prompts. It also formats the
bash output so that it appears in the correct (easily
copy-and-pasteable) format. This PR will be followed by
a grammar-improving PR, but this PR is just a
formatting PR.
Sage Weil [Tue, 2 Mar 2021 16:32:12 +0000 (11:32 -0500)]
Merge PR #39739 into master
* refs/pull/39739/head:
cephadm: set CEPH_USE_RANDOM_NONCE if using --init
msg/Messenger: use random nonce if CEPH_USE_RANDOM_NONCE or pid == 1
Revert "Merge PR #39482 into master"
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
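The nonce rule named in the msg/Messenger commit title above can be
sketched as below. This is an illustration in Python, not the C++
Messenger code: the function name, the 64-bit width, and treating the
variable as "set" whenever it is present are assumptions.

```python
import os
import random


def choose_nonce(pid=None, environ=None):
    """Pick a messenger nonce: random if requested or if running as pid 1."""
    pid = os.getpid() if pid is None else pid
    environ = os.environ if environ is None else environ

    # A pid of 1 is not unique across containers, so it makes a poor nonce;
    # the environment variable forces the random path explicitly.
    if "CEPH_USE_RANDOM_NONCE" in environ or pid == 1:
        return random.getrandbits(64)
    return pid
```

For example, choose_nonce(pid=1234, environ={}) returns 1234, while
choose_nonce(pid=1, environ={}) returns a random 64-bit value.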
Merge pull request #38915 from Daniel-Pivonka/clientsoktostop
mgr/cephadm: add ok-to-stop functions for ceph client services
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
crimson/monc: close() active_con before destructing it on resets.
`ProtocolV2` expects `AuthClient` implementations to withstand
calling `get_auth_request()` and `handle_auth_reply_more()` even
if `handle_auth_done()` had been already called. This is because
a network fault may happen on e.g. `AuthSignatureFrame` which is
put on the wire after the `AuthDone` handling.
`crimson::mon::Client` deals with that by returning `auth::error`
from both `get_auth_request()` and `handle_auth_reply_more()` as
the preceding invocation of `handle_auth_done()` had already
cleared `pending_conns` (and set `active_con`). This leads to
`abort_in_close()` and finally to dispatching `ms_handle_reset()`
on `mon::Client` which is fine in general but, when it comes to the
current implementation, it destroys `active_con` without closing
it first.
One of the consequences is breaking the `mon::Connection::reply`
promise; another is a missed `mark_down()` call.
```
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] GOT AuthDoneFrame: gid=4121, con_mode=secure, payload_len=995
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] WRITE AuthSignatureFrame: signature=60caf49e5a6cf3cc39c4160cb9d09032db5f794e29655dc0124cf5f42b7546fb
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - authenticated_encrypt_update plaintext.length()=80 buffer.length()=80
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - authenticated_encrypt_final buffer.length()=96 final_len=0
INFO 2021-03-01 18:10:50,489 [shard 0] monc - found mon.noname-a
INFO 2021-03-01 18:10:50,489 [shard 0] monc - sending auth(proto 2 2 bytes epoch 0) v1
INFO 2021-03-01 18:10:50,489 [shard 0] monc - waiting
DEBUG 2021-03-01 18:10:50,489 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] GOT AuthSignatureFrame: signature=ea04f1318cf76808414a853ed37fd232ae886bef036cb4248079c6cba89d669a
DEBUG 2021-03-01 18:10:50,490 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] WRITE ClientIdentFrame: addrs=v2:172.21.15.110:6800/33954, target=v2:172.21.15.110:3300/0, gid=0, gs=1, features_supported=4540138303579357183, features_required=576460752303432193, flags=1, cookie=923190458014536120
...
INFO 2021-03-01 18:10:50,490 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@56752 >> mon.? v2:172.21.15.110:3300/0] execute_connecting(): fault at CONNECTING, going to WAIT -- std::system_error (error crimson::net:4, read eof)
...
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@0 >> mon.? v2:172.21.15.110:3300/0] GOT HelloFrame: my_type=mon peer_addr=v2:172.21.15.110:63960/0
INFO 2021-03-01 18:10:50,690 [shard 0] monc - get_auth_request(con=[osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0], auth_method=0)
ERROR 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] get_initial_auth_request returned crimson::auth::error (unknown connection)
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] closing: reset yes, replace no
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] TRIGGER CLOSING, was CONNECTING
...
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] write_event: dropped
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] execute_connecting(): protocol aborted at CLOSING -- std::system_error (error crimson::net:6, protocol aborted)
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] closing: reset yes, replace no
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] TRIGGER CLOSING, was CONNECTING
WARN 2021-03-01 18:10:50,690 [shard 0] osd - ms_handle_reset
WARN 2021-03-01 18:10:50,690 [shard 0] monc - active conn reset v2:172.21.15.110:3300/0
INFO 2021-03-01 18:10:50,690 [shard 0] monc - reopen_session to mon.-1
WARN 2021-03-01 18:10:50,690 [shard 0] monc - mon.0 does not have an addr compatible with me
INFO 2021-03-01 18:10:50,690 [shard 0] monc - connecting to mon.1
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] ProtocolV2::start_connect(): peer_addr=v2:172.21.15.110:3300/0, peer_name=mon.?, cc=14512795460730278364 policy(lossy=true, server=false, standby=false, resetcheck=false)
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] TRIGGER CONNECTING, was NONE
DEBUG 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] UPDATE: gs=3 for connect
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954 >> mon.? v2:172.21.15.110:3300/0] write_event: delay ...
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] write_event: dropped
INFO 2021-03-01 18:10:50,690 [shard 0] ms - [osd.0(client) v2:172.21.15.110:6800/33954@63960 >> mon.? v2:172.21.15.110:3300/0] execute_connecting(): protocol aborted at CLOSING -- std::system_error (error crimson::net:6, protocol aborted)
...
WARN 2021-03-01 18:10:50,690 [shard 0] seastar - Exceptional future ignored: seastar::broken_promise (broken promise), backtrace: 0x146f364
0x146f6e1
0x146fb01
0x135c2fe
0x135c481
0x6ee079
0x137db87
0x137def2
0x13ab085
0x1347b27
0x6619f5
/lib64/libc.so.6+0x237b2
0x6b217d
--------
N7seastar12continuationINS_8internal22promise_base_with_typeIvEENS_6futureIvE12finally_bodyIZNS_5asyncIZZ4mainENKUlvE_clEvEUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSC_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSD_DpOSG_EUlvE1_Lb0EEEZNS5_17then_wrapped_nrvoIS5_SU_EENSA_ISD_E4typeEOT0_EUlOS3_RSU_ONS_12future_stateINS1_9monostateEEEE_vEE
```
This commit changes the RGWStoreManager to return a RGWStore* rather
than a RGWRadosStore*. This is the thread that unravels the rest of the
Zipper work, removing hard-coded uses of the RGWRados* classes.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Kefu Chai [Tue, 2 Mar 2021 09:52:36 +0000 (17:52 +0800)]
crimson/mon: resend mon command when session established
this behavior matches that of `MonClient::_resend_mon_commands()`. so
far the only user which sends a mon command in crimson is
`OSD::_add_me_to_crush()`, but there is still a (rare) chance that the
connected monitor cannot be reached when we send the command to it. in
that case, we should retry when the connection is re-established.
Kefu Chai [Tue, 2 Mar 2021 08:18:20 +0000 (16:18 +0800)]
crimson/mon: pending_messages should be empty if active_con
we always send all pending_messages, and clear it when establishing a
connection to mon, so there is no need to check for it when calling
`send_message()`.
Kefu Chai [Tue, 2 Mar 2021 08:07:25 +0000 (16:07 +0800)]
crimson/mon: check for active_con before calling send_pendings()
before this change, we guarded the `send_pendings()` call only in
`Client::send_message()`; after this change, all of the
`send_pendings()` calls are guarded with this check.
crimson/monc: renew subscriptions when reopening a session.
Lack of this feature was the root cause of an issue in
teuthology testing in which a socket failure injection
happened exactly during `mon_subscribe`; after the OSD
reconnected, the message was not resent and the entire
boot process froze.
```
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - [osd.2(client) v2:172.21.15.204:6804/33459@57376 >> mon.0 v2:172.21.15.204:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15)
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=80 buffer.length()=80
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_final buffer.length()=96 final_len=0
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=48 buffer.length()=48
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=16 buffer.length()=64
DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_final buffer.length()=80 final_len=0
INFO 2021-02-25 11:42:53,758 [shard 0] ms - [osd.2(client) v2:172.21.15.204:6804/33459@57376 >> mon.0 v2:172.21.15.204:3300/0] execute_ready(): fault at READY on lossy channel, going to CLOSING -- std::system_error (error crimson::net:4, read eof)
```
Zac Dover [Sun, 28 Feb 2021 12:13:39 +0000 (22:13 +1000)]
doc/cephadm: rewrite "install cephadm"
This PR breaks the "Deploying a New Ceph Cluster"
section into several sub-sections, so that each sub-section
pertains to only one subject. I've also added some explanatory
text that gives the instructions more context than they had
before.
Zac Dover [Mon, 1 Mar 2021 14:01:05 +0000 (00:01 +1000)]
doc/cephadm: rewrite "bootstrap a new cluster"
This PR rewrites the section "Bootstrap A New
Cluster" in the Cephadm Guide, in the Install
Chapter. I've broken this section up into what
seem to me to be the topics that the content
naturally divides into.