From: Radosław Zarzyński Date: Thu, 16 Mar 2023 17:05:17 +0000 (+0100) Subject: msg/async: don't abort when public addrs mismatch bind addrs X-Git-Tag: v17.2.7~521^2 X-Git-Url: http://git.apps.os.sepia.ceph.com/?a=commitdiff_plain;h=3fb810451f2522e7b381ecb577d6484a464be72c;p=ceph.git msg/async: don't abort when public addrs mismatch bind addrs Before the 69b47c805fdd2ecd4f58547d58c9f019fc62d447 (PR #50153) a mismatch (in number or types of stored `entity_addr_t`) between public addrs and bind addrs vectors was ignored and the former was taking over anything else -- it was possible to e.g. bind to both v1 and v2 addresses but expose v2 only. Unfortunately, that's exactly how Rook configures ceph-mon: ``` debug 2023-03-16T21:01:48.389+0000 7f99822bf8c0 0 starting mon.a rank 0 at public addrs v2:172.30.122.144:3300/0 at bind addrs [v2:10.129.2.21:3300/0,v1:10.129.2.21:6789/0] mon_data /var/lib/ceph/mon/ceph-a fsid acc14d1b-fb2b-4f01-8b61-6e7cb26e9200 ``` The consequnece is the following abort: ``` ceph version 17.2.5-1338.el9cp (5adce3015143c7c2cc135a71368be194744f5761) quincy (stable) 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string, std::allocator > const&)+0xd3) [0x7ff05392c1a6] 2: /usr/lib64/ceph/libceph-common.so.2(+0x165ac1) [0x7ff05394eac1] 3: (AsyncMessenger::bindv(entity_addrvec_t const&, std::optional)+0x1fe) [0x7ff053baa0ce] 4: main() 5: /lib64/libc.so.6(+0x3feb0) [0x7ff053048eb0] 6: __libc_start_main() 7: _start() debug *** Caught signal (Aborted) ** in thread 7ff052ab18c0 thread_name:ceph-mon 2023-03-16T09:56:35.995+0000 7ff052ab18c0 -1 /builddir/build/BUILD/ceph-17.2.5-1338-g484e8dbb/src/msg/msg_types.h: In function 'void entity_addr_t::set_port(int)' thread 7ff052ab18c0 time 2023-03-16T09:56:35.996339+0000 /builddir/build/BUILD/ceph-17.2.5-1338-g484e8dbb/src/msg/msg_types.h: 359: ceph_abort_msg("abort() called") ``` This commit brings the original logic back but in a way that preserves the port numbers figured out by. e.g. `Processor::bind`. Fixes: https://tracker.ceph.com/issues/59100 Signed-off-by: Radosław Zarzyński (cherry picked from commit 68fbdcf9b0937e8102b86d0bf84815554b62328d) --- diff --git a/src/msg/async/AsyncMessenger.cc b/src/msg/async/AsyncMessenger.cc index 7572827b37be8..3971e91ffd015 100644 --- a/src/msg/async/AsyncMessenger.cc +++ b/src/msg/async/AsyncMessenger.cc @@ -501,22 +501,28 @@ void AsyncMessenger::_finish_bind(const entity_addrvec_t& bind_addrs, if (get_myaddrs().front().get_port() == 0) { set_myaddrs(listen_addrs); } - entity_addrvec_t newaddrs = *my_addrs; - for (auto& a : newaddrs.v) { - a.set_nonce(nonce); - if (saved_public_addrs) { - // transplantate network layer addresses while keeping ports - // (as they can be figured out by msgr from the allowed range [1]) - // unless they are explicitly specified (NATing both IP/port?) - // - // [1]: the low-level `Processor::bind` scans for free ports in - // a range controlled by ms_bind_port_min and ms_bind_port_max - const auto& public_addr = - saved_public_addrs->addr_of_type(a.get_type()); - const auto public_port = public_addr.get_port(); - const auto bound_port = a.get_port(); - a.set_sockaddr(public_addr.get_sockaddr()); - a.set_port(public_port == 0 ? bound_port : public_port); + + entity_addrvec_t newaddrs; + if (saved_public_addrs) { + newaddrs = *saved_public_addrs; + for (auto& public_addr : newaddrs.v) { + public_addr.set_nonce(nonce); + if (public_addr.is_ip() && public_addr.get_port() == 0) { + // port is not explicitly set. This is fine as it can be figured + // out by msgr. For instance, the low-level `Processor::bind` + // scans for free ports in a range controlled by ms_bind_port_min + // and ms_bind_port_max. + for (const auto& a : my_addrs->v) { + if (public_addr.get_type() == a.get_type() && a.is_ip()) { + public_addr.set_port(a.get_port()); + } + } + } + } + } else { + newaddrs = *my_addrs; + for (auto& a : newaddrs.v) { + a.set_nonce(nonce); } } set_myaddrs(newaddrs);