From: Radoslaw Zarzynski
Date: Thu, 21 Oct 2021 12:55:19 +0000 (+0000)
Subject: crimson/osd: fix network address selection for heartbeat's messengers.
X-Git-Tag: v17.1.0~595^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F43648%2Fhead;p=ceph.git

crimson/osd: fix network address selection for heartbeat's messengers.

The public and cluster messenger instances bind to `INADDR_ANY` and learn
their local addresses later, which allows them to deal with some exotic
network arrangements.

```
INFO 2021-10-21 12:18:57,740 [shard 0] osd - picked address v2:0.0.0.0:0/0
ERROR 2021-10-21 12:18:57,740 [shard 0] none - Falling back to public interface
INFO 2021-10-21 12:18:57,740 [shard 0] osd - picked address v2:0.0.0.0:0/0
DEBUG 2021-10-21 12:18:57,741 [shard 0] ms - [osd.0(cluster) v2:0.0.0.0:6800/3159356168] do_listen: try listen v2:0.0.0.0:6800/3159356168...
DEBUG 2021-10-21 12:18:57,741 [shard 0] ms - [osd.0(client) v2:0.0.0.0:6800/3159356168] do_listen: try listen v2:0.0.0.0:6800/3159356168...
INFO 2021-10-21 12:18:57,741 [shard 0] ms - [osd.0(cluster) v2:0.0.0.0:6800/3159356168] try_bind: done
INFO 2021-10-21 12:18:57,741 [shard 0] osd - pg_epoch 84 pg[1.c( v 84'194 (0'0,84'194] local-lis/les=82/83 n=1 ec=13/7 lis/c=82/82 les/c/f=83/83/0 sis=82) [0] r=0 lpr=0 crt=84'194 lcod 0'0 mlcod 0'0 unknown exit Initial 0.723432 0 0.000000
INFO 2021-10-21 12:18:57,741 [shard 0] osd - Exiting state: Initial, entered at 1634818737.017936, 0.0 spent on 0 events
INFO 2021-10-21 12:18:57,741 [shard 0] osd - pg_epoch 84 pg[1.c( v 84'194 (0'0,84'194] local-lis/les=82/83 n=1 ec=13/7 lis/c=82/82 les/c/f=83/83/0 sis=82) [0] r=0 lpr=0 crt=84'194 lcod 0'0 mlcod 0'0 unknown enter Reset
INFO 2021-10-21 12:18:57,741 [shard 0] osd - Entering state: Reset
INFO 2021-10-21 12:18:57,741 [shard 0] ms - [osd.0(cluster) v2:0.0.0.0:6800/3159356168] try_bind: done
DEBUG 2021-10-21 12:18:57,741 [shard 0] ms - [osd.0(client) v2:0.0.0.0:6801/3159356168] do_listen: try listen v2:0.0.0.0:6801/3159356168...
INFO 2021-10-21 12:18:57,741 [shard 0] ms - [osd.0(client) v2:0.0.0.0:6801/3159356168] try_bind: done
INFO 2021-10-21 12:18:57,741 [shard 0] ms - [osd.0(client) v2:0.0.0.0:6801/3159356168] try_bind: done
```

The two messenger instances dedicated to heartbeat should follow this
policy. Unfortunately, crimson (in contrast to the classical OSD) uses
the public and cluster addresses *learned from clients*.

```
INFO 2021-10-21 12:18:57,747 [shard 0] osd - heartbeat: start front_addrs=v2:172.17.0.1:6801/3159356168, back_addrs=v2:172.17.0.1:6800/3159356168
DEBUG 2021-10-21 12:18:57,748 [shard 0] ms - [osd.0(hb_back) v2:172.17.0.1:6800/3159356168] do_listen: try listen v2:172.17.0.1:6800/3159356168...
DEBUG 2021-10-21 12:18:57,748 [shard 0] ms - [osd.0(hb_front) v2:172.17.0.1:6800/3159356168] do_listen: try listen v2:172.17.0.1:6800/3159356168...
DEBUG 2021-10-21 12:18:57,748 [shard 0] ms - [osd.0(hb_back) v2:172.17.0.1:6801/3159356168] do_listen: try listen v2:172.17.0.1:6801/3159356168...
DEBUG 2021-10-21 12:18:57,748 [shard 0] ms - [osd.0(hb_front) v2:172.17.0.1:6801/3159356168] do_listen: try listen v2:172.17.0.1:6801/3159356168...
```

If a network interface's address differs from the one visible to and
learned from clients, all attempts to bind fail with `EADDRNOTAVAIL`
(`errno` 99 on Linux).

```
DEBUG 2021-10-21 12:19:13,284 [shard 0] ms - [osd.0(hb_back) v2:172.17.0.1:7550/3159356168] do_listen: try listen v2:172.17.0.1:7550/3159356168...
DEBUG 2021-10-21 12:19:13,284 [shard 0] ms - [osd.0(hb_front) v2:172.17.0.1:7568/3159356168] do_listen: try listen v2:172.17.0.1:7568/3159356168...
INFO 2021-10-21 12:19:13,284 [shard 0] ms - [osd.0(hb_front) v2:172.17.0.1:7568/3159356168] was unable to bind after 3 attempts: generic:99
ERROR 2021-10-21 12:19:13,284 [shard 0] osd - heartbeat messenger bind(v2:172.17.0.1:0/3159356168): generic:99
```

Signed-off-by: Radoslaw Zarzynski
---
diff --git a/src/crimson/osd/osd.cc b/src/crimson/osd/osd.cc
index f2b4cde54bcc..c5753a556af2 100644
--- a/src/crimson/osd/osd.cc
+++ b/src/crimson/osd/osd.cc
@@ -371,8 +371,8 @@ seastar::future<> OSD::start()
       return seastar::now();
     }
   }).then([this] {
-    return heartbeat->start(public_msgr->get_myaddrs(),
-                            cluster_msgr->get_myaddrs());
+    return heartbeat->start(pick_addresses(CEPH_PICK_ADDRESS_PUBLIC),
+                            pick_addresses(CEPH_PICK_ADDRESS_CLUSTER));
   }).then([this] {
     // create the admin-socket server, and the objects that register
     // to handle incoming commands
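
The bind failure described in the commit message can be reproduced outside
crimson with plain POSIX sockets. The following is a minimal sketch, not Ceph
code: `try_bind` is a hypothetical helper, and `192.0.2.1` (from the TEST-NET-1
documentation range) is assumed not to be assigned to any local interface.
Binding to the wildcard address with port 0 succeeds and the real port is
learned afterwards with `getsockname()`, while binding to an address the host
does not own fails with `EADDRNOTAVAIL`, the same error the heartbeat
messengers hit when handed an address learned from clients.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper for illustration only; not part of Ceph/crimson.
// Tries to bind a TCP socket to `ip` with port 0 (kernel picks a free port).
// Returns 0 on success, otherwise the errno from socket()/bind()/getsockname(),
// or EINVAL if `ip` is not a valid dotted-quad IPv4 address.
// On success, *learned_port receives the port assigned by the kernel.
int try_bind(const char* ip, uint16_t* learned_port) {
  assert(ip != nullptr);
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return errno;

  sockaddr_in sa{};
  sa.sin_family = AF_INET;
  sa.sin_port = htons(0);  // let the kernel choose a free port
  if (::inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
    ::close(fd);
    return EINVAL;
  }

  // A non-local address fails here with EADDRNOTAVAIL (default Linux settings).
  if (::bind(fd, reinterpret_cast<sockaddr*>(&sa), sizeof(sa)) < 0) {
    int err = errno;
    ::close(fd);
    return err;
  }

  // Learn the actual local address/port only after binding, which is the
  // policy the public and cluster messengers follow with INADDR_ANY.
  socklen_t len = sizeof(sa);
  if (::getsockname(fd, reinterpret_cast<sockaddr*>(&sa), &len) < 0) {
    int err = errno;
    ::close(fd);
    return err;
  }
  if (learned_port != nullptr) *learned_port = ntohs(sa.sin_port);
  ::close(fd);
  return 0;
}
```

For example, `try_bind("0.0.0.0", &port)` returns 0 and fills `port` with a
non-zero ephemeral port, whereas `try_bind("192.0.2.1", nullptr)` returns
`EADDRNOTAVAIL`. On Linux the second call would succeed if
`net.ipv4.ip_nonlocal_bind` were enabled or the socket set `IP_FREEBIND`, so
the sketch assumes default kernel settings.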