From: Radoslaw Zarzynski Date: Tue, 8 Jun 2021 10:33:31 +0000 (+0000) Subject: crimson/monc: fix races between on_session_opened() and the reset sequence. X-Git-Tag: v17.1.0~1717^2 X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=b5f3dc4d9d537ee6bece8a70f5b3243817041b44;p=ceph.git crimson/monc: fix races between on_session_opened() and the reset sequence. The `active_con` can get invalidated every single time when there is a preemption point. This includes even the middle connection open sequence as it's spread across multiple continuations! Unfortunately, we don't check for `active_con` in the lambdas inside the `on_session_opened()` method. That was the reason of the following crash at Sepia [1]: ``` INFO 2021-06-08 09:36:23,992 [shard 0] monc - do_auth_single: connection closed /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4967-g96cdf983/rpm/el8/BUILD/ceph-17.0.0-4967-g96cdf983/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection' Segmentation fault on shard 0. Backtrace: 0# 0x000055C3C1CA860F in ceph-osd 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd 3# 0x00007FAAE1713B20 in /lib64/libpthread.so.0 4# crimson::mon::Connection::get_conn() in ceph-osd 5# 0x000055C3C2532DA8 in ceph-osd 6# 0x000055C3C2535CB5 in ceph-osd 7# 0x000055C3BBC9FC70 in ceph-osd 8# 0x000055C3C76FAE5F in ceph-osd 9# 0x000055C3C77003A0 in ceph-osd 10# 0x000055C3C78B240B in ceph-osd 11# 0x000055C3C740FE8A in ceph-osd 12# 0x000055C3C7419FAE in ceph-osd 13# main in ceph-osd 14# __libc_start_main in /lib64/libc.so.6 15# _start in ceph-osd Fault at location: 0x98 ``` [1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-06-08_09:11:05-rados-master-distro-basic-smithi/6159602 Signed-off-by: Radoslaw Zarzynski --- diff --git a/src/crimson/mon/MonClient.cc b/src/crimson/mon/MonClient.cc index 64261865ce5..cdbdbf2a958 100644 --- a/src/crimson/mon/MonClient.cc +++ b/src/crimson/mon/MonClient.cc @@ -1045,6 +1045,11 @@ seastar::future<> Client::send_message(MessageURef m) seastar::future<> Client::on_session_opened() { return active_con->renew_rotating_keyring().then([this] { + if (!active_con) { + // the connection can be closed even in the middle of the opening sequence + logger().info("on_session_opened {}: connection closed", __LINE__); + return seastar::now(); + } for (auto& m : pending_messages) { (void) active_con->get_conn()->send(std::move(m.msg)); m.pr.set_value(); @@ -1053,6 +1058,10 @@ seastar::future<> Client::on_session_opened() ready_to_send = true; return sub.reload() ? renew_subs() : seastar::now(); }).then([this] { + if (!active_con) { + logger().info("on_session_opened {}: connection closed", __LINE__); + return seastar::now(); + } return seastar::parallel_for_each(mon_commands, [this](auto &command) { return send_message(crimson::make_message(*command.req));