]> git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commitdiff
crimson/monc: fix races between on_session_opened() and the reset sequence. 41756/head
authorRadoslaw Zarzynski <rzarzyns@redhat.com>
Tue, 8 Jun 2021 10:33:31 +0000 (10:33 +0000)
committerRadoslaw Zarzynski <rzarzyns@redhat.com>
Tue, 8 Jun 2021 10:57:03 +0000 (10:57 +0000)
The `active_con` can get invalidated every single time when there is
a preemption point. This includes even the middle connection open
sequence as it's spread across multiple continuations!

Unfortunately, we don't check for `active_con` in the lambdas inside
the `on_session_opened()` method. That was the reason of the following
crash at Sepia [1]:

```
INFO  2021-06-08 09:36:23,992 [shard 0] monc - do_auth_single: connection closed
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4967-g96cdf983/rpm/el8/BUILD/ceph-17.0.0-4967-g96cdf983/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
 0# 0x000055C3C1CA860F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007FAAE1713B20 in /lib64/libpthread.so.0
 4# crimson::mon::Connection::get_conn() in ceph-osd
 5# 0x000055C3C2532DA8 in ceph-osd
 6# 0x000055C3C2535CB5 in ceph-osd
 7# 0x000055C3BBC9FC70 in ceph-osd
 8# 0x000055C3C76FAE5F in ceph-osd
 9# 0x000055C3C77003A0 in ceph-osd
10# 0x000055C3C78B240B in ceph-osd
11# 0x000055C3C740FE8A in ceph-osd
12# 0x000055C3C7419FAE in ceph-osd
13# main in ceph-osd
14# __libc_start_main in /lib64/libc.so.6
15# _start in ceph-osd
Fault at location: 0x98
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-06-08_09:11:05-rados-master-distro-basic-smithi/6159602

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
src/crimson/mon/MonClient.cc

index 64261865ce551ec5232a75bf4272c1aac62d9d2c..cdbdbf2a958eba35d285eae75b463781227e1dc7 100644 (file)
@@ -1045,6 +1045,11 @@ seastar::future<> Client::send_message(MessageURef m)
 seastar::future<> Client::on_session_opened()
 {
   return active_con->renew_rotating_keyring().then([this] {
+    if (!active_con) {
+      // the connection can be closed even in the middle of the opening sequence
+      logger().info("on_session_opened {}: connection closed", __LINE__);
+      return seastar::now();
+    }
     for (auto& m : pending_messages) {
       (void) active_con->get_conn()->send(std::move(m.msg));
       m.pr.set_value();
@@ -1053,6 +1058,10 @@ seastar::future<> Client::on_session_opened()
     ready_to_send = true;
     return sub.reload() ? renew_subs() : seastar::now();
   }).then([this] {
+    if (!active_con) {
+      logger().info("on_session_opened {}: connection closed", __LINE__);
+      return seastar::now();
+    }
     return seastar::parallel_for_each(mon_commands,
       [this](auto &command) {
       return send_message(crimson::make_message<MMonCommand>(*command.req));