git.apps.os.sepia.ceph.com Git - ceph-ci.git/commitdiff
crimson/mgr: don't report if there is no connection available.
author Radoslaw Zarzynski <rzarzyns@redhat.com>
Sat, 17 Apr 2021 17:14:06 +0000 (17:14 +0000)
committer Radoslaw Zarzynski <rzarzyns@redhat.com>
Sat, 17 Apr 2021 20:59:05 +0000 (20:59 +0000)
During a teuthology run [1] the following crash happened:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696$ less remote/smithi052/log/ceph-osd.3.log.gz
...
DEBUG 2021-04-08 10:32:58,548 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62168 >> mon.0 v2:172.21.15.52:3300/0] <== #3 === mgrmap(e 4) v1 (1796)
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closing: reset no, replace no
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] TRIGGER CLOSING, was READY
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] execute_ready(): protocol aborted at CLOSING -- std::system_error (error crimson::net:4, read eof)
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closed!
Segmentation fault on shard 0.
Backtrace:
  0x000000000151765c
  0x00000000014d9600
  0x00000000014d9902
  0x00000000014d9972
  /lib64/libpthread.so.0+0x0000000000012b1f
  0x0000000000e59cba
  0x00000000014dc8a6
  0x00000000014cdd1c
  0x0000000001503053
  0x000000000149fab7
  0x00000000006e0ef5
  /lib64/libc.so.6+0x00000000000237b2
  0x000000000072a23d
daemon-helper: command crashed with signal 11
```
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696/

GDB shows that `conn` was null during the execution of `ceph::mgr::report()`:

```
(gdb) frame 7
154 in /usr/src/debug/ceph-17.0.0-2935.g4153f8c2.el8.x86_64/src/crimson/mgr/client.cc
(gdb) print conn
$1 = {_b = 0x0, _p = 0x0}
```

Taken together with the `mgr.4100 v2:172.21.15.52:6800/30259] closed!`
debug message, this suggests that a call to `report()` occurred (likely from
the timer) while we were in the middle of the non-atomic reconnect sequence:

```cpp
seastar::future<> Client::reconnect()
{
  if (conn) {
    conn->mark_down();
    conn = {};  // conn stays null until the continuation below reassigns it
  }
  // ...
  return seastar::sleep(a_while).then([this] {
    // ...
    conn = msgr.connect(peer, CEPH_ENTITY_TYPE_MGR);
  });
}
```

This commit alters `mgr::report()` to skip reporting if `conn` is
unavailable.
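
The window and the guard can be modeled without Seastar. The following is a minimal sketch, not the crimson code: `FakeConnection`, `FakeClient`, `deferred`, `run_deferred()` and `skipped_reports` are hypothetical names introduced only for illustration, and a plain task queue stands in for the continuation scheduled behind `seastar::sleep()`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a messenger connection; it only records sends.
struct FakeConnection {
  std::vector<std::string> sent;
  void send(std::string msg) { sent.push_back(std::move(msg)); }
};

// Hypothetical stand-in for the mgr client. `deferred` plays the role of
// work scheduled for later: it does not run inline with reconnect().
class FakeClient {
public:
  std::shared_ptr<FakeConnection> conn = std::make_shared<FakeConnection>();
  std::deque<std::function<void()>> deferred;
  int skipped_reports = 0;

  // Two-step reconnect: conn is dropped immediately and re-created only
  // when the deferred task runs, so conn is null in between.
  void reconnect() {
    if (conn) {
      conn = {};
    }
    deferred.push_back([this] { conn = std::make_shared<FakeConnection>(); });
  }

  // Guarded report: skip instead of dereferencing a null conn.
  // Returns true if a report was actually sent.
  bool report() {
    if (!conn) {
      ++skipped_reports;
      return false;
    }
    conn->send("pg_stats");
    return true;
  }

  // Run all deferred tasks, as if the sleep had elapsed.
  void run_deferred() {
    while (!deferred.empty()) {
      auto task = std::move(deferred.front());
      deferred.pop_front();
      task();
    }
  }
};
```

Calling `report()` between `reconnect()` and `run_deferred()` hits the null `conn`. With the guard, the call is counted as skipped rather than crashing, and reporting resumes once the deferred task has re-created the connection.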

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
src/crimson/mgr/client.cc

index 5aa8a88ba214a7eb59ba7734411264321b662ed6..db888fe470f886b88b66734a626f087931c760a7 100644 (file)
@@ -152,7 +152,10 @@ seastar::future<> Client::handle_mgr_conf(crimson::net::ConnectionRef,
 void Client::report()
 {
   gate.dispatch_in_background(__func__, *this, [this] {
-    assert(conn);
+    if (!conn) {
+      logger().warn("report: no conn available; raport skipped");
+      return seastar::now();
+    }
     auto pg_stats = with_stats.get_stats();
     return conn->send(std::move(pg_stats));
   });