git.apps.os.sepia.ceph.com Git - ceph.git/commit
crimson/mgr: don't report if there is no connection available. 40898/head
author Radoslaw Zarzynski <rzarzyns@redhat.com>
Sat, 17 Apr 2021 17:14:06 +0000 (17:14 +0000)
committer Radoslaw Zarzynski <rzarzyns@redhat.com>
Sat, 17 Apr 2021 20:59:05 +0000 (20:59 +0000)
commit 728be14cd9dac814f66cbf70a756e290aeb2c75a
tree 5dcec084e8c1ca21ff89eb92b4b4d9a150b68693
parent e09f8d7dd027735ecc015f0497a1668e6cbcf1d2
crimson/mgr: don't report if there is no connection available.

During a teuthology run [1], the following crash happened:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696$ less remote/smithi052/log/ceph-osd.3.log.gz
...
DEBUG 2021-04-08 10:32:58,548 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62168 >> mon.0 v2:172.21.15.52:3300/0] <== #3 === mgrmap(e 4) v1 (1796)
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closing: reset no, replace no
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] TRIGGER CLOSING, was READY
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] execute_ready(): protocol aborted at CLOSING -- std::system_error (error crimson::net:4, read eof)
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closed!
Segmentation fault on shard 0.
Backtrace:
  0x000000000151765c
  0x00000000014d9600
  0x00000000014d9902
  0x00000000014d9972
  /lib64/libpthread.so.0+0x0000000000012b1f
  0x0000000000e59cba
  0x00000000014dc8a6
  0x00000000014cdd1c
  0x0000000001503053
  0x000000000149fab7
  0x00000000006e0ef5
  /lib64/libc.so.6+0x00000000000237b2
  0x000000000072a23d
daemon-helper: command crashed with signal 11
```
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696/

GDB confirms that `conn` was null during the execution of `ceph::mgr::report()`:

```
(gdb) frame 7
154 in /usr/src/debug/ceph-17.0.0-2935.g4153f8c2.el8.x86_64/src/crimson/mgr/client.cc
(gdb) print conn
$1 = {_b = 0x0, _p = 0x0}
```

Taken together with the `mgr.4100 v2:172.21.15.52:6800/30259] closed!`
debug message, this suggests that `report()` was called (likely from the
timer) while we were in the middle of the non-atomic reconnect sequence:

```cpp
seastar::future<> Client::reconnect()
{
  if (conn) {
    conn->mark_down();
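    conn = {};  // from here on, conn is null...
  }
  // ...
  // While sleeping, other continuations (e.g. the report timer) may run and see a null conn.
  return seastar::sleep(a_while).then([this] {
    // ...
    conn = msgr.connect(peer, CEPH_ENTITY_TYPE_MGR);  // ...until it is re-established here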
  });
}
```
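To make the interleaving concrete, here is a toy, single-threaded C++ model of the window; it is not Seastar code. It assumes a hypothetical `EventLoop` that runs queued continuations one at a time in FIFO order (standing in for a Seastar shard), a hypothetical `Connection` type, and a hypothetical `ToyClient` whose `reconnect()` mirrors the excerpt above: `conn` is cleared immediately, and the new connection is only installed two continuations later. A report queued by the timer in between observes a null `conn`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a messenger connection.
struct Connection {
  explicit Connection(std::string p) : peer(std::move(p)) {}
  std::string peer;
  bool marked_down = false;
  void mark_down() { marked_down = true; }
};

// Hypothetical cooperative loop: continuations run one at a time, in FIFO order.
struct EventLoop {
  std::deque<std::function<void()>> tasks;
  void post(std::function<void()> task) { tasks.push_back(std::move(task)); }
  void run() {
    while (!tasks.empty()) {
      std::function<void()> task = std::move(tasks.front());
      tasks.pop_front();
      task();
    }
  }
};

// Hypothetical client reproducing the non-atomic reconnect sequence.
struct ToyClient {
  EventLoop& loop;
  std::shared_ptr<Connection> conn;
  int reports_seen_null = 0;  // how many report() calls observed a null conn

  explicit ToyClient(EventLoop& l)
      : loop(l), conn(std::make_shared<Connection>("mgr.4100")) {}

  void reconnect() {
    if (conn) {
      conn->mark_down();
      conn = nullptr;  // the window opens: conn stays null until the final continuation
    }
    // Models seastar::sleep(...).then(...): the sleep is one deferred hop and the
    // connect is a second one, so other queued work can run in between.
    loop.post([this] {
      loop.post([this] {
        assert(!conn);  // in this model nothing else re-establishes the connection
        conn = std::make_shared<Connection>("mgr.4100");
      });
    });
  }

  void report() {
    // The unguarded original dereferenced conn at this point;
    // the model records the null instead of crashing.
    if (!conn) {
      ++reports_seen_null;
    }
  }
};
```

For example, calling `client.reconnect()` and then posting `client.report()` before `loop.run()` leaves `reports_seen_null == 1`, while `conn` is valid again once the loop drains.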

This commit alters `mgr::report()` to skip reporting if `conn`
is unavailable.
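A minimal sketch of the guard itself, assuming a hypothetical, simplified `Client` that holds `conn` as a `std::shared_ptr<Connection>`. The real crimson client uses Seastar futures and the messenger's connection type, which are not reproduced here; `reconnect_done()` and the `"report"` payload are illustrative names only. `report()` returns early when `conn` is null instead of dereferencing it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a messenger connection; it only records sent messages.
struct Connection {
  std::vector<std::string> sent;
  void send(std::string msg) { sent.push_back(std::move(msg)); }
};

// Hypothetical, simplified client: only the parts needed to show the guard.
class Client {
public:
  std::shared_ptr<Connection> conn;  // null while a reconnect is in progress
  int skipped_reports = 0;

  // Hypothetical hook called once the new connection is established.
  void reconnect_done(std::shared_ptr<Connection> c) {
    assert(c);  // precondition: a real connection is being installed
    conn = std::move(c);
  }

  // Returns true if a report was sent, false if it was skipped.
  bool report() {
    if (!conn) {
      // No connection available (e.g. mid-reconnect): skip instead of dereferencing null.
      ++skipped_reports;
      return false;
    }
    conn->send("report");
    return true;
  }
};
```

In this model a skipped report is simply dropped; the next `report()` after `reconnect_done()` sends normally.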

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
src/crimson/mgr/client.cc