In `crimson/osd/main.cc` we instruct Seastar to handle `SIGHUP`.
```
// just ignore SIGHUP, we don't reread settings
seastar::engine().handle_signal(SIGHUP, [] {})
```
This happens using the Seastar's signal handling infrastructure
which is incompliant with the alien world.
```
void
reactor::signals::handle_signal(int signo, noncopyable_function<void ()>&& handler) {
// ...
struct sigaction sa;
sa.sa_sigaction = [](int sig, siginfo_t *info, void *p) {
engine()._backend->signal_received(sig, info, p);
};
// ...
}
```
```
extern __thread reactor* local_engine;
extern __thread size_t task_quota;
inline reactor& engine() {
return *local_engine;
}
```
The low-level signal handler above assumes `local_engine._backend`
is not null which stays true only for threads from the S*'s world.
Unfortunately, as we don't block the `SIGHUP` for alien threads,
kernel is perfectly authorized to pick up one them to run the handler
leading to weirdly-looking segfaults like this one:
```
INFO 2021-04-23 07:06:57,807 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:58,753 [shard 0] ms - [osd.1(client) v2:172.21.15.100:6802/30478@51064 >> mgr.4105 v2:172.21.15.109:6800/29891] --> #7 === pg_stats(0 pgs seq
55834574872 v 0) v2 (87)
...
INFO 2021-04-23 07:06:58,813 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:59,753 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO 2021-04-23 07:06:59,753 [shard 0] osd - asok response length: 2947
INFO 2021-04-23 07:06:59,817 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:59,865 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO 2021-04-23 07:06:59,866 [shard 0] osd - asok response length: 2947
DEBUG 2021-04-23 07:07:00,020 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO 2021-04-23 07:07:00,020 [shard 0] osd - asok response length: 2947
INFO 2021-04-23 07:07:00,820 [shard 0] bluestore - stat
...
Backtrace:
0# 0x00005600CD0D6AAF in ceph-osd
1# FatalSignal::signaled(int) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F5877C7EB20 in /lib64/libpthread.so.0
4# 0x00005600CD830B81 in ceph-osd
5# 0x00007F5877C7EB20 in /lib64/libpthread.so.0
6# pthread_cond_timedwait in /lib64/libpthread.so.0
7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
8# 0x00007F5877999BA3 in /lib64/libstdc++.so.6
9# 0x00007F5877C7414A in /lib64/libpthread.so.0
10# clone in /lib64/libc.so.6
daemon-helper: command crashed with signal 11
```
Ultimately, it turned out the thread came out from a syscall (`futex`)
and started crunching the `SIGHUP` handler's code in which a nullptr
dereference happened.
This patch blocks `SIGHUP` for all threads spawned by `AlienStore`.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
if (!cpus.empty()) {
pin(cpus);
}
+ block_sighup();
(void) pthread_setname_np(pthread_self(), "alien-store-tp");
loop(queue_max_wait, i);
});
ceph_assert(r == 0);
}
+void ThreadPool::block_sighup()
+{
+ sigset_t sigs;
+ sigemptyset(&sigs);
+ // alien threads must ignore the SIGHUP. It's necessary as in
+ // `crimson/osd/main.cc` we set a handler using the Seastar's
+ // signal handling infrastrucute which assumes the `_backend`
+ // of `seastar::engine()` is not null. Grep `reactor.cc` for
+ // `sigaction` or just visit `reactor::signals::handle_signal()`.
+ sigaddset(&sigs, SIGHUP);
+ pthread_sigmask(SIG_BLOCK, &sigs, nullptr);
+}
+
void ThreadPool::loop(std::chrono::milliseconds queue_max_wait, size_t shard)
{
auto& pending = pending_queues[shard];
return stopping.load(std::memory_order_relaxed);
}
static void pin(const std::vector<uint64_t>& cpus);
+ static void block_sighup();
seastar::semaphore& local_free_slots() {
return submit_queue.local().free_slots;
}
ceph_abort_msg(fmt::format("pidfile_write failed with {} {}",
ret, cpp_strerror(-ret)));
}
- // just ignore SIGHUP, we don't reread settings
+ // just ignore SIGHUP, we don't reread settings. keep in mind signals
+ // handled by S* must be blocked for alien threads (see AlienStore).
seastar::engine().handle_signal(SIGHUP, [] {});
const int whoami = std::stoi(local_conf()->name.get_id());
const auto nonce = get_nonce();