From 543fe9f1fbb5257693b83b8002f31625cafbce56 Mon Sep 17 00:00:00 2001
From: Kefu Chai
Date: Thu, 7 Mar 2024 19:48:54 +0800
Subject: [PATCH] msg: do not abort when driver->del_event() returns -ENOENT

when shutting down a connection, we call into
`EpollDriver::del_event(..., EVENT_READABLE | EVENT_WRITABLE)`. its
caller, `EventCenter::delete_file_event()`, treats a negative return
value from this function as a sign of a bug and aborts in that case.

but on linux, if a NIC is hot-unplugged, all the socket file
descriptors associated with it are closed, and we get the following
call chain:

  __fput()
    -> eventpoll_release()
      -> eventpoll_release_file()
        -> __ep_remove()

in __ep_remove(), the epitem representing the fd is removed from the
list. so if we then perform the cleanup when shutting down the TCP
connection and try to unregister the fd from the interest list,
-ENOENT is returned.

librbd uses EpollDriver as well, and it sits at the client side. the
machine on which librbd is running could unplug its NIC without
shutting down librbd first. so, if librbd happens to be reading from or
writing to a socket associated with the NIC being unplugged, librbd
could crash due to the `ceph_abort_msg()` call in
`EventCenter::delete_file_event()`. but this is not a fatal error, as
we are unregistering the fd anyway.

in this change, in order to avoid the crash, we no longer consider it
a bug if `driver->del_event()` returns -ENOENT.

Fixes: https://tracker.ceph.com/issues/64788
Co-Authored-by: Zhang Jiao
Signed-off-by: Kefu Chai
---
 src/msg/async/Event.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/msg/async/Event.cc b/src/msg/async/Event.cc
index 4662e42bd144d..926fdcdb1cc3d 100644
--- a/src/msg/async/Event.cc
+++ b/src/msg/async/Event.cc
@@ -16,6 +16,7 @@
 
 #include "include/compat.h"
 #include "common/errno.h"
+#include
 #include "Event.h"
 
 #ifdef HAVE_DPDK
@@ -285,7 +286,10 @@ void EventCenter::delete_file_event(int fd, int mask)
     return ;
 
   int r = driver->del_event(fd, event->mask, mask);
-  if (r < 0) {
+  if (r < 0 && r != -ENOENT) {
+    // if the socket fd is closed by the underlying nic driver, the
+    // corresponding epoll item would be removed from the interest list, that'd
+    // lead to ENOENT when removing the fd from the list.
     // see create_file_event
     ceph_abort_msg("BUG!");
   }
-- 
2.39.5
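
The effect of the patched condition can be illustrated with a small stand-alone sketch. It is not Ceph code: `FakeDriver`, `kernel_closed()` and `is_fatal_del_error()` are hypothetical names introduced here, and the in-memory set only imitates how an epoll interest list loses an entry once the kernel has released the fd. The sketch assumes that deleting an fd that is no longer registered yields -ENOENT, which is the case described in the commit message.

```cpp
#include <cerrno>
#include <set>

// Hypothetical stand-in for an epoll-backed driver. It only tracks which
// fds are currently registered; no real epoll instance is involved.
struct FakeDriver {
  std::set<int> interest;

  int add_event(int fd) {
    interest.insert(fd);
    return 0;
  }

  // Imitates removing an fd from the interest list: returns 0 on success
  // and -ENOENT if the fd is not (or no longer) registered.
  int del_event(int fd) {
    return interest.erase(fd) > 0 ? 0 : -ENOENT;
  }

  // Imitates the kernel dropping the entry on its own, e.g. after the
  // socket was closed underneath the application.
  void kernel_closed(int fd) {
    interest.erase(fd);
  }
};

// Mirrors the patched condition: any negative return value is treated as
// fatal except -ENOENT, because the fd is being unregistered anyway.
inline bool is_fatal_del_error(int r) {
  return r < 0 && r != -ENOENT;
}
```

With this sketch, registering fd 5, calling `kernel_closed(5)` and then `del_event(5)` yields -ENOENT, which `is_fatal_del_error()` does not treat as fatal. Any other negative value, such as -EBADF, still is.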