krbd: retry on transient errors from udev_enumerate_scan_devices()
udev_enumerate_scan_devices() doesn't handle disappearing devices well.
If called while some devices are being removed, it sometimes propagates
ENOENT and ENODEV errors encountered operating on directory entries in
/sys that no longer exist. Some of these errors are suppressed, but
this isn't reliable and varies across versions. In particular, systemd
239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't
suppress ENODEV from sd_device_get_devnum(). In systemd 243 the call
to sd_device_get_devnum() has been moved, but it still leaks ENOENT
from sd_device_get_is_initialized() (referring to the body of
FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).
Assume that all ENOENT and ENODEV errors are transient and retry the
call to udev_enumerate_scan_devices(). Don't limit the number, but log
each retry.
Fixes: https://tracker.ceph.com/issues/41036
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
e5921ef4a89f497a0bff6510fce0bb5c242d6172)
Conflicts:
src/krbd.cc [ rbd namespaces not in mimic ]