git.apps.os.sepia.ceph.com Git

author	Ilya Dryomov <idryomov@gmail.com>
	Mon, 7 Oct 2019 13:32:39 +0000 (15:32 +0200)
committer	Ilya Dryomov <idryomov@gmail.com>
	Fri, 1 Nov 2019 16:26:37 +0000 (17:26 +0100)
commit	342447e90e32fbe46ce738f55d39ad3f70f5dff5
tree	254c5fca4d8260dc52796a2a6c9c8136f6889274
parent	f6cc5cf641e93e17b60ef8fc47eec5aec0d85fb8

krbd: retry on transient errors from udev_enumerate_scan_devices()

udev_enumerate_scan_devices() doesn't handle disappearing devices well.
If called while some devices are being removed, it sometimes propagates
ENOENT and ENODEV errors encountered while operating on directory entries in
/sys that no longer exist.  Some of these errors are suppressed, but
this isn't reliable and varies across versions.  In particular, systemd
239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't
suppress ENODEV from sd_device_get_devnum().  In systemd 243 the call
to sd_device_get_devnum() has been moved, but it still leaks ENOENT
from sd_device_get_is_initialized() (both refer to the body of the
FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).
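To make the race concrete, the sketch below simulates it with std::filesystem rather than with sysfs or systemd's sd-device code. It snapshots a directory listing first and only then inspects each entry, the same two-phase shape as a readdir()-driven scan, and deletes one entry in between to mimic a device disappearing mid-scan. The helper scan_dir_snapshot and its remove_mid_scan parameter are hypothetical and exist only for illustration; this is not how systemd implements enumeration.

```cpp
#include <cassert>
#include <cerrno>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: list the directory first, then inspect each entry.
// If remove_mid_scan is set, that path is deleted between the two phases to
// simulate a concurrent device removal. Returns 0 on success, or the negative
// errno of the first entry that could not be inspected.
int scan_dir_snapshot(const fs::path& dir, std::vector<fs::path>& out,
                      const fs::path* remove_mid_scan = nullptr) {
  std::vector<fs::path> entries;
  for (const auto& e : fs::directory_iterator(dir)) {
    entries.push_back(e.path());
  }

  if (remove_mid_scan != nullptr) {
    fs::remove(*remove_mid_scan);  // the entry vanishes after it was listed
  }

  for (const auto& p : entries) {
    std::error_code ec;
    fs::file_size(p, ec);  // stat() on a removed entry fails with ENOENT
    if (ec) {
      return -ec.value();
    }
    out.push_back(p);
  }
  return 0;
}
```

Without a concurrent removal the scan succeeds; with one, the listed-but-gone entry surfaces as -ENOENT, which is the kind of error the enumerator can propagate to its caller.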

Assume that all ENOENT and ENODEV errors are transient and retry the
call to udev_enumerate_scan_devices().  Don't limit the number of
retries, but log each one.
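A minimal sketch of the retry policy described above, assuming the scan call follows libudev's convention of returning 0 on success or a negative errno value on failure. The wrapper name scan_with_retry and its retries_out counter are hypothetical, added only for illustration; this is not the code from the commit.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <functional>
#include <iostream>
#include <sstream>

// Hypothetical wrapper: `scan` stands in for a call such as
// udev_enumerate_scan_devices(enm). ENOENT and ENODEV are treated as
// transient and retried without a cap; every retry is logged. Any other
// result (success or a non-transient error) is returned to the caller.
int scan_with_retry(const std::function<int()>& scan, std::ostream& log,
                    int* retries_out = nullptr) {
  int retries = 0;
  for (;;) {
    int r = scan();
    if (r == -ENOENT || r == -ENODEV) {
      ++retries;
      log << "udev_enumerate_scan_devices failed: " << std::strerror(-r)
          << ", retrying" << std::endl;
      continue;
    }
    if (retries_out != nullptr) {
      *retries_out = retries;  // number of transient failures seen
    }
    return r;
  }
}
```

With libudev, a caller could pass `[enm] { return udev_enumerate_scan_devices(enm); }` as `scan`. Leaving the loop unbounded assumes that device removals eventually settle; a retry cap would instead trade the risk of spinning for the risk of reporting a spurious failure.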

Fixes: https://tracker.ceph.com/issues/41036
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e5921ef4a89f497a0bff6510fce0bb5c242d6172)

Conflicts:
src/krbd.cc [ rbd namespaces not in mimic ]