From: Kefu Chai
Date: Wed, 16 Sep 2020 01:28:04 +0000 (+0800)
Subject: blk/kernel: retry forever if bdev_flock_retry is 0
X-Git-Tag: v16.1.0~978^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=743b5bda6559c9be0e64617aa43ef5e06a5a6e60;p=ceph.git

blk/kernel: retry forever if bdev_flock_retry is 0

retry forever if cct->_conf->bdev_flock_retry is 0.

systemd-udevd is most likely the reason why ceph-osd fails to acquire
the flock during "mkfs": systemd-udevd uses libblkid to probe all block
devices whenever a device changes in the system, and when it starts
looking at a device it takes a `LOCK_SH|LOCK_NB` lock, which it releases
as soon as it is done. so normally the lock is only held for a jiffy, see
https://github.com/systemd/systemd/blob/ee0b9e721a368742ac6fa9c3d9a33e45dc3203a2/src/shared/lockfile-util.c#L18

so, we just need to retry a couple of times to acquire the lock.

Fixes: https://tracker.ceph.com/issues/46124
Signed-off-by: Kefu Chai
---

diff --git a/src/blk/kernel/KernelDevice.cc b/src/blk/kernel/KernelDevice.cc
index 7684ad60b5f..58142b3c106 100644
--- a/src/blk/kernel/KernelDevice.cc
+++ b/src/blk/kernel/KernelDevice.cc
@@ -108,7 +108,7 @@ int KernelDevice::_lock()
       dout(1) << __func__ << " flock busy on " << path << dendl;
       if (const uint64_t max_retry =
           cct->_conf.get_val<uint64_t>("bdev_flock_retry");
-          nr_tries++ == max_retry) {
+          max_retry > 0 && nr_tries++ == max_retry) {
         return -EAGAIN;
       }
       double retry_interval =
diff --git a/src/common/options.cc b/src/common/options.cc
index 6e73fb942dd..5ff30939968 100644
--- a/src/common/options.cc
+++ b/src/common/options.cc
@@ -4072,7 +4072,11 @@ std::vector