"ceph-disk trigger" invocation is currently performed in a mutually
exclusive fashion, with each call first taking an flock on the path
/var/lock/ceph-disk. On systems with a lot of osds, this leads to a
large amount of lock contention during boot-up, and can cause some
service instances to trip the 120 second timeout.
Take an flock on a device specific path instead of /var/lock/ceph-disk,
so that concurrent "ceph-disk trigger" invocations are permitted for
independent osds. This greatly reduces lock contention and consequently
the chance of service timeout. Per-device concurrency restrictions
required for http://tracker.ceph.com/issues/13160 are maintained.
Fixes: http://tracker.ceph.com/issues/18049
Signed-off-by: David Disseldorp <ddiss@suse.de>
[Service]
Type=oneshot
KillMode=none
-ExecStart=/bin/sh -c 'timeout 120 flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
+ExecStart=/bin/sh -c 'timeout 120 flock /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
TimeoutSec=0