
author	Erwan Velu <erwan@redhat.com>
	Fri, 31 Mar 2017 12:54:33 +0000 (14:54 +0200)
committer	Nathan Cutler <ncutler@suse.com>
	Tue, 4 Apr 2017 20:56:52 +0000 (22:56 +0200)
commit	a20d2b89ee13e311cf1038c54ecadae79b68abd5
tree	25ae81a9eb2b84479b18af6758cbb10f4ebe91c5
parent	2d5d0aec60ec9689d44a53233268e9b9dd25df95

ceph-disk: Adding retry loop in get_partition_dev()

There are very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.

It appears that waiting a very short time is usually enough for the partition to be populated.

Analysis:
update_partition() is supposed to be enough to avoid any race between the events sent by parted/sgdisk/partprobe and
the actual creation of the /sys/block/<device>/* entries.
On our CI that race occurs pretty often, but it has never been possible to reproduce it locally.

This patch is more of a workaround than a fix for the real problem.
It retries after a very short delay to give the device a chance to appear.
This approach has been successful on the CI.

Note that this patch does not change the timing when the device is created on time; when the bug occurs, it only adds a delay of 1/5th of a second up to 2 seconds.
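
The retry idea can be sketched as follows. This is a minimal illustration, not the code of this patch: the function name wait_for_partition, its parameters and the partition naming rule are hypothetical, and the defaults (10 retries, 0.2 second delay) are inferred from the "Try 1/10" log line and the "1/5th up to 2 seconds" figure in this message.

```python
import os
import time


def wait_for_partition(dev_name, partnum, sys_block="/sys/block",
                       max_retries=10, delay=0.2, sleep=time.sleep):
    """Return the sysfs entry name of a partition, retrying while it is missing.

    All names and defaults here are illustrative, not ceph-disk's API.
    """
    dev_dir = os.path.join(sys_block, dev_name)
    # Assumed naming: "sda2" style, or "nvme0n1p2" style with a "p" separator.
    candidates = {dev_name + str(partnum), dev_name + "p" + str(partnum)}
    for attempt in range(max_retries + 1):
        found = candidates.intersection(os.listdir(dev_dir))
        if found:
            # Fast path: no sleep at all when the entry is already there.
            return sorted(found)[0]
        if attempt < max_retries:
            # Partition not visible yet: wait briefly, then look again.
            sleep(delay)
    raise RuntimeError(
        "partition %d of %s not found in %s after %d retries"
        % (partnum, dev_name, dev_dir, max_retries))
```

With these defaults the function returns immediately when the entry already exists, and waits at most 10 x 0.2 = 2 seconds before giving up. The sleep parameter is only there so the delay can be replaced in tests.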

A typical output from a build running on the CI with this code:
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid

fixes: #19428

Signed-off-by: Erwan Velu <erwan@redhat.com>
(cherry picked from commit 93e7b95ed8b4c78daebf7866bb1f0826d7199075)