From: Blaine Gardner
Date: Mon, 17 Nov 2014 23:17:15 +0000 (-0600)
Subject: Fix bug #10096 (ceph-disk umount race condition)
X-Git-Tag: v0.90~77^2
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=refs%2Fpull%2F2947%2Fhead;p=ceph.git

Fix bug #10096 (ceph-disk umount race condition)

Bug: http://tracker.ceph.com/issues/10096

Brief: Unmounting temporary mount point failed due to file being 'busy'.
Root cause could not be easily determined due to timing variances caused
by debug attempts. Race condition exists.

Solution: Implement a retry with incremental backoff as a viable
workaround. This workaround is okay because
(1) Finding the root cause would take a not insignificant amount of
    time/effort.
(2) The workaround is a more general fix for any process that might
    cause the exhibited behavior.

Signed-off-by: Blaine Gardner
---

diff --git a/src/ceph-disk b/src/ceph-disk
index 012d9e57e0a0..20fd5b3a4f8f 100755
--- a/src/ceph-disk
+++ b/src/ceph-disk
@@ -29,6 +29,7 @@ import stat
 import sys
 import tempfile
 import uuid
+import time
 
 """
 Prepare:
@@ -900,17 +901,25 @@ def unmount(
     """
     Unmount and removes the given mount point.
     """
-    try:
-        LOG.debug('Unmounting %s', path)
-        command_check_call(
-            [
-                '/bin/umount',
-                '--',
-                path,
-                ],
-            )
-    except subprocess.CalledProcessError as e:
-        raise UnmountError(e)
+    retries = 0
+    while True:
+        try:
+            LOG.debug('Unmounting %s', path)
+            command_check_call(
+                [
+                    '/bin/umount',
+                    '--',
+                    path,
+                    ],
+                )
+            break
+        except subprocess.CalledProcessError as e:
+            # on failure, retry 3 times with incremental backoff
+            if retries == 3:
+                raise UnmountError(e)
+            else:
+                time.sleep(0.5 + retries * 1.0)
+                retries += 1
 
     os.rmdir(path)
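
Note: the retry schedule in the hunk above can be checked in isolation.
The sketch below is not part of ceph-disk. The names retry_with_backoff
and make_flaky, the injected sleep parameter, and the /mnt/example path
are hypothetical and exist only for illustration. Assuming the same
constants as the patch, a failing call is retried after 0.5s, 1.5s and
2.5s, and the fourth failure is re-raised.

```python
import subprocess
import time


def retry_with_backoff(action, max_retries=3, base_delay=0.5, step=1.0,
                       sleep=time.sleep):
    """Call action(); on CalledProcessError, wait and try again.

    Hypothetical helper. The defaults mirror the constants in the patch:
    the delay before retry n (counting from 0) is base_delay + n * step.
    """
    retries = 0
    while True:
        try:
            return action()
        except subprocess.CalledProcessError:
            if retries == max_retries:
                raise  # retries exhausted: surface the last failure
            sleep(base_delay + retries * step)
            retries += 1


def make_flaky(failures):
    """Return a stand-in for the umount call that fails `failures` times."""
    state = {'left': failures}

    def action():
        if state['left'] > 0:
            state['left'] -= 1
            # 1 is an arbitrary nonzero exit status; the path is made up.
            raise subprocess.CalledProcessError(
                1, ['/bin/umount', '--', '/mnt/example'])
        return 'unmounted'

    return action


# Record the requested delays instead of actually sleeping.
delays = []
result = retry_with_backoff(make_flaky(2), sleep=delays.append)
print(result, delays)  # prints: unmounted [0.5, 1.5]
```

Passing sleep as a parameter is a choice made for this sketch so the
schedule can be inspected without waiting; the patch itself calls
time.sleep directly and wraps the final failure in UnmountError.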