If the monitor is not currently available, the CRUSH update would block
forever, preventing the OSD and (potentially) the rest of the system
from starting up. Instead, make it time out after 10 seconds and then
abort startup. Also drop the trailing "|| :" so that a failed CRUSH
update is no longer ignored: the OSD does not start if the update fails
for any reason, not just a timeout.
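
A minimal sketch of the timeout-then-abort behaviour described above,
assuming GNU coreutils timeout(1), which exits with status 124 when the
limit expires. The run_with_timeout helper is hypothetical and is not
part of the init script; it only illustrates how a timeout can be told
apart from any other failure while both remain fatal to the caller.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report
# why it failed. Assumes GNU coreutils timeout(1) is on PATH.
run_with_timeout() {
    limit="$1"
    shift
    rc=0
    # Capture the exit status without tripping "set -e" in the caller.
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout uses 124 to signal that the limit expired.
        echo "timed out after ${limit}s: $*" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "failed with status $rc: $*" >&2
    fi
    # Propagate the status so the caller can abort on any failure.
    return "$rc"
}
```

A caller would then write something like
"run_with_timeout 10 some_command || exit 1", so that any non-zero
status, whether from a timeout or not, stops startup.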
Works-around: #5612
Signed-off-by: Sage Weil <sage@inktank.com>
get_conf osd_weight "" "osd crush initial weight"
defaultweight="$(do_cmd "df $osd_data/. | tail -1 | awk '{ d= \$2/1073741824 ; r = sprintf(\"%.2f\", d); print r }'")"
get_conf osd_keyring "$osd_data/keyring" "keyring"
- do_cmd "$BINDIR/ceph \
+ do_cmd "timeout 10 $BINDIR/ceph \
--name=osd.$id \
--keyring=$osd_keyring \
osd crush create-or-move \
${osd_weight:-${defaultweight:-1}} \
root=default \
host=$host \
- $osd_location \
- || :"
+ $osd_location"
fi
fi