On machines that host both a MON and OSDs, the OSDs are started shortly
after the MON at boot. The MON needs time to become operational, so the
OSDs fail to start: the short timeout does not give them enough time to
establish communication with the cluster. This is even more likely when
other monitors are down, which is not unusual when servers reboot after a
power failure.

Increasing the timeout from 10 to 30 seconds significantly improves the
chances of a successful OSD start.
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
get_conf osd_weight "" "osd crush initial weight"
defaultweight="$(df -P -k $osd_data/. | tail -1 | awk '{ print sprintf("%.2f",$2/1073741824) }')"
get_conf osd_keyring "$osd_data/keyring" "keyring"
- do_cmd "timeout 10 $BINDIR/ceph -c $conf --name=osd.$id --keyring=$osd_keyring osd crush create-or-move -- $id ${osd_weight:-${defaultweight:-1}} $osd_location"
+ do_cmd "timeout 30 $BINDIR/ceph -c $conf --name=osd.$id --keyring=$osd_keyring osd crush create-or-move -- $id ${osd_weight:-${defaultweight:-1}} $osd_location"
fi
fi