generic/166 takes way too long to run on iscsi disks - over an
*hour* on flash based iscsi targets. In comparison, it takes 18s to
run on a pmem device.
The issue is that it takes 3-4s per file write cycle on slow disks,
and it does a thousand write cycles. The problem is that reflink is
so much faster than the write cycle that it's doing many more
snapshots on slow disks than fast disks, and this slows it down even
more.
e.g. the pmem system that takes 18s to run does just under 1000
snapshots - roughly one per file write. 20 minutes into the iscsi
based test, it's only done ~300 write cycles but almost 10,000
snapshots have been taken. IOWs, we're doing ~30 snapshots per file
write, not ~1.
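As a rough check on those figures, here is a small shell sketch of the snapshots-per-write ratio. The inputs are the approximate counts quoted above, rounded to 1000 and 10,000, so the results are only indicative:

```shell
#!/bin/bash
# Approximate snapshots taken per file write cycle, using the rounded
# figures quoted above. Bash integer division truncates.
pmem_snaps=1000     # "just under 1000" snapshots on pmem
pmem_writes=1000    # a thousand write cycles per run
iscsi_snaps=10000   # "almost 10,000" snapshots after 20 minutes on iscsi
iscsi_writes=300    # "~300" write cycles completed by then

pmem_ratio=$((pmem_snaps / pmem_writes))
iscsi_ratio=$((iscsi_snaps / iscsi_writes))

echo "pmem:  ~$pmem_ratio snapshot(s) per write"
echo "iscsi: ~$iscsi_ratio snapshots per write"
```

The iscsi ratio works out to ~33, consistent with the "~30 snapshots per file write" estimate.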
Fix this by rate limiting snapshots to at most 1 per whole file
write. This reduces the number of snapshots taken on fast devices by
~50% (runtime on the pmem device went from 18s -> 8s), but on slow
devices it reduces the count to 1000 and the runtime from 3671s to
just 311s.
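The rate limiting is a flag file handshake: the writer touches $do_snapshot after each write, and the snapshotter only copies when that flag exists, removing it afterwards. Below is a minimal standalone sketch of the pattern. It is not the test itself. It assumes plain `cp` in place of the xfstests `_cp_reflink` helper, a `mktemp -d` directory in place of $SCRATCH_MNT and /tmp, and a small, hypothetical loop count:

```shell
#!/bin/bash
# Standalone sketch of a flag-file handshake that limits a background
# "snapshotter" to at most one copy per write cycle.
# ASSUMPTIONS: cp stands in for the xfstests _cp_reflink helper, a mktemp
# directory replaces $SCRATCH_MNT and /tmp, and nr_loops is kept small.
set -u

workdir=$(mktemp -d)
finished_file=$workdir/finished
do_snapshot=$workdir/snapshot
nr_loops=5

echo "initial" > "$workdir/file1"

snappy() {
	local n=0
	while [ ! -e "$finished_file" ]; do
		# No write has completed since the last snapshot: poll again.
		if [ ! -e "$do_snapshot" ]; then
			sleep 0.01
			continue
		fi
		cp "$workdir/file1" "$workdir/snap_$n" || break
		n=$((n + 1))
		# Consume the request so the next copy waits for the next write.
		rm -f "$do_snapshot"
	done
}

snappy &

for ((i = nr_loops; i > 0; i--)); do
	echo "write $i" >> "$workdir/file1"
	# Ask for (at most) one snapshot of this write cycle.
	touch "$do_snapshot"
	sleep 0.05   # stand-in for a slow write cycle
done
touch "$finished_file"
wait

# Arithmetic expansion strips any padding wc adds.
nsnaps=$(( $(find "$workdir" -name 'snap_*' | wc -l) ))
echo "writes=$nr_loops snapshots=$nsnaps"
rm -rf "$workdir"
```

Each request is consumed by at most one copy, so the snapshot count cannot exceed the number of writes, no matter how much faster the copy is than the write. A request that arrives while a copy is in flight can be dropped, which only lowers the count.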
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eryu Guan <eguan@redhat.com>
 testdir=$SCRATCH_MNT/test-$seq
 finished_file=/tmp/finished
+do_snapshot=/tmp/snapshot
 rm -rf $finished_file
 mkdir $testdir
 _scratch_cycle_mount
 # Snapshot creator...
+#
+# We rate limit the snapshot creator to one snapshot per full file write. This
+# limits the runtime on slow devices, whilst not substantially reducing the
+# number of snapshots taken on fast devices.
 snappy() {
 	n=0
 	while [ ! -e $finished_file ]; do
+		if [ ! -e $do_snapshot ]; then
+			sleep 0.01
+			continue
+		fi
 		out="$(_cp_reflink $testdir/file1 $testdir/snap_$n 2>&1)"
 		res=$?
 		echo "$out" | grep -q "No space left" && break
 		test -n "$out" && echo "$out"
 		test $res -ne 0 && break
 		n=$((n + 1))
+		rm -f $do_snapshot
 	done
 }
 snappy &
 seq $nr_loops -1 0 | while read i; do
 	_pwrite_byte 0x63 $((i * blksz)) $blksz -d $testdir/file1 >> $seqres.full
+	touch $do_snapshot
 done
 touch $finished_file
 wait