generic/778: fix background loop control with sentinel files

This test fails on my slowish QA VM with 32k-fsblock xfs:

--- /run/fstests/bin/tests/generic/778.out      2025-10-20 10:03:43.432910446 -0700
+++ /var/tmp/fstests/generic/778.out.bad        2025-11-04 12:01:31.137813652 -0800
@@ -1,2 +1,137 @@
  QA output created by 778
-Silence is golden
+umount: /opt: target is busy.
+mount: /opt: /dev/sda4 already mounted on /opt.
+       dmesg(1) may have more information after failed mount system call.
+cycle mount failed
+(see /var/tmp/fstests/generic/778.full for details)

Injecting a 'ps auxfww' into the _scratch_cycle_mount helper reveals
that this process is still sitting on /opt:

root     1804418  9.0  0.8 144960 134368 pts/0   Dl+  12:01   0:00 /run/fstests/xfsprogs/io/xfs_io -i -c open -fsd /opt/testfile -c pwrite -S 0x61 -DA -V1 -b 134217728 134217728 134217728

Yes, that's the xfs_io process started by atomic_write_loop.
Inexplicably, the awloop killing code terminates the subshell running
the for loop in atomic_write_loop but only waits for the subshell itself
to exit.  It doesn't wait for any of that subshell's children, and
that's why the unmount fails.

A bare "wait" (without the $awloop_pid parameter) also doesn't wait for
the xfs_io because the parent shell sees the subshell exit and treats
that as job completion.  We can't use killall here because the system
could be running check-parallel, nor can we use pkill here because the
pid namespace containment code was removed.

The simplest stupid answer is to use sentinel files to control the loop.
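
For illustration only, here is a minimal sketch of the sentinel-file idea; it
is not the code from the patch.  The names (workdir, run_file, done_file,
worker_loop) are made up for this example, and "sleep" stands in for the
xfs_io invocation.  The loop runs its child in the foreground, so once the
"done" sentinel exists no child can still be holding the mount.

```shell
# Hypothetical sketch; all names here are illustrative, not from fstests.
workdir=$(mktemp -d)
run_file="$workdir/awloop.run"
done_file="$workdir/awloop.done"

worker_loop() {
	# Keep going only while the "run" sentinel exists.
	while [ -e "$run_file" ]; do
		# Stand-in for the real xfs_io call.  It runs in the foreground,
		# so it has exited before the sentinel is checked again.
		sleep 0.1
	done
	# Signal that the loop is over and no child is running any more.
	touch "$done_file"
}

touch "$run_file"
worker_loop &

sleep 0.3		# let the loop run for a little while
rm -f "$run_file"	# ask the loop to stop after its current child

# Wait for the loop to acknowledge instead of killing the subshell.
while [ ! -e "$done_file" ]; do
	sleep 0.1
done
wait
loop_stopped=yes
rm -rf "$workdir"
```

Because the parent never kills the subshell, it cannot orphan a child that is
still writing to the filesystem.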

Cc: fstests@vger.kernel.org # v2025.10.20
Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zorro Lang <zlang@kernel.org>

generic/778: fix severe performance problems

This test takes 4800s to run, which is horrible.  AFAICT it starts out
by timing how much can be written atomically to a new file in 0.2
seconds, then scales up the file size by 3x.  On not very fast storage,
this can result in file_size being set to ~250MB on a 4k fsblock
filesystem.  That's about 64,000 blocks.

The next thing this test does is try to create a file of that size
(250MB) of alternating written and unwritten blocks.  For some reason,
it sets up this file by invoking xfs_io 64,000 times to write small
amounts of data, which takes 3+ minutes on the author's system because
exec overhead is pretty high when you do that.

As a result, one loop through the test takes almost 4 minutes.  The test
loops 20 times, so it runs for 80 minutes(!!) which is a really long
time.
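
The figures above can be checked with a little shell arithmetic.  This is only
a sanity check of the numbers quoted in this message (250MB file, 4k fsblock,
roughly 4 minutes per loop, 20 loops); the variable names are made up here.

```shell
# Sanity check of the numbers quoted above; variable names are illustrative.
file_size=$((250 * 1024 * 1024))	# ~250MB file
fsblock=4096				# 4k fsblock
nr_blocks=$((file_size / fsblock))

loop_minutes=4				# "almost 4 minutes" per loop
loops=20
total_seconds=$((loop_minutes * loops * 60))

echo "blocks: $nr_blocks"
echo "runtime: ${total_seconds}s"
```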

So the first thing we do is observe that the giant slow loop is being
run as a single thread on an empty filesystem.  Most of the time the
allocator generates a mostly physically contiguous file.  We could
fallocate the whole file instead of fallocating one block every other
time through the loop.  This halves the setup time.

Next, we can also stuff the remaining pwrite commands into a bash array
and only invoke xfs_io once every 128x through the loop.  This amortizes
the xfs_io startup time, which reduces the test loop runtime to about 20
seconds.
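
A rough sketch of the batching idea follows; it is not the code from the
patch.  run_batch is a hypothetical stand-in for xfs_io that only counts how
often it is called, and the block count is arbitrary.  To stay runnable in any
POSIX shell the sketch accumulates the commands in the positional parameters;
a bash array, as described above, works the same way.

```shell
# Hypothetical stand-in for xfs_io: counts invocations and queued commands.
invocations=0
commands_run=0
run_batch() {
	invocations=$((invocations + 1))
	# Every queued command contributes two arguments: -c and its string.
	commands_run=$((commands_run + $# / 2))
}

blksz=4096
nr_blocks=1000		# arbitrary size for the example
batch=128		# flush once every 128 trips through the loop

set --			# start with an empty command list
i=0
while [ "$i" -lt "$nr_blocks" ]; do
	# Queue a write for every other block instead of exec'ing right away.
	if [ $((i % 2)) -eq 0 ]; then
		set -- "$@" -c "pwrite $((i * blksz)) $blksz"
	fi
	i=$((i + 1))
	# One process startup now covers a whole batch of writes.
	if [ $((i % batch)) -eq 0 ] && [ "$#" -gt 0 ]; then
		run_batch "$@"
		set --
	fi
done
# Flush whatever is left over from the final partial batch.
if [ "$#" -gt 0 ]; then
	run_batch "$@"
fi

echo "$commands_run writes in $invocations invocations"
```

With these example numbers, 500 writes are issued through 8 invocations
rather than 500.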

Finally, replace the 20x loop with a _soak_loop_running 5x loop because
5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
SOAK_DURATION to get more intensive testing.  On my system this cuts the
runtime to 75 seconds.

Cc: fstests@vger.kernel.org # v2025.10.20
Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zorro Lang <zlang@kernel.org>

common: leave any breadcrumbs when _link_out_file_named can't find the output file

_link_out_file_named is an obnoxiously complicated helper involving a
perl script embedded inside a bash subshell that does ... a lookup of
some sort involving comparing the comma-separated list in its second
argument against a comma-separated list in a config file that then maps
to an output file suffix. I don't know what it really does. The .cfg
file format is undocumented except for the perl script.

This is really irritating every time I have to touch any of these tests
with flexible golden outputs, and I frequently screw up the mapping.
The helper is not very helpful when you do this, because it doesn't even
try to tell you *which* suffix it found, let alone how it got there.

Fix this up so that the .full file gets some diagnostics, even if the
stdout text is "no qualified output".

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zorro Lang <zlang@kernel.org>

generic/773: fix expected output "QA output created by 1226"

The test generic/773 was apparently submitted as generic/1226, but
when it was renamed to pack the test namespace, the expected test
output wasn't adjusted to reflect the new test name, leading to the
test failing on systems that have devices that support atomic writes.

Fixes: 1499d4ff2365 ("generic: Add atomic write test using fio crc ...")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Zorro Lang <zlang@kernel.org>