]> git.apps.os.sepia.ceph.com Git - xfsprogs-dev.git/log
xfsprogs-dev.git
2 years agomkfs: check dirent names when reading protofile
Darrick J. Wong [Wed, 1 Mar 2023 16:05:34 +0000 (08:05 -0800)]
mkfs: check dirent names when reading protofile

The protofile parser in mkfs does not check directory entry names when
populating the filesystem.  The libxfs directory code doesn't check them
either, since they depend on the Linux VFS to sanitize incoming names.
If someone puts a slash in the first (name) column in the protofile,
this results in a successful format and xfs_repair -n immediately
complains.

Screen the names that are being read from the protofile.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoRemove several implicit function declarations origin/for-next_2023-03-01 origin/for-next_2023-03-14
Arjun Shankar [Wed, 8 Feb 2023 14:34:16 +0000 (15:34 +0100)]
Remove several implicit function declarations

During configure, several ioctl checks omit the corresponding include
and a pwritev2 check uses the wrong feature test macro.
This commit fixes the same.

Signed-off-by: Arjun Shankar <arjun@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_db: make flist_find_ftyp() to check for field existance on disk
Andrey Albershteyn [Wed, 8 Feb 2023 11:02:22 +0000 (12:02 +0100)]
xfs_db: make flist_find_ftyp() to check for field existance on disk

flist_find_ftyp() searches for the field of the requested type. The
first found field/path is returned. However, this doesn't work when
there are multiple fields of the same type. For example, attr3 type
have a few CRC fields. Leaf block (xfs_attr_leaf_hdr ->
xfs_da3_blkinfo) and remote value block (xfs_attr3_rmt_hdr) both
have CRC but goes under attr3 type. This causes 'crc' command to be
unable to find CRC field when we are at remote attribute block as it
tries to use leaf block CRC path:

$ dd if=/dev/zero bs=4k count=10 | tr '\000' '1' > test.img
$ touch test.file
$ setfattr -n user.bigattr -v "$(cat test.img)" test.file

$ # CRC of the leaf block
$ xfs_db -r -x /dev/sda5 -c 'inode 132' -c 'ablock 0' -c 'crc'
Verifying CRC:
hdr.info.crc = 0x102b5cbf (correct)

$ # CRC of the remote value block
$ xfs_db -r -x /dev/sda5 -c 'inode 132' -c 'ablock 1' -c 'crc'
field info not found
parsing error

Solve this by making flist_find_ftyp() to also check that field in
question have non-zero count (exist at the current block).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_io: fix bmap command not detecting realtime files with xattrs
Darrick J. Wong [Thu, 16 Feb 2023 21:53:10 +0000 (13:53 -0800)]
xfs_io: fix bmap command not detecting realtime files with xattrs

Fix the bmap command so that it will detect a realtime file if any of
the other file flags (e.g. xattrs) are set.  Observed via xfs/556.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_io: set fs_path when opening files on foreign filesystems
Darrick J. Wong [Thu, 16 Feb 2023 21:53:04 +0000 (13:53 -0800)]
xfs_io: set fs_path when opening files on foreign filesystems

Ted noticed that the following command:

$ xfs_io -c 'fsmap -d 0 0' /mnt
xfs_io: xfsctl(XFS_IOC_GETFSMAP) iflags=0x0 ["/mnt"]: Invalid argument

doesn't work on an ext4 filesystem.  The above command is supposed to
issue a GETFSMAP query against the "data" device.  Although the manpage
doesn't claim support for ext4, it turns out that this you get this
trace data:

          xfs_io-4144  [002]   210.965642: ext4_getfsmap_low_key: dev
7:0 keydev 163:2567 block 0 len 0 owner 0 flags 0x0
          xfs_io-4144  [002]   210.965645: ext4_getfsmap_high_key: dev
7:0 keydev 32:5277:0 block 0 len 0 owner -1 flags 0xffffffff

Notice the random garbage in the keydev field -- this happens because
openfile (in xfs_io) doesn't initialize *fs_path if the caller doesn't
supply a geometry structure or the opened file isn't on an XFS
filesystem.  IOWs, we feed random heap garbage to the kernel, and the
kernel rejects the call unnecessarily.

Fix this to set the fspath information even for foreign filesystems.

Reported-by: tytso@mit.edu
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_scrub: fix broken realtime free blocks unit conversions
Darrick J. Wong [Thu, 16 Feb 2023 21:52:58 +0000 (13:52 -0800)]
xfs_scrub: fix broken realtime free blocks unit conversions

r_blocks is in units of fs blocks, but freertx is in units of realtime
extents.  Add the missing conversion factor so we don't end up with
bogus things like this:

Pretend that sda and sdb are both 100T volumes.

# mkfs.xfs -f /dev/sda -b -r rtdev=/dev/sdb,extsize=2m
# mount /dev/sda /mnt -o rtdev=/dev/sdb
# xfs_scrub -dTvn /mnt
<snip>
Phase 7: Check summary counters.
3.5TiB data used;  99.8TiB realtime data used;  55 inodes used.
2.0GiB data found; 50.0MiB realtime data found; 55 inodes found.
55 inodes counted; 0 inodes checked.

We just created the filesystem, the realtime volume should be empty.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_spaceman: fix broken -g behavior in freesp command
Darrick J. Wong [Thu, 16 Feb 2023 21:52:53 +0000 (13:52 -0800)]
xfs_spaceman: fix broken -g behavior in freesp command

Don't zero out the histogram bucket count when turning on group summary
mode -- this will screw up the data structures and it's pointless.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_admin: get/set label of mounted filesystem origin/for-next_2023-02-14 origin/for-next_2023-02-16 origin/for-next_2023-02-17
Catherine Hoang [Thu, 26 Jan 2023 00:33:11 +0000 (16:33 -0800)]
xfs_admin: get/set label of mounted filesystem

Adapt this tool to call xfs_io to get/set the label of a mounted filesystem.

Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_admin: correctly parse IO_OPTS parameters
Catherine Hoang [Thu, 26 Jan 2023 00:33:10 +0000 (16:33 -0800)]
xfs_admin: correctly parse IO_OPTS parameters

Change exec to eval so that the IO_OPTS parameters are parsed correctly
when the parameters contain quotations.

Fixes: e7cd89b2da72 ("xfs_admin: get UUID of mounted filesystem")
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoprogs: just use libtoolize origin/for-next_2023-01-31
Dave Chinner [Thu, 19 Jan 2023 23:39:06 +0000 (10:39 +1100)]
progs: just use libtoolize

We no longer support xfsprogs on random platforms other than Linux,
so drop the complexity in detecting the libtoolize binary on MacOS
from the main makefile.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoprogs: autoconf fails during debian package builds
Dave Chinner [Thu, 19 Jan 2023 23:39:05 +0000 (10:39 +1100)]
progs: autoconf fails during debian package builds

For some reason, a current debian testing build system will fail to
build debian packages because the build environment is not correctly
detecting that libtoolize needs the "-i" parameter to copy in the
files needed by autoconf.

My build scripts run "make -j 16 realclean; make -j 16 deb", and the
second step is failing immediately with:

libtoolize -c `libtoolize -n -i >/dev/null 2>/dev/null && echo -i` -f
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, '.'.
libtoolize: copying file './ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
libtoolize: Consider adding '-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
cp include/install-sh .
aclocal -I m4
autoconf
./configure $LOCAL_CONFIGURE_OPTIONS
configure: error: cannot find required auxiliary files: config.guess config.sub
make: *** [Makefile:131: include/builddefs] Error 1

If I run 'make realclean; make deb' from the command line, the
package build runs to completion.  I have not been able to work out
why the initial build fails, but then succeeds after a 'make
realclean' has been run, and I don't feel like spending hours
running down this rabbit hole.

This conditional "-i" flag detection was added back in *2009* when
default libtoolize behaviour was changed to not copy the config
files into the build area, and the "-i" flag was added to provide
that behaviour. It is detecting that the "-i" flag is needed that is
now failing, but it is most definitely still needed.

Rather than ispending lots of time trying to understand this and
then making the detection more complex, just use the "-i" flag
unconditionally and require any userspace that this now breaks on to
upgrade their 15+ year old version of libtoolize something a little
more modern.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_admin: get UUID of mounted filesystem
Catherine Hoang [Thu, 5 Jan 2023 00:36:13 +0000 (16:36 -0800)]
xfs_admin: get UUID of mounted filesystem

Adapt this tool to call xfs_io to retrieve the UUID of a mounted filesystem.
This is a precursor to enabling xfs_admin to set the UUID of a mounted
filesystem.

Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_io: add fsuuid command
Catherine Hoang [Thu, 5 Jan 2023 00:36:12 +0000 (16:36 -0800)]
xfs_io: add fsuuid command

Add support for the fsuuid command to retrieve the UUID of a mounted
filesystem.

Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfsprogs: Release v6.1.1 v6.1.1
Carlos Maiolino [Fri, 13 Jan 2023 18:06:37 +0000 (19:06 +0100)]
xfsprogs: Release v6.1.1

Update all the necessary files for a 6.1.1 release.

Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoAdd pkg version to debian changelog
Carlos Maiolino [Fri, 13 Jan 2023 17:55:32 +0000 (18:55 +0100)]
Add pkg version to debian changelog

Previous 2 release versions didn't include the pkg version
on debian/changelog.

Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfsprogs: scrub: fix warnings/errors due to missing include
Holger Hoffstätte [Fri, 6 Jan 2023 09:36:39 +0000 (10:36 +0100)]
xfsprogs: scrub: fix warnings/errors due to missing include

Gentoo is currently trying to rebuild the world with clang-16, uncovering exciting
new errors in many packages since several warnings have been turned into errors,
among them missing prototypes, as documented at:
https://discourse.llvm.org/t/clang-16-notice-of-potentially-breaking-changes/65562

xfsprogs came up, with details at https://bugs.gentoo.org/875050.

The problem was easy to find: a missing include for the u_init/u_cleanup
prototypes. The error:

Building scrub
     [CC]     unicrash.o
unicrash.c:746:2: error: call to undeclared function 'u_init'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
         u_init(&uerr);
         ^
unicrash.c:746:2: note: did you mean 'u_digit'?
/usr/include/unicode/uchar.h:4073:1: note: 'u_digit' declared here
u_digit(UChar32 ch, int8_t radix);
^
unicrash.c:754:2: error: call to undeclared function 'u_cleanup'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
         u_cleanup();
         ^
2 errors generated.

The complaint is valid and the fix is easy enough: just add the missing include.

Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfsprogs: Release v6.1.0 v6.1.0
Carlos Maiolino [Fri, 23 Dec 2022 09:58:01 +0000 (10:58 +0100)]
xfsprogs: Release v6.1.0

Update all the necessary files for a 6.1.0 release.

Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_db: fix dir3 block magic check origin/for-next_2022-12-29 origin/for-next_2022-12-30
Darrick J. Wong [Wed, 21 Dec 2022 00:53:34 +0000 (16:53 -0800)]
xfs_db: fix dir3 block magic check

Fix this broken check, which (amazingly) went unnoticed until I cranked
up the warning level /and/ built the system for s390x.

Fixes: e96864ff4d4 ("xfs_db: enable blockget for v5 filesystems")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agofsck.xfs: mount/umount xfs fs to replay log before running xfs_repair origin/for-next_2022-12-17 origin/for-next_2022-12-20
Srikanth C S [Tue, 13 Dec 2022 17:15:43 +0000 (22:45 +0530)]
fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair

After a recent data center crash, we had to recover root filesystems
on several thousands of VMs via a boot time fsck. Since these
machines are remotely manageable, support can inject the kernel
command line with 'fsck.mode=force fsck.repair=yes' to kick off
xfs_repair if the machine won't come up or if they suspect there
might be deeper issues with latent errors in the fs metadata, which
is what they did to try to get everyone running ASAP while
anticipating any future problems. But, fsck.xfs does not address the
journal replay in case of a crash.

fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
possible that when the machine crashes, the fs is in inconsistent
state with the journal log not yet replayed. This can drop the machine
into the rescue shell because xfs_fsck.sh does not know how to clean the
log. Since the administrator told us to force repairs, address the
deficiency by cleaning the log and rerunning xfs_repair.

Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
Replay the logs only if fsck.mode=force and fsck.repair=yes. For
other option -fa and -f drop to the rescue shell if repair detects
any corruptions.

Signed-off-by: Srikanth C S <srikanth.c.s@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_db: create separate struct and field definitions for finobts
Darrick J. Wong [Tue, 13 Dec 2022 19:39:48 +0000 (11:39 -0800)]
xfs_db: create separate struct and field definitions for finobts

Create separate field_t definitions for the free inode btree because db
needs to know that the interior block pointers point to finobt blocks,
not inobt blocks.  This is critical now because the buffer ops contain
magic numbers, the ->verify_struct routines use the magics listed in the
buffer ops, and the xfs_db iocursor calls the verifier functions.

Without this patch, xfs_db emits bizarre output like this:

# xfs_db -x /dev/sde -c 'agi 1' -c 'addr free_root' -c 'addr ptrs[1]' -c print 2>&1 | head
Metadata corruption detected at 0x55dda21258b0, xfs_inobt block 0x275c20/0x1000
magic = 0x46494233

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_io: don't display stripe alignment flags for realtime files
Darrick J. Wong [Tue, 13 Dec 2022 19:39:43 +0000 (11:39 -0800)]
xfs_io: don't display stripe alignment flags for realtime files

The stripe unit/width optimizations only occur on the data device, which
means that it makes no sense to report non-stripe-aligned realtime
extents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_repair: Fix rmaps_verify_btree() error path origin/for-next_2022-12-13
Carlos Maiolino [Thu, 1 Dec 2022 09:34:08 +0000 (10:34 +0100)]
xfs_repair: Fix rmaps_verify_btree() error path

Add proper exit error paths to avoid checking all pointers at the current path

Fixes-coverity-id: 1512654

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_repair: Fix check_refcount() error path
Carlos Maiolino [Thu, 1 Dec 2022 09:34:07 +0000 (10:34 +0100)]
xfs_repair: Fix check_refcount() error path

Add proper exit error paths to avoid checking all pointers at the current path

Fixes-coverity-id: 1512651

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agomkfs.xfs: add mkfs config file for the 6.1 LTS kernel
Darrick J. Wong [Wed, 23 Nov 2022 17:09:44 +0000 (09:09 -0800)]
mkfs.xfs: add mkfs config file for the 6.1 LTS kernel

Add a new mkfs config file to reflect the default featureset for the 6.1
LTS release.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_{db,repair}: fix XFS_REFC_COW_START usage
Darrick J. Wong [Wed, 23 Nov 2022 17:09:39 +0000 (09:09 -0800)]
xfs_{db,repair}: fix XFS_REFC_COW_START usage

This is really a bit field stashed in the upper bit of the rc_startblock
field, so change its usage patterns to use masking instead of integer
addition and subtraction.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_repair: retain superblock buffer to avoid write hook deadlock
Darrick J. Wong [Wed, 23 Nov 2022 17:09:33 +0000 (09:09 -0800)]
xfs_repair: retain superblock buffer to avoid write hook deadlock

Every now and then I experience the following deadlock in xfs_repair
when I'm running the offline repair fuzz tests:

#0  futex_wait (private=0, expected=2, futex_word=0x55555566df70) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x55555566df70, private=0) at ./nptl/lowlevellock.c:49
#2  lll_mutex_lock_optimized (mutex=0x55555566df70) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x55555566df70) at ./nptl/pthread_mutex_lock.c:93
#4  cache_shake (cache=cache@entry=0x55555566de60, priority=priority@entry=2, purge=purge@entry=false) at cache.c:231
#5  cache_node_get (cache=cache@entry=0x55555566de60, key=key@entry=0x7fffe55e01b0, nodep=nodep@entry=0x7fffe55e0168) at cache.c:452
#6  __cache_lookup (key=key@entry=0x7fffe55e01b0, flags=0, bpp=bpp@entry=0x7fffe55e0228) at rdwr.c:405
#7  libxfs_getbuf_flags (btp=0x55555566de00, blkno=0, len=<optimized out>, flags=<optimized out>, bpp=0x7fffe55e0228) at rdwr.c:457
#8  libxfs_buf_read_map (btp=0x55555566de00, map=map@entry=0x7fffe55e0280, nmaps=nmaps@entry=1, flags=flags@entry=0, bpp=bpp@entry=0x7fffe55e0278, ops=0x5555556233e0 <xfs_sb_buf_ops>)
    at rdwr.c:704
#9  libxfs_buf_read (ops=<optimized out>, bpp=0x7fffe55e0278, flags=0, numblks=<optimized out>, blkno=0, target=<optimized out>)
    at /storage/home/djwong/cdev/work/xfsprogs/build-x86_64/libxfs/libxfs_io.h:195
#10 libxfs_getsb (mp=mp@entry=0x7fffffffd690) at rdwr.c:162
#11 force_needsrepair (mp=0x7fffffffd690) at xfs_repair.c:924
#12 repair_capture_writeback (bp=<optimized out>) at xfs_repair.c:1000
#13 libxfs_bwrite (bp=0x7fffe011e530) at rdwr.c:869
#14 cache_shake (cache=cache@entry=0x55555566de60, priority=priority@entry=2, purge=purge@entry=false) at cache.c:240
#15 cache_node_get (cache=cache@entry=0x55555566de60, key=key@entry=0x7fffe55e0470, nodep=nodep@entry=0x7fffe55e0428) at cache.c:452
#16 __cache_lookup (key=key@entry=0x7fffe55e0470, flags=1, bpp=bpp@entry=0x7fffe55e0538) at rdwr.c:405
#17 libxfs_getbuf_flags (btp=0x55555566de00, blkno=12736, len=<optimized out>, flags=<optimized out>, bpp=0x7fffe55e0538) at rdwr.c:457
#18 __libxfs_buf_get_map (btp=<optimized out>, map=map@entry=0x7fffe55e05b0, nmaps=<optimized out>, flags=flags@entry=1, bpp=bpp@entry=0x7fffe55e0538) at rdwr.c:501
#19 libxfs_buf_get_map (btp=<optimized out>, map=map@entry=0x7fffe55e05b0, nmaps=<optimized out>, flags=flags@entry=1, bpp=bpp@entry=0x7fffe55e0538) at rdwr.c:525
#20 pf_queue_io (args=args@entry=0x5555556722c0, map=map@entry=0x7fffe55e05b0, nmaps=<optimized out>, flag=flag@entry=11) at prefetch.c:124
#21 pf_read_bmbt_reclist (args=0x5555556722c0, rp=<optimized out>, numrecs=78) at prefetch.c:220
#22 pf_scan_lbtree (dbno=dbno@entry=1211, level=level@entry=1, isadir=isadir@entry=1, args=args@entry=0x5555556722c0, func=0x55555557f240 <pf_scanfunc_bmap>) at prefetch.c:298
#23 pf_read_btinode (isadir=1, dino=<optimized out>, args=0x5555556722c0) at prefetch.c:385
#24 pf_read_inode_dirs (args=args@entry=0x5555556722c0, bp=bp@entry=0x7fffdc023790) at prefetch.c:459
#25 pf_read_inode_dirs (bp=<optimized out>, args=0x5555556722c0) at prefetch.c:411
#26 pf_batch_read (args=args@entry=0x5555556722c0, which=which@entry=PF_PRIMARY, buf=buf@entry=0x7fffd001d000) at prefetch.c:609
#27 pf_io_worker (param=0x5555556722c0) at prefetch.c:673
#28 start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#29 clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

>From this stack trace, we see that xfs_repair's prefetch module is
getting some xfs_buf objects ahead of initiating a read (#19).  The
buffer cache has hit its limit, so it calls cache_shake (#14) to free
some unused xfs_bufs.  The buffer it finds is a dirty buffer, so it
calls libxfs_bwrite to flush it out to disk, which in turn invokes the
buffer write hook that xfs_repair set up in 3b7667cb to mark the ondisk
filesystem's superblock as NEEDSREPAIR until repair actually completes.

Unfortunately, the NEEDSREPAIR handler itself needs to grab the
superblock buffer, so it makes another call into the buffer cache (#9),
which sees that the cache is full and tries to shake it(#4).  Hence we
deadlock on cm_mutex because shaking is not reentrant.

Fix this by retaining a reference to the superblock buffer when possible
so that the writeback hook doesn't have to access the buffer cache to
set NEEDSREPAIR.

Fixes: 3b7667cb ("xfs_repair: set NEEDSREPAIR the first time we write to a filesystem")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_repair: don't crash on unknown inode parents in dry run mode
Darrick J. Wong [Wed, 23 Nov 2022 17:09:28 +0000 (09:09 -0800)]
xfs_repair: don't crash on unknown inode parents in dry run mode

Fuzz testing of directory block headers exposed a debug assertion vector
in xfs_repair.  In normal (aka fixit) mode, if a single-block directory
has a totally trashed block, repair will zap the entire directory.
Phase 4 ignores any dirents pointing to the zapped directory, phase 6
ignores the freed directory, and everything is good.

However, in dry run mode, we don't actually free the inode.  Phase 4
still ignores any dirents pointing to the zapped directory, but phase 6
thinks the inode is still live and tries to walk it.  xfs_repair doesn't
know of any parents for the zapped directory and so trips the assertion.

The assertion is critical for fixit mode because we need all the parent
information to ensure consistency of the directory tree.  In dry run
mode we don't care, because we only have to print inconsistencies and
return 1.  Worse yet, (our) customers file bugs when xfs_repair crashes
during a -n scan, so this will generate support calls.

Make everyone's life easier by downgrading the assertion to a warning if
we're running in dry run mode.

Found by fuzzing bhdr.hdr.bno = zeroes in xfs/471.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_db: fix printing of reverse mapping record blockcounts
Darrick J. Wong [Wed, 23 Nov 2022 17:09:22 +0000 (09:09 -0800)]
xfs_db: fix printing of reverse mapping record blockcounts

FLDT_EXTLEN is the correct type for a 32-bit block count within an AG;
FLDT_REXTLEN is the type for a 21-bit file mapping block count.  This
code should have been using the first type, not the second.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs_db: fix octal conversion logic
Darrick J. Wong [Wed, 23 Nov 2022 17:09:17 +0000 (09:09 -0800)]
xfs_db: fix octal conversion logic

Fix the backwards boolean logic here, which results in weird behavior.

# xfs_db -x -c /dev/sda
xfs_db> print fname
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
xfs_db> write fname "mo\0h5o"
fname = "mo\005o\000\000\000\000\000\000\000\000"
xfs_db> print fname
fname = "mo\005o\000\000\000\000\000\000\000\000"

Notice that we passed in octal-zero, 'h', '5', 'o', but the fs label is
set to octal-5, 'o' because of the incorrect loop logic.  -Wlogical-op
found this one.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agomisc: add missing includes
Darrick J. Wong [Wed, 23 Nov 2022 17:09:11 +0000 (09:09 -0800)]
misc: add missing includes

Add missing #include directives so that the compiler can typecheck
functions against their declarations.  IOWs, -Wmissing-declarations
found some things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agomisc: add static to various sourcefile-local functions
Darrick J. Wong [Wed, 23 Nov 2022 17:09:05 +0000 (09:09 -0800)]
misc: add static to various sourcefile-local functions

These helper functions are not referenced outside the source file
they're defined in.  Mark them static.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agolibxfs: consume the xfs_warn mountpoint argument
Darrick J. Wong [Wed, 23 Nov 2022 17:09:00 +0000 (09:09 -0800)]
libxfs: consume the xfs_warn mountpoint argument

Fix these warnings because xfs_warn doesn't do anything in userspace:

xfs_alloc.c: In function ‘xfs_alloc_get_rec’:
xfs_alloc.c:246:34: warning: unused variable ‘mp’ [-Wunused-variable]
  246 |         struct xfs_mount        *mp = cur->bc_mp;
      |                                  ^~

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: fix sb write verify for lazysbcount origin/for-next_2022-11-30
Long Li [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: fix sb write verify for lazysbcount

Source kernel commit: 7cecd500d90164419add650e26cc1de03a7a66cb

When lazysbcount is enabled, fsstress and loop mount/unmount test report
the following problems:

XFS (loop0): SB summary counter sanity check failed
XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x13b/0x460,
xfs_sb block 0x0
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 28 00 00  XFSB.........(..
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000020: 69 fb 7c cd 5f dc 44 af 85 74 e0 cc d4 e3 34 5a  i.|._.D..t....4Z
00000030: 00 00 00 00 00 20 00 06 00 00 00 00 00 00 00 80  ..... ..........
00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
00000050: 00 00 00 01 00 0a 00 00 00 00 00 04 00 00 00 00  ................
00000060: 00 00 0a 00 b4 b5 02 00 02 00 00 08 00 00 00 00  ................
00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 14 00 00 19  ................
XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply
+0xe1e/0x10e0 (fs/xfs/xfs_buf.c:1580).  Shutting down filesystem.
XFS (loop0): Please unmount the filesystem and rectify the problem(s)
XFS (loop0): log mount/recovery failed: error -117
XFS (loop0): log mount failed

This corruption will shutdown the file system and the file system will
no longer be mountable. The following script can reproduce the problem,
but it may take a long time.

#!/bin/bash

device=/dev/sda
testdir=/mnt/test
round=0

function fail()
{
echo "$*"
exit 1
}

mkdir -p $testdir
while [ $round -lt 10000 ]
do
echo "******* round $round ********"
mkfs.xfs -f $device
mount $device $testdir || fail "mount failed!"
fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null &
sleep 4
killall -w fsstress
umount $testdir
xfs_repair -e $device > /dev/null
if [ $? -eq 2 ];then
echo "ERR CODE 2: Dirty log exception during repair."
exit 1
fi
round=$(($round+1))
done

With lazysbcount is enabled, There is no additional lock protection for
reading m_ifree and m_icount in xfs_log_sb(), if other cpu modifies the
m_ifree, this will make the m_ifree greater than m_icount. For example,
consider the following sequence and ifreedelta is postive:

CPU0                            CPU1
xfs_log_sb                      xfs_trans_unreserve_and_mod_sb
----------                      ------------------------------
percpu_counter_sum(&mp->m_icount)
percpu_counter_add_batch(&mp->m_icount,
idelta, XFS_ICOUNT_BATCH)
percpu_counter_add(&mp->m_ifree, ifreedelta);
percpu_counter_sum(&mp->m_ifree)

After this, incorrect inode count (sb_ifree > sb_icount) will be writen to
the log. In the subsequent writing of sb, incorrect inode count (sb_ifree >
sb_icount) will fail to pass the boundary check in xfs_validate_sb_write()
that cause the file system shutdown.

When lazysbcount is enabled, we don't need to guarantee that Lazy sb
counters are completely correct, but we do need to guarantee that sb_ifree
<= sb_icount. On the other hand, the constraint that m_ifree <= m_icount
must be satisfied any time that there /cannot/ be other threads allocating
or freeing inode chunks. If the constraint is violated under these
circumstances, sb_i{count,free} (the ondisk superblock inode counters)
maybe incorrect and need to be marked sick at unmount, the count will
be rebuilt on the next mount.

Fixes: 8756a5af1819 ("libxfs: add more bounds checking to sb sanity checks")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: rename XFS_REFC_COW_START to _COWFLAG
Darrick J. Wong [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: rename XFS_REFC_COW_START to _COWFLAG

Source kernel commit: 8b972158afcaa66c538c3ee1d394f096fcd238a8

We've been (ab)using XFS_REFC_COW_START as both an integer quantity and
a bit flag, even though it's *only* a bit flag.  Rename the variable to
reflect its nature and update the cast target since we're not supposed
to be comparing it to xfs_agblock_t now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: fix uninitialized list head in struct xfs_refcount_recovery
Darrick J. Wong [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: fix uninitialized list head in struct xfs_refcount_recovery

Source kernel commit: c1ccf967bf962b998f0c096e06a658ece27d10a0

We're supposed to initialize the list head of an object before adding it
to another list.  Fix that, and stop using the kmem_{alloc,free} calls
from the Irix days.

Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: fix agblocks check in the cow leftover recovery function
Darrick J. Wong [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: fix agblocks check in the cow leftover recovery function

Source kernel commit: f1fdc8207840672a46f26414f2c989ec078a153b

As we've seen, refcount records use the upper bit of the rc_startblock
field to ensure that all the refcount records are at the right side of
the refcount btree.  This works because an AG is never allowed to have
more than (1U << 31) blocks in it.  If we ever encounter a filesystem
claiming to have that many blocks, we absolutely do not want reflink
touching it at all.

However, this test at the start of xfs_refcount_recover_cow_leftovers is
slightly incorrect -- it /should/ be checking that agblocks isn't larger
than the XFS_MAX_CRC_AG_BLOCKS constant, and it should check that the
constant is never large enough to conflict with that CoW flag.

Note that the V5 superblock verifier has not historically rejected
filesystems where agblocks >= XFS_MAX_CRC_AG_BLOCKS, which is why this
ended up in the COW recovery routine.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: check record domain when accessing refcount records
Darrick J. Wong [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: check record domain when accessing refcount records

Source kernel commit: f62ac3e0ac33d366fe81e194fee81de9be2cd886

Now that we've separated the startblock and CoW/shared extent domain in
the incore refcount record structure, check the domain whenever we
retrieve a record to ensure that it's still in the domain that we want.
Depending on the circumstances, a change in domain either means we're
done processing or that we've found a corruption and need to fail out.

The refcount check in xchk_xref_is_cow_staging is redundant since
_get_rec has done that for a long time now, so we can get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: remove XFS_FIND_RCEXT_SHARED and _COW
Darrick J. Wong [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: remove XFS_FIND_RCEXT_SHARED and _COW

Source kernel commit: 68d0f389179a52555cfd8fa3254e4adcd7576904

Now that we have an explicit enum for shared and CoW staging extents, we
can get rid of the old FIND_RCEXT flags.  Omit a couple of conversions
that disappear in the next patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: refactor domain and refcount checking
Darrick J. Wong [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: refactor domain and refcount checking

Source kernel commit: f492135df0aa0417337f9b8b1cc6d6a994d61d25

Create a helper function to ensure that CoW staging extent records have
a single refcount and that shared extent records have more than 1
refcount.  We'll put this to more use in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: report refcount domain in tracepoints
Darrick J. Wong [Fri, 18 Nov 2022 11:23:57 +0000 (12:23 +0100)]
xfs: report refcount domain in tracepoints

Source kernel commit: 571423a162cd86acb1b010a01c6203369586daa6

Now that we've broken out the startblock and shared/cow domain in the
incore refcount extent record structure, update the tracepoints to
report the domain.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: track cow/shared record domains explicitly in xfs_refcount_irec
Darrick J. Wong [Fri, 18 Nov 2022 11:23:54 +0000 (12:23 +0100)]
xfs: track cow/shared record domains explicitly in xfs_refcount_irec

Source kernel commit: 9a50ee4f8db6e4dd0d8d757b7adaf0591776860a

Just prior to committing the reflink code into upstream, the xfs
maintainer at the time requested that I find a way to shard the refcount
records into two domains -- one for records tracking shared extents, and
a second for tracking CoW staging extents.  The idea here was to
minimize mount time CoW reclamation by pushing all the CoW records to
the right edge of the keyspace, and it was accomplished by setting the
upper bit in rc_startblock.  We don't allow AGs to have more than 2^31
blocks, so the bit was free.

Unfortunately, this was a very late addition to the codebase, so most of
the refcount record processing code still treats rc_startblock as a u32
and pays no attention to whether or not the upper bit (the cow flag) is
set.  This is a weakness is theoretically exploitable, since we're not
fully validating the incoming metadata records.

Fuzzing demonstrates practical exploits of this weakness.  If the cow
flag of a node block key record is corrupted, a lookup operation can go
to the wrong record block and start returning records from the wrong
cow/shared domain.  This causes the math to go all wrong (since cow
domain is still implicit in the upper bit of rc_startblock) and we can
crash the kernel by tricking xfs into jumping into a nonexistent AG and
tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.

To fix this, start tracking the domain as an explicit part of struct
xfs_refcount_irec, adjust all refcount functions to check the domain
of a returned record, and alter the function definitions to accept them
where necessary.

Found by fuzzing keys[2].cowflag = add in xfs/464.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: move _irec structs to xfs_types.h
Darrick J. Wong [Fri, 18 Nov 2022 10:03:18 +0000 (11:03 +0100)]
xfs: move _irec structs to xfs_types.h

Source kernel commit: 9e7e2436c159490fbbadbc4b5a4ee6bc30dae02e

Structure definitions for incore objects do not belong in the ondisk
format header.  Move them to the incore types header where they belong.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: check deferred refcount op continuation parameters
Darrick J. Wong [Fri, 18 Nov 2022 10:02:45 +0000 (11:02 +0100)]
xfs: check deferred refcount op continuation parameters

Source kernel commit: 8edbe0cf8b4bbe2cf47513998641797b0aca8ee2

If we're in the middle of a deferred refcount operation and decide to
roll the transaction to avoid overflowing the transaction space, we need
to check the new agbno/aglen parameters that we're about to record in
the new intent.  Specifically, we need to check that the new extent is
completely within the filesystem, and that continuation does not put us
into a different AG.

If the keys of a node block are wrong, the lookup to resume an
xfs_refcount_adjust_extents operation can put us into the wrong record
block.  If this happens, we might not find that we run out of aglen at
an exact record boundary, which will cause the loop control to do the
wrong thing.

The previous patch should take care of that problem, but let's add this
extra sanity check to stop corruption problems sooner than later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: create a predicate to verify per-AG extents
Darrick J. Wong [Fri, 18 Nov 2022 10:01:45 +0000 (11:01 +0100)]
xfs: create a predicate to verify per-AG extents

Source kernel commit: b65e08f83b119ae9345ed23d4da357a72b3cb55c

Create a predicate function to verify that a given agbno/blockcount pair
fit entirely within a single allocation group and don't suffer
mathematical overflows.  Refactor the existng open-coded logic; we're
going to add more calls to this function in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: make sure aglen never goes negative in xfs_refcount_adjust_extents
Darrick J. Wong [Fri, 18 Nov 2022 10:00:45 +0000 (11:00 +0100)]
xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents

Source kernel commit: f850995f60e49818093ef5e477cdb0ff2c11a0a4

Prior to calling xfs_refcount_adjust_extents, we trimmed agbno/aglen
such that the end of the range would not be in the middle of a refcount
record.  If this is no longer the case, something is seriously wrong
with the btree.  Bail out with a corruption error.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: refactor all the EFI/EFD log item sizeof logic
Darrick J. Wong [Fri, 18 Nov 2022 09:59:45 +0000 (10:59 +0100)]
xfs: refactor all the EFI/EFD log item sizeof logic

Source kernel commit: 3c5aaaced99912c9fb3352fc5af5b104df67d4aa

Refactor all the open-coded sizeof logic for EFI/EFD log item and log
format structures into common helper functions whose names reflect the
struct names.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: fix memcpy fortify errors in EFI log format copying
Darrick J. Wong [Fri, 18 Nov 2022 09:54:10 +0000 (10:54 +0100)]
xfs: fix memcpy fortify errors in EFI log format copying

Source kernel commit: 03a7485cd701e1c08baadcf39d9592d83715e224

Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy.  Since we're already fixing problems with BUI item copying, we
should fix it everything else.

An extra difficulty here is that the ef[id]_extents arrays are declared
as single-element arrays.  This is not the convention for flex arrays in
the modern kernel, and it causes all manner of problems with static
checking tools, since they often cannot tell the difference between a
single element array and a flex array.

So for starters, change those array[1] declarations to array[]
declarations to signal that they are proper flex arrays and adjust all
the "size-1" expressions to fit the new declaration style.

Next, refactor the xfs_efi_copy_format function to handle the copying of
the head and the flex array members separately.  While we're at it, fix
a minor validation deficiency in the recovery function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: increase rename inode reservation
Allison Henderson [Fri, 18 Nov 2022 09:48:26 +0000 (10:48 +0100)]
xfs: increase rename inode reservation

Source kernel commit: e07ee6fe21f47cfd72ae566395c67a80e7c66163

xfs_rename can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip
and wip.  So we need to increase the inode reservation to match.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: fix exception caused by unexpected illegal bestcount in leaf dir
Guo Xuenan [Fri, 18 Nov 2022 09:48:09 +0000 (10:48 +0100)]
xfs: fix exception caused by unexpected illegal bestcount in leaf dir

Source kernel commit: 13cf24e00665c9751951a422756d975812b71173

For leaf dir, In most cases, there should be as many bestfree slots
as the dir data blocks that can fit under i_size (except for [1]).

Root cause is we don't examin the number bestfree slots, when the slots
number less than dir data blocks, if we need to allocate new dir data
block and update the bestfree array, we will use the dir block number as
index to assign bestfree array, while we did not check the leaf buf
boundary which may cause UAF or other memory access problem. This issue
can also triggered with test cases xfs/473 from fstests.

According to Dave Chinner & Darrick's suggestion, adding buffer verifier
to detect this abnormal situation in time.
Simplify the testcase for fstest xfs/554 [1]

The error log is shown as follows:
==================================================================
BUG: KASAN: use-after-free in xfs_dir2_leaf_addname+0x1995/0x1ac0
Write of size 2 at addr ffff88810168b000 by task touch/1552
CPU: 5 PID: 1552 Comm: touch Not tainted 6.0.0-rc3+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x4d/0x66
print_report.cold+0xf6/0x691
kasan_report+0xa8/0x120
xfs_dir2_leaf_addname+0x1995/0x1ac0
xfs_dir_createname+0x58c/0x7f0
xfs_create+0x7af/0x1010
xfs_generic_create+0x270/0x5e0
path_openat+0x270b/0x3450
do_filp_open+0x1cf/0x2b0
do_sys_openat2+0x46b/0x7a0
do_sys_open+0xb7/0x130
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f33 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f33 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f33 R15: 0000000000000000
</TASK>

The buggy address belongs to the physical page:
page:ffffea000405a2c0 refcount:0 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x10168b
flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 002fffff80000000 ffffea0004057788 ffffea000402dbc8 0000000000000000
raw: 0000000000000000 0000000000170000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88810168af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88810168af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88810168b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88810168b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88810168b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
00000000: 58 44 44 33 5b 53 35 c2 00 00 00 00 00 00 00 78
XDD3[S5........x
XFS (sdb): Internal error xfs_dir2_data_use_free at line 1200 of file
fs/xfs/libxfs/xfs_dir2_data.c.  Caller
xfs_dir2_data_use_free+0x28a/0xeb0
CPU: 5 PID: 1552 Comm: touch Tainted: G    B              6.0.0-rc3+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x4d/0x66
xfs_corruption_error+0x132/0x150
xfs_dir2_data_use_free+0x198/0xeb0
xfs_dir2_leaf_addname+0xa59/0x1ac0
xfs_dir_createname+0x58c/0x7f0
xfs_create+0x7af/0x1010
xfs_generic_create+0x270/0x5e0
path_openat+0x270b/0x3450
do_filp_open+0x1cf/0x2b0
do_sys_openat2+0x46b/0x7a0
do_sys_open+0xb7/0x130
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f46 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f46 R08: 0000000000000000 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f46 R15: 0000000000000000
</TASK>
XFS (sdb): Corruption detected. Unmount and run xfs_repair

[1] https://lore.kernel.org/all/20220928095355.2074025-1-guoxuenan@huawei.com/
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agotreewide: use get_random_u32() when possible
Jason A. Donenfeld [Fri, 18 Nov 2022 09:47:58 +0000 (10:47 +0100)]
treewide: use get_random_u32() when possible

Source kernel commit: a251c17aa558d8e3128a528af5cf8b9d7caae4fd

The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agotreewide: use prandom_u32_max() when possible, part 1
Jason A. Donenfeld [Fri, 18 Nov 2022 09:46:42 +0000 (10:46 +0100)]
treewide: use prandom_u32_max() when possible, part 1

Source kernel commit: 81895a65ec63ee1daec3255dc1a06675d2fbe915

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p & (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

{
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
}

@drop_var@
type T;
identifier VAR;
@@

{
-       T VAR;
... when != VAR
}

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx
Shida Zhang [Fri, 18 Nov 2022 09:46:41 +0000 (10:46 +0100)]
xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx

Source kernel commit: c098576f5f63bc0ee2424bba50892514a71d54e8

xfs_dir2_isleaf is used to see if the directory is a single-leaf
form directory instead, as commented right above the function.

Besides getting rid of the broken comment, we rearrange the logic by
converting everything over to standard formatting and conventions,
at the same time, to make it easier to understand and self documenting.

Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: trim the mapp array accordingly in xfs_da_grow_inode_int
Shida Zhang [Fri, 18 Nov 2022 09:46:33 +0000 (10:46 +0100)]
xfs: trim the mapp array accordingly in xfs_da_grow_inode_int

Source kernel commit: 44159659df8ca381b84261e11058b2176fa03ba0

Take a look at the for-loop in xfs_da_grow_inode_int:
======
for(){
nmap = min(XFS_BMAP_MAX_NMAP, count);
...
error = xfs_bmapi_write(...,&mapp[mapi], &nmap);//(..., $1, $2)
...
mapi += nmap;
}
=====
where $1 stands for the start address of the array,
while $2 is used to indicate the size of the array.

The array $1 will advance by $nmap in each iteration after
the allocation of extents.
But the size $2 still remains unchanged, which is determined by
min(XFS_BMAP_MAX_NMAP, count).

It seems that it has forgotten to trim the mapp array after each
iteration, so change it.

Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: Remove the unneeded result variable
ye xingchen [Fri, 18 Nov 2022 09:46:32 +0000 (10:46 +0100)]
xfs: Remove the unneeded result variable

Source kernel commit: abda5271f8ec6e9a84ae8129ddc59226c89def7a

Return the value xfs_dir_cilookup_result() directly instead of storing it
in another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfs: clean up "%Ld/%Lu" which doesn't meet C standard
Zeng Heng [Fri, 18 Nov 2022 09:46:04 +0000 (10:46 +0100)]
xfs: clean up "%Ld/%Lu" which doesn't meet C standard

Source kernel commit: 78b0f58bdfef45aa9f3c7fbbd9b4d41abad6d85f

The "%Ld" specifier, which represents long long unsigned,
doesn't meet C language standard, and even more,
it makes people easily mistake with "%ld", which represent
long unsigned. So replace "%Ld" with "lld".

Do the same with "%Lu".

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoxfsprogs: Release v6.0.0 v6.0.0
Carlos Maiolino [Fri, 11 Nov 2022 10:48:00 +0000 (11:48 +0100)]
xfsprogs: Release v6.0.0

Update all the necessary files for a 6.0.0 release.

Signed-off-by: Carlos Maiolino <cem@kernel.org>
2 years agoPolish translation update for xfsprogs 5.19.0.
Jakub Bogusz [Mon, 14 Nov 2022 11:04:37 +0000 (12:04 +0100)]
Polish translation update for xfsprogs 5.19.0.

Signed-off-by: Jakub Bogusz <qboosh@pld-linux.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>