This only happens when the omap fetch fails or the fnode is corrupt. The MDS
cannot currently repair that damage. Without this change, the MDS enters an
infinite repair loop:

2025-01-28T19:25:46.153+0000 7f9626cc5640 10 MDSContext::complete: 12C_RetryScrub
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs: state=RUNNING
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs entering with 0 in progress and 1 in the stack
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 ap=1+0 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=1 scrubqueue=1 0x55b1a50fa880]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.den(0x10000000000 dir_xx) scrubbing [dentry #0x1/dir_x/dir_xx [2,head] auth (dversion lock) pv=0 v=8 ino=0x10000000001 state=1073741824 0x55b1a50eaf00] next_seq = 2
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b1a50da240) get_snaps (seq 1 cached_seq 1)
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack _enqueue with {[inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680]}, top=0
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) scrub_initialize with scrub_version 6
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) uninline_initialize with scrub_version 6
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack enqueue [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680] to bottom of ScrubStack
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) get_num_head_items() = 1; fnode.fragstat.nfiles=0 fnode.fragstat.nsubdirs=1
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) total of child dentries: n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) my rstats: n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) check_rstats complete on 0x55b1a50fa880
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) scrub_finished
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) auth_unpin by 0x55b1a4f7b600 on [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] count now 0
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag done
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs dirfrag, done
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack dequeue [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] from ScrubStack
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs examining [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) can_auth_pin: auth!
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dir_inode [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode recursive mode, frags [*]
2025-01-28T19:25:46.153+0000 7f9626cc5640 15 mds.0.cache.ino(0x10000000001) maybe_export_pin update=0 [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000001) can_auth_pin: auth!
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode barebones [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) fetch_keys 0 keys on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) auth_pin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180] count now 1
2025-01-28T19:25:46.153+0000 7f9626cc5640 1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] --> [v2:172.21.10.4:6802/3852331191,v1:172.21.10.4:6803/3852331191] -- osd_op(unknown.0.340:50 42.7 42:e2e07930:::10000000001.00000000:head [omap-get-header,omap-get-vals-by-keys in=4b,getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e564) -- 0x55b1a50d8c00 con 0x55b1a50d9000
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.bal hit_dir 3 pop is 1, frag * size 0 [pop IRD:[C 0.00e+00] IWR:[C 0.00e+00] RDR:[C 0.00e+00] FET:[C 1.00e+00] STR:[C 0.00e+00] *LOAD:2.0]
2025-01-28T19:25:46.153+0000 7f962ecd5640 1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] <== osd.0 v2:172.21.10.4:6802/3852331191 3 ==== osd_op_reply(50 10000000001.00000000 [omap-get-header,omap-get-vals-by-keys,getxattr] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) ==== 248+0+0 (crc 0 0 0) 0x55b1a4444280 con 0x55b1a50d9000
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) _fetched header 0 bytes 0 keys for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 0 mds.0.cache.dir(0x10000000001) _fetched missing object for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 -1 log_channel(cluster) log [ERR] : dir 0x10000000001 object missing on disk; some files may be lost (/dir_x/dir_xx)
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) go_bad *
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) auth_unpin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180] count now 0
2025-01-28T19:25:46.153+0000 7f96254c2640 11 mds.0.cache.dir(0x10000000001) finish_waiting mask 2 result -5 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 12C_RetryScrub

Note that this partially reverts 5b56098f17. That commit incorrectly marked a
dirfrag as repaired when it may not even exist in the metadata pool.
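
The retry cycle visible in the log (C_RetryScrub, a failed fetch, then
C_RetryScrub again) can be illustrated with a toy model. This is only a
sketch and not the MDS code: Dirfrag, scrub, skip_damaged and max_rounds are
hypothetical names, and it assumes the fix amounts to no longer refetching a
dirfrag once a fetch has flagged it as damaged.

```python
import errno


class Dirfrag:
    """Hypothetical stand-in for a dirfrag whose backing object is missing."""

    def __init__(self):
        self.complete = False
        self.damaged = False
        self.fetch_attempts = 0

    def fetch(self):
        # Simulated omap fetch: the object is missing, so it always fails
        # and the dirfrag is flagged as damaged (compare go_bad in the log).
        self.fetch_attempts += 1
        self.damaged = True
        return -errno.EIO


def scrub(dirfrag, skip_damaged, max_rounds):
    """Toy retry loop; max_rounds only caps the simulation so that the
    unfixed variant terminates."""
    for _ in range(max_rounds):
        if dirfrag.complete:
            break
        if skip_damaged and dirfrag.damaged:
            break  # assumed fix: give up instead of fetching again
        dirfrag.fetch()  # a failure leads to another retry round
    return dirfrag.fetch_attempts
```

With skip_damaged=False the model fetches on every round until the cap is
reached; with skip_damaged=True it fetches once and stops.
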
Fixes: 5b56098f17dd9abe4c15cbc7f487c0e94841beaf
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>