Ceph can call d_add(dentry, NULL) on a negative dentry that is already
present in the primary dcache hash.
In the current VFS that is not safe. d_add() goes through __d_add()
to __d_rehash(), which unconditionally reinserts dentry->d_hash into
the hlist_bl bucket. If the dentry is already hashed, reinserting the
same node can corrupt the bucket, including creating a self-loop.
Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
typically looping only on the d_name.hash mismatch check and
eventually triggering RCU stall reports like this one:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=
25003319/
25003319 fqs=829
rcu: (t=2101 jiffies g=
79058445 q=698988 ncpus=192)
CPU: 87 UID:
2952868916 PID:
3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
RIP: 0010:__d_lookup+0x46/0xb0
Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
RSP: 0018:
ff745a70c8253898 EFLAGS:
00000282
RAX:
ff26e470054cb208 RBX:
ff26e470054cb208 RCX:
000000006e958966
RDX:
ff26e48267340000 RSI:
ff745a70c82539b0 RDI:
ff26e458f74655c0
RBP:
000000006e958966 R08:
0000000000000180 R09:
9cd08d909b919a89
R10:
ff26e458f74655c0 R11:
0000000000000000 R12:
ff26e458f74655c0
R13:
ff745a70c82539b0 R14:
d0d0d0d0d0d0d0d0 R15:
2f2f2f2f2f2f2f2f
FS:
00007f5770896980(0000) GS:
ff26e482c5d88000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f5764de50c0 CR3:
000000a72abb5001 CR4:
0000000000771ef0
PKRU:
55555554
Call Trace:
<TASK>
lookup_fast+0x9f/0x100
walk_component+0x1f/0x150
link_path_walk+0x20e/0x3d0
path_lookupat+0x68/0x180
filename_lookup+0xdc/0x1e0
vfs_statx+0x6c/0x140
vfs_fstatat+0x67/0xa0
__do_sys_newfstatat+0x24/0x60
do_syscall_64+0x6a/0x230
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is reachable with reused cached negative dentries. A Ceph lookup
or atomic_open can be handed a negative dentry that is already hashed,
and fs/ceph/dir.c then hits one of two paths that incorrectly assume
"negative" also means "unhashed":
- ceph_finish_lookup():
MDS reply is -ENOENT with no trace
-> d_add(dentry, NULL)
- ceph_lookup():
local ENOENT fast path for a complete directory with shared caps
-> d_add(dentry, NULL)
Both paths can therefore re-add an already-hashed negative dentry.
Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
is true.
Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
negative dentry when it is actually unhashed. If the negative dentry
is already hashed, leave it in place and reuse it as-is.
This preserves the existing behavior for unhashed dentries while
avoiding d_hash list corruption for reused hashed negatives.
Fixes: 2817b000b02c ("ceph: directory operations")
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>