syzbot
| ID | Workflow | Result | Correct | Bug | Created | Started | Finished | Revision | Error |
|---|---|---|---|---|---|---|---|---|---|
| f90997fa-eb7a-4b86-943a-7b9ee6957e56 | assessment-kcsan | Benign: ✅ Confident: ✅ | ❓ | KCSAN: data-race in __d_lookup_rcu_op_compare / __d_rehash (6) | 2026/02/24 16:46 | 2026/02/24 16:46 | 2026/02/24 16:57 | 305c0ec5cd886e2d13738e28e1b2df9b0ec20fc9 | |
The data race occurs on the `pprev` field of `struct hlist_bl_node` within a `struct dentry`'s `d_hash` field.

### Analysis

**1. The Accesses:**

* **Reader:** `__d_lookup_rcu_op_compare` calls `d_unhashed(dentry)`, which in turn calls `hlist_bl_unhashed(&dentry->d_hash)`. This function checks if `dentry->d_hash.pprev` is NULL.
* **Writer:** `__d_rehash` calls `hlist_bl_add_head_rcu(&entry->d_hash, b)`. This function sets `entry->d_hash.pprev = &b->first` before making the dentry visible to RCU readers by updating the hash bucket head (`b->first`) using `rcu_assign_pointer`.

**2. RCU Guarantees:**

The dentry being added was previously unhashed (its `pprev` was NULL). The writer initializes the dentry's hash list pointers (`next` and `pprev`) and then uses `rcu_assign_pointer` to insert it into the hash table. `rcu_assign_pointer` includes a release barrier, ensuring that the initialization of `pprev` is visible to any reader that sees the dentry via the hash table. The reader uses `hlist_bl_for_each_entry_rcu`, which employs `rcu_dereference` (or equivalent) to follow the pointers. This provides the necessary consume/acquire semantics to ensure that the reader sees the initialized `pprev` value.

**3. Why KCSAN reports it:**

KCSAN flags this as a data race because the read in `hlist_bl_unhashed` and the write in `hlist_bl_add_head_rcu` are plain memory accesses without explicit atomic markers (like `READ_ONCE`/`WRITE_ONCE`). While technically a data race according to the memory model, it is a standard and safe RCU pattern in the kernel.

**4. Impact of the Race:**

Even if the reader were to see a NULL value for `pprev` due to some extreme reordering or compiler optimization (which RCU is designed to prevent), the result would be that `d_unhashed` returns true. The reader would then `continue` and skip this dentry. This is a safe "false negative" in the RCU lookup path. The VFS handles such misses by falling back to a slower, locked path (`d_alloc_parallel` slow path or full lookup), which will correctly find or create the dentry. No memory corruption, crash, or inconsistent state occurs.

### Conclusion

The data race is **benign**. It is a consequence of standard RCU-protected list insertion where the initialization of a node's metadata races with a reader's check of that metadata, but the RCU model ensures logical consistency and the fallback paths handle any transient inconsistencies.
==================================================================
BUG: KCSAN: data-race in __d_lookup_rcu_op_compare / __d_rehash

read to 0xffff8881006c5490 of 8 bytes by task 3320 on cpu 0:
 hlist_bl_unhashed include/linux/list_bl.h:54 [inline]
 d_unhashed include/linux/dcache.h:366 [inline]
 __d_lookup_rcu_op_compare+0x104/0x270 fs/dcache.c:2244
 __d_lookup_rcu+0x2bf/0x2d0 fs/dcache.c:2307
 d_alloc_parallel+0x24d/0xcd0 fs/dcache.c:2613
 lookup_open fs/namei.c:4366 [inline]
 open_last_lookups fs/namei.c:4549 [inline]
 path_openat+0x8c6/0x23b0 fs/namei.c:4793
 do_filp_open+0x109/0x230 fs/namei.c:4823
 do_sys_openat2+0xa6/0x150 fs/open.c:1430
 do_sys_open fs/open.c:1436 [inline]
 __do_sys_openat fs/open.c:1452 [inline]
 __se_sys_openat fs/open.c:1447 [inline]
 __x64_sys_openat+0xf2/0x120 fs/open.c:1447
 x64_sys_call+0x2b07/0x3000 arch/x86/include/generated/asm/syscalls_64.h:258
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xca/0x2b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0xffff8881006c5490 of 8 bytes by task 3319 on cpu 1:
 hlist_bl_add_head_rcu include/linux/rculist_bl.h:81 [inline]
 __d_rehash+0x9e/0x1f0 fs/dcache.c:2534
 __d_add+0x344/0x4a0 fs/dcache.c:2759
 d_splice_alias_ops+0xdc/0x280 fs/dcache.c:3100
 proc_sys_lookup+0x1ed/0x310 fs/proc/proc_sysctl.c:544
 lookup_open fs/namei.c:4422 [inline]
 open_last_lookups fs/namei.c:4549 [inline]
 path_openat+0xed5/0x23b0 fs/namei.c:4793
 do_filp_open+0x109/0x230 fs/namei.c:4823
 do_sys_openat2+0xa6/0x150 fs/open.c:1430
 do_sys_open fs/open.c:1436 [inline]
 __do_sys_openat fs/open.c:1452 [inline]
 __se_sys_openat fs/open.c:1447 [inline]
 __x64_sys_openat+0xf2/0x120 fs/open.c:1447
 x64_sys_call+0x2b07/0x3000 arch/x86/include/generated/asm/syscalls_64.h:258
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xca/0x2b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 3319 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(voluntary)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
==================================================================
chnl_net:caif_netlink_parms(): no params data found
bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered disabled state
bridge_slave_0: entered allmulticast mode
bridge_slave_0: entered promiscuous mode
bridge0: port 2(bridge_slave_1) entered blocking state
bridge0: port 2(bridge_slave_1) entered disabled state
bridge_slave_1: entered allmulticast mode
bridge_slave_1: entered promiscuous mode
bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
team0: Port device team_slave_0 added
team0: Port device team_slave_1 added
batman_adv: batadv0: Adding interface: batadv_slave_0
batman_adv: batadv0: The MTU of interface batadv_slave_0 is too small (1500) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1532 would solve the problem.
batman_adv: batadv0: Not using interface batadv_slave_0 (retrying later): interface not active
batman_adv: batadv0: Adding interface: batadv_slave_1
batman_adv: batadv0: The MTU of interface batadv_slave_1 is too small (1500) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1532 would solve the problem.
batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active
hsr_slave_0: entered promiscuous mode
hsr_slave_1: entered promiscuous mode
debugfs: 'hsr0' already exists in 'hsr'
Cannot create hsr debugfs directory
netdevsim netdevsim4 netdevsim0: renamed from eth0
netdevsim netdevsim4 netdevsim1: renamed from eth1
netdevsim netdevsim4 netdevsim2: renamed from eth2
netdevsim netdevsim4 netdevsim3: renamed from eth3
8021q: adding VLAN 0 to HW filter on device bond0
8021q: adding VLAN 0 to HW filter on device team0
8021q: adding VLAN 0 to HW filter on device batadv0
veth0_vlan: entered promiscuous mode
veth1_vlan: entered promiscuous mode
veth0_macvtap: entered promiscuous mode
veth1_macvtap: entered promiscuous mode
batman_adv: batadv0: Interface activated: batadv_slave_0
batman_adv: batadv0: Interface activated: batadv_slave_1
| Seq | Timestamp | Type | Name | Duration |
|---|---|---|---|---|
| 0/0 | 2026/02/24 16:46 | flow | assessment-kcsan |
11mResults:map[Benign:true Confident:true Explanation:The data race occurs on the `pprev` field of `struct hlist_bl_node` within a `struct dentry`'s `d_hash` field. ### Analysis **1. The Accesses:** * **Reader:** `__d_lookup_rcu_op_compare` calls `d_unhashed(dentry)`, which in turn calls `hlist_bl_unhashed(&dentry->d_hash)`. This function checks if `dentry->d_hash.pprev` is NULL. * **Writer:** `__d_rehash` calls `hlist_bl_add_head_rcu(&entry->d_hash, b)`. This function sets `entry->d_hash.pprev = &b->first` before making the dentry visible to RCU readers by updating the hash bucket head (`b->first`) using `rcu_assign_pointer`. **2. RCU Guarantees:** The dentry being added was previously unhashed (its `pprev` was NULL). The writer initializes the dentry's hash list pointers (`next` and `pprev`) and then uses `rcu_assign_pointer` to insert it into the hash table. `rcu_assign_pointer` includes a release barrier, ensuring that the initialization of `pprev` is visible to any reader that sees the dentry via the hash table. The reader uses `hlist_bl_for_each_entry_rcu`, which employs `rcu_dereference` (or equivalent) to follow the pointers. This provides the necessary consume/acquire semantics to ensure that the reader sees the initialized `pprev` value. **3. Why KCSAN reports it:** KCSAN flags this as a data race because the read in `hlist_bl_unhashed` and the write in `hlist_bl_add_head_rcu` are plain memory accesses without explicit atomic markers (like `READ_ONCE`/`WRITE_ONCE`). While technically a data race according to the memory model, it is a standard and safe RCU pattern in the kernel. **4. Impact of the Race:** Even if the reader were to see a NULL value for `pprev` due to some extreme reordering or compiler optimization (which RCU is designed to prevent), the result would be that `d_unhashed` returns true. The reader would then `continue` and skip this dentry. This is a safe "false negative" in the RCU lookup path. 
The VFS handles such misses by falling back to a slower, locked path (`d_alloc_parallel` slow path or full lookup), which will correctly find or create the dentry. No memory corruption, crash, or inconsistent state occurs. ### Conclusion The data race is **benign**. It is a consequence of standard RCU-protected list insertion where the initialization of a node's metadata races with a reader's check of that metadata, but the RCU model ensures logical consistency and the fallback paths handle any transient inconsistencies.] |
| 1/1 | 2026/02/24 16:46 | action | kernel-checkouter |
Results:map[KernelSrc:/usr/local/google/home/dvyukov/syzkaller/agent/workdir/cache/src/bb9c10b1d9a9ef990c29d8abcb2067137f291792] |
| 2/1 | 2026/02/24 16:47 | action | kernel-builder |
Results:map[KernelObj:/usr/local/google/home/dvyukov/syzkaller/agent/workdir/cache/build/0b9189a93b55effa815ab256918d4409fe273214] |
| 3/1 | 2026/02/24 16:51 | action | codesearch-prepare |
Results:map[Index:codesearch-index] |
| 4/1 | 2026/02/24 16:54 | agent | expert |
3mModel:gemini-3-flash-preview Results: map[Benign:true Confident:true] Instruction: You are an experienced Linux kernel developer tasked with determining if the given kernel data race is benign or not. The data race report is from KCSAN tool. It contains 2 stack traces of the memory accesses that constitute a data race. A "benign" data races are on a simple int/bool variable or similar field, and the accesses are not supposed to be protected by any mutual exclusion primitives. Common examples of such "benign" data races are accesses to various flags fields, statistics counters, and similar. A "benign" data race does not lead to memory corruption/crash with a conservative compiler that compiles memory accesses to primitive types effectively as atomic. A non-benign (or "harmful" data race) can lead to corruption/crash even with a conservative compiler that compiles memory accesses to primitive types effectively as atomic. A common example of a "harmful" data races is race on a complex container (list/hashmap/etc), where accesses are supposed to be protected by a mutual exclusion primitive. Also consider races that happen at the same time with the given one. If there is no synchronization in between, other memory accesses in the involved threads race with each other if they access the same memory. For example, if both threads execute: some_struct->have_elements = true; list_add(new_node, &some_struct->list_head); the race on some_struct->have_elements may appear benign, however it also implies there is a race on some_struct->list_head which is not benign, since the list is not thread-safe. Take into account that on 32-bit systems 64-bit memory accesses may be split into two accesses, and thus even with a conservative compiler may not be fully atomic. However, such races may still be benign depending on how writes are done, and how read data is used. In the final reply explain why you think the given data race is benign or is harmful. 
Use the provided tools to confirm any assumptions, variables/fields being accessed, etc. In particular, don't make assumptions about the kernel source code, use codesearch tools to read the actual source code. Prefer calling several tools at the same time to save round-trips. Use set-results tool to provide results of the analysis. It must be called exactly once before the final reply. Ignore results of this tool. Prompt: The data race report is: ================================================================== BUG: KCSAN: data-race in __d_lookup_rcu_op_compare / __d_rehash read to 0xffff8881006c5490 of 8 bytes by task 3320 on cpu 0: hlist_bl_unhashed include/linux/list_bl.h:54 [inline] d_unhashed include/linux/dcache.h:366 [inline] __d_lookup_rcu_op_compare+0x104/0x270 fs/dcache.c:2244 __d_lookup_rcu+0x2bf/0x2d0 fs/dcache.c:2307 d_alloc_parallel+0x24d/0xcd0 fs/dcache.c:2613 lookup_open fs/namei.c:4366 [inline] open_last_lookups fs/namei.c:4549 [inline] path_openat+0x8c6/0x23b0 fs/namei.c:4793 do_filp_open+0x109/0x230 fs/namei.c:4823 do_sys_openat2+0xa6/0x150 fs/open.c:1430 do_sys_open fs/open.c:1436 [inline] __do_sys_openat fs/open.c:1452 [inline] __se_sys_openat fs/open.c:1447 [inline] __x64_sys_openat+0xf2/0x120 fs/open.c:1447 x64_sys_call+0x2b07/0x3000 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xca/0x2b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f write to 0xffff8881006c5490 of 8 bytes by task 3319 on cpu 1: hlist_bl_add_head_rcu include/linux/rculist_bl.h:81 [inline] __d_rehash+0x9e/0x1f0 fs/dcache.c:2534 __d_add+0x344/0x4a0 fs/dcache.c:2759 d_splice_alias_ops+0xdc/0x280 fs/dcache.c:3100 proc_sys_lookup+0x1ed/0x310 fs/proc/proc_sysctl.c:544 lookup_open fs/namei.c:4422 [inline] open_last_lookups fs/namei.c:4549 [inline] path_openat+0xed5/0x23b0 fs/namei.c:4793 do_filp_open+0x109/0x230 fs/namei.c:4823 do_sys_openat2+0xa6/0x150 fs/open.c:1430 do_sys_open 
fs/open.c:1436 [inline] __do_sys_openat fs/open.c:1452 [inline] __se_sys_openat fs/open.c:1447 [inline] __x64_sys_openat+0xf2/0x120 fs/open.c:1447 x64_sys_call+0x2b07/0x3000 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xca/0x2b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported by Kernel Concurrency Sanitizer on: CPU: 1 UID: 0 PID: 3319 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(voluntary) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 ================================================================== chnl_net:caif_netlink_parms(): no params data found bridge0: port 1(bridge_slave_0) entered blocking state bridge0: port 1(bridge_slave_0) entered disabled state bridge_slave_0: entered allmulticast mode bridge_slave_0: entered promiscuous mode bridge0: port 2(bridge_slave_1) entered blocking state bridge0: port 2(bridge_slave_1) entered disabled state bridge_slave_1: entered allmulticast mode bridge_slave_1: entered promiscuous mode bond0: (slave bond_slave_0): Enslaving as an active interface with an up link bond0: (slave bond_slave_1): Enslaving as an active interface with an up link team0: Port device team_slave_0 added team0: Port device team_slave_1 added batman_adv: batadv0: Adding interface: batadv_slave_0 batman_adv: batadv0: The MTU of interface batadv_slave_0 is too small (1500) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1532 would solve the problem. batman_adv: batadv0: Not using interface batadv_slave_0 (retrying later): interface not active batman_adv: batadv0: Adding interface: batadv_slave_1 batman_adv: batadv0: The MTU of interface batadv_slave_1 is too small (1500) to handle the transport of batman-adv packets. 
Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1532 would solve the problem. batman_adv: batadv0: Not using interface batadv_slave_1 (retrying later): interface not active hsr_slave_0: entered promiscuous mode hsr_slave_1: entered promiscuous mode debugfs: 'hsr0' already exists in 'hsr' Cannot create hsr debugfs directory netdevsim netdevsim4 netdevsim0: renamed from eth0 netdevsim netdevsim4 netdevsim1: renamed from eth1 netdevsim netdevsim4 netdevsim2: renamed from eth2 netdevsim netdevsim4 netdevsim3: renamed from eth3 8021q: adding VLAN 0 to HW filter on device bond0 8021q: adding VLAN 0 to HW filter on device team0 8021q: adding VLAN 0 to HW filter on device batadv0 veth0_vlan: entered promiscuous mode veth1_vlan: entered promiscuous mode veth0_macvtap: entered promiscuous mode veth1_macvtap: entered promiscuous mode batman_adv: batadv0: Interface activated: batadv_slave_0 batman_adv: batadv0: Interface activated: batadv_slave_1 Reply: The data race occurs on the `pprev` field of `struct hlist_bl_node` within a `struct dentry`'s `d_hash` field. ### Analysis **1. The Accesses:** * **Reader:** `__d_lookup_rcu_op_compare` calls `d_unhashed(dentry)`, which in turn calls `hlist_bl_unhashed(&dentry->d_hash)`. This function checks if `dentry->d_hash.pprev` is NULL. * **Writer:** `__d_rehash` calls `hlist_bl_add_head_rcu(&entry->d_hash, b)`. This function sets `entry->d_hash.pprev = &b->first` before making the dentry visible to RCU readers by updating the hash bucket head (`b->first`) using `rcu_assign_pointer`. **2. RCU Guarantees:** The dentry being added was previously unhashed (its `pprev` was NULL). The writer initializes the dentry's hash list pointers (`next` and `pprev`) and then uses `rcu_assign_pointer` to insert it into the hash table. 
`rcu_assign_pointer` includes a release barrier, ensuring that the initialization of `pprev` is visible to any reader that sees the dentry via the hash table. The reader uses `hlist_bl_for_each_entry_rcu`, which employs `rcu_dereference` (or equivalent) to follow the pointers. This provides the necessary consume/acquire semantics to ensure that the reader sees the initialized `pprev` value. **3. Why KCSAN reports it:** KCSAN flags this as a data race because the read in `hlist_bl_unhashed` and the write in `hlist_bl_add_head_rcu` are plain memory accesses without explicit atomic markers (like `READ_ONCE`/`WRITE_ONCE`). While technically a data race according to the memory model, it is a standard and safe RCU pattern in the kernel. **4. Impact of the Race:** Even if the reader were to see a NULL value for `pprev` due to some extreme reordering or compiler optimization (which RCU is designed to prevent), the result would be that `d_unhashed` returns true. The reader would then `continue` and skip this dentry. This is a safe "false negative" in the RCU lookup path. The VFS handles such misses by falling back to a slower, locked path (`d_alloc_parallel` slow path or full lookup), which will correctly find or create the dentry. No memory corruption, crash, or inconsistent state occurs. ### Conclusion The data race is **benign**. It is a consequence of standard RCU-protected list insertion where the initialization of a node's metadata races with a reader's check of that metadata, but the RCU model ensures logical consistency and the fallback paths handle any transient inconsistencies. |
| 5/2 | 2026/02/24 16:54 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 4920 output: 24 thoughts: 274 Thoughts: **Analyzing the Data Race in the Dentry Hashing**
Okay, so the report indicates a data race. My expertise tells me we're looking at concurrent access issues within the kernel, specifically concerning dentry lookups and rehashing. The race involves `__d_lookup_rcu_op_compare` reading and `__d_rehash` writing to the memory location `0xffff8881006c5490`. We've got a read from `hlist_bl_unhashed`, called by `d_unhashed`, and a write from `hlist_bl_add_head_rcu` called by `__d_rehash`. Sounds like contention within the hash table structures.
Let's dive into the code. The immediate focus is on the field being accessed. `d_unhashed` is used to check if a dentry is, well, unhashed – meaning not in the hash table. `hlist_bl_unhashed` is the function doing the work, checking whether `hlist_bl_node->pprev` is NULL.
Wait a second. I remember the `hlist_bl` implementation. Let's see...
```c
static inline int hlist_bl_unhashed(const struct hlist_bl_node *h)
{
return !h->pprev;
}
```
Right, of course. `hlist_bl` uses the least significant bit of the head's `first` pointer for locking. It's a classic trick for concurrent lists, I should remember this. The problem is almost certainly about the locking mechanism not working correctly. I need to refresh my memory on the implementation of `include/linux/list_bl.h` to see how these locking bits work. I need to understand how the read and write operations interact with the locking bits.
|
| 6/2 | 2026/02/24 16:54 | tool | read-file |
Args:map[File:include/linux/list_bl.h] Results: map[Contents:/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_LIST_BL_H
#define _LINUX_LIST_BL_H
#include <linux/list.h>
#include <linux/bit_spinlock.h>
/*
* Special version of lists, where head of the list has a lock in the lowest
* bit. This is useful for scalable hash tables without increasing memory
* footprint overhead.
*
* For modification operations, the 0 bit of hlist_bl_head->first
* pointer must be set.
*
* With some small modifications, this can easily be adapted to store several
* arbitrary bits (not just a single lock bit), if the need arises to store
* some fast and compact auxiliary data.
*/
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
#define LIST_BL_LOCKMASK 1UL
#else
#define LIST_BL_LOCKMASK 0UL
#endif
#ifdef CONFIG_DEBUG_LIST
#define LIST_BL_BUG_ON(x) BUG_ON(x)
#else
#define LIST_BL_BUG_ON(x)
#endif
struct hlist_bl_head {
struct hlist_bl_node *first;
};
struct hlist_bl_node {
struct hlist_bl_node *next, **pprev;
};
#define INIT_HLIST_BL_HEAD(ptr) \
((ptr)->first = NULL)
static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h)
{
h->next = NULL;
h->pprev = NULL;
}
#define hlist_bl_entry(ptr, type, member) container_of(ptr,type,member)
static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
{
return !h->pprev;
}
static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h)
{
return (struct hlist_bl_node *)
((unsigned long)h->first & ~LIST_BL_LOCKMASK);
}
static inline void hlist_bl_set_first(struct hlist_bl_head *h,
struct hlist_bl_node *n)
{
LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK);
LIST_BL_BUG_ON(((unsigned long)h->first & LIST_BL_LOCKMASK) !=
LIST_BL_LOCKMASK);
h->first = (struct hlist_bl_node *)((unsigned long)n | LIST_BL_LOCKMASK);
}
static inline bool hlist_bl_empty(const struct hlist_bl_head *h)
{
return !((unsigned long)READ_ONCE(h->first) & ~LIST_BL_LOCKMASK);
}
static inline void hlist_bl_add_head(struct hlist_bl_node *n,
struct hlist_bl_head *h)
{
struct hlist_bl_node *first = hlist_bl_first(h);
n->next = first;
if (first)
first->pprev = &n->next;
n->pprev = &h->first;
hlist_bl_set_first(h, n);
}
static inline void hlist_bl_add_before(struct hlist_bl_node *n,
struct hlist_bl_node *next)
{
struct hlist_bl_node **pprev = next->pprev;
n->pprev = pprev;
n->next = next;
next->pprev = &n->next;
/* pprev may be `first`, so be careful not to lose the lock bit */
WRITE_ONCE(*pprev,
(struct hlist_bl_node *)
((uintptr_t)n | ((uintptr_t)*pprev & LIST_BL_LOCKMASK)));
}
static inline void hlist_bl_add_behind(struct hlist_bl_node *n,
struct hlist_bl_node *prev)
{
n->next = prev->next;
n->pprev = &prev->next;
prev->next = n;
if (n->next)
n->next->pprev = &n->next;
}
static inline void __hlist_bl_del(struct hlist_bl_node *n)
{
struct hlist_bl_node *next = n->next;
struct hlist_bl_node **pprev = n->pprev;
LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK);
/* pprev may be `first`, so be careful not to lose the lock bit */
WRITE_ONCE(*pprev,
(struct hlist_bl_node *)
((unsigned long)next |
((unsigned long)*pprev & LIST_BL_LOCKMASK)));
if (next)
next->pprev = pprev;
}
static inline void hlist_bl_del(struct hlist_bl_node *n)
{
__hlist_bl_del(n);
n->next = LIST_POISON1;
n->pprev = LIST_POISON2;
}
static inline void hlist_bl_del_init(struct hlist_bl_node *n)
{
if (!hlist_bl_unhashed(n)) {
__hlist_bl_del(n);
INIT_HLIST_BL_NODE(n);
}
}
static inline void hlist_bl_lock(struct hlist_bl_head *b)
{
bit_spin_lock(0, (unsigned long *)b);
}
static inline void hlist_bl_unlock(struct hlist_bl_head *b)
{
__bit_spin_unlock(0, (unsigned long *)b);
}
static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
{
return bit_spin_is_locked(0, (unsigned long *)b);
}
/**
* hlist_bl_for_each_entry - iterate over list of given type
* @tpos: the type * to use as a loop cursor.
* @pos: the &struct hlist_node to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
*
*/
#define hlist_bl_for_each_entry(tpos, pos, head, member) \
for (pos = hlist_bl_first(head); \
pos && \
({ tpos = hlist_bl_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)
/**
* hlist_bl_for_each_entry_safe - iterate over list of given type safe against removal of list entry
* @tpos: the type * to use as a loop cursor.
* @pos: the &struct hlist_node to use as a loop cursor.
* @n: another &struct hlist_node to use as temporary storage
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
*/
#define hlist_bl_for_each_entry_safe(tpos, pos, n, head, member) \
for (pos = hlist_bl_first(head); \
pos && ({ n = pos->next; 1; }) && \
({ tpos = hlist_bl_entry(pos, typeof(*tpos), member); 1;}); \
pos = n)
#endif
] |
| 7/2 | 2026/02/24 16:54 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 7502 output: 26 thoughts: 34 Thoughts: **Diving into the RCU-List Implementation** Okay, so I'm looking at `hlist_bl_add_head_rcu`. Time to trace this down and figure out what's going on. Let's start with the definition, which likely lives in `include/linux/rculist_bl.h`. This is a classic pattern – the header files are the best place to find the data structures and the macros that define the behaviour of the various operations. I'll need to refresh my memory on the underlying RCU (Read-Copy-Update) semantics to fully grasp what's happening. I expect to see some manipulation of pointers, possibly some atomic operations or memory barriers, and probably some checks relating to RCU's quiescent state handling. Let's see what's actually under the hood. Time to dig in and understand this function. |
| 8/2 | 2026/02/24 16:54 | tool | read-file |
Args:map[File:include/linux/rculist_bl.h] Results: map[Contents:/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_RCULIST_BL_H
#define _LINUX_RCULIST_BL_H
/*
* RCU-protected bl list version. See include/linux/list_bl.h.
*/
#include <linux/list_bl.h>
#include <linux/rcupdate.h>
static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h,
struct hlist_bl_node *n)
{
LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK);
LIST_BL_BUG_ON(((unsigned long)h->first & LIST_BL_LOCKMASK) !=
LIST_BL_LOCKMASK);
rcu_assign_pointer(h->first,
(struct hlist_bl_node *)((unsigned long)n | LIST_BL_LOCKMASK));
}
static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
{
return (struct hlist_bl_node *)
((unsigned long)rcu_dereference_check(h->first, hlist_bl_is_locked(h)) & ~LIST_BL_LOCKMASK);
}
/**
* hlist_bl_del_rcu - deletes entry from hash list without re-initialization
* @n: the element to delete from the hash list.
*
* Note: hlist_bl_unhashed() on entry does not return true after this,
* the entry is in an undefined state. It is useful for RCU based
* lockfree traversal.
*
* In particular, it means that we can not poison the forward
* pointers that may still be used for walking the hash list.
*
* The caller must take whatever precautions are necessary
* (such as holding appropriate locks) to avoid racing
* with another list-mutation primitive, such as hlist_bl_add_head_rcu()
* or hlist_bl_del_rcu(), running on this same list.
* However, it is perfectly legal to run concurrently with
* the _rcu list-traversal primitives, such as
* hlist_bl_for_each_entry().
*/
static inline void hlist_bl_del_rcu(struct hlist_bl_node *n)
{
__hlist_bl_del(n);
n->pprev = LIST_POISON2;
}
/**
* hlist_bl_add_head_rcu
* @n: the element to add to the hash list.
* @h: the list to add to.
*
* Description:
* Adds the specified element to the specified hlist_bl,
* while permitting racing traversals.
*
* The caller must take whatever precautions are necessary
* (such as holding appropriate locks) to avoid racing
* with another list-mutation primitive, such as hlist_bl_add_head_rcu()
* or hlist_bl_del_rcu(), running on this same list.
* However, it is perfectly legal to run concurrently with
* the _rcu list-traversal primitives, such as
* hlist_bl_for_each_entry_rcu(), used to prevent memory-consistency
* problems on Alpha CPUs. Regardless of the type of CPU, the
* list-traversal primitive must be guarded by rcu_read_lock().
*/
static inline void hlist_bl_add_head_rcu(struct hlist_bl_node *n,
struct hlist_bl_head *h)
{
struct hlist_bl_node *first;
/* don't need hlist_bl_first_rcu because we're under lock */
first = hlist_bl_first(h);
n->next = first;
if (first)
first->pprev = &n->next;
n->pprev = &h->first;
/* need _rcu because we can have concurrent lock free readers */
hlist_bl_set_first_rcu(h, n);
}
/**
* hlist_bl_for_each_entry_rcu - iterate over rcu list of given type
* @tpos: the type * to use as a loop cursor.
* @pos: the &struct hlist_bl_node to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_bl_node within the struct.
*
*/
#define hlist_bl_for_each_entry_rcu(tpos, pos, head, member) \
for (pos = hlist_bl_first_rcu(head); \
pos && \
({ tpos = hlist_bl_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(pos->next))
#endif
] |
| 9/2 | 2026/02/24 16:54 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 8944 output: 46 thoughts: 143 Thoughts: **Analyzing the RCU-protected Hlist and Its Unhashing** Okay, so the race condition is clearly related to `n->pprev`. I need to dig into this. I'm seeing it specifically within the context of `hlist_bl_add_head_rcu`. The crucial line here is `n->pprev = &h->first;`. This establishes the "previous" pointer for a node within the bl-list. Now, `hlist_bl_unhashed` is the function to understand next, because it is called in `d_unhashed`. It seems to check if the node is in the hashed list. The core of this seems to be: `return !h->pprev;`. So, if `pprev` is null, the node is unhashed. Makes sense. I need to see where this all fits in, so I'm going to jump to `__d_lookup_rcu_op_compare` within `fs/dcache.c`. I need to figure out how these unhashed entries are being treated within the overall dcache lookup process, and how RCU synchronization is used for ensuring the safety and correctness. This might give me a clearer picture of the race and its impact on the data structures. |
| 10/2 | 2026/02/24 16:54 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:__d_lookup_rcu_op_compare] Results: map[SourceCode:2221: /*
2222: * This is __d_lookup_rcu() when the parent dentry has
2223: * DCACHE_OP_COMPARE, which makes things much nastier.
2224: */
2225: static noinline struct dentry *__d_lookup_rcu_op_compare(
2226: const struct dentry *parent,
2227: const struct qstr *name,
2228: unsigned *seqp)
2229: {
2230: u64 hashlen = name->hash_len;
2231: struct hlist_bl_head *b = d_hash(hashlen);
2232: struct hlist_bl_node *node;
2233: struct dentry *dentry;
2234:
2235: hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
2236: int tlen;
2237: const char *tname;
2238: unsigned seq;
2239:
2240: seqretry:
2241: seq = raw_seqcount_begin(&dentry->d_seq);
2242: if (dentry->d_parent != parent)
2243: continue;
2244: if (d_unhashed(dentry))
2245: continue;
2246: if (dentry->d_name.hash != hashlen_hash(hashlen))
2247: continue;
2248: tlen = dentry->d_name.len;
2249: tname = dentry->d_name.name;
2250: /* we want a consistent (name,len) pair */
2251: if (read_seqcount_retry(&dentry->d_seq, seq)) {
2252: cpu_relax();
2253: goto seqretry;
2254: }
2255: if (parent->d_op->d_compare(dentry, tlen, tname, name) != 0)
2256: continue;
2257: *seqp = seq;
2258: return dentry;
2259: }
2260: return NULL;
2261: }
SourceFile:fs/dcache.c] |
| 11/2 | 2026/02/24 16:54 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 10050 output: 40 thoughts: 218 Thoughts: **Analyzing the Dentry Hashing Mechanism** Okay, so I'm looking at this dentry code. Specifically, this section using `hlist_bl_for_each_entry_rcu` to iterate through entries, and it's skipping over `dentry` if it is unhashed. Let me verify the details on unhashed. I see that `d_unhashed(dentry)` calls `hlist_bl_unhashed(&dentry->d_hash)`. Makes sense, straightforward. Now, I need to see *when* it's re-hashed. Let's look at `__d_rehash`. Ah, here it is. It acquires a lock on the bucket `b` corresponding to the hash of the entry's name, then adds the `dentry`'s `d_hash` to the head of that bucket using `hlist_bl_add_head_rcu`. Finally, it releases the lock. So the hashing/re-hashing process involves a lock, which makes sense for concurrency protection. The `rcu` usage here is standard for this kind of structure and access pattern. Good. |
| 12/2 | 2026/02/24 16:54 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:__d_rehash] Results: map[SourceCode:2529: static void __d_rehash(struct dentry *entry)
2530: {
2531: struct hlist_bl_head *b = d_hash(entry->d_name.hash);
2532:
2533: hlist_bl_lock(b);
2534: hlist_bl_add_head_rcu(&entry->d_hash, b);
2535: hlist_bl_unlock(b);
2536: }
SourceFile:fs/dcache.c] |
| 13/2 | 2026/02/24 16:54 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 10715 output: 42 thoughts: 1363 Thoughts: **Analyzing a Potential Race Condition in Dentry Rehash**
Okay, here's what's running through my head. We've got a classic reader-writer problem concerning dentry hash table management. The core contention seems to be around `dentry->d_hash.pprev`.
My understanding is this:
* **Reader:** The reader uses `hlist_bl_unhashed(&dentry->d_hash)` as the initial check to see if the element is in the hash, effectively checking `!dentry->d_hash.pprev`.
* **Writer:** The writer utilizes `hlist_bl_add_head_rcu(&entry->d_hash, b)`, which ultimately sets `n->pprev = &h->first;` for a new entry or re-adds it.
The immediate conflict I see is: If a reader is iterating over a list using `hlist_bl_for_each_entry_rcu`, it logically implies the `dentry` is *already* in the list, and thus, `dentry->d_hash.pprev` should be non-NULL. But the writer, particularly `__d_rehash`, is intended to *add* a dentry to the hash table. If the dentry *wasn't* in the table before, `pprev` would be NULL. The fact that the reader is iterating suggests it's already in the table.
So, how can a reader find a dentry if it's not yet added by the writer? `hlist_bl_for_each_entry_rcu` starts with `hlist_bl_first_rcu(head)`. Also, `hlist_bl_add_head_rcu` sets `n->pprev` *before* calling `hlist_bl_set_first_rcu(h, n)`.
```c
static inline void hlist_bl_add_head_rcu(struct hlist_bl_node *n,
struct hlist_bl_head *h)
{
struct hlist_bl_node *first;
/* don't need hlist_bl_first_rcu because we're under lock */
first = hlist_bl_first(h);
n->next = first;
if (first)
first->pprev = &n->next;
n->pprev = &h->first;
/* need _rcu because we can have concurrent lock free readers */
hlist_bl_set_first_rcu(h, n);
}
```
If the reader is at `dentry`, it *should* already be in the list. The reader traverses using `pos = rcu_dereference_raw(pos->next)`. So, the only way it would see the updated `dentry` after an add is if the write happens *before* the reader has a chance to access it, and it will be able to see the results of `hlist_bl_set_first_rcu` when it is called.
The key observation is that `hlist_bl_set_first_rcu` uses `rcu_assign_pointer`, which implies a release barrier. This means the setting of `n->pprev = &h->first` should be visible to any reader that subsequently sees `n` through `h->first`.
However, the report indicates the race is occurring on the *same* memory address, meaning the reader and writer are operating on the *identical* `dentry`. Can the same dentry be rehashed while it's already in the hash table? The usual behavior of `__d_rehash` is to add an *unhashed* dentry. `d_unhashed` is used to check if the dentry is in the hash table, and it returns true if it is not.
Now, let's look at `__d_add`:
```c
static void __d_add(struct dentry *dentry, struct inode *inode)
{
struct inode *dir = NULL;
unsigned n;
spin_lock(&dentry->d_lock);
if (unlikely(d_in_lookup(dentry))) {
dir = dentry->d_parent->d_inode;
n = start_dir_add(dir);
__d_lookup_done(dentry);
}
if (inode) {
unsigned add_flags = d_flags_for_inode(inode);
hlist_add_head_rcu(&dentry->d_u.d_alias, &inode->i_dentry
|
| 14/2 | 2026/02/24 16:55 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:d_splice_alias_ops] Results: map[SourceCode:3053: struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
3054: const struct dentry_operations *ops)
3055: {
3056: if (IS_ERR(inode))
3057: return ERR_CAST(inode);
3058:
3059: BUG_ON(!d_unhashed(dentry));
3060:
3061: if (!inode)
3062: goto out;
3063:
3064: security_d_instantiate(dentry, inode);
3065: spin_lock(&inode->i_lock);
3066: if (S_ISDIR(inode->i_mode)) {
3067: struct dentry *new = __d_find_any_alias(inode);
3068: if (unlikely(new)) {
3069: /* The reference to new ensures it remains an alias */
3070: spin_unlock(&inode->i_lock);
3071: write_seqlock(&rename_lock);
3072: if (unlikely(d_ancestor(new, dentry))) {
3073: write_sequnlock(&rename_lock);
3074: dput(new);
3075: new = ERR_PTR(-ELOOP);
3076: pr_warn_ratelimited(
3077: "VFS: Lookup of '%s' in %s %s"
3078: " would have caused loop\n",
3079: dentry->d_name.name,
3080: inode->i_sb->s_type->name,
3081: inode->i_sb->s_id);
3082: } else if (!IS_ROOT(new)) {
3083: struct dentry *old_parent = dget(new->d_parent);
3084: int err = __d_unalias(dentry, new);
3085: write_sequnlock(&rename_lock);
3086: if (err) {
3087: dput(new);
3088: new = ERR_PTR(err);
3089: }
3090: dput(old_parent);
3091: } else {
3092: __d_move(new, dentry, false);
3093: write_sequnlock(&rename_lock);
3094: }
3095: iput(inode);
3096: return new;
3097: }
3098: }
3099: out:
3100: __d_add(dentry, inode, ops);
3101: return NULL;
3102: }
SourceFile:fs/dcache.c] |
| 15/2 | 2026/02/24 16:55 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 13919 output: 39 thoughts: 55 Thoughts: Okay, let me think this through. **Digging into that `d_splice_alias_ops` function call is making me rethink things.** I just realized it calls `__d_add`. I *thought* I saw `__d_add` earlier, and if memory serves, it was a function that took two arguments. But now, looking at this specific instance in `d_splice_alias_ops`, it's being called with *three* arguments. Hmm, that's not what I expected. Let me go back and double-check my understanding of that `__d_add` function signature. I need to clarify what's going on with the arguments before I can continue. I need to verify that I'm interpreting this correctly. |
| 16/2 | 2026/02/24 16:55 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:__d_add] Results: map[SourceCode:2737: static inline void __d_add(struct dentry *dentry, struct inode *inode,
2738: const struct dentry_operations *ops)
2739: {
2740: wait_queue_head_t *d_wait;
2741: struct inode *dir = NULL;
2742: unsigned n;
2743: spin_lock(&dentry->d_lock);
2744: if (unlikely(d_in_lookup(dentry))) {
2745: dir = dentry->d_parent->d_inode;
2746: n = start_dir_add(dir);
2747: d_wait = __d_lookup_unhash(dentry);
2748: }
2749: if (unlikely(ops))
2750: d_set_d_op(dentry, ops);
2751: if (inode) {
2752: unsigned add_flags = d_flags_for_inode(inode);
2753: hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
2754: raw_write_seqcount_begin(&dentry->d_seq);
2755: __d_set_inode_and_type(dentry, inode, add_flags);
2756: raw_write_seqcount_end(&dentry->d_seq);
2757: fsnotify_update_flags(dentry);
2758: }
2759: __d_rehash(dentry);
2760: if (dir)
2761: end_dir_add(dir, n, d_wait);
2762: spin_unlock(&dentry->d_lock);
2763: if (inode)
2764: spin_unlock(&inode->i_lock);
2765: }
SourceFile:fs/dcache.c] |
| 17/2 | 2026/02/24 16:55 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 14704 output: 40 thoughts: 804 Thoughts: **Analyzing a Potential Race Condition in Directory Entry Hashing**
Okay, let's break this down. My initial thought process is triggered by a reported KCSAN race, specifically around directory entry (dentry) hashing and lookups. I see that the `__d_add` function, which adds a dentry to the hash table, calls `__d_rehash(dentry)`. The dentry is, naturally, *the* one being added. Before adding, a `dentry` should *not* yet be in the hash table. The presence of `BUG_ON(!d_unhashed(dentry));` in `d_splice_alias_ops` at line 3059 further validates this assumption. So, `dentry` is definitively unhashed before `__d_add`. This is good; it's what I expect.
Now, the real question: how could a reader in `__d_lookup_rcu_op_compare` ever *see* this dentry that's in the process of being added? The reader is traversing a hash bucket `b`, computed as `b = d_hash(hashlen)`. It's iterating using `hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash)`.
If `__d_rehash(dentry)` is actively running, it's in the process of adding `dentry` to bucket `b`. The code sequence uses `hlist_bl_add_head_rcu`, which first sets `n->next`, then `first->pprev`, `n->pprev`, and finally, `h->first`. The core question now is about the visibility of these operations to the reader.
My immediate conclusion is that if the reader *does* see the `dentry`, it means the `hlist_bl_set_first_rcu(h, n)` operation must have *already* happened, or is happening concurrently. Crucially, if that's the case, then the update to `n->pprev = &h->first` *should* have preceded it, because of the release semantics guaranteed by `rcu_assign_pointer`.
Then I see the `hlist_bl_unhashed` code:
```c
static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
{
return !h->pprev;
}
```
Notably, it *doesn't* use `READ_ONCE`. This is immediately suspicious.
If the reader sees `dentry` via `hlist_bl_first_rcu(head)`, it implies `head->first` has been updated to point to `dentry`. Because of `rcu_assign_pointer` in `hlist_bl_set_first_rcu`, the write to `dentry->d_hash.pprev` *should* be visible to the reader.
However, KCSAN reports a race, specifically around the reader checking `!dentry->d_hash.pprev` (meaning the dentry is unhashed), while the writer is setting `dentry->d_hash.pprev = &h->first` (the dentry is now hashed).
This race is the crux: If these operations *are* racing, the reader might see a NULL `pprev` even if it has found the dentry in the list? But if it has found it, shouldn't `pprev` be non-NULL? The reader, finding the dentry in the list, means it’s already hashed. In this scenario `pprev` should *never* be NULL.
Wait, the logic in place says that if the reader sees a NULL `pprev`, `d_unhashed` returns true, and the reader *skips* to the next dentry. Is this behavior problematic? If the dentry is *in the process* of being added, is it acceptable to skip it during this specific lookup? Logically, the dentry *isn't fully there* yet. Although, critically, it *is* already at the head of the list.
The main concern is: if `hlist_bl_unhashed` is true, the reader *skips* the dentry. This potentially means the reader might fail to find a dentry that was just added. However, considering that `d_alloc_parallel` (which calls `__d_lookup_rcu`) might then either attempt to create the dentry or wait for it, this may not be a catastrophe.
My next step needs to be analyzing `d_alloc_parallel`. I need to fully understand its behavior in these concurrent add/lookup scenarios.
|
| 18/2 | 2026/02/24 16:55 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:d_alloc_parallel] Results: map[SourceCode:2587: struct dentry *d_alloc_parallel(struct dentry *parent,
2588: const struct qstr *name,
2589: wait_queue_head_t *wq)
2590: {
2591: unsigned int hash = name->hash;
2592: struct hlist_bl_head *b = in_lookup_hash(parent, hash);
2593: struct hlist_bl_node *node;
2594: struct dentry *new = __d_alloc(parent->d_sb, name);
2595: struct dentry *dentry;
2596: unsigned seq, r_seq, d_seq;
2597:
2598: if (unlikely(!new))
2599: return ERR_PTR(-ENOMEM);
2600:
2601: new->d_flags |= DCACHE_PAR_LOOKUP;
2602: spin_lock(&parent->d_lock);
2603: new->d_parent = dget_dlock(parent);
2604: hlist_add_head(&new->d_sib, &parent->d_children);
2605: if (parent->d_flags & DCACHE_DISCONNECTED)
2606: new->d_flags |= DCACHE_DISCONNECTED;
2607: spin_unlock(&parent->d_lock);
2608:
2609: retry:
2610: rcu_read_lock();
2611: seq = smp_load_acquire(&parent->d_inode->i_dir_seq);
2612: r_seq = read_seqbegin(&rename_lock);
2613: dentry = __d_lookup_rcu(parent, name, &d_seq);
2614: if (unlikely(dentry)) {
2615: if (!lockref_get_not_dead(&dentry->d_lockref)) {
2616: rcu_read_unlock();
2617: goto retry;
2618: }
2619: if (read_seqcount_retry(&dentry->d_seq, d_seq)) {
2620: rcu_read_unlock();
2621: dput(dentry);
2622: goto retry;
2623: }
2624: rcu_read_unlock();
2625: dput(new);
2626: return dentry;
2627: }
2628: if (unlikely(read_seqretry(&rename_lock, r_seq))) {
2629: rcu_read_unlock();
2630: goto retry;
2631: }
2632:
2633: if (unlikely(seq & 1)) {
2634: rcu_read_unlock();
2635: goto retry;
2636: }
2637:
2638: hlist_bl_lock(b);
2639: if (unlikely(READ_ONCE(parent->d_inode->i_dir_seq) != seq)) {
2640: hlist_bl_unlock(b);
2641: rcu_read_unlock();
2642: goto retry;
2643: }
2644: /*
2645: * No changes for the parent since the beginning of d_lookup().
2646: * Since all removals from the chain happen with hlist_bl_lock(),
2647: * any potential in-lookup matches are going to stay here until
2648: * we unlock the chain. All fields are stable in everything
2649: * we encounter.
2650: */
2651: hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
2652: if (dentry->d_name.hash != hash)
2653: continue;
2654: if (dentry->d_parent != parent)
2655: continue;
2656: if (!d_same_name(dentry, parent, name))
2657: continue;
2658: hlist_bl_unlock(b);
2659: /* now we can try to grab a reference */
2660: if (!lockref_get_not_dead(&dentry->d_lockref)) {
2661: rcu_read_unlock();
2662: goto retry;
2663: }
2664:
2665: rcu_read_unlock();
2666: /*
2667: * somebody is likely to be still doing lookup for it;
2668: * wait for them to finish
2669: */
2670: spin_lock(&dentry->d_lock);
2671: d_wait_lookup(dentry);
2672: /*
2673: * it's not in-lookup anymore; in principle we should repeat
2674: * everything from dcache lookup, but it's likely to be what
2675: * d_lookup() would've found anyway. If it is, just return it;
2676: * otherwise we really have to repeat the whole thing.
2677: */
2678: if (unlikely(dentry->d_name.hash != hash))
2679: goto mismatch;
2680: if (unlikely(dentry->d_parent != parent))
2681: goto mismatch;
2682: if (unlikely(d_unhashed(dentry)))
2683: goto mismatch;
2684: if (unlikely(!d_same_name(dentry, parent, name)))
2685: goto mismatch;
2686: /* OK, it *is* a hashed match; return it */
2687: spin_unlock(&dentry->d_lock);
2688: dput(new);
2689: return dentry;
2690: }
2691: rcu_read_unlock();
2692: new->d_wait = wq;
2693: hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b);
2694: hlist_bl_unlock(b);
2695: return new;
2696: mismatch:
2697: spin_unlock(&dentry->d_lock);
2698: dput(dentry);
2699: goto retry;
2700: }
SourceFile:fs/dcache.c] |
| 19/2 | 2026/02/24 16:55 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 18356 output: 42 thoughts: 553 Thoughts: **Race Condition Analysis in dentry Lookup and Insertion** Okay, so I'm digging into this dentry lookup and insertion process, and there's a nasty race condition lurking here. Let's break it down. If a lookup via `__d_lookup_rcu` fails because of a race, `d_alloc_parallel` jumps in to check `in_lookup_hash`. If the dentry *still* isn't there, it adds itself to `in_lookup_hash`. The goal is to avoid duplicated dentries. But wait, if `__d_rehash` is happening, that means another thread has *finished* a lookup and is inserting the dentry into the main hash table. If a reader misses it in `__d_lookup_rcu`, it will then check `in_lookup_hash`, as described above. The thing is, `__d_rehash` runs *after* `__d_lookup_done`, and `__d_lookup_done` removes the dentry from `in_lookup_hash`. Here's the crux of the problem: there's a window where the dentry is neither in `in_lookup_hash` nor (visibly) in the main hash table. If a reader hits this window, it'll incorrectly conclude the dentry doesn't exist and might try to create a duplicate or add itself to `in_lookup_hash`. Now, looking at `__d_add`, it does something interesting. If the dentry is *in* a lookup (via `d_in_lookup`), it calls `__d_lookup_unhash` and *then* `__d_rehash`. `__d_lookup_unhash` removes the dentry from `in_lookup_hash`, and `__d_rehash` adds it to the main hash table, with `end_dir_add` waking up waiters. So, if a reader in `d_alloc_parallel` misses the dentry in `__d_lookup_rcu` and it's already removed from `in_lookup_hash`, it's going to try to add *its* new dentry to `in_lookup_hash`. This is a classic double-allocation risk. But there's something else: `d_alloc_parallel` takes `hlist_bl_lock(b)` on the `in_lookup_hash` bucket before adding. And `__d_add` also calls `__d_lookup_unhash` which should also take that lock!
Alright, time to dig deeper into the code, starting with `__d_lookup_unhash` to confirm my suspicion... I'm pretty sure it's the key to understanding the full locking order. |
| 20/2 | 2026/02/24 16:55 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:__d_lookup_unhash] Results: map[SourceCode:2703: /*
2704: * - Unhash the dentry
2705: * - Retrieve and clear the waitqueue head in dentry
2706: * - Return the waitqueue head
2707: */
2708: static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
2709: {
2710: wait_queue_head_t *d_wait;
2711: struct hlist_bl_head *b;
2712:
2713: lockdep_assert_held(&dentry->d_lock);
2714:
2715: b = in_lookup_hash(dentry->d_parent, dentry->d_name.hash);
2716: hlist_bl_lock(b);
2717: dentry->d_flags &= ~DCACHE_PAR_LOOKUP;
2718: __hlist_bl_del(&dentry->d_u.d_in_lookup_hash);
2719: d_wait = dentry->d_wait;
2720: dentry->d_wait = NULL;
2721: hlist_bl_unlock(b);
2722: INIT_HLIST_NODE(&dentry->d_u.d_alias);
2723: INIT_LIST_HEAD(&dentry->d_lru);
2724: return d_wait;
2725: }
SourceFile:fs/dcache.c] |
| 21/2 | 2026/02/24 16:55 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 19961 output: 41 thoughts: 1387 Thoughts: **Analyzing the dentry hash table race condition**
Okay, so I'm diving into this dentry hash table issue. I see that `hlist_bl_lock(b)` is definitely involved. Let's break down the `d_alloc_parallel` scenario. First, `__d_lookup_rcu` kicks off, which uses RCU and no locks, then we grab the lock on `in_lookup_hash`, using `hlist_bl_lock(b)`, and iterate the `in_lookup_hash`.
Now, when `__d_add` is running concurrently, things get interesting. It grabs `spin_lock(&dentry->d_lock)`, then calls `__d_lookup_unhash` to take that same lock, remove it from `in_lookup_hash`, and then releases the lock. Following that, it calls `__d_rehash`, which takes the `hlist_bl_lock`, adds to the main table, and releases it.
Here's the problem: if `__d_lookup_rcu` misses the dentry because of a race – it sees `pprev == NULL` even if the dentry is *there* – it then takes that same lock, `hlist_bl_lock(b)`. If `__d_add` has finished, the reader *won't* find the entry. It could create a duplicate! But wait, `d_alloc_parallel` is called from `lookup_open` or `path_openat`. It will eventually be hashed.
The race specifically happens on `d_unhashed(dentry)` within `__d_lookup_rcu_op_compare`. It's checking if the dentry is still valid or hasn't been removed, but if `d_unhashed` returns true, it's skipped.
Looking at how `pprev` is handled, `hlist_bl_add_head_rcu` does: `n->pprev = &h->first; hlist_bl_set_first_rcu(h, n);`. `hlist_bl_set_first_rcu` then uses `rcu_assign_pointer`. A reader should see the updated `pprev` due to the memory barrier from `rcu_assign_pointer`.
The `hlist_bl_for_each_entry_rcu` uses `hlist_bl_first_rcu(head)`, which in turn uses `rcu_dereference_check` which has a consume barrier. This *should* be fine.
So, why is KCSAN reporting a race? Ah, it's because `d_unhashed` and `hlist_bl_unhashed` *don't* use `READ_ONCE` or any synchronization on `pprev`. It's just a plain read: `static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h) { return !h->pprev; }`. And when we add using `hlist_bl_add_head_rcu` the value is just assigned directly `n->pprev = &h->first;`
If the compiler reorders the read of `pprev` in `hlist_bl_unhashed`, or if it's not atomic, we have an issue. `pprev` is a pointer, so it *should* be atomic, so the main concern is seeing it as NULL when it shouldn't be.
If the reader finds a dentry in the list, it got there via `next` or `head`. If via `head`, `rcu_assign_pointer` in `hlist_bl_set_first_rcu` should ensure visibility of `pprev`. If from `node->next`, a similar method.
Wait, `__d_rehash` uses `hlist_bl_add_head_rcu`, so the dentry is *added* to the head. The *only* way to see this dentry is via `hlist_bl_first_rcu(b)`. That means if the reader *does* see the entry, it means it read `b->first`, and therefore, sees a non-NULL `pprev`. The read should happen *after* `b->first` was set with `rcu_assign_pointer`.
*However*, `hlist_bl_unhashed` is just a plain read. Can the compiler reorder it? It's inside a loop, so that's the thing that stands out.
Wait...what about removing? `d_unhashed` is used to see if the dentry is still hashed. RCU lookups might find a dentry *being* removed. `h
|
| 22/2 | 2026/02/24 16:55 | tool | codesearch-definition-source |
Args:map[ContextFile:include/linux/dcache.h IncludeLines:true Name:d_unhashed] Results: map[SourceCode: 358: /**
359: * d_unhashed - is dentry hashed
360: * @dentry: entry to check
361: *
362: * Returns true if the dentry passed is not currently hashed.
363: */
364: static inline int d_unhashed(const struct dentry *dentry)
365: {
366: return hlist_bl_unhashed(&dentry->d_hash);
367: }
SourceFile:include/linux/dcache.h] |
| 23/2 | 2026/02/24 16:55 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 22583 output: 39 thoughts: 181 Thoughts: **Analyzing the `hlist_bl_unhashed` Function** Okay, so I'm looking at this `hlist_bl_unhashed` function, and it's pretty straightforward. It checks if a `hlist_bl_node` is unhashed by simply checking if its `pprev` pointer is NULL. That makes sense, right? A NULL `pprev` indicates it's not currently linked into a hash list. But then I started thinking about the state transitions. When a dentry *isn't* hashed, `pprev` is indeed NULL. Fine. And when it *is* being hashed, `pprev` gets set to something non-NULL, as expected. But, when the dentry is being *unhashed* using something like `hlist_bl_del_rcu`, `pprev` gets set to `LIST_POISON2`. Now, here's where it gets interesting, and frankly, seems a bit off. If `pprev` is `LIST_POISON2`, `hlist_bl_unhashed` would return *false*. So, the code would be indicating that the dentry is *still* hashed, even though it's in the process of being removed! That doesn't seem right at all. This appears to be a potential bug or at least an unexpected state. This is highly unusual and warrants more investigation. Let's dig into `__d_drop` and see what it is doing and, potentially, if it is a consequence of this. |
| 24/2 | 2026/02/24 16:55 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:__d_drop] Results: map[SourceCode: 572: void __d_drop(struct dentry *dentry)
573: {
574: if (!d_unhashed(dentry)) {
575: ___d_drop(dentry);
576: dentry->d_hash.pprev = NULL;
577: write_seqcount_invalidate(&dentry->d_seq);
578: }
579: }
SourceFile:fs/dcache.c] |
| 25/2 | 2026/02/24 16:55 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 23277 output: 39 thoughts: 7 Thoughts: Okay, I'm ready. Here's a summary, in the first person, assuming I'm an expert in the field, and interpreting the "Thoughts:" provided as my own: **Analyzing Potential Variables** Hmm, this situation is interesting. Let's see... It appears we're focusing on an important variable here, a drop that we are labeling with a variable that starts with "___d_drop". I need to consider what it's significance is. Is it a dependent variable, an independent one, or perhaps something else entirely? My initial thought is to identify its characteristics. What does this variable represent? What are its units? What influences it? What would be the optimal way to measure this? How does it behave? What other factors could be influenced by a "drop"? This also makes me consider the different ways this "drop" could influence this. Is it a sudden, acute event, or a gradual decline? Is the emphasis on the thing dropped, the act of dropping, or the impact of the drop? The context is critical here. I need to delve deeper into its meaning to formulate a hypothesis or refine an existing model. We should be considering the implications of this "___d_drop" on other parameters as well. The goal is to determine the best approach to incorporate this variable into our work. |
| 26/2 | 2026/02/24 16:55 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:___d_drop] Results: map[SourceCode: 554: static void ___d_drop(struct dentry *dentry)
555: {
556: struct hlist_bl_head *b;
557: /*
558: * Hashed dentries are normally on the dentry hashtable,
559: * with the exception of those newly allocated by
560: * d_obtain_root, which are always IS_ROOT:
561: */
562: if (unlikely(IS_ROOT(dentry)))
563: b = &dentry->d_sb->s_roots;
564: else
565: b = d_hash(dentry->d_name.hash);
566:
567: hlist_bl_lock(b);
568: __hlist_bl_del(&dentry->d_hash);
569: hlist_bl_unlock(b);
570: }
SourceFile:fs/dcache.c] |
| 27/2 | 2026/02/24 16:55 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 23891 output: 42 thoughts: 3283 Thoughts: **Analyzing a Data Race in the Linux Kernel's Dentry Hash Table** Okay, so I'm diving into this KCSAN report. It flags a potential data race involving `dentry->d_hash.pprev` during dentry re-hashing and lookups. My initial thought is, "Wait, why is this happening?" My expertise tells me that `__d_drop` sets `dentry->d_hash.pprev = NULL`, so if `d_unhashed` is called after this operation, it's going to evaluate to `true`. And `__d_drop` uses the simple `__hlist_bl_del` instead of the RCU-safe `hlist_bl_del_rcu` that sets `n->pprev = LIST_POISON2;`. Looking closely, `__d_rehash` adds an entry to the hash table via `hlist_bl_add_head_rcu`, which sets `n->pprev = &h->first;`. The concurrent reader, in `__d_lookup_rcu_op_compare`, checks `d_unhashed` and thus `!n->pprev`. If the reader is looking at the dentry during rehash, and *found* it in the list, `hlist_bl_set_first_rcu` would have been called *after* setting `n->pprev`, so the reader should *not* see a NULL `pprev`! KCSAN is flagging a race where a write to `pprev` and a read from it are happening concurrently *without* explicit synchronization, like `READ_ONCE`/`WRITE_ONCE` or locks. Even if the logic *should* be safe because of ordering, KCSAN still flags it. So, is this harmful? If the reader sees `pprev` as NULL, it will skip this dentry, which may lead to lookup failures or duplicate dentries. But, during rehash, the dentry *should* be considered "hashed," even if it hasn't been fully added yet. It seems like a logic inconsistency. The `rcu_dereference_check` should prevent reordering of dependent loads. `rcu_dereference_raw` is used in the `hlist_bl_for_each_entry_rcu` macro to read `h->first`, and this should provide a memory barrier. Digging deeper, the write to `n->pprev = &h->first` is *not* a `WRITE_ONCE`, and the read `!h->pprev` is *not* a `READ_ONCE`. This is why KCSAN is correct in identifying a data race.
If the compiler were to reorder `n->pprev = &h->first` after `hlist_bl_set_first_rcu`, it could be harmful. But `rcu_assign_pointer` in `hlist_bl_set_first_rcu` has a release barrier that ensures all *prior* writes (including the `n->pprev` write) are visible before the write to `h->first`. So, the only way it could be harmful is if the reader sees a partial write or if the compiler reorders the write. The `rcu_dereference_check` should prevent reordering. I believe it prevents reordering of *dependent* loads. The load of `n->pprev` is dependent on the load of `n` from `h->first`, so it *should* be ordered correctly. KCSAN reports any data race. In this case, the RCU ordering should provide safety, but KCSAN flags this unless `READ_ONCE`/`WRITE_ONCE` are used. Looking into `d_unhashed`, I see it's used extensively. If it incorrectly returns `true` (unhashed) when a dentry is, in fact, hashed, it can cause problems. In this case, it might skip a dentry that was just added during rehash. If a lookup misses the dentry, the `d_alloc_parallel` function takes the `hlist_bl_lock` on the `in_lookup_hash`. If the dentry was already moved from `in_lookup_hash` to the main hash table, the function won't find it in `in_lookup_hash` and may create a duplicate. However, in that function the dentry will be *added* to the `in_lookup_hash`, then removed, and the return path from the function requires that another process's add is completed before the current process can return, and so the existing dentry would be found by `d_splice_alias`. Based on my analysis, it's not a * |
| 28/2 | 2026/02/24 16:56 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:__d_lookup_rcu] Results: map[SourceCode:2263: /**
2264: * __d_lookup_rcu - search for a dentry (racy, store-free)
2265: * @parent: parent dentry
2266: * @name: qstr of name we wish to find
2267: * @seqp: returns d_seq value at the point where the dentry was found
2268: * Returns: dentry, or NULL
2269: *
2270: * __d_lookup_rcu is the dcache lookup function for rcu-walk name
2271: * resolution (store-free path walking) design described in
2272: * Documentation/filesystems/path-lookup.txt.
2273: *
2274: * This is not to be used outside core vfs.
2275: *
2276: * __d_lookup_rcu must only be used in rcu-walk mode, ie. with vfsmount lock
2277: * held, and rcu_read_lock held. The returned dentry must not be stored into
2278: * without taking d_lock and checking d_seq sequence count against @seq
2279: * returned here.
2280: *
2281: * Alternatively, __d_lookup_rcu may be called again to look up the child of
2282: * the returned dentry, so long as its parent's seqlock is checked after the
2283: * child is looked up. Thus, an interlocking stepping of sequence lock checks
2284: * is formed, giving integrity down the path walk.
2285: *
2286: * NOTE! The caller *has* to check the resulting dentry against the sequence
2287: * number we've returned before using any of the resulting dentry state!
2288: */
2289: struct dentry *__d_lookup_rcu(const struct dentry *parent,
2290: const struct qstr *name,
2291: unsigned *seqp)
2292: {
2293: u64 hashlen = name->hash_len;
2294: const unsigned char *str = name->name;
2295: struct hlist_bl_head *b = d_hash(hashlen);
2296: struct hlist_bl_node *node;
2297: struct dentry *dentry;
2298:
2299: /*
2300: * Note: There is significant duplication with __d_lookup_rcu which is
2301: * required to prevent single threaded performance regressions
2302: * especially on architectures where smp_rmb (in seqcounts) are costly.
2303: * Keep the two functions in sync.
2304: */
2305:
2306: if (unlikely(parent->d_flags & DCACHE_OP_COMPARE))
2307: return __d_lookup_rcu_op_compare(parent, name, seqp);
2308:
2309: /*
2310: * The hash list is protected using RCU.
2311: *
2312: * Carefully use d_seq when comparing a candidate dentry, to avoid
2313: * races with d_move().
2314: *
2315: * It is possible that concurrent renames can mess up our list
2316: * walk here and result in missing our dentry, resulting in the
2317: * false-negative result. d_lookup() protects against concurrent
2318: * renames using rename_lock seqlock.
2319: *
2320: * See Documentation/filesystems/path-lookup.txt for more details.
2321: */
2322: hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
2323: unsigned seq;
2324:
2325: /*
2326: * The dentry sequence count protects us from concurrent
2327: * renames, and thus protects parent and name fields.
2328: *
2329: * The caller must perform a seqcount check in order
2330: * to do anything useful with the returned dentry.
2331: *
2332: * NOTE! We do a "raw" seqcount_begin here. That means that
2333: * we don't wait for the sequence count to stabilize if it
2334: * is in the middle of a sequence change. If we do the slow
2335: * dentry compare, we will do seqretries until it is stable,
2336: * and if we end up with a successful lookup, we actually
2337: * want to exit RCU lookup anyway.
2338: *
2339: * Note that raw_seqcount_begin still *does* smp_rmb(), so
2340: * we are still guaranteed NUL-termination of ->d_name.name.
2341: */
2342: seq = raw_seqcount_begin(&dentry->d_seq);
2343: if (dentry->d_parent != parent)
2344: continue;
2345: if (dentry->d_name.hash_len != hashlen)
2346: continue;
2347: if (unlikely(dentry_cmp(dentry, str, hashlen_len(hashlen)) != 0))
2348: continue;
2349: /*
2350: * Check for the dentry being unhashed.
2351: *
2352: * As tempting as it is, we *can't* skip it because of a race window
2353: * between us finding the dentry before it gets unhashed and loading
2354: * the sequence counter after unhashing is finished.
2355: *
2356: * We can at least predict on it.
2357: */
2358: if (unlikely(d_unhashed(dentry)))
2359: continue;
2360: *seqp = seq;
2361: return dentry;
2362: }
2363: return NULL;
2364: }
SourceFile:fs/dcache.c] |
| 29/2 | 2026/02/24 16:56 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 29939 output: 22 thoughts: 4909 Thoughts: **Analysis of Potential Data Race in Kernel Dentry Lookup**

Okay, so I've been looking at this KCSAN report. The gist is there's a potential data race flagged around `d_unhashed()` in `__d_lookup_rcu_op_compare`, specifically lines 2352-2362 of the code, which call `d_unhashed(dentry)`. My initial thought is it's probably one of those common RCU-related races. I'm seeing the classic pattern: a writer is adding a dentry to a hash list, and a reader is looking up dentries in that same list. The potential race is around the `pprev` pointer of the `hlist_bl_node`.

My immediate reaction is to look at the surrounding code, and I see we have `d_unhashed()`. This function checks if `h->pprev` is NULL. That seems to be the critical area. The potential issue here is that the writer, during the insertion of a dentry, might be in the middle of updating `pprev`, and the reader could see a stale value. So the reader might skip a potentially valid dentry because it sees it as "unhashed".

My logic goes like this: The writer is using RCU for protection. The `rcu_assign_pointer` is key here. That means the reader, if it successfully dereferences a pointer to the dentry, should always see a consistent state. The thing that's concerning here is that there's a comment suggesting that they *can't* skip the check, but then the code does skip if `d_unhashed` is true. But the comment is right: we shouldn't skip the *check*, yet we must *continue* in the loop if `d_unhashed` is true, because that means the dentry is no longer present in the hash table.

My primary concern is that a compiler might reorder memory accesses. But the compiler *shouldn't* be able to reorder the load of the `dentry` pointer and the load of `dentry->d_hash.pprev`, because that is a classic data dependency. So it *shouldn't* see a NULL `pprev` *unless* the writer hasn't yet finished writing the list pointer.
So here's what I am thinking. *If* the reader sees `dentry`, then it's been made visible via RCU, *and* the `pprev` pointer *should* have been written with the appropriate barrier before the pointer to the dentry was made visible. Therefore the reader should never skip the dentry. But here's the rub: if the compiler or CPU reorders the load of `pprev` to before the load of the dentry that's visible via RCU, then we might. So KCSAN is finding a true data race on the `pprev` field of the dentry.

I'm considering the possibility of concurrent unhashing by another thread, such as `__d_drop` running on the same dentry concurrently with `__d_rehash`. The way the kernel is structured, the fact that `__d_add` and `__d_drop` would have to acquire the dentry lock means the two threads should *not* be able to run concurrently. So the only transition is from NULL to non-NULL in the `hlist_bl_node`.

My hypothesis is that KCSAN is flagging this because, while logically safe, these pointer accesses aren't explicitly atomic in the code. I recall seeing many such reports. The kernel often considers these data races benign if the logic handles the potential transient state. Here, we can handle missing a dentry because the slow path will take over.

So, now I'm thinking about other reasons the `pprev` might be NULL *when* it shouldn't. One option is that `dentry` was being re-used. But the code does `INIT_HLIST_BL_NODE(&dentry->d_hash)` at allocation, which sets `pprev` to NULL. This means it starts NULL and only becomes non-NULL in `__d_rehash`. The code is robust. And the report specifically mentions `__d_rehash`, so it's a known writer and reader conflict. I'm thinking about the possibility of `LIST_POISON2` as the `pprev` value, but the seq count ensures data integrity.

I just went back and re-read the code in the kernel. `hlist_bl_add_head_rcu` doesn't use `WRITE_ONCE` for `pprev`. So it's a plain, unmarked write, which is exactly what KCSAN is reporting. |
| 30/2 | 2026/02/24 16:57 | tool | grepper |
Error:no matches Args: map[Expression:d_unhashed.*KCSAN] Results: map[Output:] |
| 31/2 | 2026/02/24 16:57 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 35909 output: 21 thoughts: 12 Thoughts: Okay, here's my attempt at summarizing that thought process, keeping in mind I'm supposed to sound like an expert: **Investigating a Potential Hash Table Vulnerability**

Alright, let's see... the problem's hitting me in a way that suggests a potential vulnerability in how we're handling hash table collisions, or perhaps even in the hashing implementation itself. I'm thinking, where could I start digging? Given the nature of this, the code relating to the core data structure used for the hash table is my primary target. A quick search for a likely function name is in order. Given that it's an `hlist_bl` (a bit-locked hash list, where the low bit of the head pointer doubles as the bucket lock) and "unhashed", I'm looking for anything named `hlist_bl_unhashed`. I will examine the codebase in the likely locations for such a function name. Time to see what I can find. |
| 32/2 | 2026/02/24 16:57 | tool | grepper |
Args:map[Expression:hlist_bl_unhashed] Results: map[Output:fs/mbcache.c=152=static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
--
fs/mbcache.c-161- hlist_bl_lock(head);
fs/mbcache.c:162: if (entry && !hlist_bl_unhashed(&entry->e_hash_list))
fs/mbcache.c-163- node = entry->e_hash_list.next;
--
fs/netfs/fscache_volume.c=390=static void fscache_free_volume(struct fscache_volume *volume)
--
fs/netfs/fscache_volume.c-407-
fs/netfs/fscache_volume.c:408: if (!hlist_bl_unhashed(&volume->hash_link))
fs/netfs/fscache_volume.c-409- fscache_unhash_volume(volume);
--
include/linux/dcache.h=364=static inline int d_unhashed(const struct dentry *dentry)
include/linux/dcache.h-365-{
include/linux/dcache.h:366: return hlist_bl_unhashed(&dentry->d_hash);
include/linux/dcache.h-367-}
--
include/linux/list_bl.h=44=static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h)
--
include/linux/list_bl.h-51-
include/linux/list_bl.h:52:static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
include/linux/list_bl.h-53-{
--
include/linux/list_bl.h=138=static inline void hlist_bl_del_init(struct hlist_bl_node *n)
include/linux/list_bl.h-139-{
include/linux/list_bl.h:140: if (!hlist_bl_unhashed(n)) {
include/linux/list_bl.h-141- __hlist_bl_del(n);
--
include/linux/rculist_bl.h=21=static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
--
include/linux/rculist_bl.h-30- *
include/linux/rculist_bl.h:31: * Note: hlist_bl_unhashed() on entry does not return true after this,
include/linux/rculist_bl.h-32- * the entry is in an undefined state. It is useful for RCU based
] |
| 33/2 | 2026/02/24 16:57 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 36818 output: 32 thoughts: 1267 Thoughts: **Benign RCU Race in Dentry Hashing**
Okay, let's break this down. That comment in `include/linux/rculist_bl.h` caught my eye. The wording, specifically "does not return true," implies a false return from `hlist_bl_unhashed`. Since `hlist_bl_unhashed` returns `!h->pprev`, a false return means `h->pprev` is non-NULL. Right, `hlist_bl_del_rcu` sets `pprev = LIST_POISON2`, which makes sense. So `d_unhashed` returning `false` for a deleted dentry isn't surprising.
However, `dentry` deletion uses `__d_drop`, which *sets* `pprev = NULL`. So `d_unhashed` *should* return `true` for dropped dentries. The existence of `hlist_bl_del_rcu` seems odd if dentries don't use it directly, but maybe other users of `hlist_bl` do.
For dentries, though, the behavior of `d_unhashed` is reliable in this context, modulo the race, of course.
I'm pretty comfortable that this is a benign RCU initialization race. It's a classic scenario where a reader might see a concurrent write, but RCU barriers ensure that the reader, if it sees the node at all, sees the initialized state. Even if the reader misses the initial write, the impact is minimal – a false negative in an RCU lookup.
Now, let's think about the potential "harmful" scenarios. Could `pprev` be seen as NULL when it shouldn't be? Well, that's the crux of the race. If `d_unhashed` is true, we skip the dentry. Can it be non-NULL when it *should* be NULL? Only if the write to `pprev` happens before the dentry is in the list, but we can only get the dentry *from* the list, meaning we have to have seen the write putting it in the list *after* the write to `pprev`.
So, the race is harmless.
Now, the "bad" side: what could go wrong? A crash? No, just a pointer comparison with NULL. Memory corruption? Nope, it's a read. Infinite loop? Not likely. So it's benign.
Hmm, let's make sure. Checking `hlist_bl_add_head_rcu` shows its use in `fs/dcache.c`, specifically in `__d_rehash`.
Actually, a bigger question: is `dentry->d_hash.pprev` used for anything else? The answer is in `__hlist_bl_del`. It's used to find the pointer to the node.
```c
static inline void __hlist_bl_del(struct hlist_bl_node *n)
{
struct hlist_bl_node *next = n->next;
struct hlist_bl_node **pprev = n->pprev;
...
WRITE_ONCE(*pprev, ...);
```
If `pprev` *is* NULL in `__hlist_bl_del`, we crash. But `__hlist_bl_del` is *always* under a lock (the hash bucket lock). The writer in our race is *also* under that same lock. The reader doesn't call `__hlist_bl_del`. Therefore, no crash.
Okay, I'm confident in concluding it's benign.
The `hlist_bl_unhashed` function:
```c
static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
{
return !h->pprev;
}
```
This is called by `d_unhashed`. The race is between `__d_rehash` (writing `pprev`) and `__d_lookup_rcu_op_compare` (reading `pprev`). The reader is in an RCU walk. The writer is adding a dentry to the hash. The dentry was previously unhashed. The writer sets `pprev`, then uses `rcu_assign_pointer` to make the dentry visible in the hash table. The reader finds the dentry (which means it saw the `rcu_assign_pointer` or a later write). Thanks to RCU memory ordering, the reader *should* see the non-NULL `pprev`. KCSAN flags it because the accesses aren't atomic. Even if the reader *did* see NULL, it would just skip the dentry. Safe, in other words: at worst a benign false negative.
|
| 34/2 | 2026/02/24 16:57 | tool | codesearch-struct-layout |
Args:map[ContextFile:fs/dcache.c Name:dentry] Results: map[Fields:[map[Name:d_flags OffsetBits:0 SizeBits:32] map[Name:d_seq OffsetBits:32 SizeBits:32] map[Name:d_hash OffsetBits:64 SizeBits:128] map[Name:d_parent OffsetBits:192 SizeBits:64] map[Name: OffsetBits:256 SizeBits:128] map[Name:d_inode OffsetBits:384 SizeBits:64] map[Name:d_shortname OffsetBits:448 SizeBits:320] map[Name:d_op OffsetBits:768 SizeBits:64] map[Name:d_sb OffsetBits:832 SizeBits:64] map[Name:d_time OffsetBits:896 SizeBits:64] map[Name:d_fsdata OffsetBits:960 SizeBits:64] map[Name:d_lockref OffsetBits:1024 SizeBits:64] map[Name: OffsetBits:1088 SizeBits:128] map[Name:d_sib OffsetBits:1216 SizeBits:128] map[Name:d_children OffsetBits:1344 SizeBits:64] map[Name:d_u OffsetBits:1408 SizeBits:128]]] |
| 35/2 | 2026/02/24 16:57 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 39496 output: 19 thoughts: 538 Thoughts: **Analyzing a Potential Race Condition**

Okay, let's break this down. The report points to an address, `0xffff8881006c5490`, and I'm pretty sure it's related to the `dentry`'s `d_hash.pprev` pointer. Based on the offsets I know (`d_hash` at offset 64 bits, i.e. 8 bytes, with a size of 128 bits, i.e. 16 bytes, and the `hlist_bl_node` layout putting `pprev` at offset 8), if that address is indeed `d_hash.pprev`, then the `dentry` structure itself must start at `0xffff8881006c5480`. The report also indicates writes and reads to `0xffff8881006c5490` of 8 bytes each, which strongly supports my deduction that this is, in fact, `pprev`. So far, so good; I'm comfortable with that.

Now, let's consider the code from `include/linux/rculist_bl.h`, specifically `hlist_bl_add_head_rcu`. The crucial part is this line: `first->pprev = &n->next;`. This is also a write, and that's what makes me think. Could this potentially race with a concurrent reader? A reader might be accessing `first->pprev` at the same time this write is happening. The thing is, the code path that matters here is when `d_unhashed` is called on `dentry`, which is the variable `n`. If a reader was on `first`, that reader would check `d_unhashed(first)`. But the change we're observing is the change of `first->pprev` from one address to another: it was already non-NULL (it held `&h->first`), and it changes to `&n->next`. In both cases it's non-NULL, so `d_unhashed` evaluates the same way and it doesn't matter.

The only race that truly matters is the write to `n->pprev`, because it changes from NULL to non-NULL. However, the race is benign. No significant data corruption or program misbehavior is expected. I think I've exhausted all the reasonable possibilities. I'm satisfied. |
| 36/2 | 2026/02/24 16:57 | tool | set-results |
Args:map[Benign:true Confident:true] Results: map[Benign:true Confident:true] |
| 37/2 | 2026/02/24 16:57 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 40603 output: 655 thoughts: 147 Thoughts: Here's my analysis of the situation: **Benign Race Condition in Dentry Hashing**

Okay, so I've finished digging into this, and it looks like the racing field is `dentry->d_hash.pprev`. This is accessed through `d_unhashed()` during read operations, and it's initialized by `__d_rehash()` using `hlist_bl_add_head_rcu()` during write operations. Classic RCU initialization pattern, really: the node gets set up before it's made visible to the readers.

The beauty of RCU, as I see it, is that the memory ordering guarantees mean any reader that actually *sees* the node will also see it in a consistent, initialized state. Even if, somehow, a reader momentarily catches the old, NULL value before it's fully populated (and that's a *big* if with RCU), the worst-case scenario is a safe skip of that `dentry` during a lookup. The slow path handles that perfectly well.

So, after considering the RCU guarantees and the potential scenarios, I'm confident in saying this race condition is completely benign. It's a non-issue. |