syzbot
| ID | Workflow | Result | Correct | Bug | Created | Started | Finished | Revision | Error |
|---|---|---|---|---|---|---|---|---|---|
| 6bbe9164-dfbf-48af-bb2d-6d186245e810 | assessment-kcsan | Benign: ✅ Confident: ✅ | ❓ | KCSAN: data-race in smc_connect_work / smc_poll | 2026/01/14 23:02 | 2026/01/14 23:02 | 2026/01/14 23:23 | a9d6a79219801d2130df3b1a792c57f0e5428e9f | |
The data race occurs between a write to the `sk_err` field of an SMC socket in `smc_connect_work()` and a read of the same field in `smc_poll()`.
### Analysis of the Accesses
1. **Write Access (`smc_connect_work`):**
In `net/smc/af_smc.c`, the `smc_connect_work()` function is responsible for handling the connection process in the background. When the underlying TCP socket (`clcsock`) encounters an error, the error code is copied to the SMC socket:
```c
1605: smc->sk.sk_err = smc->clcsock->sk->sk_err;
```
This write is performed without holding the SMC socket lock (`lock_sock(&smc->sk)`), although it does hold the lock of the underlying TCP socket.
2. **Read Access (`smc_poll`):**
The `smc_poll()` function checks for socket errors to report them to user space via the `EPOLLERR` flag:
```c
2889: if (sk->sk_err)
2890: mask |= EPOLLERR;
```
This read is performed without any locks, which is standard practice for `poll` implementations in the Linux kernel: keeping the check lockless keeps the `poll` system call cheap and non-blocking.
### Evaluation
* **Atomicity:** The `sk_err` field is a 4-byte `int`. On x86 (and most other architectures Linux supports), an aligned 4-byte load or store executes as a single atomic access, so as long as the compiler emits the plain `int` access as one instruction (which it does in practice), there is no risk of "tearing" (reading a partially written value). The reported value change (`0x0` to `0x6f`, i.e. `ECONNREFUSED` = 111) shows the reader observed a complete value.
* **Consequences:** The only consequence of this race is a short window in which `poll()` might miss the freshly written error (reading `sk_err` as 0 instead of `ECONNREFUSED`) and omit `EPOLLERR` from the returned mask. However, `smc_connect_work()` calls `sk_state_change()` (which wakes up any waiters) shortly after setting the error. If `poll()` misses the error in one iteration, the subsequent wakeup ensures it is caught in the next iteration.
* **Comparison with other protocols:** Other networking protocols such as TCP use `READ_ONCE()` and `WRITE_ONCE()` when accessing `sk_err` outside of locks; the annotations document the intentionally lockless access, prevent the compiler from tearing or re-loading the value, and satisfy KCSAN. SMC's implementation is missing these annotations, which is why KCSAN flags it. The omission does not, however, lead to memory corruption or system instability.
### Conclusion
The data race is **benign**. It involves a simple status flag used for error reporting. While it technically violates the memory model by not using atomic primitives for shared access, it does not lead to any harmful side effects like crashes or memory corruption. It is a common pattern in the network stack that has been identified for cleanup to satisfy sanitizers.
Benign: True
Confident: True

==================================================================
BUG: KCSAN: data-race in smc_connect_work / smc_poll

write to 0xffff88811a82089c of 4 bytes by task 3413 on cpu 0:
 smc_connect_work+0xef/0x4e0 net/smc/af_smc.c:1606
 process_one_work kernel/workqueue.c:3263 [inline]
 process_scheduled_works+0x4ce/0x9d0 kernel/workqueue.c:3346
 worker_thread+0x582/0x770 kernel/workqueue.c:3427
 kthread+0x489/0x510 kernel/kthread.c:463
 ret_from_fork+0x122/0x1b0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

read to 0xffff88811a82089c of 4 bytes by task 24721 on cpu 1:
 smc_poll+0x188/0x530 net/smc/af_smc.c:2890
 sock_poll+0x20d/0x230 net/socket.c:1443
 vfs_poll include/linux/poll.h:82 [inline]
 select_poll_one fs/select.c:480 [inline]
 do_select+0x8f1/0xf40 fs/select.c:536
 core_sys_select+0x3dc/0x6e0 fs/select.c:677
 do_pselect fs/select.c:759 [inline]
 __do_sys_pselect6 fs/select.c:802 [inline]
 __se_sys_pselect6+0x216/0x280 fs/select.c:793
 __x64_sys_pselect6+0x78/0x90 fs/select.c:793
 x64_sys_call+0x28bd/0x3000 arch/x86/include/generated/asm/syscalls_64.h:271
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xd2/0x200 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00000000 -> 0x0000006f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 24721 Comm: syz.1.7092 Tainted: G W syzkaller #0 PREEMPT(voluntary)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
==================================================================
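As a quick sanity check of the value reported above (assuming the standard Linux errno numbering, where userspace `<errno.h>` and the kernel agree on `ECONNREFUSED`):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

int main(void)
{
	/* KCSAN reported "value changed: 0x00000000 -> 0x0000006f". */
	assert(0x6f == 111);           /* hex -> decimal */
	assert(ECONNREFUSED == 111);   /* Linux errno numbering */
	printf("0x6f == ECONNREFUSED (%d)\n", ECONNREFUSED);
	return 0;
}
```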
| Seq | Timestamp | Type | Name | Duration |
|---|---|---|---|---|
| 0/0 | 2026/01/14 23:02 | flow | assessment-kcsan |
20m Model:unknown Results: map[Benign:true Confident:true Explanation:(same analysis as quoted above)] |
| 1/1 | 2026/01/14 23:02 | action | kernel-checkouter |
0m Model:unknown Results: map[KernelSrc:/usr/local/google/home/dvyukov/syzkaller/agent/workdir/cache/src/f273a07432f85b9534f56a2719372a5e00a40ba9] |
| 2/1 | 2026/01/14 23:03 | action | kernel-builder |
3m Model:unknown Results: map[KernelObj:/usr/local/google/home/dvyukov/syzkaller/agent/workdir/cache/build/2e9bc86de744460b905fa622cb3837e1aadcd0aa] |
| 3/1 | 2026/01/14 23:07 | action | codesearch-prepare |
13m Model:unknown Results: map[Index:codesearch-index] |
| 4/1 | 2026/01/14 23:21 | agent | expert |
2m Model:unknown Results: map[Benign:true Confident:true] Instruction: You are an experienced Linux kernel developer tasked with determining if the given kernel data race is benign or not. The data race report is from the KCSAN tool. It contains 2 stack traces of the memory accesses that constitute a data race. A "benign" data race is on a simple int/bool variable or similar field, and the accesses are not supposed to be protected by any mutual exclusion primitives. Common examples of such "benign" data races are accesses to various flags fields, statistics counters, and similar. A "benign" data race does not lead to memory corruption/crash with a conservative compiler that compiles memory accesses to primitive types effectively as atomic. A non-benign (or "harmful") data race can lead to corruption/crash even with a conservative compiler that compiles memory accesses to primitive types effectively as atomic. A common example of a "harmful" data race is a race on a complex container (list/hashmap/etc), where accesses are supposed to be protected by a mutual exclusion primitive. In the final reply, explain why you think the given data race is benign or harmful. Use the provided tools to confirm any assumptions, variables/fields being accessed, etc. In particular, don't make assumptions about the kernel source code; use the codesearch tools to read the actual source code. Use the set-results tool to provide the results of the analysis. It must be called exactly once before the final reply. Ignore the results of this tool.
Prompt: The data race report is: (the KCSAN report quoted above) Reply: (same analysis as quoted above) Benign: True Confident: True |
| 5/2 | 2026/01/14 23:21 | llm | expert |
0m Model:unknown Thoughts: **Data Race Investigation: `smc_connect_work` vs. `smc_poll`** Okay, so the report indicates a classic data race – always fun. We've got a write happening in `smc_connect_work` and a read in `smc_poll`, both targeting the same 4-byte field at `0xffff88811a82089c`. The value flip from `0x00000000` to `0x0000006f` is the smoking gun, naturally. My initial approach, given this information, is to dive right into the source code. Specifically, the report helpfully points us to the key locations. We're looking at `smc_connect_work+0xef/0x4e0` (line 1606) and `smc_poll+0x188/0x530` (line 2890) within `net/smc/af_smc.c`. That's where I need to start my analysis. Let's see what's happening there with those memory accesses. |
| 6/2 | 2026/01/14 23:21 | tool | codesearch-definition-source |
0m Model:unknown Args: map[IncludeLines:true Name:smc_connect_work SourceFile:net/smc/af_smc.c] Results: map[Missing:false SourceCode:1594: static void smc_connect_work(struct work_struct *work)
1595: {
1596: struct smc_sock *smc = container_of(work, struct smc_sock,
1597: connect_work);
1598: long timeo = READ_ONCE(smc->sk.sk_sndtimeo);
1599: int rc = 0;
1600:
1601: if (!timeo)
1602: timeo = MAX_SCHEDULE_TIMEOUT;
1603: lock_sock(smc->clcsock->sk);
1604: if (smc->clcsock->sk->sk_err) {
1605: smc->sk.sk_err = smc->clcsock->sk->sk_err;
1606: } else if ((1 << smc->clcsock->sk->sk_state) &
1607: (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
1608: rc = sk_stream_wait_connect(smc->clcsock->sk, &timeo);
1609: if ((rc == -EPIPE) &&
1610: ((1 << smc->clcsock->sk->sk_state) &
1611: (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)))
1612: rc = 0;
1613: }
1614: release_sock(smc->clcsock->sk);
1615: lock_sock(&smc->sk);
1616: if (rc != 0 || smc->sk.sk_err) {
1617: smc->sk.sk_state = SMC_CLOSED;
1618: if (rc == -EPIPE || rc == -EAGAIN)
1619: smc->sk.sk_err = EPIPE;
1620: else if (rc == -ECONNREFUSED)
1621: smc->sk.sk_err = ECONNREFUSED;
1622: else if (signal_pending(current))
1623: smc->sk.sk_err = -sock_intr_errno(timeo);
1624: sock_put(&smc->sk); /* passive closing */
1625: goto out;
1626: }
1627:
1628: rc = __smc_connect(smc);
1629: if (rc < 0)
1630: smc->sk.sk_err = -rc;
1631:
1632: out:
1633: if (!sock_flag(&smc->sk, SOCK_DEAD)) {
1634: if (smc->sk.sk_err) {
1635: smc->sk.sk_state_change(&smc->sk);
1636: } else { /* allow polling before and after fallback decision */
1637: smc->clcsock->sk->sk_write_space(smc->clcsock->sk);
1638: smc->sk.sk_write_space(&smc->sk);
1639: }
1640: }
1641: release_sock(&smc->sk);
1642: }
SourceFile:net/smc/af_smc.c] |
| 7/2 | 2026/01/14 23:21 | llm | expert |
0m Model:unknown Thoughts: **Initial Assessment of a Potential Connection Refusal** Okay, so I'm looking at this report and it points to line 1606 in `smc_connect_work` which, after a quick check, is actually: `} else if ((1 << smc->clcsock->sk->sk_state) &`. That's not a write operation, which is what the report explicitly says is happening. The report mentions a "write to 0xffff88811a82089c of 4 bytes by task 3413 on cpu 0: smc_connect_work+0xef/0x4e0 net/smc/af_smc.c:1606". Hold on, line 1605 is `smc->sk.sk_err = smc->clcsock->sk->sk_err;`. It's possible that the line numbers are slightly off, or that the provided offset references the beginning of the subsequent statement. That's a definite write. Let me see... 0x6f in decimal is 111. `#define ECONNREFUSED 111 /* Connection refused */`. Right, so it *is* writing to `sk_err` and setting it to `ECONNREFUSED`. That definitely fits the report. The write makes perfect sense in this context. Now that I have a clearer picture of what's happening within `smc_connect_work`, I'll shift my attention to `smc_poll` to see what information it provides. |
| 8/2 | 2026/01/14 23:21 | tool | codesearch-definition-source |
0m Model:unknown Args: map[IncludeLines:true Name:smc_poll SourceFile:net/smc/af_smc.c] Results: map[Missing:false SourceCode:2871: __poll_t smc_poll(struct file *file, struct socket *sock,
2872: poll_table *wait)
2873: {
2874: struct sock *sk = sock->sk;
2875: struct smc_sock *smc;
2876: __poll_t mask = 0;
2877:
2878: if (!sk)
2879: return EPOLLNVAL;
2880:
2881: smc = smc_sk(sock->sk);
2882: if (smc->use_fallback) {
2883: /* delegate to CLC child sock */
2884: mask = smc->clcsock->ops->poll(file, smc->clcsock, wait);
2885: sk->sk_err = smc->clcsock->sk->sk_err;
2886: } else {
2887: if (sk->sk_state != SMC_CLOSED)
2888: sock_poll_wait(file, sock, wait);
2889: if (sk->sk_err)
2890: mask |= EPOLLERR;
2891: if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
2892: (sk->sk_state == SMC_CLOSED))
2893: mask |= EPOLLHUP;
2894: if (sk->sk_state == SMC_LISTEN) {
2895: /* woken up by sk_data_ready in smc_listen_work() */
2896: mask |= smc_accept_poll(sk);
2897: } else if (smc->use_fallback) { /* as result of connect_work()*/
2898: mask |= smc->clcsock->ops->poll(file, smc->clcsock,
2899: wait);
2900: sk->sk_err = smc->clcsock->sk->sk_err;
2901: } else {
2902: if ((sk->sk_state != SMC_INIT &&
2903: atomic_read(&smc->conn.sndbuf_space)) ||
2904: sk->sk_shutdown & SEND_SHUTDOWN) {
2905: mask |= EPOLLOUT | EPOLLWRNORM;
2906: } else {
2907: sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
2908: set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
2909:
2910: if (sk->sk_state != SMC_INIT) {
2911: /* Race breaker the same way as tcp_poll(). */
2912: smp_mb__after_atomic();
2913: if (atomic_read(&smc->conn.sndbuf_space))
2914: mask |= EPOLLOUT | EPOLLWRNORM;
2915: }
2916: }
2917: if (atomic_read(&smc->conn.bytes_to_rcv))
2918: mask |= EPOLLIN | EPOLLRDNORM;
2919: if (sk->sk_shutdown & RCV_SHUTDOWN)
2920: mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
2921: if (sk->sk_state == SMC_APPCLOSEWAIT1)
2922: mask |= EPOLLIN;
2923: if (smc->conn.urg_state == SMC_URG_VALID)
2924: mask |= EPOLLPRI;
2925: }
2926: }
2927:
2928: return mask;
2929: }
SourceFile:net/smc/af_smc.c] |
| 9/2 | 2026/01/14 23:21 | llm | expert |
0m Model:unknown Thoughts: **SMC Socket Error Handling: A Deep Dive into a Potential Data Race** Okay, so I'm looking at this issue where there seems to be a data race related to how `sk_err` is handled in the SMC (Shared Memory Communications) code. Specifically, the race involves writing to `sk->sk_err` in `smc_connect_work` and reading it in `smc_poll`. The report pointed me to the critical lines. My initial thought was that a potential read from `smc_poll` at `net/smc/af_smc.c:2890` is missing the `EPOLLERR` flag, due to a data race on `sk->sk_err`. The write happens in `smc_connect_work` at `net/smc/af_smc.c:1606`, or more precisely 1605. I quickly checked the line numbers again, as compiler optimizations can sometimes be misleading. I also observed that `sk_err` is set to different error codes like `EPIPE` or `ECONNREFUSED`. The key concern here is whether the read and write operations on `sk_err` are properly synchronized, because `sk_err` is `0xffff88811a82089c`. My investigation reveals that `smc_poll` reads `sk->sk_err` without holding a lock (that's the first red flag). However, it's called under `rcu_read_lock` and other context. `smc_connect_work`, on the other hand, *writes* to `smc->sk.sk_err` (at 1605), but it only holds the lock on `smc->clcsock->sk` and not `smc->sk`. This is where the race happens. It writes the `sk_err` to `smc` by copying from `clcsock`. I had to double-check: is `sk_err` supposed to be protected by the socket lock? Yes, typically it is. But, I know from experience that many `poll` implementations in the kernel read `sk_err` without locking (e.g., `tcp_poll`). This is likely the root of the problem. The data race is between the write in `smc_connect_work` and the read in `smc_poll`. A likely outcome is that a poll will miss an error. While a data race on an `int` could be troublesome, a 4-byte read/write on most Linux architectures is atomic. It seems like the value changed from 0 to `0x6f` (ECONNREFUSED).
I considered the consequences. If `smc_poll` reads 0 and misses the error, it will likely catch it in the next poll, because `sk_state_change` will wake up waiters. I needed to verify why the write at `1605` isn't under `lock_sock(&smc->sk)`. Reviewing `smc_connect_work`, it appears that `lock_sock(&smc->sk)` is acquired *after* the potentially problematic write. So `sk_err` is set before the lock is held. I had to determine if there were other places writing to `sk_err`. The crux of the matter is whether this race is *harmful*. If `smc_poll` misses an error, it should still be caught, eventually, by the subsequent poll calls triggered through `sk_state_change`. I remembered `sock_error(sk)` resets the error, but `smc_poll` reads the error directly. Many other protocols do this in their poll functions. TCP does this, too! I looked into that. Now, why does the code do this copy from the "inner" TCP socket `clcsock->sk` to the "outer" SMC socket? This looks like a lock inversion: acquiring `clcsock->sk` and then `smc->sk`. Does this happen elsewhere in the SMC code? I need to check. This could be the reason for *not* holding the `smc->sk` lock at line 1605, to avoid the inversion. |
| 10/2 | 2026/01/14 23:21 | tool | codesearch-file-index |
0m Model:unknown Args: map[SourceFile:net/smc/af_smc.c] Results: map[Entities:[map[Kind:function Name:__smc_connect] map[Kind:function Name:__smc_create] map[Kind:function Name:__smc_getsockopt] map[Kind:function Name:__smc_release] map[Kind:function Name:__smc_setsockopt] map[Kind:function Name:smc_accept] map[Kind:function Name:smc_accept_dequeue] map[Kind:function Name:smc_accept_enqueue] map[Kind:function Name:smc_accept_poll] map[Kind:function Name:smc_accept_unlink] map[Kind:function Name:smc_adjust_sock_bufsizes] map[Kind:function Name:smc_bind] map[Kind:function Name:smc_check_ism_v2_match] map[Kind:function Name:smc_clcsock_accept] map[Kind:function Name:smc_clcsock_data_ready] map[Kind:function Name:smc_close_non_accepted] map[Kind:function Name:smc_conn_abort] map[Kind:function Name:smc_conn_save_peer_info] map[Kind:function Name:smc_conn_save_peer_info_fce] map[Kind:function Name:smc_connect] map[Kind:function Name:smc_connect_check_aclc] map[Kind:function Name:smc_connect_clc] map[Kind:function Name:smc_connect_decline_fallback] map[Kind:function Name:smc_connect_fallback] map[Kind:function Name:smc_connect_ism] map[Kind:function Name:smc_connect_ism_vlan_cleanup] map[Kind:function Name:smc_connect_ism_vlan_setup] map[Kind:function Name:smc_connect_rdma] map[Kind:function Name:smc_connect_rdma_v2_prepare] map[Kind:function Name:smc_connect_work] map[Kind:function Name:smc_connect_work] map[Kind:function Name:smc_copy_sock_settings] map[Kind:function Name:smc_copy_sock_settings_to_clc] map[Kind:function Name:smc_copy_sock_settings_to_smc] map[Kind:function Name:smc_create] map[Kind:function Name:smc_create_clcsk] map[Kind:function Name:smc_destruct] map[Kind:function Name:smc_exit] map[Kind:function Name:smc_fback_data_ready] map[Kind:function Name:smc_fback_error_report] map[Kind:function Name:smc_fback_forward_wakeup] map[Kind:function Name:smc_fback_mark_woken] map[Kind:function Name:smc_fback_replace_callbacks] map[Kind:function 
Name:smc_fback_restore_callbacks] map[Kind:function Name:smc_fback_state_change] map[Kind:function Name:smc_fback_wakeup_waitqueue] map[Kind:function Name:smc_fback_write_space] map[Kind:function Name:smc_fill_gid_list] map[Kind:function Name:smc_find_ism_device] map[Kind:function Name:smc_find_ism_store_rc] map[Kind:function Name:smc_find_ism_v1_device_serv] map[Kind:function Name:smc_find_ism_v2_device_clnt] map[Kind:function Name:smc_find_ism_v2_device_serv] map[Kind:function Name:smc_find_ism_v2_is_unique_chid] map[Kind:function Name:smc_find_proposal_devices] map[Kind:function Name:smc_find_rdma_device] map[Kind:function Name:smc_find_rdma_v1_device_serv] map[Kind:function Name:smc_find_rdma_v2_device_serv] map[Kind:function Name:smc_getname] map[Kind:function Name:smc_getsockopt] map[Kind:function Name:smc_hash_sk] map[Kind:function Name:smc_hs_congested] map[Kind:function Name:smc_init] map[Kind:function Name:smc_ioctl] map[Kind:function Name:smc_is_already_selected] map[Kind:function Name:smc_isascii] map[Kind:function Name:smc_link_save_peer_info] map[Kind:function Name:smc_listen] map[Kind:function Name:smc_listen_decline] map[Kind:function Name:smc_listen_find_device] map[Kind:function Name:smc_listen_ism_init] map[Kind:function Name:smc_listen_out] map[Kind:function Name:smc_listen_out_connected] map[Kind:function Name:smc_listen_out_err] map[Kind:function Name:smc_listen_prfx_check] map[Kind:function Name:smc_listen_rdma_finish] map[Kind:function Name:smc_listen_rdma_init] map[Kind:function Name:smc_listen_rdma_reg] map[Kind:function Name:smc_listen_v2_check] map[Kind:function Name:smc_listen_work] map[Kind:function Name:smc_net_exit] map[Kind:function Name:smc_net_init] map[Kind:function Name:smc_net_stat_exit] map[Kind:function Name:smc_net_stat_init] map[Kind:function Name:smc_nl_disable_hs_limitation] map[Kind:function Name:smc_nl_dump_hs_limitation] map[Kind:function Name:smc_nl_enable_hs_limitation] map[Kind:function Name:smc_poll] 
map[Kind:function Name:smc_recvmsg] map[Kind:function Name:smc_release] map[Kind:function Name:smc_release_cb] map[Kind:function Name:smc_restore_fallback_changes] map[Kind:function Name:smc_sendmsg] map[Kind:function Name:smc_set_keepalive] map[Kind:function Name:smc_setsockopt] map[Kind:function Name:smc_shutdown] map[Kind:function Name:smc_sk_init] map[Kind:function Name:smc_sock_alloc] map[Kind:function Name:smc_splice_read] map[Kind:function Name:smc_stat_fallback] map[Kind:function Name:smc_stat_inc_fback_rsn_cnt] map[Kind:function Name:smc_switch_to_fallback] map[Kind:function Name:smc_tcp_listen_work] map[Kind:function Name:smc_tcp_listen_work] map[Kind:function Name:smc_tcp_syn_recv_sock] map[Kind:function Name:smc_ulp_clone] map[Kind:function Name:smc_ulp_init] map[Kind:function Name:smc_unhash_sk] map[Kind:function Name:smc_v2_determine_accepted_chid] map[Kind:function Name:smcd_conn_save_peer_info] map[Kind:function Name:smcr_clnt_conf_first_link] map[Kind:function Name:smcr_conn_save_peer_info] map[Kind:function Name:smcr_lgr_reg_rmbs] map[Kind:function Name:smcr_lgr_reg_sndbufs] map[Kind:function Name:smcr_serv_conf_first_link]] Missing:false] |
| 11/2 | 2026/01/14 23:21 | llm | expert |
0m Model:unknown Thoughts: Okay, here's my summary, presented as a stream of thought: **Diving into `smc_setsockopt`** Right, let's get down to brass tacks. I need to take a look at `smc_setsockopt`. I've got a feeling I need to refresh my understanding of how this particular function works within the broader scope of Stream Control Transmission Protocol (SCTP) and its socket options. I remember some of the nuances involved, but a quick review is always a good idea before I start digging into the more complex stuff I'm currently working on. Need to make sure I’m not missing anything obvious before I get bogged down. |
| 12/2 | 2026/01/14 23:21 | tool | codesearch-definition-source |
0m Model:unknown Args: map[IncludeLines:true Name:smc_setsockopt SourceFile:net/smc/af_smc.c] Results: map[Missing:false SourceCode:3071: int smc_setsockopt(struct socket *sock, int level, int optname,
3072: sockptr_t optval, unsigned int optlen)
3073: {
3074: struct sock *sk = sock->sk;
3075: struct smc_sock *smc;
3076: int val, rc;
3077:
3078: if (level == SOL_TCP && optname == TCP_ULP)
3079: return -EOPNOTSUPP;
3080: else if (level == SOL_SMC)
3081: return __smc_setsockopt(sock, level, optname, optval, optlen);
3082:
3083: smc = smc_sk(sk);
3084:
3085: /* generic setsockopts reaching us here always apply to the
3086: * CLC socket
3087: */
3088: mutex_lock(&smc->clcsock_release_lock);
3089: if (!smc->clcsock) {
3090: mutex_unlock(&smc->clcsock_release_lock);
3091: return -EBADF;
3092: }
3093: if (unlikely(!smc->clcsock->ops->setsockopt))
3094: rc = -EOPNOTSUPP;
3095: else
3096: rc = smc->clcsock->ops->setsockopt(smc->clcsock, level, optname,
3097: optval, optlen);
3098: if (smc->clcsock->sk->sk_err) {
3099: sk->sk_err = smc->clcsock->sk->sk_err;
3100: sk_error_report(sk);
3101: }
3102: mutex_unlock(&smc->clcsock_release_lock);
3103:
3104: if (optlen < sizeof(int))
3105: return -EINVAL;
3106: if (copy_from_sockptr(&val, optval, sizeof(int)))
3107: return -EFAULT;
3108:
3109: lock_sock(sk);
3110: if (rc || smc->use_fallback)
3111: goto out;
3112: switch (optname) {
3113: case TCP_FASTOPEN:
3114: case TCP_FASTOPEN_CONNECT:
3115: case TCP_FASTOPEN_KEY:
3116: case TCP_FASTOPEN_NO_COOKIE:
3117: /* option not supported by SMC */
3118: if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) {
3119: rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP);
3120: } else {
3121: rc = -EINVAL;
3122: }
3123: break;
3124: case TCP_NODELAY:
3125: if (sk->sk_state != SMC_INIT &&
3126: sk->sk_state != SMC_LISTEN &&
3127: sk->sk_state != SMC_CLOSED) {
3128: if (val) {
3129: SMC_STAT_INC(smc, ndly_cnt);
3130: smc_tx_pending(&smc->conn);
3131: cancel_delayed_work(&smc->conn.tx_work);
3132: }
3133: }
3134: break;
3135: case TCP_CORK:
3136: if (sk->sk_state != SMC_INIT &&
3137: sk->sk_state != SMC_LISTEN &&
3138: sk->sk_state != SMC_CLOSED) {
3139: if (!val) {
3140: SMC_STAT_INC(smc, cork_cnt);
3141: smc_tx_pending(&smc->conn);
3142: cancel_delayed_work(&smc->conn.tx_work);
3143: }
3144: }
3145: break;
3146: case TCP_DEFER_ACCEPT:
3147: smc->sockopt_defer_accept = val;
3148: break;
3149: default:
3150: break;
3151: }
3152: out:
3153: release_sock(sk);
3154:
3155: return rc;
3156: }
SourceFile:net/smc/af_smc.c] |
| 13/2 | 2026/01/14 23:21 | llm | expert |
0m Model:unknown Thoughts: **Data Race Analysis in `smc_setsockopt`** Okay, so I'm looking at this `smc_setsockopt` function and something's immediately jumping out at me. Specifically, lines 3098-3101, where it's setting `sk->sk_err` without holding the socket lock. It's using the `clcsock_release_lock`, which is a `mutex`. My initial thought is, "Hmm, is that mutex held in `smc_connect_work`?" Nope, it isn't. So, the next question is, "Is there a race condition between `smc_connect_work` and `smc_poll`?" Yes, looks like it. `smc_poll` reads `sk_err` to determine the `EPOLLERR` status, while `smc_connect_work` *writes* `sk_err` to report an error from the underlying TCP socket. Now, if `sk_err` is just an `int` and the architecture handles 32-bit writes atomically – and I'd assume it does – the reader should see a valid value; either the old value (likely 0) or the new value (e.g., `ECONNREFUSED`). Either one is a consistent state. If `poll` sees 0, it doesn't set `EPOLLERR`, if it sees `ECONNREFUSED` it does. This seems like a pretty typical "benign" data race pattern in the networking stack, where status flags are read without explicit locking in poll routines. I've seen it countless times, and I wouldn't usually raise a flag. The thing is, KCSAN flagged this. That always makes me want to dig deeper. The write to `sk_err` could be missed by `poll`, but the `sk_error_report` or `sk_state_change` function is called after the write to `sk_err`, so that should kick things off. Looking at `smc_connect_work`, line 1634, it checks `smc->sk.sk_err` and calls `sk_state_change`. My next step would be to figure out what `sk_state_change` does for SMC. Is it `smc_fback_state_change` or something like that? Let me check where `sk_state_change` is set for SMC. It's probably in `smc_sk_init` or `smc_sock_alloc`. Okay, gotta check those routines next, to be certain about the side effects of this potentially missed error.
I also need to make sure that there isn't more to this than meets the eye. |
| 14/2 | 2026/01/14 23:21 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:smc_sk_init SourceFile:net/smc/af_smc.c] Results: map[Missing:false SourceCode: 380: void smc_sk_init(struct net *net, struct sock *sk, int protocol)

381: {
382: struct smc_sock *smc = smc_sk(sk);
383:
384: sk->sk_state = SMC_INIT;
385: sk->sk_destruct = smc_destruct;
386: sk->sk_protocol = protocol;
387: WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem));
388: WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem));
389: INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
390: INIT_WORK(&smc->connect_work, smc_connect_work);
391: INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
392: INIT_LIST_HEAD(&smc->accept_q);
393: sock_lock_init_class_and_name(sk, "slock-AF_SMC", &smc_slock_key,
394: "sk_lock-AF_SMC", &smc_key);
395: spin_lock_init(&smc->accept_q_lock);
396: spin_lock_init(&smc->conn.send_lock);
397: sk->sk_prot->hash(sk);
398: mutex_init(&smc->clcsock_release_lock);
399: smc_init_saved_callbacks(smc);
400: smc->limit_smc_hs = net->smc.limit_smc_hs;
401: smc->use_fallback = false; /* assume rdma capability first */
402: smc->fallback_rsn = 0;
403: smc_close_init(smc);
404: }
SourceFile:net/smc/af_smc.c] |
| 15/2 | 2026/01/14 23:21 | llm | expert |
Model:unknown Thoughts: **Diving Deep into Network Socket State Transitions** Okay, so I'm looking at this code and I've noticed something: `smc_sk_init` doesn't explicitly set `sk_state_change`. My initial thought is that it's relying on the default wakeup function, `sock_def_wakeup`, and I need to figure out whether that's intentional or an oversight. To understand the intended behavior, I need to check the initialization of saved callbacks, specifically `smc_init_saved_callbacks`. That seems like the logical next step: it's probably where the relevant default is established, and it'll confirm whether this apparent omission is by design. |
| 16/2 | 2026/01/14 23:22 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:smc_init_saved_callbacks SourceFile:net/smc/af_smc.c] Results: map[Missing:false SourceCode: 334: static inline void smc_init_saved_callbacks(struct smc_sock *smc)
335: {
336: smc->clcsk_state_change = NULL;
337: smc->clcsk_data_ready = NULL;
338: smc->clcsk_write_space = NULL;
339: smc->clcsk_error_report = NULL;
340: }
SourceFile:net/smc/smc.h] |
| 17/2 | 2026/01/14 23:22 | llm | expert |
Model:unknown Thoughts: **Deconstructing the `sk_err` Race Condition**
Okay, so I'm digging into this potential race condition involving `sk_err` in the SMC code, and here's the breakdown of my thought process. First, I see that `smc_init_saved_callbacks` is in `net/smc/smc.h` as expected. I'm assuming the standard `sk_state_change` function for `smc->sk` is being used, unless explicitly overridden.
My attention is drawn to `smc_connect_work`. Specifically, lines 1634-1635:
```c
1634: if (smc->sk.sk_err) {
1635: smc->sk.sk_state_change(&smc->sk);
```
The worry is, if `smc_poll` reads `sk_err` as 0, it might miss `EPOLLERR`. But then `sk_state_change` is called. If `smc_poll` was waiting in `sock_poll_wait`, it should wake up and re-evaluate. Okay, `smc_poll` itself has:
```c
2887: if (sk->sk_state != SMC_CLOSED)
2888: sock_poll_wait(file, sock, wait);
2889: if (sk->sk_err)
2890: mask |= EPOLLERR;
```
So, it adds itself to the wait queue, reads `sk_err`, and then if `sk_state_change` is called, it should be fine. The key problem is if `sk_err` is non-zero, but `poll` reads it as zero, and no further wakeups happen, but in this case `sk_state_change` is called AFTER the write so that shouldn't happen.
The question then is, is it possible for the CPU to reorder the write to `sk_err` and the call to `sk_state_change`? Both actions are in the same thread. `sk_state_change` is a function call, and the write to `smc->sk.sk_err` is the worry, but it has to happen before the call. Looking deeper, and it's clear the call happens inside a `lock_sock` / `release_sock` block.
```c
1615: lock_sock(&smc->sk);
...
1632: out:
1633: if (!sock_flag(&smc->sk, SOCK_DEAD)) {
1634: if (smc->sk.sk_err) {
1635: smc->sk.sk_state_change(&smc->sk);
...
1641: release_sock(&smc->sk);
```
So a write to `sk_err` inside the `out:` block would be protected by `lock_sock`. But the report mentions the write at `smc_connect_work+0xef`, so I need to map that offset. If line 1594 is the start of the function, `0xef` (239 bytes into the code) plausibly corresponds to line 1605, which sits *before* the `lock_sock` at 1615.
Let's check line 1605.
```c
1605: smc->sk.sk_err = smc->clcsock->sk->sk_err;
```
This write is NOT protected by `lock_sock(&smc->sk)`. And `smc_poll` reads `sk->sk_err` without any lock!
This is a pretty standard pattern: reading `sk_err` in `poll` and writing it on error occurs frequently in the network stack. As long as a wakeup is triggered, the race isn't a *huge* concern, since `sk_err` is an aligned integer. The SMC code does the copy *without* the "outer" socket lock, which is potentially unusual, but even if it were under `lock_sock`, `smc_poll` still wouldn't take the lock, so the race would exist either way.
The bottom line is that since `smc_poll` doesn't hold `lock_sock`, there's ALWAYS a race with any write to `sk_err`. I need to check how TCP handles this. Let's look at `tcp_poll` in `net/ipv4/tcp
|
| 18/2 | 2026/01/14 23:22 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:tcp_poll SourceFile:net/ipv4/tcp_poll.c] Results: map[Missing:false SourceCode: 640: EXPORT_SYMBOL(tcp_poll); SourceFile:net/ipv4/tcp.c] |
| 19/2 | 2026/01/14 23:22 | llm | expert |
Model:unknown Thoughts: **Digging into the Kernel TCP Polling Mechanism** Okay, so I'm trying to trace the path of `tcp_poll`, and I need to figure out where it's *actually* defined. That lookup only hit the `EXPORT_SYMBOL` line, and the result points at `net/ipv4/tcp.c`. That seems like the logical place, but I'm going to verify: let me search for the definition of `tcp_poll` there specifically. Gotta chase this down properly. |
| 20/2 | 2026/01/14 23:22 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:tcp_poll SourceFile:net/ipv4/tcp.c] Results: map[Missing:false SourceCode: 528: /*
529: * Wait for a TCP event.
530: *
531: * Note that we don't need to lock the socket, as the upper poll layers
532: * take care of normal races (between the test and the event) and we don't
533: * go look at any of the socket buffers directly.
534: */
535: __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
536: {
537: __poll_t mask;
538: struct sock *sk = sock->sk;
539: const struct tcp_sock *tp = tcp_sk(sk);
540: u8 shutdown;
541: int state;
542:
543: sock_poll_wait(file, sock, wait);
544:
545: state = inet_sk_state_load(sk);
546: if (state == TCP_LISTEN)
547: return inet_csk_listen_poll(sk);
548:
549: /* Socket is not locked. We are protected from async events
550: * by poll logic and correct handling of state changes
551: * made by other threads is impossible in any case.
552: */
553:
554: mask = 0;
555:
556: /*
557: * EPOLLHUP is certainly not done right. But poll() doesn't
558: * have a notion of HUP in just one direction, and for a
559: * socket the read side is more interesting.
560: *
561: * Some poll() documentation says that EPOLLHUP is incompatible
562: * with the EPOLLOUT/POLLWR flags, so somebody should check this
563: * all. But careful, it tends to be safer to return too many
564: * bits than too few, and you can easily break real applications
565: * if you don't tell them that something has hung up!
566: *
567: * Check-me.
568: *
569: * Check number 1. EPOLLHUP is _UNMASKABLE_ event (see UNIX98 and
570: * our fs/select.c). It means that after we received EOF,
571: * poll always returns immediately, making impossible poll() on write()
572: * in state CLOSE_WAIT. One solution is evident --- to set EPOLLHUP
573: * if and only if shutdown has been made in both directions.
574: * Actually, it is interesting to look how Solaris and DUX
575: * solve this dilemma. I would prefer, if EPOLLHUP were maskable,
576: * then we could set it on SND_SHUTDOWN. BTW examples given
577: * in Stevens' books assume exactly this behaviour, it explains
578: * why EPOLLHUP is incompatible with EPOLLOUT. --ANK
579: *
580: * NOTE. Check for TCP_CLOSE is added. The goal is to prevent
581: * blocking on fresh not-connected or disconnected socket. --ANK
582: */
583: shutdown = READ_ONCE(sk->sk_shutdown);
584: if (shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
585: mask |= EPOLLHUP;
586: if (shutdown & RCV_SHUTDOWN)
587: mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
588:
589: /* Connected or passive Fast Open socket? */
590: if (state != TCP_SYN_SENT &&
591: (state != TCP_SYN_RECV || rcu_access_pointer(tp->fastopen_rsk))) {
592: int target = sock_rcvlowat(sk, 0, INT_MAX);
593: u16 urg_data = READ_ONCE(tp->urg_data);
594:
595: if (unlikely(urg_data) &&
596: READ_ONCE(tp->urg_seq) == READ_ONCE(tp->copied_seq) &&
597: !sock_flag(sk, SOCK_URGINLINE))
598: target++;
599:
600: if (tcp_stream_is_readable(sk, target))
601: mask |= EPOLLIN | EPOLLRDNORM;
602:
603: if (!(shutdown & SEND_SHUTDOWN)) {
604: if (__sk_stream_is_writeable(sk, 1)) {
605: mask |= EPOLLOUT | EPOLLWRNORM;
606: } else { /* send SIGIO later */
607: sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
608: set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
609:
610: /* Race breaker. If space is freed after
611: * wspace test but before the flags are set,
612: * IO signal will be lost. Memory barrier
613: * pairs with the input side.
614: */
615: smp_mb__after_atomic();
616: if (__sk_stream_is_writeable(sk, 1))
617: mask |= EPOLLOUT | EPOLLWRNORM;
618: }
619: } else
620: mask |= EPOLLOUT | EPOLLWRNORM;
621:
622: if (urg_data & TCP_URG_VALID)
623: mask |= EPOLLPRI;
624: } else if (state == TCP_SYN_SENT &&
625: inet_test_bit(DEFER_CONNECT, sk)) {
626: /* Active TCP fastopen socket with defer_connect
627: * Return EPOLLOUT so application can call write()
628: * in order for kernel to generate SYN+data
629: */
630: mask |= EPOLLOUT | EPOLLWRNORM;
631: }
632: /* This barrier is coupled with smp_wmb() in tcp_done_with_error() */
633: smp_rmb();
634: if (READ_ONCE(sk->sk_err) ||
635: !skb_queue_empty_lockless(&sk->sk_error_queue))
636: mask |= EPOLLERR;
637:
638: return mask;
639: }
SourceFile:net/ipv4/tcp.c] |
| 21/2 | 2026/01/14 23:22 | llm | expert |
0mModel:unknown Thoughts: **Digging into the `sk_err` Conundrum** Okay, so I'm looking at this KCSAN report and it seems to be flagging a data race on `sk->sk_err`. My initial thought was to compare `tcp_poll` and `smc_poll` to see what I could find. I noticed immediately that `tcp_poll` uses `READ_ONCE` for `sk->sk_err`, which is interesting, because the `smc_poll` code I was just looking at didn't seem to have that. This seems like a likely culprit, given the context. Looking further, in `tcp_poll`, the code checks `sk->sk_err` directly and adds the `EPOLLERR` flag if it's set. This is exactly the sort of thing that KCSAN would flag – a plain read potentially racing with a write. Let's see how `smc_connect_work` writes to it. Aha, there it is! `smc->sk.sk_err = smc->clcsock->sk->sk_err;` It's a plain write, just as I suspected! My first inclination is that the SMC code is simply missing the necessary `READ_ONCE` and `WRITE_ONCE` macros. If the established protocol is to use these primitives for `sk_err` when accessed without locks, then SMC definitely needs to follow suit. I need to make a note to check if that's the established protocol. Wait, though, hold on. I need to check something else. Many other protocols probably just do plain reads and writes of `sk_err`. It’s just an `int`, right? That raises the question of whether this is even harmful. In general, it’s probably not a huge deal. It *could* lead to undesirable compiler optimizations, like assuming the value doesn't change inside a loop, but in the context of `poll`, it's generally called only once per event, so that's probably not a concern. Okay, let me circle back and double check the `tcp_poll` code. Ah, here it is: `/* This barrier is coupled with smp_wmb() in tcp_done_with_error() */`. There's an `smp_rmb()` *and* `READ_ONCE` in TCP! SMC is missing *both* of those things. Now I need to see where `sk_err` is written in TCP. 
It mentions `tcp_done_with_error`, so I need to check that out and see if I can find the write. |
| 22/2 | 2026/01/14 23:22 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:tcp_done_with_error SourceFile:net/ipv4/tcp_input.c] Results: map[Missing:false SourceCode: 4617: void tcp_done_with_error(struct sock *sk, int err)
4618: {
4619: /* This barrier is coupled with smp_rmb() in tcp_poll() */
4620: WRITE_ONCE(sk->sk_err, err);
4621: smp_wmb();
4622:
4623: tcp_write_queue_purge(sk);
4624: tcp_done(sk);
4625:
4626: if (!sock_flag(sk, SOCK_DEAD))
4627: sk_error_report(sk);
4628: }
SourceFile:net/ipv4/tcp_input.c] |
| 23/2 | 2026/01/14 23:22 | llm | expert |
0mModel:unknown Thoughts: **Initial Assessment and Deep Dive into a Kernel Data Race** Okay, so I'm looking at this KCSAN report. It's flagging a potential data race in the SMC networking code, specifically around the `sk_err` field of a socket structure. TCP uses `WRITE_ONCE` and `smp_wmb()` for atomic and ordered access to `sk_err`, but SMC isn't doing that here. The core of my concern revolves around whether this missing atomicity is actually problematic. Initially, I'm thinking, if it's just about a delayed or reordered error report in `poll`, it's probably not a huge deal. Crashes or data corruption are unlikely since `sk_state_change` follows with barriers. But KCSAN is reporting `sk_err`. It also looks like `smc_poll` is accessing multiple fields without `READ_ONCE`, like `sk_shutdown` and `sk_state`. This means that `smc_poll` is also racing with writes, particularly with regards to `sk_err` if an error occurs. My immediate instinct is to classify this as "benign" because it likely won't cause crashes, but it's *technically* a bug because it's missing the necessary atomic primitives. In the kernel world, these kinds of races, where only a slightly delayed status update is the consequence, are often tolerated if they are easy to fix. But, ideally, we should fix it. I'm now zeroing in on `smc_poll`. If `sk_err` is set, `EPOLLERR` is added to the mask. I'm seeing that KCSAN reports this kind of race frequently and that `READ_ONCE` or `WRITE_ONCE` are then usually added. I'm checking if `sk_err` is used for more than just error reporting – if it's an index or somehow used in a dangerous way. It's just an error code which doesn't seem harmful. The KCSAN report shows `value changed: 0x00000000 -> 0x0000006f`, which is `ECONNREFUSED`. I should search for similar reports to see what fixes were applied. Let's look more closely at `smc_connect_work` to check for other shared variables that it touches. I see `smc->sk.sk_state` is touched there. 
`sk_state` is an `unsigned char`. Char accesses are usually atomic for aligned accesses, but `sk_state` might be packed. But, KCSAN is specifically highlighting the race on `sk_err`. I need to confirm if `sk_err` is somehow involved in more complex logic. I see that `smc_poll` has attempted to mimic TCP's polling behavior but misses the mark with the atomic reads/writes, specifically on `sk_err`. The use of `sock_poll_wait` suggests that if `sk_err` is set AFTER `sock_poll_wait`, the poller *should* be woken up via `sk_state_change` or `sk_error_report`. Re-checking `sk_state_change`, it looks like `sk_err` is checked *before* `sk_state_change` is called, so I shouldn't have to worry about the compiler/CPU reordering the write to `sk_err` after the check. There is a memory barrier in `release_sock`, so visibility should be guaranteed. This seems to be the kind of "benign" race KCSAN is intended to find: the code *could* be optimized in a way that is hard to reason about. I'm now going to go with the "benign" classification because there won't be a crash. I'm checking to confirm if `sk_err` is used as an index anywhere and, thankfully, it's not. It seems that `smc_poll` itself writes to `sk->sk_err` which is a point of interest. The same value, from `clcsock->sk->sk_err`, is being written. This adds some complexity but suggests that it's less of an issue since the same value is written. Let's check `smc_connect_work` which might be the source of the race. `smc->sk.sk_err` is written to, from `smc->clcsock->sk->sk_err`, and `smc_poll` is reading that very same value from `sk->sk_err` in line 2890 (as per the report). This write happens if `use_fallback` is true, and it reads at 289 |
| 24/2 | 2026/01/14 23:23 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:sock SourceFile:include/net/sock.h] Results: map[Missing:true SourceCode: SourceFile:] |
| 25/2 | 2026/01/14 23:23 | llm | expert |
Model:unknown Thoughts: Okay, let's see... I'm looking at something network-related. Right, `struct sock`... I vaguely recall it being defined in `include/net/sock.h`, but I need to confirm that. So I'll start with a quick check using the `codesearch-definition-source` command; that should work with just the identifier name, "sock". Oh, wait a second. I need to be more precise! "sock" by itself is probably going to match too many functions and variables that happen to contain "sock" in their name. It's `struct sock` that I'm interested in, so I should specify `struct sock` in the search term. That should get me exactly what I'm looking for. Let me try that... |
| 26/2 | 2026/01/14 23:23 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:sock SourceFile:include/net/sock.h] Results: map[Missing:true SourceCode: SourceFile:] |
| 27/2 | 2026/01/14 23:23 | llm | expert |
Model:unknown Thoughts: **Investigating `sock.h` with `codesearch-file-index`** Alright, the definition lookups against `include/net/sock.h` keep coming back empty, so I'll try `codesearch-file-index` on the file instead. It's usually good at quickly pinpointing the definitions, structures, and common usage patterns within a specific file. Hopefully this gives me a jump start instead of having to hunt around manually. Let's see what turns up, and whether I get some context related to the issue I'm looking at. |
| 28/2 | 2026/01/14 23:23 | tool | codesearch-file-index |
0mModel:unknown Args: map[SourceFile:include/net/sock.h] Results: map[Entities:[map[Kind:function Name:SOCKET_I] map[Kind:function Name:SOCK_INODE] map[Kind:function Name:__dst_negative_advice] map[Kind:function Name:__lock_sock] map[Kind:function Name:__lock_sock_fast] map[Kind:function Name:__locked_read_sk_user_data_with_flags] map[Kind:function Name:__rcu_dereference_sk_user_data_with_flags] map[Kind:function Name:__receive_sock] map[Kind:function Name:__release_sock] map[Kind:function Name:__sk_add_backlog] map[Kind:function Name:__sk_add_node] map[Kind:function Name:__sk_backlog_rcv] map[Kind:function Name:__sk_del_bind_node] map[Kind:function Name:__sk_del_node] map[Kind:function Name:__sk_del_node_init] map[Kind:function Name:__sk_dst_check] map[Kind:function Name:__sk_dst_get] map[Kind:function Name:__sk_dst_reset] map[Kind:function Name:__sk_dst_set] map[Kind:function Name:__sk_flush_backlog] map[Kind:function Name:__sk_head] map[Kind:function Name:__sk_mem_raise_allocated] map[Kind:function Name:__sk_mem_reclaim] map[Kind:function Name:__sk_mem_reduce_allocated] map[Kind:function Name:__sk_mem_schedule] map[Kind:function Name:__sk_nulls_add_node_rcu] map[Kind:function Name:__sk_nulls_add_node_tail_rcu] map[Kind:function Name:__sk_nulls_del_node_init_rcu] map[Kind:function Name:__sk_nulls_head] map[Kind:function Name:__sk_prot_rehash] map[Kind:function Name:__sk_queue_drop_skb] map[Kind:function Name:__sk_receive_skb] map[Kind:function Name:__sk_rmem_schedule] map[Kind:function Name:__sk_rx_queue_set] map[Kind:function Name:__sk_stream_is_writeable] map[Kind:function Name:__sk_stream_memory_free] map[Kind:function Name:__sock_cmsg_send] map[Kind:function Name:__sock_put] map[Kind:function Name:__sock_queue_rcv_skb] map[Kind:function Name:__sock_recv_cmsgs] map[Kind:function Name:__sock_recv_timestamp] map[Kind:function Name:__sock_recv_wifi_status] map[Kind:function Name:__sock_tx_timestamp] map[Kind:function Name:__sock_wfree] map[Kind:function 
Name:_sock_tx_timestamp] map[Kind:function Name:bpf_skops_tx_timestamping] map[Kind:function Name:do_sock_getsockopt] map[Kind:function Name:do_sock_setsockopt] map[Kind:function Name:dst_negative_advice] map[Kind:function Name:gfp_any] map[Kind:function Name:gfp_memcg_charge] map[Kind:function Name:lock_sock] map[Kind:function Name:lock_sock_fast] map[Kind:function Name:lock_sock_fast_nested] map[Kind:function Name:lock_sock_nested] map[Kind:function Name:lockdep_sock_is_held] map[Kind:function Name:mem_cgroup_from_sk] map[Kind:function Name:mem_cgroup_sk_enabled] map[Kind:function Name:mem_cgroup_sk_under_memory_pressure] map[Kind:function Name:net_tx_rndhash] map[Kind:function Name:proto_register] map[Kind:function Name:proto_sockets_allocated_sum_positive] map[Kind:function Name:proto_unregister] map[Kind:function Name:release_sock] map[Kind:function Name:sk_acceptq_added] map[Kind:function Name:sk_acceptq_is_full] map[Kind:function Name:sk_acceptq_removed] map[Kind:function Name:sk_add_backlog] map[Kind:function Name:sk_add_bind_node] map[Kind:function Name:sk_add_node] map[Kind:function Name:sk_add_node_rcu] map[Kind:function Name:sk_add_node_tail_rcu] map[Kind:function Name:sk_alloc] map[Kind:function Name:sk_backlog_rcv] map[Kind:function Name:sk_can_gso] map[Kind:function Name:sk_capable] map[Kind:function Name:sk_clear_bit] map[Kind:function Name:sk_clear_memalloc] map[Kind:function Name:sk_clone_lock] map[Kind:function Name:sk_common_release] map[Kind:function Name:sk_del_node_init] map[Kind:function Name:sk_destruct] map[Kind:function Name:sk_dev_equal_l3scope] map[Kind:function Name:sk_drops_add] map[Kind:function Name:sk_drops_inc] map[Kind:function Name:sk_drops_read] map[Kind:function Name:sk_drops_reset] map[Kind:function Name:sk_drops_skbadd] map[Kind:function Name:sk_dst_check] map[Kind:function Name:sk_dst_confirm] map[Kind:function Name:sk_dst_get] map[Kind:function Name:sk_dst_reset] map[Kind:function Name:sk_dst_set] map[Kind:function 
Name:sk_eat_skb] map[Kind:function Name:sk_entry] map[Kind:function Name:sk_error_report] map[Kind:function Name:sk_flush_backlog] map[Kind:function Name:sk_forward_alloc_add] map[Kind:function Name:sk_free] map[Kind:function Name:sk_fullsock] map[Kind:function Name:sk_get_meminfo] map[Kind:function Name:sk_get_rmem0] map[Kind:function Name:sk_get_wmem0] map[Kind:function Name:sk_getsockopt] map[Kind:function Name:sk_gfp_mask] map[Kind:function Name:sk_gso_disable] map[Kind:function Name:sk_has_account] map[Kind:function Name:sk_has_allocations] map[Kind:function Name:sk_hashed] map[Kind:function Name:sk_head] map[Kind:function Name:sk_incoming_cpu_update] map[Kind:function Name:sk_ioctl] map[Kind:function Name:sk_is_inet] map[Kind:function Name:sk_is_readable] map[Kind:function Name:sk_is_refcounted] map[Kind:function Name:sk_is_stream_unix] map[Kind:function Name:sk_is_tcp] map[Kind:function Name:sk_is_udp] map[Kind:function Name:sk_is_unix] map[Kind:function Name:sk_is_vsock] map[Kind:function Name:sk_listener] map[Kind:function Name:sk_listener_or_tw] map[Kind:function Name:sk_may_scm_recv] map[Kind:function Name:sk_mc_loop] map[Kind:function Name:sk_mem_charge] map[Kind:function Name:sk_mem_pages] map[Kind:function Name:sk_mem_reclaim] map[Kind:function Name:sk_mem_reclaim_final] map[Kind:function Name:sk_mem_uncharge] map[Kind:function Name:sk_memalloc_socks] map[Kind:function Name:sk_net_capable] map[Kind:function Name:sk_net_refcnt_upgrade] map[Kind:function Name:sk_next] map[Kind:function Name:sk_node_init] map[Kind:function Name:sk_ns_capable] map[Kind:function Name:sk_nulls_add_node_rcu] map[Kind:function Name:sk_nulls_del_node_init_rcu] map[Kind:function Name:sk_nulls_head] map[Kind:function Name:sk_nulls_next] map[Kind:function Name:sk_owner_clear] map[Kind:function Name:sk_owner_put] map[Kind:function Name:sk_owner_set] map[Kind:function Name:sk_pacing_shift_update] map[Kind:function Name:sk_page_frag] map[Kind:function Name:sk_page_frag_refill] 
map[Kind:function Name:sk_peek_offset] map[Kind:function Name:sk_peek_offset_bwd] map[Kind:function Name:sk_peek_offset_fwd] map[Kind:function Name:sk_prot_clear_nulls] map[Kind:function Name:sk_prot_mem_limits] map[Kind:function Name:sk_rcvqueues_full] map[Kind:function Name:sk_receive_skb] map[Kind:function Name:sk_requests_wifi_status] map[Kind:function Name:sk_reset_timer] map[Kind:function Name:sk_rethink_txhash] map[Kind:function Name:sk_rmem_alloc_get] map[Kind:function Name:sk_rmem_schedule] map[Kind:function Name:sk_rx_queue_clear] map[Kind:function Name:sk_rx_queue_get] map[Kind:function Name:sk_rx_queue_set] map[Kind:function Name:sk_rx_queue_update] map[Kind:function Name:sk_send_sigurg] map[Kind:function Name:sk_set_bit] map[Kind:function Name:sk_set_memalloc] map[Kind:function Name:sk_set_peek_off] map[Kind:function Name:sk_set_socket] map[Kind:function Name:sk_set_txhash] map[Kind:function Name:sk_setsockopt] map[Kind:function Name:sk_setup_caps] map[Kind:function Name:sk_sleep] map[Kind:function Name:sk_sockets_allocated_dec] map[Kind:function Name:sk_sockets_allocated_inc] map[Kind:function Name:sk_sockets_allocated_read_positive] map[Kind:function Name:sk_stop_timer] map[Kind:function Name:sk_stop_timer_sync] map[Kind:function Name:sk_stream_error] map[Kind:function Name:sk_stream_is_writeable] map[Kind:function Name:sk_stream_kill_queues] map[Kind:function Name:sk_stream_memory_free] map[Kind:function Name:sk_stream_min_wspace] map[Kind:function Name:sk_stream_moderate_sndbuf] map[Kind:function Name:sk_stream_wait_close] map[Kind:function Name:sk_stream_wait_connect] map[Kind:function Name:sk_stream_wait_memory] map[Kind:function Name:sk_stream_write_space] map[Kind:function Name:sk_stream_wspace] map[Kind:function Name:sk_tx_queue_clear] map[Kind:function Name:sk_tx_queue_get] map[Kind:function Name:sk_tx_queue_set] map[Kind:function Name:sk_uid] map[Kind:function Name:sk_under_cgroup_hierarchy] map[Kind:function Name:sk_unhashed] 
map[Kind:function Name:sk_unused_reserved_mem] map[Kind:function Name:sk_user_data_is_nocopy] map[Kind:function Name:sk_user_ns] map[Kind:function Name:sk_wait_data] map[Kind:function Name:sk_wake_async] map[Kind:function Name:sk_wake_async_rcu] map[Kind:function Name:sk_wmem_alloc_get] map[Kind:function Name:sk_wmem_queued_add] map[Kind:function Name:sk_wmem_schedule] map[Kind:function Name:skb_add_data_nocache] map[Kind:function Name:skb_clone_and_charge_r] map[Kind:function Name:skb_copy_to_page_nocache] map[Kind:function Name:skb_do_copy_data_nocache] map[Kind:function Name:skb_get_tx_timestamp] map[Kind:function Name:skb_has_tx_timestamp] map[Kind:function Name:skb_orphan_partial] map[Kind:function Name:skb_prepare_for_gro] map[Kind:function Name:skb_set_hash_from_sk] map[Kind:function Name:skb_set_owner_edemux] map[Kind:function Name:skb_set_owner_r] map[Kind:function Name:skb_set_owner_sk_safe] map[Kind:function Name:skb_set_owner_w] map[Kind:function Name:skb_setup_tx_timestamp] map[Kind:function Name:skb_sk_is_prefetched] map[Kind:function Name:skwq_has_sleeper] map[Kind:function Name:sock_alloc_send_pskb] map[Kind:function Name:sock_alloc_send_skb] map[Kind:function Name:sock_allow_reclassification] map[Kind:function Name:sock_bind_add] map[Kind:function Name:sock_bindtoindex] map[Kind:function Name:sock_cmsg_send] map[Kind:function Name:sock_common_getsockopt] map[Kind:function Name:sock_common_recvmsg] map[Kind:function Name:sock_common_setsockopt] map[Kind:function Name:sock_confirm_neigh] map[Kind:function Name:sock_copy_flags] map[Kind:function Name:sock_copy_user_timeval] map[Kind:function Name:sock_def_readable] map[Kind:function Name:sock_dequeue_err_skb] map[Kind:function Name:sock_edemux] map[Kind:function Name:sock_efree] map[Kind:function Name:sock_enable_timestamp] map[Kind:function Name:sock_error] map[Kind:function Name:sock_flag] map[Kind:function Name:sock_gen_put] map[Kind:function Name:sock_get_timeout] map[Kind:function 
Name:sock_gettstamp] map[Kind:function Name:sock_graft] map[Kind:function Name:sock_hold] map[Kind:function Name:sock_i_ino] map[Kind:function Name:sock_init_data] map[Kind:function Name:sock_init_data_uid] map[Kind:function Name:sock_intr_errno] map[Kind:function Name:sock_inuse_add] map[Kind:function Name:sock_inuse_get] map[Kind:function Name:sock_ioctl_inout] map[Kind:function Name:sock_kfree_s] map[Kind:function Name:sock_kmalloc] map[Kind:function Name:sock_kmemdup] map[Kind:function Name:sock_kzfree_s] map[Kind:function Name:sock_load_diag_module] map[Kind:function Name:sock_net] map[Kind:function Name:sock_net_set] map[Kind:function Name:sock_net_uid] map[Kind:function Name:sock_no_accept] map[Kind:function Name:sock_no_bind] map[Kind:function Name:sock_no_connect] map[Kind:function Name:sock_no_getname] map[Kind:function Name:sock_no_ioctl] map[Kind:function Name:sock_no_linger] map[Kind:function Name:sock_no_listen] map[Kind:function Name:sock_no_mmap] map[Kind:function Name:sock_no_recvmsg] map[Kind:function Name:sock_no_sendmsg] map[Kind:function Name:sock_no_sendmsg_locked] map[Kind:function Name:sock_no_shutdown] map[Kind:function Name:sock_no_socketpair] map[Kind:function Name:sock_not_owned_by_me] map[Kind:function Name:sock_omalloc] map[Kind:function Name:sock_orphan] map[Kind:function Name:sock_owned_by_me] map[Kind:function Name:sock_owned_by_user] map[Kind:function Name:sock_owned_by_user_nocheck] map[Kind:function Name:sock_pfree] map[Kind:function Name:sock_poll_wait] map[Kind:function Name:sock_prot_inuse_add] map[Kind:function Name:sock_prot_inuse_get] map[Kind:function Name:sock_put] map[Kind:function Name:sock_queue_err_skb] map[Kind:function Name:sock_queue_rcv_skb] map[Kind:function Name:sock_queue_rcv_skb_reason] map[Kind:function Name:sock_rcvlowat] map[Kind:function Name:sock_rcvtimeo] map[Kind:function Name:sock_read_timestamp] map[Kind:function Name:sock_recv_cmsgs] map[Kind:function Name:sock_recv_errqueue] map[Kind:function 
Name:sock_recv_timestamp] map[Kind:function Name:sock_release_ownership] map[Kind:function Name:sock_replace_proto] map[Kind:function Name:sock_reset_flag] map[Kind:function Name:sock_rfree] map[Kind:function Name:sock_rps_reset_rxhash] map[Kind:function Name:sock_rps_save_rxhash] map[Kind:function Name:sock_set_flag] map[Kind:function Name:sock_set_keepalive] map[Kind:function Name:sock_set_mark] map[Kind:function Name:sock_set_priority] map[Kind:function Name:sock_set_rcvbuf] map[Kind:function Name:sock_set_reuseaddr] map[Kind:function Name:sock_set_reuseport] map[Kind:function Name:sock_set_sndtimeo] map[Kind:function Name:sock_set_timestamp] map[Kind:function Name:sock_set_timestamping] map[Kind:function Name:sock_setsockopt] map[Kind:function Name:sock_skb_set_dropcount] map[Kind:function Name:sock_sndtimeo] map[Kind:function Name:sock_tx_timestamp] map[Kind:function Name:sock_valbool_flag] map[Kind:function Name:sock_wfree] map[Kind:function Name:sock_wmalloc] map[Kind:function Name:sock_write_timestamp] map[Kind:function Name:sock_writeable] map[Kind:function Name:sock_wspace] map[Kind:function Name:sockcm_init] map[Kind:function Name:sockopt_capable] map[Kind:function Name:sockopt_lock_sock] map[Kind:function Name:sockopt_ns_capable] map[Kind:function Name:sockopt_release_sock] map[Kind:function Name:tcp_stream_memory_free] map[Kind:function Name:tcp_v4_do_rcv] map[Kind:function Name:tcp_v6_do_rcv] map[Kind:function Name:unlock_sock_fast]] Missing:false] |
| 29/2 | 2026/01/14 23:23 | llm | expert |
Model:unknown Thoughts: Okay, here's my take on those thoughts, tailored for an expert audience, presented in the first person: **Navigating the `struct sock` Labyrinth** Alright, so I'm diving into the kernel's network stack again. I need to refresh my memory on the core socket structure. Specifically, I know I'll be starting by dissecting `struct sock`. I'm pretty sure it's defined in the standard include path, so I'll begin by checking in `include/net/sock.h`. That's generally where all the foundational socket-level definitions reside. From there, I'll be able to trace its member variables and their roles within the net subsystem. |
| 30/2 | 2026/01/14 23:23 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:sock SourceFile:include/net/sock.h] Results: map[Missing:true SourceCode: SourceFile:] |
| 31/2 | 2026/01/14 23:23 | llm | expert |
Model:unknown Thoughts: Okay, here's my interpretation of that thought process, tailored to an expert like myself: **Delving into `struct sock` with `codesearch-definition-source`** Alright, so I'm thinking about digging into the `struct sock` definition. Specifically, I'm going to leverage `codesearch-definition-source`. That's my go-to when I want to quickly understand a structure: where it's defined and its context within the codebase. I need to get a clear picture of this structure quickly. I'll be looking for where it's defined, which header files are involved, and maybe even a quick glimpse at some of its key members. Ultimately, I want the quickest path to understanding the architecture of the sock structure. |
| 32/2 | 2026/01/14 23:23 | tool | codesearch-definition-source |
Model:unknown Args: map[IncludeLines:true Name:struct sock SourceFile:include/net/sock.h] Results: map[Missing:true SourceCode: SourceFile:] |
| 33/2 | 2026/01/14 23:23 | llm | expert |
Model:unknown Thoughts: **Data Race Assessment: `smc->sk.sk_err`** Okay, so I'm diving into this KCSAN report. My initial thought is to quickly locate the `struct sock` definition, probably in `include/net/sock.h`. I might even try to find it using something like grep, or I'd use `codesearch-definition-source`. Wait, I also need to consider that `struct sock_common` is often used. But, for this, I don't even need the exact offset; I know `sk_err` is a 4-byte integer. Now, is there any reason this race could be harmful? I see that in `smc_connect_work`, the code copies `clcsock->sk->sk_err` to `smc->sk.sk_err`. Both are just `int` fields. If the TCP stack is *also* writing `clcsock->sk->sk_err`, that would be another race. But I'm pretty sure the TCP stack uses `WRITE_ONCE` in those cases. The actual race here, as KCSAN points out, is between `smc_connect_work` (the writer) and `smc_poll` (the reader) on `smc->sk.sk_err`. Considering that: (1) It's a race on an `int`. (2) The reader is a `poll` function, which is often lockless to avoid blocking. (3) The writer is a background task propagating an error. (4) The written value is a standard error code. (5) Missing an update in `poll` is transient, because there's a wakeup later. (6) This pattern of lockless reading of `sk_err` in `poll` is common (though ideally with `READ_ONCE`). It seems pretty likely to be "benign". It won't cause memory corruption or a crash; it's a minor correctness issue (missing `READ_ONCE`/`WRITE_ONCE`) that KCSAN rightly flags. Just to be sure, I need to check: Can `smc->sk.sk_err` be written with a value larger than 32 bits? No, it's an `int`. Could the compiler do something crazy with multiple writes to the same location? Unlikely with a simple assignment. I'm leaning toward "benign." I should double-check if setting `sk_err` has any *other* side effects, or whether some other functions check it and then take different action based on it. 
Those checks are usually done *while* holding `lock_sock`. In `smc_connect_work` itself, there's a check on `sk_err` *while* holding `lock_sock(&smc->sk)`. If the error was set without the lock at 1605, it is then read with the lock at 1616. This is fine for that function's logic. So, the race is *only* with `smc_poll` (and potentially other readers/writers). Is there any other writer? `smc_setsockopt` writes to `sk_err`, and perhaps `smc_clcsock_data_ready` or other callbacks might too. In `smc_poll`, the code checks `sk->sk_err` and sets the `EPOLLERR` flag. If `smc_poll` is called from `select` or `poll`, it just returns the mask, and lockless races like this are common in `poll` paths. I'm feeling pretty confident that it's benign now. Now, just to be *completely* certain, does `sk_err` affect calculations, like length? No, it's always just an error code. One last thing: the value changed from 0 to `0x6f`, which is 111, or `ECONNREFUSED`. That's exactly what I'd expect for a connection failure. Final check against "harmful" criteria: No corruption/crash; it's not a complex container; and yes, it's supposed to be protected by mutual exclusion, but the `poll` situation is a known exception, which is lockless (ideally with `READ_ONCE`). SMC's failure to use `READ_ONCE` is a data race according to KCSAN, but that doesn't make it "harmful" in terms of causing crashes. Actually, a lot of KCSAN reports in the kernel are for precisely this type of "benign" race on status flags. I'll mark it as benign. One extra detail: The `smc_connect_work` line |
| 34/2 | 2026/01/14 23:23 | tool | set-results |
Model:unknown Args: map[Benign:true Confident:true] Results: map[Benign:true Confident:true] |
| 35/2 | 2026/01/14 23:23 | llm | expert |
Model:unknown Thoughts: **Benign Data Race in SMC Connection Handling** Okay, I've got the conclusion, now let's articulate the reasoning clearly. I've pinpointed a data race. The key is how `smc_connect_work` writes to `sk->sk_err` and `smc_poll` reads it. This `sk->sk_err` is just an integer that holds the socket error code. Now, the crux of the issue: `smc_poll` is reading this error code without holding the socket lock. The goal, of course, is to avoid blocking the `poll` system call – a pretty standard practice, and honestly, a good one to minimize latency. Ideally, I'd expect to see `READ_ONCE()` and `WRITE_ONCE()` here, as you'd find in many other places in the kernel network stack, such as `tcp_poll` or when `tcp_done_with_error` is called. However, it's just a simple, aligned integer. I wouldn't expect memory corruption or a crash on the architectures we're supporting. What I'm seeing is an error code change from 0 to `0x6f` (that's `ECONNREFUSED`), which is exactly what we'd expect for a connection error. The race *could* lead to a slightly delayed error report from `poll()`, but the key here is that a wakeup, triggered by `sk_state_change`, happens immediately after the write. Ultimately, the poller will get the correct state. So, taking all that into account, it's a benign data race. It's not pretty, and I might recommend a more robust solution in a code review for an extra level of safety, but I'm comfortable categorizing it as non-critical. |
| Total Calls | Total Tokens | Avg Tokens | Total Duration (Seconds) | Avg Duration (Seconds) |
|---|---|---|---|---|
| Total Calls | Total Duration (Seconds) | Avg Duration (Seconds) |
|---|---|---|