
I don’t consider myself a Go expert and have only occasionally used this language, but I’d like to share a story about a bug at the intersection of Go and the Windows kernel that I was “lucky” enough to encounter.
This bug is still present (GitHub issue #64482), although there’s reason to hope it will be fixed in the next Go release.
Nevertheless, if the stars align unfavorably and your Go program suddenly hangs on a client machine during a CancelIoEx call — and you can’t reproduce or analyze the problem — I hope the material below will help you understand its cause and find a possible workaround.
How the Problem Manifested
In a real-world service, one of the threads on Windows became completely stuck in a CancelIoEx call.
Microsoft’s documentation claims that CancelIoEx does not wait for canceled operations to complete:
The CancelIoEx function does not wait for all canceled operations to complete.
However, in practice, the thread remained waiting indefinitely.
The problem turned out to be extremely difficult to reproduce. A full memory dump was required for proper analysis, but without a stable reproduction scenario, this was nearly impossible. This pushed me to start looking for anomalies related to CancelIoEx. Eventually, I stumbled upon mysterious errors in Go’s TestPipeIOCloseRace, an issue that had been quietly sitting in the backlog for two years.
To make diagnostics easier, I wrote a small tool based on this test.
Hangtest: Reproducing a Rare Bug
To reproduce the problem more reliably, I wrote hangtest, a small application that launches several workers, each of which repeatedly creates a pipe, reads from it, writes to it, and closes it. Essentially, it’s the same TestPipeIOCloseRace, but on a massive scale.
Here’s the full code so you can reproduce the issue yourself:
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"runtime"
	"strings"
	"sync"
	"time"
)

const (
	iterationsPerWorker = 100000
	numWorkers          = 10 // 10 workers x 100,000 iterations = 1,000,000 pipe races in total
)

func main() {
	if runtime.GOOS == "js" || runtime.GOOS == "wasip1" {
		fmt.Printf("skipping on %s: no pipes\n", runtime.GOOS)
		return
	}

	var wg sync.WaitGroup
	wg.Add(numWorkers)
	for i := 0; i < numWorkers; i++ {
		go func(workerID int) {
			defer wg.Done()
			for j := 0; j < iterationsPerWorker; j++ {
				if err := runPipeIOCloseRace(); err != nil {
					fmt.Printf("worker %d, iteration %d: %v\n", workerID, j, err)
				}
			}
		}(i)
	}
	wg.Wait()
}

// runPipeIOCloseRace races a writer, a reader, and a closer on one pipe
// and returns the first unexpected error, if any.
func runPipeIOCloseRace() error {
	r, w, err := Pipe()
	if err != nil {
		return fmt.Errorf("pipe creation failed: %w", err)
	}

	var wg sync.WaitGroup
	wg.Add(3)

	var errOnce sync.Once
	var firstErr error
	fail := func(e error) {
		errOnce.Do(func() {
			firstErr = e
		})
	}

	// Writer: writes until the pipe is closed underneath it.
	go func() {
		defer wg.Done()
		for {
			n, err := w.Write([]byte("hi"))
			if err != nil {
				switch {
				case errors.Is(err, ErrClosed),
					strings.Contains(err.Error(), "broken pipe"),
					strings.Contains(err.Error(), "pipe is being closed"),
					strings.Contains(err.Error(), "hungup channel"):
					// Ignore expected errors
				default:
					fail(fmt.Errorf("write error: %w", err))
				}
				return
			}
			if n != 2 {
				fail(fmt.Errorf("wrote %d bytes, expected 2", n))
				return
			}
		}
	}()

	// Reader: reads until EOF or until the pipe is closed underneath it.
	go func() {
		defer wg.Done()
		var buf [2]byte
		for {
			n, err := r.Read(buf[:])
			if err != nil {
				if err != io.EOF && !errors.Is(err, ErrClosed) {
					fail(fmt.Errorf("read error: %w", err))
				}
				return
			}
			if n != 2 {
				fail(fmt.Errorf("read %d bytes, want 2", n))
			}
		}
	}()

	// Closer: closes both ends while the reader and writer are still active.
	go func() {
		defer wg.Done()
		time.Sleep(time.Millisecond)
		if err := r.Close(); err != nil {
			fail(fmt.Errorf("close reader: %w", err))
		}
		if err := w.Close(); err != nil {
			fail(fmt.Errorf("close writer: %w", err))
		}
	}()

	wg.Wait()
	return firstErr
}

// --------- Pipe Implementation (simple wrapper using os.Pipe) --------- //

// ErrClosed is os.ErrClosed, the error *os.File operations return after Close,
// so the errors.Is checks above treat it as an expected outcome of the race.
var ErrClosed = os.ErrClosed

func Pipe() (*pipeReader, *pipeWriter, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return nil, nil, err
	}
	return &pipeReader{r}, &pipeWriter{w}, nil
}

type pipeReader struct {
	*os.File
}

type pipeWriter struct {
	*os.File
}
Even with this approach, reproducing the problem wasn’t immediate. On my work laptop with a 12th-gen Intel processor, the hang never occurred. The first time I caught it was on Windows 11 ARM64, and later it was also reproduced on a virtual x64 system. This finally allowed us to collect a dump and analyze it together with Microsoft support.
Cause of the Hang
The joint dump analysis with Microsoft support showed the following:
- Thread #1 was performing a synchronous ReadFile / WriteFile on a pipe.
- At the same time, it was the initiator of asynchronous I/O that was supposed to be canceled.
- The CancelIoEx call from another thread queued an APC for cancellation, but it could not be delivered: during synchronous I/O, the Windows kernel disables the delivery of normal kernel APCs.
- As a result, the second thread also became blocked, waiting for an APC that would never be delivered.
Since Go doesn’t allow direct thread control, a thread that previously started asynchronous I/O (e.g., on a socket) may later be reused for synchronous I/O on a pipe. When another thread then calls CancelIoEx, the cancellation has to run on the thread that issued the asynchronous I/O, but it can’t, because that thread is stuck in synchronous I/O.
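The mechanism can be illustrated with a toy model in plain Go. This is only a sketch of the reasoning above, not Windows code: toyThread, run, and cancelIO are hypothetical names, the "APC" is just a queued function, and the timeout exists only so the demo terminates (the real call has no such timeout).

```go
package main

import (
	"fmt"
	"time"
)

// toyThread is a hypothetical stand-in for an OS thread that issued
// asynchronous I/O. It is not a Windows API.
type toyThread struct {
	apcs chan func() // queued "APCs" waiting to run on this thread
}

// run parks the thread until stop is closed. When syncIO is false the thread
// delivers queued APCs; when syncIO is true it models a synchronous I/O wait,
// during which queued APCs are never delivered.
func (t *toyThread) run(syncIO bool, stop <-chan struct{}) {
	if syncIO {
		<-stop
		return
	}
	for {
		select {
		case apc := <-t.apcs:
			apc()
		case <-stop:
			return
		}
	}
}

// cancelIO models the canceling thread: it queues an APC on the issuing
// thread and waits for it to run. It reports whether the APC ran in time.
func cancelIO(t *toyThread, timeout time.Duration) bool {
	done := make(chan struct{})
	t.apcs <- func() { close(done) }
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// demo runs one cancellation against a thread in the given state.
func demo(syncIO bool) bool {
	t := &toyThread{apcs: make(chan func(), 1)}
	stop := make(chan struct{})
	defer close(stop)
	go t.run(syncIO, stop)
	return cancelIO(t, 100*time.Millisecond)
}

func main() {
	fmt.Println("issuer idle, cancel completed:", demo(false))
	fmt.Println("issuer in synchronous I/O, cancel completed:", demo(true))
}
```

In the second case the canceling side gives up only because the model has a timeout; in the dumps below, the equivalent wait never ends.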
Microsoft Kernel Team Analysis (WinDbg Output)
Comment from Microsoft on hangtest:
In hangtest.exe, thread (ffffa2029d6c3080) is performing a synchronous WriteFile on a Named Pipe. Because this is synchronous I/O, normal kernel APC delivery is disabled. Another thread (ffffa2029f0d7080) calls CancelIoEx() for the same FileObject. It queues an APC for IRP cancellation and waits for completion. However, the APC cannot be delivered because thread 0xffffa2029d6c3080 is in synchronous wait. As noted, it’s critical not to mix synchronous and asynchronous I/O. To cancel synchronous operations, use CancelSynchronousIo.
Thread executing WriteFile (synchronous I/O):
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time State
hangtest.exe (ffffa202a18f0080) ffffa2029d6c3080 2d38.1cd0 63ms 344ms 3380 Executive 9m:35.406 Waiting
Irp List:
IRP File Driver Owning Process
ffffa202a47f2b00 (null) Npfs hangtest.exe
# Child-SP Return Call Site
0 ffffe881ff1f7e70 fffff8067f02a4d0 nt!KiSwapContext+0x76
1 ffffe881ff1f7fb0 fffff8067f0299ff nt!KiSwapThread+0x500
2 ffffe881ff1f8060 fffff8067f0292a3 nt!KiCommitThreadWait+0x14f
3 ffffe881ff1f8100 fffff8067f1f1744 nt!KeWaitForSingleObject+0x233
4 ffffe881ff1f81f0 fffff8067f406d96 nt!IopWaitForSynchronousIoEvent+0x50
5 (Inline) ---------------- nt!IopWaitForSynchronousIo+0x23
6 ffffe881ff1f8230 fffff8067f3cfdb5 nt!IopSynchronousServiceTail+0x466
7 ffffe881ff1f82d0 fffff8067f487ec0 nt!IopWriteFile+0x23d
8 ffffe881ff1f83d0 fffff8067f212005 nt!NtWriteFile+0xd0
9 ffffe881ff1f8450 00007fffdb70d5f4 nt!KiSystemServiceCopyEnd+0x25
a 00000055e4fff638 0000000000000000 0x7fffdb70d5f4
Irp Details: ffffa202a47f2b00
Thread Process Frame Count
============================= ===========
ffffa2029d6c3080 hangtest.exe 2
Irp Stack Frame(s)
# Driver Major Minor Dispatch Routine Flg Ctrl Status Device File
=== ================ ===== ===== ================ === ==== ======= ================ ================
->2 \FileSystem\Npfs WRITE 0 IRP_MJ_WRITE 0 1 Pending ffffa202987985f0 ffffa2029f495670
================================================================
ffffa2029f495670
Related File Object: 0xffffa2029f494d10
Device Object: 0xffffa202987985f0 \FileSystem\Npfs
Vpb is NULL
Flags: 0x40082
Synchronous IO
Named Pipe
Handle Created
FsContext: 0xffffc40ddc15a8c0 FsContext2: 0xffffc40ddc6bb291
Private Cache Map: 0x00000001
CurrentByteOffset: 0
Thread executing CancelIoEx:
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time State
hangtest.exe (ffffa202a18f0080) ffffa2029f0d7080 2d38.241c 31ms 391ms 4151 Executive 9m:35.406 Waiting
# Child-SP Return Call Site
0 ffffe881fedd8fe0 fffff8067f02a4d0 nt!KiSwapContext+0x76
1 ffffe881fedd9120 fffff8067f0299ff nt!KiSwapThread+0x500
2 ffffe881fedd91d0 fffff8067f0292a3 nt!KiCommitThreadWait+0x14f
3 ffffe881fedd9270 fffff8067f4a72dd nt!KeWaitForSingleObject+0x233
4 ffffe881fedd9360 fffff8067f5218dc nt!IopCancelIrpsInThreadList+0x125
5 ffffe881fedd93b0 fffff8067f4a70d6 nt!IopCancelIrpsInThreadListForCurrentProcess+0xc4
6 ffffe881fedd9470 fffff8067f212005 nt!NtCancelIoFileEx+0xc6
7 ffffe881fedd94c0 00007fffdb70e724 nt!KiSystemServiceCopyEnd+0x25
8 00000055e45ffc88 0000000000000000 0x7fffdb70e724
This thread has been waiting 9m:35.406 on a kernel component request
===============================================================
0xffffa2029f494d10
Related File Object: 0xffffa2029fb0dd60
Device Object: 0xffffa202987985f0 \FileSystem\Npfs
Vpb is NULL
Event signalled
Flags: 0x40082
Synchronous IO
Named Pipe
Handle Created
Second Example: Real Case with Sockets and Pipes
In addition to hangtest, we provided Microsoft with a dump from another real-world process where the same mechanism occurred.
In that case, a thread first performed asynchronous I/O on a socket and later got blocked on a synchronous ReadFile for a pipe. Another thread called CancelIoEx() for the AFD endpoint, queuing a normal kernel APC to cancel the IRP and waiting for completion. No progress was made because normal kernel APC delivery was disabled on the thread due to the synchronous I/O wait.
In effect, asynchronous I/O on a socket and synchronous I/O on a pipe overlapped within the same thread. When it became necessary to cancel the socket I/O via CancelIoEx, this turned out to be impossible, and the calling thread hung.
Stacks of the blocked threads:
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time State
reducted_executable_name.exe (ffffaf85feb300c0) ffffaf8602c90080 e88.2b24 57s.703 24s.281 549327 Executive 3h:25:43.718 Waiting
Irp List:
IRP File Driver Owning Process
ffffaf8600e76640 (null) Npfs reducted_executable_name.exe
# Child-SP Return Call Site
0 ffffd005e29d4570 fffff8025d7071d7 nt!KiSwapContext+0x76
1 ffffd005e29d46b0 fffff8025d706d49 nt!KiSwapThread+0x297
2 ffffd005e29d4770 fffff8025d705ad0 nt!KiCommitThreadWait+0x549
3 ffffd005e29d4810 fffff8025dca1548 nt!KeWaitForSingleObject+0x520
4 (Inline) ---------------- nt!IopWaitForSynchronousIo+0x3e
5 ffffd005e29d48e0 fffff8025dc9ffc8 nt!IopSynchronousServiceTail+0x258
6 ffffd005e29d4990 fffff8025d877cc5 nt!NtReadFile+0x688
7 ffffd005e29d4a90 00007fff4e820094 nt!KiSystemServiceCopyEnd+0x25
8 000000002de1fbe8 00007fff4a9aa747 ntdll!ZwReadFile+0x14
9 000000002de1fbf0 000000000047337e KERNELBASE!ReadFile+0x77
a 000000002de1fc70 000000000000056c reducted_executable_name+0x7337e
b 000000002de1fc78 000000c000573ec0 0x56c
c 000000002de1fc80 0000000000008000 0xc000573ec0
d 000000002de1fc88 000000c001701d54 0x8000
e 000000002de1fc90 0000000000000000 0xc001701d54
Irp Details: ffffaf8600e76640
Thread Process Frame Count
=========================== ===========
ffffaf8602c90080 reducted_executable_name.exe 2
Irp Stack Frame(s)
# Driver Major Minor Dispatch Routine Flg Ctrl Status Device File
=== ================ ===== ===== ================ === ==== ======= ================ ================
->2 \FileSystem\Npfs READ 0 IRP_MJ_READ 0 1 Pending ffffaf85f7da2c00 ffffaf86107074b0
==============================================================
ffffaf86107074b0
Related File Object: 0xffffaf8602a1f6a0
Device Object: 0xffffaf85f7da2c00 \FileSystem\Npfs
Vpb is NULL
Flags: 0x40082
Synchronous IO
Named Pipe
Handle Created
File Object is currently busy and has 0 waiters.
Process Thread CID UserTime KernelTime ContextSwitches Wait Reason Time State
reducted_executable_name.exe (ffffaf85feb300c0) ffffaf8602d06080 e88.2840 53s.969 24s.203 539612 Executive 3h:27:07.468 Waiting
# Child-SP Return Call Site
0 ffffd005e2b02630 fffff8025d7071d7 nt!KiSwapContext+0x76
1 ffffd005e2b02770 fffff8025d706d49 nt!KiSwapThread+0x297
2 ffffd005e2b02830 fffff8025d705ad0 nt!KiCommitThreadWait+0x549
3 ffffd005e2b028d0 fffff8025dd06e87 nt!KeWaitForSingleObject+0x520
4 ffffd005e2b029a0 fffff8025dd6b377 nt!IopCancelIrpsInThreadList+0x11f
5 ffffd005e2b029f0 fffff8025dd06cd2 nt!IopCancelIrpsInThreadListForCurrentProcess+0xc3
6 ffffd005e2b02ab0 fffff8025d877cc5 nt!NtCancelIoFileEx+0xc2
7 ffffd005e2b02b00 00007fff4e8211c4 nt!KiSystemServiceCopyEnd+0x25
8 000000002e01fc28 00007fff4a9f9ef0 ntdll!ZwCancelIoFileEx+0x14
9 000000002e01fc30 000000000047337e KERNELBASE!CancelIoEx+0x10
a 000000002e01fc70 000000002e01fc88 reducted_executable_name+0x7337e
0xffffaf86`11746570
\Endpoint
Device Object: 0xffffaf85f7db0b20 \Driver\AFD
Vpb is NULL
Flags: 0x6040400
Queue Irps to Thread
Handle Created
Skip Completion Port Queueing On Success
Skip Set Of File Object Event On Completion
Microsoft’s Conclusion
Microsoft confirmed that this is not a bug in the traditional sense but a consequence of how Windows implements I/O. CancelIoEx can indeed block in this situation, even though the official documentation does not mention it.
Unfortunately, Go does not account for this, making it especially problematic: the developer does not expect blocking and encounters it unexpectedly in production. Since Go lacks direct thread control, this scenario is hard to avoid without runtime changes.
Coordination with Go and Microsoft Developers
I was able to get the attention of a Go developer from Microsoft and coordinate their interaction with the Microsoft Kernel Team. The joint analysis confirmed the issue is reproducible and directly related to APC behavior during synchronous I/O.
In my opinion, this should either be properly handled at the Windows kernel level or at least clearly documented for CancelIoEx. The first option seems unrealistic because any change to APC mechanisms could introduce unpredictable backward-compatibility issues. As it stands, the documentation promises non-blocking behavior, but in practice blocking can occur.
Practical Consequences
- In Go tests, this appeared as a rare flaky test — TestPipeIOCloseRace — which would occasionally fail in CI.
- In a real-world service, a thread working asynchronously with a network socket was scheduled to perform synchronous I/O on a pipe, and the service hung in CancelIoEx.
- From the outside, the process still appeared alive, but internally a critical goroutine was permanently blocked.
- The problem is extremely difficult to reproduce — it may not appear for weeks in testing, but can suddenly strike in production and freeze a critical service.
Possible Solutions
- Proper handle management
  - Do not call CancelIoEx for synchronous handles (created without FILE_FLAG_OVERLAPPED).
  - To cancel synchronous operations, use CancelSynchronousIo.
  These measures eliminate the blocking risk but require changes in the Go runtime.
- Minimize pipe usage
  If possible, avoid using pipes in critical paths of the code to reduce the likelihood of hitting this issue.
- Wait for the official fix
  The Go team confirmed the problem and plans to fix it in Go 1.26 (see issue #64482).
Conclusion
This case demonstrated that a rare flaky test can hide a very real and dangerous production problem.
- Calling CancelIoEx on synchronous pipe handles can permanently block a thread, despite the documentation claiming otherwise.
- In Go, the scheduler may assign a thread that previously performed asynchronous I/O to a synchronous operation. In such a situation, the thread will be unable to cancel its own asynchronous operations.
- Until the fix in Go 1.26, developers have only workarounds.
⚡ If your Go service on Windows hangs in CancelIoEx and you can’t reproduce the problem, I hope this post helps you understand the cause and find a temporary workaround.