@egonelbre
Last active April 21, 2026 11:48
Reproducer: SIGSEGV when .NET threadpool calls Go via cgo while kernel signal 34 (glibc SIGRTMIN, strace's SIGRT_2) is delivered (Go sigaltstack too small for CoreCLR's handler)
.gitignore

bin/
obj/
libgolib.so
libgolib.h

.NET + Go cgo sigaltstack-race reproducer

AI-assisted content. This reproducer, the analysis below, and the proposed-fix section were produced in collaboration with an AI coding assistant (Claude). The reproducer reliably crashes on the developer box it was authored on (3/3 SIGSEGV) and the core-dump evidence is reproducible, but the prose interpretation — register-state explanations, Go-runtime code references, fix proposals — is AI-generated and not independently reviewed by a Go runtime or CoreCLR maintainer. Verify claims against the code before acting on them.

Self-contained reproducer for a SIGSEGV in CoreCLR's signal-dispatch chain when a .NET threadpool thread calls into Go via cgo while an RT signal is delivered to it.

Quick start

./run.sh                       # build + run in signal mode (synthetic tgkill) until crash
./run.sh gc [N]                # build + run in GC mode (no tgkill — pure GC pressure)
./run.sh fix [signal|gc] [N]   # run with the C#-side sigaltstack shim (REPRO_FIX=1)
./run.sh build                 # build only
./run.sh run                   # run once
./run.sh loop 50               # run up to 50 times in a row
./run.sh gdb                   # run under gdb and dump a core at crash → ./crash/core

Two reproduction modes, both reach the same crash:

  • signal: a dedicated thread fires kernel signal 34 (CoreCLR's INJECT_ACTIVATION_SIGNAL) at every other thread in the process via tgkill. Synthetic but very fast: 3/3 SIGSEGV in plain runs on the dev box.
  • gc: no tgkill from us. Each worker allocates ~16 KB of short-lived garbage between Ping() calls, and a helper thread calls GC.Collect(2, Forced, blocking: true) at the configured interval. This lets CoreCLR's own GC fire the activation signal naturally. Also 3/3–5/5 SIGSEGV on the dev box.

Prerequisites

| Tool | Tested version | Required for |
|---|---|---|
| Go | 1.25.3, 1.26.2 | go build -buildmode=c-shared |
| .NET SDK | 10.0.106 | dotnet build / run |
| gcc / glibc dev | any recent | cgo (CGO_ENABLED=1) linking |
| gdb | 15.x | only for ./run.sh gdb |

Install on Ubuntu 24.04

# gcc + headers + gdb
sudo apt update
sudo apt install -y build-essential gdb

# Go (snap channel tracks latest; use --channel=1.25/stable if you want
# to match CI exactly, or grab a tarball from https://go.dev/dl/)
sudo snap install go --classic

# .NET SDK 10 — Ubuntu 24.04 has it in the default archive
sudo apt install -y dotnet-sdk-10.0
# If your release doesn't have dotnet-sdk-10.0 yet, use Microsoft's feed:
#   https://learn.microsoft.com/dotnet/core/install/linux-ubuntu

# Sanity check
go version && dotnet --version && gcc --version | head -1 && gdb --version | head -1

To match CI exactly (Go 1.25), install alongside the snap:

mkdir -p ~/sdk
curl -sSL https://go.dev/dl/go1.25.3.linux-amd64.tar.gz \
    | tar -C ~/sdk -xz && mv ~/sdk/go ~/sdk/go1.25
# then invoke as ~/sdk/go1.25/bin/go (or export PATH=$HOME/sdk/go1.25/bin:$PATH)

Go 1.25 and 1.26 both reproduce the crash identically; the race is not fixed in 1.26.

Files

| File | Purpose |
|---|---|
| golib.go | Trivial cgo export: func ping() C.int { return 42 }. |
| go.mod | Go module declaration for golib.go. |
| Program.cs | .NET host: P/Invokes ping from threadpool workers and, depending on REPRO_MODE, fires signal 34 in a loop or forces GCs. |
| repro-dotnet.csproj | .NET 10 console app project. |
| run.sh | Build + run helper. |
| sigstack_helper.c | C#-side mitigation shim (REPRO_FIX=1). |

What it does

  1. Drives the Go ping() function (from the co-located libgolib.so) on 32 concurrent Task.Run workers, 1 000 000 calls each. On every call Go attaches an M (its per-OS-thread runtime structure) to the .NET TP Worker with needm and detaches it again with dropm, re-registering Go's 32 KB sigaltstack on that thread each time.
  2. Starts a dedicated .NET Thread that enumerates every TID via Process.GetCurrentProcess().Threads and sends kernel signal 34 via tgkill(2) every 50 µs. Signal 34 is what the real CI crashes show a sibling thread sending, and it's what CoreCLR's own GC/JIT coordination fires at managed threads under normal operation (see "Which signal, and who sends it?" below).

Both ingredients are required — removing either makes the crash disappear on this hardware:

  • Remove the cgo workers → no Go-owned sigaltstack on the thread, no crash.
  • Remove the signal sender → nothing lands on the alt stack, no crash.
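The body of the signal-mode driver is a simple loop. As an illustration outside .NET (not part of the gist), here is a minimal C sketch of one round, assuming Linux: it reads thread IDs from /proc/self/task instead of Process.Threads and calls the raw tgkill syscall. signal_other_threads is a hypothetical name. The default action for an unhandled real-time signal is to terminate the process, so sending SIGRTMIN only makes sense where a handler (CoreCLR's, in the reproducer) is installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends signo to every thread of this process except
   the caller, mirroring one round of the reproducer's "activation-sender".
   Returns the number of threads signalled, or -1 if /proc is unreadable. */
static int signal_other_threads(int signo) {
    pid_t tgid = getpid();
    pid_t self = (pid_t)syscall(SYS_gettid);
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int sent = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        pid_t tid = (pid_t)strtol(ent->d_name, NULL, 10);
        if (tid == self)
            continue;                       /* never signal the sender itself */
        /* A thread may exit between readdir() and tgkill(); the resulting
           ESRCH is expected churn, so failures are simply not counted. */
        if (syscall(SYS_tgkill, tgid, tid, signo) == 0)
            sent++;
    }
    closedir(dir);
    return sent;
}
```

On glibc, signal_other_threads(SIGRTMIN) sends kernel signal 34. Passing signal 0 only checks that each thread still exists, which makes a convenient dry run.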

Manual build (if you don't want run.sh)

# 1. Build the tiny Go c-shared helper.
CGO_ENABLED=1 go build -buildmode=c-shared -o libgolib.so golib.go

# 2. Build the .NET host.
dotnet build -c Release

# 3. Run.
LD_LIBRARY_PATH=. ./bin/Release/net10.0/repro-dotnet

Tuning env vars (all optional):

| Var | Default | Effect |
|---|---|---|
| REPRO_MODE | signal | signal = synthetic tgkill, gc = GC pressure |
| REPRO_WORKERS | 32 | Parallel .NET worker tasks |
| REPRO_ITERATIONS | 1 000 000 | ping() calls per worker |
| REPRO_INTERVAL_US | 50 | Pause between signal rounds / GC.Collect() calls, in µs (Thread.Sleep works in whole milliseconds, so values below 1000 amount to a yield) |
| REPRO_ALLOC_BYTES | 16 384 | Garbage allocated per ping in gc mode |
| REPRO_FIX | unset | 1 = pre-install a 1 MiB sigaltstack per thread |

Additional runtime knobs (not specific to this repro):

| Var | Effect |
|---|---|
| DOTNET_gcServer=0 | Force Workstation GC (the .csproj enables Server GC by default) |

Observed behaviour on the dev box

| Scenario | Outcome |
|---|---|
| signal mode, plain run | SIGSEGV, 3/3 attempts |
| signal mode under strace -f -e trace=signal | PASS, 5/5 attempts |
| gc mode (forced GC.Collect every 50 µs), Server GC | SIGSEGV, 5/5 attempts |
| gc mode (no GC.Collect, ambient allocation only), Server GC | SIGSEGV, 3/3 attempts |
| gc mode (no GC.Collect, ambient allocation only), Workstation GC | SIGSEGV, 3/3 attempts |

Bottom line: synthetic tgkill is not required. Under normal .NET GC operation, CoreCLR fires its own INJECT_ACTIVATION_SIGNAL often enough at threads currently inside cgo that the race hits naturally. This happens with both Server GC and the default Workstation GC. Pure ambient allocation from the workers (no explicit GC.Collect()) is enough — meaning a real .NET app using a Go c-shared library can hit this in production whenever GC runs at a bad moment relative to a cgo call.

The crash is timing-sensitive enough that strace's per-syscall stop-and-log serialises the race away. Use gdb --args (or attach with gdb -p) to catch the signal live instead.

Analysing a crash

Apport silently drops cores for unpackaged binaries, and /proc/sys/kernel/core_pattern needs root to override. Easiest is to run under gdb:

gdb -batch -nx \
    -ex 'handle all nostop noprint pass' \
    -ex 'handle SIGSEGV stop print' \
    -ex 'run' \
    -ex 'gcore /tmp/repro.core' \
    -ex 'bt' \
    -ex 'info proc mappings' \
    -ex 'thread apply all bt 6' \
    -ex 'quit' \
    --args env LD_LIBRARY_PATH=. \
           ./bin/Release/net10.0/repro-dotnet

What the crash looks like (core examined 2026-04-21)

  • Faulting thread: .NET TP Worker.
  • PC: CoreCLR stack-probe prologue (movq $0, (%rsp) / sub $0x1000, %rsp) inside libcoreclr.so — part of the CoreCLR signal-handling dispatch chain, called from frame <signal handler called> on top of JIT'd managed code.
  • rsp, rbp, r15 all point into an 8-page gap (0x7ffff7731000 – 0x7ffff7739000) that is unmapped per info proc mappings. Adjacent low side is a recently-released /memfd:doublemapper (deleted) region.
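In a live process, the "is this address mapped?" question that info proc mappings answers can also be asked by scanning /proc/self/maps. A minimal C sketch, assuming Linux; addr_is_mapped is a hypothetical helper, and it only inspects the calling process, so a core file still needs gdb.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr lies inside any mapping listed in /proc/self/maps,
   0 if it falls into a gap, or -1 if the maps file cannot be opened. */
static int addr_is_mapped(uintptr_t addr) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[4096];   /* long enough for the start-end field plus a path */
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        /* Each line starts with "start-end" in hex; end is exclusive. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 && addr >= lo && addr < hi)
            found = 1;
    }
    fclose(f);
    return found;
}
```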

Which signal, and who sends it?

The strace output in the CI crash investigation labelled the triggering signal SIGRT_2. That label is strace's convention — strace numbers RT signals from the kernel SIGRTMIN (= 32), and glibc reserves two of those (SIGCANCEL for pthread cancellation on 32, SIGSETXID for cross-thread setuid/setgid synchronisation on 33). So:

| strace label | Kernel # | Who owns it |
|---|---|---|
| SIGRT_0 | 32 | glibc pthread (SIGCANCEL) |
| SIGRT_1 | 33 | glibc (SIGSETXID) |
| SIGRT_2 | 34 | glibc's public SIGRTMIN |

Kernel signal 34 is the first RT signal userspace can freely use — which is why glibc exposes it as its public SIGRTMIN.
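The mapping between the two numbering schemes can be checked from C. A small sketch assuming Linux; strace_rt_label and glibc_reserved_rt_signals are hypothetical helpers, and KERNEL_SIGRTMIN is hard-coded to 32 because glibc's SIGRTMIN macro is evaluated at run time and already skips the signals glibc reserves for itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* strace numbers RT signals from the kernel's first RT signal, not glibc's. */
#define KERNEL_SIGRTMIN 32

/* Writes strace's label for a real-time signal (e.g. 34 -> "SIGRT_2")
   into buf. Returns -1 for signal numbers outside the RT range. */
static int strace_rt_label(int signo, char *buf, size_t len) {
    if (signo < KERNEL_SIGRTMIN || signo > SIGRTMAX)
        return -1;
    snprintf(buf, len, "SIGRT_%d", signo - KERNEL_SIGRTMIN);
    return 0;
}

/* How many RT signals the C library keeps for itself below its public
   SIGRTMIN (SIGRTMIN expands to a run-time call under glibc). */
static int glibc_reserved_rt_signals(void) {
    return SIGRTMIN - KERNEL_SIGRTMIN;
}
```

On glibc, glibc_reserved_rt_signals() should return 2 (SIGCANCEL and SIGSETXID), so glibc's SIGRTMIN is kernel signal 34, which strace prints as SIGRT_2.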

CoreCLR's PAL on Linux claims signal 34 for INJECT_ACTIVATION_SIGNAL. It's used for:

  • GC thread suspension. Before a GC can scan managed stacks it needs every managed thread parked at a safe point. It sends SIGRTMIN to each target; the handler parks the thread immediately (if at a safe point) or flags it to park at the next safe point.
  • Tiered-JIT re-dispatch. When a hot method is recompiled, other threads are activated via the same signal to pick up the new code address.
  • Debugger break / Thread.Interrupt-style cooperative interrupts.

See src/coreclr/pal/src/exception/signal.cpp in the dotnet/runtime source: SEHInitializeSignals (where the activation handler is installed) and InjectActivationInternal (the pthread_kill(thread, SIGRTMIN) call site).

So in the real CI crash, the SIGRT_2 in strace is CoreCLR sending its own activation signal at a .NET TP Worker that happened to be inside a cgo call to uplink-c at that moment. The investigation doc's earlier attribution — "Go uses SIGRT_2 for cooperative preemption scheduling" — was wrong. Go uses SIGURG (signal 23) for async preemption, not any RT signal.

The reproducer in this gist fires signal 34 explicitly from a dedicated thread so the race happens under a light synthetic load rather than needing a full GC-heavy xunit run to reach the same code path naturally.

Underlying cause

Two runtimes with incompatible signal-handling assumptions sharing an OS thread:

  • CoreCLR installs its signal handlers with SA_ONSTACK. Its SIGSEGV/activation chain includes stack-probe prologues that walk down thousands of bytes before the handler can decide whether to ignore / translate / forward the signal.
  • Go owns sigaltstack on any thread that has entered cgo. The per-M signal stack is 32 KB on Linux (malg(32 * 1024) in runtime/os_linux.go:mpreinit) — sized for Go's own handler, not CoreCLR's. Go also re-registers / disables that alt stack on every needm / dropm, i.e. on every single cgo call on a non-Go thread.

(Strace captures from CI additionally show a 16 KB sigaltstack at libc-heap addresses on .NET-owned threads — that's CoreCLR's own alt stack, separate from Go's. The reproducer's core dump shows an 8-page unmapped gap, which exactly matches Go's 32 KB gsignal region; the 16 KB one in the CI logs is CoreCLR-side.)

The race:

  1. A .NET TP Worker enters Go via P/Invoke; needm installs Go's 32 KB sigaltstack on the thread.
  2. A sibling thread fires kernel signal 34 (glibc's SIGRTMIN, strace's SIGRT_2) at it with tgkill.
  3. Kernel delivers on whatever sigaltstack the thread has registered — Go's 32 KB one.
  4. The signal isn't Go's to own, so Go's handler chains to CoreCLR's.
  5. CoreCLR's handler prologue does a multi-page stack probe that needs more than 32 KB.
  6. The probe walks off the end of the alt stack. Either it hits the guard page (SEGV_ACCERR — the CI strace signature) or an unmapped gap whose memory Go has already released in a concurrent dropm / needm cycle (SI_KERNEL, nested fault — the core dump signature shown above, with rsp 8 pages deep into unmapped VA right next to a freshly released memfd:doublemapper).
  7. The kernel can't deliver a second signal while the first handler is still on the broken alt stack → force_sig(SIGSEGV) with si_code=SI_KERNEL → process killed.

Two things have to be wrong simultaneously to crash:

  • Size mismatch. Go's 32 KB sigaltstack is too small for CoreCLR's handler.
  • Lifecycle race. Go enables / disables / recycles that alt stack around every cgo call, so even when the size is borderline OK there are windows where the kernel's view of the alt stack and the memory it actually points at disagree.

Previous mitigation attempts did not work because they were aimed at the wrong signal / layer:

  • GODEBUG=asyncpreemptoff=1 — turns off Go's SIGURG-based async preemption. But Go isn't sending the triggering signal in the first place; CoreCLR is, via its own INJECT_ACTIVATION_SIGNAL (signal 34). Silencing Go-side preemption leaves CoreCLR's activation path completely untouched.
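  • Installing a 1 MB sigaltstack from managed code (SigStackFix.EnsureOnCurrentThread). It was only called from Access.AcquireProjectLease and the Access constructor, so any thread that reached Go without passing through those call sites still got Go's own 32 KB alt stack from minitSignalStack on needm (see "Why the earlier SigStackFix.EnsureOnCurrentThread didn't work" below).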

Proposed fixes (Go side)

All paths below are in a local checkout of the Go source tree.

1. Enlarge the gsignal stack on Linux

src/runtime/os_linux.go:387-390:

func mpreinit(mp *m) {
    mp.gsignal = malg(32 * 1024) // Linux wants >= 2K
    mp.gsignal.m = mp
}

Bump to malg(128 * 1024) or malg(256 * 1024). This addresses the overflow half of the race and is cheap: one allocation per M, amortised across the process lifetime. Other Unix ports (os_aix.go, os3_solaris.go, os_netbsd.go) also allocate 32 KB; giving Linux extra headroom for chained C-runtime handlers is defensible because Go is far more commonly shipped as c-shared on Linux than on those platforms.

2. Block signals around sigaltstack(SS_DISABLE) in unminitSignals

src/runtime/signal_unix.go:1370-1383:

func unminitSignals() {
    if getg().m.newSigstack {
        st := stackt{ss_flags: _SS_DISABLE}
        sigaltstack(&st, nil)
    } else {
        restoreGsignalStack(&getg().m.goSigStack)
    }
}

Wrap the sigaltstack(SS_DISABLE) call in a full sigprocmask block / unblock so no signal can be delivered between the state flip and the memory becoming reusable. dropm already sigblock(false)s before calling unminit, but false leaves _SigUnblock signals (SIGSEGV / SIGBUS / SIGFPE, and SIGURG when used for async preemption) free to arrive anyway. Using sigblock(true) here — or an explicit full sigset — closes that.

This narrows the race but doesn't fully close it: signal delivery is a multi-step kernel operation and a signal "in flight" before the block can still land. It's a meaningful improvement, not a fix.
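In plain C, the ordering fix #2 asks for looks roughly like this: mask every blockable signal, disable the alt stack, then restore the mask. It is an illustrative analogue, not a patch to signal_unix.go, and disable_altstack_blocked is a hypothetical name. Two caveats apply: glibc's pthread_sigmask quietly keeps its two internal RT signals unblocked, and SIGKILL / SIGSTOP can never be blocked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical helper: disables the calling thread's alternate signal stack
   while all blockable signals are masked, then restores the previous mask.
   The previous alt-stack settings are returned through old_out (may be NULL).
   Returns 0 on success, -1 on failure. */
static int disable_altstack_blocked(stack_t *old_out) {
    sigset_t all, saved;
    sigfillset(&all);
    if (pthread_sigmask(SIG_SETMASK, &all, &saved) != 0)
        return -1;

    stack_t off = { .ss_sp = NULL, .ss_flags = SS_DISABLE, .ss_size = 0 };
    /* Fails with EPERM if the thread is currently running on the alt stack. */
    int rc = sigaltstack(&off, old_out);

    /* Unmask only after the kernel has stopped pointing at the old stack. */
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    return rc == 0 ? 0 : -1;
}
```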

3. Keep Go's sigaltstack installed for the whole OS thread lifetime

The root cause of the lifecycle race is that needm / dropm re-toggle the sigaltstack state on every cgo round trip. Stop toggling it:

  • minitSignalStack already has a branch for the common case where the thread already had an alt stack (e.g. CoreCLR's) — it just records it without reinstalling.
  • Add a symmetric branch for the case where Go did install its own: install it once the first time the OS thread enters Go, and leave it registered until the OS thread terminates. Change unminitSignals to not SS_DISABLE when newSigstack == true; move that disable into mexit (or the pthread-key destructor path) instead.

Trade-off: roughly 32–256 KB of permanently pinned memory per thread that has ever entered Go via cgo. In exchange, the use-after-free half of the race disappears entirely, because the memory can't be returned to the allocator while the kernel still treats it as some thread's alt stack.
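The "install once, tear down at thread exit" lifecycle that fix #3 proposes can be illustrated in C with a pthread key destructor, the same kind of hook the text mentions. This is an analogue only (inside Go the teardown would live in mexit or the runtime's own pthread-key destructor). THREAD_SIGSTACK_SIZE, altstack_for_thread_lifetime and example_worker are hypothetical names, and 256 KiB is an assumed size taken from the upper end of fix #1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>

#define THREAD_SIGSTACK_SIZE (256 * 1024)   /* assumed size (fix #1 range) */

static pthread_key_t altstack_key;
static pthread_once_t altstack_once = PTHREAD_ONCE_INIT;
static atomic_int altstacks_released;       /* counts completed teardowns */

/* Key destructor: runs on the exiting thread during thread exit. */
static void release_thread_altstack(void *mem) {
    sigset_t all, saved;
    sigfillset(&all);
    pthread_sigmask(SIG_SETMASK, &all, &saved);
    stack_t cur;
    stack_t off = { .ss_sp = NULL, .ss_flags = SS_DISABLE, .ss_size = 0 };
    /* Disable only if the kernel still points at our stack. */
    if (sigaltstack(NULL, &cur) == 0 && !(cur.ss_flags & SS_DISABLE) &&
        cur.ss_sp == mem)
        sigaltstack(&off, NULL);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    free(mem);   /* the kernel no longer references this memory */
    atomic_fetch_add(&altstacks_released, 1);
}

static void make_altstack_key(void) {
    pthread_key_create(&altstack_key, release_thread_altstack);
}

/* First call on a thread installs the alt stack; later calls are no-ops.
   The stack stays registered until the thread exits. */
static int altstack_for_thread_lifetime(void) {
    pthread_once(&altstack_once, make_altstack_key);
    if (pthread_getspecific(altstack_key) != NULL)
        return 0;
    void *mem = malloc(THREAD_SIGSTACK_SIZE);
    if (mem == NULL)
        return -1;
    stack_t st = { .ss_sp = mem, .ss_flags = 0, .ss_size = THREAD_SIGSTACK_SIZE };
    if (sigaltstack(&st, NULL) != 0) {
        free(mem);
        return -1;
    }
    if (pthread_setspecific(altstack_key, mem) != 0) {   /* arms the destructor */
        release_thread_altstack(mem);
        return -1;
    }
    return 0;
}

/* Example thread body: several "cgo entries", one install, one teardown. */
static void *example_worker(void *unused) {
    (void)unused;
    for (int i = 0; i < 3; i++)
        altstack_for_thread_lifetime();
    return NULL;
}
```

Repeated entries on the same thread find the key already set and do nothing, so the kernel's view of the alt stack never flips between calls; the only teardown happens once, when the thread exits.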

4. Harden mexit — only free gsignal after confirming it's disabled

src/runtime/proc.go:2017-2029 unconditionally stackfree(mp.gsignal.stack) on mexit. A robust version:

  1. Block all signals on the current thread.
  2. sigaltstack(&disable, &oldss) — assert oldss matches mp.gsignal.stack.
  3. Read back with sigaltstack(nil, &check) — assert check.ss_flags & SS_DISABLE != 0.
  4. Only then stackfree(mp.gsignal.stack).

This turns a silent UAF into a clean throw if the kernel ever reports a pending alt-stack reference we didn't expect.
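The verify-then-free ordering of fix #4 translates to plain C as follows. retire_altstack is a hypothetical helper; instead of throwing, it returns -1 and leaves the memory alone whenever the kernel's view does not match the caller's expectation. Unlike the four steps listed, it checks ownership before disabling, so a mismatch does not tear down someone else's alt stack.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

/* Frees expected_sp only after confirming, with all blockable signals
   masked, that it was the active alt stack and that the kernel now reports
   the alt stack as disabled. Returns 0 if freed, -1 otherwise. */
static int retire_altstack(void *expected_sp) {
    sigset_t all, saved;
    sigfillset(&all);
    if (pthread_sigmask(SIG_SETMASK, &all, &saved) != 0)
        return -1;

    int rc = -1;
    stack_t cur, check;
    stack_t off = { .ss_sp = NULL, .ss_flags = SS_DISABLE, .ss_size = 0 };
    if (sigaltstack(NULL, &cur) == 0 &&          /* 1. who owns it now?  */
        !(cur.ss_flags & SS_DISABLE) &&
        cur.ss_sp == expected_sp &&
        sigaltstack(&off, NULL) == 0 &&          /* 2. disable it        */
        sigaltstack(NULL, &check) == 0 &&        /* 3. read it back      */
        (check.ss_flags & SS_DISABLE)) {
        free(expected_sp);                       /* 4. only now release  */
        rc = 0;
    }
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    return rc;
}
```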

5. Place a guard page below every gsignal stack

Today Go's gsignal stack comes from stackalloc, which is Go's internal stack-pool allocator — adjacent pages may be other live allocations. An overflow corrupts them silently. Allocating gsignal stacks with an explicit PROT_NONE page below via mmap converts overflow into a clean SEGV_ACCERR at a known boundary and makes the class of bug diagnosable from a core dump alone.

What's the minimum viable upstream change

Fix #1 (size bump) by itself defangs the CI crash for the .NET + uplink-c case: CoreCLR's handler has enough room, so the probe never reaches the region where the lifecycle race could cause a UAF. It's a one-line patch and doesn't change any semantics.

The lifecycle race (#2, #3, #4) is the real underlying bug but needs design discussion on golang-dev before a patch is worth writing. Upstream-issue context that applies:

Alternative fix sites

Fixing this purely in Go isn't the only option:

  • CoreCLR could avoid chkstk-heavy paths when running on an externally-provided sigaltstack (detect via sigaltstack(nil, &st) at handler entry, switch to a CoreCLR-owned emergency stack before diving into managed signal dispatch).
  • Application code (uplink.NET here) could keep the two runtimes' threads strictly disjoint — never P/Invoke from a .NET TP Worker, always hop to a dedicated pool of threads whose sigaltstack state is under application control. Expensive and fragile.
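The detection step suggested for CoreCLR can be sketched in C: compare a stack address against the registered alt stack to see how much room is left before committing to a deep dispatch path. altstack_headroom is a hypothetical helper, and in a real handler the address of a local variable would serve as an approximation of the stack pointer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the number of bytes between sp and the low end of the calling
   thread's alternate signal stack if sp lies inside it, or -1 if no alt
   stack is registered or sp is elsewhere. Stacks grow down, so this is
   the room left for deeper frames. */
static ptrdiff_t altstack_headroom(const void *sp) {
    stack_t st;
    if (sigaltstack(NULL, &st) != 0 || (st.ss_flags & SS_DISABLE))
        return -1;
    uintptr_t lo = (uintptr_t)st.ss_sp;
    uintptr_t hi = lo + st.ss_size;
    uintptr_t p = (uintptr_t)sp;
    if (p < lo || p >= hi)
        return -1;
    return (ptrdiff_t)(p - lo);
}
```

Inside an SA_ONSTACK handler, a result below some threshold (64 KiB, say, which is an assumed figure) could trigger a switch to a runtime-owned emergency stack instead of the chkstk-heavy path.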

C#-side mitigation (shim, REPRO_FIX=1)

sigstack_helper.c ships a small shim, ensure_large_sigaltstack(): the first time it runs on a thread it installs a 1 MiB sigaltstack (with a PROT_NONE guard page below it), and a __thread flag makes every later call on that thread a no-op. Enable it in the reproducer with REPRO_FIX=1 (or ./run.sh fix [signal|gc] [N]).

The mechanism: Go's minitSignalStack only installs its own alt stack when the current one is SS_DISABLE'd. By pre-installing a large stack on every .NET thread before it enters Go for the first time, Go takes the "use existing" branch, records our stack, and never touches sigaltstack again on that thread. Both halves of the race are closed:

  • Size mismatch — 1 MiB is vastly more than CoreCLR's handler needs.
  • Lifecycle race — we never free the memory (held until OS thread exit), so the kernel's sigaltstack pointer stays valid.
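Only the first few lines of sigstack_helper.c survive in this gist, so the following is a hedged re-sketch of what the description implies rather than the actual file: a per-thread __thread flag, an mmap'd stack with a PROT_NONE guard page at its low end, and a LARGE_SIGSTACK_SIZE macro that can be overridden with -D. The void signature matches the reproducer's DllImport declaration. Like the description, it never unmaps the stack.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef LARGE_SIGSTACK_SIZE
#define LARGE_SIGSTACK_SIZE (1024 * 1024)   /* 1 MiB unless overridden with -D */
#endif

static __thread int tls_sigstack_installed;  /* per-thread idempotency flag */

/* Installs a LARGE_SIGSTACK_SIZE alternate signal stack on the calling
   thread the first time it runs there; later calls return immediately.
   The mapping is never freed: it must outlive every signal the kernel
   could still deliver on this thread. */
void ensure_large_sigaltstack(void) {
    if (tls_sigstack_installed)
        return;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = (LARGE_SIGSTACK_SIZE + page - 1) / page * page;

    /* Map one extra page; the lowest page becomes a PROT_NONE guard so an
       overflow (stacks grow down) faults cleanly instead of corrupting
       neighbouring memory. */
    char *base = mmap(NULL, size + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return;
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, size + page);
        return;
    }

    stack_t st = { .ss_sp = base + page, .ss_flags = 0, .ss_size = size };
    if (sigaltstack(&st, NULL) != 0) {
        munmap(base, size + page);
        return;
    }
    tls_sigstack_installed = 1;
}
```

Because the stack is enabled (not SS_DISABLE) when Go's needm runs minitSignalStack, Go records it instead of installing its own 32 KB stack.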

How to apply the shim in your own .NET project

The shim has two pieces: a tiny C file that builds to a shared library, and a P/Invoke declaration plus a call site in every managed→Go entry point.

1. Build the shim as a shared library.

Copy sigstack_helper.c into your project (or reference it from wherever you keep native tooling) and build it with:

cc -O2 -fPIC -shared -o libsigstack_helper.so sigstack_helper.c -lpthread

Ship the resulting libsigstack_helper.so next to your Go c-shared library (e.g. under src/uplink.NET/runtimes/linux-x64/native/ in the uplink.NET layout) so it's on LD_LIBRARY_PATH at runtime.

For other OSes / architectures, build with the matching compiler. The shim is Linux-only (it uses sigaltstack, __thread, mmap, MAP_STACK, mprotect); on Windows and macOS, skip it and make SigStackFix.EnsureOnCurrentThread a no-op there (for example by guarding the P/Invoke with OperatingSystem.IsLinux()).

2. Declare the P/Invoke in C#.

using System.Runtime.InteropServices;

internal static class SigStackFix
{
    [DllImport("sigstack_helper", EntryPoint = "ensure_large_sigaltstack")]
    public static extern void EnsureOnCurrentThread();
}

3. Call it on every code path that can reach Go, BEFORE the first Go call on that thread.

The previous SigStackFix.EnsureOnCurrentThread lived only in Access.AcquireProjectLease and the Access constructor — that's not enough. The shim is idempotent per thread (one __thread flag check after the first install), so it's cheap to sprinkle.

Recommended insertion points in an uplink.NET-shaped codebase:

  • Assembly module initializer (runs once per load, on whatever thread happens to load the assembly — covers the main thread):

    [System.Runtime.CompilerServices.ModuleInitializer]
    internal static void Init() => SigStackFix.EnsureOnCurrentThread();
  • Every public method of every P/Invoke-facing class (Access, BucketService, ObjectService, UploadOperation, DownloadOperation, etc.). First line of each public method:

    public async Task UploadAsync(...) {
        SigStackFix.EnsureOnCurrentThread();
        // ... existing body ...
    }
  • Every Task.Run / Task.Factory.StartNew lambda that enters uplink, because those can hop to a fresh TP Worker that no earlier call has shim'd:

    Task.Run(() => {
        SigStackFix.EnsureOnCurrentThread();
        // ... cgo-reaching body ...
    });
  • Finalizers, IDisposable.Dispose and IAsyncDisposable.DisposeAsync on any type that owns a native handle. Finalizers run on CoreCLR's dedicated finalizer thread, which the module initializer never ran on, and Dispose / DisposeAsync can be called from any thread.

4. Verify coverage with strace (one-time sanity check):

strace -f -e trace=sigaltstack ./your_app 2>&1 | grep "ss_size="

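Every thread that ever enters Go should show exactly one install, sigaltstack({ss_sp=0x..., ss_size=1048576}, ...), and no other installs. Read-only queries such as Go's sigaltstack(NULL, {...}) in minitSignalStack also print ss_size=1048576 and are expected. If you see ss_size=32768, that's Go's 32 KB stack: you missed a P/Invoke entry point, so add EnsureOnCurrentThread() there.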
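As an in-process complement to the strace check, a debug build could assert coverage just before each call into Go. A minimal C sketch, assuming the function is added to the shim library and P/Invoked like ensure_large_sigaltstack; altstack_at_least is a hypothetical export, not part of the gist.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if the calling thread has an enabled alternate signal stack of
   at least min_size bytes, 0 otherwise (including when none is registered). */
int altstack_at_least(size_t min_size) {
    stack_t cur;
    if (sigaltstack(NULL, &cur) != 0)
        return 0;
    if (cur.ss_flags & SS_DISABLE)
        return 0;
    return cur.ss_size >= min_size;
}
```

Called with 1048576, it returns 0 on a thread that only has Go's 32 KB stack (or none at all), which is exactly the missed-entry-point case.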

5. (Optional) Tune the stack size.

Override the default 1 MiB at compile time:

cc -O2 -fPIC -shared -DLARGE_SIGSTACK_SIZE=$((256*1024)) \
   -o libsigstack_helper.so sigstack_helper.c -lpthread

1 MiB is comfortably above anything CoreCLR's handler is likely to need; 256 KiB is a reasonable lower bound if you're tight on VM and will never use the debugger-attach or large-object paths.

What this doesn't fix.

Only the sigaltstack-overflow / UAF crash class. The shim cannot address a different CoreCLR-internal bug we occasionally see under extreme synthetic load (the mov (%rax), %ecx libcoreclr crash described below). If that one shows up in your production workload you'll need a CoreCLR-side or Go-side fix — see the "Proposed fixes" section above.

Measured effectiveness (dev box, 20 runs each)

| Scenario | Passing runs, no fix | Passing runs, with fix |
|---|---|---|
| signal mode (tgkill every 50 µs, 32 workers, 500 k iters) | 0/20 | 16/20 |
| gc mode, ambient allocation only (32 workers, 64 KB/call) | 0/20 | 0/20 |

The shim completely eliminates the original sigaltstack-overflow crash — caught in gdb, the faulting thread always has rsp inside an unmapped gap adjacent to Go's 32 KB gsignal region, and the backtrace shows <signal handler called> just above CoreCLR's chkstk prologue. With the shim enabled, that specific signature never appears in the gdb output.

Under extreme synthetic load (32 workers at 1 M pings/s, or 20 kHz tgkill flooding), a different crash fires: a CoreCLR-internal mov (%rax), %ecx pointer dereference on a non-worker CoreCLR thread, no <signal handler called> frame, backtrace is pure libcoreclr from clone3. This is a separate CoreCLR issue unrelated to sigaltstack — our sigstack shim can't prevent it because it isn't a signal-handler overflow. It looks like an allocation/thread-state race inside CoreCLR that only surfaces when the managed heap is churning hard while many threads are bouncing in and out of cgo.

Whether real-world uplink.NET workloads ever reach the second threshold is open. Our practical expectation is that the shim fully closes the "test host process crashed" case reported in the CI investigation — its crash signature (SEGV_ACCERR at a sigaltstack guard page, or SI_KERNEL nested fault on an unmapped gap) matches the overflow case the shim addresses, not the second CoreCLR-internal crash we only see under synthetic stress.

Why the earlier SigStackFix.EnsureOnCurrentThread didn't work

Per the investigation doc, the shim was only called from Access.AcquireProjectLease and the Access constructor. Any cgo entry point on a thread that hadn't yet gone through those call sites (e.g. a .NET TP Worker picking up a new task before the application had ever constructed an Access on it) reached Go without the large alt stack installed, so needm installed Go's 32 KB alt stack and the race could trigger.

For the shim to be fully effective every P/Invoke entry point that reaches Go must run ensure_large_sigaltstack() first. In this reproducer that means both Main() and the first line of every worker lambda. In a real codebase the same principle applies: wrap every managed→Go call site, or LD_PRELOAD a pthread_create interceptor that installs the alt stack on every new thread before user code runs.
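The LD_PRELOAD alternative could look roughly like this; it is a hypothetical sketch, not something the gist ships. It interposes pthread_create, looks up the real one with dlsym(RTLD_NEXT, ...), and installs a 1 MiB alt stack at the start of every new thread, releasing it when the thread ends. Threads not created through pthread_create (notably the main thread) are not covered, so Main() would still need the explicit call. A build line would be something like cc -O2 -fPIC -shared -o libaltstack_preload.so altstack_preload.c -ldl (file and library names are assumptions).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

#define PRELOAD_SIGSTACK_SIZE (1024 * 1024)   /* assumed 1 MiB, as in the shim */

typedef int (*pthread_create_fn)(pthread_t *, const pthread_attr_t *,
                                 void *(*)(void *), void *);

static pthread_create_fn real_pthread_create;
static pthread_once_t real_once = PTHREAD_ONCE_INIT;

static void load_real_pthread_create(void) {
    /* The next pthread_create in symbol-lookup order, i.e. libc's. */
    real_pthread_create = (pthread_create_fn)dlsym(RTLD_NEXT, "pthread_create");
}

struct start_args {
    void *(*start)(void *);
    void *arg;
};

/* Cleanup handler: runs on normal return, pthread_exit or cancellation. */
static void release_altstack(void *mem) {
    if (mem == NULL)
        return;
    sigset_t all, saved;
    sigfillset(&all);
    pthread_sigmask(SIG_SETMASK, &all, &saved);
    stack_t cur;
    stack_t off = { .ss_sp = NULL, .ss_flags = SS_DISABLE, .ss_size = 0 };
    if (sigaltstack(NULL, &cur) == 0 && cur.ss_sp == mem)
        sigaltstack(&off, NULL);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    free(mem);
}

/* Runs first on every new thread: install the alt stack, then user code. */
static void *altstack_trampoline(void *p) {
    struct start_args a = *(struct start_args *)p;
    free(p);

    void *mem = malloc(PRELOAD_SIGSTACK_SIZE);
    if (mem != NULL) {
        stack_t st = { .ss_sp = mem, .ss_flags = 0, .ss_size = PRELOAD_SIGSTACK_SIZE };
        if (sigaltstack(&st, NULL) != 0) {
            free(mem);
            mem = NULL;   /* run the thread without it rather than fail */
        }
    }

    void *ret;
    pthread_cleanup_push(release_altstack, mem);
    ret = a.start(a.arg);
    pthread_cleanup_pop(1);
    return ret;
}

/* Interposed pthread_create with the same contract as the real one. */
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start)(void *), void *arg) {
    pthread_once(&real_once, load_real_pthread_create);
    if (real_pthread_create == NULL)
        return EAGAIN;
    struct start_args *a = malloc(sizeof *a);
    if (a == NULL)
        return EAGAIN;
    a->start = start;
    a->arg = arg;
    int rc = real_pthread_create(thread, attr, altstack_trampoline, a);
    if (rc != 0)
        free(a);
    return rc;
}

/* Diagnostic thread body: records the alt stack size a new thread starts with. */
static void *query_altstack_size(void *out) {
    stack_t cur;
    size_t *size = out;
    *size = (sigaltstack(NULL, &cur) == 0 && !(cur.ss_flags & SS_DISABLE))
                ? cur.ss_size : 0;
    return NULL;
}
```

Unlike the per-call-site shim, this frees each stack when its thread ends, after disabling it with signals masked, so long-running processes with thread churn do not accumulate mappings.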

go.mod

module repro-sigaltstack

go 1.25
golib.go

package main

import "C"

// ping is a trivial cgo entry point. Each call from a non-Go thread
// triggers needm (acquire M + install sigaltstack) on entry and dropm
// (disable sigaltstack + release M) on return. Rapid calls from many
// pthreads create the sigaltstack lifecycle churn needed to hit the race.
//
//export ping
func ping() C.int { return 42 }

func main() {}
Program.cs

// Minimal .NET host that mirrors the C reproducer in
// scripts/repro-sigaltstack/ but runs the workers on the CoreCLR
// threadpool. Keeping a CoreCLR runtime loaded alongside the Go
// c-shared library is a closer match to the real crash environment
// (xunit test host with cgo P/Invokes).
//
// Two modes:
//
// REPRO_MODE=signal (default)
// A dedicated thread fires kernel signal 34 (= glibc SIGRTMIN =
// CoreCLR PAL's INJECT_ACTIVATION_SIGNAL) at every other thread
// every REPRO_INTERVAL_US microseconds. strace labels this signal
// "SIGRT_2" (kernel-SIGRTMIN-relative: glibc reserves 32/33 for
// pthread cancel & setxid, so 34 is glibc's public SIGRTMIN).
// This synthesises what CoreCLR's GC / tiered-JIT machinery fires
// naturally. Most reliable way to reproduce.
//
// REPRO_MODE=gc
// No synthetic signal sender. Each worker allocates a burst of
// garbage between Ping() calls, and a dedicated thread forces
// GC.Collect() at a high rate. The idea: let CoreCLR's own GC
// fire INJECT_ACTIVATION_SIGNAL at the TP Workers while they're
// inside cgo, no libc signalling from us. Answers the question
// "does it reproduce under realistic GC pressure alone?".
// Pair with Server GC in runtimeconfig for max pressure.
//
// The original investigation doc attributed the signal to Go's
// cooperative preemption — that was wrong. Go uses SIGURG (signal 23)
// for async preemption, not any RT signal.
//
// Build + run:
// cd scripts/repro-dotnet
// CGO_ENABLED=1 go build -buildmode=c-shared -o libgolib.so golib.go
// dotnet build -c Release
// LD_LIBRARY_PATH=. ./bin/Release/net10.0/repro-dotnet
//
// Tunables via env vars:
// REPRO_MODE — "signal" (default) or "gc"
// REPRO_WORKERS — concurrent worker tasks (default: 32)
// REPRO_ITERATIONS — ping calls per worker (default: 1000000)
// REPRO_INTERVAL_US — signal / GC.Collect interval (default: 50)
// REPRO_ALLOC_BYTES — garbage allocated per ping in gc mode
// (default: 16384)
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

internal static class Native
{
    [DllImport("golib", EntryPoint = "ping")]
    public static extern int Ping();

    [DllImport("libc", EntryPoint = "tgkill")]
    public static extern int Tgkill(int tgid, int tid, int sig);

    [DllImport("libc", EntryPoint = "getpid")]
    public static extern int Getpid();

    [DllImport("libc", EntryPoint = "syscall")]
    public static extern long Syscall(long number);

    [DllImport("sigstack_helper", EntryPoint = "ensure_large_sigaltstack")]
    public static extern void EnsureLargeSigaltstack();
}

internal static class Program
{
    private const int SYS_GETTID = 186; // x86_64

    // Kernel signal 34 = glibc SIGRTMIN = CoreCLR PAL's
    // INJECT_ACTIVATION_SIGNAL (GC thread suspension, JIT patching,
    // debugger activation). strace labels it SIGRT_2.
    private const int CoreClrActivationSignal = 34;

    private static volatile bool s_running = true;

    public static int Main()
    {
        var mode = (Environment.GetEnvironmentVariable("REPRO_MODE") ?? "signal").ToLowerInvariant();
        var workers = GetIntEnv("REPRO_WORKERS", 32);
        var iters = GetIntEnv("REPRO_ITERATIONS", 1_000_000);
        var intervalUs = GetIntEnv("REPRO_INTERVAL_US", 50);
        var allocBytes = GetIntEnv("REPRO_ALLOC_BYTES", 16 * 1024);
        var useFix = Environment.GetEnvironmentVariable("REPRO_FIX") == "1";

        Console.Error.WriteLine(
            $"[dotnet-repro] mode={mode} workers={workers} iters={iters} "
            + $"interval={intervalUs}µs gc={GCSettings.IsServerGC} "
            + $"fix={useFix} pid={Environment.ProcessId}");

        if (useFix) Native.EnsureLargeSigaltstack(); // main thread
        Native.Ping(); // warm cgo

        Thread? driver = mode switch
        {
            "signal" => StartSignalSender(intervalUs),
            "gc" => StartGcDriver(intervalUs),
            _ => throw new ArgumentException($"unknown REPRO_MODE={mode}"),
        };

        var tasks = new Task[workers];
        for (int i = 0; i < workers; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                // Install the large sigaltstack BEFORE the first Ping() on
                // this threadpool thread. Go's minitSignalStack will see it
                // on needm and not install its own 32 KB stack.
                if (useFix) Native.EnsureLargeSigaltstack();
                for (int k = 0; k < iters; k++)
                {
                    if (Native.Ping() != 42)
                        throw new Exception("ping returned unexpected value");
                    if (mode == "gc")
                        GenerateGarbage(allocBytes);
                }
            });
        }

        Task.WaitAll(tasks);
        s_running = false;
        driver?.Join();
        Console.Error.WriteLine("[dotnet-repro] PASS");
        return 0;
    }

    // Signal-sender thread: fires the CoreCLR activation signal at
    // every other thread in the process. In a real .NET process this
    // would be CoreCLR's own GC / tiered JIT machinery; we fire it
    // explicitly so the race happens under a light synthetic load.
    private static Thread StartSignalSender(int intervalUs)
    {
        var t = new Thread(() =>
        {
            int myTid = (int)Native.Syscall(SYS_GETTID);
            int pid = Native.Getpid();
            while (s_running)
            {
                try
                {
                    foreach (var proc in Process.GetCurrentProcess().Threads.Cast<ProcessThread>())
                    {
                        if (proc.Id == myTid) continue;
                        Native.Tgkill(pid, proc.Id, CoreClrActivationSignal);
                    }
                }
                catch { /* thread list churns under contention */ }
                // Thread.Sleep(TimeSpan) truncates to whole milliseconds, so
                // intervals below 1000 µs behave like Thread.Sleep(0) (a yield).
                Thread.Sleep(TimeSpan.FromMicroseconds(intervalUs));
            }
        }) { IsBackground = true, Name = "activation-sender" };
        t.Start();
        return t;
    }

    // GC driver: forces CoreCLR to do full-blocking GCs at a high rate
    // so its INJECT_ACTIVATION_SIGNAL path fires "naturally" at the TP
    // Workers. No libc-level tgkill from us.
    private static Thread StartGcDriver(int intervalUs)
    {
        var t = new Thread(() =>
        {
            while (s_running)
            {
                // Mode=Forced guarantees a blocking, thread-suspending
                // collection rather than a background/concurrent one —
                // this is the path that needs to park every thread
                // (including ones currently in cgo), which is the
                // INJECT_ACTIVATION_SIGNAL code path we want exercised.
                GC.Collect(2, GCCollectionMode.Forced, blocking: true);
                // Same whole-millisecond truncation as in StartSignalSender.
                Thread.Sleep(TimeSpan.FromMicroseconds(intervalUs));
            }
        }) { IsBackground = true, Name = "gc-driver" };
        t.Start();
        return t;
    }

    // Burn `bytes` worth of short-lived allocations to keep GC busy
    // between cgo calls.
    private static void GenerateGarbage(int bytes)
    {
        // A mix of arrays of different element types so the allocator
        // touches multiple heap regions and promotion patterns.
        var a = new byte[bytes];
        var b = new int[bytes / 4];
        var c = new object[bytes / 64];
        for (int i = 0; i < c.Length; i++) c[i] = new string('x', 8);
        GC.KeepAlive(a);
        GC.KeepAlive(b);
        GC.KeepAlive(c);
    }

    private static int GetIntEnv(string name, int def)
    {
        var s = Environment.GetEnvironmentVariable(name);
        return int.TryParse(s, out var v) && v > 0 ? v : def;
    }
}
repro-dotnet.csproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>
    <Nullable>enable</Nullable>
    <LangVersion>latest</LangVersion>
    <RootNamespace>repro_dotnet</RootNamespace>
    <AssemblyName>repro-dotnet</AssemblyName>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
    <!-- Server GC so GC suspension uses more threads and fires
         INJECT_ACTIVATION_SIGNAL more aggressively; relevant for
         REPRO_MODE=gc. -->
    <ServerGarbageCollection>true</ServerGarbageCollection>
    <ConcurrentGarbageCollection>false</ConcurrentGarbageCollection>
  </PropertyGroup>
</Project>
#!/usr/bin/env bash
# Build and run the .NET + Go cgo sigaltstack-race reproducer.
#
# Usage:
# ./run.sh Build once, run until it crashes (max 10 attempts).
# ./run.sh build Build only.
# ./run.sh run Run once (assumes already built).
# ./run.sh gc Run in GC-pressure mode (no synthetic signals —
# let CoreCLR's own GC fire the activation signal).
# ./run.sh gdb Build, then run under gdb and dump a core at crash.
# ./run.sh loop [N] Run N times in a row (default 10) and report rc.
#
# Requirements: Go (1.25+), .NET SDK (10.0+), gcc, gdb (for `gdb` mode).
set -euo pipefail
cd "$(dirname "$0")"
DOTNET_BIN="./bin/Release/net10.0/repro-dotnet"
build() {
echo "=== building libgolib.so (Go c-shared) ==="
CGO_ENABLED=1 go build -buildmode=c-shared -o libgolib.so golib.go
echo "=== building libsigstack_helper.so (C#-side fix shim) ==="
cc -O2 -fPIC -shared -o libsigstack_helper.so sigstack_helper.c -lpthread
echo "=== building .NET host ==="
DOTNET_CLI_HOME=/tmp DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1 \
dotnet build -c Release --nologo -v quiet
}
run_once() {
LD_LIBRARY_PATH=. DOTNET_CLI_HOME=/tmp "$DOTNET_BIN" "$@"
}
run_loop() {
local max=${1:-10}
local attempt rc
for attempt in $(seq 1 "$max"); do
echo "--- attempt $attempt ---"
set +e
timeout 60 env LD_LIBRARY_PATH=. DOTNET_CLI_HOME=/tmp "$DOTNET_BIN"
rc=$?
set -e
echo "exit=$rc"
if [[ $rc -eq 139 ]]; then
echo "=== SIGSEGV on attempt $attempt — reproduced ==="
return 0
fi
done
echo "=== no crash in $max attempts ==="
return 1
}
run_gdb() {
mkdir -p ./crash
gdb -batch -nx \
-ex 'set pagination off' \
-ex 'handle all nostop noprint pass' \
-ex 'handle SIGSEGV stop print' \
-ex 'run' \
-ex 'printf "\n===== CRASHED =====\n"' \
-ex 'thread' \
-ex 'info registers rip rsp rbp' \
-ex 'x/4i $rip' \
-ex 'bt 20' \
-ex 'gcore ./crash/core' \
-ex 'info proc mappings' \
-ex 'thread apply all bt 6' \
-ex 'quit' \
--args env LD_LIBRARY_PATH=. DOTNET_CLI_HOME=/tmp "$DOTNET_BIN" \
2>&1 | tee ./crash/gdb.log
echo
echo "core: ./crash/core (re-analyse with: gdb $DOTNET_BIN ./crash/core)"
}
cmd=${1:-default}
case "$cmd" in
build) build ;;
run) run_once ;;
gc) build; REPRO_MODE=gc run_loop "${2:-10}" ;;
fix) build; REPRO_FIX=1 REPRO_MODE="${2:-signal}" run_loop "${3:-10}" ;;
loop) run_loop "${2:-10}" ;;
gdb) build; run_gdb ;;
default) build; run_loop 10 ;;
*) echo "usage: $0 [build|run|gc [N]|fix [signal|gc] [N]|loop [N]|gdb]" >&2; exit 2 ;;
esac
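`run_loop` treats exit status 139 as a reproduced crash. Shells report a command killed by signal N as 128 + N, and SIGSEGV is signal 11 on Linux, hence 139. A minimal C sketch of the same decoding with `waitpid`, assuming Linux/POSIX; `shell_style_status` and `run_child_that_segfaults` are hypothetical helpers, not part of the reproducer:

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Convert a waitpid() status into the value a POSIX shell puts in $?:
 * the exit code for a normal exit, 128 + signal number for a signal death. */
static int shell_style_status(int wstatus) {
    if (WIFEXITED(wstatus)) return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus)) return 128 + WTERMSIG(wstatus);
    return -1; /* stopped/continued: only reported with WUNTRACED/WCONTINUED */
}

/* Hypothetical demo: fork a child that dies from SIGSEGV and return the
 * shell-style status the parent observes (what run_loop compares to 139). */
static int run_child_that_segfaults(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core); /* suppress file-based core dumps */
        signal(SIGSEGV, SIG_DFL);         /* default action: terminate */
        raise(SIGSEGV);
        _exit(0);                         /* not reached */
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) return -1;
    return shell_style_status(wstatus);
}
```

The child sets `RLIMIT_CORE` to 0 only to keep this deliberate crash from writing a core file.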
/*
* Per-thread "large sigaltstack" shim for the .NET + Go cgo sigaltstack
* crash.
*
* The race we're avoiding:
* - Go's needm installs its own 32 KB sigaltstack on every non-Go
* thread that enters cgo.
* - CoreCLR's signal handler (for SIGRTMIN / INJECT_ACTIVATION_SIGNAL)
* needs more than 32 KB and/or Go's sigaltstack lifecycle
* (dropm -> SS_DISABLE -> memory recycled) races with signal
* delivery, producing SIGSEGV.
*
* The shim: install a large (default 1 MiB) sigaltstack on every thread
* BEFORE it first calls into Go. When Go's minitSignalStack later reads
* the current sigaltstack state, it sees an existing stack and takes
* the "use existing" branch — it never installs its own 32 KB stack,
* and never SS_DISABLEs on dropm. This closes both halves of the race:
*
* 1. Size mismatch: our stack is 1 MiB, far more than CoreCLR needs.
* 2. Lifecycle race: we never unmap the memory (not even at thread
*    exit), so the kernel's sigaltstack pointer always refers to valid
*    memory.
*
* Usage:
* cc -O2 -fPIC -shared -o libsigstack_helper.so sigstack_helper.c -lpthread
*
* // In C# — call ONCE per thread, before any cgo P/Invoke on that
* // thread. Safe to call from any thread, cheap after the first call
* // (one TLS read).
* [DllImport("sigstack_helper")]
* static extern void ensure_large_sigaltstack();
*/
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#ifndef LARGE_SIGSTACK_SIZE
#define LARGE_SIGSTACK_SIZE (1 * 1024 * 1024) /* 1 MiB */
#endif
/* Per-thread state. __thread avoids pthread_key ceremony, and TLS is
* zero-initialised, so the first access on a new thread naturally
* reads "not installed". */
static __thread int large_sigstack_installed;
/* Base of this thread's mapping (guard page included). Nothing in this
* file reads it, since the mapping is never unmapped; it is mainly
* useful when inspecting a thread in gdb or in a core. */
static __thread void* large_sigstack_base;
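/*
* Install a large sigaltstack on the current thread unless it already
* has an enabled one of at least LARGE_SIGSTACK_SIZE. Idempotent per
* thread.
*
* We intentionally never unmap the backing memory, not even when the
* thread exits, so the kernel's sigaltstack pointer can never refer to
* freed memory. Each OS thread that calls this therefore leaves behind
* one mapping of LARGE_SIGSTACK_SIZE plus a guard page for the life of
* the process. Under threadpool reuse the same stack serves many
* logical work items on one OS thread, which is fine and exactly what
* we want.
*/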
void ensure_large_sigaltstack(void) {
if (large_sigstack_installed) return;
stack_t cur;
if (sigaltstack(NULL, &cur) != 0) {
/* Very unlikely on Linux; leave the thread as-is. */
fprintf(stderr, "ensure_large_sigaltstack: sigaltstack(query) failed: %s\n",
strerror(errno));
return;
}
/* If the thread already has an enabled alt stack of at least
* LARGE_SIGSTACK_SIZE (e.g. someone else installed one), leave it.
* CoreCLR's own alt stack, if it already put one on this thread, is
* kept only when it meets that size; a smaller one is replaced below. */
if ((cur.ss_flags & SS_DISABLE) == 0 && cur.ss_size >= LARGE_SIGSTACK_SIZE) {
large_sigstack_installed = 1;
return;
}
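/* Allocate one guard page plus LARGE_SIGSTACK_SIZE usable bytes. The
* stack grows down, so the lowest page is made PROT_NONE (mprotect
* below); an overflow then faults with SEGV_ACCERR at a known boundary
* instead of running into whatever is mapped beneath. */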
long pagesize = sysconf(_SC_PAGESIZE);
size_t total = (size_t)pagesize + LARGE_SIGSTACK_SIZE;
void* base = mmap(NULL, total, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
if (base == MAP_FAILED) {
fprintf(stderr, "ensure_large_sigaltstack: mmap failed: %s\n",
strerror(errno));
return;
}
/* Lower 1 page = guard. */
if (mprotect(base, (size_t)pagesize, PROT_NONE) != 0) {
fprintf(stderr, "ensure_large_sigaltstack: mprotect(guard) failed: %s\n",
strerror(errno));
/* Not fatal — keep going with the non-guarded stack. */
}
stack_t ss = {
.ss_sp = (char*)base + pagesize,
.ss_flags = 0,
.ss_size = LARGE_SIGSTACK_SIZE,
};
if (sigaltstack(&ss, NULL) != 0) {
fprintf(stderr, "ensure_large_sigaltstack: sigaltstack(install) failed: %s\n",
strerror(errno));
munmap(base, total);
return;
}
large_sigstack_base = base;
large_sigstack_installed = 1;
}
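The header comment of sigstack_helper.c rests on one check: whether the thread already has an enabled sigaltstack when Go's `minitSignalStack` runs on a cgo thread. A minimal C sketch of that check and of the install/teardown lifecycle, assuming Linux; `go_would_reuse_sigaltstack`, `install_altstack`, and `remove_altstack` are hypothetical helpers that only mirror the logic as the comment describes it, not Go's actual source:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical mirror of the cgo-thread branch described in the header
 * comment: 1 = the thread already has an enabled sigaltstack (Go would
 * reuse it), 0 = none (Go would install its own 32 KB gsignal stack),
 * -1 = the query failed. */
static int go_would_reuse_sigaltstack(void) {
    stack_t cur;
    if (sigaltstack(NULL, &cur) != 0) return -1;
    return (cur.ss_flags & SS_DISABLE) ? 0 : 1;
}

/* Map `size` bytes and make them this thread's sigaltstack (no guard
 * page, unlike the shim). Returns the mapping, or NULL on failure. */
static void *install_altstack(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (p == MAP_FAILED) return NULL;
    stack_t ss = { .ss_sp = p, .ss_flags = 0, .ss_size = size };
    if (sigaltstack(&ss, NULL) != 0) {
        munmap(p, size);
        return NULL;
    }
    return p;
}

/* Tear down in the safe order: disable the alt stack first, so no
 * signal can be delivered onto it, and only then release the memory. */
static int remove_altstack(void *p, size_t size) {
    stack_t off = { .ss_sp = NULL, .ss_flags = SS_DISABLE, .ss_size = 0 };
    if (sigaltstack(&off, NULL) != 0) return -1;
    return munmap(p, size);
}
```

Unmapping before disabling would leave the thread's sigaltstack pointing at memory that may be reused, the same class of hazard as the lifecycle race the header comment describes.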
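The guard-page comment in `ensure_large_sigaltstack` says a `PROT_NONE` page below the stack makes an overflow fault with `SEGV_ACCERR`. A small C sketch that checks this on a two-page mapping with the same layout, assuming Linux; `probe_guard_page` is a hypothetical helper, and its single-byte write stands in for a real overflow:

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_si_code;

/* SIGSEGV handler: record why the fault happened and jump back out. */
static void probe_handler(int sig, siginfo_t *info, void *uctx) {
    (void)sig;
    (void)uctx;
    probe_si_code = info->si_code;
    siglongjmp(probe_env, 1);
}

/* Hypothetical probe: map [guard page | one usable page], write one byte
 * just below the usable region, and return the fault's si_code (expected
 * SEGV_ACCERR). Returns 0 if nothing faulted, -1 on setup failure. */
static int probe_guard_page(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t total = (size_t)page * 2;
    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;
    /* Same layout as the shim: the lowest page is the guard. */
    if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(base, total);
        return -1;
    }

    int result = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        volatile char *usable = base + page;
        usable[-1] = 1; /* top byte of the guard page: a one-byte overflow */
    } else {
        result = probe_si_code;
    }

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    munmap(base, total);
    return result;
}
```

The handler escapes with `siglongjmp` only because the fault is synchronous and the demo owns the whole process; a real overflow on a signal stack is not something to recover from.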