@NinoFloris
Created December 25, 2024 14:28
Threading and I/O developments

Background

Interest in architectural designs for threading and I/O has picked up again in the last few years, with thread-per-core architectures and io_uring being of particular relevance. Both are popular in high-throughput, low-latency applications such as web servers and databases, areas .NET is good at these days.

While Rust is generally taking the lead here - both in activity and performance - authors of similar software in .NET (e.g. Kestrel/ASP.NET Core/YARP, Garnet, Orleans, RavenDB) looking to experiment or offer these architectures to their users would quickly find it infeasible today due to a few simple realities: tight coupling to the thread pool in the async and I/O internals, and a strong doctrine of ConfigureAwait(false) use in libraries.

These prevent alternative designs from maintaining workload affinity. Even if I/O were not a concern and ConfigureAwait(false) did not exist, achieving significant performance gains might still be challenging. TaskSchedulers and SynchronizationContexts are generally used to give frameworks control over where *application code* runs. Because they are the uncommon case they come with a performance cost: captures, slow paths, allocations, and forced dispatching. That makes them unsuitable for running all the code of a high-throughput program.
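To make the affinity problem concrete, here is a minimal sketch of how a single ConfigureAwait(false) moves work off a dedicated thread. `PinnedContext`, `AffinityDemo`, and the 50 ms delays are illustrative names and values I made up for this example, not part of the prototype; the context is deliberately naive and stands in for whatever a thread-per-core design would use.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context (not a BCL type): runs every posted callback on one dedicated thread.
sealed class PinnedContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object? State)> _queue = new();

    public Thread Worker { get; }

    public PinnedContext()
    {
        Worker = new Thread(Run) { IsBackground = true, Name = "pinned" };
        Worker.Start();
    }

    // Continuations that captured this context are queued here instead of on the thread pool.
    public override void Post(SendOrPostCallback d, object? state) => _queue.Add((d, state));

    private void Run()
    {
        // Make this context current so awaits on the worker thread capture it.
        SetSynchronizationContext(this);
        foreach (var (callback, state) in _queue.GetConsumingEnumerable())
            callback(state);
    }
}

static class AffinityDemo
{
    // Returns (resumed on worker after a plain await, resumed on worker after ConfigureAwait(false)).
    public static Task<(bool, bool)> RunAsync()
    {
        var ctx = new PinnedContext();
        var done = new TaskCompletionSource<(bool, bool)>();

        ctx.Post(async _ =>
        {
            // Plain await: the continuation is posted back to the captured context.
            await Task.Delay(50);
            bool afterPlainAwait = Thread.CurrentThread == ctx.Worker;

            // What library code typically does: skip the context and resume wherever the timer fires.
            await Task.Delay(50).ConfigureAwait(false);
            bool afterConfigureAwaitFalse = Thread.CurrentThread == ctx.Worker;

            done.SetResult((afterPlainAwait, afterConfigureAwaitFalse));
        }, null);

        return done.Task;
    }
}

static class Program
{
    static async Task Main()
    {
        var (plain, configured) = await AffinityDemo.RunAsync();
        Console.WriteLine($"plain await resumed on pinned thread: {plain}");
        Console.WriteLine($"ConfigureAwait(false) resumed on pinned thread: {configured}");
    }
}
```

On a typical run the first check reports that the code is still on the pinned thread and the second reports that it is not, because the delay has not completed by the time it is awaited. Note that even the "good" path here pays for a context capture and a queue hop per await, which is the overhead described above.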

TL;DR

Any experiment in the area of threading and I/O needs a custom stack stretching end-to-end, avoiding async code in the BCL and libraries that could de-affinitize it. What's missing is a low-overhead mechanism to control where library code (and ideally much of the BCL's) runs, selectively overriding the ThreadPool as the default scheduling target.

I've prototyped a narrow set of abstractions, and an implementation of them for the existing .NET APIs, in a core project small enough that it could be upstreamed. On top of this sits a stack of independent pieces:

  • Main implementation of core abstractions: contains the thread-per-core showcase and an alternative I/O system (the prototype is limited to sockets; io_uring still needs exploring).
  • ASP.NET Core integration: a production-quality Kestrel transport using the core abstractions, adding affinity selection for new client sockets, and framework support for affinitized DI services.
  • Client-server protocol toolkit: support for the core abstractions, a pipeline data structure (as if two System.Threading.Channels were placed in sequence and fused together), and abstractions for modeling client-server protocol interactions more easily.
  • Postgres driver: first-class command batching and pipelining support. It also includes cancellation, timeouts, and error handling to remove any unfair performance advantage.
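As a rough picture of what "two channels in sequence" means for protocol pipelining, the sketch below wires up the unfused arrangement: one channel for requests waiting to be written, a second for requests already sent and awaiting a reply. `TwoStagePipeline` and the string payloads are hypothetical and only illustrate the shape; the toolkit's actual data structure fuses the two stages and is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

static class TwoStagePipeline
{
    // Hypothetical model: stage 1 holds requests to write, stage 2 holds requests awaiting a reply.
    public static async Task<string[]> RunAsync(string[] requests)
    {
        var toWrite = Channel.CreateUnbounded<string>();
        var inFlight = Channel.CreateUnbounded<string>();

        // Write loop: "sends" each request, then moves it to the in-flight stage in the same order.
        var writeLoop = Task.Run(async () =>
        {
            await foreach (var request in toWrite.Reader.ReadAllAsync())
                await inFlight.Writer.WriteAsync(request);
            inFlight.Writer.Complete();
        });

        // Read loop: replies arrive in request order, so each one pairs with the oldest in-flight request.
        var readLoop = Task.Run(async () =>
        {
            var replies = new List<string>();
            await foreach (var request in inFlight.Reader.ReadAllAsync())
                replies.Add($"reply to {request}");
            return replies.ToArray();
        });

        // Callers enqueue without waiting for earlier replies; that is the pipelining.
        foreach (var request in requests)
            await toWrite.Writer.WriteAsync(request);
        toWrite.Writer.Complete();

        await writeLoop;
        return await readLoop;
    }
}

static class Program
{
    static async Task Main()
    {
        foreach (var reply in await TwoStagePipeline.RunAsync(new[] { "a", "b", "c" }))
            Console.WriteLine(reply);
    }
}
```

Each request crosses two queues here, so every item is enqueued and dequeued twice. Presumably avoiding that double hop is the motivation for fusing the stages into one structure, though that is my reading rather than something the note spells out.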

A TechEmpower benchmark ties all of this together to track how we're doing. In a platform fortunes run it comes in ~5% below Ntex Tokio (third place, former Actix author). For micro fortunes (minimal APIs) we're competing with Vert.x and Drogon, and doing so with a single 250-line C# file. As far as missing pieces go, this is the one for TE.
