TE_Inline_Cont_Sockets.md

DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1 noticeably improves simple TE benchmarks such as the following ones on all UNIX archs. From my understanding, it avoids dispatching work from the event thread to the thread pool and instead runs the continuation on the same thread that received the request (see the conceptual sketch below the first table).

| TE Benchmark | Baseline, RPS | MyTest, RPS | diff, % |
|---|---|---|---|
| ARM64 Platform-JSON PGO | 661,663 | 778,925 | +17.72% |
| ARM64 Platform-Caching PGO | 186,188 | 218,004 | +17.09% |
| ARM64 Platform-Plaintext PGO | 6,933,964 | 7,563,428 | +9.08% |
| x64 Platform-JSON PGO | 1,299,388 | 1,432,200 | +10.22% |
| x64 Platform-Caching PGO | 413,123 | 445,144 | +7.75% |
| x64 Platform-Plaintext PGO | 12,529,587 | 13,137,836 | +4.85% |

(The +17% on arm64 seems to be a sign that something can still be improved there, e.g. the threads-per-engine heuristic or the SpinWait parameters?)

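For illustration, here is a rough sketch of what I mean by "doing the work inline" vs dispatching to the thread pool. This is not the actual SocketAsyncEngine code; the type and method names are made up.

```csharp
// Conceptual sketch only, not the real System.Net.Sockets implementation.
using System;
using System.Threading;

class InlineCompletionSketch
{
    // Stand-in for DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1.
    private readonly bool _inlineCompletions =
        Environment.GetEnvironmentVariable("DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS") == "1";

    // Imagine this is called on the epoll/kqueue event thread whenever a socket
    // operation completes, with `continuation` being the user code to resume.
    public void HandleCompletion(Action continuation)
    {
        if (_inlineCompletions)
        {
            // Inline mode: run the continuation right here on the event thread.
            // No queueing and no thread switch, which is ideal when the work is
            // "parse a tiny request and write a tiny response".
            continuation();
        }
        else
        {
            // Default mode: hand the continuation to the thread pool so the
            // event thread can immediately go back to draining events.
            ThreadPool.QueueUserWorkItem(_ => continuation());
        }
    }
}
```

The trade-off is what the numbers show: skipping the thread-pool hop helps tiny handlers (table above), but occupying the event thread hurts once the handler does real work (the Fortunes numbers below).
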
However, it most likely regresses pretty much anything more complicated than "receive a tiny request and immediately send something back":

| TE Benchmark | Baseline, RPS | MyTest, RPS | diff, % |
|---|---|---|---|
| ARM64 Platform-Fortunes PGO | 88,765 | 51,648 | -41.81% |
| x64 Platform-Fortunes PGO | 494,777 | 410,766 | -16.98% |

Can we do a sort of PGO (static or dynamic), but at the managed level, to adapt to users' workloads dynamically?

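Purely as a thought experiment, an adaptive policy could watch how long continuations actually take and fall back to dispatching when they get expensive. Nothing like this exists in the runtime today; all names and the threshold below are invented.

```csharp
// Hypothetical managed-level adaptive policy, for illustration only.
using System;
using System.Diagnostics;
using System.Threading;

class AdaptiveCompletionPolicy
{
    // Assumption: continuations that usually finish in well under ~50 microseconds
    // are cheap enough to run inline on the event thread.
    private const double InlineThresholdMicroseconds = 50;

    private double _avgMicroseconds;           // moving average of observed continuation cost
    private readonly object _lock = new object();

    public void Run(Action continuation)
    {
        bool runInline;
        lock (_lock)
        {
            runInline = _avgMicroseconds < InlineThresholdMicroseconds;
        }

        if (runInline)
        {
            Measure(continuation);             // stay on the event thread
        }
        else
        {
            ThreadPool.QueueUserWorkItem(_ => Measure(continuation));   // dispatch as today
        }
    }

    private void Measure(Action continuation)
    {
        long start = Stopwatch.GetTimestamp();
        continuation();
        double micros = (Stopwatch.GetTimestamp() - start) * 1_000_000.0 / Stopwatch.Frequency;

        lock (_lock)
        {
            // Exponentially weighted moving average: cheap continuations keep us in
            // inline mode; expensive ones (e.g. a DB round trip like Fortunes) push
            // us back to dispatching.
            _avgMicroseconds = _avgMicroseconds == 0
                ? micros
                : 0.9 * _avgMicroseconds + 0.1 * micros;
        }
    }
}
```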