TE_Inline_Cont_Sockets.md

Setting `DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1` noticeably improves simple TE (TechEmpower) benchmarks, such as the ones below, on all Unix architectures. As I understand it, the setting avoids dispatching work from the event thread to the thread pool and instead does the work on the same thread that received the request.
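For reference, a minimal sketch of how the switch can be turned on for a single shell session. The variable name comes from the text above; the launch command is a hypothetical placeholder and is left commented out. I am assuming the variable has to be set before the server process starts.

```shell
# Opt in for this shell session only (set before the server process starts).
export DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1

# Hypothetical launch command; the path is a placeholder, not from the benchmark setup:
# dotnet ./MyServer.dll
```

Unsetting the variable (`unset DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS`) restores the default behavior for processes started afterwards.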

| TE Benchmark | Baseline, RPS | MyTest, RPS | diff, % |
|---|---:|---:|---:|
| ARM64 Platform-JSON PGO | 661,663 | 778,925 | +17.72% |
| ARM64 Platform-Caching PGO | 186,188 | 218,004 | +17.09% |
| ARM64 Platform-Plaintext PGO | 6,933,964 | 7,563,428 | +9.08% |
| x64 Platform-JSON PGO | 1,299,388 | 1,432,200 | +10.22% |
| x64 Platform-Caching PGO | 413,123 | 445,144 | +7.75% |
| x64 Platform-Plaintext PGO | 12,529,587 | 13,137,836 | +4.85% |
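The diff column appears to be the relative change of MyTest over Baseline, i.e. `(MyTest / Baseline - 1) * 100`. That formula is my assumption, but it reproduces the reported values. A small sketch, where `pct_diff` is a hypothetical helper name:

```shell
# Relative change in percent between a baseline RPS and a test RPS.
pct_diff() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%+.2f%%\n", (cur / base - 1) * 100 }'
}

pct_diff 661663 778925   # ARM64 Platform-JSON, prints +17.72%
pct_diff 88765 51648     # ARM64 Platform-Fortunes, prints -41.81%
```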

(+17% on ARM64 seems to be a sign that something can be improved there, e.g. the threads-per-engine heuristic or the SpinWait parameters?)

However, it most likely regresses pretty much anything more complicated than "receive a tiny request and immediately send something back":

| TE Benchmark | Baseline, RPS | MyTest, RPS | diff, % |
|---|---:|---:|---:|
| ARM64 Platform-Fortunes PGO | 88,765 | 51,648 | -41.81% |
| x64 Platform-Fortunes PGO | 494,777 | 410,766 | -16.98% |

Can we do some sort of PGO (static or dynamic), but at the managed level, to adapt to users' workloads dynamically?

EgorBo/TE_Inline_Cont_Sockets.md