- Watch -> `$exception.InnerException` ...
- First-chance exceptions -> break on any CLR exception
- `DebuggerDisplay` vs `ToString()`
- Just my Code -> uncheck to troubleshoot timer tick
- Threads window
- Debugging > Symbols > Microsoft Symbol Servers
- methods/types metadata in the .NET assembly
- debug several apps at the same time via attach
- Debug > Windows > Parallel Stacks -> right click + Show External Code
- ctrl+f10
- Modules view -> Optimized? then no breakpoints
- Debug.WriteLine() -> sysinternals DebugView
- assigning an Object ID ($1, $2, ...) to an object -> `$1.GetHashCode()` in the Watch view
- Add function breakpoint -> point to a .NET function, add condition `this == $1` where `$1` is the member variable we assigned an Object ID to
- use Managed Compatibility Mode when the debugger fails
- Commit Size FTW, Working Set sux
- Task Mgr, Resource Monitor, PerfMon
- Private Bytes = committed by native code + committed by managed code
- Process Explorer -> Private Bytes (native) + Private Bytes History (native) + Heap Bytes (.NET)
- VMMap -> Managed Heap -> GC -> ... each gen has as many memory segments allocated as there are cores (server GC)
- procdump - gathers a dump; Process Explorer fails at it for some reason
- You can load a memory dump post mortem with VS Enterprise -> Debug Managed Memory; normal VS will just show you ...
- Use windbg + SOS to find memory leaks: take two procdumps -> load them -> do `!dumpheap -stat` -> find the object type that increased -> find the object's address -> find the call stack via `!mroot <addr>`
- PerfView can load heap snapshots from crash dumps, then diff 'em and show you which obj leaked + callstack
- dotTrace | PerfView | VS Profiler
- ? (first 30 min)
- CI & CD pipeline using GitHub Actions & secrets
- gRPC for .NET Core - contract-first API development, using Protocol Buffers (.proto) by default, reduced network usage thanks to binary serialization, language-agnostic implementations, autogenerates base classes, tools available for many languages to generate strongly-typed servers and clients from .proto files
- .NET Core 3.0 has a new application template called Worker Service, a starting point for long-running background processes like Windows services or Linux daemons: `dotnet new worker`
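A rough sketch of what that template scaffolds - a `BackgroundService` registered as a hosted service (the exact generated code differs by SDK version):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public class Worker : BackgroundService
{
    private readonly ILogger<Worker> _logger;

    public Worker(ILogger<Worker> logger) => _logger = logger;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            _logger.LogInformation("Worker running at: {Time}", DateTimeOffset.Now);
            await Task.Delay(1000, stoppingToken); // long-running loop that honours shutdown
        }
    }
}

public static class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureServices(services => services.AddHostedService<Worker>())
            .Build()
            .Run();
}
```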
- Azure Cognitive Services - realtime crowd insights, Ink Recognizer and Form Recognizer!
- Power BI - big data analysis (AI) and visualization
- Microsoft Flow - turn repetitive tasks into multistep workflows
- Azure Data Factory (doc) - build hybrid ETL and ELT pipelines via a visual environment; a serverless cloud data integration tool that scales on demand; pipeline runs are started by creating and scheduling triggers
- M365
- Always start with benchmark
- Optimize using classic techniques first:
- Try to have fewer heap allocations
- Avoid layers of indirection: wrappers around arrays, lists, matrices
- Avoid LINQ - `Sum`, `Min`, `Max`, `Count` etc. without predicates are a waste of resources
- Balance performance, readability, maintenance effort, composition
- Better data structures for your problem
- Vectorization in .NET Core 3.0:
  - `System.Numerics` for SIMD, `Vector<T>` is hardware dependent (256 bit on AVX2 machines), `Vector2`...`Vector4` and `Matrix4x4` for specialized operations, backed by JIT intrinsics with software fallback
  - `Vector.AsSpan()`: TODO
  - `System.Runtime.Intrinsics.*`: expose hardware instructions directly, for maximum performance
  - `BitOperations` with software fallback
  - `Avx.IsSupported`
  - `Avx.LoadAlignedVector256` + `Avx.MoveMask`
- Env variables can disable some instruction set architectures: `COMPlus_EnableAVX2=0`, `COMPlus_EnableSSE41=0`
- Going `unsafe` + AVX2 can have significant improvement: 80% and up

```csharp
unsafe
{
    fixed (float* valuePtr = values)
    {
        for (int i = 0; i < vectorizableLen; i += Vector256<float>.Count)
        {
            var valuesVector = Avx.LoadVector256(valuePtr + i); // or LoadAlignedVector256
            tempVector = Avx.Add(tempVector, valuesVector);
        }
    }
}
```
- TODO lab: Sum - naive vs LINQ vs `System.Numerics` vs `Avx` vs `Avx` aligned/pipelined
- Check the Intel Intrinsics Guide: AVX + AVX2
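A minimal sketch of the `System.Numerics` variant of that Sum lab (the naive loop is included for comparison; names are made up):

```csharp
using System.Numerics;

public static class SumVariants
{
    // naive scalar loop
    public static float SumNaive(float[] values)
    {
        float sum = 0f;
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum;
    }

    // System.Numerics.Vector<T>: the vector width is chosen by the JIT for the current hardware
    public static float SumVectorized(float[] values)
    {
        var acc = Vector<float>.Zero;
        int i = 0;
        int lastBlock = values.Length - values.Length % Vector<float>.Count;
        for (; i < lastBlock; i += Vector<float>.Count)
            acc += new Vector<float>(values, i);        // loads Vector<float>.Count elements at once
        float sum = Vector.Dot(acc, Vector<float>.One); // horizontal add of the accumulator
        for (; i < values.Length; i++)                  // scalar tail
            sum += values[i];
        return sum;
    }
}
```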
- `using var client = new HttpClient()` - once the connection is closed the socket is still unavailable for 240s (`netstat -n | where { $_ -like '*TIME_WAIT*' }`) -> we may exhaust the connection pool
- `services.AddSingleton<HttpClient>()` - no DNS refresh
- `HttpClientFactory` + `Polly` - Success! `services.AddHttpClient()`
- named client: `services.AddHttpClient("nova1", c => { c.BaseAddress = new Uri("..."); })` + `factory.CreateClient("nova1");`
- typed client: `services.AddHttpClient<INovaClient, NovaClient>()` + `public NovaClient(HttpClient httpClient) { ... }`
- delegating handlers: `services.AddHttpClient<...>(...).AddHttpMessageHandler<SomeFailureMetricHandler>();`
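A rough sketch of such a handler (`SomeFailureMetricHandler` is just the illustrative name from the registration above; the handler itself also has to be registered in DI, typically via `services.AddTransient<SomeFailureMetricHandler>()`):

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Delegating handlers sit in the HttpClient pipeline and can observe or alter every request/response.
public class SomeFailureMetricHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        if (!response.IsSuccessStatusCode)
        {
            // record a failure metric here (counter, structured log, ...)
        }
        return response;
    }
}
```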
- Polly: resilience and transient-fault-handling - retry, circuit breakers, timeouts, bulkhead isolation, cache, fallback, policy wrap - handle 5XX and 408 via `.AddTransientHttpErrorPolicy(b => b.WaitAndRetryAsync(3, retryCount => TimeSpan.FromSeconds(Math.Pow(2, retryCount))));`
- configuring the primary handler:

```csharp
services.AddHttpClient<INovaClient, NovaClient>()
    .ConfigurePrimaryHttpMessageHandler(() =>
    {
        var handler = new HttpClientHandler
        {
            ClientCertificateOptions = ClientCertificateOption.Manual,
            SslProtocols = SslProtocols.Tls12,
            AllowAutoRedirect = false
        };
        handler.ClientCertificates.Add(new X509Certificate2("KldDst.crt"));
        return handler;
    });
```
- tips & tricks

```csharp
// don't:
var data = await response.Content.ReadAsStringAsync();
return JsonConvert.DeserializeObject<IEnumerable<SomeModel>>(data);
// do:
return await response.Content.ReadAsAsync<IEnumerable<SomeModel>>();
```

```csharp
var response = await client.SendAsync(request);
// don't:
return await response.Content.ReadAsAsync<IEnumerable<SomeModel>>();
// do:
return response.IsSuccessStatusCode
    ? await response.Content.ReadAsAsync<IEnumerable<SomeModel>>()
    : Array.Empty<SomeModel>();
```

```csharp
[HttpGet]
public async Task<ActionResult<IEnumerable<SomeModel>>> GetSth(CancellationToken ct)
    => await client.GetSthAsync(ct);
//...
var response = await client.SendAsync(request, ct); // pass cancellation tokens!
return await response.Content.ReadAsAsync<IEnumerable<SomeModel>>(ct);
```

```csharp
var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, ct);
try
{
    return response.IsSuccessStatusCode
        ? await response.Content.ReadAsAsync<IEnumerable<SomeModel>>()
        : Array.Empty<SomeModel>();
}
finally
{
    response.Dispose(); // you have to!
}
```
- When to dispose:
  - `HttpClient` via `IHttpClientFactory` - no effect
  - `HttpClient` w/o `IHttpClientFactory` - never!
  - `HttpRequestMessage` - only has an effect if sending `StreamContent`
  - `HttpResponseMessage` - no effect unless using `ResponseHeadersRead`
- Using `SocketsHttpHandler`: `new HttpClient(new SocketsHttpHandler { MaxConnectionsPerServer = 5 });`
- Http/2 is supported, added for gRPC
- ETW and CLR Events: better performance
- https://github.com/chrisnas/clrevents
- Kernel logging system - very low impact on prod
- .NET Core 3.0 and Event Pipes: new tooling, a cross-platform alternative to ETW
  - `dotnet tool list -g`, `dotnet tool update <dotnet-xxx> -g`
  - do `dotnet trace` and then open the `.nettrace` file with PerfView
  - `dotnet trace convert --format speedscope <logs>` and then open with the speedscope.app visualizer (CPU Flame Graphs)
  - do `dotnet counters monitor` for health monitoring or performance investigation
  - do `dotnet dump collect` to collect and `dotnet dump analyze` to analyze Windows and Linux dumps, all without any native debugger involved; it allows you to run SOS commands to analyze crashes and the GC
  - `dotnet-dump` | `dotnet-counters` | `dotnet-trace`
- native thread memory footprint: user mode stack by default has 1 MB, kernel mode has 12/24 KB
- user mode has impersonation info, security context, Thread Local Storage
- `notepad.exe` has ~22 threads right after start
- native thread (system) != managed thread (.NET): even the ID is independent of the native thread ID
- unhandled exception kills the application in most cases
- `ThreadPool` is static and is different from the Win32 thread pool: used by tasks, asynchronous timers, wait handles and `ThreadPool.QueueUserWorkItem`
  - threads work in background, do not clear the TLS, have default stack size and default priority
  - one `ThreadPool` per process
  - two types of threads: for ordinary callbacks and for I/O operations
- thrown `Exception` is held until awaiting and then propagated if possible (thrown out of band for `async void`)
- `await`able type must be able to return `GetAwaiter()` with the following: implements `INotifyCompletion`, `bool IsCompleted { get; }`, `TResult GetResult()`
- `async` means nothing, it only instructs the compiler to create a state machine
- we can make any type awaitable using extension methods
```csharp
await 9000;
//...
public static class AwaitableInt
{
    // TaskAwaiter lives in System.Runtime.CompilerServices
    public static TaskAwaiter GetAwaiter(this int ms) => Task.Delay(ms).GetAwaiter();
}
```
- `Task` splits into two types:
  - cpu-bound delegate task: TPL world, has code to run, can be scheduled and executed
  - i/o-bound promise task: async world, signals completion of sth, `Task.FromResult()`, `Task.Delay()`, `Task.Yield()`
  - do not use the `Task` ctor, better do `Task.Run()` or `Task.Factory.StartNew()` or PLINQ
- `TaskScheduler` schedules tasks on the threads, it makes sure that the work of a task is eventually executed
  - for TPL and PLINQ it is based on the thread pool
  - supports work-stealing, thread injection/retirement and fairness
  - two types of queues: global for top-level tasks, local for nested/child tasks, accessed in LIFO order
  - long-running tasks are handled separately, do not go via the global/local queue
- `Task.ContinueWith` creates a continuation that executes asynchronously when the target task completes
  - can specify: `CancellationToken`, `TaskScheduler` and `TaskContinuationOptions`
    - `AttachedToParent` to create a hierarchy of tasks
    - `ExecuteSynchronously`, `RunContinuationsAsynchronously` to choose the thread running it
    - `HideScheduler` to run using the default scheduler instead of the current one
    - `LongRunning` more or less to run on a dedicated thread
- `Task.Id` is independent from `TaskScheduler.Id`, is being reused causing collisions if stored
- `Task` is a class so it is allocated on the heap and needs to be collected by the GC
- `ValueTask<T>` is a struct and is allocated on the stack
- `ExecutionContext` is a bag holding the logical context of the execution
  - contains `SynchronizationContext`, `LogicalCallContext`, `SecurityContext`, `HostExecutionContext`, `CallContext` etc.
  - is passed correctly through asynchronous points -> will follow to the other thread
  - methods with the `Unsafe*` prefix do not propagate the context, for instance `ThreadPool.UnsafeQueueUserWorkItem`
  - starting in .NET 4.6 there is an `AsyncLocal<T>` class working as `ThreadLocal<T>` (TLS variables) for tasks
  - TODO lab: `AsyncLocal<T>` vs `ThreadLocal<T>`
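A tiny sketch for that lab: `AsyncLocal<T>` flows with the `ExecutionContext` across awaits, while `ThreadLocal<T>` stays with the physical thread, so after an await that hops threads only the `AsyncLocal<T>` value survives (class and field names are made up):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContextLab
{
    private static readonly AsyncLocal<string> AsyncLocalValue = new AsyncLocal<string>();
    private static readonly ThreadLocal<string> ThreadLocalValue = new ThreadLocal<string>();

    public static async Task RunAsync()
    {
        AsyncLocalValue.Value = "flows with the ExecutionContext";
        ThreadLocalValue.Value = "stays on the original thread";

        await Task.Delay(100).ConfigureAwait(false); // likely resumes on a different thread pool thread

        Console.WriteLine(AsyncLocalValue.Value);              // still set
        Console.WriteLine(ThreadLocalValue.Value ?? "<null>"); // usually null here
    }
}
```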
- `SynchronizationContext` is a base class that provides a free-threaded context with no synchronization
  - `OperationStarted` and `OperationCompleted` handle notifications
  - `Send` (synchronous) and `Post` (asynchronous) send a message
  - `Current` gets the synchronization context for the thread
  - for the UI thread it is the UI context, mostly implemented via an event loop: `WindowsFormsSynchronizationContext`, `DispatcherSynchronizationContext`, `WinRTSynchronizationContext`, `WinRTCoreDispatcherBasedSynchronizationContext`
  - for an ASP.NET request it is the ASP.NET context: `AspNetSynchronizationContext` where the thread can be different than the original one, but still the request context is the same
  - for ASP.NET Core there is no separate context, thus no risk of deadlock, no need to use `ConfigureAwait(false)`
  - if `null` then it is the thread pool context, aka `TaskScheduler.Default`
- when `await`ing the awaitable type the current context is captured, later the rest of the method is posted on it
- `SynchronizationContext` is a global variable
  - each `async` method can have its own context
  - an `async` method can avoid capturing the context via `ConfigureAwait(false)`
  - in ASP.NET only one continuation can be executed at a time for a given request - no concurrency
  - in ASP.NET Core multiple continuations can run concurrently – we have concurrency and parallelism
| Context | Specific thread executing the code | Delegates executed serially | Delegates executed in order | Send is synchronous | Post is asynchronous |
|---|---|---|---|---|---|
| Default (ThreadPool based) | ❌ – any thread in the thread pool | ❌ | ❌ | ✔️ | ✔️ |
| ASP.NET | ❌ – any thread in the thread pool | ✔️ | ❌ | ✔️ | ❌ |
| WinForms | ✔️ – UI thread | ✔️ | ✔️ | Only if called on UI thread | ✔️ |
| WPF | ✔️ – UI thread | ✔️ | ✔️ | Only if called on UI thread | ✔️ |
- `await Task.FromResult(...)` will just run synchronously
- `await Task.Yield()` never completes synchronously, it explicitly creates a continuation
- the state machine is a `class` in Debug mode and a `struct` in Release
- don’t wait for asynchronous methods in synchronous code if you don’t have to
- TODO lab: UI w/ & w/o `ConfigureAwait(false)`
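A rough sketch for that lab (assumes a WinForms form with a made-up `resultTextBox` control and `_client` field): without `ConfigureAwait(false)` the continuation is posted back to the captured UI context; with it the continuation may run on a thread pool thread, so it must not touch UI controls:

```csharp
// library-style code: no UI access after the await, so don't capture the context
public static async Task<string> LoadDataAsync(HttpClient client)
{
    var json = await client.GetStringAsync("https://example.com/data")
                           .ConfigureAwait(false); // continuation on a thread pool thread
    return json.Trim();
}

// UI event handler: the default behaviour captures the UI SynchronizationContext
private async void LoadButton_Click(object sender, EventArgs e)
{
    var data = await LoadDataAsync(_client); // resumes back on the UI thread
    resultTextBox.Text = data;               // safe: we're on the UI context again
}
```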
- `Task` vs `void` - they capture the context in the same way
  - if an `Exception` is thrown in `async Task`, it is then remembered in the context of the `Task` object and propagated when awaited or cleaned up
  - if an exception is thrown in an `async Task` method but nobody `await`s it, then it is not propagated until the GC cleans up the `Task`
  - in `async void` methods the `Exception` is propagated immediately. This results in throwing an unhandled exception on the thread pool which kills the application
- `TaskCompletionSource` mostly runs continuations synchronously when doing `SetResult()`
  - for `await` they run synchronously
  - for `ContinueWith` they run asynchronously
  - use `TaskCreationOptions.RunContinuationsAsynchronously` where possible
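A small sketch of the recommended option (names are illustrative):

```csharp
public class ResultSource
{
    // without this flag, code awaiting Completion may run inline on the thread calling TrySetResult
    private readonly TaskCompletionSource<int> _tcs =
        new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

    public Task<int> Completion => _tcs.Task;

    public void OnResultArrived(int value) => _tcs.TrySetResult(value); // continuations go to the pool
}
```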
- `AggregateException` forms a tree if a `Task` has child `Task`s
  - has a `Flatten` method which makes a list from the exception tree
  - even if only one exception is thrown, it is still wrapped
- `await` vs `Wait()` vs `GetAwaiter().GetResult()`
  - `await` works asynchronously
  - `await` uses `GetAwaiter()` under the hood
  - `Wait()` and `GetAwaiter().GetResult()` work synchronously
  - `Wait()` doesn’t change the stacktrace, this results in showing a lot more physical threading details
  - `GetAwaiter().GetResult()` modifies the stacktrace
  - `Wait()` wraps exceptions in `AggregateException`
  - `GetAwaiter().GetResult()` and `await` do not wrap the exception
- exceptions from `async void` are propagated as unhandled even in a `try`/`catch`
- exceptions from `async Task` are stored in the `Task` and should be `await`ed / `Wait()` / `Handle()`..
  - when chaining in a parent-child hierarchy (`TaskCreationOptions.AttachedToParent`) we may miss exceptions, even in `AggregateException`
  - if there is an unobserved exception, it is raised by the finalizer thread in the `UnobservedTaskException` event where it can be cleared; if not cleared, the process dies (.NET 4) or the exception is suppressed (.NET 4.5)
  - always await tasks, handle all the exceptions
- always add handlers to unobserved exceptions and unhandled exceptions.
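A sketch of wiring those handlers up at startup:

```csharp
// observe exceptions from Tasks nobody awaited (raised on the finalizer thread)
TaskScheduler.UnobservedTaskException += (sender, e) =>
{
    Console.WriteLine(e.Exception); // log it
    e.SetObserved();                // mark as observed so it is not escalated
};

// last-chance notification; note: this does not stop the process from terminating
AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
    Console.WriteLine(e.ExceptionObject);
```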
- under the .NET/IL exception mechanism there is the SEH/VEH mechanism, you can register your exception handlers in C++ via `AddVectoredExceptionHandler`/`AddVectoredContinueHandler`
- use `ExceptionDispatchInfo.Capture(e).Throw();` to rethrow an `Exception` with additional detail like the line number of the method that did the `throw`
- uncatchable: `StackOverflowException` (only via a hack with a VEH handler with P/Invoke), `ThreadAbortException` (can't catch, can't swallow), `AccessViolationException` (can catch some with an attribute), `SEHException`, `OutOfMemoryException`
  - `[HandleProcessCorruptedStateExceptionsAttribute]` (and enabling `legacyCorruptedStateExceptionsPolicy` in .NET Core) does catch some
- an unhandled exception that happens on a thread pool thread is held until awaiting and then propagated if possible (thrown out of band for `async void`)
- catching an unhandled exception with `AppDomain.CurrentDomain.UnhandledException` doesn’t stop the application from terminating
- a constrained execution region is an area of code in which the CLR is constrained from throwing out-of-band exceptions that would prevent the code from executing in its entirety
- in .NET `catch(...) when(...)` is not just syntactic sugar, it's connected with the two-pass SEH filtering
- `SEHException`: `OutOfMemoryException`; `AccessViolationException` when the r/w is outside of JIT-compiled code, or inside but the address is outside of the null-pointers partition; `NullReferenceException` when the r/w is inside JIT-compiled code and the address is inside the null-pointers partition; everything else is `SEHException`
- .NET does not check whether the reference is `null` or not, it tries to r/w, the MMU notifies the CPU, .NET catches it and checks if this was the null-pointer partition
- an async method does not run on a thread nor on the thread pool
- async is for scalability
- async ops are performed by the hardware (then I/O completion port)
- always distinguish between CPU and I/O bound operations
- without await the long-running/background processing op will block thread pool thread
- for async do not use `TaskCreationOptions.LongRunning` because it will create a thread and the first `await` will destroy it
- `await task` != `task.Wait()` since the former jumps back here as soon as the operation is completed (continuations and coroutines) and the latter is blocking because it's waiting for completion
- async all the way up - you shouldn’t mix synchronous and asynchronous code without carefully considering the consequences
- this will block the thread that enters, and no effort has been made to prevent a present SynchronizationContext from becoming deadlocked:

```csharp
public string DoOperationBlocking()
{
    return DoAsyncOperation().Result; // or .GetAwaiter().GetResult();
}
```
- this will block the thread that enters, and the async operation will be scheduled on the default task scheduler, thus removing the risk of deadlocking:

```csharp
public string DoOperationBlocking()
{
    return Task.Run(() => DoAsyncOperation()).GetAwaiter().GetResult();
}
```
- this will block the thread that enters and a thread pool thread inside; in case of an exception we will get an `AggregateException` inside of an `AggregateException`:

```csharp
public string DoOperationBlocking()
{
    return Task.Run(() => DoAsyncOperation().Result).Result;
}
```
- `await` instead of `ContinueWith` - `ContinueWith` existed before async/await, it ignores the SynchronizationContext:

```csharp
public Task<int> DoSomethingAsync()
{
    return CallDependencyAsync().ContinueWith(t => t.Result + 1);
}
```
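The `await` equivalent of that continuation (a sketch), which does flow the captured context:

```csharp
public async Task<int> DoSomethingAsync()
{
    var result = await CallDependencyAsync();
    return result + 1;
}
```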
- TODO lab: set up a Web API with: Thread.Sleep vs Task.Delay, sync, sync over async, async over sync, async
- `TaskCompletionSource` is very dangerous: `Try`/`Set` (`Result`/`Exception`/`Canceled`) runs inline, leads to re-entrancy, deadlocks, thread pool starvation and broken state; `TaskCreationOptions.RunContinuationsAsynchronously` may help
- async flush for `Stream`/`StreamWriter`:

```csharp
using var sw = new StreamWriter(s);
await sw.WriteAsync(...); // uses a tmp buffer, won't flush the original buffer
await sw.FlushAsync();    // flush the original buffer
```
- check `ValueTask<T>` that is allocated on the stack instead of the heap
- `Timer` callbacks are a little bit different because of `TimerQueue`; do not pass an `async void` method to `new Timer(HERE, null, 1000, 1000)`
- avoid implicit `async void`: `public static void FireAndForget(Action action);` - add an overload for `async` like: `public static void FireAndForget(Func<Task> action);`
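A sketch of why that overload matters (names are illustrative): without the `Func<Task>` overload, passing an async lambda binds to `Action` and compiles to `async void`, so failures escape as unhandled exceptions; the `Func<Task>` overload lets the helper observe the returned task:

```csharp
public static class BackgroundWork
{
    // without the overload below, FireAndForget(async () => ...) would bind here and become async void
    public static void FireAndForget(Action action) => Task.Run(action);

    public static void FireAndForget(Func<Task> action) =>
        Task.Run(action).ContinueWith(
            t => Console.WriteLine(t.Exception),     // observe/log the failure
            TaskContinuationOptions.OnlyOnFaulted);
}
```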
- in dev, troubleshoot async issues (deadlocks, thread pool starvation) using VS's Parallel Stacks window
- in prod, do a memory snapshot (`procdump -ma <pid>`) and check which foreground thread is still running, look for the `_agent` state
  - load via VS Enterprise
  - thread call stacks do not give the full picture, so try diagnosing blocked tasks (`windbg > sosex!refs`) and follow the reverse references chain
- `ThreadPool` is starved when:
  - there is no deadlock but everything is blocked
  - 0% CPU and the thread count is increasing
- waiting synchronously on a Task is dangerous
- `ThreadPool` scheduling is unfair, especially when using `TaskCreationOptions.LongRunning`
- `arr.Skip(size/2).Take(size/4).ToArray()` > `Array.Copy(arr, size/2, newArr, 0, size/4);` is 7x faster > `arr.AsSpan().Slice(size/2, size/4)` is a bazillion times faster
- `System.Memory`'s `Span<T>` is a value type that provides an r/w view onto a contiguous region of memory: heap (`Array`, `String`), stack (via `stackalloc`) and native
  - `ReadOnlySpan<T>`
  - cannot be: boxed, a field of a standard (non-`ref`) struct, used as an arg or local variable inside `async` methods, captured by a lambda, used as a generic type arg
- `System.Memory`'s `Memory<T>` can live on the heap so the same limitations do not apply
  - is a readonly struct but not a `ref struct`
  - slightly slower to slice into it
- using `Span<T>`/`Memory<T>` can not only speed up the app, but also reduce allocations significantly, thus removing Gen 0 collections
- `System.Buffers`' `ArrayPool<T>` is a pool of arrays for reuse
  - is likely to return an array larger than requested and not cleared
  - `var arr = ArrayPool<T>.Shared` + `var buff = arr.Rent(...)` + `try`/`finally` + `arr.Return(buff)`
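A small sketch of that rent/return pattern (buffer size and the processing are illustrative):

```csharp
using System.Buffers;
using System.IO;

public static class ChunkReader
{
    public static void ProcessChunk(Stream stream)
    {
        var pool = ArrayPool<byte>.Shared;
        byte[] buffer = pool.Rent(4096); // may return a larger, dirty array
        try
        {
            int read = stream.Read(buffer, 0, 4096);
            // ... work only with buffer[0..read]
        }
        finally
        {
            pool.Return(buffer); // pass clearArray: true for sensitive data
        }
    }
}
```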
- when focusing on performance:
- measure, don't assume
- be scientific: make small changes each time and measure again
- focus on hot paths
- don't copy memory, slice it with `Span<T>`
- use `ArrayPool`s where appropriate to reduce array allocations
- GC has three generations where
  - GEN0 works like a stack - allocation is just a pointer bump
  - GEN2 includes the LOH which contains objects of at least 85,000 bytes
- `new` is translated to the `newobj` IL instruction
  - allocates a new instance of the class associated with the ctor
  - initializes all the fields in the new instance to 0 (of the proper type) or null references as appropriate
  - calls the ctor with the given arguments along with the newly created instance
  - after the constructor has been called, the now initialized object reference (type O) is pushed on the stack
- improving GC performance:
- allocating objects in custom pool doesn’t change GC logic, as soon as there is a reference pointing to them, GC will traverse the object graph
- use `ArrayPool<T>`