Notes from .NET Dev Days 2019 Warsaw conference #notes

.NET Dev Days Workshop

repo

Talk #1 Crash & Exceptions

  • Watch -> $(exception).InnerException ...
  • First-chance exceptions - break on any CLR exception
  • DebuggerDisplay vs ToString() (see the sketch after this list)
  • Just my Code -> uncheck to troubleshoot timer tick
  • Threads window
  • Debugging > Symbols > MS Symbol Servers
  • methods/types metadata in the .net assembly
  • debug several apps at the same time via attach
  • Debug > Windows > Parallel Stacks -> right click + show external code
  • ctrl+f10
  • Modules view -> Optimized? then no breakpoints
  • Debug.WriteLine() -> sysinternals DebugView
  • assigning $id to an object -> $id.GetHashCode() in the watch view
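
A minimal sketch of the DebuggerDisplay point above - the attribute is debugger-only metadata, so it changes what Watch/Locals show without overriding ToString() (the Customer type is made up for illustration):

    using System.Diagnostics;

    [DebuggerDisplay("{Name} (Id = {Id})")] // shown in Watch/Locals/DataTips instead of the type name
    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }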

Workshop #1

  • Add function breakpoint -> point it at a .NET function, add condition this == $1 where $1 is the object we assigned an Object ID to
  • use Managed Compatibility Mode when the debugger fails

ClrMD (the Microsoft.Diagnostics.Runtime NuGet package) allows better analysis of your app
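
A minimal sketch of what ClrMD enables, assuming ClrMD 2.x - it groups heap objects from a dump by type, roughly what SOS !dumpheap -stat prints:

    using System;
    using System.Linq;
    using Microsoft.Diagnostics.Runtime;

    class HeapStats
    {
        static void Main(string[] args)
        {
            // Open a memory dump (ClrMD can also attach to a live process).
            using DataTarget target = DataTarget.LoadDump(args[0]);
            ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

            // Group live heap objects by type and sort by total size.
            var stats = runtime.Heap.EnumerateObjects()
                .Where(o => o.Type != null)
                .GroupBy(o => o.Type.Name)
                .Select(g => new { Type = g.Key, Count = g.Count(), Bytes = g.Sum(o => (long)o.Size) })
                .OrderByDescending(x => x.Bytes)
                .Take(20);

            foreach (var s in stats)
                Console.WriteLine($"{s.Count,10} {s.Bytes,15} {s.Type}");
        }
    }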

Talk #2 Memory Management

  • Commit Size FTW, Working Set sux
  • Task Mgr, Resource Monitor, PerfMon
  • Private Bytes = committed by native code + committed by managed code (see the sketch after this list)
  • Process Explorer -> Private Bytes (native) + Private Bytes History (native) + Heap Bytes (.NET)
  • VMMap -> Managed Heap -> GC -> each generation has its own memory segments allocated
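
The same distinction can be observed from code with plain BCL counters (a quick sketch, no extra tooling):

    using System;
    using System.Diagnostics;

    class MemoryCounters
    {
        static void Main()
        {
            using var p = Process.GetCurrentProcess();

            // Private (committed) bytes: native + managed, the number worth watching.
            Console.WriteLine($"Private bytes: {p.PrivateMemorySize64 / (1024 * 1024)} MB");

            // Working set: physical RAM currently mapped in - misleading for leak hunting.
            Console.WriteLine($"Working set  : {p.WorkingSet64 / (1024 * 1024)} MB");

            // Managed (GC heap) portion only.
            Console.WriteLine($"GC heap      : {GC.GetTotalMemory(false) / (1024 * 1024)} MB");
        }
    }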

Workshop #2

  • dumpbin - gathers dump, process explorer fails at it for some reason
  • You can load a memory dump post mortem with VS Enterprise -> Debug Managed Memory; normal VS will just show you less detail
  • Use windbg+sos to find memory leaks: take two procdumps -> load them -> do !dumpheap -stat -> find the object type that increased -> find an object address -> find what keeps it alive via !mroot <addr>
  • PerfView can load heap snapshots from crash dumps, then diff them and show you which objects leaked + the call stack

Talk #3 Thread Contention

Talk #3 Performance

  • dotTrace | perfView | vs profiler

.Net Dev Days Day 1

Keynote

  • ? (first 30 min)
  • CI & CD pipeline using Github actions & secrets
  • gRPC for .NET Core - contract-first API development, using Protocol Buffers (.proto) by default, reduced network usage with binary serialization, allowing for language agnostic implementations, autogenerates Base classes, tools available for many languages to generate strongly-typed servers and clients from *.proto files.
  • .NET Core 3.0 has a new application template called Worker Service, a starting point for long-running background processes like Windows services or Linux daemons: dotnet new worker (see the sketch after this list)
  • Azure Cognitive services - realtime crowd insights, ink and form recognizer!
  • Power BI - big data analysis (AI) and visualization
  • Microsoft Flow - turn repetitive tasks into multistep workflows
  • Azure Data Factory (doc) - build hybrid ETL and ELT pipelines via visual environment, serverless cloud data integration tool that scales on demand, pipeline runs by creating and scheduling triggers
  • M365
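
The Worker Service template boils down to a BackgroundService hosted by the generic host - roughly the shape that dotnet new worker generates (reproduced from memory, names approximate):

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Hosting;
    using Microsoft.Extensions.Logging;

    public class Worker : BackgroundService
    {
        private readonly ILogger<Worker> _logger;

        public Worker(ILogger<Worker> logger) => _logger = logger;

        // The generic host keeps this loop running until the service/daemon is stopped.
        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
                await Task.Delay(1000, stoppingToken);
            }
        }
    }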

Writing Fast Code Using .NET Core 3.0 Hardware Intrinsics - Martin Ullrich

  • Always start with benchmark
  • Optimize using classic techniques first:
    • Try to have fewer heap allocations
    • Avoid layer of indirection: wrappers around arrays, lists, matrices
    • Avoid LINQ
      • Balance performance, readability, maintenance effort and composition
      • Sum, Min, Max, Count etc without predicates are a waste of resources
    • Better data structures for your problem
  • Vectorization in .NET Core 3.0: System.Numerics for SIMD, Vector<T> is hardware dependent (256-bit on AVX2 machines), Vector2...Vector4 and Matrix4x4 for specialized operations, backed by JIT intrinsics with a software fallback (see the Vector<T> sketch after this list)
    • Vector.AsSpan(): TODO
  • System.Runtime.Intrinsics.*: expose hardware instructions directly, for maximum performance
    • BitOperations with software fallback
    • Avx.IsSupported
    • Avx.LoadAlignedVector256 + Avx.MoveMask
  • Env variables can disable some instruction set architectures: COMPlus_EnableAVX2=0, COMPlus_EnableSSE41=0
  • Going unsafe + AVX2 can have significant improvement: 80% and up
    // requires: using System.Runtime.Intrinsics; using System.Runtime.Intrinsics.X86;
    unsafe {
      var tempVector = Vector256<float>.Zero;
      int vectorizableLen = values.Length - (values.Length % Vector256<float>.Count);
      fixed (float* valuePtr = values) {
        for (int i = 0; i < vectorizableLen; i += Vector256<float>.Count) {
          var valuesVector = Avx.LoadVector256(valuePtr + i); // or Avx.LoadAlignedVector256 for aligned data
          tempVector = Avx.Add(tempVector, valuesVector);
        }
      }
      // horizontal-add the 8 lanes of tempVector and add the scalar tail (values[vectorizableLen..]) to finish
    }
    
  • TODO lab: Sum - naive vs linq vs System.Numerics vs Avx vs AvxAlignedPipelined
  • Check the Intel Intrinsics Guide: AVX + AVX2
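
A System.Numerics.Vector<T> counterpart to the AVX snippet above - a sketch for the Sum lab, where the hardware decides the vector width (e.g. 8 floats on an AVX2 machine):

    using System.Numerics;

    static class VectorSum
    {
        public static float Sum(float[] values)
        {
            var acc = Vector<float>.Zero;
            int i = 0;
            int lastBlock = values.Length - (values.Length % Vector<float>.Count);

            // Vectorized part: Vector<float>.Count elements per iteration.
            for (; i < lastBlock; i += Vector<float>.Count)
                acc += new Vector<float>(values, i);

            // Horizontal add of the accumulator lanes.
            float sum = 0;
            for (int lane = 0; lane < Vector<float>.Count; lane++)
                sum += acc[lane];

            // Scalar tail for the remaining elements.
            for (; i < values.Length; i++)
                sum += values[i];

            return sum;
        }
    }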

HTTP in .NET Core - Steve Gordon

slides

  • using var client = new HttpClient() - once the connection is closed the socket stays in TIME_WAIT for 240s (netstat -n | where { $_ -like '*TIME_WAIT*' }) > we may exhaust the available sockets
  • services.AddSingleton<HttpClient>() - no DNS refresh
  • HttpClientFactory + Polly - Success!
    • services.AddHttpClient()
    • named client: services.AddHttpClient("nova1", c => { c.BaseAddress = new Uri("..."); }) + factory.CreateClient("nova1");
    • typed client: services.AddHttpClient<INovaClient, NovaClient>() + public NovaClient(HttpClient httpClient){ ... }
  • delegating handlers: services.AddHttpClient<...>(...).AddHttpMessageHandler<SomeFailureMetricHandler>();
  • Polly: resilience and transient-fault-handling - retry, circuit breakers, timeouts, bulkhead isolation, cache, fallback, PolicyWrap - handle 5XX and 408 via .AddTransientHttpErrorPolicy(b => b.WaitAndRetryAsync(3, retryCount => TimeSpan.FromSeconds(Math.Pow(2, retryCount))));
  • configuring primary handler:
    services.AddHttpClient<INovaClient, NovaClient>().ConfigurePrimaryHttpMessageHandler(() => 
    {
      var handler = new HttpClientHandler{
        ClientCertificateOptions = ClientCertificateOption.Manual,
        SslProtocols = SslProtocols.Tls12,
        AllowAutoRedirect = false
      };
      handler.ClientCertificates.Add(new X509Certificate2("KldDst.crt"));
      return handler;
    });
    
  • tips & tricks (pulled together in the sketch after this list)
    // don't
    var data = await response.Content.ReadAsStringAsync();
    return JsonConvert.DeserializeObject<IEnumerable<SomeModel>>(data);
    // do:
    return await response.Content.ReadAsAsync<IEnumerable<SomeModel>>();
    
    var response = await client.SendAsync(request);
    // don't
    return await response.Content.ReadAsAsync<IEnumerable<SomeModel>>();
    // do:
    return (response.IsSuccessStatusCode) ? await response.Content.ReadAsAsync<IEnumerable<SomeModel>>() : Array.Empty<SomeModel>();
    
    [HttpGet]
    public async Task<ActionResult<IEnumerable<SomeModel>>> GetSth(CancellationToken ct) => await client.GetSthAsync(ct);
    //...
    var response = await client.SendAsync(request, ct); // pass cancellation tokens!
    return await response.Content.ReadAsAsync<IEnumerable<SomeModel>>(ct);
    
    var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, ct);
    try { return (response.IsSuccessStatusCode) ? await response.Content.ReadAsAsync<IEnumerable<SomeModel>>() : Array.Empty<SomeModel>(); }
    finally { response.Dispose(); } // you have to!
    
  • When to dispose:
    • HttpClient via IHttpClientFactory - no effect
    • HttpClient w/o IHttpClientFactory - never!
    • HttpRequestMessage - only has an effect if sending StreamContent
    • HttpResponseMessage - no effect unless using ResponseHeadersRead
  • Using SocketsHttpHandler: new HttpClient(new SocketsHttpHandler{ MaxConnectionsPerServer=5 });
  • Http/2 is supported, added for gRPC
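
Pulling the typed-client and tips & tricks bullets together - a sketch where INovaClient/NovaClient/SomeModel come from the notes, GetModelsAsync and the "models" route are made up, and ReadAsAsync comes from the Microsoft.AspNet.WebApi.Client package:

    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public class SomeModel { }

    public interface INovaClient
    {
        Task<IEnumerable<SomeModel>> GetModelsAsync(CancellationToken ct);
    }

    public class NovaClient : INovaClient
    {
        private readonly HttpClient _httpClient; // injected and managed by IHttpClientFactory

        public NovaClient(HttpClient httpClient) => _httpClient = httpClient;

        public async Task<IEnumerable<SomeModel>> GetModelsAsync(CancellationToken ct)
        {
            using var request = new HttpRequestMessage(HttpMethod.Get, "models");

            // Stream the body instead of buffering it and pass the cancellation token down.
            using var response = await _httpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, ct);

            return response.IsSuccessStatusCode
                ? await response.Content.ReadAsAsync<IEnumerable<SomeModel>>(ct)
                : Array.Empty<SomeModel>();
        }
    }

    // Registration with a retry policy, as in the Polly bullet above:
    // services.AddHttpClient<INovaClient, NovaClient>(c => c.BaseAddress = new Uri("https://example.org/"))
    //     .AddTransientHttpErrorPolicy(b => b.WaitAndRetryAsync(3, retry => TimeSpan.FromSeconds(Math.Pow(2, retry))));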

Spying on .NET with EventPipes and ETW - Christophe Nasarre

  • ETW and CLR Events: better performance
  • .NET Core 3.0 and Event Pipes: new tooling, cross-platform alternative to ETW
    • dotnet tool list -g, dotnet tool update <dotnet-xxx> -g
    • do dotnet trace and then open .nettrace file with perfview
      • dotnet trace convert --format speedscope <logs> and then open with the speedscope.app visualizer (CPU flame graphs)
    • do dotnet counters monitor for health monitoring or performance investigation
    • do dotnet dump collect to collect and dotnet dump analyze to analyze Windows and Linux dumps all without any native debugger involved, it will allow you to run SOS commands to analyze crashes and the GC

dotnet-dump | dotnet-counters | dotnet-trace

  • native thread memory footprint: user mode stack by default has 1 MB, kernel mode has 12/24 KB
  • user mode has impersonation info, security context, Thread Local Storage
  • notepad.exe has ~22 threads right after start
  • native (system) thread != managed (.NET) thread: even the managed thread ID is independent of the native thread ID
  • unhandled exception kills the application in most cases
  • ThreadPool is static and is different from Win32 thread pool: used by tasks, asynchronous timers, wait handles and ThreadPool.QueueUserWorkItem.
    • threads work in background, do not clear the TLS, have default stack size and default priority
    • one ThreadPool per process
    • two types of threads: for ordinary callbacks and for I/O operations
    • thrown Exception is held until awaiting and then propagated if possible (thrown out of band for async void)
  • an awaitable type must expose GetAwaiter() returning an awaiter with: INotifyCompletion implemented, bool IsCompleted { get; }, TResult GetResult()
  • async means nothing, it only instructs the compiler to create a state machine
  • we can make any type awaitable using extension methods
    await 9000; // waits 9 seconds thanks to the extension below
    //...
    public static class AwaitableInt { public static TaskAwaiter GetAwaiter(this int ms) => Task.Delay(ms).GetAwaiter(); }
  • Task splits into two types:
    • cpu-bound delegate task: TPL world, has code to run, can be scheduled and executed
    • i/o-bound promise task: async world, signals completion of something, Task.FromResult(), Task.Delay(), Task.Yield()
  • do not use the Task constructor; prefer Task.Run(), Task.Factory.StartNew() or PLINQ
  • TaskScheduler schedules tasks on the threads, it makes sure that the work of a task is eventually executed
    • for TPL and PLINQ is based on the thread pool
    • supports work-stealing, thread injection/retirement and fairness
    • two types of queues: global for top level tasks, local for nested/child tasks, accessed in LIFO order
    • long running tasks are handled separately, do not go via global/local queue
  • Task.ContinueWith creates a continuation that executes asynchronously when the target task completes
    • can specify: CancellationToken, TaskScheduler and TaskContinuationOptions
      • AttachedToParent to create hierarchy of tasks
      • ExecuteSynchronously, RunContinuationAsynchronously to choose the thread running it
      • HideScheduler to run using the default scheduler instead of the current one
      • LongRunning more or less to run on dedicated thread
  • Task.Id is independent of TaskScheduler.Id and gets reused, causing collisions if stored
  • Task is a class so it is allocated on the heap and needs to be collected by GC
  • ValueTask<T> is a struct, so it can avoid the heap allocation
  • ExecutionContext is a bag holding logical context of the execution
    • contains SynchronizationContext, LogicalCallContext, SecurityContext, HostExecutionContext, CallContext etc
    • is passed correctly through asynchronous points -> will follow to the other thread
    • methods with Unsafe* prefix do not propagate the context, for instance ThreadPool.UnsafeQueueUserWorkItem
    • starting in .NET 4.6 there is an AsyncLocal<T> class working as ThreadLocal<T> (TLS variables) for tasks
    • TODO lab: AsyncLocal<T> vs ThreadLocal<T> (see the sketch after this list)
  • SynchronizationContext is a base class that provides a free-threaded context with no synchronization
    • OperationStarted and OperationCompleted handle notifications
    • Send dispatches a message synchronously, Post asynchronously
    • Current gets synchronization context for the thread
      • for UI thread it is UI context, mostly implemented via event loop: WindowsFormsSynchronizationContext, DispatcherSynchronizationContext, WinRTSynchronizationContext, WinRTCoreDispatcherBasedSynchronizationContext
      • for ASP.NET request it is ASP.NET context: AspNetSynchronizationContext where thread can be different than original one, but still the request context is the same
      • for ASP.NET Core there is no separate context, thus no risk of deadlock, no need to use ConfigureAwait(false)
      • if null then it is the thread pool context, aka TaskScheduler.Default
    • when awaiting an awaitable type the current context is captured; later the rest of the method is posted to it
    • SynchronizationContext is a global variable
    • each async method can have its own context
    • async method can avoid capturing the context via ConfigureAwait(false)
    • in ASP.NET only one continuation can be executed at a time for given request - no concurrency
    • in ASP.NET Core multiple continuations can run concurrently – we have concurrency and parallelism
  SynchronizationContext comparison (specific thread / executed serially / executed in order / Send synchronous / Post asynchronous):
    • Default (ThreadPool based): ❌ no specific thread – any thread in the thread pool; Send is synchronous ✔️; Post is asynchronous ✔️
    • ASP.NET: ❌ no specific thread – any thread in the thread pool; Send is synchronous ✔️; Post is asynchronous ✔️
    • WinForms: ✔️ UI thread; delegates executed serially ✔️ and in order ✔️; Send is synchronous only if called on the UI thread; Post is asynchronous ✔️
    • WPF: ✔️ UI thread; delegates executed serially ✔️ and in order ✔️; Send is synchronous only if called on the UI thread; Post is asynchronous ✔️
  • await Task.FromResult(...) will just run synchronously
  • await Task.Yield() always yields - it never completes synchronously and explicitly creates a continuation
  • the async state machine is a class in Debug builds and a struct in Release builds
  • don’t wait for asynchronous methods in synchronous code if you don’t have to
    • TODO lab: UI w/ & w/o ConfigureAwait(false)
  • Task vs void
    • they capture the context in the same way
    • if Exception is thrown in async Task, it is then remembered in the context of Task object and propagated when awaited or cleaned up
    • if an exception is thrown in async Task method but nobody awaits it, then it is not propagated until the GC cleans up the Task
    • in async void methods the Exception is propagated immediately. This results in throwing unhandled exception on the thread pool which kills the application
  • TaskCompletionSource mostly runs continuations synchronously when doing SetResult()
    • for await they run synchronously
    • for ContinueWith they run asynchronously
    • use TaskCreationOptions.RunContinuationsAsynchronously where possible
  • AggregateException forms a tree if Task has child Tasks
    • has method Flatten which makes a list from the exception tree
    • even if only one exception is thrown, it is still wrapped
  • await vs Wait() vs GetAwaiter().GetResult()
  • await works asynchronously
  • await uses GetAwaiter() under the hood
  • Wait() and GetAwaiter().GetResult() work synchronously
  • Wait() doesn’t change the stacktrace, this results in showing a lot more physical threading details
  • GetAwaiter().GetResult() modifies the stacktrace
  • Wait() wraps exceptions in AggregateException
  • GetAwaiter().GetResult() and await do not wrap the exception
  • exceptions from async void are propagated as unhandled even in a try/catch
  • exceptions from async Task are stored in Task and should be awaited/Wait()/Handle()..
    • when chaining in a parent-child hierarchy (TaskCreationOptions.AttachedToParent) we may miss exceptions, even in the AggregateException
    • if there is an unobserved exception, it is raised by finalizer thread in UnobservedTaskException event where it can be cleared. If not cleared, the process dies (.NET 4) or the exception is suppressed (.NET 4.5)
    • always await tasks, handle all the exceptions
  • always add handlers to unobserved exceptions and unhandled exceptions.
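
For the AsyncLocal<T> vs ThreadLocal<T> lab above, a minimal sketch showing that only the AsyncLocal value flows with the ExecutionContext across an await:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class AsyncLocalVsThreadLocal
    {
        static readonly AsyncLocal<string> AsyncLocalValue = new AsyncLocal<string>();
        static readonly ThreadLocal<string> ThreadLocalValue = new ThreadLocal<string>();

        static async Task Main()
        {
            AsyncLocalValue.Value = "flows with the ExecutionContext";
            ThreadLocalValue.Value = "stays in the original thread's TLS";

            // The continuation typically resumes on a different thread pool thread.
            await Task.Delay(100).ConfigureAwait(false);

            Console.WriteLine(AsyncLocalValue.Value ?? "<null>");  // still set - captured and restored by ExecutionContext
            Console.WriteLine(ThreadLocalValue.Value ?? "<null>"); // most likely <null> - different thread, different TLS slot
        }
    }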

.Net Dev Days Day 2

  • underneath the .NET/IL exception mechanism sits the SEH/VEH mechanism; you can register your own exception handlers in C++ via AddVectoredExceptionHandler/AddVectoredContinueHandler
  • use ExceptionDispatchInfo.Capture(e).Throw(); to rethrow while preserving detail like the line number of the method that threw the exception
  • uncatchable: StackOverflowException (only via a hack with a VEH handler and P/Invoke), ThreadAbortException (can be caught but not swallowed - it is re-raised), AccessViolationException (some can be caught with an attribute, see next bullet), SEHException, OutOfMemoryException
  • [HandleProcessCorruptedStateExceptionsAttribute] (and enable legacyCorruptedStateExceptionsPolicy in .NET Core) does catch some
  • an unhandled exception that happens in a thread pool task is held until awaited and then propagated if possible (thrown out of band for async void)
  • catching unhandled exception with AppDomain.CurrentDomain.UnhandledException doesn’t stop the application from terminating.
  • a constrained execution region (CER) is an area of code in which the CLR is constrained from throwing out-of-band exceptions that would prevent the code from executing in its entirety
  • in .NET catch(...) when(...) is not just syntactic sugar, it's connected with the two-pass SEH filtering
  • mapping SEH to .NET exceptions: OutOfMemoryException; AccessViolationException when the r/w happens outside JIT-compiled code, or inside it but the address is outside the null-pointer partition; NullReferenceException when the r/w is inside JIT-compiled code and the address is inside the null-pointer partition; everything else becomes SEHException
  • .NET does not check whether a reference is null - it just tries to r/w, the MMU/CPU raises an access violation, .NET catches it and checks whether the address was in the null-pointer partition
  • an async method as such runs neither on a dedicated thread nor on a thread pool thread
  • async is for scalability
  • async ops are performed by the hardware (then I/O completion port)
  • always distinguish between CPU and I/O bound operations
  • without await, a long-running/background processing operation will block a thread pool thread
  • for async do not use TaskCreationOptions.LongRunning because it will create a dedicated thread and the first await will abandon it (the continuation resumes on the thread pool)
  • await task != task.Wait(): the former jumps back as soon as the operation is completed (continuations and coroutines), the latter blocks because it waits for completion
  • async all the way up - you shouldn’t mix synchronous and asynchronous code without carefully considering the consequences
  • this will block the thread that enters and no effort has been made to prevent a present SynchronizationContext from becoming deadlocked:
    public string DoOperationBlocking() {
      return DoAsyncOperation().Result; // .GetAwaiter().GetResult();
    }
  • this will block the thread that enters, and DoAsyncOperation() will be scheduled on the default task scheduler, thus removing the risk of deadlocking:
    public string DoOperationBlocking() {
      return Task.Run( () => DoAsyncOperation() ).GetAwaiter().GetResult();
    }
  • this will block both the thread that enters and a thread pool thread inside; in case of an exception we will get an AggregateException inside an AggregateException:
    public string DoOperationBlocking() {
      return Task.Run( () => DoAsyncOperation().Result ).Result;
    }
  • prefer await over ContinueWith - ContinueWith predates async/await and ignores the SynchronizationContext:
    public Task<int> DoSomethingAsync() {
      return CallDependencyAsync().ContinueWith( t => t.Result + 1 ); // prefer: var r = await CallDependencyAsync(); return r + 1;
    }
  • TODO lab: setup web api with: thread sleep vs task.delay, sync, sync over async, async over sync, async
  • TaskCompletionSource is very dangerous, Try/Set(Result/Exception/Canceled) runs inline, leads to re-entrancy, deadlocks, thread pool starvation and broken state, TaskCreationOptions.RunContinuationsAsynchronously may help
  • async flush for Stream/StreamWriter
    using var sw = new StreamWriter(s);
    await sw.WriteAsync(...); // uses tmp buffer, won't flush original buffer
    await sw.FlushAsync(); // flush original buffer
  • check ValueTask<T> which, being a struct, can avoid allocating a Task on the heap
  • Timer callbacks are a little bit different because of the TimerQueue. Do not pass an async void method to new Timer(HERE, null, 1000, 1000)
  • avoid implicit async void: public static void FireAndForget(Action action); add overload for async like: public static void FireAndForget(Func<Task> action);
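
A sketch of the FireAndForget overloads from the last bullet (the implementation is my own guess at the idea; the point is that the Func<Task> overload lets exceptions be observed instead of escaping from an implicit async void):

    using System;
    using System.Threading.Tasks;

    public static class BackgroundWork
    {
        public static void FireAndForget(Action action)
            => FireAndForget(() => { action(); return Task.CompletedTask; });

        public static void FireAndForget(Func<Task> action)
        {
            _ = Task.Run(async () =>
            {
                try { await action(); }
                catch (Exception ex)
                {
                    // Log and swallow - otherwise the fault would stay unobserved
                    // (or, with async void, become an unhandled exception).
                    Console.Error.WriteLine(ex);
                }
            });
        }
    }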

Debugging Asynchronous Scenarios in .NET - Christophe Nasarre, Kevin Gosse

decompiled code

  • in dev, troubleshoot async issues (deadlocks, thread pool starvation) using VS's Parallel Stacks window
  • in prod, take a memory snapshot (procdump -ma <pid>) and check which foreground thread is still running, look for the _agent state
    • load via VS Enterprise
    • thread call stacks do not give the full picture, so try diagnosing blocked tasks (windbg > sosex!refs) and follow reverse references chain
  • ThreadPool is starved when:
    • there is no deadlock but everything is blocked
    • 0%CPU and thread count is increasing
  • waiting synchronously on a Task is dangerous
  • ThreadPool scheduling is unfair, especially when using TaskCreationOptions.LongRunning
  • arr.Skip(size/2).Take(size/4).ToArray() is the slowest; Array.Copy(arr, size/2, newArr, 0, size/4) is ~7x faster; arr.AsSpan().Slice(size/2, size/4) is orders of magnitude faster, since it does not copy (see the sketch after this list)
  • Span<T> (from System.Memory) is a value type that provides an r/w view onto a contiguous region of memory: heap (Array, String), stack (via stackalloc) and native
    • ReadOnlySpan<T>
    • cannot be: boxed, a field of a regular (non-ref) struct, used as an argument or local variable inside async methods, captured by a lambda, used as a generic type argument
  • Memory<T> (from System.Memory) can live on the heap so the same limitations do not apply
    • is a readonly struct but not a ref struct
    • slightly slower to slice into it
  • using Span<T>/Memory<T> not only speeds up the app but also reduces allocations significantly, thus reducing Gen 0 collections
  • System.Buffers.ArrayPool<T> is a pool of arrays for reuse
    • is likely to return an array larger than requested and not cleared
    • var pool = ArrayPool<T>.Shared + var buff = pool.Rent(...) + try/finally + pool.Return(buff)
  • when focusing on performance:
    • measure, don't assume
    • be scientific: make small changes each time and measure again
    • focus on hot paths
    • don't copy memory, slice it with Span<T>
    • use ArrayPools where appropriate to reduce array allocations
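
A sketch of the slicing and pooling patterns above (SumSecondQuarter/UsePool are made-up names):

    using System;
    using System.Buffers;

    static class BufferTricks
    {
        // Slice without copying - the Span points into the original array.
        public static int SumSecondQuarter(int[] arr)
        {
            ReadOnlySpan<int> slice = arr.AsSpan().Slice(arr.Length / 2, arr.Length / 4);

            int sum = 0;
            foreach (var v in slice) sum += v;
            return sum;
        }

        // Rent/return pattern; the pool may hand back a larger, dirty array.
        public static void UsePool(int size)
        {
            int[] buffer = ArrayPool<int>.Shared.Rent(size);
            try
            {
                Span<int> usable = buffer.AsSpan(0, size); // only trust the first `size` elements
                usable.Clear();
                // ... work with `usable` ...
            }
            finally
            {
                ArrayPool<int>.Shared.Return(buffer);
            }
        }
    }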

  • GC has three generations where
    • GEN0 behaves like a stack - allocation is just a pointer bump
    • GEN2 includes the LOH, which contains objects of at least 85 KB
  • new is translated to newobj IL instruction
    • allocates a new instance of the class associated with ctor
    • initializes all the fields in the new instance to 0 (of the proper type) or null references as appropriate
    • calls the ctor with the given arguments along with the newly created instance
    • after the constructor has been called, the now initialized object reference (type O) is pushed on the stack
  • improving GC performance:
    • allocating objects in custom pool doesn’t change GC logic, as soon as there is a reference pointing to them, GC will traverse the object graph
    • use ArrayPool<T>