streamable heap snapshot system for constrained memory (WASM)
- existing browser heap snapshot infrastructure (i.e. chrome's and firefox's) assumes it is possible to build a full retained graph of the heap in memory, and sometimes requires multiple passes to generate the snapshot file before saving it. this can result in out-of-memory crashes when saving or loading a snapshot. our customers want to take snapshots of very large heaps, so this approach is unsuitable
- these snapshot tools only understand the JS heap, and have no hooks we could use to feed our GC heap layout into them when a snapshot is taken
- existing viewers for these formats are incredibly fragile and basically only work on snapshots that were generated by a recent version of the same browser, so we can't realistically generate snapshots in these formats. if a snapshot fails to load, you barely get any sort of error message
- .net has a couple of answers for this: coreclr gcdump and classic netfx heap snapshots in VS (can we leverage their tooling?)
- i wasn't able to find any format documentation for these, and the in-tree code looked hairy, but one of them may still be the right format to use
- mono has an existing 'heapshot' system built on top of the profiler and sgen instrumentation, but the format is complex and the viewer is unmaintained (untouched for 7+ years)
- we might be able to revive it and make it usable again, but i don't know how much value we'd get out of it. i remember it having limited functionality
- it appears to be necessary to activate it at startup and collect heapshots during execution, instead of activating it on demand with a hotkey. this means multiple heapshots will get buffered up during execution, which is a problem, because...
- unlike on other platforms, writing to the filesystem in WASM causes the entire file to get buffered in the browser tab's memory right next to application data
- this means if you're already close to running out of memory due to your heap being large, the snapshot will actually cause you to run out of memory
- the wasm filesystem implementation is slow and single-threaded, so taking snapshots constantly during execution will make the application run much slower
resulting goals for the snapshot system:
- hotkey or console command can be used in a running application at any point to capture a snapshot of the important information about the heap: object types, sizes, and references
- also include general metadata, like how much memory is in use, how big the large object heap is vs the nursery, how many GCs have happened
- format should be versioned and not encode too many mono-isms, so that we can extend it later and generate it from non-mono .NET (e.g. wasm nativeaot) if necessary
- it should be possible to stream this format without buffering large chunks of it in the native or JS heap
- we can either stream it to a socket (so a helper process can save it to disk) or to a JS worker/iframe with its own heap to avoid running out of memory (the latter would then generate a downloadable file)
- generating it should not require multiple passes or full retained graphs
- instead of trying to generate an existing format like chrome's, generate a simple and well-defined format that we can then translate into what a specific viewer app expects
- though we may end up having to build a viewer or report generator ourselves anyway
- this should enable us to iterate separately on the capture system (in the shipped runtime) and the viewer/reporting infrastructure, so users don't have to update their runtime to get analysis improvements
- must work for multi-gigabyte heaps
- should be able to capture in a reasonable amount of time (i.e. <1 minute)
- should be able to cleanly represent a typical coreclr or mono heap (many roots, lots of objects of various types including boxed value types) and encode type information in a reasonable way without duplication
- encoding a full representation of a type and its fields is a non-goal
- encoding which field referenced which object could be useful, but is nontrivial with the mono_gc_walk_heap API
- storing the actual contents of the heap is a non-goal
- though doing it for specific types, e.g. strings, COULD be very useful for analyzing heaps. the VS heap snapshot system can introspect strings and arrays in this way
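
to make the goals above more concrete, here is a minimal sketch of what a versioned, streamable, single-pass format could look like. it is written in python purely for illustration (the real capture code would live in the runtime), and everything in it is an assumption rather than a proposed final design: the magic bytes, the record tags, the field layouts, and the names `SnapshotWriter`, `read_records` and `summarize` are all hypothetical. the idea it shows is a stream of length-prefixed records pushed through a small buffer into a sink (a socket, or a message to a worker), with each type name written once the first time it is seen.

```python
import io
import struct
from collections import defaultdict

MAGIC, VERSION = b"HSNP", 1                              # hypothetical header values
REC_META, REC_TYPE, REC_OBJECT, REC_ROOT = 1, 2, 3, 4    # hypothetical record tags


class SnapshotWriter:
    """Single-pass writer; holds at most about chunk_size bytes plus one record."""

    def __init__(self, sink, chunk_size=64 * 1024):
        self.sink = sink                # any callable taking bytes, e.g. socket.sendall
        self.chunk_size = chunk_size
        self.type_ids = {}              # type name -> id, so each name is written once
        self.buf = bytearray(MAGIC + struct.pack("<H", VERSION))

    def _record(self, tag, payload):
        # tag + length prefix lets a reader skip record kinds it does not know
        self.buf += struct.pack("<BI", tag, len(payload)) + payload
        if len(self.buf) >= self.chunk_size:
            self.flush()

    def flush(self):
        if self.buf:
            self.sink(bytes(self.buf))
            self.buf.clear()

    def meta(self, key, value):
        name = key.encode("utf-8")
        self._record(REC_META, struct.pack("<H", len(name)) + name + struct.pack("<Q", value))

    def root(self, address):
        self._record(REC_ROOT, struct.pack("<Q", address))

    def obj(self, address, type_name, size, refs=()):
        tid = self.type_ids.get(type_name)
        if tid is None:                 # first sighting: emit the type record inline
            tid = self.type_ids[type_name] = len(self.type_ids)
            self._record(REC_TYPE, struct.pack("<I", tid) + type_name.encode("utf-8"))
        payload = struct.pack("<QIQI", address, tid, size, len(refs))
        self._record(REC_OBJECT, payload + struct.pack(f"<{len(refs)}Q", *refs))


def read_records(stream):
    """Yield (tag, payload) one record at a time, without loading the whole stream."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a snapshot stream")
    (version,) = struct.unpack("<H", stream.read(2))
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    while True:
        head = stream.read(5)
        if len(head) < 5:
            return
        tag, length = struct.unpack("<BI", head)
        yield tag, stream.read(length)


def summarize(stream):
    """Per-type (object count, total bytes); other record kinds are ignored."""
    names, totals = {}, defaultdict(lambda: [0, 0])
    for tag, payload in read_records(stream):
        if tag == REC_TYPE:
            (tid,) = struct.unpack_from("<I", payload)
            names[tid] = payload[4:].decode("utf-8")
        elif tag == REC_OBJECT:
            _addr, tid, size, _nrefs = struct.unpack_from("<QIQI", payload)
            totals[tid][0] += 1
            totals[tid][1] += size
    return {names[tid]: tuple(v) for tid, v in totals.items()}


# tiny demo: a list stands in for the socket/worker sink
chunks = []
writer = SnapshotWriter(chunks.append, chunk_size=64)
writer.meta("gc_count", 12)
writer.root(0x1000)
writer.obj(0x1000, "System.String", 40)
writer.obj(0x2000, "System.Object[]", 32, refs=[0x1000])
writer.obj(0x3000, "System.String", 24)
writer.flush()
report = summarize(io.BytesIO(b"".join(chunks)))
print(report)
```

in this sketch the only writer state that grows is the type table, which scales with the number of distinct types rather than with heap size. the reader side is a stand-in for the separate viewer/report tooling: because it only depends on the record stream, it could be changed or replaced without touching the capture code.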