## V8 GC internals ### How it used to be V8 used to use a basic form of generational GC, with semi-space scavenging in the young generation, and a mix of mark-sweep/mark-compact in the old generation. Large objects are allocated directly from the OS instead of from the GC managed heap, and is only swept and never compacted by the GC. ### Current status (Ab-)use stack guard for safepoint polling. Stack guard checks are emitted in function prologue and on backedges. `Heap::CollectGarbage()` Entry point to: * minor GC: `Heap::Scavenge()` * global GC: `MarkCompactCollector::CollectGarbage()` #### Triggering a collection An allocation request might have come directly from JIT compiled JavaScript code, where the fast path allocation logic is inlined, and the slow path calls `Runtime::PerformGC()` (generated by `CEntryStub::GenerateCore()`). The retry policy is in C function entry (`CEntryStub::Generate()`). An example of this route is the fast path of creating a new JavaScript object from object literal in JIT'd code: * `HOptimizedGraphBuilder::VisitObjectLiteral()` creats `HAllocate` nodes to allocate new objects * `HAllocate` gets lowered into `LAllocate` * `LAllocate` generates fast path code by `LCodeGen::DoAllocate()` via `MacroAssembler::Allocate()` * when `inline_new` flag is on (on by default on x86), generate inline allocation code in the specified allocation space (doesn't have to be `NEW_SPACE`) * `LAllocate` generates slow path code by `LCodeGen::DoDeferredAllocate()` via `MacroAssembler::CallRuntimeSaveDoubles()` * `MacroAssembler::CallRuntimeSaveDoubles()` creats a `CEntryStub` to do the runtime call, which eventually goes to `Allocate()` in `runtime.cc` for slow path allocation, and let the C entry stub initiate a GC on allocation failure. Also, look at the slow path: * `HOptimizedGraphBuilder::VisitObjectLiteral()` creats a `HCallRuntime` node to call `Runtime_CreateObjectLiteral()` in slow path * `HCallRuntime` gets lowered into `LCallRuntime` * `LCallRuntime` generates code via `MacroAssembler::CallRuntime()` * `MacroAssembler::CallRuntime()` creates a `CEntryStub` to do the runtime call * (*and now we see how runtime calls can potentially trigger GCs*) * (*but actually in the case of object literal handling, there are calls to `Factory::NewJSObjectFromMap()` which triggers GCs in the macro mentioned below, instead of in the `CEntryStub`*) The allocation request could also have come from, for example, `Factory::NewJSObject()` and friends. In the case, it calls `Heap::AllocateJSObject()` wrapped by a `CALL_HEAP_FUNCTION` macro. And it is that macro that iniiates a garbage collection, with `Heap::CollectGarbage()`. Both paths implement a retry policy as: * (*debug build only*) if `gc_greedy` is set, and if neither the bootstrapper of this isolate is active nor allocation failure is allowed, then initiate a minor GC. This is to minimize garbage during execution, for debugging purpose. * (*1st time*) make the heap-related runtime function call (most probably an allocation request) * if the call returns a normal `Object`, return it * else, if the call returns a `Failure` immediate object, whose value is `OutOfMemoryException`, handle the OOM (the default action is to report OOM and exit process, via `FatalProcessOutOfMemory()`) * else, if the call returns a `Failure` immediate object, whose value is `RetryAfterGC`, * (*`Runtime::PerformGC()` only*) try to add a fresh `Page` to new space * if new space expansion succeeds, skip the GC on next line * initiate a GC in the allocation space encoded in this `Failure` with the reason `"allocation failure"`, via `CollectGarbage()` * (*2nd time*) retry the runtime function call again after GC * if the call returns a normal `Object`, return it * else, if the call returns a `Failure` immediate object, whose value is `OutOfMemoryException`, handle the OOM (the default action is to report OOM and exit process, via `FatalProcessOutOfMemory()`) * else, if the call returns a `Failure` immediate object, whose value is `RetryAfterGC`, * (*`CALL_AND_RETRY` only*) * increment counter `gc_last_resort_from_handles` * initiate a global GC with reason `"last resort gc"`, via `CollectAllAvailableGarbage()` * (*`Runtime::PerformGC()` only*) * increment counter `gc_last_resort_from_js` * initiate a global GC with reason `"Runtime::PerformGC"`, via `CollectAllGarbage()` * enter a `AlwaysAllocateScope` * (*3rd time*) retry the runtime function call after global GC * leave the `AlwaysAllocateScope` * if the call returns a normal `Object`, return it * else, if the call returns a `Failure` immediate object, whose value is `OutOfMemoryException`, handle the OOM (the default action is to report OOM and exit process, via `FatalProcessOutOfMemory()`) * else, if the call returns a `Failure` immediate object, whose value is `RetryAfterGC`, treat it as an OOM and handle it (report OOM and exit process, via `FatalProcessOutOfMemory()`) * otherwise, return an empty value. * otherwise, return an empty value. * if it gets here, return an empty value. `Heap::CollectGarbage(AllocationSpace space, const char* gc_reason)` is the main entrance to GC. It chooses the collector with `Heap::SelectGarbageCollector()`, and passes that to another overload of `CollectGarbage()`. `Heap::SelectGarbageCollector()` chooses the `SCAVENGER` collector as default, and the `MARK_COMPACT` collector in the following 5 cases: 1. The specified allocation space is one of the old spaces - "GC in old space requested", a global GC is requested + Global GC forced by flags - "GC in old space forced by flags", either `gc_global` is on, or `stress_compaction` is on and the GC count is odd. + Enough data was promoted in the last minor GC - "promotion limit reached", OldGenerationAllocationLimitReached() is true. This is to avoid promotion failure during a scavenge + Allocation in OLD or LO failed - "old generations exhausted" + Not enough space in OLD to guarantee a scavenge can succeed - "scavenge might not succeed" `Heap::CollectGarbage(AllocationSpace space, GarbageCollector collector, const char* gc_reason, const char* collector_reason)` puts the VM into GC state, and starts the real collection. --- Take allocation for a JSObject for example. In `Heap::AllocateJSObjectFromMap()`, it calls `Heap::SelectSpace()` for the `AllocationSpace` of the new object, which could be one of * `LO_SPACE` - for objects with size too large to fit in a page (`Page::kMaxNonCodeHeapObjectSize`, which is 1MB by default) * `OLD_POINTER_SPACE` - for pretenured objects with embedded object pointers * `OLD_DATA_SPACE` - for pretenured objects without embedded object pointers * `NEW_SPACE` - for normal allocation V8 may choose to do global pretenuring if the `pretenuring` flag is true (true by default) and when there's sustained high survival rate in new space. V8 may get pretenuring support for specific allocation sites in the future, but it's not there yet. Once the allocation space is chosen, `Heap::Allocate()` is called to select a retry space in case allocation fails, and then `Heap::AllocateRaw()` is called dispatch the actual allocation to the specified space. If the allocation space is `NEW_SPACE`, the allocation failed and there is an active `AlwaysAllocateScope`, then it may be retried in the specified retry space. `NewSpace::AllocateRaw()` is a typical bump-pointer style allocation, with a fallback slow path in `NewSpace::SlowAllocateRaw()`. The slow path does one of: * Incremental mark step - the fast path failed because incremental marking artifically lowered the limit in new space; do an incremental mark step and try the allocation with `NewSpace::AllocateRaw()` again * Expansion - add a new page to the new space and try allocating again; incremental mark piggy backs on this, too * Trigger GC - return `Failure::RetryAfterGC()` to indicate a GC should be triggered; the failing allocation space is encoded into the `Failure` immediate object. `PagedSpace::AllocateRaw()` tries lieaner allocation first, and fallback to trying free list or moving to the next `Page`. The fast path logic is basically the same as the fast path in `NewSpace::AllocateRaw()`. The same logic is inlined in the JIT'd code. #### Minor GC Stop-the-world scavenging. Time is proportional to the size of live objects in young generation, and also affected by fragmentation in the old generation when promoting objects. The copying logic is implemented in `EvacuateObject()`; whether or not an object should be promoted is controlled by `Heap::ShouldBePromoted()` ```c++ bool Heap::ShouldBePromoted(Address old_address, int object_size) { // An object should be promoted if: // - the object has survived a scavenge operation or // - to space is already 25% full. NewSpacePage* page = NewSpacePage::FromAddress(old_address); Address age_mark = new_space_.age_mark(); bool below_mark = page->IsFlagSet(MemoryChunk::NEW_SPACE_BELOW_AGE_MARK) && (!page->ContainsLimit(age_mark) || old_address < age_mark); return below_mark || (new_space_.Size() + object_size) >= (new_space_.EffectiveCapacity() >> 2); } ``` #### Allocation site tracking When `track_allocation_sites` flag is turned on (on by default), V8 could install allocation memento to track allocations, so as to reduce transitions. ## Tips on GC tuning ### Turn on GC logging with `--trace-gc` and `--trace-gc-verbose` Use `--trace-gc-nvp` if you want to automate analyzing the GC logs. Use `--trace-gc-ignore-scavenger` if you want to focus on major GC events only, and ignore minor GC (scavenging) events. Use `--trace-external-memory` for external memory allocation tracing. ### GC heap sizing with `--max-new-space-size` and `--max-old-space-size` `--max-new-space-size` is in KB, where as `--max-old-space-size` is in MB. (figure: Generational GC heap layout) ### Tune GC algorithms Basic serial mark-sweep Incremental marking Parallel marking Lazy sweeping Parallel/Concurrent sweeping ### Turn off idle notification with `--nouse-idle-notification` ### Be cautious when using `--expose-gc` Using the `--expose-gc` flag in Node will expose a `gc()` function to JavaScript code, which could be used to force initiate a major GC. Implemented by `GCExtension::GC()` in `gc-extension.cc`. There's also a `--expose-gc-as` where you can specify the name of the `gc()` function. This function takes a boolean argument, it'll invoke a minor GC when the argument is true, otherwise invoke a major GC. Demonstrating in a d8 shell (using VM flags: `--trace_gc --expose_gc`): ``` d8> gc() [8368] 4956 ms: Mark-sweep 1.6 (37.0) -> 1.4 (37.0) MB, 71.7 ms [gc extension] [GC in old space requested]. undefined d8> gc(true) [8368] 11372 ms: Scavenge 1.4 (37.0) -> 1.4 (37.0) MB, 3.6 ms [gc extension]. undefined ``` But turning this option on will cause some unexpected side effects in current implementation of V8. It effectively cancels incremental marking and lazy/parallel/concurrent/precise sweeps, opting to a stop-the-world mark and conservative serial sweep. There was a change to allow incremental marking when `expose_gc` is turned on (in [r13072](https://codereview.chromium.org/11299154)), but was [later reverted](https://codereview.chromium.org/11316268/). ## Quotes ### Node.js FAQ > #### What is the memory limit on a node process? > > Currently, by default v8 has a memory limit of 512mb on 32-bit systems, and 1gb on 64-bit systems. The limit can be raised by setting --max-old-space-size to a maximum of ~1gb (32-bit) and ~1.7gb (64-bit), but it is recommended that you split your single process into several workers if you are hitting memory limits. ## References 1. [Garbage Collection](http://cs.au.dk/~jmi/VM/GC.pdf), Vyacheslav Egorov, 2012-02-28 + [A game changer for interactive performance.](http://blog.chromium.org/2011/11/game-changer-for-interactive.html), Chromium Blog, 2011-11-21 + [Escape the 1.4GB V8 heap limit in Node.js!](http://blog.caustik.com/2012/04/11/escape-the-1-4gb-v8-heap-limit-in-node-js/), Aaron "Caustik" Robinson, 2012-04-11 + [Node.js FAQ](https://github.com/joyent/node/wiki/FAQ) + [V8 javascript VM and Node.js memory management options](http://code.osnap.us/wp/?p=21) + [Node.js Memory Leaks – Monitoring, Troubleshooting, Tips](http://code.osnap.us/wp/?p=43) + [ガベージコレクションのアルゴリズムと実装](http://book.douban.com/subject/4881935/) + [Bug 634503 - Investigate V8's GC](https://bugzilla.mozilla.org/show_bug.cgi?id=634503) + [Garbage collection - SpiderMonkey Internals](https://developer.mozilla.org/en-US/docs/SpiderMonkey/Internals/Garbage_collection), Mozilla Developer Network ## Relevant Changesets 1. [Remove parallel marking support.](https://codereview.chromium.org/25260003) r17014 + [Added parallel marking threads.](https://codereview.chromium.org/12047044) r13569 + [Support compaction for code space pages.](http://codereview.chromium.org/7834018) r9155 2011-09-06 + [Split single slots buffer into per page slots buffers.](http://codereview.chromium.org/7326012) r8582 2011-07-08 + [Support slots recording for compaction during incremental marking.](http://codereview.chromium.org/7302003) r8559 2011-07-07 + [Simple non-incremental compaction by evacuation.](http://codereview.chromium.org/7189066) r8350 2011-06-21 + [Unify markbits for old and new spaces.](http://codereview.chromium.org/7032005) r7911 2011-05-17 + [Introduce lazy sweeping.](http://codereview.chromium.org/6970004) r7836 2011-05-10 + [Make the marking stack into a deque](http://codereview.chromium.org/6928010) r7791 2011-05-05 + [Get rid of distinction between below- and above-watermark in page allocation.](http://codereview.chromium.org/6639024) r7247 2011-03-18 Now all allocation is in linear areas taken from the free list. + [Basic implementation of incremental marking.](http://codereview.chromium.org/6542047) r6916 2011-02-23 + [Start using store buffers. Handle store buffer overflow situation.](http://codereview.chromium.org/6250076) r6616 2011-02-03 this change removed `SweepLargeObjectSpace()` + [Introduce conservative sweeping.](http://codereview.chromium.org/6321008) r6406 2011-01-20 + [Start to implement new write barrier.](http://codereview.chromium.org/5889001) r6021 2010-12-15 + [Introduce conservative sweeping.](http://codereview.chromium.org/6321008) r6406 2011-01-20 + [Separate markbits from heap object map words into bitmaps.](http://codereview.chromium.org/6088012) r6230 2011-01-07 degraded marking and sweeping perf but is intended