@pettyjamesm
Created March 7, 2025 15:49
New FlatGroupByHash Memory Usage

Initial observations about FlatGroupByHash memory usage, compared to the previous MultiChannelGroupByHash that was removed in Trino 427, are documented in a previous gist. Those observations prompted a deeper investigation and a discussion of how FlatHash and the flat layout scheme should work, which was carried forward into a reimplementation in trino#25127.

Summary of Changes

  1. Reduce the flat memory layout width for variable width types from 16 to 4 bytes per entry
  2. Reduce the expansion factor of VariableWidthData from 2x to 1.5x
  3. Reduce the variable width pointer from 12 to 8 bytes per entry
  4. Move the fixed size portion of each FlatHash entry from the hash table to a dense lookup table

A more detailed explanation of the changes is given in trino#25127.
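To illustrate why the expansion factor in item 2 matters, here is a minimal sketch of a generic geometric growth policy. It is not the actual VariableWidthData code: the function name `capacity_after_growth`, the 1024-byte starting capacity, and the growth loop are assumptions made purely for illustration. It shows how much unused capacity is left immediately after a buffer grows to fit one byte more than it could hold.

```python
def capacity_after_growth(required_bytes: int, initial_capacity: int, factor: float) -> int:
    """Grow a hypothetical buffer geometrically until it can hold required_bytes."""
    capacity = initial_capacity
    while capacity < required_bytes:
        # Always grow by at least one byte so the loop makes progress.
        capacity = max(capacity + 1, int(capacity * factor))
    return capacity


if __name__ == "__main__":
    required = 1025  # one byte more than the assumed initial capacity
    for factor in (2.0, 1.5):
        capacity = capacity_after_growth(required, 1024, factor)
        print(f"factor={factor}: capacity={capacity}, unused={capacity - required}")
```

Under this simplified model, a buffer that has just grown can be up to half empty with a 2x factor but only up to a third empty with a 1.5x factor, at the cost of resizing more often.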

Example Memory Usage Comparison

For the same example scenario described in the previous gist, here are the memory consumption charts for the old and new FlatGroupByHash implementations, both overall and within the first 1M groups.

Note that since the old implementation eagerly claimed the table memory, the data from the first 1M groups is a fairly sparse graph.

Old FlatGroupByHash (image)

Old FlatGroupByHash - First 1M Groups (image)

Memory Usage at 1M groups: ~140.5MB

New FlatGroupByHash ("Minimal") (image)

New FlatGroupByHash - First 1M Groups (image)

Memory Usage at 1M groups: ~53.7MB
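The two figures at 1M groups can be turned into a relative saving and an approximate per-group cost. The short sketch below does that arithmetic; the function name `compare_usage` is made up for illustration, and it assumes that "MB" means 10^6 bytes, which the charts do not state (the per-group numbers would be about 5% higher if MiB was meant).

```python
def compare_usage(old_mb: float, new_mb: float, groups: int) -> dict:
    """Summarize the difference between two memory measurements at the same group count."""
    bytes_per_mb = 1_000_000  # assumption: decimal megabytes
    return {
        "saved_mb": old_mb - new_mb,
        "reduction_pct": (old_mb - new_mb) / old_mb * 100,
        "ratio": old_mb / new_mb,
        "old_bytes_per_group": old_mb * bytes_per_mb / groups,
        "new_bytes_per_group": new_mb * bytes_per_mb / groups,
    }


if __name__ == "__main__":
    # Approximate values read from the charts at 1M groups.
    summary = compare_usage(140.5, 53.7, 1_000_000)
    for name, value in summary.items():
        print(f"{name}: {value:.1f}")
```

With the approximate chart values, this works out to a reduction of roughly 62% (about 2.6x less memory), or about 140 bytes per group before versus about 54 bytes per group after.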
