Initial observations about FlatGroupByHash memory usage, compared to the
previous MultiChannelGroupByHash that was removed in Trino 427, are
documented in this gist. Those observations prompted a deeper investigation
and a discussion of how FlatHash and the flat layout scheme should work,
which was carried forward into a reimplementation in trino#25127.
- Reduce the flat memory layout width for variable width types from 16 to 4 bytes per entry
- Reduce the expansion factor of `VariableWidthData` from 2x to 1.5x
- Reduce the variable width pointer from 12 to 8 bytes per entry
- Move the fixed size portion of each `FlatHash` entry from the hash table to a dense lookup table
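
To make two of these changes more concrete, here is a rough sketch. It is not Trino's code: the class `FlatLayoutSketch`, its methods, the 4-byte slot index, and the 32-byte record size are all hypothetical and chosen only for illustration. The first method shows how much capacity a buffer claims right after it fills up under a 2x versus a 1.5x expansion factor. The other two compare a hash table that stores each fixed-size record inline in its (partly empty) slots against one that stores only an index per slot and keeps the records in a dense array.

```java
public class FlatLayoutSketch {
    // Hypothetical helper: grow `capacity` by `factor` until it can hold `required` bytes.
    public static long grow(long capacity, long required, double factor) {
        while (capacity < required) {
            // Always make progress, even for very small capacities.
            capacity = Math.max(capacity + 1, (long) (capacity * factor));
        }
        return capacity;
    }

    // Assumed layout A: every hash slot stores the full fixed-size record inline,
    // so empty slots cost as much as occupied ones.
    public static long inlineBytes(long slots, int recordBytes) {
        return slots * recordBytes;
    }

    // Assumed layout B: every hash slot stores a 4-byte index, and the records
    // live in a dense array with exactly one record per group.
    public static long denseBytes(long slots, long groups, int recordBytes) {
        return slots * Integer.BYTES + groups * recordBytes;
    }

    public static void main(String[] args) {
        long full = 1_000_000L;      // illustrative buffer that has just filled up
        long needed = full + 1;      // one more byte is required

        System.out.println("capacity after 2x growth:   " + grow(full, needed, 2.0));
        System.out.println("capacity after 1.5x growth: " + grow(full, needed, 1.5));

        long slots = 1L << 21;       // illustrative power-of-two table size
        long groups = 1_000_000L;    // groups actually stored
        int recordBytes = 32;        // illustrative fixed-size record width

        System.out.println("inline records: " + inlineBytes(slots, recordBytes) + " bytes");
        System.out.println("dense records:  " + denseBytes(slots, groups, recordBytes) + " bytes");
    }
}
```

The numbers this prints depend entirely on the illustrative inputs and say nothing about the real measurements. The point is only that a smaller growth factor leaves less unused capacity after each resize, and that a dense table pays for records only where a group exists.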
The changes are explained in more detail in trino#25127.
For the same example scenario described in the previous gist,
here are the memory consumption charts for the old and new FlatGroupByHash
implementations, both overall and within the first 1M groups.
Note that because the old implementation eagerly claimed the table memory, the data from the first 1M groups produces a fairly sparse graph.
Old FlatGroupByHash - First 1M Groups
Memory Usage at 1M groups: ~140.5MB
New FlatGroupByHash ("Minimal")
New FlatGroupByHash - First 1M Groups
Memory Usage at 1M groups: ~53.7MB
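
As a quick check on the two approximate figures reported at the 1M-group mark (~140.5MB old, ~53.7MB new), the relative saving and the ratio work out to roughly:

$$
\frac{140.5 - 53.7}{140.5} \approx 0.618, \qquad \frac{140.5}{53.7} \approx 2.62
$$

That is, the new implementation uses about 62% less memory at 1M groups in this scenario, or about 2.6x less. Both inputs are approximate readings, so these results are approximate as well.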