The essential idea behind a high-level-graph is this: it's a lazy mapping which can produce low-level Dask task graphs on demand. Until these low-level tasks are produced (called "materialization"), they are a couple of advantages:
- They allow for higher level reasoning about graph structure, including optimizations that would be challenging or impossible once the graph is represented by many low-level tasks.
- They can be used to produce only the necessary keys for a full computation. That is, later operations like slicing can feed back into previous HLG layers and allow them to not produce tasks which won't be needed (called HLG culling). This can be a significant time and memory saving process.
- They can be much cheaper to serialize and communicate than low level task graphs.
However, HLG Layers have proven difficult to write. Broadly speaking, these difficulties have been for two reasons: algorithmic (specifically regarding culling) and serializability.