Skip to content

Instantly share code, notes, and snippets.

@monadplus
Last active June 12, 2019 09:48
Show Gist options
  • Save monadplus/f15aee8f54363805d797af40140e9d80 to your computer and use it in GitHub Desktop.
Save monadplus/f15aee8f54363805d797af40140e9d80 to your computer and use it in GitHub Desktop.
Functional Data Structures

Exhaustive list of data structures here.

import cats.collection._

List of data structures: BitSet, Dequeue, Diet, Discrete, DisjointSets, Heap, ISet, AvlMap, PairingHeap, PartiallyOrderedSet, Range, AvlSet, TreeList

BitSet

A Bitset is a specialized type of set that tracks the Int values it contains: for each integer value, a BitSet uses a single bit to track whether the value is present (1) or absent (0). Bitsets are often sparse, since "missing" bits can be assumed to be zero.

Unlike scala's default immutable this BitSet does not do a full copy on each added value.

Interally the implementation is a tree. Each leaf uses an Array[Long] value to hold up to 2048 bits, and each branch uses an Array[BitSet] to hold up to 32 subtrees (null subtrees are treated as empty).

Bitset treats the values it stores as 32-bit unsigned values, which is relevant to the internal addressing methods as well as the order used by iterator.

The benchmarks suggest this bitset is MUCH faster than Scala's built-in bitset for cases where you may need many modifications and merges, (for example in a BloomFilter).

Binary heap (priority queues)

Binary tree based on Vladimir Kostyukov. The key stored in each node is either >= or <= to the keys in the node's children according to some total order.

Operations: add, remove (the min value), min, toList.

Dequeue

Double-ended queue. Offers a constant time cons, snoc and amortized constant time uncons and unsnoc'.

Operations: cons(add on left), snoc(add on right), uncons, unsnoc.

Diet (Discrete Interval Encoding Tree)

Diet is a binary searh tree where each node contains a set of continuous values. The stored type must have a total order.

Best case scenario: no holes in the stored set, so the tree contains only one node (a single interval). In this case, inserting, deleting, finding O(1).

Worst case scenario: hole of size one between each node and each node in fact a set on size one. The operations require O(n).

Operations: add, addRange, remove, removeRange, contains, constainsRange, (-) remove range, (&) intersection, (|) and (++) union, min, max, intervals, toList.

Haskell implementation here

Discrete (type class)

succ(x): successor value of x
pred(x): predecessor value of x
adj(i, j): succ(i) == j

Disjoin Sets

An Union-Find structure is a set of sets where the intersection of any two sets is empty.

DS = {S0, S1, ..., Sn} ∀i,j ∈ {0,1,...,n}, Sij = ∅

Fast unions ( average O(1)). Perfect tool for clustering algorithm such as calculating the connected components in a graph.

Operations: add, union (fusion two disjoin sets), find, toSets.

Each element starts as a disjoint set. A set with two or more elements is always the result of one or several union operations. A multi-element set is composed of set of just one element called components. The components are linked.

Forest of tree-sets.

Reduce the depth of each tree (rank) representing a set (the lower the tree height, the fewer operations are needed to find one element). Two heuristics:

  • Union by rank: avoid incrementing maxh
  • Path compression

Set (Extensional Set)

Set is a tree-based set which stores Orderable elements in AVL balanced binary tree.

ISet (Intensional Set)

Membership in a ISet is defined by a predicate function that returns true for any value which is member of the set.

Range

Range [x, y] that can be generate by using discrete operations Discrete.

Operations: start, end, toList, contains(value), contains(range), reverse, (-) difference with Range, toString.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment