Skip to content

Instantly share code, notes, and snippets.

@rapiz1
Last active August 26, 2020 03:22
Show Gist options
  • Save rapiz1/b480f0621cd93a0313773a8aa452e345 to your computer and use it in GitHub Desktop.
Save rapiz1/b480f0621cd93a0313773a8aa452e345 to your computer and use it in GitHub Desktop.
GSoC 2020 - Chapel - Notes on Data Structure

DS Libraries in other languages

Rust

std::collections

C++

Boost.Intrusive

Boost DS

STL Container

Java

Appache Common Collections

Go

src/containers

Proposed DS for Chapel

Heap

Heap, a.k.a. priority_queue, is a very common data structure in most languages.

In other langauges

LANG LIB
C++ std::priority_queue, Boost.Heap
Python Lib heapq.py
Rust binary_heap
Go /src/container/heap

Use Case

Dijkstra's algorithm for finding the shortest path in weighted graph

Interface

chapel-lang/chapel#15667

SearchTree.Treap / TreeMap

A randomlized serach tree. Fast in practice.

In other langauges

LANG LIB
C++ Boost.Intrusive.treap

vs other balanced tree

chapel-lang/chapel#15658 (comment)

Interface

Same as Skip List

Another record, TreapMap, same as Map but with Treap as the underlying structure

SkipList / SkipListMap

Red-Black Tree alternative.

Used in many popular projects like LevelDB, Reddis, Bigtable

vs Red-Black Tree

SkipList Red-Black Tree
Larger constant factor, but fast in practice
Easy to implement Very difficut to implement
Could be lock-free Need a lock
With expected time complexity, not stable With upper bound complexity

And Skip List will on expectation use less space than all balanced trees.

In practice, it's fairly fast

So between Skip List and Treap, it's a time-space trade off.

Interface

  • eltType
  • record comparator
  • size
  • insert(x: eltType): insert an element.
  • contains(x: eltType): bool: return if x is in the list
  • lower_bound(x: eltType, out result: eltType): bool same as std::lower_bound(), return false if no occurence.
  • upper_bound(x: eltType, out result: eltType): bool same as std::upper_bound(), return false if no occurence.
  • remove(x: eltType): bool: remove x in the list. Does nothing if not present
  • these(): iterate over the list
  • this(idx: int): return k-th element in the sorted sequence, throw an error if out of range
  • kth(idx: int): return k-th element in the sorted sequence, throw an error if out of range
  • toArray: convert to an array
  • clear()
  • predecessor(x: eltType, out result: eltType): bool return the predecessor of one element, return false if it's the first one.
  • successor(x: eltType, out result: eltType): bool return the successor of one element, return false if it's the last one.

SkipListMap same as Map but with SkipList as the underlying structure

Vector

A vector-based list is a list that store elements in continuous area. Vector can improve cache performane and have smaller constant factor.

Whether it's faster than list depends on the operation pattern and ratio. Vector is faster at accessing but slower at inserting than List.

With properly reserve, Vector can be much faster than List.

In other langauges

LANG LIB
C++ std::vector
Rust Vec

vs List.chpl

Vector List.chpl
Better cache performance
Friendly for compiler optimization and prefetch
Expensive copy penalty for appending No copy penalty for appending
No memory overhead Little memory overhead
Fast random access Slower random access since calling log2

Note

After digging into the source of List.chpl, I don't think it's a good idea to put them together because they have a very different logic.

However, an identical interface is possible.

Interface

same as List

Unrolled Linked List

No sight of it in other langauges or popular libraries.

The use case would be on a machine whose memory is small and cahce miss is expensive. Users can trade off between time and space by specifying the number of elements in one node.

vs Linked List

Unrolled Normal
Less memory overhead
Slower insertion and deletion in the same node
Better cache performance

Notes

I think it could be more useful if support slicing/linking nodes.

Not proposed

Interval Tree

No sight of it in other languages or popular libraries.

As for the use case, I have saw one of its variants used in algorithm contest, but haven't saw it elsewhere. Maybe users would like to have more control over it when solving a algorithm problem. So I suppose it should be implemented by users themselves.

I wonder are there other use cases and is Interval Tree module necessary?

Bouns

I would like to get back to this if I have spare time after completing all above.

Trie

In other langauges

LANG LIB
Java org.apache.commons.collections4.trie

Red-Black Tree

A search tree. Similar to Treap.

CircularBuffer

A queue based on an fixed length array.

In other languages

LANG LIB
C++ Boost.CircularBuffer

Use case

  • Need a queue, whose maximal length is known.
  • Need random access

vs Linked List

Buffer LinkedList
Fixed space. Can't resize Dynamic space
Faster
Better cache performance
Random access Sequential access
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment