find
insert
delete
- Each has time complexity proportional to depth of the tree.
- Ideally tree is
O(log n)
depth. That is average case, but worst isO(n)
. Will discuss later.
- All values on the left side of a vertex are less than the vertex's value.
- All values on the right side of the vertex are greater than the vertex's value.
- This rule applies to all descendants, not just children.
@value
@left
@right
@parent
- Sometimes convenient, but more often unnecessary.
- Often don't need because of recursion.
- Disallows "persistent" trees (more later).
- Defined recursively.
- Pretty much all tree algorithms are recursive.
- Typically should be able to handle
nil
as a vertex.
- If vertex is
nil
, returnnil
(not found). - Else, compare
value
tovertex.value
. - If matches, return
vertex
. - If
value
is less, move left. Else move right.
- I would like to avoid mutating
@left
or@right
. I would like myVertex
class to be immutable. This can be really nice. - This is the default for languages like Haskell.
- Data structures which are never mutated in-place are called persistent.
- One advantage of persistent data structures is for concurrency.
- Regardless it's often harder to make mistakes with persistent structures.
- But they are often practically slower than in-place approaches.
insert
will always return a new root tree node for a new tree with the value inserted.- If
vertex
isnil
, make a new tree node with the@value
set and@left = @right = nil
. Return the new node. - Else, if
vertex.value > value
, then recurse onvertex.left
. - When the recursive call completes and returns a new left child
vertex, create a new parent vertex with the
@value = vertex.value
,@left = new_left_vertex
, and@right = vertex.right
- This is called path-copying.
- Same idea. If
vertex.nil?
returnnil
. - If the
vertex.value > value
, recurse left (else right). - After, create a new vertex with the returned new left (right) child.
- If
vertex.value == value
AND has no children, returnnil
. - Else, randomly choose left (or right). Find max (or min value) in
that subtree. Call this
max_value
(min_value
). - Recursively call
delete(vertex.left, max_value)
. - Make a new node here with
@value = max_value
, and@left
set to new left subtree.
- We know that each level
i
can individually store2 ** (i - 1)
values. - How many values can a tree with
k
levels store overall? - That is, what is
\sum 2**i
fori
from 1 tok
? - I immediately say
2**k - 1
. - What is the largest three digit decimal number? 999. Right?
- The first four digit number
10**4
is the first that can not be stored in three digits.
- If you use self-balancing, is worst case
O(log n)
. - No amortization from resizing. More responsive.
- Doesn't rely on hashing function.
- BTW, hashing function really is
O(log n)
if the hash looks at all the bits of the value. - Persistent.
- Most of all: in-order traversal and range queries.
- If
vertex.nil?
return[]
. - If
min < vertex.value
, recurse on the left. - If
vertex.value < max
, add invertex.value
. - if
vertex.value < max
, recurse on the right. O(n)
.
- Keep filesystem in a BST, list all subfiles.
- Keep a dictionary in a BST, return all the words that start with the letter "b".
- Keep Google Calendar events sorted by start-time.
- Let's say Amazon wants to find the products you and I have both viewed in the past. We've both looked at many many items, though our intersection of items may be either large or small.
- Simplest way: my
m
items are in an unsorted array, yourn
items are in an unsorted array. We do anO(m * n)
nested-loops join. - Better: if our
m
items are sorted byitem_id
, we can do a single pass through each of the arrays.O(m + n)
time. - We want to interactively be able to insert/delete. Anywhere we want an interactive sorted array, think BST.
- Sort-Merge is great if data is already sorted. Otherwise, sorting is
O(n log n)
. - Faster to just build a hash table from one list and do lookup from the other.
- That hits a wall if the relations to merge don't fit in memory.
- Self-Balancing Trees: AVL and Red-Black Trees
- B-Tree: a disk-friendly, self-balancing tree used for databases and file-systems.
- Trie (pronounced "try"): stores prefixes.