-
-
Save md2perpe/1886070 to your computer and use it in GitHub Desktop.
For binary trees I've found a variant of nested sets. | |
The idea is to let | |
- the root node be represented by the interval [0, 1], | |
- the two children of the root node be represented by the intervals [0, 1/2] and [1/2, 1] respectively, | |
- the grandchild nodes by [0, 1/4], [1/4, 1/2], [1/2, 3/4] and [3/4, 1]. | |
Then it's easy to find: | |
- the left and right children (one or both), | |
- all descendents of a node, | |
- all ancestors of a node, | |
- all nodes where you only go left (or right) from a given node. |
I wouldn't use a floating-point value. Instead I would use fractions, represented by a pair of integers.
The fractions scheme doesn't work. The numbers need to be unique to a node.
How do you mean? Aren't the pairs of fractions unique to a node?
USE test;
DROP TABLE IF EXISTS broken_ns;
CREATE TABLE broken_ns (
id CHAR(1),
ln INT COMMENT 'left numerator',
ld INT COMMENT 'left denominator',
rn INT COMMENT 'right numerator',
rd INT COMMENT 'right denominator'
);
/* Example tree can be represented as: A(B(D,E),C(F,G))
That is, A has children B and C.
B has children D and E.
C has children F and G. */
INSERT INTO broken_ns (id, ln, ld, rn, rd) VALUES
('A', 0,1, 1,1), -- root
('B', 0,1, 1,2), -- A/B
('C', 1,2, 1,1), -- A/C
('D', 0,1, 1,4), -- A/B/D
('E', 1,4, 1,2), -- A/B/E
('F', 1,2, 3,4), -- A/C/F
('G', 3,4, 1,1); -- A/C/G
SELECT root.id, child.* FROM broken_ns AS root
JOIN broken_ns AS child ON child.ln/child.ld BETWEEN root.ln/root.ld AND root.rn/root.rd
WHERE root.id = 'A';
/*This first query seems to return all descendants of A all right.
+------+------+------+------+------+------+
| id | id | ln | ld | rn | rd |
+------+------+------+------+------+------+
| A | A | 0 | 1 | 1 | 1 |
| A | B | 0 | 1 | 1 | 2 |
| A | C | 1 | 2 | 1 | 1 |
| A | D | 0 | 1 | 1 | 4 |
| A | E | 1 | 4 | 1 | 2 |
| A | F | 1 | 2 | 3 | 4 |
| A | G | 3 | 4 | 1 | 1 |
+------+------+------+------+------+------+ */
SELECT root.id, child.* FROM broken_ns AS root
JOIN broken_ns AS child ON child.ln/child.ld BETWEEN root.ln/root.ld AND root.rn/root.rd
WHERE root.id = 'B';
/* But the second query of a subtree incorrectly includes F, a "nephew" of B.
It also incorrectly includes A, the parent of B.
+------+------+------+------+------+------+
| id | id | ln | ld | rn | rd |
+------+------+------+------+------+------+
| B | A | 0 | 1 | 1 | 1 |
| B | B | 0 | 1 | 1 | 2 |
| B | C | 1 | 2 | 1 | 1 |
| B | D | 0 | 1 | 1 | 4 |
| B | E | 1 | 4 | 1 | 2 |
| B | F | 1 | 2 | 3 | 4 |
+------+------+------+------+------+------+ */
/* Conclusion: in the nested sets design, each number (or fraction in this case) must be used by only one node, so we don't get overlaps. */
I consider your queries to be broken. A condition on the upper end is also needed.
Show an example of a query that works, with output.
/* The child's interval should be contained in the parent's interval */
SELECT root.id, child.*
FROM broken_ns AS root
JOIN broken_ns AS child
ON child.ln/child.ld >= root.ln/root.ld
AND child.rn/child.rd <= root.rn/root.rd
WHERE root.id = 'B';
/*
+------+------+------+------+------+------+
| id | id | ln | ld | rn | rd |
+------+------+------+------+------+------+
| B | B | 0 | 1 | 1 | 2 |
| B | D | 0 | 1 | 1 | 4 |
| B | E | 1 | 4 | 1 | 2 |
+------+------+------+------+------+------+
3 rows in set (0.01 sec)
*/
Very nice!
With the more commonly used design for Nested Sets, it's not necessary to use both left and right; you can just use BETWEEN like I showed.
Note that your design can't use indexes as effectively, both because of the division expressions, and because you use both left and right in inequality comparisons. It will get the right answer, but it won't perform well if the dataset gets large.
I found this variant when helping another developer who wanted to fetch the left-left-left-... and right-right-right-... paths from a node. He was satisfied with just a loop construct, but I wanted to look for a solution where I can find all nodes in a single select.
Now I realise that it can be done also with "normal" nested sets, but it's not as obvious. And "normal" nested sets has a weakness that you have to update a lot of nodes when inserting a new node. You don't have to do that with my solution.
The difficulties caused by the division expressions can be avoided by multiplying all fractions with a power of two. Instead of letting the root node be identified with [0, 1], it can be identified with for example [0, 256]. The next level will then be [0, 128] and [128, 256]. And so on. You will then get inequalities like child.left >= root.left
and child.right <= root.right
that can use an index.
Yes, one could use a floating-point value instead of integers. Works for n-ary trees as well as binary trees. It means you can insert branches and nodes without having to renumber the rest of the tree (at least as long as your floating-point precision lasts). But like integer-based nested sets, it still fails to support referential integrity.