md2perpe/gist:1886070

billkarwin · 2012-05-09T23:15:08Z

Yes, one could use a floating-point value instead of integers. Works for n-ary trees as well as binary trees. It means you can insert branches and nodes without having to renumber the rest of the tree (at least as long as your floating-point precision lasts). But like integer-based nested sets, it still fails to support referential integrity.

md2perpe · 2012-05-10T06:15:24Z

I wouldn't use a floating-point value. Instead I would use fractions, represented by a pair of integers.

billkarwin · 2012-05-10T15:23:24Z

The fractions scheme doesn't work. The numbers need to be unique to a node.

md2perpe · 2012-05-10T15:44:10Z

How do you mean? Aren't the pairs of fractions unique to a node?

billkarwin · 2012-05-10T20:56:41Z

USE test;
DROP TABLE IF EXISTS broken_ns;
CREATE TABLE broken_ns (
 id CHAR(1),
 ln INT COMMENT 'left numerator',
 ld INT COMMENT 'left denominator',
 rn INT COMMENT 'right numerator',
 rd INT COMMENT 'right denominator'
);

/* Example tree can be represented as: A(B(D,E),C(F,G))
That is, A has children B and C.
B has children D and E.
C has children F and G.  */

INSERT INTO broken_ns (id, ln, ld, rn, rd) VALUES
('A', 0,1, 1,1), -- root
('B', 0,1, 1,2), -- A/B
('C', 1,2, 1,1), -- A/C
('D', 0,1, 1,4), -- A/B/D
('E', 1,4, 1,2), -- A/B/E
('F', 1,2, 3,4), -- A/C/F
('G', 3,4, 1,1); -- A/C/G

SELECT root.id, child.* FROM broken_ns AS root
JOIN broken_ns AS child ON child.ln/child.ld BETWEEN root.ln/root.ld AND root.rn/root.rd
WHERE root.id = 'A';

/*This first query seems to return all descendants of A all right.
+------+------+------+------+------+------+
| id   | id   | ln   | ld   | rn   | rd   |
+------+------+------+------+------+------+
| A    | A    |    0 |    1 |    1 |    1 |
| A    | B    |    0 |    1 |    1 |    2 |
| A    | C    |    1 |    2 |    1 |    1 |
| A    | D    |    0 |    1 |    1 |    4 |
| A    | E    |    1 |    4 |    1 |    2 |
| A    | F    |    1 |    2 |    3 |    4 |
| A    | G    |    3 |    4 |    1 |    1 |
+------+------+------+------+------+------+ */

SELECT root.id, child.* FROM broken_ns AS root
JOIN broken_ns AS child ON child.ln/child.ld BETWEEN root.ln/root.ld AND root.rn/root.rd
WHERE root.id = 'B';

/* But the second query of a subtree incorrectly includes F, a "nephew" of B.
  It also incorrectly includes A, the parent of B.
+------+------+------+------+------+------+
| id   | id   | ln   | ld   | rn   | rd   |
+------+------+------+------+------+------+
| B    | A    |    0 |    1 |    1 |    1 |
| B    | B    |    0 |    1 |    1 |    2 |
| B    | C    |    1 |    2 |    1 |    1 |
| B    | D    |    0 |    1 |    1 |    4 |
| B    | E    |    1 |    4 |    1 |    2 |
| B    | F    |    1 |    2 |    3 |    4 |
+------+------+------+------+------+------+ */

/* Conclusion: in the nested sets design, each number (or fraction in this case) must be used by only one node, so we don't get overlaps. */

md2perpe · 2012-05-10T21:44:35Z

I consider your queries to be broken. A condition on the upper end is also needed.

billkarwin · 2012-05-10T22:47:02Z

Show an example of a query that works, with output.

md2perpe · 2012-05-11T07:30:29Z

/* The child's interval should be contained in the parent's interval */
SELECT root.id, child.* 
FROM broken_ns AS root
JOIN broken_ns AS child 
    ON  child.ln/child.ld >= root.ln/root.ld
    AND child.rn/child.rd <= root.rn/root.rd
WHERE root.id = 'B';

/*
+------+------+------+------+------+------+
| id   | id   | ln   | ld   | rn   | rd   |
+------+------+------+------+------+------+
| B    | B    |    0 |    1 |    1 |    2 |
| B    | D    |    0 |    1 |    1 |    4 |
| B    | E    |    1 |    4 |    1 |    2 |
+------+------+------+------+------+------+
3 rows in set (0.01 sec)
*/

billkarwin · 2012-05-11T17:34:53Z

Very nice!

With the more commonly used design for Nested Sets, it's not necessary to use both left and right; you can just use BETWEEN like I showed.

Note that your design can't use indexes as effectively, both because of the division expressions, and because you use both left and right in inequality comparisons. It will get the right answer, but it won't perform well if the dataset gets large.

md2perpe · 2012-05-11T21:11:43Z

I found this variant when helping another developer who wanted to fetch the left-left-left-... and right-right-right-... paths from a node. He was satisfied with just a loop construct, but I wanted to look for a solution where I can find all nodes in a single select.

Now I realise that it can be done also with "normal" nested sets, but it's not as obvious. And "normal" nested sets has a weakness that you have to update a lot of nodes when inserting a new node. You don't have to do that with my solution.

The difficulties caused by the division expressions can be avoided by multiplying all fractions with a power of two. Instead of letting the root node be identified with [0, 1], it can be identified with for example [0, 256]. The next level will then be [0, 128] and [128, 256]. And so on. You will then get inequalities like child.left >= root.left and child.right <= root.right that can use an index.

	For binary trees I've found a variant of nested sets.

	The idea is to let
	- the root node be represented by the interval [0, 1],
	- the two children of the root node be represented by the intervals [0, 1/2] and [1/2, 1] respectively,
	- the grandchild nodes by [0, 1/4], [1/4, 1/2], [1/2, 3/4] and [3/4, 1].

	Then it's easy to find:
	- the left and right children (one or both),
	- all descendents of a node,
	- all ancestors of a node,
	- all nodes where you only go left (or right) from a given node.

md2perpe/gist:1886070

billkarwin commented May 9, 2012

Uh oh!

md2perpe commented May 10, 2012

Uh oh!

billkarwin commented May 10, 2012

Uh oh!

md2perpe commented May 10, 2012

Uh oh!

billkarwin commented May 10, 2012

Uh oh!

md2perpe commented May 10, 2012

Uh oh!

billkarwin commented May 10, 2012

Uh oh!

md2perpe commented May 11, 2012

Uh oh!

billkarwin commented May 11, 2012

Uh oh!

md2perpe commented May 11, 2012

Uh oh!