Skip to content

Instantly share code, notes, and snippets.

@toff63
Last active December 11, 2015 17:19
Show Gist options
  • Save toff63/4633787 to your computer and use it in GitHub Desktop.
Save toff63/4633787 to your computer and use it in GitHub Desktop.

Hierarchical data structure in database

Definition

Taxonomy: the conception, naming, and classification of organism groups.

Ontology: ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about entities. Ontologies are the structural frameworks for organizing information

Relational Database

The Adjacent List Model

Table Structure

    CREATE TABLE category(
            category_id INT AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(20) NOT NULL,
            parent INT DEFAULT NULL
    );

    +-------------+----------------------+--------+
    | category_id | name                 | parent |
    +-------------+----------------------+--------+
    |           1 | ELECTRONICS          |   NULL |
    |           2 | TELEVISIONS          |      1 |
    |           3 | TUBE                 |      2 |
    |           4 | LCD                  |      2 |
    |           5 | PLASMA               |      2 |
    |           6 | PORTABLE ELECTRONICS |      1 |
    |           7 | MP3 PLAYERS          |      6 |
    |           8 | FLASH                |      7 |
    |           9 | CD PLAYERS           |      6 |
    |          10 | 2 WAY RADIOS         |      6 |
    +-------------+----------------------+--------+

                  ELECTRONICS
                 /          \ 
        TELEVISIONS          PORTABLE ELECTRONICS
        /    |    \                /       |         \
     TUBE  LCD   PLASMA   MP3 PLAYERS    CD PLAYERS   2 WAY RADIOS
                              |
                            FLASH

Query

Retrieving the full Tree:

    SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as levement an4
    FROM category AS t1
    LEFT JOIN category AS t2 ON t2.parent = t1.category_id
    LEFT JOIN category AS t3 ON t3.parent = t2.category_id
    LEFT JOIN category AS t4 ON t4.parent = t3.category_id
    WHERE t1.name = 'ELECTRONICS';

You will notice that you need to know the tree depth

    +-------------+----------------------+--------------+-------+
    | lev1        | lev2                 | lev3         | lev4  |
    +-------------+----------------------+--------------+-------+
    | ELECTRONICS | TELEVISIONS          | TUBE         | NULL  |
    | ELECTRONICS | TELEVISIONS          | LCD          | NULL  |
    | ELECTRONICS | TELEVISIONS          | PLASMA       | NULL  |
    | ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS  | FLASH |
    | ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS   | NULL  |
    | ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL  |
    +-------------+----------------------+--------------+-------+

Limitations

  • Difficult
  • To get the full path, you need to know the level where it resides
  • No consistency check when deleting node => client side code OR stored procedures

The Set Nested Model

Looking at the tree as sets

    +--------------------------------------------------------------------------------------------------+
    |                                      ELECTRONICS                                                 |
    |   +-----------------------------+ +-----------------------------------------------------------+  |
    |   |      TELEVISIONS            | |     PORTABLE ELECTRONICS                                  |  |
    |   | +------+ +-----+ +--------+ | |  +---------------+ +--------------+ +------------------+  |  |
    | 1 |2|3TUBE4| |5LCD6| |7PLASMA8|9| |10|11MP3 PLAYERS14| |15CD PLAYERS16| |(17)2 WAY RADIOS18|19|20|
    |   | +------+ +-----+ +--------+ | |  | +---------+   | +--------------+ +------------------+  |  |
    |   +-----------------------------+ |  | |12FLASH13|   |                                        |  |
    |                                   |  | +---------+   |                                        |  |
    |                                   |  +---------------+                                        |  |
    |                                   +-----------------------------------------------------------+  |
    +--------------------------------------------------------------------------------------------------+

Table Structure

    +-------------+----------------------+-----+-----+
    | category_id | name                 | lft | rgt |
    +-------------+----------------------+-----+-----+
    |           1 | ELECTRONICS          |   1 |  20 |
    |           2 | TELEVISIONS          |   2 |   9 |
    |           3 | TUBE                 |   3 |   4 |
    |           4 | LCD                  |   5 |   6 |
    |           5 | PLASMA               |   7 |   8 |
    |           6 | PORTABLE ELECTRONICS |  10 |  19 |
    |           7 | MP3 PLAYERS          |  11 |  14 |
    |           8 | FLASH                |  12 |  13 |
    |           9 | CD PLAYERS           |  15 |  16 |
    |          10 | 2 WAY RADIOS         |  17 |  18 |
    +-------------+----------------------+-----+-----+

We define bounds of sets using numbers so we can traverse from left to right easily. This approach is called the modified preorder tree traversal algorithm.

Query

Retrieve the whole tree.

SELECT node.name
FROM nested_category AS node,
     nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
  AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;

Conclusion

Nested Set gives a greater flexibility to manipulate the tree and extract information from it. This technic exists has been around for over a decade and is described in many books. However, each time you add a parent node in the tree, you have to compute all the left and right numbers.

PostgreSQL ltree structure

                            Top
                         /   |  \
                 Science Hobbies Collections
                     /       |              \
            Astronomy   Amateurs_Astronomy Pictures
               /  \                            |
    Astrophysics  Cosmology                Astronomy
                                            /  |    \
                                     Galaxies Stars Astronauts

ltree representation

CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING gist(path);
CREATE INDEX path_idx ON test USING btree(path);

inheritance

    ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
                    path
    ------------------------------------
     Top.Science
     Top.Science.Astronomy
     Top.Science.Astronomy.Astrophysics
     Top.Science.Astronomy.Cosmology
    (4 rows)

path matching:

    ltreetest=> SELECT path FROM test WHERE path ~ '*.Astronomy.*';
                         path
    -----------------------------------------------
     Top.Science.Astronomy
     Top.Science.Astronomy.Astrophysics
     Top.Science.Astronomy.Cosmology
     Top.Collections.Pictures.Astronomy
     Top.Collections.Pictures.Astronomy.Stars
     Top.Collections.Pictures.Astronomy.Galaxies
     Top.Collections.Pictures.Astronomy.Astronauts
    (7 rows)

    ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.*.Astronomy.*';
                    path
    ------------------------------------
     Top.Science.Astronomy
     Top.Science.Astronomy.Astrophysics
     Top.Science.Astronomy.Cosmology
    (3 rows)

full text search

    ltreetest=> SELECT path FROM test WHERE path @ 'Astro*% & !pictures@';
                    path
    ------------------------------------
     Top.Science.Astronomy
     Top.Science.Astronomy.Astrophysics
     Top.Science.Astronomy.Cosmology
     Top.Hobbies.Amateurs_Astronomy
    (4 rows)

And a lot of tree related subfunctions. It is very simple to use but the querying is a bit different from what we are used to.

Graph Database: Neo4j

Neo4j let you create an oriented graph with typed relationships.

The category problem

                               ROOT
                                 |
                                 |
                                \|/
                              SUBROOT
                                 |                  
                                 |
                                \|/
                        Name: ELECTRONICS
                                 |
                                 | SUBCATEGORY
                                 |
                                \|/
                        Name: COMPUTERS
                              /    \
                             /      \
                    SUBCATEGORY   SUBCATEGORY
                           /          \
                         |/            \|
                    Name: Desktop    Name: Laptops

Here you have to type relationship. Not sure we have typed relationship for ontology service.

MongoDB

Model tree with child references

db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "Postgres", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "Postgres" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )

To get immediate childre:

db.categories.findOne( { _id: "Databases" } ).children

Find the parent:

db.categories.find( { children: "MongoDB" } )

The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple parents.

Model Tree Structures with Parent References

db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "Postgres", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )

The query to retrieve the parent of a node is fast and straightforward:

db.categories.findOne( { _id: "MongoDB" } ).parent

You can query by the parent field to find its immediate children nodes:

db.categories.find( { parent: "Databases" } )

The Parent Links pattern provides a simple solution to tree storage, but requires multiple queries to retrieve subtrees.

Model Tree Structures with an Array of Ancestors

db.categories.insert( { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Postgres", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Programming", ancestors: [ "Books" ], parent: "Books" } )
db.categories.insert( { _id: "Books", ancestors: [ ], parent: null } )

The query to retrieve the ancestors or path of a node is fast and straightforward:

db.categories.findOne( { _id: "MongoDB" } ).ancestors

You can query by the ancestors to find all its descendants:

db.categories.find( { ancestors: "Programming" } )

The Array of Ancestors pattern provides a fast and efficient solution to find the descendants and the ancestors of a node by creating an index on the elements of the ancestors field. This makes Array of Ancestors a good choice for working with subtrees.

If you modify your array, you have to update all nodes.

References

http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

http://www.postgresql.org/docs/9.0/static/ltree.html

http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html

http://docs.mongodb.org/manual/tutorial/model-tree-structures/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment