Taxonomy: the conception, naming, and classification of organism groups.
Ontology: ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about entities. Ontologies are the structural frameworks for organizing information
CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);
+-------------+----------------------+--------+
| category_id | name | parent |
+-------------+----------------------+--------+
| 1 | ELECTRONICS | NULL |
| 2 | TELEVISIONS | 1 |
| 3 | TUBE | 2 |
| 4 | LCD | 2 |
| 5 | PLASMA | 2 |
| 6 | PORTABLE ELECTRONICS | 1 |
| 7 | MP3 PLAYERS | 6 |
| 8 | FLASH | 7 |
| 9 | CD PLAYERS | 6 |
| 10 | 2 WAY RADIOS | 6 |
+-------------+----------------------+--------+
ELECTRONICS
/ \
TELEVISIONS PORTABLE ELECTRONICS
/ | \ / | \
TUBE LCD PLASMA MP3 PLAYERS CD PLAYERS 2 WAY RADIOS
|
FLASH
Retrieving the full Tree:
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as levement an4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';
You will notice that you need to know the tree depth
+-------------+----------------------+--------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS | TUBE | NULL |
| ELECTRONICS | TELEVISIONS | LCD | NULL |
| ELECTRONICS | TELEVISIONS | PLASMA | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL |
+-------------+----------------------+--------------+-------+
- Difficult
- To get the full path, you need to know the level where it resides
- No consistency check when deleting node => client side code OR stored procedures
Looking at the tree as sets
+--------------------------------------------------------------------------------------------------+
| ELECTRONICS |
| +-----------------------------+ +-----------------------------------------------------------+ |
| | TELEVISIONS | | PORTABLE ELECTRONICS | |
| | +------+ +-----+ +--------+ | | +---------------+ +--------------+ +------------------+ | |
| 1 |2|3TUBE4| |5LCD6| |7PLASMA8|9| |10|11MP3 PLAYERS14| |15CD PLAYERS16| |(17)2 WAY RADIOS18|19|20|
| | +------+ +-----+ +--------+ | | | +---------+ | +--------------+ +------------------+ | |
| +-----------------------------+ | | |12FLASH13| | | |
| | | +---------+ | | |
| | +---------------+ | |
| +-----------------------------------------------------------+ |
+--------------------------------------------------------------------------------------------------+
+-------------+----------------------+-----+-----+
| category_id | name | lft | rgt |
+-------------+----------------------+-----+-----+
| 1 | ELECTRONICS | 1 | 20 |
| 2 | TELEVISIONS | 2 | 9 |
| 3 | TUBE | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | PORTABLE ELECTRONICS | 10 | 19 |
| 7 | MP3 PLAYERS | 11 | 14 |
| 8 | FLASH | 12 | 13 |
| 9 | CD PLAYERS | 15 | 16 |
| 10 | 2 WAY RADIOS | 17 | 18 |
+-------------+----------------------+-----+-----+
We define bounds of sets using numbers so we can traverse from left to right easily. This approach is called the modified preorder tree traversal algorithm.
Retrieve the whole tree.
SELECT node.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;
Nested Set gives a greater flexibility to manipulate the tree and extract information from it. This technic exists has been around for over a decade and is described in many books. However, each time you add a parent node in the tree, you have to compute all the left and right numbers.
Top
/ | \
Science Hobbies Collections
/ | \
Astronomy Amateurs_Astronomy Pictures
/ \ |
Astrophysics Cosmology Astronomy
/ | \
Galaxies Stars Astronauts
ltree representation
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING gist(path);
CREATE INDEX path_idx ON test USING btree(path);
inheritance
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
path matching:
ltreetest=> SELECT path FROM test WHERE path ~ '*.Astronomy.*';
path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.*.Astronomy.*';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
full text search
ltreetest=> SELECT path FROM test WHERE path @ 'Astro*% & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies.Amateurs_Astronomy
(4 rows)
And a lot of tree related subfunctions. It is very simple to use but the querying is a bit different from what we are used to.
Neo4j let you create an oriented graph with typed relationships.
The category problem
ROOT
|
|
\|/
SUBROOT
|
|
\|/
Name: ELECTRONICS
|
| SUBCATEGORY
|
\|/
Name: COMPUTERS
/ \
/ \
SUBCATEGORY SUBCATEGORY
/ \
|/ \|
Name: Desktop Name: Laptops
Here you have to type relationship. Not sure we have typed relationship for ontology service.
db.categories.insert( { _id: "MongoDB", children: [] } )
db.categories.insert( { _id: "Postgres", children: [] } )
db.categories.insert( { _id: "Databases", children: [ "MongoDB", "Postgres" ] } )
db.categories.insert( { _id: "Languages", children: [] } )
db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
To get immediate childre:
db.categories.findOne( { _id: "Databases" } ).children
Find the parent:
db.categories.find( { children: "MongoDB" } )
The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple parents.
db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
db.categories.insert( { _id: "Postgres", parent: "Databases" } )
db.categories.insert( { _id: "Databases", parent: "Programming" } )
db.categories.insert( { _id: "Languages", parent: "Programming" } )
db.categories.insert( { _id: "Programming", parent: "Books" } )
db.categories.insert( { _id: "Books", parent: null } )
The query to retrieve the parent of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).parent
You can query by the parent field to find its immediate children nodes:
db.categories.find( { parent: "Databases" } )
The Parent Links pattern provides a simple solution to tree storage, but requires multiple queries to retrieve subtrees.
db.categories.insert( { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Postgres", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
db.categories.insert( { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
db.categories.insert( { _id: "Programming", ancestors: [ "Books" ], parent: "Books" } )
db.categories.insert( { _id: "Books", ancestors: [ ], parent: null } )
The query to retrieve the ancestors or path of a node is fast and straightforward:
db.categories.findOne( { _id: "MongoDB" } ).ancestors
You can query by the ancestors to find all its descendants:
db.categories.find( { ancestors: "Programming" } )
The Array of Ancestors pattern provides a fast and efficient solution to find the descendants and the ancestors of a node by creating an index on the elements of the ancestors field. This makes Array of Ancestors a good choice for working with subtrees.
If you modify your array, you have to update all nodes.
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
http://www.postgresql.org/docs/9.0/static/ltree.html
http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html
http://docs.mongodb.org/manual/tutorial/model-tree-structures/