- 株式会社スマートラウンドのプロダクトエンジニア
- 関数型言語/関数型プログラミングが好き
- RDB/SQLが好き
- プログラミングを始めて最初に好きになった言語がSQLだった
-
事前準備
-
隣接リスト(adjacency list)
-
経路列挙(path enumeration)
-
入れ子集合(nested sets)
-
閉包テーブル(closure table)
lagenorhynque/tree-representations-in-rdb
# 実験環境の準備
docker compose up -d
make migrate
- ローカルDB: MySQL 8.4
CREATE TABLE `department` (
`department_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`department_id`)
);
CREATE TABLE `department_adjacency_list` (
`department_id` int(10) unsigned NOT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`department_id`),
FOREIGN KEY (`department_id`)
REFERENCES `department` (`department_id`)
ON DELETE CASCADE ON UPDATE RESTRICT,
FOREIGN KEY (`parent_id`)
REFERENCES `department` (`department_id`)
ON DELETE CASCADE ON UPDATE RESTRICT
);
-
parent_id
: 親ノード -
本体データ用テーブルに含めても良い
WITH RECURSIVE department_tree AS (
SELECT d.*, dal.parent_id, 0 distance
FROM department d
JOIN department_adjacency_list dal USING (department_id)
WHERE dal.parent_id IS NULL
UNION ALL
SELECT d.*, dal.parent_id, dt.distance + 1
FROM department d
JOIN department_adjacency_list dal USING (department_id)
JOIN department_tree dt ON dal.parent_id =
dt.department_id
)
SELECT *
FROM department_tree;
distance
: ルートノードからの距離
cf. MySQL :: MySQL 8.4 Reference Manual :: 15.2.20 WITH (Common Table Expressions)
WITH RECURSIVE department_tree AS (
SELECT d.*, dal.parent_id, 0 distance
FROM department d
JOIN department_adjacency_list dal USING (department_id)
WHERE d.department_id = 10
UNION ALL
SELECT d.*, dal.parent_id, dt.distance + 1
FROM department d
JOIN department_adjacency_list dal USING (department_id)
JOIN department_tree dt ON dal.parent_id =
dt.department_id
)
SELECT *
FROM department_tree;
distance
: ノード10
からの距離
SELECT d.*
FROM department d
JOIN department_adjacency_list dal USING (department_id)
WHERE dal.parent_id = 10;
SELECT d.*
FROM department d
JOIN department_adjacency_list dal
ON d.department_id = dal.parent_id
WHERE dal.department_id = 10;
-
モデルとして単純(良くも悪くもナイーブ)
-
直近の親/子ノードに対するアクセスが容易
-
挿入操作が容易
-
参照整合性が保証できる
-
再帰CTE (recursive common table expressions)が使えないと任意階層に対するクエリが困難
-
削除操作が煩雑
-
parent_id
でNULL
が現れてしまう- ルートノード管理用のテーブルを用意することで回避可能
a.k.a. 経路実体化(materialized path)
CREATE TABLE `department_path_enumeration` (
`department_id` int(10) unsigned NOT NULL,
`path` varchar(1000) NOT NULL,
PRIMARY KEY (`department_id`),
FOREIGN KEY (`department_id`)
REFERENCES `department` (`department_id`)
ON DELETE CASCADE ON UPDATE RESTRICT
);
-
path
: 先祖からそのノードまでの経路(パス) -
本体データ用テーブルに含めても良い
SELECT d.*, dpe.path,
CHAR_LENGTH(dpe.path) - CHAR_LENGTH(REPLACE(dpe.path, '/',
'')) - 1 depth
FROM department d
JOIN department_path_enumeration dpe USING (department_id);
depth
: ルートノードからの距離
SELECT d.*, dpe.path,
CHAR_LENGTH(dpe.path) - CHAR_LENGTH(REPLACE(dpe.path, '/',
'')) - 1 depth
FROM department d
JOIN department_path_enumeration dpe USING (department_id)
WHERE dpe.path LIKE CONCAT((
SELECT path
FROM department_path_enumeration
WHERE department_id = 10
), '%');
depth
: ルートノードからの距離
SELECT d.*
FROM department d
JOIN department_path_enumeration dpe USING (department_id)
WHERE dpe.path REGEXP CONCAT((
SELECT path
FROM department_path_enumeration
WHERE department_id = 10
), '\\d+/$');
SELECT d.*
FROM department d
JOIN department_path_enumeration dpe USING (department_id)
WHERE dpe.path = (
SELECT CASE WHEN path = CONCAT(department_id, '/') THEN
NULL
ELSE
REGEXP_REPLACE(path, '^(.+/)\\d+/$', '$1')
END
FROM department_path_enumeration
WHERE department_id = 10
);
-
子孫/先祖ノードに対するアクセスが容易
-
更新系操作が容易
-
パス文字列をアプリケーションコードで管理しなければならない
- 正規化されていない(第1正規形でさえない)
-
参照整合性が保証できない
cf. 入れ子区間(nested intervals)
CREATE TABLE `department_nested_sets` (
`department_id` int(10) unsigned NOT NULL,
`left` int(11) NOT NULL,
`right` int(11) NOT NULL,
`depth` int(10) unsigned NOT NULL,
PRIMARY KEY (`department_id`),
FOREIGN KEY (`department_id`)
REFERENCES `department` (`department_id`)
ON DELETE CASCADE ON UPDATE RESTRICT
);
-
left
,right
: 数直線上の左右の点を表す -
depth
: ルートノードからの距離(必須ではない) -
本体データ用テーブルに含めても良い
SELECT d.*, dns.left, dns.right, dns.depth
FROM department d
JOIN department_nested_sets dns USING (department_id);
SELECT d.*, dns.left, dns.right, dns.depth
FROM department d
JOIN department_nested_sets dns USING (department_id)
JOIN department_nested_sets dns2 ON dns.left BETWEEN
dns2.left AND dns2.right
WHERE dns2.department_id = 10;
SELECT d.*
FROM department d
JOIN department_nested_sets dns USING (department_id)
JOIN department_nested_sets dns2 ON dns.left BETWEEN
dns2.left AND dns2.right
WHERE dns2.department_id = 10
AND dns.depth = (
SELECT depth
FROM department_nested_sets
WHERE department_id = 10
) + 1;
SELECT d.*
FROM department d
JOIN department_nested_sets dns USING (department_id)
JOIN department_nested_sets dns2 ON dns2.left BETWEEN
dns.left AND dns.right
WHERE dns2.department_id = 10
AND dns.depth + 1 = (
SELECT depth
FROM department_nested_sets
WHERE department_id = 10
);
-
子孫/先祖ノードに対するアクセスが容易
-
削除操作が容易
-
左右の点の値をアプリケーションコードで管理しなければならない
- 正規化されていない(第1正規形でさえない)
-
挿入操作が非常に煩雑でコストも高い
- 「入れ子区間」では改善される
-
参照整合性が保証できない
CREATE TABLE `department_closure_table` (
`ancestor` int(10) unsigned NOT NULL,
`descendant` int(10) unsigned NOT NULL,
`path_length` int(10) unsigned NOT NULL,
PRIMARY KEY (`ancestor`, `descendant`),
FOREIGN KEY (`ancestor`)
REFERENCES `department` (`department_id`)
ON DELETE CASCADE ON UPDATE RESTRICT,
FOREIGN KEY (`descendant`)
REFERENCES `department` (`department_id`)
ON DELETE CASCADE ON UPDATE RESTRICT
);
-
ancestor
,descendant
: 先祖/子孫関係にあるノードの組 -
path_length
:ancestor
からdescendant
までの距離(必須ではない)
SELECT d.*, COUNT(*) - 1 depth
FROM department d
JOIN department_closure_table dct
ON d.department_id = dct.descendant
GROUP BY dct.descendant;
depth
: ルートノードからの距離MAX(dct.path_length)
でも求められる
SELECT d.*, dct.path_length distance, dct2.depth
FROM department d
JOIN department_closure_table dct
ON d.department_id = dct.descendant
JOIN (
SELECT descendant, COUNT(*) - 1 depth
FROM department_closure_table
GROUP BY descendant
) dct2
ON d.department_id = dct2.descendant
WHERE dct.ancestor = 10;
-
distance
: ノード10
からの距離 -
depth
: ルートノードからの距離
SELECT d.*
FROM department d
JOIN department_closure_table dct
ON d.department_id = dct.descendant
WHERE dct.ancestor = 10
AND dct.path_length = 1;
SELECT d.*
FROM department d
JOIN department_closure_table dct
ON d.department_id = dct.ancestor
WHERE dct.descendant = 10
AND dct.path_length = 1;
-
子孫/先祖ノードに対するアクセスが容易
-
更新系操作が容易
-
参照整合性が保証できる
- 他のモデルよりも保持すべきデータが多くなる
-
「閉包テーブル」は総合的に優れている
-
「隣接リスト」は最も素朴(ナイーブ)な設計だがデメリットもいくつかある
-
「経路列挙」「入れ子集合」は参照整合性が保証できない(RDBの強みが犠牲になる)
-
- 2章 ナイーブツリー(素朴な木)
-
- 第10章 グラフに立ち向かう
- 10.4 ツリー(木)
- 第10章 グラフに立ち向かう
-
- 第10章 階層的なデータ構造