The strategy is about representing the initial state of a file, adding patches (which add changes to a file), and the reconstruction of a file from a given patch's ancestor tree is about calculating offsets.
A rough schema looks like...
// the initial file
CREATE TABLE IF NOT EXISTS files (
id INTEGER PRIMARY KEY AUTOINCREMENT,
directory INTEGER CHECK( directory in (0, 1) ) NOT NULL DEFAULT 0,
content BLOB DEFAULT NULL,
added_on INTEGER NOT NULL,
deleted_on INTEGER
);
CREATE TABLE IF NOT EXISTS patches(
id INTEGER PRIMARY KEY AUTOINCREMENT,
parent_id INTEGER,
summary TEXT,
FOREIGN KEY(parent_id) REFERENCES patch(id)
)
CREATE TABLE IF NOT EXISTS changes(
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_id INTEGER NOT NULL,
patch_id INTEGER,
type TEXT CHECK( type IN ('insert','delete', 'edit') ) NOT NULL DEFAULT 'insert',
new_content BLOB,
FOREIGN KEY(file_id) REFERENCES files(id)
FOREIGN KEY(patch_id) REFERENCES patch(id)
)
CREATE TABLE IF NOT EXISTS offsets (
id INTEGER PRIMARY KEY AUTOINCREMENT,
change_id INTEGER NOT NULL,
offset INTEGER NOT NULL,
size INTEGER NOT NULL,
FOREIGN KEY(change_id) REFERENCES changes(id)
)
Let's say our initial file looks like a grocery list
grapes
cookies
coffee
tea
An initial commit would add this files content's to the files table, and insert the change into the patches table
Nothing is inserted into the other tables yet, but let's do our first change!
grapes
cookies
BEER
coffee
tea
We added BEER to the file, which a myers diff will tell us, but more importantly, we added it at offset 15 and it has a size of 4.
This is the only file being patched, so we insert a new patch, and then update the changes and offsets table to include a BEER insert at offset 15 with a size of 4 then we add another patch which does 2 more things
grapes
cookies
HARD LIQUOR
coffee
tea
advil
so now the new patch has 2 changes which look like...
insert HARD LIQUOR at offset 15 with a size 11
insert advil at offset 42 with a size of 5
now how do we reconstruct this file based on the patches?
since each patch can have a parent that is also a patch, we recursively fetch the ancestor changes from a given patch WHILE filtering out anything older that is overlapped by anything newer.
WITH RECURSIVE
ancestors(id, parent_id, summary, offset, size, type, content) AS (
SELECT p.id, p.parent_id, p.summary, o.offset, o.size, c.type, c.new_content
FROM patches as p
inner join changes as c on p.id = c.patch_id
inner JOIN offsets as o on o.change_id = c.id
WHERE p.id = 4 -- given base id value
UNION ALL
SELECT a.id, a.parent_id, a.summary, ao.offset, ao.size, ac.type, ac.new_content
FROM patches as a
inner join changes as ac on a.id = ac.patch_id
inner JOIN offsets as ao on ao.change_id = ac.id
JOIN ancestors as aa on a.id = aa.parent_id
where ((ao.offset <= aa.offset AND ao.offset + ao.size > aa.offset) AND (ao.offset + ao.size >= aa.offset + aa.size)) || NOT (ao.offset + ao.size > aa.offset)
) select * from ancestors
so what ends up coming back is the following changes
insert HARD LIQUOR at offset 15 with size 11
insert advil at offset 42 with a size of 5
then going linearally, we reconstruct the file based on the type of change (insert, edit, or deletion) Notice that BEER doesn't come up in the list of changes Now one thing I think is interesting is that a branch in this model just means that there are 2 patches with the same parent and merging them, and identifying conflicts has everything to do with locating intersecting offsets between 2 patches with the same parent and if I remove the HARD LIQUOR patch, it will still recompute with the BEER patch