idea · GitHub

The strategy is about representing the initial state of a file, adding patches (which add changes to a file), and the reconstruction of a file from a given patch's ancestor tree is about calculating offsets.

A rough schema looks like...

 // the initial file
    CREATE TABLE IF NOT EXISTS files (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      directory INTEGER CHECK( directory in (0, 1) ) NOT NULL DEFAULT 0,
      content BLOB DEFAULT NULL,
      added_on INTEGER NOT NULL,
      deleted_on INTEGER
    );

    CREATE TABLE IF NOT EXISTS patches(
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      parent_id INTEGER,
      summary TEXT,
      FOREIGN KEY(parent_id) REFERENCES patch(id)
    )

    CREATE TABLE IF NOT EXISTS changes(
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      file_id INTEGER NOT NULL,
      patch_id INTEGER,
      type TEXT CHECK( type IN ('insert','delete', 'edit') ) NOT NULL DEFAULT 'insert',
      new_content BLOB,
      FOREIGN KEY(file_id) REFERENCES files(id)
      FOREIGN KEY(patch_id) REFERENCES patch(id)
    )

    CREATE TABLE IF NOT EXISTS offsets (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      change_id INTEGER NOT NULL,
      offset INTEGER NOT NULL,
      size INTEGER NOT NULL,
      FOREIGN KEY(change_id) REFERENCES changes(id)
    )

Let's say our initial file looks like a grocery list

grapes
cookies
coffee
tea

An initial commit would add this files content's to the files table, and insert the change into the patches table

Nothing is inserted into the other tables yet, but let's do our first change!

grapes
cookies
BEER
coffee
tea

We added BEER to the file, which a myers diff will tell us, but more importantly, we added it at offset 15 and it has a size of 4

This is the only file being patched, so we insert a new patch, and then update the changes and offsets table to include a BEER insert at offset 15 with a size of 4

then we add another patch which does 2 more things

insert HARD LIQUOR at offset 15 with a size 11
insert advil at offset 42 with a size of 5

now how do we reconstruct this file based on the patches? since each patch can have a parent that is also a patch, we recursively fetch the ancestor changes from a given patch WHILE filtering out anything older that is overlapped by anything newer.

WITH RECURSIVE 
ancestors(id, parent_id, summary, offset, size, type, content) AS (
  SELECT p.id, p.parent_id, p.summary, o.offset, o.size, c.type, c.new_content
  FROM patches as p
  inner join changes as c on p.id = c.patch_id
  inner JOIN offsets as o on o.change_id = c.id
  WHERE p.id = <the current patch>
  UNION ALL  
  SELECT a.id, a.parent_id, a.summary, ao.offset, ao.size, ac.type, ac.new_content
  FROM patches as a
  inner join changes as ac on a.id = ac.patch_id
  inner JOIN offsets as ao on ao.change_id = ac.id
  JOIN ancestors as aa on a.id = aa.parent_id
  where ((ao.offset <= aa.offset AND ao.offset + ao.size > aa.offset) AND (ao.offset + ao.size >= aa.offset + aa.size)) || NOT (ao.offset + ao.size > aa.offset)
) select * from ancestors

so what ends up coming back is the following changes

insert HARD LIQUOR at offset 15 with size 11
insert advil at offset 42 with a size of 5

then going linearally in order of the offsets, we reconstruct the file based on the type of change (insert, edit, or deletion)

Notice that BEER doesn't come up in the list of changes

Now one thing I think is interesting is that a branch in this model just means that there are 2 different patch histories with a common parent in there, so merging them, and identifying conflicts should have everything to do with locating intersecting offsets.

Conflicts could be stored in a separate table for a fault tolerant VCS, and since we are using offsets, conflicts should be refined down to the letter.

Note: if I remove the HARD LIQUOR patch, it would still return with the BEER patch for reconstruction of the file, and if I remove the BEER patch, it would still compute with the HARD LIQUOR patch, no commutation needed.

skinnyjames/idea.md

skinnyjames commented Jun 18, 2025

Uh oh!