Skip to content

Instantly share code, notes, and snippets.

@skinnyjames
Last active June 18, 2025 06:41
Show Gist options
  • Save skinnyjames/584d95e34a581e511a7e1cc81a706929 to your computer and use it in GitHub Desktop.
Save skinnyjames/584d95e34a581e511a7e1cc81a706929 to your computer and use it in GitHub Desktop.
idea

The strategy is about representing the initial state of a file, adding patches (which add changes to a file), and the reconstruction of a file from a given patch's ancestor tree is about calculating offsets.

A rough schema looks like...

 // the initial file
    CREATE TABLE IF NOT EXISTS files (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      directory INTEGER CHECK( directory in (0, 1) ) NOT NULL DEFAULT 0,
      content BLOB DEFAULT NULL,
      added_on INTEGER NOT NULL,
      deleted_on INTEGER
    );

    CREATE TABLE IF NOT EXISTS patches(
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      parent_id INTEGER,
      summary TEXT,
      FOREIGN KEY(parent_id) REFERENCES patch(id)
    )

    CREATE TABLE IF NOT EXISTS changes(
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      file_id INTEGER NOT NULL,
      patch_id INTEGER,
      type TEXT CHECK( type IN ('insert','delete', 'edit') ) NOT NULL DEFAULT 'insert',
      new_content BLOB,
      FOREIGN KEY(file_id) REFERENCES files(id)
      FOREIGN KEY(patch_id) REFERENCES patch(id)
    )

    CREATE TABLE IF NOT EXISTS offsets (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      change_id INTEGER NOT NULL,
      offset INTEGER NOT NULL,
      size INTEGER NOT NULL,
      FOREIGN KEY(change_id) REFERENCES changes(id)
    ) 

Let's say our initial file looks like a grocery list

grapes
cookies
coffee
tea

An initial commit would add this files content's to the files table, and insert the change into the patches table

Nothing is inserted into the other tables yet, but let's do our first change!

grapes
cookies
BEER
coffee
tea

We added BEER to the file, which a myers diff will tell us, but more importantly, we added it at offset 15 and it has a size of 4

This is the only file being patched, so we insert a new patch, and then update the changes and offsets table to include a BEER insert at offset 15 with a size of 4

then we add another patch which does 2 more things

insert HARD LIQUOR at offset 15 with a size 11
insert advil at offset 42 with a size of 5

now how do we reconstruct this file based on the patches? since each patch can have a parent that is also a patch, we recursively fetch the ancestor changes from a given patch WHILE filtering out anything older that is overlapped by anything newer.

WITH RECURSIVE 
ancestors(id, parent_id, summary, offset, size, type, content) AS (
  SELECT p.id, p.parent_id, p.summary, o.offset, o.size, c.type, c.new_content
  FROM patches as p
  inner join changes as c on p.id = c.patch_id
  inner JOIN offsets as o on o.change_id = c.id
  WHERE p.id = <the current patch>
  UNION ALL  
  SELECT a.id, a.parent_id, a.summary, ao.offset, ao.size, ac.type, ac.new_content
  FROM patches as a
  inner join changes as ac on a.id = ac.patch_id
  inner JOIN offsets as ao on ao.change_id = ac.id
  JOIN ancestors as aa on a.id = aa.parent_id
  where ((ao.offset <= aa.offset AND ao.offset + ao.size > aa.offset) AND (ao.offset + ao.size >= aa.offset + aa.size)) || NOT (ao.offset + ao.size > aa.offset)
) select * from ancestors

so what ends up coming back is the following changes

insert HARD LIQUOR at offset 15 with size 11
insert advil at offset 42 with a size of 5

then going linearally in order of the offsets, we reconstruct the file based on the type of change (insert, edit, or deletion)

Notice that BEER doesn't come up in the list of changes

Now one thing I think is interesting is that a branch in this model just means that there are 2 different patch histories with a common parent in there, so merging them, and identifying conflicts should have everything to do with locating intersecting offsets.

Conflicts could be stored in a separate table for a fault tolerant VCS, and since we are using offsets, conflicts should be refined down to the letter.

Note: if I remove the HARD LIQUOR patch, it would still return with the BEER patch for reconstruction of the file, and if I remove the BEER patch, it would still compute with the HARD LIQUOR patch, no commutation needed.

A

one
two
three

B

one
+ four (offset 4, size 5)
two
- three (offset 13, size 6)

C

one
two
+ five (offset 8 size 5)

somewhere else is D who's parent is A

one
two
three
+ four (offset 14, size 5)
+ five (offset 19, size 5)

I'm thinking a few things, if we reconstruct the document on the merge

We go

Patch A.

one
two
three

Patch B + Patch D would need to be reconciled because they share the same parent Patch B.

one
+ four (offset 4, size 5)
two
- three (offset 13, size 6)

Patch D

one
two [conflict]
three [conflict]
+ four (offset 14, size 5) [conflict]
+ five (offset 19, size 5) [conflict]

So the patch fix might look like


@skinnyjames
Copy link
Author

I ended up using a linked list which represents cursors for each patch going back. When I need to resolve the "true" offset of a given line for a given patch, it seems that i can retrack the changes by using adding and subtracting all the cursors under a given patch while traversing the changes. Probably not perfect though.

Patch A.

+ One (0)
+ Two (4)
+ Three (8)

Patch B.

One
- Two (4)
+ Five (4)
Three

Patch C.

One
Five
Three (now 9, every offset below this line for future patches is incremented by 1)
+ Eight (15 patch C) = (14 patch b)

Patch D.

One
Five
- Three (8 in patch A (introduced)) = (9 patch B and C) + 1
+ Four
Eight (14 patch D, now every offset below this line for future patches is decremented by 1)
+ Ten ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment