
Copy on Write + Event Sourcing for Edit History

Table of Contents

  • Notes
  • See Also

Notes

Some thoughts RE: Copy on Write + Event Sourcing for edit history.

From a chat conversation:

Random reading today; note at the end of this article on how git commits are actually stored: https://manishearth.github.io/blog/2017/03/05/understanding-git-filter-branch/#appendix-how-are-commits-actually-stored

The part that sparked my brain just now was this:

The way the actual implementation of a commit works is that each file being stored is hashed and stored in a compressed format, indexed by the hash. A directory (“tree”) will be a list of hashes, one for each file/directory inside it, alongside the filenames and other metadata. This list will be hashed and used everywhere else to refer to the directory.

A commit will reference the “tree” object for the root directory via its hash.

Now, if you make a commit changing some files, most of the files will be unchanged. So will most of the directories. So the commits can share the objects for the unchanged files/directories, reducing their size. This is basically a copy-on-write model.

Specifically the part about using a hash of the file contents as a ‘key’ to point to it; and how that basically allows a ‘copy on write’ model.
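To make that concrete, here’s a minimal sketch of the content-addressed / copy-on-write idea (the in-memory store and helper names are illustrative, not git’s actual object format):

```typescript
import { createHash } from "node:crypto";

// In-memory object store: content hash -> content.
const objects = new Map<string, string>();

function putObject(content: string): string {
  const hash = createHash("sha256").update(content).digest("hex");
  if (!objects.has(hash)) objects.set(hash, content); // written once, shared thereafter
  return hash;
}

// A "tree" is a list of (name, hash) entries, itself stored as an object.
function putTree(entries: Array<[name: string, hash: string]>): string {
  return putObject(JSON.stringify(entries));
}

// Two versions that only differ in one file share the unchanged object.
const treeV1 = putTree([["readme.md", putObject("hello")], ["main.ts", putObject("v1")]]);
const treeV2 = putTree([["readme.md", putObject("hello")], ["main.ts", putObject("v2")]]);
console.log(objects.size); // 5 objects, not 6: "hello" is only stored once
```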

The brain spark was how something like that could potentially be used when storing DB records for changes (e.g. to a Response), where the actual response content is stored in an external bucket. We could hash the response content, store that hash in the DB record for the Response, and then store the actual content in the bucket keyed by that hash.

That way, if a new response record was created but the content hadn’t changed (e.g. only metadata or linked citations changed), we wouldn’t be storing duplicates of the response content each time.
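A rough sketch of how that flow could look, assuming a hypothetical bucket client and responses table (`bucket` and `insertResponseRecord` are illustrative names, not an existing API):

```typescript
import { createHash } from "node:crypto";

// Hypothetical storage interfaces, just for illustration.
interface Bucket {
  exists(key: string): Promise<boolean>;
  put(key: string, body: string): Promise<void>;
}
declare const bucket: Bucket;
declare function insertResponseRecord(
  record: { contentHash: string } & Record<string, unknown>,
): Promise<void>;

async function saveResponse(content: string, metadata: Record<string, unknown>) {
  const contentHash = createHash("sha256").update(content).digest("hex");

  // Only upload if this exact content isn't already stored.
  if (!(await bucket.exists(contentHash))) {
    await bucket.put(contentHash, content);
  }

  // The DB record only carries the hash; a metadata-only change creates a
  // new record that points at the same bucket object.
  await insertResponseRecord({ contentHash, ...metadata });
}
```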

Definitely not something we need to implement currently; but I thought it was a cool pattern and wanted to share.

Random article about copy on write: https://softwarepatternslexicon.com/103/4/5/

The article itself doesn’t matter hugely, but the ‘related patterns’ at the end reminded me of this:

Event Sourcing: Used to record changes as a sequence of events, which enables reconstructing states from these events, similar to CoW’s rollback and history features.

Which coincides with something that was floating in my mind recently RE: how we could store ‘in-progress drafts’ of responses in case the user’s browser crashes or similar.

Obviously we wouldn’t want to pollute the main response table with this, as it would get too big too quickly; so it would probably be a new table with at most a single ‘in-progress draft’ per user.

Could also just store this in browser local storage if we didn’t want to persist it in the DB.
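If we went the local storage route, a minimal sketch could look like this (the key name and draft shape are assumptions):

```typescript
// At most one in-progress draft per user, overwritten on every change.
const DRAFT_KEY = "response:in-progress-draft"; // illustrative key name

interface Draft {
  content: string;
  savedAt: number;
}

function saveDraft(content: string): void {
  localStorage.setItem(DRAFT_KEY, JSON.stringify({ content, savedAt: Date.now() }));
}

function loadDraft(): Draft | null {
  const raw = localStorage.getItem(DRAFT_KEY);
  return raw ? (JSON.parse(raw) as Draft) : null;
}

function clearDraft(): void {
  // Called once the draft has been promoted to a saved response version.
  localStorage.removeItem(DRAFT_KEY);
}
```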

But anyway, then I was thinking about undo history, and being able to see more fine-grained changes on a version; and that’s where the event sourcing pattern could fit in.

Whether it’s on an in-progress draft or on a saved response version itself, there could be an additional field (probably a JSON field) that stores the ‘edit history’ events since the last ‘snapshot’ (for a saved model, that would basically be the edits made since the previous saved version).

That would allow seeing finer-grained edits as a sort of metadata, while still keeping a single ‘snapshot’ of the main content as one block of text.
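A small sketch of how that might fit together, with an assumed event shape (inserts and deletes at a character offset); replaying the events over the previous snapshot reconstructs each intermediate state, which is the event sourcing part:

```typescript
// Assumed edit-event shape; a real editor might emit richer events.
type EditEvent =
  | { kind: "insert"; at: number; text: string }
  | { kind: "delete"; at: number; length: number };

interface ResponseVersion {
  snapshot: string;         // full content at save time
  editHistory: EditEvent[]; // edits since the previous snapshot (stored as JSON)
}

function applyEvent(text: string, event: EditEvent): string {
  return event.kind === "insert"
    ? text.slice(0, event.at) + event.text + text.slice(event.at)
    : text.slice(0, event.at) + text.slice(event.at + event.length);
}

// Rebuild any intermediate state by replaying events over the prior snapshot.
function replay(previousSnapshot: string, events: EditEvent[]): string {
  return events.reduce(applyEvent, previousSnapshot);
}

// e.g. replay("helo world", [{ kind: "insert", at: 3, text: "l" }])
//   === "hello world"
```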

See Also

  • My Other Related Deepdive Gists and Projects
