shitty_datalog_db.md

informal spec for shitty datalog db implemented on a filesystem:

a directory represents a table
a file in the directory represents a row
the filename is a collision-resistant hash of the content
the file/directory extension is the format of the content (e.g. .ssii: two strings, two integers)
row format could be binary or single-line csv
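
a minimal sketch of the layout above, under loud assumptions: SHA-256 as the hash, single-line csv as the row format, and only two format letters (`s` for string, `i` for integer, following the `.ssii` example). `KINDS`, `encode_row` and `insert_row` are hypothetical names, not part of any spec.

```python
import csv
import hashlib
import io
import os
from pathlib import Path

# assumed format letters, following the .ssii example in the notes
KINDS = {"s": str, "i": int}

def encode_row(fmt: str, values: tuple) -> bytes:
    """Encode one row as a single csv line, checked against the format string."""
    if len(fmt) != len(values):
        raise ValueError("row does not match table format")
    for kind, value in zip(fmt, values):
        if not isinstance(value, KINDS[kind]):
            raise TypeError(f"expected {kind!r} column, got {value!r}")
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue().encode("utf-8")

def insert_row(db: Path, table: str, fmt: str, values: tuple) -> Path:
    """Write a row file named after the hash of its content; returns its path."""
    table_dir = db / f"{table}.{fmt}"           # directory = table
    table_dir.mkdir(parents=True, exist_ok=True)
    data = encode_row(fmt, values)
    digest = hashlib.sha256(data).hexdigest()   # assumption: SHA-256
    path = table_dir / f"{digest}.{fmt}"        # file = row
    if not path.exists():                       # identical row already stored
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        os.replace(tmp, path)                   # atomic rename into place
    return path
```

inserting the same values twice yields the same path and a single file, which is the deduplication property falling out of the naming scheme.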

some nice properties:

needs no container format; leverages existing filesystem technology for high-throughput caching and buffering as well as shared network access
separates unindexed data from index
create/drop tables by just adding/removing directories
enumerate a table's rows just by listing its directory
add/remove rows by just writing/removing files
rows are content-addressed, therefore:
- rows are immutable
- deduplicates, since identical content -> identical filename
- corruption can be detected by checking if content matches filename
each row carries filesystem timestamps: creation/modification time as well as last access (permitting LRU drops, provided the filesystem records access times)
alter columns by transitioning file extensions; can be safely resumed after interruption
online indices can update after file system notifications
bonus: permission flags could do something interesting
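
two of the properties above, corruption detection and LRU drops, could look roughly like this. assumptions: filenames are the SHA-256 hex digest of the content plus an extension, and the filesystem actually updates access times (mounts with `noatime` will not). `is_intact`, `corrupt_rows` and `drop_least_recently_used` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

def is_intact(path: Path) -> bool:
    """A row is intact if the hash of its content equals its filename stem."""
    # assumption: the filename stem is a SHA-256 hex digest
    return hashlib.sha256(path.read_bytes()).hexdigest() == path.stem

def corrupt_rows(table_dir: Path) -> list[Path]:
    """Return all row files whose content no longer matches their name."""
    return sorted(p for p in table_dir.iterdir()
                  if p.is_file() and not is_intact(p))

def drop_least_recently_used(table_dir: Path, keep: int) -> list[Path]:
    """Delete the rows with the oldest access time until `keep` remain."""
    rows = sorted((p for p in table_dir.iterdir() if p.is_file()),
                  key=lambda p: p.stat().st_atime)
    victims = rows[:max(0, len(rows) - keep)]
    for p in victims:
        p.unlink()
    return victims
```

note that reading a file to verify it may itself bump its access time, so a full integrity scan can disturb the LRU order.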

drawbacks:

max ~64K rows per table on filesystems that cap entries per directory (e.g. FAT32); beyond that a HAMT-like structure is needed, i.e. sort into subdirectories by first few digits of hash
possibly too taxing for SSDs, since every row is its own small file write
incomplete: applications still need to build indices over the data
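
the subdirectory workaround for the row limit can be sketched as follows. assumptions: hex digests as filenames, two hex digits per level (256 subdirectories each), and hypothetical names `sharded_path` and `iter_rows`.

```python
from pathlib import Path

def sharded_path(table_dir: Path, digest: str, ext: str,
                 depth: int = 1, width: int = 2) -> Path:
    """Place a row under `depth` levels of subdirectories taken from the
    leading hex digits of its hash, e.g. ab/cd/abcd...ext for depth=2."""
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return table_dir.joinpath(*parts, f"{digest}.{ext}")

def iter_rows(table_dir: Path):
    """Yield every row file of a table, regardless of shard depth."""
    for p in table_dir.rglob("*"):
        if p.is_file():
            yield p
```

with one level of two hex digits, a 64K-per-directory cap stretches to roughly 256 x 64K rows, about 16.7 million; each extra level multiplies that by 256 again.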

paniq/shitty_datalog_db.md