@billiegoose
Last active February 25, 2016 20:32
Thoughts on a Universal Data schema
// TODO: Clean up thoughts into a nice document.
// Thoughts on Universal Data language / description
// See: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html#comment-2534873305
/* Preface:
Why do Thrift, Protobuf, and Avro each define their own IDL?
I would think that there could be an implementation-independent "master" IDL
to standardize the syntax (the semantics depend on the feature set of the implementation).
It makes it hard to try out these different libraries when they require rewriting
the message schema each time.
*/
/*
Fundamental core: Algebraic Data Types (product types and sum types)
Correspondences: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence
English            | Logic | Programming | Type theory  -> Set theory
                   | true  |             | unit type
Error?             | false |             | bottom type
One of each        | and   | Tuple       | product type -> Cartesian product
Just one of        | xor   | Enum        | sum type     -> Disjoint unions
Any combination of | or    | Flag        | ?            -> ?
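As a rough illustration of the three "Programming" rows, here is a minimal Python sketch. The names Presence and Attributes are borrowed from the enum and flag examples in these notes; the Field dataclass and its layout are assumptions made only for this example. One possible reading of the "?" cells is the power set: a flag over n symbols has 2^n possible values, one per subset.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class Presence(Enum):
    """Sum type ("just one of"): a value is exactly one of the symbols."""
    REQUIRED = auto()
    OPTIONAL = auto()


class Attributes(Flag):
    """"Any combination of": each symbol is independently present or absent."""
    READ = auto()
    WRITE = auto()
    PROTECTED = auto()


@dataclass(frozen=True)
class Field:
    """Product type ("one of each"): one value per component (hypothetical layout)."""
    name: str
    presence: Presence
    attributes: Attributes


# A field that is optional and carries two of the three flags.
field = Field("email", Presence.OPTIONAL, Attributes.READ | Attributes.WRITE)

# Number of distinct values: 2 for the enum, 2**3 for the flag.
presence_values = len(Presence.__members__)
attribute_values = 2 ** len(Attributes.__members__)
```

The enum and the flag differ only in how many symbols may be chosen at once, which is the distinction the table draws between "xor" and "or".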
Symbols are Atoms
Enums are Mutually Exclusive Symbols in a Namespace
Namespace EnumName { Symbol, Symbol, Symbol }
Flags are Mutually Compatible Symbols in a Namespace where each is optional
Namespace FlagName { Symbol?, Symbol?, Symbol? }
Tuples are Mutually Required Symbols in a Namespace where one of each must exist
There is a Global Namespace
Enums and Flags can share the global namespace and be referred to by unqualified symbols if there are no symbol name conflicts.
Top Level Enum { bool, int, int32, int64, float, string, table }
Enum Presence { required, optional }
Flag Attributes { read, write, protected }
Use something like
import <type.def>
to load the basic types into the top-level enum, which is essentially "Type" since everything inside
is a type.
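The rule that unqualified symbols are allowed only when no names conflict could be sketched as below. The NAMESPACES table copies the Type, Presence, and Attributes examples above; the resolve function and its error behaviour are assumptions for illustration, not part of any existing IDL.

```python
# Namespaces taken from the examples in these notes.
NAMESPACES = {
    "Type": {"bool", "int", "int32", "int64", "float", "string", "table"},
    "Presence": {"required", "optional"},
    "Attributes": {"read", "write", "protected"},
}


def resolve(symbol, namespaces=NAMESPACES):
    """Return the fully qualified 'Namespace.symbol' form of a symbol.

    Hypothetical helper: a qualified symbol is checked and returned as-is,
    an unqualified one is accepted only if exactly one namespace owns it.
    """
    if "." in symbol:
        namespace, name = symbol.split(".", 1)
        if name not in namespaces.get(namespace, set()):
            raise KeyError(f"unknown symbol {symbol!r}")
        return symbol

    owners = sorted(ns for ns, names in namespaces.items() if symbol in names)
    if not owners:
        raise KeyError(f"unknown symbol {symbol!r}")
    if len(owners) > 1:
        raise ValueError(f"ambiguous symbol {symbol!r}, qualify it with one of {owners}")
    return f"{owners[0]}.{symbol}"
```

With the table above, resolve("read") gives "Attributes.read". Adding a second namespace that also contains "read" would make the unqualified form an error, and callers would have to write the qualified symbol instead.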
Virtual Enums are namespaces that are indicated by syntax rather than by qualifying with an Enum name.
All "strings" for instance, belong to a global Name Namespace.
Field ID numbers could belong to a global Field namespace and be identified by @ signs (as in capnproto.org).
This way, by default any string is assumed to be a field name for the algebraic type.
>> "myfield" bool int @1
// expands: Name.myfield Type.bool Type.int Field.1
// error: the Type enum is used twice.
>> string, read, write, "name"
// expands: Name.name Type.string Attributes.read Attributes.write
>> "email" read string faker="internet.email"
// The key="value" form is unneeded, and the string "internet.email" is poorly typed. Instead, a faker enum is simple to construct:
Enum faker { "internet_email", "name_firstName", "name_lastName", ... }
// and then the attribute is assigned to the property by its unqualified symbol:
>> "email" read string internet_email
// or fully qualified:
>> "email" read string faker.internet_email
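The expansion of the first two ">>" lines above can be sketched as a small tokenizer. Everything here is an assumption made for illustration: the SYMBOLS table, the output ordering (Name, Type, Presence, Attributes, Field, chosen to match the expansions written above), and the choice of which namespaces count as exclusive enums.

```python
import re

# Hypothetical symbol table: unqualified symbol -> owning namespace.
SYMBOLS = {
    "bool": "Type", "int": "Type", "string": "Type",
    "required": "Presence", "optional": "Presence",
    "read": "Attributes", "write": "Attributes", "protected": "Attributes",
}
# Enums allow at most one symbol per field; Attributes is a flag and may repeat.
EXCLUSIVE = {"Name", "Type", "Presence", "Field"}
ORDER = ["Name", "Type", "Presence", "Attributes", "Field"]

# Quoted strings and @numbers are the "virtual enums" indicated by syntax.
TOKEN = re.compile(r'"[^"]*"|@\d+|\w+')


def expand(line):
    """Turn a field line into a list of fully qualified symbols."""
    pairs = []
    for token in TOKEN.findall(line):
        if token.startswith('"'):
            pairs.append(("Name", token.strip('"')))
        elif token.startswith("@"):
            pairs.append(("Field", token[1:]))
        elif token in SYMBOLS:
            pairs.append((SYMBOLS[token], token))
        else:
            raise KeyError(f"unknown symbol {token!r}")
    pairs.sort(key=lambda pair: ORDER.index(pair[0]))  # stable sort
    return [f"{namespace}.{name}" for namespace, name in pairs]


def check(qualified):
    """Raise ValueError if an exclusive enum is used more than once."""
    seen = set()
    for symbol in qualified:
        namespace = symbol.split(".", 1)[0]
        if namespace in EXCLUSIVE and namespace in seen:
            raise ValueError(f"{namespace} enum used more than once")
        seen.add(namespace)
```

Keeping expand and check separate mirrors the notes: a line is expanded first, and only then rejected if it uses the same enum twice.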
All top level defined messages are part of the global Type Enum. E.g.
"User" {
"name" string,
"email" string
}
"Company" {
"owner" User required
}
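One way to picture top-level messages joining the global Type enum is a registry that grows as messages are defined. This is a sketch under assumptions: the Schema class, its define method, and the field-name-to-type-name mapping are all hypothetical, and presence markers such as "required" are left out for brevity.

```python
class Schema:
    """Hypothetical registry in which every defined message becomes a type."""

    def __init__(self):
        # Basic types from the Top Level Enum in these notes.
        self.types = {"bool", "int", "int32", "int64", "float", "string", "table"}
        self.messages = {}

    def define(self, name, fields):
        """Register a message; fields maps each field name to a type name."""
        if name in self.types:
            raise ValueError(f"type {name!r} is already defined")
        for field_name, type_name in fields.items():
            if type_name not in self.types:
                raise KeyError(f"field {field_name!r} has unknown type {type_name!r}")
        self.messages[name] = dict(fields)
        self.types.add(name)  # the message is now usable as a field type


schema = Schema()
schema.define("User", {"name": "string", "email": "string"})
schema.define("Company", {"owner": "User"})  # works because User is now a type
```

Because a message is added to the type set only after its fields are validated, this sketch does not allow a message to refer to itself or to a message defined later; a real design would need to decide whether that is wanted.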
A field consists of:
:flags := (:epsilon | :flag1 | :flag2 | :flag3) & :flags
:enum := (:epsilon | :enum1 | :enum2 | :enum3)
:attr := (:epsilon | :flags | :enum)
:field := :attr & Type & :attr
*/