This is a specification for an s-expression interchange format that attempts to improve upon Rivest's canonical s-expressions.
It is an output format for a subset of s-expressions: those containing only pairs and atoms.
It was designed with the following desirable properties in mind:
- It has the canonicity property that (EQUAL? A B) implies the DCS output of A is byte equal to the DCS output of B.
- It has the non-escaping property that arbitrary binary blobs can be contained as atoms without any processing. A consequence of this is that dcsexps can be nested easily.
- Simple to parse: it is much simpler to parse than Rivest's canonical s-expressions because we use . instead of ( and ).
The empty symbol (length 0) may be used as a stand-in for ().
<DCS> ::= <length> ':' <data[length]>
| '.' <DCS> <DCS>
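The grammar above is small enough that an encoder and a parser fit in a few lines each. The following is a minimal sketch, not a reference implementation. It assumes that `<length>` is written as decimal ASCII digits (the grammar does not say), and the names `Sexp`, `dcs_encode` and `dcs_decode` are hypothetical. Atoms are modelled as Python `bytes`, pairs as 2-tuples.

```python
from typing import Tuple, Union

# Hypothetical in-memory representation (not part of the spec):
# an atom is a bytes object, a pair is a 2-tuple (car, cdr).
# The empty atom b"" stands in for ().
Sexp = Union[bytes, Tuple["Sexp", "Sexp"]]


def dcs_encode(x: Sexp) -> bytes:
    """Serialize an atom as <length>:<data> and a pair as .<car><cdr>."""
    if isinstance(x, bytes):
        # Assumption: the length is written in decimal ASCII.
        return str(len(x)).encode("ascii") + b":" + x
    car, cdr = x
    return b"." + dcs_encode(car) + dcs_encode(cdr)


def dcs_decode(buf: bytes, pos: int = 0) -> Tuple[Sexp, int]:
    """Parse one DCS value starting at pos; return (value, next position)."""
    if buf[pos:pos + 1] == b".":
        # A dot is always followed by exactly two values, so no
        # closing delimiter has to be searched for.
        car, pos = dcs_decode(buf, pos + 1)
        cdr, pos = dcs_decode(buf, pos)
        return (car, cdr), pos
    colon = buf.index(b":", pos)  # raises ValueError if ':' is missing
    digits = buf[pos:colon]
    if not digits.isdigit():
        raise ValueError("expected a decimal length")
    end = colon + 1 + int(digits)
    if end > len(buf):
        raise ValueError("atom data is truncated")
    # The data is copied out verbatim: nothing is escaped or unescaped.
    return buf[colon + 1:end], end
```

With this sketch, the list (a bc) written as the pairs `(b"a", (b"bc", b""))` encodes to `b".1:a.2:bc0:"`. Because atom data is never inspected, the encoded bytes of one expression can be stored as an atom inside another. The recursion follows the cdr chain, so a very long list would need an iterative version in Python.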
Why would you use this instead of regular s-expressions printed with WRITE (where you could, in theory, turn off indentation and pretty printing to get a function with the canonicity property)?
The advantage is that this format is much more efficient for machine-to-machine interchange, for example between a web server and a client: atoms are length-prefixed, so a reader never has to escape, unescape, or scan for delimiters.
Why would you use this instead of Rivest's canonical s-exps? It has a much simpler specification, and the parsing algorithm is a fraction of the complexity.
This format is very raw: it only has pairs and atoms. We may need more data types. For that we can use tagged canonical s-exps.
<TCS> ::= <tag> <length> ':' <data[length]>
| '.' <TCS> <TCS>
<tag> ::= 'A' ;; Atom
| 'S' ;; String
| 'N' ;; Number
| 'C' ;; Character: content must have length one.
| 'B' ;; Boolean: content must be 't' or 'f'
| 'Z' ;; nil (): content must have length 0
The tag is a single character that says which type the content of the atom should be interpreted as.
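The tagged variant changes only the atom case, so the same recursive shape still works. Below is a rough sketch under several assumptions: the length is decimal ASCII, "length one" for characters is counted in bytes, and the number in the example is written as decimal text (the grammar does not define number content). The names `Atom`, `Pair`, `check_atom`, `tcs_encode` and `tcs_decode` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

TAGS = b"ASNCBZ"  # the six tags listed in the grammar


@dataclass(frozen=True)
class Atom:
    tag: bytes   # a single byte taken from TAGS
    data: bytes  # raw content, never escaped


@dataclass(frozen=True)
class Pair:
    car: "Node"
    cdr: "Node"


Node = Union[Atom, Pair]


def check_atom(atom: Atom) -> None:
    """Enforce the per-tag content rules from the grammar comments."""
    if len(atom.tag) != 1 or atom.tag not in TAGS:
        raise ValueError("unknown tag")
    # Assumption: "length one" is measured in bytes.
    if atom.tag == b"C" and len(atom.data) != 1:
        raise ValueError("character content must have length one")
    if atom.tag == b"B" and atom.data not in (b"t", b"f"):
        raise ValueError("boolean content must be 't' or 'f'")
    if atom.tag == b"Z" and atom.data != b"":
        raise ValueError("nil content must have length 0")


def tcs_encode(x: Node) -> bytes:
    """Serialize a pair as .<car><cdr> and an atom as <tag><length>:<data>."""
    if isinstance(x, Pair):
        return b"." + tcs_encode(x.car) + tcs_encode(x.cdr)
    check_atom(x)
    return x.tag + str(len(x.data)).encode("ascii") + b":" + x.data


def tcs_decode(buf: bytes, pos: int = 0) -> Tuple[Node, int]:
    """Parse one TCS value starting at pos; return (value, next position)."""
    if buf[pos:pos + 1] == b".":
        car, pos = tcs_decode(buf, pos + 1)
        cdr, pos = tcs_decode(buf, pos)
        return Pair(car, cdr), pos
    tag = buf[pos:pos + 1]
    colon = buf.index(b":", pos + 1)  # raises ValueError if ':' is missing
    digits = buf[pos + 1:colon]
    if not digits.isdigit():
        raise ValueError("expected a decimal length")
    end = colon + 1 + int(digits)
    if end > len(buf):
        raise ValueError("atom data is truncated")
    atom = Atom(tag, buf[colon + 1:end])
    check_atom(atom)
    return atom, end
```

For example, `Pair(Atom(b"S", b"hi"), Pair(Atom(b"N", b"42"), Atom(b"Z", b"")))` encodes to `b".S2:hi.N2:42Z0:"`. Since no tag is a digit or a dot, the parser can still decide what comes next from a single byte.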
- messagepack - "It's like JSON but fast and small"
- bencode - netstring based format that can encode lists, used in bittorrent.
- flatbuffers - interchange without any parsing
Thanks for the comment.
Good idea to support vectors.
I would like to support them using '#', because we write vectors like this: #().
How should we do hash tables?