The format is based on modern S-expression notation, which is widely used today, for example in the Lisp family of programming languages. However, the format is redefined from the ground up and is not compatible with any existing format (unless unintentionally so).
The format is defined in terms of ASCII byte values, so any ASCII-compatible encoding will work. The input is a stream of values, not bytes, hence encodings like UTF-32 may work as well. The preferred encoding is UTF-8, but it is not required.
A space character is one of: `\r`, `\n`, `\t`, `<space>`.
A non-scalar character is one of: `\r`, `\n`, `\t`, `<space>`, `"`, `(`, `)`, `;`, `<backquote>`. It is possible to escape the grave accent mark in Markdown, but I don't do that and write `<backquote>` instead.
String literals use a minimal set of escape sequences. Some of them exist purely for readability.
A string literal starts with a quotation mark `"` and ends with a quotation mark `"`. In between, it may contain valid escape sequences and any other bytes, except `\n` and `"`.
Valid escape sequences are:

- `\r` is converted to the `0x0D` byte
- `\n` is converted to the `0x0A` byte
- `\t` is converted to the `0x09` byte
- `\\` is converted to the `0x5C` byte
- `\xHH` is converted to the `0xHH` byte, where `H` is a valid hex digit, upper-case or lower-case

An invalid escape sequence is an error and should not be allowed.
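A minimal sketch in Python of decoding a string literal under the rules above. The function name `decode_string_literal` is hypothetical, and the sketch assumes the input is a Python `str` holding exactly one literal (both quotation marks included), with ordinary non-escape characters emitted as UTF-8 bytes. That matches the preferred encoding but is otherwise an assumption.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

# Single-character escapes: character after the backslash -> resulting byte.
SIMPLE_ESCAPES = {"r": 0x0D, "n": 0x0A, "t": 0x09, "\\": 0x5C}


def decode_string_literal(src: str) -> bytes:
    """Decode one complete string literal (quotes included) into raw bytes.

    Raises ValueError for malformed input or invalid escape sequences.
    """
    if len(src) < 2 or src[0] != '"' or src[-1] != '"':
        raise ValueError("string literal must start and end with a quotation mark")
    body = src[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in '\n"':
            raise ValueError(f"unescaped {ch!r} inside string literal")
        if ch != "\\":
            # Assumption: ordinary characters are emitted as UTF-8.
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of literal")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == "x":
            digits = body[i + 2 : i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError(f"invalid \\x escape at offset {i}")
            out.append(int(digits, 16))
            i += 4
        else:
            # Anything else after a backslash is an invalid escape sequence.
            raise ValueError(f"invalid escape sequence \\{kind}")
    return bytes(out)
```

Since the list above has no escape for the quotation mark itself, a `"` inside a string literal has to be written as `\x22`.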
An uninterpreted string literal starts with a `<backquote>` and ends with a `<backquote>`. Any byte may appear in between, except `\n` and `<backquote>`. There are no escape sequences. Uninterpreted strings are useful for regular expressions and, on some operating systems, for file paths.
Example: `C:\Program Files\ABC\Data`
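A sketch of reading an uninterpreted string literal, assuming the reader already knows that the literal starts at `src[pos]`. `read_uninterpreted` is a hypothetical name. A complete tokenizer would first check for a triple `<backquote>` (a multi-line literal) before calling a function like this; the sketch does not.

```python
BACKQUOTE = "`"


def read_uninterpreted(src: str, pos: int) -> tuple[str, int]:
    """Read the uninterpreted string literal that starts at src[pos].

    Returns the raw contents (no escape processing) and the index just past
    the closing backquote. Raises ValueError on a newline or missing terminator.
    """
    if src[pos] != BACKQUOTE:
        raise ValueError("uninterpreted string must start with a backquote")
    end = pos + 1
    while end < len(src):
        ch = src[end]
        if ch == BACKQUOTE:
            # Contents are returned verbatim; backslashes have no special meaning.
            return src[pos + 1 : end], end + 1
        if ch == "\n":
            raise ValueError("newline inside uninterpreted string")
        end += 1
    raise ValueError("unterminated uninterpreted string")
```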
A multi-line string literal is a special lexical element that contains a sequence of raw lines. It starts with a triple `<backquote>` and ends with a triple `<backquote>`. However, the lines it defines may themselves contain a triple `<backquote>` if necessary. How does it work? Within a multi-line string literal, a line starts with a `|` character followed by an optional `<space>`, and lasts until the first `\n`. The optional `<space>` is not included in the line. So this string literal:
````
```
| Greetings, {{name}}.
|
| Welcome to this wonderful place called ```home```
```
````
Yields:

````
Greetings, {{name}}.

Welcome to this wonderful place called ```home```
````
As you can see, this scheme allows absolutely any character inside a multi-line string; you can even put a multi-line string inside a multi-line string. It works because of a simple convention: you take a line, strip everything up to and including the first `|` plus the optional `<space>`, and what remains is your new line. Nothing new, in fact: it is inspired by the comment syntax of many languages, which allows anything inside a comment line.
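The stripping convention can be sketched in Python as follows. The description leaves a few details open, so this sketch assumes that the opening triple `<backquote>` is immediately followed by `\n`, that only spaces or tabs may precede the `|` on a content line, and that the literal ends at the first line whose first non-blank text is a triple `<backquote>`. The name `read_multiline` is hypothetical.

```python
# Written as "`" * 3 so the delimiter does not clash with Markdown fences.
FENCE = "`" * 3


def read_multiline(src: str, pos: int) -> tuple[str, int]:
    """Read a multi-line literal whose opening delimiter starts at src[pos].

    Returns the string value and the index just past the closing delimiter.
    """
    # Assumption: the opening delimiter is immediately followed by a newline.
    if not src.startswith(FENCE + "\n", pos):
        raise ValueError("expected a triple backquote followed by a newline")
    cursor = pos + len(FENCE) + 1
    lines = []
    while cursor < len(src):
        nl = src.find("\n", cursor)
        end = len(src) if nl == -1 else nl
        raw = src[cursor:end]
        stripped = raw.lstrip(" \t")  # assumption: only blanks may precede '|'
        if stripped.startswith(FENCE):
            # Closing delimiter (assumption: first non-blank text on its line).
            close = cursor + (len(raw) - len(stripped)) + len(FENCE)
            return "\n".join(lines), close
        if not stripped.startswith("|"):
            raise ValueError(f"expected '|' at the start of line {raw!r}")
        line = stripped[1:]
        if line.startswith(" "):
            line = line[1:]  # the single optional <space> is not part of the value
        lines.append(line)
        cursor = end + 1
    raise ValueError("unterminated multi-line string literal")
```

Because every content line must begin with `|`, a triple `<backquote>` appearing after the `|` is just text, which is what makes nesting possible.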
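A scalar is a run of characters that are not non-scalar characters: it starts at the first such character and ends at the last one before the next non-scalar character (or the end of input).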
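A sketch of reading a scalar, that is, the longest run of characters outside the non-scalar set. It assumes `<space>` belongs to that set, since the list examples below only make sense if spaces end a scalar; `read_scalar` is a hypothetical name.

```python
# Non-scalar characters from the specification; <space> is included on the
# assumption that spaces terminate scalars.
NON_SCALAR = frozenset("\r\n\t \"();`")


def read_scalar(src: str, pos: int) -> tuple[str, int]:
    """Return the scalar starting at src[pos] and the index just past it."""
    end = pos
    # Consume characters until a non-scalar character or the end of input.
    while end < len(src) and src[end] not in NON_SCALAR:
        end += 1
    if end == pos:
        raise ValueError(f"no scalar at offset {pos}")
    return src[pos:end], end
```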
A list may contain scalars, strings, or other lists. A list starts with an opening parenthesis `(` and ends with a closing parenthesis `)`. Space characters can be used to separate list elements, but in some cases they are not required. For example, `hello(iam"John")world` is a valid sequence of a scalar `hello`, a list with two elements, `iam` (scalar) and `John` (string), followed by a scalar `world`. While this form is allowed by definition, it is not recommended. Please use at least one space character to separate list elements from each other. The preferred way to write the example above is:
`hello (iam "John") world`
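Combining the rules for scalars, strings, lists and comments, a small recursive-descent reader for the example above might look like the following Python sketch. It is deliberately incomplete: uninterpreted and multi-line strings are rejected, and quoted strings keep their raw text without decoding escapes. Scanning to the next `"` is still a correct way to find the end of a string, because the escape set contains no `\"`. The names `parse` and `Str` are hypothetical, and `<space>` is assumed to end a scalar.

```python
from dataclasses import dataclass

NON_SCALAR = frozenset("\r\n\t \"();`")  # assumption: includes <space>
SPACES = frozenset("\r\n\t ")


@dataclass(frozen=True)
class Str:
    """A quoted string; raw is the text between the quotes, escapes undecoded."""
    raw: str


def parse(src: str) -> list:
    """Parse top-level values: scalars -> str, strings -> Str, lists -> list."""
    pos = 0

    def skip_spaces_and_comments() -> None:
        nonlocal pos
        while pos < len(src):
            if src[pos] in SPACES:
                pos += 1
            elif src[pos] == ";":
                # A comment runs up to and including the next newline.
                nl = src.find("\n", pos)
                pos = len(src) if nl == -1 else nl + 1
            else:
                return

    def value():
        nonlocal pos
        ch = src[pos]
        if ch == "(":
            pos += 1
            items = []
            while True:
                skip_spaces_and_comments()
                if pos >= len(src):
                    raise ValueError("unterminated list")
                if src[pos] == ")":
                    pos += 1
                    return items
                items.append(value())
        if ch == '"':
            # \" is not a valid escape, so the next quote always closes the string.
            end = src.find('"', pos + 1)
            if end == -1 or "\n" in src[pos + 1 : end]:
                raise ValueError(f"bad string literal at offset {pos}")
            text, pos = src[pos + 1 : end], end + 1
            return Str(text)
        if ch in NON_SCALAR:
            # Covers a stray ')' and the unsupported backquote strings.
            raise ValueError(f"unexpected {ch!r} at offset {pos}")
        start = pos
        while pos < len(src) and src[pos] not in NON_SCALAR:
            pos += 1
        return src[start:pos]

    values = []
    while True:
        skip_spaces_and_comments()
        if pos >= len(src):
            return values
        values.append(value())
```

Both spellings of the example produce the same structure, which is why the spaced form costs nothing and reads better.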
A comment starts with a semicolon `;` and ends with a newline byte `\n`. Anything in between is allowed.
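A sketch of skipping a comment, given that `src[pos]` is the `;`. The description does not say what happens when a comment reaches the end of input without a `\n`; this sketch assumes the comment simply ends there. `skip_comment` is a hypothetical name.

```python
def skip_comment(src: str, pos: int) -> int:
    """Return the index just past the comment that starts at src[pos]."""
    if src[pos] != ";":
        raise ValueError("a comment must start with ';'")
    nl = src.find("\n", pos)
    # The comment ends with the newline byte, which is consumed with it.
    # Assumption: a comment on the last line may also end at end of input.
    return len(src) if nl == -1 else nl + 1
```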