GhostSON is a lightweight, human-readable data serialization format that combines features of BSON (Binary JSON) with tag-length-value (TLV) encoding. It is written as Unicode text (UTF-8) and uses a small set of so-called "ghost characters" as delimiters: rarely used CJK ideographs that are printable but have little or no established meaning in ordinary text. This keeps the format compact yet readable.
Each GhostSON document consists of a series of elements. Each element has the following structure:
墸TYPE垈KEY粐LENGTH糘VALUE粭
Where:
墸 (U+58B8) marks the start of an element
TYPE is a single character indicating the data type
垈 (U+5788) separates the type from the key
KEY is the field name (string)
粐 (U+7C90) separates the key from the length
LENGTH is the length of VALUE in characters (Unicode code points), written as a decimal number
糘 (U+7CD8) separates the length from the value
VALUE is the actual data
粭 (U+7CAD) marks the end of an element
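A minimal Python sketch of building one element from its parts. It assumes LENGTH counts Unicode code points, which is what Python's `len` returns for a `str`; `encode_element` is a hypothetical helper name, not part of any published GhostSON library.

```python
# Delimiters from the element structure above
ELEMENT_START = "墸"
TYPE_SEP = "垈"
KEY_SEP = "粐"
LENGTH_SEP = "糘"
ELEMENT_END = "粭"


def encode_element(type_char: str, key: str, value: str) -> str:
    """Build 墸TYPE垈KEY粐LENGTH糘VALUE粭 for an already-serialized VALUE.

    LENGTH is taken as len(value), i.e. the number of Unicode code points.
    """
    if len(type_char) != 1:
        raise ValueError("TYPE must be a single character")
    return (ELEMENT_START + type_char + TYPE_SEP + key + KEY_SEP
            + str(len(value)) + LENGTH_SEP + value + ELEMENT_END)


print(encode_element("s", "name", "Alice"))  # 墸s垈name粐5糘Alice粭
print(encode_element("i", "age", "30"))      # 墸i垈age粐2糘30粭
```

VALUE is passed in as text (`"30"`, not `30`); choosing the textual form for each type is a separate step.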
Data types
s: String
i: Integer
f: Float
b: Boolean
n: Null
a: Array
o: Object (embedded document)
d: Date
Structural markers
岾 (U+5CBE): Array start
恷 (U+6077): Array end
橸 (U+6A78): Object start
汢 (U+6C62): Object end
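One way to pick the TYPE character for a native value, sketched in Python. The mapping from Python types (`None`, `bool`, `int`, `float`, `str`, `datetime.date`, `list`, `dict`) to type characters is an assumption for illustration, and `type_char` is a hypothetical helper.

```python
from datetime import date


def type_char(value) -> str:
    """Return the GhostSON TYPE character for a Python value (hypothetical mapping)."""
    if value is None:
        return "n"
    # bool must be tested before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "b"
    if isinstance(value, int):
        return "i"
    if isinstance(value, float):
        return "f"
    if isinstance(value, str):
        return "s"
    # datetime.datetime is a subclass of datetime.date, so both map to "d"
    if isinstance(value, date):
        return "d"
    if isinstance(value, (list, tuple)):
        return "a"
    if isinstance(value, dict):
        return "o"
    raise TypeError(f"no GhostSON type for {type(value).__name__}")


print([type_char(v) for v in ["Alice", 30, 1.5, True, None, [1, 2], {"k": 1}, date(2024, 1, 31)]])
# ['s', 'i', 'f', 'b', 'n', 'a', 'o', 'd']
```

Values typed `a` and `o` then wrap their contents in 岾...恷 and 橸...汢 respectively.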
Simple key-value pairs:
墸s垈name粐5糘Alice粭墸i垈age粐2糘30粭
Nested object:
墸o垈person粐30糘橸墸s垈name粐5糘Alice粭墸i垈age粐2糘30粭汢粭
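The object's LENGTH can be checked mechanically: an embedded object's VALUE is 橸, its elements, and 汢, and LENGTH counts all of them (1 + 16 + 12 + 1 = 30 characters for `person`). A minimal sketch, with hypothetical helper names:

```python
def encode_element(type_char: str, key: str, value: str) -> str:
    # 墸TYPE垈KEY粐LENGTH糘VALUE粭, with LENGTH = number of code points in VALUE
    return f"墸{type_char}垈{key}粐{len(value)}糘{value}粭"


def encode_object_element(key: str, fields) -> str:
    """Wrap (TYPE, KEY, VALUE-text) triples as an embedded document 橸...汢."""
    body = "".join(encode_element(t, k, v) for t, k, v in fields)
    return encode_element("o", key, "橸" + body + "汢")


person = encode_object_element("person", [("s", "name", "Alice"), ("i", "age", "30")])
print(person)
# 墸o垈person粐30糘橸墸s垈name粐5糘Alice粭墸i垈age粐2糘30粭汢粭
```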
Array:
墸a垈scores粐22糘岾1糘3糘1糘5糘1糘7糘1糘9糘2糘11恷粭
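A sketch of encoding an array value, assuming every item is written as LENGTH糘VALUE (the pattern used by the `hobbies` array in the complete document) with 糘 between items and no per-item type character. The helper name `encode_array_value` is hypothetical.

```python
def encode_array_value(items) -> str:
    """Build 岾LEN糘ITEM糘LEN糘ITEM...恷; each item is converted with str()."""
    texts = [str(item) for item in items]
    return "岾" + "糘".join(f"{len(t)}糘{t}" for t in texts) + "恷"


scores = encode_array_value([3, 5, 7, 9, 11])
print(f"墸a垈scores粐{len(scores)}糘{scores}粭")
# 墸a垈scores粐22糘岾1糘3糘1糘5糘1糘7糘1糘9糘2糘11恷粭
```

Because each item carries its own length, a string item may safely contain 糘 without being split.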
Complete document:
橸
墸s垈name粐5糘Alice粭
墸i垈age粐2糘30粭
墸a垈hobbies粐18糘岾6糘coding糘5糘music恷粭
墸o垈address粐43糘橸墸s垈street粐9糘Main St 1粭墸s垈city粐8糘New York粭汢粭
汢
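Putting the pieces together, a recursive encoder can produce the complete document (minus the line breaks, assumed to be for readability only). This is a sketch under several assumptions the format description leaves open: booleans are written as `true`/`false`, null as an empty value, floats with Python's `repr`, and dates in ISO 8601; `dumps` and its helpers are hypothetical names.

```python
from datetime import date


def _scalar(value):
    """Return (TYPE, VALUE text) for a non-container value (text forms are assumed)."""
    if value is None:
        return "n", ""
    if isinstance(value, bool):  # before int: bool is a subclass of int
        return "b", "true" if value else "false"
    if isinstance(value, int):
        return "i", str(value)
    if isinstance(value, float):
        return "f", repr(value)
    if isinstance(value, str):
        return "s", value
    if isinstance(value, date):
        return "d", value.isoformat()
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _encode(value):
    """Return (TYPE, VALUE text) for any supported value, recursing into containers."""
    if isinstance(value, dict):
        return "o", encode_object(value)
    if isinstance(value, (list, tuple)):
        items = [_encode(item)[1] for item in value]  # items carry no TYPE character
        return "a", "岾" + "糘".join(f"{len(t)}糘{t}" for t in items) + "恷"
    return _scalar(value)


def encode_object(obj: dict) -> str:
    """Encode a dict as 橸 + one element per key + 汢."""
    parts = []
    for key, value in obj.items():
        type_char, text = _encode(value)
        parts.append(f"墸{type_char}垈{key}粐{len(text)}糘{text}粭")
    return "橸" + "".join(parts) + "汢"


def dumps(obj: dict) -> str:
    return encode_object(obj)


doc = dumps({
    "name": "Alice",
    "age": 30,
    "hobbies": ["coding", "music"],
    "address": {"street": "Main St 1", "city": "New York"},
})
print(doc)
```

Printing `doc` gives the complete document on a single line, with LENGTH 18 for `hobbies` and 43 for `address`.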
Parsing
1. Start parsing from the beginning of the document.
2. For each element:
a. Read the type character after 墸
b. Read the key string between 垈 and 粐
c. Read the length value between 粐 and 糘
d. Read exactly LENGTH characters after 糘 as the value
e. Check that the next character is 粭, then continue with the next element (marked by 墸) or stop at the end of the document
3. For nested structures (arrays and objects):
a. Arrays are enclosed between 岾 and 恷; inside, each item is written as LENGTH糘VALUE, and consecutive items are separated by 糘
b. Objects are enclosed between 橸 and 汢 and contain ordinary elements
c. Parse recursively within these structures
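The parsing steps can be sketched as a small recursive decoder in Python. Assumptions beyond the listed steps: whitespace between elements is skipped (so the multi-line complete document parses), the top-level 橸...汢 wrapper is optional, booleans are spelled `true`/`false`, dates are returned as text, and array items are returned as strings because they carry no type character. `loads` and its helpers are hypothetical names.

```python
def _elements(text: str):
    """Yield (TYPE, KEY, VALUE text) for each element in text, in order (steps a-e)."""
    pos = 0
    while True:
        # Assumption: whitespace between elements (e.g. line breaks) is ignored
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        if not text.startswith("墸", pos) or not text.startswith("垈", pos + 2):
            raise ValueError(f"malformed element at position {pos}")
        type_char = text[pos + 1]                # a. type character
        key_end = text.index("粐", pos + 3)      # b. key runs up to 粐
        key = text[pos + 3:key_end]
        len_end = text.index("糘", key_end + 1)  # c. length runs up to 糘
        length_text = text[key_end + 1:len_end]
        if not length_text.isdigit():
            raise ValueError(f"bad LENGTH {length_text!r} for key {key!r}")
        start = len_end + 1
        end = start + int(length_text)
        value = text[start:end]                  # d. exactly LENGTH characters
        if not text.startswith("粭", end):       # e. closing marker
            raise ValueError(f"missing 粭 after value of {key!r}")
        yield type_char, key, value
        pos = end + 1


def _array_items(value: str) -> list:
    """Split 岾LEN糘ITEM糘LEN糘ITEM...恷 into item strings."""
    if not (value.startswith("岾") and value.endswith("恷")):
        raise ValueError("array value must be wrapped in 岾...恷")
    inner, items, pos = value[1:-1], [], 0
    while pos < len(inner):
        sep = inner.index("糘", pos)
        end = sep + 1 + int(inner[pos:sep])
        items.append(inner[sep + 1:end])
        pos = end
        if pos < len(inner):
            if inner[pos] != "糘":
                raise ValueError("expected 糘 between array items")
            pos += 1
    return items


def _decode(type_char: str, value: str):
    if type_char in ("s", "d"):   # dates are kept as text: their format is unspecified
        return value
    if type_char == "i":
        return int(value)
    if type_char == "f":
        return float(value)
    if type_char == "b":
        return value == "true"    # assumed spelling of booleans
    if type_char == "n":
        return None
    if type_char == "o":
        if not (value.startswith("橸") and value.endswith("汢")):
            raise ValueError("object value must be wrapped in 橸...汢")
        return {k: _decode(t, v) for t, k, v in _elements(value[1:-1])}
    if type_char == "a":
        return _array_items(value)  # items have no TYPE character, so they stay strings
    raise ValueError(f"unknown type character {type_char!r}")


def loads(doc: str) -> dict:
    """Decode a GhostSON document into a dict; the outer 橸...汢 wrapper is optional."""
    doc = doc.strip()
    if doc.startswith("橸") and doc.endswith("汢"):
        doc = doc[1:-1]
    return {k: _decode(t, v) for t, k, v in _elements(doc)}
```

Because every VALUE is sliced out by its LENGTH, a value that happens to contain 墸 or 粭 is never mistaken for a delimiter; only keys must avoid 粐.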
Advantages
- Human-readable: Uses printable Unicode characters
- Compact: Each delimiter is a single ghost character, keeping structural overhead low
- Self-describing: Includes type information
- Nestable: Supports complex data structures
- Length-prefixed: Allows for efficient parsing and skipping of elements
- Insane looking
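Length prefixes are what make skipping cheap: a reader looking for one top-level key can jump over every other value using its LENGTH, never scanning inside it. A sketch with a hypothetical `find_raw` helper:

```python
def find_raw(doc: str, wanted_key: str):
    """Return (TYPE, VALUE text) of the first top-level element named wanted_key, or None.

    Only the gaps between elements are searched for 墸; every VALUE is jumped
    over using its LENGTH, so nested elements are never inspected.
    """
    pos = doc.find("墸")
    while pos != -1:
        key_end = doc.index("粐", pos + 3)
        len_end = doc.index("糘", key_end + 1)
        value_start = len_end + 1
        value_end = value_start + int(doc[key_end + 1:len_end])
        if doc[pos + 3:key_end] == wanted_key:
            return doc[pos + 1], doc[value_start:value_end]
        # Skip the value and its closing 粭, then look for the next element start
        pos = doc.find("墸", value_end + 1)
    return None


doc = "墸o垈person粐30糘橸墸s垈name粐5糘Alice粭墸i垈age粐2糘30粭汢粭墸i垈age粐2糘41粭"
print(find_raw(doc, "age"))  # ('i', '41')
```

The nested `age` inside `person` is skipped as part of that object's VALUE, so only the top-level `age` is found.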
Limitations
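- Requires a Unicode encoding such as UTF-8; the ghost delimiters cannot be represented in ASCII
- Generally less compact than pure binary formats: each delimiter takes 3 bytes in UTF-8, and numbers and lengths are stored as decimal text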
- Requires support for displaying Unicode ghost characters
- Insane looking
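The size cost can be measured directly. Each ghost delimiter lies between U+0800 and U+FFFF, so UTF-8 stores it in 3 bytes; in the simple `name` element from the examples, the five delimiters account for 15 of the 26 bytes.

```python
element = "墸s垈name粐5糘Alice粭"

chars = len(element)                       # Unicode code points
utf8_bytes = len(element.encode("utf-8"))  # bytes on disk or on the wire
delimiters = sum(ch in "墸垈粐糘粭" for ch in element)

print(chars, utf8_bytes, delimiters)  # 16 26 5
print(len("墸".encode("utf-8")))      # 3
```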