@mindfulvector
Last active August 29, 2024 20:46
GhostSON where'd you find this madness ?

GhostSON Specification

Overview

GhostSON is a lightweight, human-readable data serialization format that combines features of BSON (Binary JSON) and TAG/LENGTH/DATA techniques. It uses Unicode 8 characters, including selected "ghost characters" that are printable but lack semantic meaning in any language, to keep the format compact yet readable.

Basic Structure

Each GhostSON document consists of a series of elements. Each element has the following structure:

墸TYPE垈KEY粐LENGTH糘VALUE粭

Where:

墸 (U+58B8) marks the start of an element
TYPE is a single character indicating the data type
垈 (U+5788) separates the type from the key
KEY is the field name (string)
粐 (U+7C90) separates the key from the length
LENGTH is the length of the value in characters
糘 (U+7CD8) separates the length from the value
VALUE is the actual data
粭 (U+7CAD) marks the end of an element
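
The layout above can be sketched as a small encoder. This is only an illustration: `encode_element` is a hypothetical helper name, and it assumes LENGTH is written as a decimal count of characters (code points, not bytes) and that the value has already been converted to text.

```python
# Delimiters taken from the element layout described above.
START, TYPE_SEP, KEY_SEP, LEN_SEP, END = "墸", "垈", "粐", "糘", "粭"


def encode_element(type_char: str, key: str, value: str) -> str:
    """Build one element: START TYPE TYPE_SEP KEY KEY_SEP LENGTH LEN_SEP VALUE END."""
    if len(type_char) != 1:
        raise ValueError("TYPE must be a single character")
    # Assumption: LENGTH counts characters (code points), not encoded bytes.
    length = len(value)
    return f"{START}{type_char}{TYPE_SEP}{key}{KEY_SEP}{length}{LEN_SEP}{value}{END}"


print(encode_element("s", "name", "Alice"))
print(encode_element("i", "age", "30"))
```

Under those assumptions the two calls produce the same two elements as the "Simple key-value pairs" example further down.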

Data Types

s: String
i: Integer
f: Float
b: Boolean
n: Null
a: Array
o: Object (embedded document)
d: Date

Special Characters

岾 (U+5CBE): Array start
恷 (U+6077): Array end
橸 (U+6A78): Object start
汢 (U+6C62): Object end
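
As a rough sketch of how the object markers combine with the element layout, the snippet below wraps already-encoded member elements in 橸 and 汢 and stores the result as the value of an `o` element. The helper names are hypothetical, and it assumes that LENGTH counts characters and that the count includes the 橸 and 汢 markers themselves. Arrays are left out because the encoding of individual array items is not spelled out here.

```python
OBJ_START, OBJ_END = "橸", "汢"


def encode_element(type_char: str, key: str, value: str) -> str:
    """Hypothetical helper: one element with a character-count LENGTH."""
    return f"墸{type_char}垈{key}粐{len(value)}糘{value}粭"


def encode_object(key: str, members: list[str]) -> str:
    """Wrap encoded member elements in object markers and nest them under key."""
    # Assumption: the object markers are part of the value and count toward LENGTH.
    body = OBJ_START + "".join(members) + OBJ_END
    return encode_element("o", key, body)


pet = encode_object("pet", [encode_element("s", "name", "Rex")])
print(pet)
```

Computing LENGTH from the finished body, rather than writing it by hand, keeps the outer count in step with whatever the nested elements contain.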

Examples

Simple key-value pairs:

墸s垈name粐5糘Alice粭墸i垈age粐2糘30粭

Nested object:

墸o垈person粐30糘橸墸s垈name粐5糘Alice粭墸i垈age粐2糘30粭汢粭

Array:

墸a垈scores粐12糘岾3糘5糘7糘9糘11恷粭

Complete document:

橸
墸s垈name粐5糘Alice粭
墸i垈age粐2糘30粭
墸a垈hobbies粐18糘岾6糘coding糘5糘music恷粭
墸o垈address粐43糘橸墸s垈street粐9糘Main St 1粭墸s垈city粐8糘New York粭汢粭
汢

Parsing Rules

Start parsing from the beginning of the document.

For each element:

a. Read the type character after 墸
b. Read the key string between 垈 and 粐
c. Read the length value between 粐 and 糘
d. Read the value of specified length after 糘
e. Confirm that the closing 粭 follows the value, then continue to the next element marked by 墸 or stop at the end of the document

For nested structures (arrays and objects):

a. Arrays are enclosed between 岾 and 恷
b. Objects are enclosed between 橸 and 汢
c. Parse recursively within these structures
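
The rules above can be sketched as a small recursive parser. This is a minimal illustration, not a reference implementation: `parse_elements` is a hypothetical name, LENGTH is assumed to count characters, elements are assumed to follow one another with no whitespace in between, and array values are returned as raw text because the encoding of individual array items is not fully specified.

```python
START, TYPE_SEP, KEY_SEP, LEN_SEP, END = "墸", "垈", "粐", "糘", "粭"
OBJ_START, OBJ_END = "橸", "汢"


def parse_elements(text: str) -> dict:
    """Parse a run of elements into {key: (type_char, value)}."""
    result = {}
    pos = 0
    while pos < len(text):
        if text[pos] != START:
            raise ValueError(f"expected {START!r} at offset {pos}")
        type_char = text[pos + 1]                    # a. type character
        if text[pos + 2] != TYPE_SEP:
            raise ValueError(f"expected {TYPE_SEP!r} at offset {pos + 2}")
        key_end = text.index(KEY_SEP, pos + 3)       # b. key
        key = text[pos + 3:key_end]
        len_end = text.index(LEN_SEP, key_end + 1)   # c. length
        length = int(text[key_end + 1:len_end])
        value_start = len_end + 1                    # d. value of that length
        end_pos = value_start + length
        value = text[value_start:end_pos]
        if text[end_pos:end_pos + 1] != END:
            raise ValueError(f"expected {END!r} at offset {end_pos}")
        if type_char == "o":
            # Objects are parsed recursively between their markers.
            if not (value.startswith(OBJ_START) and value.endswith(OBJ_END)):
                raise ValueError(f"object {key!r} is missing its markers")
            value = parse_elements(value[1:-1])
        result[key] = (type_char, value)
        pos = end_pos + 1                            # e. next element
    return result


doc = "墸s垈name粐5糘Alice粭墸o垈pet粐16糘橸墸s垈name粐3糘Rex粭汢粭"
print(parse_elements(doc))
```

Values are kept as text paired with their type character; converting `i`, `f`, `b`, or `d` values into native types would be a separate step layered on top.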

Benefits

  • Human-readable: Uses printable Unicode characters
  • Compact: Utilizes ghost characters to reduce overhead
  • Self-describing: Includes type information
  • Nestable: Supports complex data structures
  • Length-prefixed: Allows for efficient parsing and skipping of elements
  • Insane looking
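
To illustrate the length-prefix point, here is a hedged sketch of a key lookup that jumps over values it does not need instead of scanning them. `find_value` is a hypothetical name, LENGTH is assumed to count characters, and the input is assumed to be a flat run of well-formed elements.

```python
def find_value(text: str, wanted_key: str):
    """Return the raw value stored under wanted_key, or None if it is absent."""
    pos = 0
    while pos < len(text):
        key_start = pos + 3                        # skip 墸, TYPE, and 垈
        key_end = text.index("粐", key_start)
        len_end = text.index("糘", key_end + 1)
        length = int(text[key_end + 1:len_end])
        value_start = len_end + 1
        if text[key_start:key_end] == wanted_key:
            return text[value_start:value_start + length]
        # Jump over the value and its closing 粭 without inspecting either.
        pos = value_start + length + 1
    return None


doc = "墸s垈note粐3糘a粭b粭墸i垈n粐1糘7粭"
print(find_value(doc, "n"))
```

Because the value is skipped by count, a delimiter character that happens to appear inside a value (the 粭 in `a粭b` here) does not confuse the lookup.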

Limitations

  • Limited to Unicode 8 character set ????
  • May not be as compact as pure binary formats
  • Requires support for displaying Unicode ghost characters
  • Insane looking