file format :
base : RIFX (formtype : "KESF" ("Kitten Engine Serialization Format"))
chunks:
OBJ_ (with LETTER "o" ) {
// serialized object data
}
HEAD {
// header data (TBD)
U32 flags {
compressed? // TBD
hasOBJ? // valid files must have at least 1 of these 3 set
hasTypes?
hasStrings? // not required, but throw error if no string table exists and a string is encoded as reference
}
// it is perfectly valid to encode :
// database[n].kesf : serialized objects only
// typedefs.kesf : typedefs only
// strings.kesf : strings only
// and then load all three into the parser
// a good reason to do this would be having multiple serialized object files with the same typedefs and string table
}
STR_ {
// strings table (names for sure, literals?)
// String format : length(varint)-prefixed UTF-8
}
TDEF {
// datastruct definitions table
}
META { // ?
// meta-data (such as comments, creation tool, etc...)
// TBD , not required
}
object serialization format :
Byte type { // all objects start with a 1-byte type
u4 dataBaseType
u4 dataSubType
}
basetype = {
number = 0 // ints, floats
string = 1
array = 2
object = 3
datastruct = 4 // required type because datastructs are stored as a raw bytestream
}
subtypeNumber = {
u8 = 0
i8 = 1
u16 = 2
i16 = 3
u32 = 4
i32 = 5
float32 = 6
float64 = 7
u64 = 8 // JS doesn't natively support 64-bit integers (needs BigInt)
i64 = 9
dec64 = 10 // not natively supported anywhere???
BCD = 11 // may not implement these...
varint = 12
float16 = 13
u24 = 14
i24 = 15
}
subtypeString = {
flag isLiteral // non-zero : string data follows, zero : reference into string array follows
}
subTypeArray = {
number = 0
string = 1
dynamic = 2 // most common
datastruct = 3
numberfixed = 4 // numbers of a known type (preferred if possible since repeating number typeID is wasteful)
}
subTypeObject = {
none = 0 // objects are dynamic
}
subTypeDataStruct = {
// structure varies by implementation and is specified in the datastruct itself
// to achieve max efficiency, we'll convert this to a bitfield
flags = {
embedName // if 1, include the name reference
embedOrdinal // if 1, include the ordinal ID, takes priority since it's the fastest
hasTypedef // if 1, we expect a name or ordinal. if zero, we assume an external implementation handles parsing
sizeFixed // arrays only, we expect the typedef or an external implementation to supply the length
}
}
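illustrative sketch (TypeScript, not part of the spec) : packing/unpacking the 1-byte type, assuming the base type occupies the high nibble and the subtype the low nibble, which is what the example bytes further down imply (0x11 = literal string, 0x21 = string array, 0x30 = object).
// sketch : pack/unpack the 1-byte type, base type assumed in the high nibble
enum BaseType { Number = 0, String = 1, Array = 2, Object = 3, DataStruct = 4 }

function packType(base: BaseType, sub: number): number {
  return ((base & 0x0f) << 4) | (sub & 0x0f);
}

function unpackType(byte: number): { base: BaseType; sub: number } {
  return { base: (byte >> 4) & 0x0f, sub: byte & 0x0f };
}

// e.g. packType(BaseType.String, 1) === 0x11 (literal string, as seen in the examples below)
//      packType(BaseType.Array, 1)  === 0x21 (array of strings)
//      packType(BaseType.Object, 0) === 0x30 (dynamic object)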
format of serialized items :
number = {
[typeID] // an array with type : numberFixed, does not need this
number
}
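illustrative sketch (not part of the spec) : writing one serialized number. big-endian is assumed to match the RIFX container, only a few subtypes are covered, and packType/BaseType come from the sketch above.
// sketch : write [typeID] + value for a few number subtypes (big-endian assumed, matching RIFX)
// returns the offset just past the written bytes
function writeNumber(view: DataView, offset: number, sub: number, value: number, withTypeID = true): number {
  if (withTypeID) view.setUint8(offset++, packType(BaseType.Number, sub));
  switch (sub) {
    case 0: view.setUint8(offset, value);           return offset + 1; // u8
    case 2: view.setUint16(offset, value, false);   return offset + 2; // u16
    case 4: view.setUint32(offset, value, false);   return offset + 4; // u32
    case 6: view.setFloat32(offset, value, false);  return offset + 4; // float32
    case 7: view.setFloat64(offset, value, false);  return offset + 8; // float64
    default: throw new Error("subtype not covered by this sketch");
  }
}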
string = {
[typeID] // if in array, do not encode this
reference or literal, depending on "subtype"
}
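illustrative sketch (not part of the spec) : the string item. the varint is assumed to be the usual LEB128-style unsigned varint since the draft doesn't pin one down, and packType/BaseType come from the sketch above.
// sketch : LEB128-style unsigned varint (an assumption, the draft only says "varint")
function writeVarint(out: number[], value: number): void {
  do {
    let byte = value & 0x7f;
    value >>>= 7;
    if (value !== 0) byte |= 0x80;    // continuation bit
    out.push(byte);
  } while (value !== 0);
}

// sketch : literal = varint byteLength + UTF-8 bytes, reference = varint index into the STR_ table
function writeString(out: number[], value: string | number, withTypeID = true): void {
  const isLiteral = typeof value === "string";
  if (withTypeID) out.push(packType(BaseType.String, isLiteral ? 1 : 0));
  if (typeof value === "string") {
    const bytes = new TextEncoder().encode(value);
    writeVarint(out, bytes.length);
    out.push(...bytes);
  } else {
    writeVarint(out, value);          // reference : index into the string table
  }
}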
array = {
typeID
U32 length
switch(subtype) {
0 :
<length> serialized numbers // maximally inefficient method of encoding number array
1 :
<length> serialized strings
U8Array(Math.ceil(length / 8)) literalOrReferenceFlags
// can skip the string typeID since it's explicitly declared here
2 :
<length> serialized things // since it's anything, all items require their type ID , maximally inefficient, try to avoid this
3 :
U8 flags = {
embedTypedefNames // if 1, each datastruct includes its type name; if zero, we declare it here
hasTypeDef
// if 1, there actually is an included typedef and we should parse this as an object
// if zero, we assume an external implementation decodes the data. useful for hiding your "secrets" and embedding files.
// also the most efficient way to handle [de]serialization
// it is even perfectly valid to include the typedefs in the file, but not use them
useOrdinal // if 1, we use the typedef at a provided index in the types array
}
[String typedefName] // name of the typedef to use for this datastruct
[U32 ordinalID]
<length> datastructs // this is typically the most efficient method of storing objects, assuming a common format exists, ex. a map file in some game
4 :
numberTYPEID
<length> numbers of a type:TYPEID, this is the optimal way to encode an array of numbers because we don't need to specify a typeID per-number
}
}
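illustrative sketch (not part of the spec) : case 4 (numberFixed), the preferred encoding for a homogeneous number array. pushU32 is a placeholder helper; writeNumber/packType come from the sketches above.
// sketch : Array subtype 4 (numberFixed) - one array typeID, one number typeID, then raw values
function pushU32(out: number[], value: number): void {
  out.push((value >>> 24) & 0xff, (value >>> 16) & 0xff, (value >>> 8) & 0xff, value & 0xff);
}

function writeFixedNumberArray(out: number[], values: number[], numberSub: number): void {
  out.push(packType(BaseType.Array, 4));             // 0x24 : array, subtype numberFixed
  pushU32(out, values.length);                       // U32 length, big-endian
  out.push(packType(BaseType.Number, numberSub));    // numberTYPEID, written exactly once
  const scratch = new DataView(new ArrayBuffer(8));
  for (const v of values) {
    const used = writeNumber(scratch, 0, numberSub, v, false); // no per-value typeID
    for (let i = 0; i < used; i++) out.push(scratch.getUint8(i));
  }
}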
object = {
typeID
U32 byteLength
Byte[] data(byteLength) = {
Array [ // no element count is stored; on read, stop once byteLength bytes have been consumed; on write, byteLength is the total emitted after all properties are serialized
{
String name
serializedData value
}
]
}
}
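illustrative sketch (not part of the spec) : the dynamic object case. writeValue is a hypothetical dispatcher over the base types; packType/pushU32/writeString come from the sketches above.
// sketch : object = typeID + U32 byteLength + (name, value) pairs until byteLength is consumed
declare function writeValue(out: number[], value: unknown): void; // hypothetical dispatcher over the base types

function writeObject(out: number[], obj: Record<string, unknown>): void {
  const body: number[] = [];
  for (const [name, value] of Object.entries(obj)) {
    writeString(body, name);                 // property name (literal here, or a string-table reference)
    writeValue(body, value);                 // number / string / array / object / datastruct
  }
  out.push(packType(BaseType.Object, 0));    // 0x30
  pushU32(out, body.length);                 // byteLength tells the reader where the object ends
  out.push(...body);
}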
datastruct = {
typeID
[String typedefName] // in an array, we can and should [try to] skip encoding the type IDs per-entry
[U32 typedefOrdinal] // takes priority over name since it's faster
[u32 byteLength] // if it has a known size, we don't need this
Byte[] structData
}
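the draft lists the datastruct flag names but not their bit positions; assuming MSB-first in the listed order is consistent with the example bytes further down (0b{0110 0000} = embedOrdinal | hasTypedef) :
// sketch : assumed bit positions for the datastruct header flags, MSB-first in the order listed earlier
const DS_EMBED_NAME    = 0x80;
const DS_EMBED_ORDINAL = 0x40;
const DS_HAS_TYPEDEF   = 0x20;
const DS_SIZE_FIXED    = 0x10;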
// type definitions are essentially the object format again, but instead of the value, they encode the primitive type and the name
datastruct typedef format = {
// this being the actual type definition, name is mandatory
String name
{implicit U32 ordinalID} = the index of this item in the typedef array
U32 byteLength
U32 structFixedSize // can be zero, if non-zero and we're reading from an array : parser should assume encoded struct data is this size
Byte[] data(byteLength) = {
Array [ // no element count is stored; on read, stop once byteLength bytes have been consumed; on write, byteLength is the total emitted after all properties are serialized
{
typeID type
String name
}
]
}
}
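illustrative sketch (not part of the spec) : the reflective read path, i.e. decoding a struct by walking its typedef field-by-field. the TypeDef/TypeField shapes and readValue are hypothetical.
// sketch : reflective decode of a struct by walking its typedef fields in order
interface TypeField { typeID: number; name: string }
interface TypeDef   { name: string; structFixedSize: number; fields: TypeField[] }

declare function readValue(view: DataView, offset: number, typeID: number): { value: unknown; offset: number }; // hypothetical per-item decoder

function readStruct(view: DataView, offset: number, def: TypeDef): { value: Record<string, unknown>; offset: number } {
  const value: Record<string, unknown> = {};
  for (const field of def.fields) {
    const r = readValue(view, offset, field.typeID);  // decode one item of the declared primitive type
    value[field.name] = r.value;
    offset = r.offset;
  }
  return { value, offset };
}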
examples :
given JSON :
{
"tuna" : "fish",
"catnames" : ["garfield","felix","nolegs"],
"a" : {
"b" : {
"c": 0,
"d": 1,
"e": 2,
"f": 3,
}
}
} // 148 chars, prob 148 bytes in this case.
return serialized object (represented as string with comments denoting bytes and sizes) :
Object { // 30 <LL LL LL LL> (5)
Prop {type = String , name = "tuna", value = "fish"}, // 11 04 <T- U- N- A-> 11 04 <F- I- S- H-> (12)
Prop {type = Array<String>, name = "catnames" , value = ["garfield","felix","nolegs"]},
// 11 08 <c- a- t- n- a- m- e- s-> 21 00 00 00 03 (15)
0b{1110 0000} (1)
[
08 <g- a- r- f- i- e- l- d->, (9)
05 <f- e- l- i- x->, (6)
06 <n- o- l- e- g- s->, (7)
]
Prop {
type = Object,
name = "a", // 11 01 <a-> (3)
value = Object { // 30 <LL LL LL LL> (5)
Prop {
type = Object,
name ="b", // 11 01 <b-> (3)
value = { // 30 <LL LL LL LL> (5)
Prop {type = U8 , name = "c", value = 0}, // 11 01 <c-> 00 00 (5)
Prop {type = U8 , name = "d", value = 1}, // 11 01 <d-> 00 01 (5)
Prop {type = U8 , name = "e", value = 2}, // 11 01 <e-> 00 03 (5)
Prop {type = U8 , name = "f", value = 3}, // 11 01 <f-> 00 04 (5)
}
}
}
}
} // all total : 91 bytes, marginally smaller even in this absurd case, assuming no math errors
more realistic example, a color palette :
given JSON :
{
"colors" : [
{"r":0,"g":0,"b":0,"a":0}, ... // we'll say there's 256 of these
]
}
return types:
Array (2) [ // 00 00 00 02 (4)
Typedef {
name = "Palette" // 11 07 <P- a- l- e- t- t- e-> (9)
id = 0 // implicit (0)
length // <LL LL LL LL> (4)
structSize = 0 // 00 00 00 00 (4)
def = {
TypeProp {
// we also include a length for array def, if zero, length is supplied in the encoded struct
type = Array<TypeDef(ord:1 = "Color")>(0) // 23 0b{0110 0000} 00 00 00 01 , 00 00 00 00 (10)
name = "colors" // 11 06 <c- o- l- o- r- s-> (8)
}
}
},
Typedef {
name = "Color" // 11 05 <C- o- l- o- r-> (7)
id = 1 // implicit (0)
length // <LL LL LL LL> (4)
structSize = 4 // 00 00 00 04 (4)
def = {
TypeProp {
type = U8 // 00 (1)
name = "r" // 11 <r-> (2)
}
TypeProp {
type = U8 // 00 (1)
name = "g" // 11 <g-> (2)
}
TypeProp {
type = U8 // 00 (1)
name = "b" // 11 <b-> (2)
}
TypeProp {
type = U8 // 00 (1)
name = "a" // 11 <a-> (2)
}
}
}
] // all total 70 bytes (again, math!)
return serialized object :
DataStruct { // 40 0b{0110 0000} (2) , has ordinal, has typedef
ordinal = 0 // 00 00 00 00 (4)
length // <LL LL LL LL> (4)
// 00 00 01 00 (4) , we infer the property is named "colors" from typedef[0] : "Palette" and is type Array<Color>, all we need to "define" the array is its length
Array<Color> colors(256) : [
struct Color = { // the typedef also explicitly declares that we have a data struct with a known type and fixed size
U8 r ,
U8 g ,
U8 b ,
U8 a
} , ... // 4 bytes apiece times 256 entries is 1024 bytes , we don't need any of the struct declaration overhead because the typedefs handled this for us as well
]
} // all total 1038 bytes
combined, 1108 bytes... many times smaller than the JSON equivalent
also smaller than the serialized object equivalent
another noteworthy optimization is that this entire palette can simply be encoded as a U8[](1024)
because color palettes are such a basic structure (an array of 32-bit RGB[A] values, often 256 total), the overhead to define them is simply not necessary
care should always be given to the most optimal way to encode your data
it is missing the point to exclusively use serialized objects
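to illustrate the raw-U8[] shortcut above, a sketch (not part of the spec) that packs/unpacks a 256-entry RGBA palette as 1024 plain bytes :
// sketch : a 256-entry RGBA palette as 1024 raw bytes, no per-entry or per-struct overhead
interface Color { r: number; g: number; b: number; a: number }

function packPalette(colors: Color[]): Uint8Array {
  const bytes = new Uint8Array(colors.length * 4);
  colors.forEach((c, i) => bytes.set([c.r, c.g, c.b, c.a], i * 4));
  return bytes;
}

function unpackPalette(bytes: Uint8Array): Color[] {
  const colors: Color[] = [];
  for (let i = 0; i < bytes.length; i += 4) {
    colors.push({ r: bytes[i], g: bytes[i + 1], b: bytes[i + 2], a: bytes[i + 3] });
  }
  return colors;
}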
basic rules would be as follows :
1] single declaration object : serialized object
2] array of objects with a common format : array of datastructs + typedefs (if needed)
3] array of integer numbers : array of [U]Ints of appropriate size + format for your data
4] multiple files with a common format [containing objects of a common format] : n files with arrays of datastructs + 1 file with typedefs
5] repeated strings : string table + string reference declaration
so on, use common sense
it also is strongly advised to code-gen encoders/decoders for datastructs
object declaration and typedefs are extremely reflective by nature
datastructs are especially bad because most real-world examples will likely contain nested structs
this means recursively calling the typedef encoder/decoder, which is even more detrimental to performance than reflectively reading/writing an object
the most efficient use of typedefs is for documentation/debugging purposes or authortime tool[chain]s
runtime production code should use generated parsing code or JIT generate/compile the most optimized parser possible
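for illustration, a hypothetical code-generated decoder for the "Color" typedef from the palette example, the kind of thing the generated/JIT path would emit instead of walking the typedef reflectively :
// sketch : what generated code for the "Color" typedef could look like - no reflection, no typedef lookups at runtime
function decodeColor(view: DataView, offset: number): { r: number; g: number; b: number; a: number } {
  return {
    r: view.getUint8(offset),
    g: view.getUint8(offset + 1),
    b: view.getUint8(offset + 2),
    a: view.getUint8(offset + 3),
  };
}

function decodePalette(view: DataView, offset: number, count: number) {
  const colors = [];
  for (let i = 0; i < count; i++) colors.push(decodeColor(view, offset + i * 4));
  return colors;
}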
the performance penalties otherwise may be severe, especially with larger and more complex files
that said, just because you don't use the typedef in production, doesn't mean you shouldn't include it
that is solely at your discretion. OS/FOSS-oriented folk would prefer to include type information :P
also, treat typedefs with the same level of importance as source code
it is reasonable to assume that you will change data formats through the development/update cycle
if you lose your type definitions, you're gonna have a bad time.
the file format overhead is not required, but is encouraged.
any "valid"/compliant implementation of the object reader/writer should not care about it
it is however quite useful for organizing stuff
and, it doesn't consume a whole lot of bytes
rationale :
"but...":
JSON ? :
no types; size and bandwidth consumption bloat in proportion to the complexity/amount of data (a common problem all text-based serializers have).
can trade-off some performance to run compression over the data
BSON ? :
JSON, but binary
KESF can be used exclusively in its object serialization format to be "yet another BSON", but this is not how I intended it to be used
messagepack ? :
a step in the right direction, yes... but I still feel its main focus was on being another BSON
same as for BSON : yes, KESF can more or less act as a substitute, but that is not its intended use
due to my zealous focus on declaring types and sub-types, there are situations where messagepack probably produces [marginally] smaller files
most likely, however, you misused it for that to happen
kaitai structs ? :
uses a text-based serializer with a name that says enough
while the side effect of KESF is it can define file formats up to a certain extent, it was designed first and foremost to serialize high-level objects for games and software, compactly
plain old binary ? :
go right ahead, can't get any smaller and more optimized than that.
just remember that it doesn't typically have the same ease of use
in the most optimal conditions [aggressive use of typedefs and codegen], production KESF code/files are essentially plain old binary
"that one engine!" :
they each do their own thing, some better than others
in a rare win for Unity [IMO], type information is stored in a dedicated section/file rather than constantly re-specified every time an object is encoded
my decision to create a type tree/table is pure coincidence, done in complete ignorance of the fact that Unity uses a similar system, which I learned about several months later
"goals?" :
1] create a serialization format that was reasonably small without compression while still being capable of containing all the information needed to re-construct a high-level object
2] enforce typing as loose or as strict as an individual developer/team wishes to impose on themselves
3] bridge the gap between "simple text format anyone with notepad can edit" and "small binary format optimized for your computer processor to read", without a paywall/license and tons of complexity as has traditionally been the case
4] self-documenting binary blob without the insanity and bloat traditionally associated with it
5] allow for the possibility to embed arbitrary binary data in a serialized object with minimal overhead and especially no use of hex/base64/etc... text encodings
6] a file format which clearly identifies itself, declares what it contains, and where those contents are, compactly. RIFX was chosen for the following reasons :
6a] it's a chunk-container format using 4CCs to identify its chunks. it also specifies an additional "form type" field which should help software discriminate files it isn't meant to parse
6aa] the only other known RIFX implementation is Macromedia/Adobe Director/Shockwave. it seems likely that Macromedia designed* this format.
"designed" loosely speaking; this is just another instance in a series of blatant plagiarizations of Amiga's IFF format. the "evolution" of RIFX naturally would be IFF (Amiga) > RIFF (Microsoft) > RIFX (Macromedia[?]) > KESF (myself); with each additional party apparently contributing less significant change to the chosen base format. (and meaning that I copied it as-is)
6b] it uses big-endian byte order which is more intuitive to HUMAN PROGRAMMERS, unlike its own base format
6c] by default, it has practically no meaningful overhead. all chunks have a 4CC (ASCII string with exactly 4 characters, no length or null terminator; this also means it can be read/written as a U32) ID and a Uint32 length. the main chunk also has the aforementioned (6a) 4CC formtype. 12 bytes of file header information and 8 bytes per section header is pretty decent.
7] as an extension of 6, create a data/file format and system which can be integrated into any workflow/engine/framework/app/etc... with relative ease. even a novice programmer should be able to figure this out.
"the name?" :
see my game library/engine, "Kitten Engine" : once I have an implementation/workflow ready, this will be its default data serialization format, although using it won't be strictly enforced
WARNING :
in its current state, this is still a draft and likely contains errors/oversights
author/copyright : https://github.com/Brian151/