- bool
- unsigned integers: char, u8, u16, u32, u64
- signed integers: s8, s16, s32, s64
- floating point: f32, f64
Basic types can also have minimum and maximum values specified when used within a struct type.
Values are clamped only when using the generated set functions or when encoding or decoding the data as cson.
Example:
struct Example {
u16(100, 300) field_a;
s16(-100, 300) field_b;
f32(0.f, 100.f) field_c;
};
Here is the example of the code generated version of that struct in C:
typedef struct Example Example;
struct Example {
u16 field_a;
s16 field_b;
f32 field_c;
};
#define example_field_a_set(obj, value) ((obj).field_a = clamp_u16(value, 100, 300))
#define example_field_b_set(obj, value) ((obj).field_b = clamp_s16(value, -100, 300))
#define example_field_c_set(obj, value) ((obj).field_c = clamp_f32(value, 0.f, 100.f))
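The clamping setters can be tried out with a small self-contained sketch. The `clamp_u16`, `clamp_s16` and `clamp_f32` helpers are named by the generated macros but not shown, so the definitions here are assumptions about how they might look, not the actual enc-gen runtime.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef int16_t  s16;
typedef float    f32;

/* Hypothetical helpers: the real enc-gen runtime may define these differently. */
static inline u16 clamp_u16(u16 v, u16 lo, u16 hi) { return v < lo ? lo : (v > hi ? hi : v); }
static inline s16 clamp_s16(s16 v, s16 lo, s16 hi) { return v < lo ? lo : (v > hi ? hi : v); }
static inline f32 clamp_f32(f32 v, f32 lo, f32 hi) { return v < lo ? lo : (v > hi ? hi : v); }

typedef struct Example Example;
struct Example {
    u16 field_a;
    s16 field_b;
    f32 field_c;
};

/* Same shape as the generated setters: assignment goes through the clamp. */
#define example_field_a_set(obj, value) ((obj).field_a = clamp_u16(value, 100, 300))
#define example_field_b_set(obj, value) ((obj).field_b = clamp_s16(value, -100, 300))
#define example_field_c_set(obj, value) ((obj).field_c = clamp_f32(value, 0.f, 100.f))
```

Writing to a field directly (`obj.field_a = 50`) bypasses the clamp, which matches the note that values are only clamped through the set functions or the cson path.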
- 2 components: u8x2, u16x2, u32x2, u64x2, s8x2, s16x2, s32x2, s64x2, f32x2, f64x2
- 3 components: u8x3, u16x3, u32x3, u64x3, s8x3, s16x3, s32x3, s64x3, f32x3, f64x3
- 4 components: u8x4, u16x4, u32x4, u64x4, s8x4, s16x4, s32x4, s64x4, f32x4, f64x4
- Quat
- Matrix3d
- Transform3d
As the example below shows, a struct declaration is very similar to C but has some additions and omissions.
struct Example {
u32 field;
u32 array[10];
u32 count;
u16 vla[count];
};
struct types support Variable Length Arrays (VLAs) when the array count uses a field name instead of an integer.
The VLA field will be replaced by a u32 relative byte offset from this field's memory address.
This offset can be used to access the array through the code-generated function.
Here is the example of the code generated version of that struct in C:
typedef struct Example Example;
struct Example {
u32 field;
u32 array[10];
u32 count;
u32 vla_byte_offset;
};
u16* example_vla(Example* example) {
// the offset is relative to the address of the vla_byte_offset field itself
return (u16*)(((u8*)&example->vla_byte_offset) + example->vla_byte_offset);
}
u16* example_vla_alloc_vla(Example* obj, BinaryWriter* w, u32 count) {
obj->count = count;
binary_writer_alloc_vla(Example, w, count, &obj->vla_byte_offset);
return example_vla(obj);
}
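To make the relative-offset idea concrete, here is a minimal sketch that lays out the array directly after the struct in a flat buffer and reads it back. It assumes the offset is measured from the address of the offset field itself, as the text describes. `example_write` is a hypothetical stand-in for the `BinaryWriter`, whose real API is not shown here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

typedef struct Example Example;
struct Example {
    u32 field;
    u32 array[10];
    u32 count;
    u32 vla_byte_offset; /* byte offset from this field's address to the array */
};

/* Resolve the array pointer from the field-relative offset. */
static u16* example_vla(Example* e) {
    return (u16*)((u8*)&e->vla_byte_offset + e->vla_byte_offset);
}

/* Hypothetical stand-in for the BinaryWriter: places `count` u16 values
   directly after the root object inside `buf` and records the offset.
   `buf` must be suitably aligned for Example (malloc'd memory is).
   Returns the total bytes used, or 0 if `buf` is too small. */
static u32 example_write(u8* buf, u32 buf_size, const u16* values, u32 count) {
    u32 total = (u32)sizeof(Example) + count * (u32)sizeof(u16);
    if (total > buf_size) return 0;

    Example* e = (Example*)buf;
    memset(e, 0, sizeof *e);
    e->count = count;

    u8* array_start = buf + sizeof(Example);
    e->vla_byte_offset = (u32)(array_start - (u8*)&e->vla_byte_offset);
    memcpy(array_start, values, count * sizeof(u16));
    return total;
}
```

Because the stored value is an offset rather than a pointer, the whole buffer can be written to disk or copied elsewhere and the array is still reachable after loading.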
string types are just like C: a null-terminated array of char. This can be either a fixed-length array or a VLA.
struct Example {
char name[32];
u32 description_size;
char description[description_size];
};
Packed types only work on struct types and they act like supercharged bitfields.
We want to be able to express encoding and decoding of integers, floats and enums into a smaller series of bits. The packed type lets you specify how many bits to pack a value into, followed by a min and max value. This allows the system to generate un/packing functions for your structure type automatically.
Example:
struct Example #bitfield(u32) {
u32 field_a;
u16(:8, 100, 300) field_b;
f32(:21, 0.f, 100.f) field_c;
Enum(:3) field_d;
};
field_a is just a regular field. field_b is a u16 that gets packed down into 8 bits within the range of 100 to 300 inclusive. field_c is an f32 that gets packed down into 21 bits within the range of 0.f to 100.f. field_d is an enum that is packed down into 3 bits with a range from its min to its max value. The more bits, the more precision; the smaller the range, the more precision.
#bitfield(T) means that the bitfield word size and alignment will be those of type T, where T is a basic type.
Bitfields can span word boundaries at the cost of un/packing performance.
If #bitfield(T) is not specified then T defaults to u32.
Here is the example of the code generated version of that struct in C:
typedef struct Example Example;
struct Example {
u32 field_a;
u32 bitfields0;
};
#define example_field_b(obj) ...
#define example_field_b_set(obj, value) ...
#define example_field_c(obj) ...
#define example_field_c_set(obj, value) ...
#define example_field_d(obj) ...
#define example_field_d_set(obj, value) ...
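The bodies of the generated macros are elided above, so the following is only a sketch of one common way to pack a ranged float into a fixed number of bits: linear quantization followed by a shift and mask into the bitfield word. The function names and the rounding rule are assumptions for illustration, not what enc-gen necessarily emits.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef float    f32;

/* Hypothetical helper: map v in [min, max] onto an unsigned code of
   `bits` bits (1..31). Out-of-range inputs are clamped first. */
static u32 pack_f32(f32 v, f32 min, f32 max, u32 bits) {
    u32 levels = (1u << bits) - 1u;               /* highest representable code */
    if (v < min) v = min;
    if (v > max) v = max;
    double t = ((double)v - min) / ((double)max - min); /* normalized 0..1 */
    return (u32)(t * levels + 0.5);               /* round to the nearest code */
}

/* Inverse of pack_f32: recover an approximation of the original value. */
static f32 unpack_f32(u32 code, f32 min, f32 max, u32 bits) {
    u32 levels = (1u << bits) - 1u;
    double t = (double)code / levels;
    return (f32)(min + t * ((double)max - min));
}

/* Store `code` in a 32-bit word at bit position `shift`. */
static u32 word_insert(u32 word, u32 code, u32 shift, u32 bits) {
    u32 mask = ((1u << bits) - 1u) << shift;
    return (word & ~mask) | ((code << shift) & mask);
}

/* Read a `bits`-wide code back out of the word. */
static u32 word_extract(u32 word, u32 shift, u32 bits) {
    return (word >> shift) & ((1u << bits) - 1u);
}
```

With 21 bits over the range 0.f to 100.f the step between adjacent codes is about 0.00005, which illustrates the trade-off between bit count, range and precision.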
As the example below shows, a union declaration is very similar to C. Like in C, unions are untagged.
union Example {
u32 integer;
f32 float_;
};
As the example below shows, an enum declaration is very similar to C. The C23 underlying type specifier is also supported, but since we don't use C23 it is generated as a typedef.
// defined in .enc file
enum EntType : s16 {
ENT_TYPE_PERSON = 0,
ENT_TYPE_FROG = 1,
ENT_TYPE_DRAGON = -3,
ENT_TYPE_PIG,
ENT_TYPE_MOUSE,
};
// generated C code
typedef s16 EntType;
enum EntType {
ENT_TYPE_PERSON = 0,
ENT_TYPE_FROG = 1,
ENT_TYPE_DRAGON = -3,
ENT_TYPE_PIG = -2,
ENT_TYPE_MOUSE = -1,
};
constexpr is supported for basic types only, so you can create arrays and packed types that are defined by constants. Because C doesn't support constexpr until C23, we currently export it as a #define:
// defined in .enc file
constexpr u32 ENT_NAME_CAP = 32;
constexpr f32 PACKED_MIN = 0.f;
constexpr f32 PACKED_MAX = 128.f;
struct Ent #bitfield(u8) {
char name[ENT_NAME_CAP];
f32(:8, PACKED_MIN, PACKED_MAX) packed;
};
// generated C code
#define ENT_NAME_CAP 32
#define PACKED_MIN 0.f
#define PACKED_MAX 128.f
struct Ent {
char name[ENT_NAME_CAP];
u8 packed;
};
Targets are ways of specifying how you wish to use the encoder system with this .enc file. Each enabled target can enable features in the encoder system, as well as disable features that are not compatible with that target.
Targets cannot be added to or removed from an .enc file. You specify them at the top of the file.
This set of target modes means we want a versioned file, possibly upgraded over time, that can be copied to the GPU and used there:
#target file/gpu binary
This set of target modes means we want a versioned file, possibly upgraded over time, that can be sent over the network:
#target file/net binary
This target mode means we want to un/pack data in GPU memory:
#target gpu binary
We have an editor where we make custom assets and save these out to disk. We would also like to add more features and make changes without losing assets that have already been made. We would like to use text files most of the time in development so we can visualise the data better, and to ship binary files of assets for speed & size reasons.
When you use the file target, a single .enc file represents a single file type.
So you must specify the single root struct type with the #root directive like so:
// defined in .enc file
struct Example #root {
u32 field;
u32 array[10];
u32 count;
u32 vla_byte_offset;
};
// generated C code:
struct Example {
EncBinHeaderV0 header;
u32 field;
u32 array[10];
u32 count;
u32 vla_byte_offset;
};
The generated code also contains the header field with the magic number and more that you can read about later in the docs.
Every file needs a magic number to ensure that the correct file format is being processed.
The magic number is always a 4-byte hexadecimal number.
You specify the #magic directive below the #target directive:
#target file binary
#magic 0x454E434F // ENCO
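The value in the example is the ASCII codes of the letters E, N, C, O placed from the most significant byte down. A small helper (hypothetical, not part of enc-gen) shows the arithmetic; it says nothing about the byte order used on disk, which is not specified here.

```c
#include <stdint.h>

/* Build a 4-byte magic number from four ASCII characters, with the first
   character in the most significant byte. */
static uint32_t magic_from_chars(char a, char b, char c, char d) {
    return ((uint32_t)(unsigned char)a << 24) |
           ((uint32_t)(unsigned char)b << 16) |
           ((uint32_t)(unsigned char)c << 8)  |
            (uint32_t)(unsigned char)d;
}
```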
When writing network packets you benefit a lot from packing your structures down before sending them over the wire. This will help you send less data and reduce the amount of data that goes missing and has to be resent.
You also gain access to versioning so you can communicate correctly with other clients on the same version.
When reading/writing packets from the packet buffer, the user will cast the byte stream into the code generated struct and manually process the packet themselves.
When you are writing data infrequently and reading the same data multiple times, maybe across many threads, you might benefit from packing your data structures to make them more cache friendly.
On the GPU you usually read the same data across different CUs, and this uses up space in caches. Memory bandwidth is also the first major problem to solve, so packing data down as small as possible is usually the way to go for a lot of GPU algorithms.
Manually writing un/packing code is error prone and takes time, and this is the main problem the encoder system solves for the GPU use cases.
Features are aspects of the encoder system that change depending on the targets.
Each feature can be:
- ENABLED : feature is enabled by the target unless another target says it is unsupported
- UNCHANGED : feature is supported by this target but this target does not enable the feature
- UNSUPPORTED : feature is not supported by the target and completely disables the feature
- file : ENABLED
- net : UNCHANGED
- gpu : UNSUPPORTED
- cpu : UNCHANGED
Variable Length Arrays inside structs are only enabled by the file target, since they work by encoding a u32 relative offset from where the field is.
For files this is very easy to achieve since we just bump allocate arrays after the root object.
It is not enabled by the net target mode as the packet size is only ~64K, so other techniques just work out better.
For gpu, buffers are single typed, so supporting it would only really work if we used an 'untyped' u32 buffer where we reconstruct everything. I don't see a use case for this at this time.
- file : ENABLED
- net : UNCHANGED
- gpu : UNCHANGED
- cpu : UNCHANGED
CSON, aka the C-like JSON, only really has a use when serialising out to a file.
When using text encoding, unions will need to be decorated with #tag & #key directives, and the field where the union is used needs a #tag directive. This is so that the generated de/encoder can know what type it is expecting the union to be.
This is not a problem for binary files, since the encoder just copies the raw bytes.
enum EntType {
ENT_TYPE_DRAGON,
ENT_TYPE_PIG,
};
struct DragonEnc {
u32 something;
};
struct PigEnc {
u32 something;
};
union EntData #tag EntType {
DragonEnc dragon #key ENT_TYPE_DRAGON;
PigEnc pig #key ENT_TYPE_PIG;
};
struct EntEnc {
u32(:24, 0, 16777215) some_value;
EntType(:8) ent_type;
EntData data #tag ent_type;
};
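As a rough illustration of why the tag is needed, the sketch below mirrors the types above in plain C (without packing) and picks the active union member from the tag field, which is the decision a text encoder has to make before it can write the union out. The function `ent_data_member_name` is hypothetical and is not a generated function.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

typedef enum EntType { ENT_TYPE_DRAGON, ENT_TYPE_PIG } EntType;

typedef struct DragonEnc { u32 something; } DragonEnc;
typedef struct PigEnc    { u32 something; } PigEnc;

typedef union EntData {
    DragonEnc dragon; /* #key ENT_TYPE_DRAGON */
    PigEnc    pig;    /* #key ENT_TYPE_PIG */
} EntData;

typedef struct EntEnc {
    u32     some_value;
    EntType ent_type;  /* the tag */
    EntData data;      /* #tag ent_type */
} EntEnc;

/* Hypothetical: return the name of the union member selected by the tag,
   or NULL when the tag matches no #key (a text encoder would report an
   error in that case). */
static const char* ent_data_member_name(const EntEnc* e) {
    switch (e->ent_type) {
    case ENT_TYPE_DRAGON: return "dragon";
    case ENT_TYPE_PIG:    return "pig";
    }
    return NULL;
}
```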
struct & union fields that are integers can specify that they should be encoded as hexadecimal when being encoded as text (cson):
struct Example {
u32 mask #hex;
};
When reading from a text file, not all values might be present. To handle this you can either do nothing and let the field use the default value for its type, or specify default values like so:
struct Example {
u32 field0 = 123;
u32 field1; // initialized to 0
};
struct Example2 {
Example field; // will use the default values of Example
};
Default values are just parsed as strings so there is no error checking done as part of enc-gen. The string is pasted in as the text decoding code is generated.
Specified Default Values are only supported for:
- Basic Types
- Enum Types
- Vector Types
- Packed Types
Unspecified Default Value:
- Basic Types are zero initialized
- Enums are zero initialized
- Packed Types are zero initialized
- Vectors are zero initialized
- Quaternion is identity initialized
- Matrix3d is identity initialized
- Transform3d is identity initialized
- structs follow their field default initializers
- unions follow the type of the field that the tag specifies
- arrays follow their base type default initializers
When using text encoding, you are not allowed to remove types or fields from your .enc file.
Instead you can use the #removed directive on type declarations, fields and enum values like so:
struct Pig #removed {
u32 age;
};
enum EntType {
ENT_TYPE_DRAGON,
ENT_TYPE_PIG #removed,
};
All this does is append __REMOVED to the identifiers in code generation. The types, fields and values
will be kept around so that they can still be used in the upgrade step later on if you change your mind
and want to preserve the data.
The benefits of using this are:
- the change in identifiers will break code, so you can find all uses of a field when it is #removed
- when you put the file into #dev mode, it will ask you to actually delete all of your #removed types, fields and values
- file : ENABLED
- net : ENABLED
- gpu : UNCHANGED
- cpu : UNCHANGED
The file target needs versioning so we can preserve assets.
The net target needs versioning to ensure other clients are on the same version so they can communicate.
The gpu & cpu targets alone work purely with types that are released in binary code, so no versioning is needed at all.
- file : ENABLED
- net : UNCHANGED
- gpu : UNCHANGED
- cpu : UNCHANGED
The main purpose of the file target is assets that we either edit in the editor or baked assets that we load from disk.
We need to be able to edit these types easily while developing new features but also provide a way for us to upgrade
old assets to the new version. The net target only needs versioning to ensure other clients are on the same version.
The gpu & cpu targets are purely designed around what is released in binary code, so no versioning is needed at all.
- file : ENABLED
- net : ENABLED
- gpu : UNSUPPORTED
- cpu : ENABLED
For the net target mode, you really benefit from packing your structs into as small a size as possible.
All modern compilers support a "packed" struct that removes all padding between fields by aligning each
field to 1 byte. This comes at a performance cost when reading & writing to the struct, as fields are no longer aligned.
Here is an example of using a packed struct:
// defined in .enc file
struct Example #packed {
u8 field0; // spans byte 0
u32 field1; // spans bytes 1..=4
};
// the same struct without #packed
struct Example {
u8 field0; // spans byte 0
u32 field1; // spans bytes 4..=7
};
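The layout difference can be checked in C with `offsetof`. This sketch uses `#pragma pack`, a compiler extension supported by GCC, Clang and MSVC; how enc-gen actually emits a `#packed` struct is not stated here, so treat the pragma as an assumption. The sizes in the comments hold on typical ABIs where a 32-bit integer is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
typedef struct ExamplePacked {
    uint8_t  field0; /* byte 0 */
    uint32_t field1; /* bytes 1..=4, unaligned */
} ExamplePacked;     /* 5 bytes in total */
#pragma pack(pop)

typedef struct ExampleUnpacked {
    uint8_t  field0; /* byte 0, followed by 3 bytes of padding */
    uint32_t field1; /* bytes 4..=7 */
} ExampleUnpacked;   /* 8 bytes in total */

/* Read the unaligned field through memcpy, which is safe on every
   architecture regardless of alignment requirements. */
static uint32_t example_packed_field1(const ExamplePacked* e) {
    uint32_t v;
    memcpy(&v, (const uint8_t*)e + offsetof(ExamplePacked, field1), sizeof v);
    return v;
}
```

The packed version saves 3 of 8 bytes here, which is the kind of saving that matters for network packets, at the cost of unaligned access to `field1`.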
- file : ENABLED
- net : ENABLED
- gpu : UNSUPPORTED
- cpu : ENABLED
Since we want to support a wider range of GPU hardware, we are restricted to the basic types u32, s32 and f32.
Bitfields will always be a u32 word, and vector types have the same basic type restrictions.
Binary data is de/encoded in memory in place using the data structures generated by enc-gen that
were defined in your .enc files. This means you can directly operate on the data structures
by casting the opaque array of bytes into the structure you wish to de/encode.
For the file target mode, this array of opaque bytes will be your root object type, defined
with the #root directive in your .enc file. All VLA data will be stored directly after
the root object.
After running enc-gen, at the bottom of the generated header file you will find a bunch of
functions to help with decoding & encoding both text and binary files.
When encoding binary files, you will use the BinaryWriter to write out your binary file in memory.
Then when you are finished you can call a generated function to verify it before saving it to disk.
Checking the checksum on a 50MB file takes ~50ms, so this might be a thing you want to skip or only do once.
bool binary_verify_no_checksum_V0EditorMaterial(CoreString file_path, ByteView mem, char error_message[static ENC_MESSAGE_SIZE]);
bool binary_verify_with_checksum_V0EditorMaterial(CoreString file_path, ByteView mem, char error_message[static ENC_MESSAGE_SIZE]);
When decoding binary files, you should run one of the verify functions above to make sure your data has been read in properly.
If your format has any upgrades, an upgrade function will be generated for you to use.
It will need a BinaryWriter just in case there are any upgrades that need to run.
T* binary_upgrade_T(BinaryWriter* w, EncBinHeaderV0* v, char error_message[static ENC_MESSAGE_SIZE]);
When encoding Text (cson) files, you first encode a binary file in memory. Then you call the cson generated function and this will encode the binary data as text before saving it out to disk.
CoreString cson_encode_mem_T(T* obj, CoreLinearAlctor* alctor, char error_message[static ENC_MESSAGE_SIZE]);
bool cson_encode_asset_T(T* obj, CoreString asset_path, CoreLinearAlctor* alctor, char error_message[static ENC_MESSAGE_SIZE]);
When decoding Text (cson) files, you will need to pass in a BinaryWriter so the cson can be
decoded into a binary file in memory.
T* cson_decode_mem_T(BinaryWriter* w, CoreString file_path, CoreString cson, char error_message[static ENC_MESSAGE_SIZE]);
T* cson_decode_asset_T(BinaryWriter* w, CoreString asset_path, char error_message[static ENC_MESSAGE_SIZE]);
A .enc file needs to choose to be text and/or binary in the target directive itself. This can only be set once & cannot change.
Data types are only exported as text (cson). This should be used for things like configuration files where you would like the user to be able to edit these in the shipped version of the game:
#target file/... text
Data types are only exported as binary. This is useful for data that simply isn't well suited for text, like mesh & pixel data. When used with the versioning feature, it should only be used for types that do not change much, as with binary data you will need to up the version every time you want to make changes to the data layout:
#target ... binary
Data types are exported as text (cson) in development and released in binary. This works extremely well when your data types change a lot and if they are represented well in text form:
#target file/... text/binary
The versioning feature only applies when using the file or net targets.
When data is encoded as text files (cson) you can do the following changes without updating the version:
- add struct/union fields
- add enum values
- add type
- change struct/union fields order
- change enum values
- change basic type min or max values
- change packed type bits_count, min or max values
- change enum underlying type
- change default values
- change #bitfield(T)
- add/remove #packed
But the following will require updating the version:
- remove type
- change a struct/union field type
- rename/remove a struct/union field
- rename/remove an enum value
- change #tag or #key names
- change #root type
When data is encoded as binary you can do the following changes without updating the version:
- nothing
But the following will require updating the version:
- add/remove struct/union fields
- add/remove enum values
- add/remove type
- rename a struct/union field
- rename an enum value
- change a struct/union field type
- change basic type min or max values
- change packed type bits_count, min or max values
- change enum underlying type
- change #bitfield(T)
- add/remove #packed
- change #root type
When you need to make breaking changes to your data structures, you will need to place your .enc file into #dev mode:
- #dev upgrade | ups the file version number and you will upgrade from past versions
- #dev noupgrade | ups the file version number; all past data will be discarded and you start from scratch
- #dev amend | keeps the same file version number, but you must make sure you revert your assets made with this version
While you are in #dev mode, you can switch between them. noupgrade only deletes past data structures when exiting
dev mode.
#dev mode is configured like so:
#target ...
#magic ...
#dev ...
For files that export to either text or (exclusively) binary data, the workflow is quite simple. If you need
to make changes that require you to up the version, you put the file into #dev mode, make all of
the changes you need, then remove the #dev directive when you are done. The version will go up by 1
when entering #dev mode.
When you support encoding to both text files and binary files, it works mostly the same, but when you leave #dev mode,
the version number will be increased again by 1. You can think of it as text files being encoded with even version
numbers and binary files with odd version numbers. This is so that when we make changes to the
development text file, we will not be making changes to the binary file data types.
While you are in #dev mode, you are going to be changing the data types and invalidating any files you have
saved out while in #dev mode. To fix this, files saved out in dev mode will be suffixed with -devm-HASH,
where HASH is the hash of all the information about the data types themselves. This will prevent loading any invalid
data and allow you to retest the upgrade path from the previous version by just deleting the dev files or
changing the type itself.
The magic & version number is encoded at the top of the .cson file as an integer like so:
#magic ...
#version ...
For a binary file we will have the following header:
struct EncBinHeaderV0 {
u32 magic;
u32 enc_version;
u32 file_version;
u32 header_size;
u64 data_types_hash;
u64 file_size;
u64 checksum;
};
The magic number ensures the file is the correct file format.
The enc_version number exists so we can make changes to the EncBinHeader itself in the future if we need to.
The file_version number exists so we can avoid loading newer data and upgrade from previous versions.
The header_size allows us to skip the header to where the root object starts independent of enc_version.
The data types hash ensures that the version wasn't changed on a different branch. This is not yet supported.
The file size gives you the full size of the file in bytes. This is useful after you have written a file
into memory using the BinaryWriter, since you can retrieve the file size from the header.
The checksum is a way for us to tell our users that the asset is corrupted. We still need to
properly handle invalid data when reading from encoded files for security purposes. The cson file
does not need the checksum as the parser will tell us if it is corrupted.
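As a sketch of how these header fields could be used when loading a file, the function below performs the cheap structural checks before any data is touched. It is not the generated `binary_verify_*` function. It assumes `header_size` and `file_size` are byte counts and that the buffer holds the whole file, and it skips the checksum because the algorithm is not described here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef uint64_t u64;

typedef struct EncBinHeaderV0 {
    u32 magic;
    u32 enc_version;
    u32 file_version;
    u32 header_size;
    u64 data_types_hash;
    u64 file_size;
    u64 checksum;
} EncBinHeaderV0;

typedef enum HeaderStatus {
    HEADER_OK,
    HEADER_TOO_SMALL, /* buffer cannot even hold the header */
    HEADER_BAD_MAGIC, /* not the expected file format */
    HEADER_BAD_SIZE,  /* sizes disagree with each other or with the buffer */
    HEADER_TOO_NEW    /* written by a newer version than this build knows */
} HeaderStatus;

/* Hypothetical structural check of an in-memory file. */
static HeaderStatus header_check(const void* mem, u64 mem_size,
                                 u32 expected_magic, u32 current_file_version) {
    EncBinHeaderV0 h;
    if (mem_size < sizeof h) return HEADER_TOO_SMALL;
    memcpy(&h, mem, sizeof h); /* copy out to avoid unaligned reads */

    if (h.magic != expected_magic) return HEADER_BAD_MAGIC;
    if (h.header_size < sizeof h || h.header_size > h.file_size ||
        h.file_size != mem_size) return HEADER_BAD_SIZE;
    if (h.file_version > current_file_version) return HEADER_TOO_NEW;
    return HEADER_OK;
}
```

A file whose `file_version` is older than the current one would pass this check and then go through the upgrade path.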
Every generated data type will be prefixed with the version number when you target file.
This is so we can support upgrading using the data types directly:
// defined in .enc file
struct Example {
u32 field0;
u32 field1;
};
// generated struct name
typedef struct V0Example V0Example;
struct V0Example {
u32 field0;
};
// generated struct name
typedef struct V1Example V1Example;
struct V1Example {
u32 field0;
u32 field1;
};
When the game tries to decode a file and the version is older than the current one, the user creates their own upgrade function that takes the in-memory binary representation of the file and emits the new version into a new buffer. This new version is then saved out in its place, and the standard manual processing of the binary file continues.
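For the V0Example and V1Example types above, a user-written upgrade step could look roughly like this. The function name and signature are hypothetical; the real upgrade entry point works through a BinaryWriter, which is omitted here to keep the sketch self-contained.

```c
#include <stdint.h>

typedef uint32_t u32;

typedef struct V0Example V0Example;
struct V0Example {
    u32 field0;
};

typedef struct V1Example V1Example;
struct V1Example {
    u32 field0;
    u32 field1;
};

/* Hypothetical user-written upgrade: carry over fields that still exist
   and give fields that are new in V1 a sensible starting value. */
static void example_upgrade_v0_to_v1(const V0Example* old_obj, V1Example* new_obj) {
    new_obj->field0 = old_obj->field0;
    new_obj->field1 = 0; /* new in V1, no data exists in V0 files */
}
```

Having both versioned types available as real C structs is what lets the upgrade be written as ordinary field-by-field code.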
When the file or net target is being used for an .enc file, the versioning feature will
be enabled. This is achieved by enc-gen with a database file that is auto-generated.
The .encdb looks identical to the .enc file but values are explicit and some extra directives
exist.
It contains the declarations so that validation can be performed and errors can be reported
if the user makes changes that are not allowed. Then when #dev mode is enabled and the
user makes edits, the database file will be updated accordingly.
When you have the upgrade feature enabled by having the file target, the database will also
preserve all past types from previous versions of the file, unless #dev noupgrade
was used, which completely removes all past versions.
Upgrades cause the types to be declared multiple times, once for each version they exist in.
The #version ... directive acts as a marker for where that version's types begin.
The data types hash is encoded in the #version directive so we can check if any of the
past versions have been modified at all. This can also be used to possibly support
branching in the future that will merge branched versions.
#version 0 0 15d44abec7e2cf3a
... // types are declared here
#version 0 1 65a93cc548ea89e8