Skip to content

Instantly share code, notes, and snippets.

@NocturnDragon
Forked from dwilliamson/Doc.md
Created January 10, 2018 20:01
Show Gist options
  • Save NocturnDragon/c92a28f4553324bd190c3923d2ef151f to your computer and use it in GitHub Desktop.
Save NocturnDragon/c92a28f4553324bd190c3923d2ef151f to your computer and use it in GitHub Desktop.
Minimal Code Generation for STL-like Containers

This is my little Christmas-break experiment trying to (among other things) reduce the amount of generated code for containers.

THIS CODE WILL CONTAIN BUGS AND IS ONLY PRESENTED AS AN EXAMPLE.

The C++ STL is still an undesirable library for many reasons I have extolled in the past. But it's also a good library. Demons lie in this here debate and I have no interest in revisiting it right now.

The goals that I have achieved with this approach are:

  • Code generation for my game saves roughly 100k across all EXEs and DLLs, bearing in mind I only implemented a vector class and the total EXE/DLL size is 1.2MB. Other types are more complicated and using this approach will save me even more.

  • Seamless interaction with a Reflection API is a natural side-effect with an order of magnitude reduction in generated code. The typical approach is to implement a container interface that the Reflection API will use and implement for each container type. This has the downside that ALL container code for ALL types needs to be generated up-front; the cost is massive. Part of me is wondering how the C++ standards committee intend to fix this (if indeed they even think it's a problem).

  • A big reduction of work for the compiler to perform as the implementation is completely hidden from clients and not stored in header files. This also allows me to place all containers in a single header file and reduce the amount of files the compiler has to request.

  • I can place my own STL in a DLL and save considerable generated code.

The big downside to this approach is that the containers will typically be slower than a template-based solution, particularly for non-POD data types. Storage of POD types will be close to the performance of the full template approach but still slower.

There are a few libraries out there that explore different approaches to the STL. Some examples:

  • tinystl - https://github.com/mendsley/tinystl A clean, minimal implementation of the STL using the typical templates-in-headers approach. Generates less code than the STL but only by virtue of offering much less.
  • uSTL - https://msharov.github.io/ustl/ Attempts to minimise generated code through putting all memory operations into untyped buffers. Actually gets most of its code-gen benefits from just being a simpler STL (see below).
  • RDESTL - https://github.com/msinilo/rdestl Not tried this one but worth mentioning as it's aimed at game development. Smaller and simpler than the STL with quite a few performance benefits.

What I've found is that, after 20 years of having templates in the language, compilers are amazing at inlining generated template code and amortising the cost with everything around it. In my experiments, a uSTL-inspired vector actually generates MORE code these days and for small executable sizes it was next to impossible to beat the compiler with any other approach. To summarise:

  • If your project is <100k then you will typically have a small amount of container instantiations and individual function calls. The compiler will do a better job of generating the smallest amount of code so you might as well stick with a typical basic template implementation (or just using a C approach).

  • If your project is <800k then this tightly tuned approach will give you some good savings. It's quite brittle to cleanup though (e.g. adding a type traits object) and you can easily lose generated code gains if you're not careful.

  • Anything bigger should see massive savings and afford you the luxury of making the code cleaner at the expense of some more generated code.

void VectorData::Construct(u32 type_size_, bool trivial_construct, bool trivial_copy_construct, bool trivial_destruct)
{
// Set default properties with type traits
items = nullptr;
size = 0;
capacity = 0;
type_size = type_size_;
construct = !trivial_construct;
copy_construct = !trivial_copy_construct;
destruct = !trivial_destruct;
}
void VectorData::Construct(const VectorData& rhs, CopyConstructPtr copy_constructor)
{
// Copy properties from source VectorData
size = rhs.size;
capacity = rhs.capacity;
type_size = rhs.type_size;
construct = rhs.construct;
copy_construct = rhs.copy_construct;
destruct = rhs.destruct;
// Allocate capacity
items = (u8*)malloc(capacity * type_size);
if (copy_construct)
{
// Copy construct all items from source VectorData
u8* dest_item = items;
u8* src_item = rhs.items;
for (u32 i = 0; i < size; i++)
{
copy_constructor(dest_item, src_item);
dest_item += type_size;
src_item += type_size;
}
}
else
{
// Memory copy is safe otherwise
memcpy(items, rhs.items, size * type_size);
}
}
void VectorData::Construct(u32 size_, u32 type_size_, bool trivial_construct, bool trivial_copy_construct, bool trivial_destruct, ConstructPtr constructor)
{
size = size_;
capacity = size_;
type_size = type_size_;
construct = !trivial_construct;
copy_construct = !trivial_copy_construct;
destruct = !trivial_destruct;
// Allocate capacity
items = (u8*)malloc(capacity * type_size);
// Call default constructor for all items, only if required
if (construct)
{
u8* item_ptr = items;
for (u32 i = 0; i < size; i++)
{
constructor(item_ptr);
item_ptr += type_size;
}
}
}
void VectorData::Destruct(DestructPtr destructor)
{
if (items != nullptr)
{
Clear(destructor);
free(items);
}
}
void VectorData::Reserve(u32 new_capacity, CopyConstructPtr copy_constructor, DestructPtr destructor)
{
// Only reserve larger capacities
if (new_capacity > capacity)
{
// Allocate a new set of data with an aligned size
capacity = AlignPow2(new_capacity, 16);
u8* new_items = (u8*)malloc(capacity * type_size);
// Need to copy existing items to the new storage area
if (items != nullptr)
{
// Memory walk is far more expensive than an extra outer loop branch so try to do everything in
// one pass with trickier code than using the simpler two pass approach.
if (copy_construct)
{
if (destruct)
{
// Copy construct the old data to the new location while destructing the old data
u8* dest_item = new_items;
u8* src_item = items;
for (u32 i = 0; i < size; i++)
{
copy_constructor(dest_item, src_item);
destructor(src_item);
dest_item += type_size;
src_item += type_size;
}
}
else
{
// Destruction not required so just copy construct all items to the new location
u8* dest_item = new_items;
u8* src_item = items;
for (u32 i = 0; i < size; i++)
{
copy_constructor(dest_item, src_item);
dest_item += type_size;
src_item += type_size;
}
}
}
else
{
// No construction required so use a byte copy
memcpy(new_items, items, size * type_size);
// Call destructor on old items if required
if (destruct)
{
u8* item_ptr = items;
for (u32 i = 0; i < size; i++)
{
destructor(item_ptr);
item_ptr += type_size;
}
}
}
// Release old item storage
free(items);
}
items = new_items;
}
}
void VectorData::Resize(u32 new_size, ConstructPtr constructor, CopyConstructPtr copy_constructor, DestructPtr destructor)
{
// Ensure there's enough allocated memory
// TODO: Better growth strategy
if (new_size > capacity)
Reserve(new_size, copy_constructor, destructor);
// Destruct elements above the new size
if (new_size < size && destruct)
{
u8* item_ptr = items + new_size * type_size;
for (u32 i = new_size; i < size; i++)
{
destructor(item_ptr);
item_ptr += type_size;
}
}
// Default construct elements above the old size
else if (new_size > size && construct)
{
u8* item_ptr = items + size * type_size;
for (u32 i = size; i < new_size; i++)
{
constructor(item_ptr);
item_ptr += type_size;
}
}
size = new_size;
}
// TODO: Remove
void VectorData::ClearResize(u32 new_size, ConstructPtr constructor, CopyConstructPtr copy_constructor, DestructPtr destructor)
{
Clear(destructor);
Resize(new_size, constructor, copy_constructor, destructor);
}
void VectorData::Clear(DestructPtr destructor)
{
// Destruct all items if required
if (destruct)
{
u8* item_ptr = items;
for (u32 i = 0; i < size; i++)
{
destructor(item_ptr);
item_ptr += type_size;
}
}
size = 0;
}
void* VectorData::PushBack(const void* object, CopyConstructPtr copy_constructor, DestructPtr destructor)
{
// Ensure there's enough space to store the new item
Reserve(size + 1, copy_constructor, destructor);
// Copy new item to the end of the vector
// TODO: Improve performance with a switch on common integer types sizes?
u8* dest_item = items + size * type_size;
if (copy_construct)
copy_constructor(dest_item, object);
else
memcpy(dest_item, object, type_size);
size++;
return dest_item;
}
void VectorData::PushBackUnique(const void* object, CopyConstructPtr copy_constructor, DestructPtr destructor, EqualPtr equal)
{
// Leave, not adding if the item already exists
u8* item_ptr = items;
for (u32 i = 0; i < size; i++)
{
if (equal(item_ptr, object))
return;
}
PushBack(object, copy_constructor, destructor);
}
void VectorData::PopBack(DestructPtr destructor)
{
Assert(size > 0);
size--;
if (destruct)
destructor(items + size * type_size);
}
void VectorData::RemoveUnstable(u32 index, DestructPtr destructor, CopyPtr copy)
{
// Copy from the back over the top of the item being removed
// Copy function will always contain the optimal compiler-generated copy,
// which includes calling the destructor on the dest object
u8* last_item = items + (size - 1) * type_size;
copy(items + index * type_size, last_item);
// Destruct the copied item
if (destruct)
destructor(last_item);
size--;
}
void VectorData::Remove(u32 index, DestructPtr destructor, CopyPtr copy)
{
// Shift all items above the removed item, one down
// TODO: memcpy everything for trivial destructor + assignment
u8* item_ptr = items + index * type_size;
for (u32 i = index; i < size - 1; i++)
{
copy(item_ptr, item_ptr + type_size);
item_ptr += type_size;
}
// Destruct the left over item
if (destruct)
destructor(item_ptr);
size--;
}
void VectorData::RemoveRange(u32 start, u32 count, DestructPtr destructor, CopyPtr copy)
{
u32 end = start + count;
Assert(end <= size);
u32 left_over = size - end;
// Shift all items above the removed items down
// TODO: memcpy? Needs trivial assign detection
u8* dest_item = items + start * type_size;
u8* src_item = dest_item + count * type_size;
for (u32 i = 0; i < left_over; i++)
{
copy(dest_item, src_item);
dest_item += type_size;
src_item += type_size;
}
// Destruct the items left behind at the top
if (destruct)
{
u8* item_ptr = items + end * type_size;
for (u32 i = end; i < size; i++)
{
destructor(item_ptr);
item_ptr += type_size;
}
}
size -= count;
}
void VectorData::CopyFrom(const VectorData& src, CopyConstructPtr copy_constructor, DestructPtr destructor)
{
// Empty and ensure there's enough capacity
Clear(destructor);
Reserve(src.size, copy_constructor, destructor);
if (copy_construct)
{
// Copy construct all items over from source
u8* dest_item = items;
u8* src_item = src.items;
for (u32 i = 0; i < src.size; i++)
{
copy_constructor(dest_item, src_item);
dest_item += type_size;
src_item += type_size;
}
}
else
{
// Bit copy instead
memcpy(items, src.items, src.size * type_size);
}
size = src.size;
}
// Functions used by all containers
+template <typename TYPE> void Construct(void* dest)
+{
+ new ((TYPE*)dest) TYPE;
+}
+template <typename TYPE> void Destruct(void* dest)
+{
+ ((TYPE*)dest)->~TYPE();
+}
+template <typename TYPE> void CopyConstruct(void* dest, const void* src)
+{
+ new ((TYPE*)dest) TYPE(*(TYPE*)src);
+}
+template <typename TYPE> void Copy(void* dest, const void* src)
+{
+ *(TYPE*)dest = *(TYPE*)src;
+}
+template <typename TYPE> bool Equal(const void* a, const void* b)
+{
+ return (*(const TYPE*)a) == (*(const TYPE*)b);
+}
+
+// Function pointers for anonymous use
+typedef void (*ConstructPtr)(void* dest);
+typedef void (*CopyConstructPtr)(void* dest, const void* src);
+typedef void (*DestructPtr)(void* dest);
+typedef void (*CopyPtr)(void* dest, const void* src);
+typedef bool (*EqualPtr)(const void* a, const void* b);
//
// Private implementation stored as POD so that it can be in a union
//
struct VectorData
{
void Construct(u32 type_size_, bool trivial_construct, bool trivial_copy_construct, bool trivial_destruct);
void Construct(const VectorData& rhs, CopyConstructPtr copy_constructor);
void Construct(u32 size_, u32 type_size_, bool trivial_construct, bool trivial_copy_construct, bool trivial_destruct, ConstructPtr constructor);
void Destruct(DestructPtr destructor);
void Reserve(u32 new_capacity, CopyConstructPtr copy_constructor, DestructPtr destructor);
void Resize(u32 new_size, ConstructPtr constructor, CopyConstructPtr copy_constructor, DestructPtr destructor);
void ClearResize(u32 new_size, ConstructPtr constructor, CopyConstructPtr copy_constructor, DestructPtr destructor);
void Clear(DestructPtr destructor);
void* PushBack(const void* new_item, CopyConstructPtr copy_constructor, DestructPtr destructor);
void PushBackUnique(const void* new_item, CopyConstructPtr copy_constructor, DestructPtr destructor, EqualPtr equal);
void PopBack(DestructPtr destructor);
void RemoveUnstable(u32 index, DestructPtr destructor, CopyPtr copy);
void Remove(u32 index, DestructPtr destructor, CopyPtr copy);
void RemoveRange(u32 start, u32 count, DestructPtr destructor, CopyPtr copy);
void CopyFrom(const VectorData& src, CopyConstructPtr copy_constructor, DestructPtr destructor);
// Untyped storage
u8* items;
u32 size;
u32 capacity;
// Type traits packed into data as a separate object increases code gen size
u32 type_size : 29;
bool construct : 1;
bool copy_construct : 1;
bool destruct : 1;
};
//
// Public interface
//
template <typename TYPE>
class Vector
{
public:
Vector()
{
m_Data.Construct(
sizeof(TYPE),
__has_trivial_constructor(TYPE),
__has_trivial_copy(TYPE),
__has_trivial_destructor(TYPE));
}
explicit Vector(const Vector& rhs)
{
m_Data.Construct(rhs.m_Data, CopyConstruct<TYPE>);
}
explicit Vector(u32 nb_entries)
{
m_Data.Construct(
nb_entries,
sizeof(TYPE),
__has_trivial_constructor(TYPE),
__has_trivial_copy(TYPE),
__has_trivial_destructor(TYPE),
Construct<TYPE>);
}
~Vector()
{
m_Data.Destruct(Destruct<TYPE>);
}
void reserve(u32 new_capacity)
{
m_Data.Reserve(new_capacity, Construct<TYPE>, Destruct<TYPE>);
}
void resize(u32 new_size)
{
m_Data.Resize(new_size, Construct<TYPE>, CopyConstruct<TYPE>, Destruct<TYPE>);
}
// TODO: remove
void clear_resize(u32 nb_entries)
{
m_Data.ClearResize(nb_entries, Construct<TYPE>, CopyConstruct<TYPE>, Destruct<TYPE>);
}
TYPE& push_back(const TYPE& entry)
{
return *(TYPE*)m_Data.PushBack(&entry, CopyConstruct<TYPE>, Destruct<TYPE>);
}
void push_back_unique(const TYPE& entry)
{
m_Data.PushBackUnique(&entry, CopyConstruct<TYPE>, Destruct<TYPE>, Equal<TYPE>);
}
void pop_back()
{
m_Data.PopBack(Destruct<TYPE>);
}
void remove_unstable(u32 index)
{
m_Data.RemoveUnstable(index, Destruct<TYPE>, Copy<TYPE>);
}
void remove(u32 index)
{
m_Data.Remove(index, Destruct<TYPE>, Copy<TYPE>);
}
void remove_range(u32 start, u32 count)
{
m_Data.RemoveRange(start, count, Destruct<TYPE>, Copy<TYPE>);
}
void clear()
{
m_Data.Clear(Destruct<TYPE>);
}
void copy_from(const Vector& src)
{
m_Data.CopyFrom(src.m_Data, CopyConstruct<TYPE>, Destruct<TYPE>);
}
u32 byte_size() const
{
return m_Data.size * sizeof(TYPE);
}
u32 size() const
{
return m_Data.size;
}
const TYPE& front() const
{
Assert(m_Data.size > 0);
return *items;
}
TYPE& front()
{
Assert(m_Data.size > 0);
return *items;
}
const TYPE& back() const
{
Assert(m_Data.size > 0);
return *(items + m_Data.size - 1);
}
TYPE& back()
{
Assert(m_Data.size > 0);
return *(items + m_Data.size - 1);
}
const TYPE* data() const
{
return items;
}
TYPE* data()
{
return items;
}
const TYPE& operator [] (u32 i) const
{
Assert(i < m_Data.size);
return *(items + i);
}
TYPE& operator [] (u32 i)
{
Assert(i < m_Data.size);
return *(items + i);
}
private:
// Prevent implicit copies - use copy_from instead
Vector& operator = (const Vector& rhs);
// Union with typed pointer for debugger viewing and simpler indexing
union
{
VectorData m_Data;
TYPE* items;
};
};
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment