Space Optimization Tricks

Arrange member sequence

From EECS370 we can learn alignment and know how to arrange member sequence to minimize the total memory usage. (Just arrange it from smallest to biggest).

Bit field

union {
    struct Foo {
        char a : 6;
        bool b : 1;
        bool c : 1;
    } w,
    uint8_t v;
}; // Access non-active member (or casting) is valid in C, but may UB in C++

Set alignment

#pragma pack(push, n)
strut Baz {
    
};
#pragma pack(pop)

struct Foo {
    uint32_t a;
    uint16_t b;
} __attribute__((pack, aligned(1)));

-fpack-struct

I just cannot figure out how to use it and STL container together. It errors.

Use difference container/allocator, or use C-style array

vector<T> -> set<T>/map<T> (small number of objects)

vector<vector<T>> -> new T[N * M] (24 bytes of extra vector<T>)

vector<vector<T>> vec(n, vector<T>(m)) costs sizeof(vector<T>) + n * sizeof(vector<T>) + n * m * sizeof(T)

// Default
template <typename T>
using V = vector<T, __gnu_cxx::new_allocator<T>> // Look at the extension allocator part in [ref](https://gcc.gnu.org/onlinedocs/libstdc++/manual/memory.html);

Bit representation

Just pack n conditions to log_2(n) + 1 bits with complex mapping table

Using `std::pair` and `std::tuple`?

EBO (Empty Base Optimization) for `pair`

Actually, empty struct in C/Cpp costs 1 byte. However, as base classes, they acts like not-exists.

When it meets std::pair<int, empty_struct>, the size is 8 rather than 4. You can use boost::compressed_pair to do EBO here.

Space and EBO 2, what kind of implementation the `tuple` is?

EBO: libstdc++-v3 does __empty_not_final optimization here

Actually, there are two kinds of tuple implementation. One is naive inheritance.

template <typename T>
struct tuple_element {
    T value;
};
template <typename ...Args>
struct tuple : tuple_element<T>... {};

This acts just like arranging members in order inside struct. The alignemnt should be the largest among base classes and members variables.

However, another implementation of tuple may be composition

template <size_t N, typename ...Args>
struct tuple_element;

template <typename T>
struct tuple_element<1, T> { T value; };

template <size_t N, typename T, typename ...Args>
struct tuple_element<N, T, Args...>{
    T value;
    tuple_element<N - 1, Args...> rest;
    // Or write T value here
};

template <typename ...Args>
struct tuple {
    tuple_element<sizeof...(Args), Args...> elem;
};

That's a difference. AFAIK, MSVC use the second one and libstdcxx/libcxx use the first one.

Note: libstdc++-v3 uses

struct _Tuple_impl<Idx, Head, Tails...>
    : public _Tuple_impl<Idx + 1, Tails...>, 
    private _Head_base<Idx, Head> // act as laying on reverse order

libvc++ uses

struct tuple<This, Rest...>
    : public tuple<Rest...> {
    // ...
    _Tuple_val<This> val; // just as T
}; // act as laying on original order

Though it seems just a order-problem, The Standard says (ISO/IEC 14882::2017 draft N4660)

[ Note: The order of derivation is not significant except as specified by the semantics of initialization by constructor (15.6.2), cleanup (15.4), and storage layout (12.2, 14.1). — end note ]

So the order of base classes are implemented-defined. Then gcc can do optimization to behave like the first one (so as Clang)

Thus,

using T = std::tuple<bool, char, size_t, bool>;
std::cout << sizeof(T); // return 32 in x86-64-cl and 24 in gcc-6 & clang-4.0 (on godbolt)

Releasing STL containers

Actually, vector<T>::resize/vector<T>::shrink_to_fit is implemented-defined and will not release your memory immediately. Then, we have swap idiom

vector<T>{}.swap(vec);

Airtnp/Space Optimization Tips.md