Redesigning `coca`'s Storage Abstraction

This post was written primarily to organize my thoughts during the design process. I'll explain the thought process that led me to each point in the design space we visit, then point out any issues, which will lead into the next iteration. Along the way, I made a few errors, which I'll try to rectify in the (Side Notes).

Here we go!

Introduction: The Status Quo

When I first started working on coca, I was really just experimenting with the soon-to-be-stabilized min_const_generics, but I also wanted to support dynamically sized data structures (which would still have constant capacity). So my initial implementation of Vec looked something like this:

use std::convert::{TryFrom, TryInto};
use std::marker::PhantomData;
use std::mem::MaybeUninit;

struct Vec<Element, Buffer, Index>
where
    Buffer: AsRef<[MaybeUninit<Element>]> + AsMut<[MaybeUninit<Element>]>,
    Index: TryFrom<usize> + TryInto<usize>,
{
    len: Index,
    buf: Buffer,
    elem: PhantomData<Element>,
}

This worked okay, but it was kind of messy, and converting indices with try_{from, into} got annoying pretty fast, which is why I first introduced the AsIndex trait. Still, I wasn't quite happy until I saw and adopted the ContiguousStorage trait from Matthieu M.'s proposal to parametrize data structures not by allocators, but by storage type.

It works pretty much the same, but it makes trait bounds much nicer to read (and write), and has the added benefit of not relying on external trait implementations. It also works with other data structures that wrap a partially initialized array, like BinaryHeap and Deque. Anything more complex though, and we come to know its limitations.

Let's Talk about Maps

The simplest possible map you could build with this abstraction would look something like this:

struct LinearMap<K, V, Storage>
where
    K: Eq,
    Storage: ContiguousStorage<(K, V)>,
{
    pairs: Vec<(K, V), Storage, usize>,
}

impl<K, V, Storage> LinearMap<K, V, Storage>
where
    K: Eq,
    Storage: ContiguousStorage<(K, V)>,
{
    fn get(&self, k: &K) -> Option<&V> {
        self.pairs.iter().find(|(key, _)| key == k).map(|(_, value)| value)
    }

    fn insert(&mut self, k: K, v: V) -> Option<V> {
        let mut it = self.pairs.iter().enumerate();
        if let Some(idx) = it.find(|(_, (key, _))| key == k).map(|(i, _)| i) {
            Some(self.pairs.replace(idx, (k, v)).1)
        } else {
            self.pairs.push((k, v));
            None
        }
    }

    // other methods omitted
}

This isn't a very good general-purpose map, but it can definitely beat one in the right circumstances, specifically where comparisons are cheap and the maps stay small. However, even for only that use-case, we can do better by using a structure of arrays instead of an array of structures, which allows our linear search to check more keys per cache line read. We'd really like to write something closer to this, in spirit at least:

struct LinearMap<K, V, Storage>
where
    K: Eq,
    Storage: ContiguousStorage<(K, V)>,
{
    keys: Vec<K, Storage, usize>,
    values: Vec<V, Storage, usize>,
}

Ideally, these two vectors would share the same len field, but even this naive version can't work as shown here: the vectors expect a ContiguousStorage<K> and ContiguousStorage<V>, respectively. We could require users to pass in two separate storage blocks, but we can also make this work without changing the signature of LinearMap with some clever unsafe code.

Since align_of::<(K, V)>() == max(align_of::<K>(), align_of::<V>()) and size_of::<(K, V)>() >= size_of::<K>() + size_of::<V>, we can always fit two arrays [K], [V] in the same memory as the original [(K, V)]. The same logic applies for any n-tuple.

This too has issues, though, primarily memory usage. The culprit is in the size inequality, i.e. padding. Consider this:

use std::mem::size_of;

#[repr(align(8))]
struct Padded(u8);

// The size of `Padded` must be 8 so that the result of
// `<*Padded>::offset` is always well-aligned 
assert_eq!(size_of::<Padded>(), 8);

// The `u8` goes after `Padded`, so the tuple must be padded
// to the next multiple of its alignment (which equals 8)
assert_eq!(size_of::<(Padded, u8)>(), 16);

const N: usize = 16;
type ArrayOfStructs = [(Padded, u8); N];
type StructOfArrays = ([Padded; N], [u8; N]);

assert_eq!(size_of::<ArrayOfStructs>(), N*16);
assert_eq!(size_of::<StructOfArrays>(), N*8 + N);

In this (admittedly contrived) example, the array of structs is larger by a factor of 16:9, nearly double the size! By requiring users to provide a ContiguousStorage<(K, V)>, we're implicitly requiring them to overprovision, potentially by a lot.

This effect becomes more pronounced when you throw metadata into the mix. Say you want to track per-item occupancy in a bitmask stored as a [u64], so you require a ContiguousStorage<(K, V, bool)>. Not only will this require 8 times as much space for the mask as is actually needed, adding the bool to the tuple introduces even more padding in most cases. Furthermore, you can't rely on the provided storage being properly aligned for a [u64]. And besides, this is clearly leaking implementation details. Something's gotta give.

(Side Note: Rust doesn't actually make any such guarantees about the memory layout of tuples, or in fact any type that isn't repr(C); it's just how they currently happen to work. This may seem unlikely to change, but depending on it is just a generally bad idea.)

Rethinking the Storage Trait

Your first instinct might be to get rid of the type parameter on the storage trait entirely; client data structures will just have to deal with u8 pointers as if they'd gone through the allocator API, no big deal:

pub unsafe trait ContiguousStorage {
    fn as_ptr(&self) -> *const u8;
    fn as_mut_ptr(&mut self) -> *mut u8;
    fn capacity(&self) -> usize;
}

You can still implement this generically for arrays and slices of any type, so our fancy arrays keep working as before, and for complex data structures, you provide some utilities for correctly constructing suitable storage blocks.

I'd argue that's throwing the baby out with the bathwater. We're losing a good deal of type safety here, which we have to compenstate for with runtime checks, either by rejecting misaligned blocks, or by adjusting the pointers accordingly. Both can result in surprising failures as a consequence of programmer errors the compiler used to be able to detect.

Instead, I propose keeping the argument type, but to decouple it from any notion of an "item type". It would be used much like a marker trait, except it's a type marking a trait implementation:

use std::marker::PhantomData;

unsafe trait ContiguousStorage<Marker> {
    fn as_ptr(&self) -> *const u8;
    fn as_mut_ptr(&mut self) -> *mut u8;
    fn capacity(&self) -> usize;
}

struct ArrayLike<T>(PhantomData<T>);

unsafe impl<T> ContiguousStorage<ArrayLike<T>> for &mut [std::mem::MaybeUninit<T>] {
    fn as_ptr(&self) -> *const u8 {
        self.as_ptr() as _
    }

    fn as_mut_ptr(&mut self) -> *mut u8 {
        self.as_mut_ptr() as _
    }

    fn capacity(&self) -> usize {
        self.len()
    }
}

struct Vec<E, B, I>
where
    B: ContiguousStorage<ArrayLike<E>>
{
    len: I,
    buf: B,
    elem: PhantomData<E>,
}

Now, Vec<u64, SliceStorage<u8>, usize> is invalid again, and complex data structures can define their own marker types and go from there:

// in linear_map.rs:
use std::marker::PhantomData;
use std::mem::MaybeUninit;
use crate::storage::ContiguousStorage;

pub struct StorageRequirements<K, V>(PhantomData<(K, V)>);

#[repr(C)] // required because we need to calculate field offsets manually
pub struct InlineStorage<K, V, const N: usize> {
    keys: [MaybeUninit<K>; N],
    values: [MaybeUninit<V>; N],
}

unsafe impl<K, V, const N: usize> ContiguousStorage<StorageRequirements<K, V>> for InlineStorage<K, V, N> {
    fn as_ptr(&self) -> *const u8 { self as *const Self as *const u8 }
    fn as_mut_ptr(&mut self) -> *mut u8 { self as *mut Self as *mut u8 }
    fn capacity(&self) -> usize { N }
}

pub struct LinearMap<K, V, S: ContiguousStorage<StorageRequirements<K, V>>> {
    buf: S,
    len: usize,
    values_offset: usize, // cache the pointer offset to avoid redoing address calculations
    data: PhantomData<(K, V)>,
}

That's all well and good, but what about heap-allocated storage? We can (and probably should) provide blanket implementations so that Box<T>: ContiguousStorage<R> where T: ContiguousStorage<R>, sure, but that's not good enough. Users would still be required to specify the capacity at compile time, making with_capacity-style constructors and growable storage in general impossible to implement.

Storage Boxes don't Spark Joy

I experimented a bit with custom dynamically sized types (DSTs) as they currently exist in Rust, but to no avail: I defined a struct ending in a field of type [()], manually called the global allocator, and successfully built a fat pointer with ptr::slice_from_raw_parts and some funky type casting. Problem is, a box constructed with Box::from_raw called on this fat pointer would leak the memory when dropped. That's because it tries to construct a Layout using align_of_val and size_of_val, which doesn't match the layout used for the initial allocation.

We can kind of solve this problem using more unstable features, along these lines:

#![feature(const_generics)]
#![feature(const_evaluatable_checked)]

use std::marker::PhantomData;
use std::mem::{MaybeUninit, size_of, align_of};

#[repr(C)]
struct ArrayPair<A, B, const N: usize>([A; N], [B; N]);
struct SlicePair<A, B> {
    _phantom: PhantomData<(A, B)>,
    _force_align: ArrayPair<A, B, 0>,
    _max_padding: [MaybeUninit<u8>; align_of::<B>() - 1],
    data: [[MaybeUninit<u8>; size_of::<A>() + size_of::<B>()]],
}

(Side Note: This will not compile, complaining about missing where bounds on the array size expressions; I did not bother learning the specifics of these unsatable features yet, so I'm not sure what those would have to look like.)

However, this doesn't work with non-linear size functions either. What we really need here is better support for custom DSTs, but that doesn't seem likely to happen anytime soon; the notes from a recent lang team meeting about the competing RFCs give a decent overview.

We'll have to define our own fat pointers, as a temporary work around at least. In anticipation of the boilerplate this would necessitate, let's go straight for a generic one, which will be accompanied by a new trait, StorageRequirements, which will also bound the marker type on ContiguousStorage:

use std::alloc::{Layout, alloc, dealloc};
use std::marker::PhantomData;
use std::ptr::NonNull;

pub trait StorageRequirements {
    fn layout_with_capacity(items: usize) -> Layout;
}

pub unsafe trait ContiguousStorage<R: StorageRequirements> {
    fn as_ptr(&self) -> *const u8;
    fn as_mut_ptr(&mut self) -> *mut u8;
    fn capacity(&self) -> usize;
    fn try_resize<F>(&mut self, _: usize, _: F) -> Result<(), ()>
    where
        F: FnOnce(*const u8, *mut u8)
    {
        Err(())
    }
}

pub struct BoxedStorage<R: StorageRequirements, const GROWABLE: bool> {
    req: PhantomData<R>,
    ptr: NonNull<u8>,
    cap: usize,
}

unsafe impl<R: StorageRequirements, const GROWABLE: bool> ContiguousStorage<R> for BoxedStorage<R, GROWABLE> {
    fn as_ptr(&self) -> *const u8 { self.ptr.as_ptr() as _ }
    fn as_mut_ptr(&mut self) -> *mut u8 { self.ptr.as_ptr() }
    fn capacity(&self) -> usize { self.cap }
    fn try_resize<F>(&mut self, capacity: usize, copy: F) -> Result<(), ()>
    where
        F: FnOnce(*const u8, *mut u8)
    {
        if !GROWABLE { return Err(()); }

        let layout = R::layout_with_capacity(capacity);
        let dst = unsafe { alloc(layout) };
        if dst.is_null() { return Err(()); }

        let src = self.ptr.as_ptr() as *const u8;
        copy(src, dst);

        unsafe {
            dealloc(src as *mut u8, layout);
            self.ptr = NonNull::new_unchecked(dst);
        }

        Ok(())
    }
}

impl<R: StorageRequirements, const G: bool> BoxedStorage<R, G> {
    pub fn with_capacity(items: usize) -> Option<Self> {
        let layout = R::layout_with_capacity(items);
        let ptr = unsafe { alloc(layout) };
        let ptr = NonNull::new(ptr)?;
        Some(BoxedStorage { req: PhantomData, ptr, cap: items })
    }
}

impl<R: StorageRequirements, const G: bool> Drop for BoxedStorage<R, G> {
    fn drop(&mut self) {
        let layout = R::layout_with_capacity(self.cap);
        unsafe { dealloc(self.ptr.as_ptr(), layout ) };
    }
}

While that's not currently possible, in the future we could also add a generic associated type Inline<const N: usize>: ContiguousStorage<Self> to the StorageRequirements trait, which would enable a similarly generic SmallStorage. Note that this would be a breaking change, or require yet another trait InlineableStorageRequirements: StorageRequirements.

What's a Box, Anyway?

Early on in the discussion on his proposal, Matthieu wrote this:

I disagree that not moving the content is the primary motivation; I mainly see Box as going from unsized (trait / slice) to sized.

With that definition, a InlineBox<Trait, 64> is a Box that holds up to 64 bytes of data (on top of the trait pointer), and a SmallBox<Trait, 64> holds up to 64 bytes of data without memory allocation, or otherwise allocates.

I see both as useful as their InlineVec and SmallVec counterparts.

I really like this perspective, as it neatly frames a problem I forgot I had a while before starting coca. At the time, I was experimenting with using async functions as coroutines; for various reasons I won't go into here, my executor needed to be no_std, with no allocations allowed. I got it to work fine with just a single async function in the codebase, but it couldn't handle executing one followed by a different one. I needed dyn Future trait objects, but I couldn't use a &dyn Future in this case, and Box<dyn> was unavailable - so I scrapped the experiment, and didn't look back.

This definitely would have been nice to have there, but it seems nigh impossible to actually make work. Just like Rust currently lacks support for defining DSTs, you also can't really handle them generically.

Except the standard library makes it work somehow, right? Let's take a look.

As far as I can tell, there's three relevant traits, all of them gated behind different unstable features:

std::marker::Unsize<T: ?Sized> marks implementors as being convertible to the unsized type T.
std::ops::CoerceUnsized<T: ?Sized> marks implementors as implicitly convertible to T, provided both are pointer types and the implementor is also Unsize<T>. Weirdly, Unsize is not a super trait of CoerceUnsize.
std::ops::DispatchFromDyn is also a marker trait, that "is used for object safety, to check that a method's receiver type can be dispatched on."

(Side Note: The documentation is sparse here, I'm not sure I fully understand the latter two.)

That's three marker traits with builtin compiler support, at least one of which is for pointer types only. Not a great sign, but maybe Unsize is something we can work with. We already know that taking fat pointers apart and reassembling them is a fool's errand, so we'll just have to store a complete one for now. But we can't just point it at the actual object in storage, as that might move with the box! The solution I came up with needs the unstable set_ptr_value feature, which lets us store a dangling fat pointer instead, and replace just the pointer part (i.e. keeping the vtable pointer intact) whenever the box is dereferenced:

(Side Note: During experimentation, I managed to crash Miri! See this issue for details.)

#![feature(unsize)]
#![feature(set_ptr_value)]

use std::alloc::Layout;
use std::marker::Unsize;
use std::mem::{MaybeUninit, align_of, size_of};
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

pub struct BoxReqs;
impl StorageRequirements for BoxReqs {
    fn layout_with_capacity(bytes: usize) -> Layout {
        Layout::from_size_align(bytes, 8).unwrap()
    }
}

pub union InlineStorage<const N: usize> {
    _force_alignment: MaybeUninit<u64>,
    data: [MaybeUninit<u8>; N],
}

unsafe impl<const N: usize> ContiguousStorage<BoxReqs> for InlineStorage<N> {
    fn capacity(&self) -> usize { N }
    fn as_ptr(&self) -> *const u8 {
        unsafe { (&self.data) as *const _ as *const u8 }
    }
    fn as_mut_ptr(&mut self) -> *mut u8 {
        unsafe { (&mut self.data) as *mut _ as *mut u8 }
    }
}

pub struct GenericBox<T: ?Sized, S: ContiguousStorage<BoxReqs>> {
    meta: NonNull<T>,
    buf: S,
}

impl<T: ?Sized, S: ContiguousStorage<BoxReqs>> Deref for GenericBox<T, S> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        let dangling = self.meta.as_ptr() as *const T;
        unsafe {
            dangling
                .set_ptr_value(self.buf.as_ptr())
                .as_ref().unwrap()
        }
    }
}

impl<T: ?Sized, S: ContiguousStorage<BoxReqs>> DerefMut for GenericBox<T, S> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        let dangling = self.meta.as_ptr();
        unsafe {
            dangling
                .set_ptr_value(self.buf.as_mut_ptr())
                .as_mut().unwrap()
        }
    }
}

pub type InlineBox<T, const N: usize> = GenericBox<T, InlineStorage<N>>;
impl<T: ?Sized, const N: usize> InlineBox<T, N> {
    pub fn new<U>(val: U) -> Self
    where
        U: Sized + Unsize<T>,
    {
        // unstable features const_generics and const_evaluatable_checked
        // could turn these into compile-time checks
        assert!(size_of::<U>() <= N);
        assert!(align_of::<U>() <= 8);
        
        let mut result = InlineBox {
            // use coercion to obtain vtable ptr / slice len
            // this is never dereferenced, so it's okay for it to dangle
            meta: NonNull::<U>::dangling() as NonNull<T>,
            // going from MaybeUninit<[u8]> to [MaybeUninit<u8>], so this is ok
            buf: unsafe { MaybeUninit::uninit().assume_init() }
        };
        
        let ptr = (&mut result.buf).as_mut_ptr() as *mut U;
        unsafe { ptr.write(val) };
        
        result
    }
}

In order to support storing types with larger alignment requirements, we can provide alternative InlineStorage types (or make it generic) and add another argument to ContiguousStorage::try_resize to override the minimum alignment from the StorageRequirements::layout_with_capacity.

Sadly, this isn't a suitable replacement for std::boxed::Box, or even coca::arena::Box yet - the size overhead is just too large. It does seem like a worthwhile trade-off to get inline trait objects, though, at least until DSTs get some more love from the language.

Closing Remarks

First of all, if you made it all the way down here: Thanks for reading!

I'm pretty proud of this design. No doubt it can be improved further, but I think I've settled in a decent spot. Please, do point out any issues you can think of!

teryror/storage-abstraction-v2.md

Redesigning coca's Storage Abstraction