This post was written primarily to organize my thoughts during the design process. I'll explain the thought process that led me to each point in the design space we visit, then point out any issues, which will lead into the next iteration. Along the way, I made a few errors, which I'll try to rectify in the (Side Notes).
Here we go!
When I first started working on coca
, I was really just
experimenting with the soon-to-be-stabilized min_const_generics
, but I also
wanted to support dynamically sized data structures (which would still have
constant capacity). So my initial implementation of Vec
looked something
like this:
use std::convert::{TryFrom, TryInto};
use std::marker::PhantomData;
use std::mem::MaybeUninit;
struct Vec<Element, Buffer, Index>
where
Buffer: AsRef<[MaybeUninit<Element>]> + AsMut<[MaybeUninit<Element>]>,
Index: TryFrom<usize> + TryInto<usize>,
{
len: Index,
buf: Buffer,
elem: PhantomData<Element>,
}
This worked okay, but it was kind of messy, and converting indices with
try_{from, into}
got annoying pretty fast, which is why I first introduced
the AsIndex
trait. Still, I wasn't quite happy until I saw and adopted the
ContiguousStorage
trait from Matthieu M.'s proposal to parametrize
data structures not by allocators, but by storage type.
It works pretty much the same, but it makes trait bounds much nicer to read
(and write), and has the added benefit of not relying on external trait
implementations. It also works with other data structures that wrap a partially
initialized array, like BinaryHeap
and Deque
. Anything more complex though,
and we come to know its limitations.
The simplest possible map you could build with this abstraction would look something like this:
struct LinearMap<K, V, Storage>
where
K: Eq,
Storage: ContiguousStorage<(K, V)>,
{
pairs: Vec<(K, V), Storage, usize>,
}
impl<K, V, Storage> LinearMap<K, V, Storage>
where
K: Eq,
Storage: ContiguousStorage<(K, V)>,
{
fn get(&self, k: &K) -> Option<&V> {
self.pairs.iter().find(|(key, _)| key == k).map(|(_, value)| value)
}
fn insert(&mut self, k: K, v: V) -> Option<V> {
let mut it = self.pairs.iter().enumerate();
if let Some(idx) = it.find(|(_, (key, _))| key == k).map(|(i, _)| i) {
Some(self.pairs.replace(idx, (k, v)).1)
} else {
self.pairs.push((k, v));
None
}
}
// other methods omitted
}
This isn't a very good general-purpose map, but it can definitely beat one in the right circumstances, specifically where comparisons are cheap and the maps stay small. However, even for only that use-case, we can do better by using a structure of arrays instead of an array of structures, which allows our linear search to check more keys per cache line read. We'd really like to write something closer to this, in spirit at least:
struct LinearMap<K, V, Storage>
where
K: Eq,
Storage: ContiguousStorage<(K, V)>,
{
keys: Vec<K, Storage, usize>,
values: Vec<V, Storage, usize>,
}
Ideally, these two vectors would share the same len
field, but even this naive
version can't work as shown here: the vectors expect a ContiguousStorage<K>
and ContiguousStorage<V>
, respectively. We could require users to pass in two
separate storage blocks, but we can also make this work without changing the
signature of LinearMap
with some clever unsafe
code.
Since align_of::<(K, V)>() == max(align_of::<K>(), align_of::<V>())
and
size_of::<(K, V)>() >= size_of::<K>() + size_of::<V>
, we can always fit two
arrays [K], [V]
in the same memory as the original [(K, V)]
. The same logic
applies for any n-tuple.
This too has issues, though, primarily memory usage. The culprit is in the size inequality, i.e. padding. Consider this:
use std::mem::size_of;
#[repr(align(8))]
struct Padded(u8);
// The size of `Padded` must be 8 so that the result of
// `<*Padded>::offset` is always well-aligned
assert_eq!(size_of::<Padded>(), 8);
// The `u8` goes after `Padded`, so the tuple must be padded
// to the next multiple of its alignment (which equals 8)
assert_eq!(size_of::<(Padded, u8)>(), 16);
const N: usize = 16;
type ArrayOfStructs = [(Padded, u8); N];
type StructOfArrays = ([Padded; N], [u8; N]);
assert_eq!(size_of::<ArrayOfStructs>(), N*16);
assert_eq!(size_of::<StructOfArrays>(), N*8 + N);
In this (admittedly contrived) example, the array of structs is larger by a
factor of 16:9, nearly double the size! By requiring users to provide a
ContiguousStorage<(K, V)>
, we're implicitly requiring them to overprovision,
potentially by a lot.
This effect becomes more pronounced when you throw metadata into the mix. Say
you want to track per-item occupancy in a bitmask stored as a [u64]
, so you
require a ContiguousStorage<(K, V, bool)>
. Not only will this require 8 times
as much space for the mask as is actually needed, adding the bool
to the tuple
introduces even more padding in most cases. Furthermore, you can't rely on the
provided storage being properly aligned for a [u64]
. And besides, this is
clearly leaking implementation details. Something's gotta give.
(Side Note: Rust doesn't actually make any such guarantees about the memory
layout of tuples, or in fact any type that isn't repr(C)
; it's just how they
currently happen to work. This may seem unlikely to change, but depending on it
is just a generally bad idea.)
Your first instinct might be to get rid of the type parameter on the storage
trait entirely; client data structures will just have to deal with u8
pointers
as if they'd gone through the allocator API, no big deal:
pub unsafe trait ContiguousStorage {
fn as_ptr(&self) -> *const u8;
fn as_mut_ptr(&mut self) -> *mut u8;
fn capacity(&self) -> usize;
}
You can still implement this generically for arrays and slices of any type, so our fancy arrays keep working as before, and for complex data structures, you provide some utilities for correctly constructing suitable storage blocks.
I'd argue that's throwing the baby out with the bathwater. We're losing a good deal of type safety here, which we have to compenstate for with runtime checks, either by rejecting misaligned blocks, or by adjusting the pointers accordingly. Both can result in surprising failures as a consequence of programmer errors the compiler used to be able to detect.
Instead, I propose keeping the argument type, but to decouple it from any notion of an "item type". It would be used much like a marker trait, except it's a type marking a trait implementation:
use std::marker::PhantomData;
unsafe trait ContiguousStorage<Marker> {
fn as_ptr(&self) -> *const u8;
fn as_mut_ptr(&mut self) -> *mut u8;
fn capacity(&self) -> usize;
}
struct ArrayLike<T>(PhantomData<T>);
unsafe impl<T> ContiguousStorage<ArrayLike<T>> for &mut [std::mem::MaybeUninit<T>] {
fn as_ptr(&self) -> *const u8 {
self.as_ptr() as _
}
fn as_mut_ptr(&mut self) -> *mut u8 {
self.as_mut_ptr() as _
}
fn capacity(&self) -> usize {
self.len()
}
}
struct Vec<E, B, I>
where
B: ContiguousStorage<ArrayLike<E>>
{
len: I,
buf: B,
elem: PhantomData<E>,
}
Now, Vec<u64, SliceStorage<u8>, usize>
is invalid again, and complex data
structures can define their own marker types and go from there:
// in linear_map.rs:
use std::marker::PhantomData;
use std::mem::MaybeUninit;
use crate::storage::ContiguousStorage;
pub struct StorageRequirements<K, V>(PhantomData<(K, V)>);
#[repr(C)] // required because we need to calculate field offsets manually
pub struct InlineStorage<K, V, const N: usize> {
keys: [MaybeUninit<K>; N],
values: [MaybeUninit<V>; N],
}
unsafe impl<K, V, const N: usize> ContiguousStorage<StorageRequirements<K, V>> for InlineStorage<K, V, N> {
fn as_ptr(&self) -> *const u8 { self as *const Self as *const u8 }
fn as_mut_ptr(&mut self) -> *mut u8 { self as *mut Self as *mut u8 }
fn capacity(&self) -> usize { N }
}
pub struct LinearMap<K, V, S: ContiguousStorage<StorageRequirements<K, V>>> {
buf: S,
len: usize,
values_offset: usize, // cache the pointer offset to avoid redoing address calculations
data: PhantomData<(K, V)>,
}
That's all well and good, but what about heap-allocated storage?
We can (and probably should) provide blanket implementations so that
Box<T>: ContiguousStorage<R> where T: ContiguousStorage<R>
, sure, but that's
not good enough. Users would still be required to specify the capacity at
compile time, making with_capacity
-style constructors and growable storage in
general impossible to implement.
I experimented a bit with custom dynamically sized types (DSTs) as they currently exist in Rust,
but to no avail: I defined a struct ending in a field of type [()]
, manually
called the global allocator, and successfully built a fat pointer with
ptr::slice_from_raw_parts
and some funky type casting. Problem is, a box
constructed with Box::from_raw
called on this fat pointer would leak the
memory when dropped. That's because it tries to construct a Layout
using
align_of_val
and size_of_val
, which doesn't match the layout used for the
initial allocation.
We can kind of solve this problem using more unstable features, along these lines:
#![feature(const_generics)]
#![feature(const_evaluatable_checked)]
use std::marker::PhantomData;
use std::mem::{MaybeUninit, size_of, align_of};
#[repr(C)]
struct ArrayPair<A, B, const N: usize>([A; N], [B; N]);
struct SlicePair<A, B> {
_phantom: PhantomData<(A, B)>,
_force_align: ArrayPair<A, B, 0>,
_max_padding: [MaybeUninit<u8>; align_of::<B>() - 1],
data: [[MaybeUninit<u8>; size_of::<A>() + size_of::<B>()]],
}
(Side Note: This will not compile, complaining about missing where
bounds on
the array size expressions; I did not bother learning the specifics of these
unsatable features yet, so I'm not sure what those would have to look like.)
However, this doesn't work with non-linear size functions either. What we really need here is better support for custom DSTs, but that doesn't seem likely to happen anytime soon; the notes from a recent lang team meeting about the competing RFCs give a decent overview.
We'll have to define our own fat pointers, as a temporary work around at least.
In anticipation of the boilerplate this would necessitate, let's go straight for
a generic one, which will be accompanied by a new trait, StorageRequirements
,
which will also bound the marker type on ContiguousStorage
:
use std::alloc::{Layout, alloc, dealloc};
use std::marker::PhantomData;
use std::ptr::NonNull;
pub trait StorageRequirements {
fn layout_with_capacity(items: usize) -> Layout;
}
pub unsafe trait ContiguousStorage<R: StorageRequirements> {
fn as_ptr(&self) -> *const u8;
fn as_mut_ptr(&mut self) -> *mut u8;
fn capacity(&self) -> usize;
fn try_resize<F>(&mut self, _: usize, _: F) -> Result<(), ()>
where
F: FnOnce(*const u8, *mut u8)
{
Err(())
}
}
pub struct BoxedStorage<R: StorageRequirements, const GROWABLE: bool> {
req: PhantomData<R>,
ptr: NonNull<u8>,
cap: usize,
}
unsafe impl<R: StorageRequirements, const GROWABLE: bool> ContiguousStorage<R> for BoxedStorage<R, GROWABLE> {
fn as_ptr(&self) -> *const u8 { self.ptr.as_ptr() as _ }
fn as_mut_ptr(&mut self) -> *mut u8 { self.ptr.as_ptr() }
fn capacity(&self) -> usize { self.cap }
fn try_resize<F>(&mut self, capacity: usize, copy: F) -> Result<(), ()>
where
F: FnOnce(*const u8, *mut u8)
{
if !GROWABLE { return Err(()); }
let layout = R::layout_with_capacity(capacity);
let dst = unsafe { alloc(layout) };
if dst.is_null() { return Err(()); }
let src = self.ptr.as_ptr() as *const u8;
copy(src, dst);
unsafe {
dealloc(src as *mut u8, layout);
self.ptr = NonNull::new_unchecked(dst);
}
Ok(())
}
}
impl<R: StorageRequirements, const G: bool> BoxedStorage<R, G> {
pub fn with_capacity(items: usize) -> Option<Self> {
let layout = R::layout_with_capacity(items);
let ptr = unsafe { alloc(layout) };
let ptr = NonNull::new(ptr)?;
Some(BoxedStorage { req: PhantomData, ptr, cap: items })
}
}
impl<R: StorageRequirements, const G: bool> Drop for BoxedStorage<R, G> {
fn drop(&mut self) {
let layout = R::layout_with_capacity(self.cap);
unsafe { dealloc(self.ptr.as_ptr(), layout ) };
}
}
While that's not currently possible, in the future we could also add a generic
associated type Inline<const N: usize>: ContiguousStorage<Self>
to the
StorageRequirements
trait, which would enable a similarly generic SmallStorage
.
Note that this would be a breaking change, or require yet another trait
InlineableStorageRequirements: StorageRequirements
.
Early on in the discussion on his proposal, Matthieu wrote this:
I disagree that not moving the content is the primary motivation; I mainly see
Box
as going from unsized (trait / slice) to sized.With that definition, a
InlineBox<Trait, 64>
is aBox
that holds up to 64 bytes of data (on top of the trait pointer), and aSmallBox<Trait, 64>
holds up to 64 bytes of data without memory allocation, or otherwise allocates.I see both as useful as their
InlineVec
andSmallVec
counterparts.
I really like this perspective, as it neatly frames a problem I forgot I had a
while before starting coca
. At the time, I was experimenting with using async
functions as coroutines; for various reasons I won't go into here, my executor
needed to be no_std
, with no allocations allowed. I got it to work fine with
just a single async
function in the codebase, but it couldn't handle executing
one followed by a different one. I needed dyn Future
trait objects, but I
couldn't use a &dyn Future
in this case, and Box<dyn>
was unavailable - so
I scrapped the experiment, and didn't look back.
This definitely would have been nice to have there, but it seems nigh impossible to actually make work. Just like Rust currently lacks support for defining DSTs, you also can't really handle them generically.
Except the standard library makes it work somehow, right? Let's take a look.
As far as I can tell, there's three relevant traits, all of them gated behind different unstable features:
std::marker::Unsize<T: ?Sized>
marks implementors as being convertible to the unsized typeT
.std::ops::CoerceUnsized<T: ?Sized>
marks implementors as implicitly convertible toT
, provided both are pointer types and the implementor is alsoUnsize<T>
. Weirdly,Unsize
is not a super trait ofCoerceUnsize
.std::ops::DispatchFromDyn
is also a marker trait, that "is used for object safety, to check that a method's receiver type can be dispatched on."
(Side Note: The documentation is sparse here, I'm not sure I fully understand the latter two.)
That's three marker traits with builtin compiler support, at least one of which
is for pointer types only. Not a great sign, but maybe Unsize
is something we
can work with. We already know that taking fat pointers apart and reassembling
them is a fool's errand, so we'll just have to store a complete one for now.
But we can't just point it at the actual object in storage, as that might move
with the box! The solution I came up with needs the unstable set_ptr_value
feature, which lets us store a dangling fat pointer instead, and replace just
the pointer part (i.e. keeping the vtable pointer intact) whenever the box is
dereferenced:
(Side Note: During experimentation, I managed to crash Miri! See this issue for details.)
#![feature(unsize)]
#![feature(set_ptr_value)]
use std::alloc::Layout;
use std::marker::Unsize;
use std::mem::{MaybeUninit, align_of, size_of};
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;
pub struct BoxReqs;
impl StorageRequirements for BoxReqs {
fn layout_with_capacity(bytes: usize) -> Layout {
Layout::from_size_align(bytes, 8).unwrap()
}
}
pub union InlineStorage<const N: usize> {
_force_alignment: MaybeUninit<u64>,
data: [MaybeUninit<u8>; N],
}
unsafe impl<const N: usize> ContiguousStorage<BoxReqs> for InlineStorage<N> {
fn capacity(&self) -> usize { N }
fn as_ptr(&self) -> *const u8 {
unsafe { (&self.data) as *const _ as *const u8 }
}
fn as_mut_ptr(&mut self) -> *mut u8 {
unsafe { (&mut self.data) as *mut _ as *mut u8 }
}
}
pub struct GenericBox<T: ?Sized, S: ContiguousStorage<BoxReqs>> {
meta: NonNull<T>,
buf: S,
}
impl<T: ?Sized, S: ContiguousStorage<BoxReqs>> Deref for GenericBox<T, S> {
type Target = T;
fn deref(&self) -> &Self::Target {
let dangling = self.meta.as_ptr() as *const T;
unsafe {
dangling
.set_ptr_value(self.buf.as_ptr())
.as_ref().unwrap()
}
}
}
impl<T: ?Sized, S: ContiguousStorage<BoxReqs>> DerefMut for GenericBox<T, S> {
fn deref_mut(&mut self) -> &mut Self::Target {
let dangling = self.meta.as_ptr();
unsafe {
dangling
.set_ptr_value(self.buf.as_mut_ptr())
.as_mut().unwrap()
}
}
}
pub type InlineBox<T, const N: usize> = GenericBox<T, InlineStorage<N>>;
impl<T: ?Sized, const N: usize> InlineBox<T, N> {
pub fn new<U>(val: U) -> Self
where
U: Sized + Unsize<T>,
{
// unstable features const_generics and const_evaluatable_checked
// could turn these into compile-time checks
assert!(size_of::<U>() <= N);
assert!(align_of::<U>() <= 8);
let mut result = InlineBox {
// use coercion to obtain vtable ptr / slice len
// this is never dereferenced, so it's okay for it to dangle
meta: NonNull::<U>::dangling() as NonNull<T>,
// going from MaybeUninit<[u8]> to [MaybeUninit<u8>], so this is ok
buf: unsafe { MaybeUninit::uninit().assume_init() }
};
let ptr = (&mut result.buf).as_mut_ptr() as *mut U;
unsafe { ptr.write(val) };
result
}
}
In order to support storing types with larger alignment requirements, we can
provide alternative InlineStorage
types (or make it generic) and add another
argument to ContiguousStorage::try_resize
to override the minimum alignment
from the StorageRequirements::layout_with_capacity
.
Sadly, this isn't a suitable replacement for std::boxed::Box
, or even
coca::arena::Box
yet - the size overhead is just too large. It does seem like
a worthwhile trade-off to get inline trait objects, though, at least until DSTs
get some more love from the language.
First of all, if you made it all the way down here: Thanks for reading!
I'm pretty proud of this design. No doubt it can be improved further, but I think I've settled in a decent spot. Please, do point out any issues you can think of!