This is a very rough sketch of an idea for a reimplementation of the illumos ID Space mechanism. Take everything I say here with a grain of salt, please...
ID spaces will be implemented using two structs:
id_space_t
node_t *root; /* root of tree */
size_t height; /* the current height of the tree */
id_t low;
id_t high;
size_t size; /* number of allocated IDs */
id_t last_allocated; /* ID that was last allocated */
size_t branch_factor /* not modifiable after creation; probably should be a compile-time constant */
id_node_t
/* maybe not a long[] if branch_factor < 64 */
long bitfield[branch_factor/(8 * sizeof(long))];
node_t *children[branch_factor];
size_t inverse_depth; /* this is weird: it equals tree height minus actual depth */
An id space will consist of a tree of id_node_t
s, pointed to by an
id_space_t
. The high and low ranges will be specified by the user, as normal.
The branch_factor is the number of children per parent node - this will either
be a compile-time constant, a tunable parameter, or optimized automatically for
the predicted size of the tree.
Each node has a bitfield, which will serve two different purposes depending on whether or not the node is a leaf. If the node is a leaf, zeroes in the bitfield will indicate unallocated IDs, and ones will indicate allocated IDs. If the node is not a leaf, each bit will correspond to a child node. A zero will indicate that the corresponding child (or one of the child's children or its childrens' children, etc.) has an unallocated ID. A one will indicate that the child has no children with unallocated IDs - its branch is completely full.
The tree starts with just the root node and grows (bottom-up!) as more space is required. Here's an example. For simplicity's sake, we'll use a branching factor of 2 and assume that the ID range starts at 0.
In my hacked-together notation, N
signifies a null pointer.
We start with a single node (labeled node A):
A: bitfield:[0 0]
children:[N N]
inverse_depth: 0
We allocate our first id, which is 0:
A: bitfield:[1 0]
children:[N N]
inverse_depth: 0
And another id, 1:
A: bitfield:[1 1]
children:[N N]
inverse_depth: 0
On the next allocation, we find that the root's bitfield is completely full - this is how we know we've run out of space. We add a new root, node B:
B: bitfield:[1 0]
children:[A N]
inverse_depth: 1
/
A: bitfield:[1 1]
children:[N N]
inverse_depth: 0
Note that we didn't have to modify A at all.
When we add a new layer like this, we must also update the root
and height
fields in our id_space_t
struct.
On the next allocation, we see that B has a child with space remaining. This child hasn't actually been allocated yet, so we allocate it:
B: bitfield:[1 0]
children:[A C]
inverse_depth: 1
/ \
A: bitfield:[1 1] C: bitfield:[1 0]
children:[N N] children:[N N]
inverse_depth: 0 inverse_depth: 0
On the next allocation, we see that we have filled up C. This means that B is now also full, so we update both nodes. Furthermore, the root is now full, so we add another layer:
D: bitfield:[1 0]
children:[B N]
inverse_depth: 2
/
B: bitfield:[1 1]
children:[A C]
inverse_depth: 1
/ \
A: bitfield:[1 1] C: bitfield:[1 1]
children:[N N] children:[N N]
inverse_depth: 0 inverse_depth: 0
And so on and so on...
Note that the IDs that correspond to specific spots in each node's bitfield never change - for example, the second slot in node C will always refer to ID 3. This allows us to be precise when traversing the tree, for first-fit allocation, next-fit allocation, and deletion alike.
For first-fit allocation, we start at the root and iterate through the bitfield until we find a non-full child. We then iterate through the child's bitfield, and so on, until we reach a leaf. We can use each node's inverse depth and the tree's total height to figure out when we reach a leaf. We can calculate the ID corresponding to that leaf using the branching factor, current depth, and path we take as we iterate.
For next-fit allocation, we do exactly the same thing, but we only consider nodes that have the possibility of holding IDs larger than the last allocated ID. This will occasionally result in a "miss" where we end in a leaf whose only free IDs are smaller than the last allocated ID, but this case will be rare and will just require one additional tree traversal.
For ID deletion, we know the exact location of the ID to be deleted, so we can traverse the tree precisely until we reach that location.
In all of these cases, we almost always can get away with only performing one
tree traversal of length logbranch_factor
n, iterating through at
most branch_factor
elements at each node, for a total runtime of
branch_factor
* logbranch_factor
n, where n is
branch_factor
tree_height.
When adding a layer, we don't have to touch any nodes except those in the
direct path up to the root from the last filled-up leaf. This is why we use
inverse_depth
instead of depth - otherwise, we would have to touch all nodes
to update their depth.
In id_space_create, the user may specifiy a minumum ID value larger than zero. In this case, we should simply avoid allocating IDs from that part of the tree. We may initially have to allocate a single node for each layer until the tree is tall enough to support the user's minimum value, but we do not have to "flesh out" these layers preemptively.
As an example, suppose our branching factor is 2 and the user specifies a minimum ID value of 14. We begin with an empty tree:
A: bitfield:[0 0]
children:[N N]
inverse_depth: 0
But discover upon allocation that our tree isn't big enough to support ID 14. We therefore expand the tree via our bottom-up new-root method until it is:
D: bitfield:[0 0]
children:[C N]
inverse_depth: 3
/
C: bitfield:[0 0]
children:[B N]
inverse_depth: 2
/
B: bitfield:[0 0]
children:[A N]
inverse_depth: 1
/
A: bitfield:[0 0]
children:[N N]
inverse_depth: 0
And then allocate as we would normally - ID 14 belongs in node G in the diagram below:
D: bitfield:[0 0]
children:[C E]
inverse_depth: 3
/ \
C: bitfield:[0 0] E: bitfield:[0 0]
children:[B N] children:[N F]
inverse_depth: 2 inverse_depth: 2
/ \
B: bitfield:[0 0] F: bitfield:[0 0]
children:[A N] children:[N G]
inverse_depth: 1 inverse_depth: 1
/ \
A: bitfield:[0 0] G: bitfield:[1 0]
children:[N N] children:[N N]
inverse_depth: 0 inverse_depth: 0
From here, we can allocate normally - we just need to ensure that we don't allow
allocations less than 14. Note that there is an overhead of at most an extra
log branch_factor
n nodes (in this case, nodes A, B, and C), where
n is the power of 2 immediately below the user's minimum ID value (in this
case, 8). This extra allocation could take place at the time of tree creation or
ID allocation, but would only have to take place once. If the user then reduced
the minimum ID, we would already have the structure in place to begin allocating
from the restricted portion of the tree.
It will be simple to prevent ID allocation beyond the user's specified maximum value - we just avoid allocating IDs past that point and avoid unnecessarily expanding the tree to allow IDs past that point. If the user expands the ID range by increasing the maximum value, the tree can be allowed to grow normally.
A tree of height h will support branch_factor
h+1 IDs and
require at most branch_factor
h + branch_factor
h-1+
... + branch_factor
0 nodes. This does not seem efficient for a low
branching factor (e.g. 2) but becomes increasingly efficient as the branching
factor increases. At any rate, this is more efficient than allocating a separate
vmem_seg_t
for every ID, which is what we do now.
For example, suppose our branching factor is 16, and we want to allocate 4096 IDs. This means we need a tree of height 2, which will have at most 273 nodes. Each node will take up 138 bytes (2 bytes for the bitfield, 16 child pointers, and a size_t for the depth) for a total of (138 * 273) == 37674 bytes, or about 37 kilobytes. As noted in OS-6013, the current ID space implementation uses 232 bytes per ID - this would entail a total of (232 * 4096) == 950,272 bytes, or about 950 kilobytes.
As another example, suppose our branching factor is 64, and we want to allocate 262144 IDs. This means we need a tree of height 2, which will have at most 4161 nodes. Each node will take up 528 bytes (1 long for the bitfield, 64 child pointers, and a size_t for the depth) for a total of (528 * 4161) == 2,197,008 bytes, or about 2 megabytes. The current implementation would require (232 * 262144) == 60,817,408 bytes, or about 60 megabytes.