The purpose of the log indexer is to maintain the log index data structure and keep it consistent with the canonical chain. This structure has four components:
- information about the currently indexed range
- actual filter map data for a consecutive series of maps
- pointers from each indexed block number to the first log value index it occupies in the linear index space
- pointers from each rendered map to the number of the block that occupies the last log value index of the map
Last-block-of-map pointers and the corresponding reverse-direction block-to-log-value-index pointers are also stored for the last map of each epoch before the indexed range. During initialization these pointers are added from a set of hardcoded checkpoints for the given chain (a workaround until EIP-7745 is adopted and these pointers become provable from the latest head). If an indexed section is later unindexed, these pointers are retained, so indexing can be restarted from any such known epoch boundary.
Information about the current state of the log index is stored in a single `FilterMapsRange` instance:
- `FirstIndexedBlock`, `AfterLastIndexedBlock`: range in which blocks are fully indexed.
- `HeadBlockIndexed`: if `true` then `AfterLastIndexedBlock-1` is the head block of `indexedView` and it is fully rendered; if `false` then `AfterLastIndexedBlock` exists and is partially rendered.
- `HeadBlockDelimiter`: if `HeadBlockIndexed` is `true` then `HeadBlockDelimiter` points to the next free log value index position where the block delimiter of the current head block will be rendered once the next block is added.
- `FirstRenderedMap`, `AfterLastRenderedMap`: range in which filter maps are fully rendered. `FirstRenderedMap` is always the first map of an epoch.
- `TailPartialEpoch`: number of rendered maps in the epoch before the one that starts with `FirstRenderedMap`. Tail epochs are always rendered starting with `FirstRenderedMap - MAPS_PER_EPOCH`. `TailPartialEpoch` is always less than `MAPS_PER_EPOCH`. If the last map of the tail partial epoch is rendered then it is merged into the main index, `FirstRenderedMap` is reduced by `MAPS_PER_EPOCH` and `TailPartialEpoch` is set to zero. `FirstIndexedBlock` is also updated when this merge happens; it is not affected by the partial epoch.
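The fields and the tail merge rule can be pictured as a plain Go struct. This is only a sketch: the field names come from the list above, but the integer widths, the tiny `mapsPerEpoch` value, the `addTailMap` helper and its `newFirstBlock` argument are assumptions made for illustration, not the indexer's actual definitions.

```go
package main

import "fmt"

// mapsPerEpoch stands in for MAPS_PER_EPOCH. The tiny value is an assumption
// chosen to keep the example short.
const mapsPerEpoch = 4

// FilterMapsRange mirrors the fields listed above; the integer widths are assumed.
type FilterMapsRange struct {
	FirstIndexedBlock     uint64
	AfterLastIndexedBlock uint64
	HeadBlockIndexed      bool
	HeadBlockDelimiter    uint64
	FirstRenderedMap      uint32
	AfterLastRenderedMap  uint32
	TailPartialEpoch      uint32
}

// addTailMap (hypothetical helper) records one more rendered map in the tail
// partial epoch. When the last map of that epoch is rendered, the epoch is
// merged into the main range. newFirstBlock is the value FirstIndexedBlock
// takes after the merge; how it is obtained is outside this sketch.
func (r *FilterMapsRange) addTailMap(newFirstBlock uint64) (merged bool) {
	if r.FirstRenderedMap < mapsPerEpoch {
		return false // no earlier epoch to render
	}
	r.TailPartialEpoch++
	if r.TailPartialEpoch < mapsPerEpoch {
		return false // still partial: the main range is unchanged
	}
	r.FirstRenderedMap -= mapsPerEpoch
	r.TailPartialEpoch = 0
	r.FirstIndexedBlock = newFirstBlock
	return true
}

func main() {
	r := FilterMapsRange{
		FirstIndexedBlock: 100, AfterLastIndexedBlock: 200,
		HeadBlockIndexed: true, HeadBlockDelimiter: 5000,
		FirstRenderedMap: 8, AfterLastRenderedMap: 12,
	}
	for i := 0; i < mapsPerEpoch; i++ {
		merged := r.addTailMap(60)
		fmt.Println(r.FirstRenderedMap, r.TailPartialEpoch, r.FirstIndexedBlock, merged)
	}
}
```

In this sketch `FirstRenderedMap` and `FirstIndexedBlock` only change on the call that completes the tail epoch, which is the point of keeping the partial epoch separate from the main range.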
Note that leftover filter row and pointer entries may exist in the database outside the valid range after an unclean shutdown. These entries can be ignored and they are overwritten or removed when the given range gets properly rendered.
If `FilterMapsRange` is missing from the database then the log index should be considered non-existent and any leftover filter row or pointer objects should be removed from the database before initialization.
In order to improve database size and access speed efficiency, base layer filter data (which is very often just a few bytes per row) is not stored under individual keys for each row of each map. Rows not longer than `BASE_ROW_LENGTH` and the first `BASE_ROW_LENGTH` entries of longer rows are stored in `BASE_ROW_GROUP` sized groups as tightly packed `FilterMapBaseRows` instances. Rows longer than `BASE_ROW_LENGTH` also have an individual `FilterMapExtRow` instance which contains the rest of the filter row entries.
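The split between the packed base part and the extended part of a row can be sketched as follows. The `baseRowLength` value, the `uint32` entry type and the `splitRow` name are assumptions for illustration; grouping the base parts into `BASE_ROW_GROUP` sized objects is not shown.

```go
package main

import "fmt"

// baseRowLength stands in for BASE_ROW_LENGTH; the value is an assumption.
const baseRowLength = 4

// splitRow divides a filter row into the part that goes into the packed base
// row group and the remainder that is stored as a separate extended row.
// ext is nil when the row fits entirely into the base part.
func splitRow(row []uint32) (base, ext []uint32) {
	if len(row) <= baseRowLength {
		return row, nil
	}
	return row[:baseRowLength], row[baseRowLength:]
}

func main() {
	base, ext := splitRow([]uint32{1, 2, 3, 4, 5, 6})
	fmt.Println(base, ext) // [1 2 3 4] [5 6]

	base, ext = splitRow([]uint32{7, 8})
	fmt.Println(base, ext == nil) // [7 8] true
}
```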
Database keys for filter rows are constructed so that they are ordered by epoch first, then by row index, then by map index inside the epoch (in binary terms the row index is inserted into the middle of the map index). This is the same pattern as the one EIP-7745 defines for tree hashing, and for the same reason (efficient access for searching).
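One way to obtain this ordering is to build a single integer from the three components and encode it big-endian, so that byte-wise key comparison follows epoch, then row, then map-in-epoch. The sketch below assumes `logMapsPerEpoch = 10` and `logMapHeight = 16` and omits any database key prefix; `rowKey` and `keyLess` are hypothetical names, not the indexer's actual key schema.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const (
	logMapsPerEpoch = 10 // assumed log2 of MAPS_PER_EPOCH
	logMapHeight    = 16 // assumed log2 of the number of rows per map
)

// rowKey places the row index between the epoch bits and the map-in-epoch
// bits of the map index, then encodes the result as a big-endian key.
// rowIndex must be below 1<<logMapHeight.
func rowKey(mapIndex, rowIndex uint32) []byte {
	epoch := uint64(mapIndex >> logMapsPerEpoch)
	sub := uint64(mapIndex & (1<<logMapsPerEpoch - 1))
	v := epoch<<(logMapHeight+logMapsPerEpoch) | uint64(rowIndex)<<logMapsPerEpoch | sub
	key := make([]byte, 8)
	binary.BigEndian.PutUint64(key, v)
	return key
}

// keyLess reports whether key a sorts before key b in the database.
func keyLess(a, b []byte) bool { return bytes.Compare(a, b) < 0 }

func main() {
	// Same epoch, same row: consecutive maps are adjacent keys.
	fmt.Println(keyLess(rowKey(0, 5), rowKey(1, 5)))
	// Same epoch: every map of row 5 sorts before any map of row 6.
	fmt.Println(keyLess(rowKey(1023, 5), rowKey(0, 6)))
	// Everything in epoch 0 sorts before anything in epoch 1.
	fmt.Println(keyLess(rowKey(1023, 65535), rowKey(1024, 0)))
}
```

With this layout, a search that needs one row across a whole epoch reads a single contiguous key range.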
For each rendered map and the last map of each epoch before the rendered range, a `FilterMapLastBlock` instance is stored that consists of the block number and block ID of the block that occupies the last log value index of the map. Note that the block delimiter after each block is added when the next block is added; therefore, if the delimiter of block `N` falls on the last log value index of the map, the last block entry will point to block `N+1`. If the head block is `N` then the head delimiter has not been added yet and the last block entry will point to the head block `N`.
For each fully or partially indexed block and the last block of every epoch before the rendered range, a `BlockLvPointer` instance is stored that holds the first log value index occupied by the block. Note that although the block delimiter of block `N-1` is generated when block `N` is added, the `BlockLvPointer` points to the first log value index of block `N`. This also means that the first log value index pointer of the last block of map `M` may point to the beginning of map `M+1`.
The figure below shows an example of a log index database with 4 indexed blocks plus an empty genesis block mapped onto 6 filter maps. The table below that shows the corresponding `FilterMapsRange` fields.

    map index               0        1        2        3        4        5
                        +--------+--------+--------+--------+--------+--------+
    block number        |01111111|11112222|22222222|33333333|34444444|44444   |
    block delimiter     |*       |   *    |       *|        |*       |        |
    tx index            | 0000111|222 0011|1112333 |11124444| 0022222|22333   |
    log index           | 0011000|000 0000|0110000 |00000011| 0000011|11000   |
    address/topic index | a0a0a01|a01 a0a0|1a0aa01 |a01aa0a0| a0a01a0|12a01   |
    last block of map   |1       |2       |3       |3       |4       |4       |
    block lv pointer    0 1      |    12  |        |24      | 33     |        |
                        +--------+--------+--------+--------+--------+--------+

Fig 1. An example of a log index database
`VALUES_PER_MAP = 8`
FilterMapsRange field | value |
---|---|
FirstIndexedBlock | 0 |
AfterLastIndexedBlock | 5 |
HeadBlockIndexed | true |
HeadBlockDelimiter | 45 |
FirstRenderedMap | 0 |
AfterLastRenderedMap | 6 |
TailPartialEpoch | 0 |
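The "last block of map" row of Fig 1 can be recomputed from the "block lv pointer" row, which makes the delimiter rule concrete. The helper below is a sketch written for this example only; `lastBlockOfMap` is a hypothetical name and the indexer stores these pointers rather than deriving them this way.

```go
package main

import (
	"fmt"
	"sort"
)

const valuesPerMap = 8 // VALUES_PER_MAP in Fig 1

// lastBlockOfMap returns the block that occupies the last log value index of
// the given map. blockLvPointers[n] is the first log value index of block n.
// The delimiter of block n-1 sits at blockLvPointers[n]-1 and is rendered
// when block n is added, so that position counts as belonging to block n.
// If the map extends past the rendered data, the head block is returned.
func lastBlockOfMap(blockLvPointers []uint64, mapIndex uint32) uint64 {
	lastIndex := (uint64(mapIndex)+1)*valuesPerMap - 1
	// First block whose delimiter-plus-logs span starts after lastIndex.
	next := sort.Search(len(blockLvPointers), func(n int) bool {
		return blockLvPointers[n] > lastIndex+1
	})
	if next == 0 {
		return 0
	}
	return uint64(next - 1)
}

func main() {
	pointers := []uint64{0, 1, 12, 24, 33} // "block lv pointer" row of Fig 1
	var row []uint64
	for m := uint32(0); m < 6; m++ {
		row = append(row, lastBlockOfMap(pointers, m))
	}
	fmt.Println(row) // [1 2 3 3 4 4]
}
```

Map 2 is the interesting case: its last position (23) holds the delimiter of block 2, so the entry points to block 3, whose pointer (24) is the first position of map 3.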
In the log index database blocks are referred to by block ID instead of block hash, even though at the moment the two are the same. The reason for the distinction is that once the log index root hash gets into the consensus block format, the log indexer should be able to work with a chain view where the latest block is the one currently being processed that does not have a block hash yet. After this fork the block ID will be a hash of the block without the log index root, providing a unique identification of blocks while also allowing indexing during block processing without any special cases in the indexer logic.
Name | Value |
---|---|
BASE_ROW_GROUP | 2**5 |
The indexer has two immutable views of the chain: the `indexedView` and the `targetView`. All stored data structures are consistent with the `indexedView` (even if the indexed data does not reach the head of the `indexedView`). The `targetView` is based on the latest canonical chain head. If these two are not equivalent then head indexing happens, which means that map rendering starts from the first map that is different according to the `targetView`. The map renderer generates a series of maps and corresponding pointers consistent with the `targetView`, then the new data is committed to the database and the `indexedView` is updated to the `targetView` (even if the new index does not reach the head of the new `targetView` yet).
When the target head is reached by the log index (the two views are equivalent and the index reaches the chain view head), tail indexing or unindexing happens if necessary. Tail indexing always starts in the forward direction from the previous epoch boundary before the beginning of the indexed range and gets merged with the consecutive range of indexed maps once the tail partial epoch is fully rendered. Tail indexing is interrupted immediately and a new head indexing is started if the `targetView` changes; it can be resumed once the new head is indexed.
Tail unindexing removes entire epochs (except for the last pair of pointers) and can also happen during head indexing according to the desired log history length. Unindexing is very quick (at least with pebble db) because it removes contiguous key ranges from the database.
Interface `chainView` represents an immutable view of the indexed blockchain that provides the block identifier and the corresponding set of receipts for each block number. It is currently implemented by `StoredChainView`, based on an underlying blockchain, but it can also be implemented by the block processor so that the last block is the one currently being processed. `limitedChainView` shows a trimmed view of an underlying view and is only used internally during startup, to provide an `indexedView` that is a limited version of the initializing `targetView` and is guaranteed to be consistent with the existing index database. This ensures proper head re-indexing without adding special cases to the core indexer logic.
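A reduced sketch of the view abstraction and of a trimmed wrapper is shown below. The method names, the `blockID` type, the toy `sliceView` implementation and `newLimitedView` are all assumptions for illustration; receipt access is left out to keep the example short.

```go
package main

import "fmt"

// blockID stands in for the block identifier type (assumed to be 32 bytes).
type blockID [32]byte

// chainView is a hypothetical, reduced form of the interface described above.
type chainView interface {
	headNumber() uint64
	getBlockID(number uint64) (blockID, bool) // false if number is past the head
}

// sliceView is a toy view backed by a non-empty slice indexed by block number.
type sliceView []blockID

func (v sliceView) headNumber() uint64 { return uint64(len(v)) - 1 }

func (v sliceView) getBlockID(n uint64) (blockID, bool) {
	if n >= uint64(len(v)) {
		return blockID{}, false
	}
	return v[n], true
}

// limitedView hides every block of the parent view above head.
type limitedView struct {
	parent chainView
	head   uint64
}

// newLimitedView trims parent to head; if head is not below the parent's
// head, the parent itself is returned.
func newLimitedView(parent chainView, head uint64) chainView {
	if head >= parent.headNumber() {
		return parent
	}
	return limitedView{parent, head}
}

func (v limitedView) headNumber() uint64 { return v.head }

func (v limitedView) getBlockID(n uint64) (blockID, bool) {
	if n > v.head {
		return blockID{}, false
	}
	return v.parent.getBlockID(n)
}

func main() {
	target := sliceView{{0}, {1}, {2}, {3}}
	indexed := newLimitedView(target, 1)
	_, ok := indexed.getBlockID(2)
	fmt.Println(indexed.headNumber(), ok) // 1 false
}
```

Because the trimmed view satisfies the same interface, code that compares an indexed view with a target view does not need to know that one of them is a startup-only wrapper.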
A map renderer is always instantiated by `renderMapsBefore(afterLastMap uint32)`, which automatically determines the starting point where `targetView` diverges from `indexedView` and renders maps until just before `afterLastMap`. In the case of head rendering `afterLastMap` can be `MaxUint32`; the renderer then replaces or removes all previously existing maps after the starting point and keeps rendering new maps until its `logIterator` is finished. If the renderer replaces the head map then it also updates `indexedView` to `targetView`. If `afterLastMap` is less than `AfterLastRenderedMap` of the existing indexed range (tail rendering) then `indexedView` is not replaced. If `targetView` diverges before the rendering limit then the renderer returns with an error.
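Finding the point where two views diverge can be sketched with a binary search, since two chains never re-converge after a fork. The example models each view as a slice of block IDs indexed by block number, which is an assumption for illustration; `firstDivergence` is a hypothetical helper and mapping the resulting block number to a map index is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// firstDivergence returns the lowest block number at which the two views
// disagree. If one view is a prefix of the other, the length of the shorter
// view is returned.
func firstDivergence(indexed, target []string) uint64 {
	n := len(indexed)
	if len(target) < n {
		n = len(target)
	}
	// "IDs differ at block i" is monotonic in i, so binary search applies.
	return uint64(sort.Search(n, func(i int) bool {
		return indexed[i] != target[i]
	}))
}

func main() {
	indexed := []string{"a", "b", "c", "d"}
	target := []string{"a", "b", "x", "y", "z"}
	fmt.Println(firstDivergence(indexed, target)) // 2
}
```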
The rendering can either start from a map boundary or a cached snapshot of the rendered map previously created on a block boundary. This allows head rendering starting after the previous head or even after short reorgs without re-rendering the entire latest map.
The renderer periodically calls a callback that can block or interrupt the rendering process. An interrupted process can be resumed later. The callback can also change `targetView`. The renderer checks whether the rendered data became obsolete when resuming a rendering process or returning from the callback and fails if this happens.
A `logIterator` iterates on the linear log value index space and provides the log value hashes used to place marks on the filter maps. It can be initialized at a block boundary if the first log value index is known (which can be known from a last block of map/epoch pointer or a recent snapshot). It can also be started at a map boundary by initializing it at the start of the last block of the previous map and then iterating it to the desired map boundary.
Checkpoints are hardcoded lists of (last block number, block ID, log value pointer) triplets, one for each epoch of a given chain. These are added to the database at initialization. The applicable set of checkpoints and the last applicable epoch are determined automatically based on the initial `targetView`. If no checkpoints are available then indexing can be started from genesis.
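Selecting the applicable checkpoints can be sketched as a prefix match against the target view. The `epochCheckpoint` field names, the string block IDs and the `idAt` lookup function are assumptions for illustration; the real checkpoint format and matching logic may differ.

```go
package main

import "fmt"

// epochCheckpoint is a hypothetical form of one checkpoint triplet.
type epochCheckpoint struct {
	BlockNumber uint64 // last block of the epoch
	BlockID     string // identifier of that block (a string for brevity)
	FirstIndex  uint64 // first log value index occupied by that block
}

// applicableEpochs returns how many leading checkpoints are consistent with
// the view, where head is the view's head number and idAt returns the view's
// block ID for a block number. Zero means indexing has to start from genesis.
func applicableEpochs(cps []epochCheckpoint, head uint64, idAt func(uint64) string) int {
	for i, cp := range cps {
		if cp.BlockNumber > head || idAt(cp.BlockNumber) != cp.BlockID {
			return i
		}
	}
	return len(cps)
}

func main() {
	ids := map[uint64]string{99: "a", 199: "b", 299: "x"}
	idAt := func(n uint64) string { return ids[n] }
	cps := []epochCheckpoint{{99, "a", 1000}, {199, "b", 2100}, {299, "c", 3300}}
	fmt.Println(applicableEpochs(cps, 500, idAt)) // 2: the third ID does not match
}
```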
Snapshots are saved states of rendered maps created at block boundaries. They are only created during head rendering. If no suitable snapshot is available then head rendering can also be started at the previous map boundary.