Log indexer implementation overview

The purpose of the log indexer is to maintain the log index data structure and keep it consistent with the canonical chain. This structure has four components:

information about the currently indexed range
actual filter map data for a consecutive series of maps
pointers from each indexed block number to the first log value index it occupies in the linear index space
pointers from each rendered map to the number of the block that occupies the last log value index of the map

Last block of map pointers and the corresponding reverse direction block to log value index pointers are also stored for the last map of each epoch before the indexed range. During initialization these pointers are added based on a set of hardcoded checkpoints for the given chain (which is a workaround solution until EIP-7745 is passed and these pointers become provable from the latest head). Later if an indexed section gets unindexed then these pointers are retained. Indexing can be started from such a known epoch boundary.

Database representation

Indexed range

Information about the current state of the log index is stored in a single FilterMapsRange instance:

FirstIndexedBlock, AfterLastIndexedBlock: range in which blocks are fully indexed.
HeadBlockIndexed: if true then AfterLastIndexedBlock-1 is the head block of indexedView and it is fully rendered; if false then AfterLastIndexedBlock exists and is partially rendered.
HeadBlockDelimiter: if HeadBlockIndexed is true then HeadBlockDelimiter points to the next free log value index position where the block delimiter of the current head block will be rendered once the next block is added.
FirstRenderedMap, AfterLastRenderedMap: range in which filter maps are fully rendered. FirstRenderedMap is always the first map of an epoch.
TailPartialEpoch: rendered maps in the epoch before the one that starts with FirstRenderedMap. Tail epochs are always rendered staring with FirstRenderedMap - MAPS_PER_EPOCH.TailPartialEpoch is always less than MAPS_PER_EPOCH. If the last map of the tail partial epoch is rendered then it is merged to the main index, FirstRenderedMap is reduced by MAPS_PER_EPOCH and TailPartialEpoch is set to zero. FirstIndexedBlock is also updated when this merge happens and it is not affected by the partial epoch.

Note that leftover filter row and pointer entries may exist in the database outside the valid range after an unclean shutdown. These entries can be ignored and they are overwritten or removed when the given range gets properly rendered. If FilterMapsRange is missing from the database then the log index should be considered non-existent and any leftover filter row or pointer objects should be removed from the database before initialization.

Filter rows

In order to improve database size and access speed efficiency, base layer filter data (which is very often just a few bytes per row) is not stored under individual keys for each row of each map. Rows not longer than BASE_ROW_LENGTH and the first BASE_ROW_LENGTH entries of longer rows are stored in BASE_ROW_GROUP sized groups as tightly packed FilterMapBaseRows instances. Rows longer than BASE_ROW_LENGTH also have an individual FilterMapExtRow instance which contains the rest of the filter row entries.

Database keys for filter rows are constructed so that they are ordered by epoch first, then by row index, then by map index inside the epoch (in binary terms the row index is interjected in the middle of the map index). This is the same pattern as the one EIP-7745 defines for tree hashing, and for the same reason (efficient access for searching).

Last block of map

For each rendered map and the last map of each epoch before the rendered range, a FilterMapLastBlock instance is stored that consists of the block number and block ID of the block that occupies the last log value index of the map. Note that the block delimiter after each block is added when the next block is added, therefore if delimiter of block N falls to the last log value index of the map then the last block entry will point to block N+1. If the head block is N then the head delimiter is not added yet and the last block entry will point to the head block N.

Log value pointer of block

For each fully or partially indexed block and the last block of every epoch before the rendered range, a BlockLvPointer instance is stored that is a the first log value index occupied by the block. Note that though the block delimiter of block N-1 is generated when the block N is added, the BlockLvPointer points to the first log value index of block N. This also means that the first log value index pointer of the last block of map M may point to the beginning of map M+1.

Example

The figure below shows an example of a log index database with 4 indexed blocks plus an empty genesis block mapped onto 6 filter maps. The table below that shows the corresponding FilterMapsRange fields.

map index            0        1        2        3        4        5
                    +--------+--------+--------+--------+--------+--------+
block number        |01111111|11112222|22222222|33333333|34444444|44444   |
block delimiter     |*       |   *    |       *|        |*       |        |
tx index            | 0000111|222 0011|1112333 |11124444| 0022222|22333   |
log index           | 0011000|000 0000|0110000 |00000011| 0000011|11000   |
address/topic index | a0a0a01|a01 a0a0|1a0aa01 |a01aa0a0| a0a01a0|12a01   |
last block of map   |1       |2       |3       |3       |4       |4       |
block lv pointer    0 1      |    12  |        |24      | 33     |        |
                    +--------+--------+--------+--------+--------+--------+

Fig 1. An example of a log index database

VALUES_PER_MAP = 8

`FilterMapsRange` field	value
FirstIndexedBlock	0
AfterLastIndexedBlock	5
HeadBlockIndexed	true
HeadBlockDelimiter	45
FirstRenderedMap	0
AfterLastRenderedMap	6
TailPartialEpoch	0

Block identifiers

In the log index database blocks are referred to by block ID instead of block hash, even though at the moment the two are the same. The reason for the distinction is that once the log index root hash gets into the consensus block format, the log indexer should be able to work with a chain view where the latest block is the one currently being processed that does not have a block hash yet. After this fork the block ID will be a hash of the block without the log index root, providing a unique identification of blocks while also allowing indexing during block processing without any special cases in the indexer logic.

Constants

Name	Value
BASE_ROW_GROUP	2**5

Indexer logic

The indexer has two immutable views of the chain: the indexedView and the targetView. All stored data structures are consistent with the indexedView (even if the indexed data does not reach the head of the indexedView). The targetView is based on the latest canonical chain head. If these two are not equivalent then head indexing happens, which means that map rendering starts from the first map that is different according to the targetView. The map renderer generates a series of maps and corresponding pointers consistent with the targetView, then the new data is committed to the database and the indexedView is updated to the targetView (even if the new index does not reach the head of the new targetView yet).

When the target head is reached by the log index (the two views are equivalent and the index reaches the chain view head), tail indexing or unindexing happens if necessary. Tail indexing always starts in forward direction from the previous epoch boundary before the beginning of the indexed range and gets merged with the consecutive range of indexed maps once the tail partial epoch is fully rendered. Tail indexing is instantly interrupted and a new head indexing is started if the targetView changes and can be resumed once the new head is indexed.

Tail unindexing removes entire epochs (except for the last pair of pointers) and can also happen during head indexing according to the desired log history length. Unindexing is very quick (at least with pebble db) as it removes continuous key ranges from the database.

Immutable chain views

Interface chainView represents an immutable view of the indexed blockchain that provides block identifiers and belonging sets of receipts for each block number. It is currently implemented by StoredChainView based on an underlying blockchain but it can also be implemented by the block processor so that the last block is the currently processed one. limitedChainView shows a trimmed view of an underlying view and is only used internally during startup, in order to provide an indexedView that is a limited version of the initializing targetView that is guaranteed to be consistent with the existing index database. This ensures proper head re-indexing without adding special cases to the core indexer logic.

Map rendering

A map renderer is always instantiated by renderMapsBefore(afterLastMap uint32) which automatically determines the starting point where targetView diverges from indexedView and renders maps until before afterLastMap (which can be MaxUint32 in case of head rendering where the renderer will just replace/remove all previously existing maps after the starting point and keep rendering new maps until its logIterator is finished. If the renderer replaces the head map then it also updates indexedView to targetView. If afterLastMap is less than AfterLastRenderedMap of the existing indexed range (tail rendering) then indexedView is not replaced. If targetView diverges before the rendering limit then the renderer returns with an error.

The rendering can either start from a map boundary or a cached snapshot of the rendered map previously created on a block boundary. This allows head rendering starting after the previous head or even after short reorgs without re-rendering the entire latest map.

The renderer periodically calls a callback that can block or interrupt the rendering process. An interrupted process can be resumed later. The callback can also change targetView. The renderer checks whether the rendered data became obsolete when resuming a rendering process or returning from the callback and fails if this happens.

Log iterator

A logIterator iterates on the linear log value index space and provides the log value hashes used to place marks on the filter maps. It can be initialized at a block boundary if the first log value index is known (which can be known from a last block of map/epoch pointer or a recent snapshot). It can also be started at a map boundary by initializing it at the start of the last block of the previous map and then iterating it to the desired map boundary.

Checkpoints

Checkpoints are hardcoded lists of last block number, id and log value pointer triplets for each epoch of a given chain. These are added to the database at initialization. The applicable set of checkpoints and the last applicable epoch are determined automatically based on the initial targetView. If no checkpoints are available then indexing can be started from genesis.

Snapshots

Snapshots are saved states of rendered maps created at block boundaries. They are only created during head rendering. If no suitable snapshot is available then head rendering can also be started at the previous map boundary.

zsfelfoldi/log_index.md