Log indexer overview

This document gives a general overview of the new log indexer implementation.

mapDatabase

This structure is responsible for database access. Unlike the old implementation, this database structure only stores fully rendered maps (the last partially rendered map is simply re-rendered in memory at startup, which takes a fraction of a second). It can also mark certain maps dirty, allowing the layer above it to clean up asynchronously. Similarly to the old implementation, it handles four types of database entries:

  • map range: a single entry that contains information about the existing map entries. If it is missing then the database is considered uninitialized and everything in the filtermaps key range is deleted before initialization.
    • version: the database is considered uninitialized if the stored version does not match the version known by the client.
    • valid maps: row data and corresponding pointers for the maps in this map index rangeSet exist in the database and should be considered valid. The range set can consist of 0, 1 or 2 continuous sections, each starting at an epoch boundary. If there are 2 sections then the first one is a partially rendered tail epoch right before the start of the main range.
    • dirty maps: row data and corresponding pointers for the maps in this map index rangeSet might exist in the database but should be considered invalid. Note that the block log value pointers are indexed by block number, so the dirty map range does not directly translate to a dirty block range (the last block of map pointers at the endpoints of the dirty range might also be missing or invalid). The block range can always be determined at the valid maps boundaries though, so block pointers can be range deleted between the closest valid boundaries (or up to either end of the uint64 range if there are no more valid maps in that direction). This is taken care of in the mapStorage layer (see extendDeletedPointerRange).
    • known epochs: the number of epochs for which the last block of map pointer at the last map of the epoch and the corresponding block log value pointer are available. These should always be available for every epoch in or before the rendered range in order to allow rendering historical epochs. This number can usually be inferred from the valid maps range, except when the checkpoint initialization has already happened but there are no completely rendered maps yet.
  • last block of map pointers: similarly to the old implementation; one pointer per valid map, plus one at each known epoch boundary, each pointing to the block that generated the last log value of the map.
  • block log value pointers: similarly to the old implementation; one pointer per block, pointing to the first log value belonging to that block, stored for every block whose first log value falls in a valid map, plus at the known epoch boundaries.
  • row data: indexed by map index, row index and layer index. Mapping layers allow different row lengths. There are four different allowed lengths, specified in Params.maxRowLength. Row data is stored in four corresponding parts, each containing rowData[maxRowLength[i]:maxRowLength[i+1]]. The mapping logic allows more mapping layers but they have the same size as maxRowLength[3]. In order to make batch row reads at lower layers faster and also reduce the number of keys in the database, the row sections of the lower two layers are stored in horizontal groups (same row index, subsequent map indices). The size of these groups is specified in Params.rowGroupSize. Row data updates (both write and dirty cleanup) are handled in batches by mapDatabase.writeMapRows (see details in function description).

Note that the read operations getFilterMap, getFilterMapRows, getBlockLvPointer and getLastBlockOfMap do not care about maps marked dirty at this level; they just return whatever is in the database.
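
The map range entry described above translates to a fairly small metadata structure. A minimal Go sketch is shown below; all type and field names are illustrative assumptions, not the identifiers used by the actual implementation:

    package filtermapsketch

    // mapRangeSection is one continuous run of map indices; the valid maps range
    // set can hold 0, 1 or 2 of these, each starting at an epoch boundary.
    type mapRangeSection struct {
        first, afterLast uint32 // half-open range of map indices
    }

    // mapRangeEntry models the single "map range" database entry described above.
    type mapRangeEntry struct {
        version     uint32            // mismatch means the database is treated as uninitialized
        validMaps   []mapRangeSection // rendered maps whose data should be considered valid
        dirtyMaps   []mapRangeSection // maps whose data might exist but must be considered invalid
        knownEpochs uint32            // epochs with last block of map and block lv pointers available
    }

    // initialized reports whether the stored entry matches the version known by the
    // client; if not, everything in the filtermaps key range is deleted first.
    func (e *mapRangeEntry) initialized(clientVersion uint32) bool {
        return e != nil && e.version == clientVersion
    }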

mapStorage

This is a storage layer between mapDatabase and Indexer that hides the complexity arising from batch row data updates, row data groups and dirty map cleanup. It uses a memory layer over the database, introducing a third map index rangeSet called overlay. This allows mapStorage to implement addMap and deleteMaps with only in-memory operations and guaranteed low latency, allowing Indexer to be a simple passive structure. It implements getFilterMap, getFilterMapRows, getBlockLvPointer and getLastBlockOfMap similarly to mapDatabase, only with the dirty maps and memory overlay maps also considered, so that the results are instantly consistent with addMap and deleteMaps calls.
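
To make the layering concrete, the sketch below shows the lookup order implied by the description: the memory overlay masks the database, dirty maps read as missing, and everything else falls through to mapDatabase. The types and helpers are assumptions for illustration only:

    package filtermapsketch

    import "errors"

    type FilterRow []uint32 // illustrative stand-in for the real row type

    var errMapMissing = errors.New("filter map not available")

    // storageReadSketch models the three layers a mapStorage read has to consider.
    type storageReadSketch struct {
        overlay map[uint32]map[uint32]FilterRow // mapIndex -> rowIndex -> row (memory layer)
        isDirty func(mapIndex uint32) bool      // dirty range from the stored map range entry
        dbRow   func(mapIndex, rowIndex uint32) (FilterRow, error)
    }

    func (s *storageReadSketch) getFilterMapRow(mapIndex, rowIndex uint32) (FilterRow, error) {
        if rows, ok := s.overlay[mapIndex]; ok {
            return rows[rowIndex], nil // instantly consistent with addMap/deleteMaps
        }
        if s.isDirty(mapIndex) {
            return nil, errMapMissing // invalid on-disk data is never returned
        }
        return s.dbRow(mapIndex, rowIndex) // fully rendered, valid map on disk
    }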

It is the caller's responsibility to stop adding new maps if there are too many memory overlay maps in order to avoid excessive memory consumption. isReady() returns true if the number of memory maps is below the mtBusy threshold. It is guaranteed that mapStorage returns to the ready state automatically by committing memory maps to the underlying database.

mapStorage runs an event loop in a goroutine that repeatedly calls doWriteCycle until there is nothing else to do or the storage is in a suspended state. Its internal selectEvent(blocking bool) function blocks the loop whenever the suspended atomic variable equals 1 or when the last return value of doWriteCycle was false. Whenever a change happens that might unblock the blocked loop, trigger() is called. The suspended state is switched externally by the suspendOrResume(suspend bool) function. The indexer uses this feature to suspend the write loop during block processing (when Indexer.Suspend is called) and resume when Indexer.AddBlockData is called.
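
The sketch below models this loop pattern in simplified form; the real selectEvent/trigger mechanics and field names may differ:

    package filtermapsketch

    import "sync/atomic"

    type writeLoopSketch struct {
        suspended    atomic.Bool   // switched by suspendOrResume
        triggerCh    chan struct{} // buffered with capacity 1 so trigger() never blocks
        stopCh       chan struct{}
        doWriteCycle func() bool // returns false when there is nothing left to do
    }

    func (w *writeLoopSketch) run() {
        for {
            // keep writing while not suspended and there is still work to do
            for !w.suspended.Load() && w.doWriteCycle() {
            }
            select { // block until something might have changed, or until shutdown
            case <-w.triggerCh:
            case <-w.stopCh:
                return
            }
        }
    }

    // trigger wakes the blocked loop after any change that might unblock it.
    func (w *writeLoopSketch) trigger() {
        select {
        case w.triggerCh <- struct{}{}:
        default: // a wakeup is already pending
        }
    }

    // suspendOrResume is used by the indexer around block processing.
    func (w *writeLoopSketch) suspendOrResume(suspend bool) {
        w.suspended.Store(suspend)
        if !suspend {
            w.trigger()
        }
    }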

doWriteCycle does not always instantly start a write process whenever there are memory overlay maps available. If possible, it tries to collect a batch of memory maps, ideally a full row group according to Params.rowGroupSize[0], so that lower layer row groups can be committed with a single write operation. This typically happens when rendering historical data. addMap has a forceCommit flag which is true when the added map is the latest one; in this case a write cycle is triggered instantly. A write cycle always operates on a single epoch and ensures that the underlying database has a valid format at all times (valid maps are located in a continuous range starting at an epoch boundary, plus optionally another continuous range of a partially rendered tail epoch starting at the previous epoch boundary). This is important because mapStorage does not start new write operations at shutdown and even terminates the current write process when stop() is called. In order to keep mapDatabase valid even during the write operation, doWriteCycle first sets the written map range to dirty in the stored map range, then sets it to valid after all rows of the entire map index range have been updated.
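
The dirty-first ordering can be illustrated with a short sketch (the interface and function names below are assumptions, not the real mapDatabase API):

    package filtermapsketch

    // epochWriter captures just the operations needed to show the ordering.
    type epochWriter interface {
        markDirty(firstMap, afterLastMap uint32) error
        writeMapRows(mapIndex uint32, rows [][]uint32) error
        markValid(firstMap, afterLastMap uint32) error
    }

    // commitEpochSketch shows how a write cycle can keep the database format valid
    // even if it is interrupted: the written range is marked dirty first and only
    // becomes valid after every row of the range has been updated.
    func commitEpochSketch(db epochWriter, firstMap, afterLastMap uint32, rows func(mapIndex uint32) [][]uint32) error {
        if err := db.markDirty(firstMap, afterLastMap); err != nil {
            return err
        }
        for mapIndex := firstMap; mapIndex < afterLastMap; mapIndex++ {
            if err := db.writeMapRows(mapIndex, rows(mapIndex)); err != nil {
                return err // a partial write stays covered by the dirty marker
            }
        }
        return db.markValid(firstMap, afterLastMap) // only now does the range become valid
    }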

Write cycles are triggered under three conditions, in order of priority (a selection sketch follows the list):

  • triggered by reaching a row group boundary or forceCommit: whenever an addMap triggers a write cycle, the index of that epoch is added to the epochTrigger range set, then the event loop is resumed with a trigger() call.
  • forced by the total number of overlay maps reaching the mtForceWrite threshold. In this case the epoch with the largest number of overlay maps is selected.
  • an epoch with no maps to write (only dirty maps): this typically happens when tail epochs are unindexed, but a chain head reset or reorg can also trigger it.
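
A possible way to express this priority order is sketched below; the parameter names are hypothetical and the real selection logic inside doWriteCycle may differ:

    package filtermapsketch

    // selectWriteEpochSketch picks the next epoch to write according to the
    // priorities listed above.
    func selectWriteEpochSketch(
        epochTrigger []uint32, // epochs added by addMap on a row group boundary or forceCommit
        overlayCount map[uint32]int, // number of memory overlay maps per epoch
        totalOverlay, mtForceWrite int,
        dirtyOnly []uint32, // epochs with only dirty maps to clean up
    ) (epoch uint32, ok bool) {
        if len(epochTrigger) > 0 {
            return epochTrigger[0], true // explicit trigger has the highest priority
        }
        if totalOverlay >= mtForceWrite {
            var best uint32
            bestCount := 0
            for e, c := range overlayCount { // forced write: the largest epoch batch wins
                if c > bestCount {
                    best, bestCount = e, c
                }
            }
            if bestCount > 0 {
                return best, true
            }
        }
        if len(dirtyOnly) > 0 {
            return dirtyOnly[0], true // lowest priority: dirty-only cleanup
        }
        return 0, false // nothing to do; doWriteCycle returns false
    }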

memoryMap and finishedMap

There are two different representations of a map stored in memory. Both store row data and the corresponding pointers (the last block of map pointer and the block log value pointers that fall in the range of the given map); the last block pointer of the list always corresponds to the last block of the map. Row data is stored differently in the two representations, but both implement a getRow(rowIndex, maxLen uint32) FilterRow function.

memoryMap is a linked list structure that can be initialized as empty and implements addToRow(rowIndex, value uint32), making it suitable for rendering maps and representing a partially rendered map at the head of the index. Partially rendered maps are never stored on disk.

finishedMap can be created by memoryMap.finished() *finishedMap when the map is fully populated. It stores the data of each row contiguously and in row index order, so no more row entries can be added, but it requires significantly less memory and getRow is also faster. The low memory requirement of overlay maps is important for database efficiency because it allows collecting batches large enough to write horizontal row groups in a single write operation.
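
The shared read-side behavior of the two representations can be summarized as a small interface sketch (names follow the description above; the real types may expose more):

    package filtermapsketch

    type FilterRow []uint32 // illustrative stand-in, as in the earlier read sketch

    // mapReader is what both memoryMap and finishedMap provide to readers.
    type mapReader interface {
        getRow(rowIndex, maxLen uint32) FilterRow
    }

    // renderableMap is the extra capability of memoryMap: it starts out empty and
    // can be appended to while a map is being rendered; once fully populated it is
    // frozen (memoryMap.finished() above) into the compact, read-only finishedMap form.
    type renderableMap interface {
        mapReader
        addToRow(rowIndex, value uint32)
    }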

renderState and IndexView

These two structures both consist of a number of finishedMaps and a memoryMap at the end of a continuous rendered index range. renderState is used for rendering while IndexView provides an immutable view of a snapshot of the index. Both are created by Indexer and each can be initialized based on the other: storeHeadIndexView creates an IndexView snapshot based on the headRenderer *renderState, while initSnapshot can initialize the head renderer based on a snapshot. Note that renderState can also be initialized at a map boundary (see initMapBoundary) if there is no suitable snapshot available; this is the case at startup and after a long chain rewind/reorg. In case of a map boundary initialization, the first block that renderState receives is checked against an expected block hash (that of the last, probably partially rendered block of the previous map). If this hash does not match then Indexer reverts the last map and tries to initialize at an earlier boundary, ensuring a correct index even if the stored maps are inconsistent with the current canonical chain.
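
The fallback loop implied by the hash check could look roughly like the sketch below; both helper names are hypothetical and only illustrate the retry logic:

    package filtermapsketch

    import "github.com/ethereum/go-ethereum/common"

    type boundaryIniter interface {
        // initAtBoundary prepares a renderState at the given map boundary and returns
        // the expected hash of the first block to be delivered (the last, possibly
        // partially rendered block of the previous map).
        initAtBoundary(boundary uint32) (expected common.Hash, ok bool)
        // revertLastMap drops the last stored map and returns the new boundary.
        revertLastMap() (boundary uint32, ok bool)
    }

    // initConsistentBoundary keeps stepping back one boundary until the delivered
    // first block matches the expectation, so the index stays correct even if the
    // stored maps disagree with the current canonical chain.
    func initConsistentBoundary(b boundaryIniter, boundary uint32, firstBlockMatches func(expected common.Hash) bool) bool {
        for {
            expected, ok := b.initAtBoundary(boundary)
            if !ok {
                return false // no more boundaries to fall back to
            }
            if firstBlockMatches(expected) {
                return true
            }
            if boundary, ok = b.revertLastMap(); !ok {
                return false
            }
        }
    }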

renderState does not have a reference to the underlying storage as it is only responsible for creating new maps. The nextBlock field contains the block number of the next expected block. The required block data (receipts and header) can be delivered with the addReceipts(receipts types.Receipts) and addHeader(header *types.Header) (uint32, []*finishedMap) calls (in this order). Any maps completed while indexing the block are returned by addHeader and are not stored any further by renderState. If the entire renderRange has been completed then finished() bool returns true and adding more blocks has no effect (this is only relevant in case of tail rendering).
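
A hedged usage sketch of this feeding order is shown below; the interface mirrors the signatures quoted above but is not copied from the source, and finishedMapSketch is a placeholder for the real finishedMap:

    package filtermapsketch

    import "github.com/ethereum/go-ethereum/core/types"

    type finishedMapSketch struct{} // placeholder for the real finishedMap

    type blockFeeder interface {
        addReceipts(receipts types.Receipts)
        addHeader(header *types.Header) (uint32, []*finishedMapSketch)
        finished() bool
    }

    // feedBlock delivers one block to a renderState-like feeder: receipts first,
    // then the header. Maps completed while indexing the block are returned by
    // addHeader and must be handed to storage by the caller.
    func feedBlock(r blockFeeder, header *types.Header, receipts types.Receipts) []*finishedMapSketch {
        if r.finished() {
            return nil // the render range is complete; further blocks have no effect
        }
        r.addReceipts(receipts)
        _, completed := r.addHeader(header)
        return completed
    }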

IndexView does have a reference to mapStorage and provides the read functions GetFilterMapRows, GetBlockLvPointer and GetLastBlockOfMap similarly to the underlying storage. It provides an immutable view and has a fixed block range and map range, which are based on the tailEpoch (the oldest completely rendered epoch) and the last rendered head block at the time of the view's creation. The indexer takes care of not unindexing a tail epoch as long as it is part of an active IndexView. On the head end of the view range, IndexView has a fixed number of overlay maps, the head map being a memoryMap and finishedMaps before that. This ensures immutability as long as the valid maps of the underlying storage are not reverted deeper than what the view's overlay maps can mask. This could only happen under extreme conditions (a huge reorg or chain rewind) but if it does, the indexer invalidates the affected view, which then returns an error on any further read attempt. Since an IndexView requires some active maintenance by the indexer, each view should be released with the Release() function when no longer needed.

Indexer

Indexer implements the core.Indexer interface, through which it receives header and receipts data for new blocks, gets notified about reorgs and can also request data for older block ranges. Its target is to have a continuous indexed range starting from an epoch boundary just old enough that it covers the block range headNumber+1-config.History .. headNumber (or 0 .. headNumber if config.History is zero). It always has a headRenderer *renderState that is initialized either at genesis, at a matching checkpoint or after the latest valid map stored in the underlying mapStorage. tailRenderer *renderState is non-nil only when a tail epoch rendering is in progress.
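
The target computation is simple enough to sketch: the helper below only determines the oldest block that must stay covered; the actual tail is then the epoch boundary at or before that block (the function name is hypothetical):

    package filtermapsketch

    func targetFirstBlockSketch(headNumber, history uint64) uint64 {
        if history == 0 || headNumber+1 < history {
            return 0 // config.History is zero or larger than the chain: index from genesis
        }
        return headNumber + 1 - history // oldest block the index must cover
    }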

Most of the indexer functionality is realized in the AddBlockData call, which receives block data for new heads and requested historical ranges, and returns the new needBlocks requested historical range and the ready status flag (which reflects mapStorage.isReady()). This function performs multiple tasks (a simplified needBlocks sketch follows the list):

  • head block number tracking: whenever a header with a higher block number is received, headNumber is updated (AddBlockData can only increase it, Revert can decrease it).
  • checkpoint initialization: if headNumber > headRenderer.nextBlock and there is a potential checkpoint candidate between the two then its BlockNumber is requested in the needBlocks response. If such a block is delivered and its block hash matches the checkpoint, headRenderer is reinitialized at that checkpoint. If not, the checkpoint candidate is ruled out and not requested again.
  • feeding the head renderer: if headRenderer.nextBlock is delivered then the received receipts and header are added to headRenderer. If headNumber >= headRenderer.nextBlock (and there are no more checkpoint candidates) then headRenderer.nextBlock .. headNumber is requested in the needBlocks response.
  • feeding the tail renderer: if tailRenderer exists and tailRenderer.nextBlock is delivered then the received receipts and header are added to tailRenderer. If tailRenderer.finished() is true then it is removed. If headNumber+1 == headRenderer.nextBlock (chain head is indexed) then tailRenderer.nextBlock .. tailRenderLast is requested in the needBlocks response (tailRenderLast is the last block of the last map of the rendered tail epoch).
  • updating the tail state: when headNumber is updated, targetTailEpoch is also updated. If tail epochs can be removed (also considering the tail epochs of active IndexViews) then this is performed instantly (with mapStorage this is just a memory operation). If tailRenderer does not exist and more tail epochs are needed then a new tailRenderer is initialized at the previous epoch boundary. Note that tail rendering might also be limited by the history cutoff point set by SetHistoryCutoff.
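
The needBlocks logic implied by the list above can be sketched as follows; this is a simplification (checkpoint hash checking, reorg handling and other edge cases are omitted) and the parameter names are hypothetical:

    package filtermapsketch

    // blockRange is an inclusive range of block numbers.
    type blockRange struct {
        first, last uint64
    }

    // neededBlocksSketch derives the next historical request from the indexer state.
    func neededBlocksSketch(
        headNumber, headNext uint64, // chain head and headRenderer.nextBlock
        checkpointCandidate *uint64, // potential checkpoint block between the two, if any
        tailActive bool, tailNext, tailRenderLast uint64, // tail renderer state
    ) (blockRange, bool) {
        if checkpointCandidate != nil && headNumber > headNext {
            // check a potential checkpoint before rendering further head blocks
            return blockRange{*checkpointCandidate, *checkpointCandidate}, true
        }
        if headNumber >= headNext {
            // the head renderer is behind the chain head: request the missing range
            return blockRange{headNext, headNumber}, true
        }
        if tailActive && headNumber+1 == headNext {
            // the chain head is indexed: historical tail rendering can proceed
            return blockRange{tailNext, tailRenderLast}, true
        }
        return blockRange{}, false // nothing to request
    }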

GetIndexView(headBlockHash common.Hash) *IndexView returns (if available) an immutable snapshot view of the index corresponding to a canonical chain with the given head block hash. These snapshots are created when the chain head is indexed and are available for a fixed number of the most recently indexed chain heads. This greatly simplifies the search logic and will also be suitable for serving the new execution API, where the caller can specify a recent reference chain head. The same snapshots are used by Revert whenever possible (typically in case of short reorgs).
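
A usage-level sketch of working with these snapshots is shown below: look up a view for a reference chain head, search through it, and always release it so the indexer can stop protecting its tail epoch. The indexViewSketch interface is an assumption based on the description (the read functions and their exact signatures are omitted):

    package filtermapsketch

    import (
        "errors"

        "github.com/ethereum/go-ethereum/common"
    )

    var errNoSnapshot = errors.New("no index view available for this head")

    // indexViewSketch stands in for *IndexView; besides Release it would also expose
    // GetFilterMapRows, GetBlockLvPointer and GetLastBlockOfMap.
    type indexViewSketch interface {
        Release()
    }

    // withIndexView runs a search against the snapshot belonging to the given head
    // block hash and releases the view afterwards. getView stands in for
    // Indexer.GetIndexView.
    func withIndexView(getView func(common.Hash) indexViewSketch, head common.Hash, search func(view indexViewSketch) error) error {
        view := getView(head)
        if view == nil {
            return errNoSnapshot // head too old or not an indexed canonical head
        }
        defer view.Release()
        return search(view)
    }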
