eip: 7745
title: Light client and DHT friendly log index structure
description: An efficient, light client and DHT friendly replacement for block header bloom filters
author: Zsolt Felföldi (@zsfelfoldi)
discussions-to:
status: Draft
type: Standards Track
category: Core
created: 2024-07-17

Abstract

Replace the fixed 2048 bit log event bloom filters in block headers with a new data structure that can adapt to the changing number of events per block and consistently guarantee a sufficiently low false positive ratio.

The proposed structure maps all log entries onto a global linear index space and hashes them into a Merkle tree based on that index. It also contains a filter map for every fixed length section of the index space, hashed into the same Merkle tree. These are two dimensional sparse bit maps that allow searching for log address/topic patterns. Unlike the per-block bloom filters, they allow searching for specific events by accessing only a small portion of the entire dataset which can also be proven with a Merkle proof, making the search both efficient and light client friendly. They also provide exact position information for potential hits. Instead of signaling probable presence in a given block, potential hits derived from a filter map are pointers to the index space and can be directly used to look up the log at the given index from the Merkle tree of log entries.

The proposed structure can be efficiently used both for local search and for remote proof generation/verification, thereby simplifying implementation of provers and verifiers. It also allows validators that are not interested in either searching or proving logs to generate the log index root hash by maintaining a minimal log index state with a relatively small (hard capped) size.

Motivation

Adding logs has a significantly lower gas cost than writing to the state and should accordingly be less resource consuming. The original design of per-block bloom filters achieves this goal: there is no complex data structure like the state to update, and the set of logs emitted in each block is fully contained within the header and receipts belonging to that block. Logs mostly just have long term storage costs. On the other hand, searching logs in a long range of blocks is very expensive.

Bloom filters are only useful as long as they are sufficiently sparse. The false positive ratio rises rapidly with the number of events per filter and the density of 1 bits in the filter bit vector. In the existing bloom filter, each log address and topic sets 3 out of a fixed length of 2048 bits, which resulted in sufficiently sparse filters in the beginning, but with the increase of block gas limits the false positive ratio soon made the filter practically useless. Mainnet blocks currently add over 1000 log addresses and topics on average, and therefore the bloom filter size would need to increase about tenfold in order to achieve acceptable false positive rates again. This would raise the block header size to about 3 kilobytes. Even if the per-block bloom filters were raised to a sufficient size, log search would still require accessing the entire header chain. Searching just the most recent one year of history would cost over 6 gigabytes of data, not counting the access of actual logs where the bloom filters have a match. The current situation is even worse, requiring a large portion of the full block receipt sets to be accessed due to the high false positive rate of the bloom filters.

Specification

Terms and definitions

  • log value: either an address value or a topic value. Each LOG opcode adds one address value and 0..4 topic values. A log value is represented by a 32 byte hash which is calculated as sha2(address) or sha2(topic)
  • log value index: values are globally mapped to a linear index space, with a monotonically increasing log value index assigned to each added log value. The log values are added in the order of EVM execution (address value first, then the topic values) so the logs generated in each block and in each transaction of the block occupy a continuous range in the index space. A block delimiter is also added between blocks which has its own log value index and is added to the Merkle tree of log entries but not to the filter maps.
  • log entry: an SSZ encoded log event with position metadata (a LogEntry container) is added to the log_entries Merkle tree at the first log value index assigned to the event (the one assigned to the address value). The entries at indices assigned to topic values are left empty (a LogEntry with all zero fields). The position metadata contains the block number but not the block hash, since the block hash is not yet known while the block emitting the newly added logs is still being constructed. If needed, the block hash can be found in the block delimiter positioned after the logs emitted in the given block (except for the head block, which does not have a block delimiter yet but whose hash is always expected to be known to the prover/verifier).
  • block delimiter: a special entry in the log value index space that is placed between the logs emitted by each block. In order to fit into the fixed shape tree structure, its Merkle tree shape is identical to the LogEntry container, with the Log part being empty and the meta part being a BlockDelimiterMeta container instead of a LogMeta. The two are always distinguishable based on the dummy_value field, which goes in place of the log_in_tx_index of log entries and always has a value of 2**64-1. It references the block by number and hash. Each block delimiter is placed after the logs emitted by the referenced block when the next block is added (when the hash of the referenced block is already known), before the logs emitted by that block. This makes it easy to prove the log value index boundaries of a searched block range and also allows proving the hash of the blocks emitting the matching logs, which is required for filling out the position metainfo of the RPC response.
  • filter map: a MAP_WIDTH by MAP_HEIGHT sized sparse bit map intended to help searching for log values in a fixed VALUES_PER_MAP length section of the log value index space. Each log value is marked on the map at a row and column that depend on the log value index and the log value itself. Rows are sparsely encoded as a list of marked column indices (in strictly ascending order, which also coincides with the order of occurrence). Each map contains at most VALUES_PER_MAP marks and therefore the chance of false positives is kept at a constant low level.
  • filter epoch: a MAPS_PER_EPOCH sized group of consecutive filter maps stored in the hash tree in a way so that multiple rows of adjacent filter maps with the same row index can be efficiently retrieved in a single Merkle multiproof. The log value to row index mapping is constant during a single epoch but changes between epochs.

Consensus data format

Block headers

Beginning at the execution timestamp FORK_TIMESTAMP, execution clients MUST replace the logs_bloom field of the header schema with log_index_root which is the root hash of the LogIndex structure after adding the logs emitted in the given block.

The resulting RLP encoding of the header is therefore:

rlp([
    parent_hash,
    0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347, # ommers hash
    coinbase,
    state_root,
    txs_root,
    receipts_root,
    log_index_root,
    0, # difficulty
    number,
    gas_limit,
    gas_used,
    timestamp,
    extradata,
    prev_randao,
    0x0000000000000000, # nonce
    base_fee_per_gas,
    withdrawals_root,
    blob_gas_used,
    excess_blob_gas,
    parent_beacon_block_root,
])

Container types

class LogIndex(Container):
    epochs: Vector[LogIndexEpoch, MAX_EPOCH_HISTORY]
    next_index: uint64                                                # next log value index to be added

class LogIndexEpoch(Container):
    filter_maps: Vector[Vector[FilterRow, MAPS_PER_EPOCH], MAP_HEIGHT]
    log_entries: Vector[LogEntry, MAPS_PER_EPOCH * VALUES_PER_MAP]    # LogEntry containers at the first index of each log event, empty otherwise

type FilterRow = ProgressiveByteList[MAX_BASE_ROW_LENGTH * log2(MAP_WIDTH) // 8 * MAPS_PER_EPOCH, LAYER_COMMON_RATIO]   # assumes that MAP_WIDTH is a power of 256

class LogEntry(Container):
    log: Log
    meta: LogMeta

class LogMeta(Container):
    block_number: uint64
    transaction_hash: Root
    transaction_index: uint64
    log_in_tx_index: uint64

class Log(Container):
    address: ExecutionAddress
    topics: List[Bytes32, MAX_TOPICS_PER_LOG]
    data: ProgressiveByteList[MAX_LOG_DATA_SIZE, 4]

class BlockDelimiterEntry(Container):
    dummy_log: Log        # zero address and empty lists
    meta: BlockDelimiterMeta

class BlockDelimiterMeta(Container):
    block_number: uint64
    block_hash: Root
    timestamp: uint64
    dummy_value: uint64  # 2**64-1

ProgressiveByteList container

ProgressiveByteList[CAPACITY, COMMON_RATIO] is defined as a byte list container type that consists of multiple byte vectors of different sizes. The size of the first vector is 32 bytes (a single chunk) while the sizes of further vectors grow according to a geometric sequence with the specified common ratio (a power of 2), with a potential exception for the last vector which might be smaller (but still a power of 2) according to the total number of chunks required for realizing the specified capacity. As shown in the figure below, in the Merkle hashing scheme the smaller vectors are close to the root, allowing less hashing and shorter proofs if the actual list is significantly smaller than the maximum capacity or only the first part of the list needs to be proven.

ProgressiveByteList[3000, 4]

       V4  V5
        \  /
     V3  \/
      \  /
   V2  \/
    \  /
 V1  \/
  \  /
   \/  LEN
    \  /
     \/
    ROOT
    
V1: ByteVector[32]
V2: ByteVector[128]
V3: ByteVector[512]
V4: ByteVector[2048]
V5: ByteVector[512]

Fig 1. Merkle hashing scheme of the ProgressiveByteList container

In this example a list with a capacity limit of 3000 is realized using 5 vectors. Note that the last vector is smaller than what would follow in the geometric sequence because the total size of 32+128+512+2048+512 is already enough to store 3000 bytes.
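
The vector sizes for a given capacity and common ratio can be derived as sketched below. This is an illustrative helper (progressive_vector_sizes is not part of the spec), assuming the first vector is always a single 32 byte chunk and the last vector is rounded up to the smallest power-of-two size that still covers the remaining capacity; it reproduces the 32/128/512/2048/512 layout of the example above.

def progressive_vector_sizes(capacity, common_ratio):
    sizes = []
    total = 0
    size = 32                              # first vector is a single 32 byte chunk
    while total < capacity:
        remaining = capacity - total
        if remaining < size:
            # last vector: smallest power-of-two size that still fits the remainder
            size = 32
            while size < remaining:
                size *= 2
        sizes.append(size)
        total += size
        size *= common_ratio               # geometric growth with the common ratio
    return sizes

assert progressive_vector_sizes(3000, 4) == [32, 128, 512, 2048, 512]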

Log entries and block delimiters

Log events with position metadata are added to the log_entries tree at the log value index assigned to the address value of the log. This allows a simple Merkle proof of all fields of the eth_getLogs JSON-RPC response except for the blockHash which cannot be hashed into the log_entries tree at the time of block processing because the hash of the processed block is not known yet. Before adding entries of the next block, a BlockDelimiterEntry is also added with the previous block's number and hash.

The following table shows an example of mapping log entries and block delimiters onto the log value index space:

| Block number | Transaction index | Log index | Log event | Log value indices |
|---|---|---|---|---|
| 0 | | | block delimiter | 0 |
| 1 | 0 | 0 | Addr, Topic1, Topic2, Topic3 | 1, 2, 3, 4 |
| 1 | 0 | 1 | Addr, Topic1, Topic2, Topic3 | 5, 6, 7, 8 |
| 1 | 1 | 0 | Addr, Topic1, Topic2 | 9, 10, 11 |
| 1 | 1 | 1 | Addr, Topic1 | 12, 13 |
| 1 | 1 | 2 | Addr, Topic1, Topic2 | 14, 15, 16 |
| 1 | | | block delimiter | 17 |
| 2 | 0 | 0 | Addr, Topic1, Topic2, Topic3, Topic4 | 18, 19, 20, 21, 22 |

Filter map row encoding

Each row of the filter map is encoded as a series of little endian binary encoded column indices. With the proposed value of MAP_WIDTH = 2**24 this results in a simple and efficient encoding as a series of 3 byte values. The number of indices in a row may vary in a wide range. The total number of indices in a fully populated filter map is VALUES_PER_MAP minus the number of block delimiters which are not marked on the map. Though the average row size is about VALUES_PER_MAP // MAP_HEIGHT, the upper limit of individual row length is VALUES_PER_MAP.
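
For illustration, with the proposed MAP_WIDTH the encoding and decoding of a row could look like the following sketch (encode_row and decode_row are hypothetical helpers, not part of the spec):

BYTES_PER_COLUMN = 3   # log2(MAP_WIDTH) // 8 with MAP_WIDTH = 2**24

def encode_row(column_indices):
    return b"".join(c.to_bytes(BYTES_PER_COLUMN, "little") for c in column_indices)

def decode_row(row_bytes):
    return [int.from_bytes(row_bytes[i:i + BYTES_PER_COLUMN], "little")
            for i in range(0, len(row_bytes), BYTES_PER_COLUMN)]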

Proposed constants

| Name | Value |
|---|---|
| MAP_WIDTH | 2**24 |
| MAP_HEIGHT | 2**16 |
| VALUES_PER_MAP | 2**16 |
| MAPS_PER_EPOCH | 2**10 |
| MAX_EPOCH_HISTORY | 2**24 |
| MAX_BASE_ROW_LENGTH | 2**3 |
| LAYER_COMMON_RATIO | 2**4 |
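
For reference, the same constants written out as Python values, together with the consistency assumptions used by the mappings below (an illustrative sketch, not normative text):

MAP_WIDTH = 2**24
MAP_HEIGHT = 2**16
VALUES_PER_MAP = 2**16
MAPS_PER_EPOCH = 2**10
MAX_EPOCH_HISTORY = 2**24
MAX_BASE_ROW_LENGTH = 2**3
LAYER_COMMON_RATIO = 2**4

assert MAP_WIDTH % VALUES_PER_MAP == 0   # required by the column mapping
assert MAP_WIDTH == 256**3               # MAP_WIDTH is a power of 256, i.e. 3 bytes per column index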

Constructing the filter map

For each VALUES_PER_MAP long section of the log value index space a filter map is generated. These are fixed size MAP_WIDTH by MAP_HEIGHT sparse bit maps and each log value is marked on the map with a single bit being set to one. Block delimiters are not marked on the map, but otherwise the density of the maps is close to constant. The number of marks in a row (the length of the sparse encoded row) is referred to as "row length", not to be confused with the constant MAP_WIDTH.

Design goals

The proposed data structure is intended to represent a balance between the cost of adding items and accessing old ones. The filter maps have a fixed tree shape, making in-memory maintenance, Merkle hashing and DHT distribution easy to implement. Filter entries are sorted into rows based on content and position in a way that allows quick linear database access and size efficient Merkle proofs. The difficulties arising from certain types of events being much more frequent than others are also mitigated.

Update and maintenance costs are also limited as tree nodes are eventually finalized and the number of non-finalized non-empty nodes is always hard capped, ensuring moderate memory requirements. Initialization costs of the data structure at any point of the chain are also capped. The additional database storage cost of the filter maps is about 15% of the size of the actual logs. The Merkle tree structure also makes it easy to discard entire epochs along with the corresponding Merkle subtrees, making the implementation of history expiry of the log index simple.

Epochs and mapping layers

In order to allow efficient search of a certain log value in a long historical range, filter maps are organized into epochs, each consisting of a fixed MAPS_PER_EPOCH number of maps. In the most common case (when row density is around average or below) the row mapping stays the same during an entire epoch. The database and the hash tree should both be organized in a way that instead of putting all rows of a single map close to each other, rows with the same row index throughout an entire epoch are close to each other and therefore are cheap to access and/or prove with a Merkle proof.

In order to mitigate collisions in densely populated rows, the concept of mapping layers is introduced, meaning that if a certain row already has a certain number of entries then the row mapping is changed and very frequent log values are mapped into multiple rows. Initially, when a map is empty, every log value is mapped using mapping layer 0 or the "base layer". If a row reaches MAX_BASE_ROW_LENGTH then any further log values mapped onto that row in the base layer mapping will use layer 1 mapping instead. On layer 1 a different row is assigned to the same log value and the row length limit is increased to MAX_BASE_ROW_LENGTH * LAYER_COMMON_RATIO. If this row also reaches its limit, layer 2 mapping is used and so on. The row length limit increases exponentially until it reaches MAX_BASE_ROW_LENGTH * MAPS_PER_EPOCH, where it does not grow further. Note that a row filled at a certain mapping layer can be grown further on a higher layer. Different log values colliding in the same row on a certain layer are probably mapped to different rows on the next layer, which means that a very popular value might populate multiple rows (also very long ones) but an unlucky less popular one colliding with it on the base layer will probably just have to move one layer up. The search process is similar: if the searcher finds that the row belonging to the searched value is full according to the base layer limit, then it also has to check the next layer and so on, until it finds a non-full row.

If a row is longer than the limit according to the layer the searcher is looking at then it can safely ignore the extra entries assuming that they were added by another value on a higher layer. The ProgressiveByteList container makes it efficient to prove row data belonging to the lower layer even if there is much more data in the same row added on a higher layer.
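
With the proposed constants, the per-layer row length limit described above works out as follows (an illustrative sketch; max_row_length is not a name defined by this spec):

def max_row_length(layer_index):
    layer_factor = min(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
    return MAX_BASE_ROW_LENGTH * layer_factor

# layer 0: 8 entries, layer 1: 128, layer 2: 2048, layer 3 and above: 8192
# (MAX_BASE_ROW_LENGTH * MAPS_PER_EPOCH, where the limit stops growing)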

Row mapping

While the base layer row mapping stays the same for an entire epoch, higher layer mappings are changed more frequently. Each mapping change has a cost in terms of database access overhead and Merkle proof size overhead, and the epoch size is determined in a way that these overheads stay sufficiently low compared to the cost of accessing the actual useful data. On higher layers where the rows are longer, a more frequent remapping is possible because the useful data size per map is also higher. It is also desirable because a less frequent log value will then only suffer from colliding with longer rows for a shorter time.

The row index is calculated as follows (note that from_binary32 and to_binary32 use little endian encoding; the maximum row length per layer is MAX_BASE_ROW_LENGTH * layer_factor, as used in the pseudocode further below):

def get_row_index(map_index, log_value, layer_index):
    layer_factor = MIN(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
    row_frequency = MAPS_PER_EPOCH // layer_factor
    return from_binary32(sha2(log_value + to_binary32(map_index - map_index % row_frequency) + to_binary32(layer_index))[0:4]) % MAP_HEIGHT
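
Evaluating row_frequency from get_row_index with the proposed constants gives the following remapping periods per layer (a quick illustrative check, not part of the spec):

for layer_index in range(4):
    layer_factor = min(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
    print(layer_index, MAPS_PER_EPOCH // layer_factor)
# prints 0 1024, 1 64, 2 4, 3 1: the base layer mapping is fixed for a whole epoch
# while layer 3 and above are remapped on every map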

The following figure shows how log values are mapped to rows on different mapping layers. Each dot represents a map row and the numbers indicate the mapping layer on which the row has been assigned to the given log value. Note that it might happen that a higher layer mapping coincides with a lower layer mapping for the same value; this does not cause any problem though as the row is simply grown further on the higher layer. The search algorithm can also simply revisit the same row in a higher layer iteration if necessary and process the rest of the row that it first ignored.

map index        111111 1111222222222233 3333333344444444 4455555555556666
       0123456789012345 6789012345678901 2345678901234567 8901234567890123
      +----------------+----------------+----------------+----------------+
row 0 |2........2......|2...............|...2............|........2.......|
row 1 |........1111.2..|.....2..1111....|1111.....2...2..|2111..2......2..|
row 2 |0000000000000000|.2..2....2..2...|....2111....2...|..2.....11112...|
row 3 |....2..22...1111|..........2.1111|2.......2.2.1111|.......2...2....|
row 4 |.2..1111..2....2|1112..2.2....2..|0000000011110020|...21111.2....2.|
row 5 |...2........2...|...............2|.2...2.........2|.2...2..........|
row 6 |1111.22....2..2.|0020000000020000|.......2...2....|..........2.1112|
row 7 |..2.............|....1112......2.|..2...2.........|0000200000000000|
      +----------------+----------------+----------------+----------------+
            epoch 0          epoch 1          epoch 2          epoch 3

Fig 2. Row mapping of a single log value on different mapping layers

MAP_HEIGHT = 8
MAPS_PER_EPOCH = 16
LAYER_COMMON_RATIO = 4

Column mapping

The column mapping assumes that MAP_WIDTH is a multiple of VALUES_PER_MAP. The column index is calculated as follows:

def get_column_index(log_value_index, log_value):
    log_value_width = MAP_WIDTH // VALUES_PER_MAP                      # constant
    column_hash = fnv_1a(to_binary64(log_value_index) + log_value)     # 64-bit FNV-1A hash
    collision_filter = (column_hash // (2**64 // log_value_width) + column_hash // (2**32 // log_value_width)) % log_value_width
    return log_value_index % VALUES_PER_MAP * log_value_width + collision_filter

As shown in the figure below, this mapping practically assigns a log_value_width by MAP_HEIGHT rectangle to each log value index and ensures that each log value places exactly one mark in its own rectangle (the letters A-D represent different log values). This property also ensures that the log value index can be restored from the map index and column index; column indices never collide and keep the original order of log value indices, allowing efficient Merkle exclusion proofs of certain column indices in long rows.

column             11 1111 1111 2222 2222 2233
       0123 4567 8901 2345 6789 0123 4567 8901
      +---------------------------------------+
row 0 |.... .... .... .... .... .... .... ....|
row 1 |.... .... .... .... .... .C.. .... ....|
row 2 |.A.. .... ...A .... A... .... ...A ....|
row 3 |.... .... .... .... .... .... .... ....|
row 4 |.... .... .... .... .... .... .... ....|
row 5 |.... .B.. .... ...B .... .... .... ....|
row 6 |.... .... .... .... .... .... .... ..D.|
row 7 |.... .... .... .... .... .... .... ....|
      +---------------------------------------+

Fig 3. A single filter map with 8 entries of 4 different log values

MAP_WIDTH = 32
MAP_HEIGHT = 8
VALUES_PER_MAP = 8
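
The pseudocode in this document references a few helpers that are not spelled out. The following is an illustrative sketch of plausible definitions, assuming sha2 denotes SHA-256 and fnv_1a is the standard 64-bit FNV-1a hash; to_binary32/to_binary64/from_binary32 use little endian encoding as noted in the row mapping section:

import hashlib

def sha2(data):
    return hashlib.sha256(data).digest()           # 32 byte hash

def fnv_1a(data):
    h = 0xcbf29ce484222325                         # 64-bit FNV offset basis
    for b in data:
        h = ((h ^ b) * 0x100000001b3) % 2**64      # 64-bit FNV prime, truncated to 64 bits
    return h

def to_binary32(value):
    return value.to_bytes(4, "little")

def to_binary64(value):
    return value.to_bytes(8, "little")

def from_binary32(data):
    return int.from_bytes(data, "little")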

Updating the log index

The following pseudocode shows how to add log data of a new block to the log index:

# Add all log values emitted in the block to the log index; should be called even if the block is empty
def add_block_logs(log_index, block):
    if block.number > 0:
        # add block delimiter entry
        block_delimiter_meta = BlockDelimiterMeta(block_hash = block.parent_hash, block_number = block.number-1, timestamp = block.parent.timestamp, dummy_value = 2**64-1)
        block_delimiter_entry = BlockDelimiterEntry(meta = block_delimiter_meta)
        log_index.epochs[log_index.next_index // (VALUES_PER_MAP*MAPS_PER_EPOCH)].log_entries[log_index.next_index % (VALUES_PER_MAP*MAPS_PER_EPOCH)] = block_delimiter_entry
        log_index.next_index += 1
    # add log entries and mark log values on filter maps
    for tx_index, receipt in enumerate(block.receipts):
        tx_hash = sha3(block.transactions[tx_index])
        for log_in_tx_index, log in enumerate(receipt.logs):
            log_meta = LogMeta(transaction_hash = tx_hash, block_number = block.number, transaction_index = tx_index, log_in_tx_index = log_in_tx_index)
            log_entry = LogEntry(meta = log_meta, log = log)
            log_index.epochs[log_index.next_index // (VALUES_PER_MAP*MAPS_PER_EPOCH)].log_entries[log_index.next_index % (VALUES_PER_MAP*MAPS_PER_EPOCH)] = log_entry
            add_log_value(log_index, address_value(log.address))
            for topic in log.topics:
                add_log_value(log_index, topic_value(topic))
    block.log_index_root = hash_tree_root(log_index)

# Mark a single log value on the filter maps
def add_log_value(log_index, log_value):
    bytes_per_column = log2(MAP_WIDTH) // 8   # assumes that map width is a power of 256
    map_index = log_index.next_index // VALUES_PER_MAP
    epoch_index = map_index // MAPS_PER_EPOCH
    map_subindex = map_index % MAPS_PER_EPOCH
    column_index = get_column_index(log_index.next_index, log_value)
    row = []
    layer_index = 0
    while True:
        layer_factor = MIN(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
        max_row_length = MAX_BASE_ROW_LENGTH * layer_factor
        row_index = get_row_index(map_index, log_value, layer_index)
        row = log_index.epochs[epoch_index].filter_maps[row_index][map_subindex]
        if len(row) < max_row_length * bytes_per_column:
            break
        layer_index += 1
        max_row_length = MIN(max_row_length * LAYER_COMMON_RATIO, MAX_BASE_ROW_LENGTH * MAPS_PER_EPOCH)
    row.extend(to_binary32(column_index)[:bytes_per_column])   # append bytes_per_column bytes encoding the column index
    log_index.next_index += 1

def address_value(address):
    return sha2(address)

def topic_value(topic):
    return sha2(topic)

Finding potential matches

Whether a column index found in a relevant row is a potential match for the searched log value can be determined by restoring the log value index and then calculating the column index from it again, checking whether the quasi-random collision filter part matches the expected value for the given log value:

def get_log_value_index(map_index, column_index):
    log_value_width = MAP_WIDTH // VALUES_PER_MAP                        # constant
    return map_index * VALUES_PER_MAP + column_index // log_value_width

def is_potential_match(map_index, column_index, log_value):
    return get_column_index(get_log_value_index(map_index, column_index), log_value) == column_index

Iterating through all relevant mapping layers and corresponding rows is similar to how new log values are added. Filtering all potential matches from all relevant rows can be done with the following functions:

def get_potential_matches(log_index, map_index, log_value):
    matches = []
    bytes_per_column = log2(MAP_WIDTH) // 8   # assumes that MAP_WIDTH is a power of 256
    epoch_index = map_index // MAPS_PER_EPOCH
    map_subindex = map_index % MAPS_PER_EPOCH
    layer_index = 0
    while True:
        layer_factor = MIN(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
        max_row_length = MAX_BASE_ROW_LENGTH * layer_factor
        row_index = get_row_index(map_index, log_value, layer_index)
        row = log_index.epochs[epoch_index].filter_maps[row_index][map_subindex]
        for i in range(len(row) // bytes_per_column):
            column_index = from_binary32(row[i * bytes_per_column:(i + 1) * bytes_per_column])
            if is_potential_match(map_index, column_index, log_value):
                matches.append(get_log_value_index(map_index, column_index))
        if len(row) < max_row_length * bytes_per_column:
            break
        layer_index += 1
        max_row_length = MIN(max_row_length * LAYER_COMMON_RATIO, MAX_BASE_ROW_LENGTH * MAPS_PER_EPOCH)
    return matches
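
Building on get_potential_matches, a complete single-value search over a block range can be sketched as follows. This is illustrative only: first_index and last_index are assumed to be the log value index boundaries obtained from the block delimiter entries of the searched block range, and each returned index still has to be checked against the actual entry in the log_entries tree to filter out false positives.

def get_potential_matches_in_range(log_index, log_value, first_index, last_index):
    matches = []
    for map_index in range(first_index // VALUES_PER_MAP, last_index // VALUES_PER_MAP + 1):
        for value_index in get_potential_matches(log_index, map_index, log_value):
            if first_index <= value_index <= last_index:
                matches.append(value_index)
    return matches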

Initialization and minimal state

A Merkle tree updated with strictly monotonically increasing keys can be initialized with a Merkle branch of the next leaf to be added. If the keys are non-strictly monotonic (the same leaf can be updated multiple times) then the previous leaf value itself is also needed. In the case of LogIndexMinimalState, epochs are added strictly monotonically. The log_entries trees are also updated strictly monotonically, while the MAPS_PER_EPOCH sized subtrees belonging to each row index are updated non-strictly monotonically.

class LogIndexMinimalState:
    next_index: uint64                                                              # next free index where the block delimiter of the previous head will be added
    epoch_branch: Vector[Bytes32, log2(MAX_EPOCH_HISTORY)]                          # merkle branch of the epoch where next_index points
    filter_map_rows: Vector[FilterRow, MAP_HEIGHT]                                  # rows of the filter map where next_index points
    filter_map_branches: Vector[Vector[Bytes32, log2(MAPS_PER_EPOCH)], MAP_HEIGHT]  # merkle branches of each row of the filter map where next_index points
    log_entries_branch: Vector[Bytes32, log2(MAPS_PER_EPOCH * VALUES_PER_MAP)]      # merkle branch of log entry where next_index points

This structure consists of log2(MAX_EPOCH_HISTORY)+MAP_HEIGHT*log2(MAPS_PER_EPOCH)+log2(MAPS_PER_EPOCH * VALUES_PER_MAP) branch hashes and in total at most VALUES_PER_MAP column indices in MAP_HEIGHT rows. Assuming that the variable length of each filter row is encoded in two bytes, with the proposed constants this amounts to 32*(24+65536*10+10+16)+3*65536+2*65536 = 21300800 bytes. This number can be considered as a worst case hard cap on the amount of data required to initialize the log index data structure at any point.
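
A sketch reproducing the arithmetic above with the proposed constants:

branch_hashes = 24 + 65536 * 10 + (10 + 16)   # epoch branch + per-row branches + log_entries branch
column_index_bytes = 3 * 65536                # at most VALUES_PER_MAP column indices, 3 bytes each
row_length_bytes = 2 * 65536                  # 2 byte encoded length per row, MAP_HEIGHT rows
assert 32 * branch_hashes + column_index_bytes + row_length_bytes == 21300800   # ~21.3 MB hard cap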

Providing LogIndexMinimalState on request needs to be added to the sync protocol in order to allow freshly synced nodes to bootstrap. A validator/block producer that does not want to generate proofs of historical log index data and only wants to verify or generate consensus while keeping the memory overhead minimal can also discard old Merkle tree nodes after update and use this structure as its log index state.

Note that initialization is also possible on an epoch boundary, in which case the log index is very cheap to initialize (only requires the epoch branch) but in this case all block headers and receipts between the boundary and the current head are required to generate the last unfinished epoch.

Rationale

Log value index space

In each block a varying number of log values are emitted. In addition to inefficient search, another drawback of per-block fixed size bloom filters is the varying filter utilization leading to over-utilized filters giving many false positives in some blocks and/or wastefully under-utilized filters in some blocks. Block gas limits also tend to change significantly over the long term so any future-proof solution has to be able to adapt to the varying number of log values per block.

Mapping log values on their own linear index space ensures uniform filter utilization of identically structured filter maps. Compared to the alternative of constantly changing adaptive filter size, this approach greatly simplifies the storage and tree hashing scheme and the construction of Merkle multiproofs covering longer block ranges. It also allows mapping each address and topic value separately to consecutive indices and implementing specific address/topic pattern filters.

Log entries tree

Hashing entire logs along with position info into a tree greatly simplifies the remote proving/verifying process. There is no need to separately prove the canonicalness of block headers and the receipts referenced in them; everything can be proven with a Merkle proof of a single LogIndex structure, plus the BlockDelimiterMeta belonging to the same block number if the block hash is needed.

Storing the log_entries subtrees directly in their proposed merkleized format on disk is not really efficient though. This is not an issue for validators that want to maintain a minimal state; for them, updating the log_entries subtrees is really cheap as they only need to maintain a single Merkle branch pointing to the next log value index. Provers can implement log_entries efficiently by storing a subset of the Merkle tree nodes (the ones close to the root of each epoch's subtree) and generate the rest on demand based on the receipts that encode the same data in a more compact form.

Alternative filter structures considered

In a search structure of a constantly growing dataset there is typically a tradeoff between the cost of adding new data and the cost of searching the existing dataset. One extreme is just linearly storing the data, which is practically the case now with logs, with the bloom filters being mostly useless. The other extreme is one big Merkle tree with all log values ever used as keys and the list of all occurrences (possibly in a further merkleized format) as values. With billions of unique log values, adding new entries here is expected to have costs similar to that of the state, with multiple lookups and modifications/insertions at random places in a database on the order of magnitude of hundreds of gigabytes. Another issue where this is similar to the state is that removing old entries is hard and expensive. Adding logs is supposed to be cheaper than writing the state, so solutions between these two extremes were considered as potentially practical, with multiple smaller structures generated periodically.

One question considered was whether to add separate keys for each unique log value emitted in the given period, or to use a more compressed fixed size tree format where different log values might collide (though preferably not too many of them). The second option may also include some kind of small probabilistic filter information that can help filter out the occurrences of colliding log values without having to access/prove the entire logs belonging to them. This decision mostly boiled down to data access efficiency, both in terms of local disk access and remote Merkle proof size. Identically structured trees can be efficiently arranged in larger units (called epochs here), with values belonging to the same key in subsequent trees of an epoch located close to each other. This improves database access speed. It also allows smaller Merkle proofs with a series of leaves encoded together in an efficient format and internal nodes on only two boundary branches. Database writes are also efficient as the order of adding tree entries is not random and all the non-finalized parts of the tree can be kept in memory with a hard capped memory requirement.

The other design decision considered here was whether to hash entire logs into the list of log value occurrences or just store position info and have a separate tree of log entries. This does not necessarily affect local storage efficiency, which should probably only store position info in the local database anyway in order to avoid duplicating log data, but could still generate the hash tree based on the full log data. Though the separate filter maps and log entry trees do present some additional complexity, the second option was chosen because of the size of Merkle proofs proving matches of multiple log value patterns. Tests have shown that realistic log searches often yield a lot more matches for the individual log values themselves than for the pattern itself. Hashing entire logs into the occurrence lists would mean that the proof would have to include at least the root hashes of all the individual log value matches, while in the second case only the position index is needed, which is more than 10x smaller with the proposed parameters.

In conclusion, for the given application the fixed tree size approach with separate position info plus a probabilistic collision filter seemed the most appropriate. Since the log value position info can be conveniently merged with the collision filter, the whole structure can be imagined as a sparse bit map on which each search operation can be thought of as applying a mask to the bit map.

False positive rate

From the filter maps a set of potential matches can be derived for any block range and log value or pattern of log values. These matches can then be looked up in the corresponding log_entries trees and the actually matching logs can be added to the set of results. The design guarantees that the set of potential matches includes all actual matches and also has a consistently low random false positive rate.

False positives can happen when the quasi-random collision filter part of a column index accidentally matches the expected value even though it was generated by a log value other than the searched one. The chance of this happening is VALUES_PER_MAP / MAP_WIDTH per colliding entry in a row that is relevant for the search. Assuming that most entries in a map are different from the searched one and that entries are uniformly randomly distributed, the average number of colliding entries found in a relevant row is VALUES_PER_MAP / MAP_HEIGHT.

Though certain log values might be emitted a lot more than others and therefore the row index distribution might not be entirely uniform, periodical remapping of rows and using multiple mapping layers ensures that over a long enough search period random collisions with more frequent log values do even out. Mapping layers do have another consequence though: if any row has at least MAX_BASE_ROW_LENGTH entries then the search logic requires looking into another row that is mapped to the searched log value on the next mapping layer. The maximum possible number of such rows is VALUES_PER_MAP / MAX_BASE_ROW_LENGTH and therefore the chance of randomly hitting one is VALUES_PER_MAP / MAX_BASE_ROW_LENGTH / MAP_HEIGHT in the worst case. In this case an extra row has to be processed, with an extra chance of finding false positives. A collision with a frequent value at a certain mapping layer does not indicate a collision on the next layer though; therefore the expected number of entries in that row is no different from the first one. Having to process a third row would presume that the second one had at least MAX_BASE_ROW_LENGTH * LAYER_COMMON_RATIO entries. The chance of this happening after the first coincidence is practically negligible in the context of expected false positives.

The expected number of false positives for a single log value search can be estimated as VALUES_PER_MAP ^ 2 / MAP_WIDTH / MAP_HEIGHT * (1 + VALUES_PER_MAP / MAX_BASE_ROW_LENGTH / MAP_HEIGHT) per filter map. With the proposed constants this roughly equals 0.0044 false positives per map. As of March 2025 the average number of log values emitted in a mainnet block is slightly over 1000 while a filter map consists of 65536 log values. This gives a rough estimate of one false positive per 14000 blocks, which costs the searcher an extra lookup in a log_entries tree. The expected number of false positives in the entire chain history is around 1200.
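
A rough numeric sketch reproducing the estimate above (the per-block log value count is taken from the text, not computed here):

expected_fp_per_map = VALUES_PER_MAP**2 / MAP_WIDTH / MAP_HEIGHT * (1 + VALUES_PER_MAP / MAX_BASE_ROW_LENGTH / MAP_HEIGHT)
# = (2**32 / 2**40) * (1 + 1/8) ~= 0.0044
blocks_per_map = VALUES_PER_MAP / 1000                             # ~65 blocks per map at ~1000 log values per block
blocks_per_false_positive = blocks_per_map / expected_fp_per_map   # roughly 14000-15000 blocks per false positive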

Note that this is only true for a single value search while a typical pattern search requiring certain values on multiple positions has an exponentially lower false positive rate. For example if the pattern is [Addr, Topic1, Topic2] then three log value searches are performed and an actual log lookup is only necessary if the first search yields N, the second N+1 and the third N+2 simultaneously. If necessary, the rate can be easily reduced by using a higher MAP_WIDTH, at the cost of growing the size of encoded rows.

Backwards Compatibility

The existing log filter API (eth_getLogs, eth_newFilter, eth_getFilterLogs, eth_getFilterChanges) can be implemented with the new filter data structure. Applications relying on this API can operate without any change and with higher performance. Repricing the LOG opcode might be considered after performing benchmarks, but the extra processing cost is not significant while the extra storage cost is around 15%. Other than that the EVM is not affected in any way, as it only emits logs but does not directly access them.

Security Considerations

Safe access with a remote prover

In order to prove a complete set of matches matching a given search pattern in a given block range, the prover needs to

  • prove the log value index range that corresponds to the searched block number range by proving the block delimiter entries of first_block - 1 and last_block
  • prove the relevant rows of filter maps based on map index and row index (verifier can determine the relevant rows in advance based on the log values in the search pattern and the relevant log value index range)
  • prove the actual log entry belonging to any potentially matching log value index and also the block delimiter entry with the same block number if the blockHash of the log position info is needed

Since all three steps can be realized with Merkle proofs of the same LogIndex structure referenced in the block headers, any search with a remote prover is as safe as the client's knowledge about the chain head.

Deliberate false positive attacks

The design guarantees that false positive rates do even out statistically over several epochs, even in the case of random collisions with very frequent values, ensuring that an excessive amount of false positives will not make the bandwidth and processing costs of the search prohibitively high. All of this is true for random collisions only though, not deliberately created collisions. A deliberate attack on a certain important log value in order to raise its false positive rate cannot be ruled out entirely, since with a low amount of filter data generated per log value it is always possible to "mine" another value that generates colliding filter data. The column mapping used here makes this attack a lot harder though, since the column index depends on both the log value and the exact log value index, making this attack only possible for block creators, who are probably offered MEV rewards for much more lucrative manipulations of the transaction set than making the search of certain log events slightly more expensive.

Copyright

Copyright and related rights waived via CC0.
