The purpose of the log index search mechanism is to return a set of logs that might potentially match a given pattern. The main search function GetPotentialMatches
receives the following parameters:
- block range
- a set of addresses one of which has to match the log address (an empty set is interpreted as a wild card)
- a set of topics for each of at most 4 topic positions, one of which has to match the topic at the given position (an empty set is interpreted as a wild card)
GetPotentialMatches
constructs a pattern matcher based on the given addresses and topics and translates the block range into log value index range. Then it runs the pattern matcher on the filter maps covering the searched range, running multiple workers in parallel, each processing one epoch at a time. After evaluating the matcher it looks up the actual logs at the resulting potentially matching log value indices. Parallel worker results are collected and returned in chronological order.
Note that GetPotentialMatches
may still return false positives and the final filtering for actual matches is done by the filterLogs
function in eth/filters
.
The pattern matcher is a modular construction built from building blocks that all implement the matcher
interface (see below). For every individual log value a singleMatcher
is constructed that returns log value positions where the given log value might be found. A matchAny
is constructed from singleMatcher
child instances for the set of matching addresses and the sets of matching topics for each position. matchAny
returns log value positions where one of the children might have a match. Note that an empty matchAny
acts as a wild card. Finally a matchSequence
is constructed from matchAny
child instances that returns a potential match for log value index N
if the child matchers all returned a match for their position in the sequence (N
for the address, N+1
for the first topic, N+2
for the second topic and so on). Note that sequence matchers are often constructed from child matchers that give matches with a very different frequency. In these cases efficiency is achieved by adaptively adjusting the evaluation order to first evaluate the child that is cheapest to evaluate and/or gives empty sets of results most often so that the more expensive matchers do not need to be evaluated at all for most of the maps.
Pattern matcher example:
addresses = [A1, A2]
topics = [[T1], [T2, T3]]
The table below shows the occurences of A1, A2, T1, T2 and T3 in a single map, expected pattern matches shown in bold:
log value index | topic or address |
---|---|
565 | A1 |
566 | T1 |
1381 | A1 |
1382 | T1 |
1383 | T3 |
24578 | A2 |
24580 | T2 |
33402 | T2 |
45293 | A2 |
45294 | T1 |
45295 | T3 |
55322 | T1 |
The next table shows the result sets of individual matchers, false positives shown in italics:
matcher | results |
---|---|
singleMatcher(A1) | 565, 1381, 5567 |
singleMatcher(A2) | 24578, 45293 |
singleMatcher(T1) | 566, 1382, 45294, 55322 |
singleMatcher(T2) | 24580, 33402 |
singleMatcher(T3) | 1383, 12343, 45295 |
matchAny(A1, A2) | 565, 1381, 5567, 24578, 45293 |
matchAny(T1) | 566, 1382, 45294, 55322 |
matchAny(T2, T3) | 1383, 12343, 24580, 33402, 45295 |
matchSequence | 1381, 45293 |
The matcher backend (represented by the MatcherBackend
interface) provides access to the log index data. It is implemented by FilterMapsMatcherBackend
that serves log index data directly from the log indexer's local database but it can also be implemented using a remote provider and local proof verification. Since the underlying data source is updated as the chain progresses, the interface does not assume that the returned data stays consistent with a single view of the chain throughout the search process. Instead it provides the SyncLogIndex
function which lets the searcher know the block range that wasn't modified during the search and therefore the results in that range can be considered correct. If this guaranteed valid range is shorter than desired then the filter mechanism starts another search request for the invalidated range.
A matcher that implements the matcher
interface is constructed for a specific search pattern. Its newInstance
function creates a matcherInstance
looking for matches in a given set of filter maps. The instance can run in multiple getMatchesForLayer
iterations, in each iteration processing data on the next mapping layer and returning a list of matching log value indices for those maps where it has already found all matches. The set of maps processed by a single matcherInstance
typically covers one epoch or the interesting subset of it, except for matchSequence
which in some cases also requests the first map of the next epoch from its child matcher instances.
The dropIndices
function of matcherInstance
signals to the instance that the results for the given maps are no longer required, should not be processed further in subsequent layer iterations and the available partial results can be discarded. This per-layer evaluation method with optional abort is useful for matchSequence
which does not always know in advance which one of its child matchers will evaluate in multiple expensive iterations and which one will quickly return an empty set of results, allowing the evaluation of the other, expensive children to be aborted after processing the first (still cheap) layer. Though this interface adds some extra complexity, in some cases it can improve search efficiency by an order of magnitude.
Results for finished maps are returned in a matcherResult
which consists of a map index and a corresponding potentialMatches
list. getMatchesForLayer
should be called for more layer iterations until results for all requested maps that have not been dropped during the process are returned. Note that an empty set of results is represented as a zero-length potentialMatches
list while nil
value has a special purpose; it represents a "wild card" result meaning that the entire map range is a potential match. A zero length matchAny
returns wild cards for all maps. This simplifies matchSequence
construction with wild cards at certain positions. An all wild card matcher is not allowed (GetPotentialMatches
fails if the top level matcher returns one) as it would require an extra special case to process and it would also be much slower to look up every log by index than to simply revert to unindexed log search.
A singleMatcher
creates singleMatcherInstance
instances that can filter for a single log value. The instance maintains a set of requested maps and their partial results if available. In each mapping layer iteration getMatchesForLayer
fetches the appropriate row of the requested maps from the matcher backend, filters out potential matches from the list of column indices and adds them to the partial results. If a row had more entries than the maximum row length for the given layer then the list of entries is truncated and extra entries are ignored. If the row had fewer entries than the maximum then the final results are returned for the corresponding map and the map is removed from the requested set. dropIndices
also removes the specified maps from the requested set and removes partial results.
matchAny
is constructed from a list of child matchers and creates matchAnyInstance
instances that contain a similar list of instances of the child matchers, each initialized with the same set of requested maps. getMatchesForLayer
always requests results from all children. matchAnyInstance
keeps track of any results returned by children until all children have returned results for a certain map. When this happens, it returns the merged results. dropIndices
discards existing results of the specified maps and drops the same indices from all children.
matchSequence
is constructed from two child matchers: a base
and next
matcher. matchSequenceInstance
contains the corresponding child instances. getMatchesForLayer
evaluates the two children in an order determined by a heuristic function that approximates the cost of evaluation and the chance of empty result sets from each child. After evaluating each child, map indices are dropped from the other child if the results returned from the recently evaluated child guarantee that no possible result from the other would yield a combined result from matchSequenceInstance
. A sequence matched result N
is returned if baseInstance
returned potential match N
and nextInstance
returned N+offset
where offset
is a fixed parameter of matchSequence
. Note that though matchSequence
only handles two children in order to reduce code complexity, the newMatchSequence
function can recursively construct a sequence pattern matcher from any number of child matchers.
baseInstance
is instantiated with the same set of requested maps as matchSequenceInstance
. On the other hand, for each map M
requested from matchSequenceInstance
, both maps M
and M+1
are requested from nextInstance
because baseInstance
results for the last few log value indices of map M
might have to be matched against nextInstance
results for the first few log value indices of map M+1
. Similarly, dropIndices
always drops all specified map indices from baseInstance
but only drops map M
from nextInstance
if map M-1
is also either dropped or the baseInstance
results for map M-1
are known and do not contain any log value indices from the end of the map that could potentially match with anything from map M
.