Answer to comments on the uBlock Origin commit https://github.com/gorhill/uBlock/commit/5733439f629da948cfc3cae74afa519f6cff7b7f, as it seems I do not have permission to comment there.
Hi,
First of all, I'd like to personally thank you for all the work you do on uBlock Origin and other extensions, whose source code has been an inspiration to me many times in the past.
I am also really excited that there are multiple people pushing for more accurate measurements of the efficiency of content-blockers and I think sharing methodologies, data and results is a great start!
It is interesting that the results you obtained diverge from the study published yesterday. If I understand correctly you got similar timings for uBlock Origin itself, but the numbers for Adblock Plus do not seem to match (45µs instead of ~19µs). I'd really like to understand where this difference could come from.
The setup we used for the (synthetic) benchmark was the following:
- The version of uBlock Origin we used was commit 29b10d215184aef1a9a12b715b47de9656ecdc3c
- The version of Adblock Plus we used was commit 34c49bbf029e586226220c067c50cec6e8bf8842 of the adblockpluscore repository
- The code used to run the benchmark for Adblock Plus is the following: https://github.com/cliqz-oss/adblocker/blob/master/bench/comparison/adblockplus.js
We initialized an instance of the CombinedMatcher class using all the network filters (as seems to be the case in the extension), then used the matchesAny method of the matcher as the entry point. Moreover, the parsing of the URLs was performed using tldts and was not included in the measurement. It could be that the parsing and preparation of requests in Adblock Plus is less efficient than in uBlock Origin (which I know is extremely efficient).
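For reference, here is a minimal sketch of how the measurement was structured. This is a simplification, not the exact code from the linked benchmark file: filterLines and rawRequests are assumed inputs, and the exact adblockpluscore signatures at that commit may differ.

```js
// Sketch of the benchmark setup (simplified; see the linked file for
// the actual code). `filterLines` and `rawRequests` are assumed inputs.
const { Filter, RegExpFilter } = require('adblockpluscore/lib/filterClasses');
const { CombinedMatcher } = require('adblockpluscore/lib/matcher');
const tldts = require('tldts');

// 1. Build one CombinedMatcher holding all of the network filters.
const matcher = new CombinedMatcher();
for (const line of filterLines) {
  const filter = Filter.fromText(line);
  if (filter instanceof RegExpFilter) { // keep network filters only
    matcher.add(filter);
  }
}

// 2. Parse URLs with tldts up front, outside of the measured region.
const requests = rawRequests.map(({ url, frameUrl, typeMask }) => ({
  url,
  typeMask, // request type bitmask, per adblockpluscore conventions
  docDomain: tldts.getHostname(frameUrl),
}));

// 3. Time only the calls to matchesAny().
const t0 = process.hrtime.bigint();
for (const { url, typeMask, docDomain } of requests) {
  matcher.matchesAny(url, typeMask, docDomain, /* thirdParty */ true);
}
const elapsedUs = Number(process.hrtime.bigint() - t0) / 1e3;
console.log(`avg ${(elapsedUs / requests.length).toFixed(2)} µs per request`);
```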
The focus of the study was specifically on the network matching engine of the content-blockers, and it seems likely that other parts of the extensions are introducing overhead. That's why I really like the in-browser measurement you have set up in uBlock Origin. In the end, I guess all of these measurements can be valuable in some way.
@remusao, I consider there is a flaw in the benchmark code used to measure the matching algorithm.
I can't speak for other filtering engines, but in uBO there is initialization work which is done in a lazy manner, at match() time if required. This initialization work occurs only once, and thus my opinion is that the one-time initialization work done at match() time should not influence measurement of the match() algorithm.
I believe the new engine created here should first be warmed up by making it go through all the data, and only then should measurement proceed.
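A sketch of what such a warm-up pass could look like; engine.match() and requests are placeholders for illustration, not the actual benchmark API:

```js
// Hypothetical warm-up: run every request through the engine once,
// untimed, so one-time lazy initialization happens before measurement.
for (const request of requests) {
  engine.match(request);
}

// Measure only now that all lazily built structures exist.
const t0 = process.hrtime.bigint();
for (const request of requests) {
  engine.match(request);
}
const elapsedUs = Number(process.hrtime.bigint() - t0) / 1e3;
console.log(`avg ${(elapsedUs / requests.length).toFixed(2)} µs per request`);
```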
Edit 1: Quick local changes to add code to warm up the engine before measurements; it does seem warming up yields better results for cliqz and ublock (I didn't check the others).

Edit 2: Never mind. Thinking more about it, I suppose it makes sense to measure these lazy initializations: given that the benchmark contains over 220K URLs to match, this would be the equivalent of a valid browsing session in the real world.