
I've been working on optimizing the YARA compiler to generate better bytecode for loops. The goal is to skip as much of each loop as possible by not iterating any further once the loop condition is met. Here's the rule I'm using. It's completely contrived and excessive, but it shows the performance improvement:

wxs@wxs-mbp yara % cat rules/test.yara
rule a {
  condition:
    for any i in (0..100000000): (i == 1)
}
wxs@wxs-mbp yara %

Take the compiler out of the measurement by pre-compiling the rule, then run it a few times:

wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
        4.94 real         4.91 user         0.02 sys
a /dev/null
        4.88 real         4.85 user         0.01 sys
a /dev/null
        4.89 real         4.87 user         0.01 sys
a /dev/null
        4.97 real         4.95 user         0.01 sys
a /dev/null
        4.82 real         4.79 user         0.02 sys
wxs@wxs-mbp yara %

Somewhere just under 5 seconds to run that (horrible) rule.
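To put that number in perspective, here is a rough back-of-the-envelope calculation over the five timings above. It assumes the unoptimized loop evaluates the expression for every value in the inclusive range 0..100000000 and that process startup is negligible next to the loop itself; both are assumptions, not something measured here.

```python
from statistics import mean

# "real" times in seconds, copied from the five runs above
real_times = [4.94, 4.88, 4.89, 4.97, 4.82]

# YARA ranges are inclusive, so (0..100000000) has 100,000,001 values
iterations = 100_000_000 + 1

mean_real = mean(real_times)

# Assumption: every value in the range is evaluated and startup cost is negligible
ns_per_iteration = mean_real / iterations * 1e9

print(f"mean real time: {mean_real:.2f} s")
print(f"approx. cost per iteration: {ns_per_iteration:.0f} ns")
```

That works out to roughly 49 ns per iteration, which is cheap on its own but adds up over a hundred million iterations.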

Here is the same thing with my loop optimization branch. All this branch does is stop running the expression inside the loop as soon as the condition is met. In our rule, that is as soon as the expression evaluates to true once. We have to recompile the rule, since my patch modifies the bytecode emitted by the compiler.

wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
        0.02 real         0.00 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
a /dev/null
        0.02 real         0.00 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
wxs@wxs-mbp yara %
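The difference between the two runs can be modeled with a small Python sketch. This is not YARA's bytecode or the actual patch; the function names and the `needed` parameter are hypothetical, and the range is scaled down from the rule's so the example runs quickly. It only illustrates the idea of stopping once the required number of matches has been seen (one match for `any`).

```python
from typing import Callable, Iterable, Tuple


def eval_loop_exhaustive(values: Iterable[int],
                         predicate: Callable[[int], bool],
                         needed: int) -> Tuple[bool, int]:
    """Evaluate the predicate for every value, then check the match count."""
    matches = 0
    evaluations = 0
    for v in values:
        evaluations += 1
        if predicate(v):
            matches += 1
    return matches >= needed, evaluations


def eval_loop_short_circuit(values: Iterable[int],
                            predicate: Callable[[int], bool],
                            needed: int) -> Tuple[bool, int]:
    """Stop iterating as soon as `needed` matches have been seen."""
    matches = 0
    evaluations = 0
    for v in values:
        evaluations += 1
        if predicate(v):
            matches += 1
            if matches >= needed:
                # The outcome is decided; the remaining values cannot change it.
                return True, evaluations
    return matches >= needed, evaluations


# Model of `for any i in (0..N): (i == 1)` with a scaled-down, inclusive range.
N = 1_000_000
slow = eval_loop_exhaustive(range(0, N + 1), lambda i: i == 1, needed=1)
fast = eval_loop_short_circuit(range(0, N + 1), lambda i: i == 1, needed=1)

print("exhaustive:   ", slow)
print("short-circuit:", fast)
```

Both versions return the same result, but the short-circuit version evaluates the expression only twice (for `i = 0` and `i = 1`) instead of once per value in the range. That matches the shape of the timings above, where the optimized run is dominated by fixed startup cost rather than by the loop.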

What impact does this have on real-world rules? I'm collecting some data right now, but if you have rules with loops that could run many times, I'd love to see them, along with a handful of samples that match, so I can benchmark some more!
