import platform
import yara
print(f"Platform version: {platform.version()}")
print(f"Python version: {platform.python_version()}")
print(f"YARA version: {yara.YARA_VERSION}")
rules = yara.compile(source='rule a { strings: $a = "foo" fullword condition: $a }')
for c in range(256):
#include <ctype.h>
#include <stdio.h>
int main(void) {
    for (int i = 0; i <= 255; i++)
        printf("0x%02x %u\n", i, !isalnum(i));
    return 0;
}
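YARA's documentation describes `fullword` as matching only when the string is delimited by non-alphanumeric characters, which is the property the C program above tabulates with `!isalnum(i)`. A minimal Python sketch of the same table follows. It assumes the ASCII definition of "alphanumeric" (what `isalnum` gives in the default "C" locale) and is not taken from YARA's source; the name `is_delimiter` is made up for illustration.

```python
def is_delimiter(byte_value: int) -> bool:
    """Return True if the byte is NOT an ASCII letter or digit.

    Assumption: ASCII-only classification, mirroring !isalnum() in the
    default "C" locale. This is a model, not YARA's own code.
    """
    return not bytes([byte_value]).isalnum()


# Build the same 256-entry table the C program prints.
table = {c: is_delimiter(c) for c in range(256)}

# Same output layout as the C version: hex byte value, then 0 or 1.
for c, flag in table.items():
    print(f"0x{c:02x} {int(flag)}")
```

Comparing this table against what the compiled rule actually matches for each byte value is one way to check whether a given YARA build agrees with the ASCII assumption.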
import platform
import yara
print(f"Platform version: {platform.version()}")
print(f"Python version: {platform.python_version()}")
print(f"YARA version: {yara.YARA_VERSION}")
r = """
Test rules:
wxs@wxs-mbp yara % cat rules/test.yara
rule b {
  strings:
    $a = "LSCOLORS"
  condition:
    $a
}
One way to find PE files that start at offset 0 and have a single byte xor key:
rule single_byte_xor_pe_and_mz {
  meta:
    author = "Wesley Shields <[email protected]>"
    description = "Look for single byte xor of a PE starting at offset 0"
  strings:
    $b = "PE\x00\x00" xor(0x01-0xff)
  condition:
wxs@wxs-mbp yara % cat rules/sets.yara
rule a0 { condition: false }
rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule c { condition: 2 of (a*) }
rule d { condition: 50% of (a*) }
rule e { condition: 1 of (a1) }
rule f { condition: all of (a1, e) }
wxs@wxs-mbp yara %
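To make the `N of (rule set)` conditions in rules/sets.yara easier to reason about, here is a rough Python model of how such a quantifier could be evaluated. It rests on assumptions that should be checked against the YARA documentation: a trailing `*` expands to previously defined rules whose names share the prefix, and a percentage is rounded up to a whole number of rules. All function names are hypothetical, and this is not YARA's implementation.

```python
import math


def expand(patterns, results):
    """Expand names and trailing-* wildcards against already-evaluated rules."""
    names = []
    for pattern in patterns:
        if pattern.endswith("*"):
            names += [n for n in results if n.startswith(pattern[:-1])]
        else:
            names.append(pattern)
    return names


def threshold(quantifier, total):
    """Translate a quantifier into the number of rules that must match."""
    if quantifier == "all":
        return total
    if isinstance(quantifier, str) and quantifier.endswith("%"):
        # Assumption: percentages round up to a whole rule count.
        return math.ceil(total * int(quantifier[:-1]) / 100)
    return int(quantifier)


def evaluate(rules):
    """Evaluate rules in file order; a condition is a bool or (quantifier, patterns)."""
    results = {}
    for name, condition in rules:
        if isinstance(condition, bool):
            results[name] = condition
        else:
            quantifier, patterns = condition
            members = expand(patterns, results)
            matched = sum(results[m] for m in members)
            results[name] = matched >= threshold(quantifier, len(members))
    return results


# The rules from rules/sets.yara, transcribed into the model.
sets_rules = [
    ("a0", False),
    ("a1", True),
    ("b", (1, ["a*"])),
    ("c", (2, ["a*"])),
    ("d", ("50%", ["a*"])),
    ("e", (1, ["a1"])),
    ("f", ("all", ["a1", "e"])),
]

model_results = evaluate(sets_rules)
print(model_results)
```

Under this model, `b`, `d`, `e`, and `f` come out true and `c` comes out false. That is the model's output, not a captured YARA run.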
This started with a tweet from Steve Miller (https://twitter.com/stvemillertime/status/1508441489923313664) in which he asked which is better for performance: 1 rule with 10k strings or 10k rules with 1 string each? Based on my understanding of YARA, I guessed it wouldn't matter for search time and that any difference in bytecode evaluation would be in the noise. Effectively, I guessed you would not be able to tell the two apart.
Costin was the first to provide actual results, and he reported a difference of 35 seconds vs 31 seconds between the two (https://twitter.com/craiu/status/1508445059129163783). That didn't make much sense to me, so I asked for his rules so I could test them. He provided me with two rules files (10k.yara and 10kv2.yara) and a text file with a bunch of strings in it.
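For readers who want to try the comparison themselves, the two layouts can be generated with a short script. The following is only a sketch: it does not reproduce the contents of 10k.yara or 10kv2.yara, the placeholder strings are invented, and the function and rule names are hypothetical.

```python
def one_rule_many_strings(strings):
    """Build a single rule that carries every string."""
    lines = ["rule many_strings {", "  strings:"]
    for i, s in enumerate(strings):
        lines.append(f'    $s{i} = "{s}"')
    lines += ["  condition:", "    any of them", "}"]
    return "\n".join(lines) + "\n"


def many_rules_one_string(strings):
    """Build one rule per string."""
    rules = []
    for i, s in enumerate(strings):
        rules.append(
            f"rule r{i} {{\n"
            f"  strings:\n"
            f'    $a = "{s}"\n'
            f"  condition:\n"
            f"    $a\n"
            f"}}\n"
        )
    return "".join(rules)


# Placeholder strings (an assumption): plain identifiers, so no escaping is needed.
placeholder_strings = [f"string_{i:05d}" for i in range(10_000)]

single_rule_source = one_rule_many_strings(placeholder_strings)
many_rules_source = many_rules_one_string(placeholder_strings)

print(single_rule_source.count("$s"), "strings in one rule")
print(many_rules_source.count("rule r"), "rules with one string each")
```

Each source string can be written to a file and scanned with the `yara` command line tool, or compiled with `yara.compile(source=...)`, against the same input to compare timings.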
This is my attempt to replicate his findings and also to document why he was seeing the warning he got. Because I wanted the run to take a bit of time, I ended up not using his text file with all the strings (it