Here's an example of how part of yrrc works, starting with these rules:
wxs@wxs-mbp yrrc % cat rules/test.yara
rule a {
    meta:
        sample = "24c422e681f1c1bd08286c7aaf5d23a5f088dcdb0b219806b3a9e579244f00c5"
    condition:
        true
}
rule b {
    meta:
        sample = "24c422e681f1c1bd08286c7aaf5d23a5f088dcdb0b219806b3a9e579244f00c5"
    condition:
        true
}
rule c {
    meta:
        sample = "24c422e681f1c1bd08286c7aaf5d23a5f088dcdb0b219806b3a9e579244f00c5"
    condition:
        false
}
rule d {
    meta:
        sample = "foo"
    condition:
        true
}
wxs@wxs-mbp yrrc %
From these rules we can see that the "24c422e" sample is supposed to match rules "a", "b", and "c". The fact that rule "c" can never match (its condition is simply false) is not important for this step. Remember, yrrc is only meant to make sure that rules which have a hash in their metadata actually match that hash. If a rule does not match, either the rule is incorrectly written or a YARA regression happened.
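Here's a rough Python sketch of reading that relationship out of the rule text: it pulls each rule's name and its "sample" meta value. This is an illustration only, not yrrc's actual code, and the regular expressions are a hypothetical simplification that assumes rules laid out like the ones above (a "meta:" section before "condition:"). Comments, imports and other YARA syntax would need a real parser.

```python
import re

# Hypothetical, simplified patterns: assume "rule NAME ... {" followed by a
# meta section and then "condition:". Not a full YARA grammar.
RULE_RE = re.compile(r"rule\s+(\w+)[^{]*\{(.*?)\bcondition\s*:", re.S)
# Match a "sample = ..." meta entry, but not a "$sample" string identifier.
SAMPLE_RE = re.compile(r'(?<![\w$])sample\s*=\s*"([^"]*)"')


def sample_meta(rules_text):
    """Return (rule_name, sample) pairs for every rule with a sample meta."""
    pairs = []
    for match in RULE_RE.finditer(rules_text):
        name, head = match.group(1), match.group(2)
        for sample in SAMPLE_RE.findall(head):
            pairs.append((name, sample))
    return pairs


if __name__ == "__main__":
    example = 'rule d {\n    meta:\n        sample = "foo"\n    condition:\n        true\n}\n'
    print(sample_meta(example))  # [('d', 'foo')]
```

Rules without a "sample" meta simply produce no pairs, so they are never checked.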
When I run yrrc in "collect" mode, it reads in those rules and outputs the following:
wxs@wxs-mbp yrrc % DYLD_LIBRARY_PATH=/Users/wxs/src/yara/libyara/.libs ./yrrc -c config.json -m collect > hashes.json
wxs@wxs-mbp yrrc % jq . < hashes.json
{
  "24c422e681f1c1bd08286c7aaf5d23a5f088dcdb0b219806b3a9e579244f00c5": {
    "expected": [
      "a",
      "b",
      "c"
    ]
  },
  "foo": {
    "expected": [
      "d"
    ]
  }
}
wxs@wxs-mbp yrrc %
As expected, the output says that the "24c422e" sample should match rules "a", "b", and "c", and that the "foo" hash should match rule "d". I'm aware that "foo" is not a hash, but this is just to test something. ;)
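The collect step boils down to grouping rules by the sample they name. A minimal Python sketch of that grouping, assuming the (rule name, sample) pairs have already been pulled out of the rule metadata; the function name "collect" is hypothetical and this is not yrrc's own implementation:

```python
import json


def collect(pairs):
    """Group (rule_name, sample) pairs into {sample: {"expected": [rule, ...]}}.

    Rules are appended in the order they are seen, and Python dicts keep
    insertion order, so samples also appear in first-seen order.
    """
    hashes = {}
    for rule, sample in pairs:
        hashes.setdefault(sample, {"expected": []})["expected"].append(rule)
    return hashes


if __name__ == "__main__":
    sha = "24c422e681f1c1bd08286c7aaf5d23a5f088dcdb0b219806b3a9e579244f00c5"
    pairs = [("a", sha), ("b", sha), ("c", sha), ("d", "foo")]
    # Same structure as the hashes.json shown above.
    print(json.dumps(collect(pairs), indent=2))
```

Keying on the sample means each file only has to be scanned once, no matter how many rules reference it.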
The next step is to run yrrc in "scan" mode, which reads in the JSON we just output and the YARA rules, scans each sample, and records which rules each sample matches. Here's the output:
wxs@wxs-mbp yrrc % DYLD_LIBRARY_PATH=/Users/wxs/src/yara/libyara/.libs ./yrrc -c config.json -m scan
{
  "24c422e681f1c1bd08286c7aaf5d23a5f088dcdb0b219806b3a9e579244f00c5": {
    "expected": ["a", "b", "c"],
    "matches": ["a", "b", "d"],
    "yara_error": 0
  },
  "foo": {
    "expected": ["d"],
    "matches": [],
    "yara_error": 3
  }
}
wxs@wxs-mbp yrrc %
As you can see, the "24c422e" sample matches rules "a", "b", and "d" (rule "d" matches everything because its condition is simply true), but not rule "c", which its metadata says it should match. This is considered a mismatch and means that either rule "c" is incorrectly written (or has bad metadata) or a YARA regression happened. The "foo" hash does not match the expected rule "d", but it does have a YARA error (code 3, which is ERROR_COULD_NOT_OPEN_FILE in libyara), which we can use to indicate that the file does not exist in our sample set.
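The comparison itself is a set difference. Here's a minimal Python sketch of it, assuming the scan output has the shape shown above; the function name "check" and the status labels are hypothetical, and this is not how yrrc itself reports results:

```python
def check(results):
    """Classify each sample in scan-mode output.

    Assumes the shape of the scan output shown above:
    {sample: {"expected": [...], "matches": [...], "yara_error": int}}
    """
    report = {}
    for sample, result in results.items():
        if result["yara_error"] != 0:
            # The scan itself failed (e.g. the sample is not in our sample
            # set), so the rules can't be judged against this sample.
            report[sample] = ("error", result["yara_error"])
            continue
        # Only expected rules that did NOT fire count as a mismatch;
        # extra matches (like "d" above) are ignored.
        missing = sorted(set(result["expected"]) - set(result["matches"]))
        report[sample] = ("mismatch", missing) if missing else ("ok", [])
    return report


if __name__ == "__main__":
    sha = "24c422e681f1c1bd08286c7aaf5d23a5f088dcdb0b219806b3a9e579244f00c5"
    results = {
        sha: {"expected": ["a", "b", "c"], "matches": ["a", "b", "d"], "yara_error": 0},
        "foo": {"expected": ["d"], "matches": [], "yara_error": 3},
    }
    for sample, (status, detail) in check(results).items():
        print(sample[:7], status, detail)
    # 24c422e mismatch ['c']
    # foo error 3
```

Checking the YARA error first keeps a missing sample from being reported as a broken rule.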
Great work, Wesley! Interesting research you've done 😄