wxs@wxs-mbp yara % cat rules/sets.yara
rule a0 { condition: false }
rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule c { condition: 2 of (a*) }
rule d { condition: 50% of (a*) }
rule e { condition: 1 of (a1) }
rule f { condition: all of (a1, e) }
wxs@wxs-mbp yara %
This started with a tweet from Steve Miller (https://twitter.com/stvemillertime/status/1508441489923313664) in which he asked what is better for performance: 1 rule with 10k strings or 10k rules with 1 string each? Based upon my understanding of YARA I guessed it wouldn't matter for search time and the difference in bytecode evaluation would be in the noise. Effectively, I guessed you would not be able to tell the difference between the two.
Costin was the first to provide actual results and he claimed a 35 second vs 31 second difference between the two (https://twitter.com/craiu/status/1508445059129163783). That didn't make much sense to me so I asked for his rules so I could test them. He provided me with two rules files (10k.yara and 10kv2.yara) and a text file with a bunch of strings in it.
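If you want to try this kind of comparison without Costin's files, here is a rough sketch of how the two layouts could be generated. Everything in it is made up for illustration: the rule names, the `$s`/`$a` identifiers, and the generated strings are placeholders, not the contents of 10k.yara or 10kv2.yara.

```python
def make_strings(n):
    """Generate n distinct placeholder strings (purely illustrative)."""
    return [f"teststring{i:05d}" for i in range(n)]


def make_single_rule(strings):
    """Build one rule that holds every string and fires if any of them hits."""
    lines = ["rule one_rule_many_strings {", "  strings:"]
    for i, s in enumerate(strings):
        # No escaping is done here; the placeholder strings are plain
        # alphanumerics, so none is needed for this sketch.
        lines.append(f'    $s{i} = "{s}"')
    lines += ["  condition:", "    any of them", "}"]
    return "\n".join(lines) + "\n"


def make_many_rules(strings):
    """Build one rule per string, each with a single-string condition."""
    rules = [
        f'rule r{i} {{ strings: $a = "{s}" condition: $a }}'
        for i, s in enumerate(strings)
    ]
    return "\n".join(rules) + "\n"
```

Writing the two return values out to separate files and timing a scan with each against the same target gives a like-for-like comparison, at least as far as the rule layout goes.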
This is my attempt to replicate his findings and also document why he was getting the warning he was getting. Because I wanted the run to take a bit of time I ended up not using his text file with all the strings (it
Today for #100DaysOfYARA I want to further explore one of my favorite topics:
"How to reliably detect libraries", or how to identify that a particular program has linked or otherwise included a particular library.
Detecting libraries (especially ones written in C) poses unique challenges compared to malware, including:
- libraries tend to be platform/architecture nonspecific
- compilerisms overwhelm otherwise decent signal
- copy/pasta and groupthink across libraries
# Simple script to demo use of yara-python + externals
# think of all the externals you could define!
import os
import sys
import yara
example_rule = '''
rule demo_externals
{
Today for #100DaysOfYARA I want to dive into some of the dirty secrets of creating/maintaining code-based YARA signatures.
Let's use SQLite3 as an example. Go get the source here (I prefer the amalgamation):
https://sqlite.org/download.html
I would like to reliably detect when a file is using SQLite. I often look at Windows executables, so I'm going to first concentrate on x86 programs that use this library. The easiest way to find them is to start with cleartext strings. In this case, I'm gonna pop over to VirusTotal and search for an easily-identifiable string:
content: "failed to allocate %u bytes of memory" type:pe
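For triaging files you already have on disk, the same idea can be approximated locally. This is only a sketch under loud assumptions: the `MZ` check is a crude stand-in for VirusTotal's `type:pe`, not an equivalent, and the function names are made up for illustration.

```python
MARKER = b"failed to allocate %u bytes of memory"


def looks_like_pe(data: bytes) -> bool:
    """Very rough PE check: only looks for the DOS 'MZ' magic."""
    return data[:2] == b"MZ"


def has_marker(data: bytes, marker: bytes = MARKER) -> bool:
    """Look for the marker as plain ASCII or as UTF-16LE (wide) text."""
    wide = marker.decode("ascii").encode("utf-16-le")
    return marker in data or wide in data


def is_candidate(data: bytes) -> bool:
    """True when the buffer looks like a PE and carries the marker string."""
    return looks_like_pe(data) and has_marker(data)
```

Reading each file with `open(path, "rb").read()` and passing the bytes to `is_candidate` is enough for a quick pass over a sample folder.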
import "pe"
import "math"
import "hash"
rule IterateResourcesDemo
{
    meta:
        description = "Example rule to iterate over PE resources and calculate entropy, MD5 and check for strings"
    strings:
#!/usr/bin/env python
# for our homey, Claude Shannon
import sys
import logging
import binascii
import hashlib
import argparse
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Thomas Roccia | IconDhash.py
# pip3 install lief
# pip3 install pillow
# resource: https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html
import lief
import os
import argparse
import "pe"
import "hash"
import "math"
rule packedTextSection {
    meta:
        description = " Look for high-entropy .text sections within PE files "
        author = "Droogy"
        DaysOfYARA = "3/100"
"""
got_tmilk.py - Go Type Milking
Written by Ivan Kwiatkowski @ Kaspersky GReAT
Shared under the terms of the GPLv3 license
"""
C_HEADER = """
enum golang_kind : __int8
{
  INVALID = 0x0,