Originally published in September 2012.
STIX tries to organize IOC / observable information and give it some context. For example, using STIX/CybOX, you could collect a set of observables, link them together into an indicator, and then maybe associate that indicator with a known TTP. That TTP could link to other indicators to help with confirmation and attribution to a threat actor or campaign.
At some point soon, I’d like to build up a “link database”: something like a triplestore (a kind of graph database) that represents all its data in Subject-Predicate-Object form. For example:
$hash BELONGSTO $malware
$twitterhandle BELONGSTO $person
$ipaddress ISA $sshscanner
(This probably doesn’t look anything like the eventual ontology, and my syntax may or may not fit the RDF standard, but it should illustrate the idea.)
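To make the Subject-Predicate-Object idea a bit more concrete, here is a minimal in-memory sketch in Python. It is not a real triplestore and not RDF; the `TripleStore` class, its method names, and all of the sample values (the hash, Twitter handle, and IP address) are made up purely for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # A set means adding the same triple twice is harmless.
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield every triple matching the pattern; None is a wildcard."""
        for s, p, o in self._triples:
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)


# Hypothetical sample data mirroring the three example triples.
store = TripleStore()
store.add("hash:abc123", "BELONGSTO", "malware:examplebot")
store.add("twitter:@example", "BELONGSTO", "person:example-actor")
store.add("ip:192.0.2.10", "ISA", "sshscanner")

# Everything that "belongs to" something, in a stable order.
belongs = sorted(store.match(predicate="BELONGSTO"))
for triple in belongs:
    print(triple)
```

A linear scan like this obviously won't scale; a real store would index by subject, predicate, and object. The point is only that every fact has the same three-part shape, so one generic `match` covers every question you can ask.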
As we build up these links, we start to have a database of objects (possibly using RDF or similar) that link together in various ways, allowing you to run analyses that identify clusters of related knowledge. This alternate structure probably isn’t usable for STIX per se, but that doesn’t mean a STIX document couldn’t be an object. For example, if we have a STIX document explaining the details of a piece of malware, you could represent that as:
$malware ISDESCRIBEDBY $stixdocument
And then once we get to that $malware node in your graph DB, now we know where to go for the full low-down. This would work the same if we have a document (dossier) describing a particular person.
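As a rough sketch of that lookup, the Python below walks from an observable to the thing it belongs to, and from there to whatever documents describe it. The nested-dictionary layout, the helper names, and the sample identifiers (including the document name) are all assumptions for illustration, not part of STIX or any existing tool.

```python
from collections import defaultdict

# subject -> predicate -> set of objects
graph = defaultdict(lambda: defaultdict(set))


def add_triple(subject, predicate, obj):
    graph[subject][predicate].add(obj)


def objects(subject, predicate):
    """Return the objects linked from subject via predicate, sorted."""
    # .get() avoids creating empty entries for nodes we have never seen.
    return sorted(graph.get(subject, {}).get(predicate, set()))


def describing_documents(node):
    """Documents (STIX or otherwise) attached to a node."""
    return objects(node, "ISDESCRIBEDBY")


def documents_for_observable(observable):
    """Follow BELONGSTO one hop, then collect describing documents."""
    docs = []
    for owner in objects(observable, "BELONGSTO"):
        docs.extend(describing_documents(owner))
    return docs


# Hypothetical sample data.
add_triple("hash:abc123", "BELONGSTO", "malware:examplebot")
add_triple("malware:examplebot", "ISDESCRIBEDBY", "stix:examplebot-report.xml")
add_triple("person:example-actor", "ISDESCRIBEDBY", "dossier:example-actor.pdf")

print(documents_for_observable("hash:abc123"))
print(describing_documents("person:example-actor"))
```

The graph only stores a pointer to the document; the full detail stays in the STIX file (or dossier) itself, which is why the two structures can coexist.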
To visualize it, think of something like a Maltego or Casefile graph, only much larger than you can reasonably manipulate with that tool. Which is not a criticism; I love Maltego and use it almost every day, but it is a tool for a particular set of use cases. Ideally, you could export a subgraph to a Maltego file, or of course the inverse: use Maltego to do a bunch of research on something, then import all that data into your DB for archival and possible linking to other nuggets of information.
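Pulling out a subgraph small enough to work with by hand could look something like the Python sketch below: start at one node and keep everything within a few hops. The function name, the hop limit, and the sample triples are hypothetical, and I deliberately stop short of writing an actual Maltego file, since that format is outside what this sketch covers.

```python
from collections import deque


def extract_subgraph(triples, start, max_hops):
    """Return the triples whose endpoints lie within max_hops of start.

    Edges are treated as undirected for the purpose of the walk, so a
    node is reachable whether it appears as subject or object.
    """
    neighbors = {}
    for s, _p, o in triples:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)

    # Breadth-first search outward from the starting node.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))

    # Keep only triples with both ends inside the visited set.
    return [t for t in triples if t[0] in seen and t[2] in seen]


# Hypothetical sample data.
triples = [
    ("hash:abc123", "BELONGSTO", "malware:examplebot"),
    ("malware:examplebot", "ISDESCRIBEDBY", "stix:examplebot-report.xml"),
    ("ip:192.0.2.10", "ISA", "sshscanner"),
]

one_hop = extract_subgraph(triples, "hash:abc123", 1)
two_hops = extract_subgraph(triples, "hash:abc123", 2)
print(len(one_hop), len(two_hops))
```

The import direction is the same idea in reverse: whatever the research tool exports gets flattened back into triples and added to the store, where it can link up with nodes that were already there.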
I’ve done some very exploratory work on this, mostly just prototype code related to working with Maltego transforms. Of course, my involvement with CIF has also provided some great lessons of all sorts. But I think that, as a question of knowledge management, we can do better than just tracking very basic data (like we do with CIF) or spending a lot of time on enumeration and detailed XSDs. Both of those matter a great deal, and I work on them frequently in my day job, but they just scratch the surface of what we can do with modern data and knowledge management techniques.
Stuff to update: