-
-
Save emaadmanzoor/b23d46c9d4d064746e1ef667cb8070bc to your computer and use it in GitHub Desktop.
StreamSpot takes input as a TSV file containing on each line the following fields:
source_id source_type destination_id destination_type edge_type graph_id
The fields are data-typed as follows:
source_id
anddestination_id
: Integer greater than 1 (0 stands for NA, like anonymous memory maps).source_type
,destination_type
,edge_type
: Single byte (we've been using ASCII characters so far).graph_id
: Integer greater than 0.
An example record in StreamSpot's format is: 4 a 5 c p 0
.
The data it currently uses is available here.
We are using the CDM 12 schema.
CDM-formatted JSON data contains events in the following format:
{"datum":{"com.bbn.tc.schema.avro.Event":{"uuid":"=ÖmÖä<8f><99>\u0015øNe¢÷<84>\f®ïC<94>é®<85>\u0001j\u0000 ǹs\u001AÓï","sequence":65362,"type":"EVENT_OPEN","threadId":28005,"source":"SOURCE_LINUX_AUDIT_TRACE","timestampMicros":{"long":1450217860803},"name":null,"parameters":null,"location":null,"size":null,"programPoint":null,"properties":{"map":{"eventId":"65362"}}}},"CDMVersion":"12"}
Unlike edges, CDM events are not self-contained: the only thing available is the event type (EVENT_OPEN
in the example above). The rest of the metadata is contained in other non-event records occuring both previously and after the event record in the CDM data. Let's look at these records now.
Subjects correspond to entities like processes, threads and units.
{"datum":{"com.bbn.tc.schema.avro.Subject":{"uuid":"~#<92>Å7<9f>ÜUèÓ\ryEQ¬¾\"Ëõ]<96>xb3#}UcÏåÿ<90>","type":"SUBJECT_PROCESS","pid":28005,"ppid":27999,"source":"SOURCE_LINUX_AUDIT_TRACE","startTimestampMicros":null,"unitId":{"int":0},"endTimestampMicros":null,"cmdLine":null,"importedLibraries":null,"exportedLibraries":null,"pInfo":null,"properties":{"map":{"uid":"0","programName":"sudo","group":"1003"}}}},"CDMVersion":"12"}
Objects correspond to entities like files, sockets or memory addresses.
{"datum":{"com.bbn.tc.schema.avro.FileObject":{"uuid":"Ý<9c>¨©<96>°<98>\r\u0012p <87>ͦ¤õ\u0005L<91>D<8c>Â<96><97>\\;\u0011µbW<82>S","baseObject":{"source":"SOURCE_LINUX_AUDIT_TRACE","permission":null,"lastTimestampMicros":null,"properties":null},"url":"file:///etc/login.defs","isPipe":false,"version":0,"size":null}},"CDMVersion":"12"}
Principals correspond to entities like local and remote users, which StreamSpot doesn't use as of now.
{"datum":{"com.bbn.tc.schema.avro.Principal":{"uuid":"\u001DÕ\u0019.ÖÑ8,¸ïB:»lò\u0011<96>?3Æ\rÌYï\u001C\u0013É7²Õðç","type":"PRINCIPAL_LOCAL","userId":0,"groupIds":[1003],"source":"SOURCE_LINUX_AUDIT_TRACE","properties":{"map":{"euid":"0","egid":"1003"}}}},"CDMVersion":"12"}
All 3 of the above CDM records appeared before the event record in the file.
CDM edges are what connect events to subjects, objects and principals. The CDM edge types we care about are:
EDGE_EVENT_ISGENERATEDBY_SUBJECT
: When an event is generated by a process/thread/unit.EDGE_EVENT_AFFECTS_*
: When an event writes to a file/registry/socket or forks a process.EDGE_*_AFFECTS_EVENT
: When an event reads from a file/registry/socket.
The entire CDM edge type list is here.
Continuing the example from above, the following CDM edge records following the event provide the metadata we need:
{"datum":{"com.bbn.tc.schema.avro.SimpleEdge":{"fromUuid":"=ÖmÖä<8f><99>\u0015øNe¢÷<84>\f®ïC<94>é®<85>\u0001j\u0000 ǹs\u001AÓï","toUuid":"Ý<9c>¨©<96>°<98>\r\u0012p <87>ͦ¤õ\u0005L<91>D<8c>Â<96><97>\\;\u0011µbW<82>S","type":"EDGE_FILE_AFFECTS_EVENT","timestamp":1450217860803,"properties":null}},"CDMVersion":"12"}
This tells us that a specific file object (identified by the UUID) was read during the event; so it provides us the destination type. By following this CDM edge to the object UUID, we can find out the destination ID (by mapping the filename to an ID).
{"datum":{"com.bbn.tc.schema.avro.SimpleEdge":{"fromUuid":"=ÖmÖä<8f><99>\u0015øNe¢÷<84>\f®ïC<94>é®<85>\u0001j\u0000 ǹs\u001AÓï","toUuid":"~#<92>Å7<9f>ÜUèÓ\ryEQ¬¾\"Ëõ]<96>xb3#}UcÏåÿ<90>","type":"EDGE_EVENT_ISGENERATEDBY_SUBJECT","timestamp":1450217860803,"properties":null}},"CDMVersion":"12"}
This tells us which process/thread/unit subject (identified by the UUID) generated the event. By following this edge to the subject UUID, we can find out the source type and source ID.
By reading all the records preceding the event record and following the event record until the next event record, we can collect all the metadata needed to (almost) construct a StreamSpot edge (for the example above):
- Edge type:
EVENT_OPEN
, directly from the event record. - Source type:
SUBJECT_PROCESS
, from a CDM record following the event record. - Source ID:
"pid":28005
, from a CDM record preceding the event record. - Destination type:
EDGE_FILE_AFFECTS_EVENT
, from a CDM record following the event record. - Destination ID:
"url":"file:///etc/login.defs"
, from a CDM record preceding the event record. Note that the source and destination may need to be swapped here since the file is read and information flows to the process.
We still need to apply the following transformations:
- Map source, destination and type names to single bytes: this can be done using the schema.
- Map each destination ID (filename, memory address, URL) to an integer index.
We will maintain these maps in memory. What's left now is assigning graph ID's.
We maintain a mapping from process/unit/thread ID to graph ID. Subsequent rules assume processes. The rules of graph ID assignment are:
- Two process ID's will have the same graph ID if one was forked from the other.
- Two process ID's will have different graph ID's if neither was forked from the other.
- Two process ID's will have different graph ID's if the parent forked then exec'd.
- Graph ID's start from 0, so the first process is assigned graph ID 0.
The mechanics will work as follows:
- When a new
SUBJECT_PROCESS
CDM record is read:- Check if its
pid
already has an assigned graph ID. - If a graph ID is already assigned to the
pid
, then continue. (this case should not occur, ideally) - If a graph ID is not assigned to the
pid
, then check if itsppid
already has an assigned graph ID:- If a graph ID is assigned to the
ppid
, assign the same graph ID to thepid
. - If a graph ID is not assigned to the
ppid
, assign the next available graph ID to thepid
.
- If a graph ID is assigned to the
- All subsequent events generated by this
pid
will use this assigned graph ID for the StreamSpot edge.
- Check if its
- When a new
EVENT_EXECUTE
CDM record is read:- Note the
threadId
field of the event: this is thepid
of the child process. - Assign the next available graph ID to this
pid
, replacing any existing assignment. - All subsequent events generated by this
pid
will use this assigned graph ID for the StreamSpot edge.
- Note the
#!/usr/bin/env python | |
import argparse | |
import json | |
import pdb | |
import sys | |
# CDM record type constants | |
CDM_TYPE_PRINCIPAL = 'com.bbn.tc.schema.avro.Principal' | |
CDM_TYPE_SUBJECT = 'com.bbn.tc.schema.avro.Subject' | |
CDM_TYPE_FILE = 'com.bbn.tc.schema.avro.FileObject' | |
CDM_TYPE_MEM = 'com.bbn.tc.schema.avro.MemoryObject' | |
CDM_TYPE_SOCK = 'com.bbn.tc.schema.avro.NetFlowObject' | |
CDM_TYPE_EDGE = 'com.bbn.tc.schema.avro.SimpleEdge' | |
CDM_TYPE_EVENT = 'com.bbn.tc.schema.avro.Event' | |
parser = argparse.ArgumentParser() | |
parser.add_argument('-i', '--input', help='Input CDM/JSON file', required=True) | |
parser.add_argument('-o','--output', help='Output StreamSpot edge file', required=True) | |
args = vars(parser.parse_args()) | |
input_file = args['input'] | |
output_file = args['output'] | |
uuid_to_pid = {} | |
uuid_to_pname = {} | |
uuid_to_url = {} | |
uuid_to_sockid = {} | |
uuid_to_addr = {} | |
pid_to_graph_id = {} | |
current_graph_id = 0 | |
with open(input_file, 'r') as f: | |
event_metadata_buffer = {} # filled/cleared on every new event | |
streamspot_edge = {'event_uuid': None, | |
'source_id': None, | |
'source_name': None, | |
'source_type': None, | |
'dest_id': None, | |
'dest_name': 'NA', | |
'dest_type': None, | |
'edge_type': None, | |
'graph_id': None | |
} # filled/cleared on every new event | |
lineno = 0 | |
for line in f: | |
line = line.strip() | |
lineno += 1 | |
cdm_record = json.loads(line) | |
cdm_record_type = cdm_record['datum'].keys()[0] | |
cdm_record_values = cdm_record['datum'][cdm_record_type] | |
#print cdm_record | |
if cdm_record_type == CDM_TYPE_PRINCIPAL: | |
continue # we don't care about principals | |
elif cdm_record_type == CDM_TYPE_SUBJECT: | |
uuid = cdm_record_values['uuid'] | |
type = cdm_record_values['type'] | |
if type == 'SUBJECT_PROCESS': | |
pid = cdm_record_values['pid'] # source ID | |
ppid = cdm_record_values['ppid'] # needed to assign graph ID | |
pname = cdm_record_values['properties']['map']['programName'] | |
unitid = cdm_record_values['unitId']['int'] | |
#uuid_to_pid[uuid] = str(pid) + '/' + pname + '/' + str(unitid) | |
uuid_to_pid[uuid] = pid | |
uuid_to_pname[uuid] = pname | |
if not pid in pid_to_graph_id: # pid has no gid assigned | |
if not ppid in pid_to_graph_id: # ppid has no gid assigned | |
pid_to_graph_id[pid] = current_graph_id | |
pid_to_graph_id[ppid] = current_graph_id | |
current_graph_id += 1 | |
else: # parent has a gid assigned | |
pid_to_graph_id[pid] = pid_to_graph_id[ppid] | |
else: | |
print "Unknown subject type:", type | |
sys.exit(-1) | |
elif cdm_record_type == CDM_TYPE_FILE: | |
uuid = cdm_record_values['uuid'] | |
url = cdm_record_values['url'] # destination ID | |
uuid_to_url[uuid] = url | |
elif cdm_record_type == CDM_TYPE_SOCK: | |
uuid = cdm_record_values['uuid'] | |
src = cdm_record_values['srcAddress'] | |
dest = cdm_record_values['destAddress'] | |
src_port = cdm_record_values['srcPort'] | |
dest_port = cdm_record_values['destPort'] | |
sock_id = src + ':' + str(src_port) + ':' + dest + ':' + str(dest_port) | |
uuid_to_sockid[uuid] = sock_id | |
elif cdm_record_type == CDM_TYPE_MEM: | |
uuid = cdm_record_values['uuid'] | |
addr = cdm_record_values['memoryAddress'] | |
uuid_to_addr[uuid] = addr | |
elif cdm_record_type == CDM_TYPE_EVENT: | |
# print previous streamspot edge if it is ready | |
if not None in streamspot_edge.values(): | |
print str(streamspot_edge['source_id']) + '\t' +\ | |
str(streamspot_edge['source_name']) + '\t' +\ | |
str(streamspot_edge['source_type']) + '\t' +\ | |
str(streamspot_edge['dest_id']) + '\t' +\ | |
str(streamspot_edge['dest_name']) + '\t' +\ | |
str(streamspot_edge['dest_type']) + '\t' +\ | |
str(streamspot_edge['edge_type']) + '\t' +\ | |
str(streamspot_edge['graph_id']) | |
# clear old edge data | |
streamspot_edge = {'event_uuid': None, | |
'source_id': None, | |
'source_name': None, | |
'source_type': None, | |
'dest_id': None, | |
'dest_name': 'NA', | |
'dest_type': None, | |
'edge_type': None, | |
'graph_id': None | |
} | |
uuid = cdm_record_values['uuid'] | |
type = cdm_record_values['type'] | |
streamspot_edge['edge_type'] = type # streamspot edge type | |
streamspot_edge['event_uuid'] = uuid # to map metadata | |
elif cdm_record_type == CDM_TYPE_EDGE: | |
type = cdm_record_values['type'] | |
if type == 'EDGE_SUBJECT_HASLOCALPRINCIPAL': | |
pass | |
elif type == 'EDGE_FILE_AFFECTS_EVENT': | |
# HACK! FIXME | |
# Special case for | |
# - EVENT_UPDATE | |
# - EVENT_RENAME | |
if streamspot_edge['edge_type'] == 'EVENT_UPDATE' or \ | |
streamspot_edge['edge_type'] == 'EVENT_RENAME': | |
assert cdm_record_values['toUuid'] == \ | |
streamspot_edge['event_uuid'] | |
from_uuid = cdm_record_values['fromUuid'] | |
url = uuid_to_url[from_uuid] | |
streamspot_edge['dest_id'] = url | |
streamspot_edge['dest_type'] = 'FILE' | |
else: | |
assert cdm_record_values['fromUuid'] == \ | |
streamspot_edge['event_uuid'] | |
to_uuid = cdm_record_values['toUuid'] | |
url = uuid_to_url[to_uuid] | |
streamspot_edge['dest_id'] = url | |
streamspot_edge['dest_type'] = 'FILE' | |
elif type == 'EDGE_EVENT_AFFECTS_FILE': | |
assert cdm_record_values['fromUuid'] == streamspot_edge['event_uuid'] | |
to_uuid = cdm_record_values['toUuid'] | |
url = uuid_to_url[to_uuid] | |
streamspot_edge['dest_id'] = url | |
streamspot_edge['dest_type'] = 'FILE' | |
elif type == 'EDGE_MEMORY_AFFECTS_EVENT': | |
assert cdm_record_values['fromUuid'] == streamspot_edge['event_uuid'] | |
to_uuid = cdm_record_values['toUuid'] | |
addr = uuid_to_addr[to_uuid] | |
streamspot_edge['dest_id'] = addr | |
streamspot_edge['dest_type'] = 'MEM' | |
elif type == 'EDGE_EVENT_AFFECTS_MEMORY': | |
assert cdm_record_values['fromUuid'] == streamspot_edge['event_uuid'] | |
to_uuid = cdm_record_values['toUuid'] | |
addr = uuid_to_addr[to_uuid] | |
streamspot_edge['dest_id'] = addr | |
streamspot_edge['dest_type'] = 'MEM' | |
elif type == 'EDGE_EVENT_AFFECTS_NETFLOW': | |
assert cdm_record_values['fromUuid'] == streamspot_edge['event_uuid'] | |
to_uuid = cdm_record_values['toUuid'] | |
sock_id = uuid_to_sockid[to_uuid] | |
streamspot_edge['dest_id'] = sock_id | |
streamspot_edge['dest_type'] = 'SOCK' | |
elif type == 'EDGE_EVENT_AFFECTS_SUBJECT' or \ | |
type == 'EDGE_EVENT_ISGENERATEDBY_SUBJECT': | |
assert cdm_record_values['fromUuid'] == streamspot_edge['event_uuid'] | |
to_uuid = cdm_record_values['toUuid'] | |
pid = uuid_to_pid[to_uuid] | |
pname = uuid_to_pname[to_uuid] | |
if type == 'EDGE_EVENT_AFFECTS_SUBJECT': | |
streamspot_edge['dest_id'] = pid | |
streamspot_edge['dest_name'] = pname | |
streamspot_edge['dest_type'] = 'PROCESS' | |
elif type == 'EDGE_EVENT_ISGENERATEDBY_SUBJECT': | |
streamspot_edge['source_id'] = pid | |
streamspot_edge['source_name'] = pname | |
streamspot_edge['source_type'] = 'PROCESS' | |
# graph ID assignment to streamspot edge | |
if type == 'EDGE_EVENT_ISGENERATEDBY_SUBJECT': | |
streamspot_edge['graph_id'] = \ | |
pid_to_graph_id[streamspot_edge['source_id']] | |
# handle graph ID change on EXECUTE | |
if streamspot_edge['edge_type'] == 'EVENT_EXECUTE': | |
pid_to_graph_id[streamspot_edge['source_id']] = \ | |
current_graph_id # change graph ID of caller process | |
current_graph_id += 1 | |
else: | |
print 'Unknown edge type:', type | |
sys.exit(-1) | |
else: | |
print 'Unknown CDM record type', cdm_record_type | |
sys.exit(-1) | |
# last event in buffer | |
print str(streamspot_edge['source_id']) + '\t' +\ | |
str(streamspot_edge['source_name']) + '\t' +\ | |
str(streamspot_edge['source_type']) + '\t' +\ | |
str(streamspot_edge['dest_id']) + '\t' +\ | |
str(streamspot_edge['dest_name']) + '\t' +\ | |
str(streamspot_edge['dest_type']) + '\t' +\ | |
str(streamspot_edge['edge_type']) + '\t' +\ | |
str(streamspot_edge['graph_id']) |
#!/usr/bin/env python | |
""" | |
Visualises streamspot/infoleak_small_units.ss | |
python visualise_streamspot_graph.py | |
""" | |
from graph_tool.all import * | |
import sys | |
process_color = [179/256.,205/256.,227/256.,0.8] | |
file_color = [251/256.,180/256.,174/256.,0.8] | |
mem_color = [204/256.,235/256.,197/256.,0.8] | |
sock_color = [222/256.,203/256.,228/256.,0.8] | |
group_map = {'PROCESS': 0, | |
'FILE': 1, | |
'MEM': 2, | |
'SOCK': 3} | |
def create_new_graph(): | |
g = Graph() | |
gid = g.new_graph_property('int') | |
vid = g.new_vertex_property('string') | |
vtype = g.new_vertex_property('string') | |
vlabel = g.new_vertex_property('string') | |
vgroup = g.new_vertex_property('int32_t') | |
vcolor = g.new_vertex_property('vector<double>') | |
etype = g.new_edge_property('string') | |
elabel = g.new_edge_property('string') | |
ecolor = g.new_edge_property('vector<double>') | |
g.gp.id = gid | |
g.vp.id = vid | |
g.vp.type = vtype | |
g.vp.label = vlabel | |
g.vp.group = vgroup | |
g.vp.color = vcolor | |
g.ep.type = etype | |
g.ep.label = elabel | |
g.ep.color = ecolor | |
return g | |
lno = 0 | |
graphs = {} # from gid to graph | |
with open('streamspot/infoleak_small_units.ss', 'r') as f: | |
for line in f: | |
lno += 1 | |
#if lno > 20: | |
# break | |
line = line.strip() | |
fields = line.split('\t') | |
source_id = fields[0] | |
source_name = fields[1] | |
source_type = fields[2] | |
dest_id = fields[3] | |
dest_name = fields[4] | |
dest_type = fields[5] | |
edge_type = fields[6] | |
graph_id = int(fields[7]) | |
if edge_type == 'EVENT_UNIT': | |
continue | |
#if edge_type == 'EVENT_EXECUTE': | |
# continue # do not add this edge | |
if not graph_id in graphs: | |
graphs[graph_id] = create_new_graph() | |
g = graphs[graph_id] | |
# check if source vertex exists | |
matches = find_vertex(g, g.vp.id, source_id) | |
if len(matches) == 0: | |
u = g.add_vertex() | |
g.vp.id[u] = source_id | |
g.vp.type[u] = source_type | |
g.vp.label[u] = source_id + '\\' + source_name | |
g.vp.group[u] = group_map[source_type] | |
if source_type == 'PROCESS': | |
g.vp.color[u] = process_color | |
elif source_type == 'FILE': | |
g.vp.color[u] = file_color | |
elif source_type == 'MEM': | |
g.vp.color[u] = mem_color | |
elif source_type == 'SOCK': | |
g.vp.color[u] = sock_color | |
else: | |
print 'unknown type', source_type | |
else: | |
u = matches[0] | |
# check if destination vertex exists | |
matches = find_vertex(g, g.vp.id, dest_id) | |
if len(matches) == 0: | |
v = g.add_vertex() | |
g.vp.id[v] = dest_id | |
g.vp.type[v] = dest_type | |
g.vp.label[v] = dest_id + '\\' + dest_name | |
g.vp.group[v] = group_map[dest_type] | |
if dest_type == 'PROCESS': | |
g.vp.color[v] = process_color | |
elif dest_type == 'FILE': | |
g.vp.color[v] = file_color | |
g.vp.label[v] = dest_id.split('/')[-1] | |
elif dest_type == 'MEM': | |
g.vp.color[v] = mem_color | |
elif dest_type == 'SOCK': | |
g.vp.color[v] = sock_color | |
else: | |
print 'unknown type', source_type | |
else: | |
v = matches[0] | |
e = g.add_edge(u,v) | |
g.ep.type[e] = edge_type | |
g.ep.label[e] = str(lno) + ':' + edge_type.split('_')[1] | |
if edge_type in ['EVENT_EXECUTE', 'EVENT_CLONE']: | |
g.ep.color[e] = process_color | |
elif edge_type in ['EVENT_OPEN', 'EVENT_READ', 'EVENT_WRITE', | |
'EVENT_UPDATE']: | |
g.ep.color[e] = file_color | |
elif edge_type in ['EVENT_BIND', 'EVENT_ACCEPT', 'EVENT_CONNECT']: | |
g.ep.color[e] = sock_color | |
else: | |
g.ep.color[e] = [0.179, 0.203,0.210, 0.8] | |
for graph_id, g in graphs.iteritems(): | |
pos = sfdp_layout(g, groups=g.vp.group) | |
graph_draw(g, pos, output_size=(2000, 2000), | |
vertex_text=g.vp.label, edge_text=g.ep.label, | |
vertex_shape='circle', | |
#vertex_size=5.0, | |
vertex_color=g.vp.color, | |
vertex_fill_color=[1.0,1.0,1.0,0.8], | |
vertex_pen_width=5.0, | |
edge_pen_width=5.0, | |
edge_font_size=16, | |
edge_font_weight=1.0, | |
#nodesfirst=True, | |
output='infoleaks_small_units_gid' + str(graph_id) + '.pdf') |
#!/usr/bin/env bash | |
# Requires avro-tools-1.8.1.jar in the working directory (and a working JRE) | |
# Requires Avro files in subdirectory cdm/ in the working directory | |
# Writes JSON files to subdirectory json/ in the working directory | |
mkdir json; | |
for file in $(ls cdm); | |
do | |
java -jar avro-tools-1.8.1.jar tojson cdm/$file > json/${file}; | |
mv "json/${file}" "json/${file/%.avro/.json}" | |
done |