@saliceti
Created February 10, 2016 12:54
Flow log data workflow
#!/bin/bash
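# Merge the per-job logs from every directory given on the command line.
# Logs that share a file name are concatenated into merged/<name>.log.
rm -rf merged
mkdir merged
for dir in "$@"; do
  echo "Processing ${dir}..."
  for job_log in "${dir}"/*.log; do
    # Skip the unexpanded pattern when a directory has no .log files.
    [[ -e "${job_log}" ]] || continue
    cat "${job_log}" >> "merged/$(basename "${job_log}")"
  done
done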
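# Keep only fields 11, 18 and 20 of each PROTO= line (expected to be the
# DST=, PROTO= and DPT= entries) and drop duplicate flows.
rm -rf sorted
mkdir sorted
echo Sorting...
for job_log in merged/*.log; do
  [[ -e "${job_log}" ]] || continue
  grep PROTO= "${job_log}" | awk '{print $11,$18,$20}' \
    | sort -u > "sorted/$(basename "${job_log}")"
done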
# Write one CSV row per unique flow; the job name is the log file name
# without its .log extension.
echo job,dest_ip,dest_port,protocol > flow-logs.csv
echo Formatting...
for job_log in sorted/*.log; do
  [[ -e "${job_log}" ]] || continue
  job=$(basename "${job_log}" .log)
  while read -r line; do
    if [[ "${line}" =~ DST=([[:digit:]\.]+)\ PROTO=([^ ]+)\ DPT=([[:digit:]]+) ]]; then
      dest_ip=${BASH_REMATCH[1]}
      protocol=${BASH_REMATCH[2]}
      dest_port=${BASH_REMATCH[3]}
      echo "${job},${dest_ip},${dest_port},${protocol}" >> flow-logs.csv
    fi
  done < "${job_log}"
done
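
The formatting step can be tried in isolation with a small sketch like the one below. The function name `flow_line_to_csv`, the job name `api`, and the sample line are made up for illustration; the sample only assumes the `DST=... PROTO=... DPT=...` shape that the sorting step leaves behind.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: turn one trimmed
# log line into a "job,dest_ip,dest_port,protocol" CSV row.
# Returns 1 and prints nothing when the line does not match.
flow_line_to_csv() {
  local job=$1 line=$2
  # Keeping the pattern in a variable avoids escaping the spaces inline.
  local re='DST=([[:digit:].]+) PROTO=([^ ]+) DPT=([[:digit:]]+)'
  if [[ ${line} =~ ${re} ]]; then
    printf '%s,%s,%s,%s\n' "${job}" "${BASH_REMATCH[1]}" \
      "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Made-up sample line for a job called "api".
flow_line_to_csv api 'DST=10.0.0.5 PROTO=TCP DPT=443'
# prints: api,10.0.0.5,443,TCP
```

The captures are printed in a different order than they are matched, because the CSV header puts the port before the protocol.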