download and parse/query logs from cloudtrail

get_trails.py

download cloudtrail logs from an s3 bucket

usage: get_trails.py [-h] --profile PROFILE --bucket BUCKET --prefix PREFIX --account ACCOUNT --region REGION [--from FROM_S] [--to TO_S] [--target_dir TARGET_DIR]

download cloudtrail logs from s3

optional arguments:
  -h, --help            show this help message and exit
  --profile PROFILE     the aws named profile to use
  --bucket BUCKET       s3 bucket name
  --prefix PREFIX       s3 key prefix
  --account ACCOUNT     aws account
                        can be specified multiple times
  --region REGION       aws region
                        can be specified multiple times
  --from FROM_S         start date
                        default: 'one day ago'
  --to TO_S             end date
                        default: 'now'
  --target_dir TARGET_DIR
                        destination directory
                        default: ./trails/

example date strings (parsed with dateparser; see the check after this list):
    --from 'one day ago' --to 'now'
    --from 'two weeks ago' --to 'one week ago'
    --from '2019-10-05' --to '2019-10-19'
    --from 'today'
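
a quick way to preview how a given string will resolve (same dateparser call and settings get_trails.py uses):

    import dateparser

    # returns a timezone-aware datetime, or None if the string can't be parsed
    print(dateparser.parse('two weeks ago', settings={'RETURN_AS_TIMEZONE_AWARE': True}))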

example:

➜ python get_trails.py \
  --profile my-profile-name \
  --bucket my-log-bucket \
  --prefix cloudtrail \
  --account 0----------1 \
  --region ap-southeast-2 \
  --from today
2020-06-12 10:11:34,502 [INFO] Started
2020-06-12 10:11:34,503 [INFO] profile: my-profile-name
2020-06-12 10:11:34,504 [INFO] bucket: my-log-bucket
2020-06-12 10:11:34,504 [INFO] prefix: cloudtrail
2020-06-12 10:11:34,504 [INFO] account: ['0----------1']
2020-06-12 10:11:34,504 [INFO] region: ['ap-southeast-2']
2020-06-12 10:11:34,504 [INFO] from_s: today
2020-06-12 10:11:34,504 [INFO] to_s: now
2020-06-12 10:11:34,504 [INFO] target_dir: trails
2020-06-12 10:11:34,522 [INFO] Found credentials in shared credentials file: ~/.aws/credentials
2020-06-12 10:11:34,681 [INFO] parsed start date: 2020-06-12 10:11:34.679348+10:00
2020-06-12 10:11:34,681 [INFO] parsed end date: 2020-06-12 10:11:34.681210+10:00
2020-06-12 10:11:34,681 [INFO] found 1 prefixes for download
2020-06-12 10:11:35,810 [INFO] downloading my-log-bucket/cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/2020/06/12/0----------1_CloudTrail_ap-southeast-2_20200612T0000Z_Mrtk9CuacsNJFMDN.json.gz
2020-06-12 10:11:36,110 [INFO] downloading my-log-bucket/cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/2020/06/12/0----------1_CloudTrail_ap-southeast-2_20200612T0000Z_T97GdFxLajY9plMw.json.gz
2020-06-12 10:11:36,204 [INFO] downloading my-log-bucket/cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/2020/06/12/0----------1_CloudTrail_ap-southeast-2_20200612T0000Z_eVbWYjmzjZr8ciQi.json.gz
2020-06-12 10:11:36,396 [INFO] downloading my-log-bucket/cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/2020/06/12/0----------1_CloudTrail_ap-southeast-2_20200612T0005Z_gQjb9bzIk0FTv90I.json.gz
2020-06-12 10:11:36,529 [INFO] downloading my-log-bucket/cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/2020/06/12/0----------1_CloudTrail_ap-southeast-2_20200612T0010Z_baG8yxnKZIMsSHlb.json.gz
2020-06-12 10:11:36,650 [INFO] downloading my-log-bucket/cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/2020/06/12/0----------1_CloudTrail_ap-southeast-2_20200612T0010Z_c7LucC37E8gpybbu.json.gz
2020-06-12 10:11:36,758 [INFO] downloading my-log-bucket/cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/2020/06/12/0----------1_CloudTrail_ap-southeast-2_20200612T0015Z_hIxKnEGs7AlX0LZo.json.gz
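
the object keys above follow cloudtrail's fixed s3 layout, which the script expands per account, region, and day; a minimal sketch of the shape (the account id and prefix are copied from the example above):

    import datetime

    # cloudtrail's standard key layout, as built by _s3_key_prefix() in the script below
    day = datetime.date(2020, 6, 12)
    print(f"cloudtrail/AWSLogs/0----------1/CloudTrail/ap-southeast-2/{day.year}/{day.month:02d}/{day.day:02d}")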

parse_trails.py

query cloudtrail logs in target directory using jq syntax

usage: parse_trails.py [-h] [--query QUERY] [--target_dir TARGET_DIR] [--splunk] [--print]

query cloudtrail logs in target directory using jq syntax

optional arguments:
  -h, --help            show this help message and exit
  --query QUERY         jq query to run against found trails
  --target_dir TARGET_DIR
                        directory to look for trails
                        default: ./trails/
  --splunk              send result to splunk
  --print               print matching events to console

default action is to display a count of matching events per file

this utility expects cloudtrail logs to be gzipped (no need to unzip logs downloaded from s3)

if using --splunk, the SPLUNK_TOKEN and SPLUNK_ENDPOINT envvars must be set, eg:
   export SPLUNK_TOKEN='XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'
   export SPLUNK_ENDPOINT='https://hec-input.splunkcloud.com:443/services/collector/event'
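
queries are applied in-memory with pyjq to each gzipped file; a minimal standalone equivalent for one file (the path and query here are illustrative):

    import gzip
    import json

    import pyjq

    # same per-file flow as parse_trails.py: decompress, load, apply the jq program
    with gzip.open('trails/example_trail.json.gz', 'rb') as f:
        data = json.load(f)
    matches = pyjq.all('.Records[] | select(.errorCode != null)', data)
    print(f"{len(matches)} matching events")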

example: default

➜ python parse_trails.py --query '.Records[] |select(.errorCode != null) | select(.userIdentity.accessKeyId=="MYACCESSKEYID-----01")'
2020-06-12 15:22:26,837 [INFO] query: .Records[] |select(.errorCode != null) | select(.userIdentity.accessKeyId=="MYACCESSKEYID-----01")
2020-06-12 15:22:26,837 [INFO] target_dir: trails
2020-06-12 15:22:26,837 [INFO] splunk: False
2020-06-12 15:22:26,837 [INFO] print: False
2020-06-12 15:22:26,839 [INFO] found 208 files to query
2020-06-12 15:23:00,526 [INFO] found 1 matching events in 0----------1_CloudTrail_ap-southeast-2_20200612T0335Z_3bkWdyXX2VbIPmPV.json.gz

example: display matching events

➜ python parse_trails.py --query '.Records[] |select(.errorCode != null) | select(.userIdentity.accessKeyId=="MYACCESSKEYID-----01")' --print
2020-06-12 15:23:07,077 [INFO] query: .Records[] |select(.errorCode != null) | select(.userIdentity.accessKeyId=="MYACCESSKEYID-----01")
2020-06-12 15:23:07,077 [INFO] target_dir: trails
2020-06-12 15:23:07,077 [INFO] splunk: False
2020-06-12 15:23:07,077 [INFO] print: True
2020-06-12 15:23:07,079 [INFO] found 208 files to query
2020-06-12 15:23:42,865 [INFO] found 1 matching events in 0----------1_CloudTrail_ap-southeast-2_20200612T0335Z_3bkWdyXX2VbIPmPV.json.gz
{"eventVersion": "1.07", "userIdentity": {"type": "AssumedRole", "principalId": "MYACCESSKEYID-----02:AutoScaling-UpdateDesiredCapacity", "arn": "arn:aws:sts::0----------1:assumed-role/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable/AutoScaling-UpdateDesiredCapacity", "accountId": "0----------1", "accessKeyId": "MYACCESSKEYID-----01", "sessionContext": {"sessionIssuer": {"type": "Role", "principalId": "MYACCESSKEYID-----02", "arn": "arn:aws:iam::0----------1:role/aws-service-role/dynamodb.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_DynamoDBTable", "accountId": "0----------1", "userName": "AWSServiceRoleForApplicationAutoScaling_DynamoDBTable"}, "attributes": {"creationDate": "2020-06-12T03:23:43Z", "mfaAuthenticated": "false"}}, "invokedBy": "dynamodb.application-autoscaling.amazonaws.com"}, "eventTime": "2020-06-12T03:32:22Z", "eventSource": "dynamodb.amazonaws.com", "eventName": "UpdateTable", "awsRegion": "ap-southeast-2", "sourceIPAddress": "dynamodb.application-autoscaling.amazonaws.com", "userAgent": "dynamodb.application-autoscaling.amazonaws.com", "errorCode": "LimitExceededException", "errorMessage": "Subscriber limit exceeded: Provisioned throughput decreases are limited within a given UTC day. After the first 4 decreases, each subsequent decrease in the same UTC day can be performed at most once every 3600 seconds. Number of decreases today: 6. Last decrease at Friday, June 12, 2020 at 3:23:45 AM Coordinated Universal Time. Next decrease can be made at Friday, June 12, 2020 at 4:23:45 AM Coordinated Universal Time", "requestParameters": {"tableName": "listings-out-of-order-dynamodb-prod", "globalSecondaryIndexUpdates": [{"update": {"indexName": "gsi-listings-out-of-order-sourceid", "provisionedThroughput": {"readCapacityUnits": 1, "writeCapacityUnits": 5}}}]}, "responseElements": null, "requestID": "1--------------------------------------------------G", "eventID": "363f05d5-XXXX-XXXX-XXXX-bb6d82be5d1f", "readOnly": false, "resources": [{"accountId": "0----------1", "type": "AWS::DynamoDB::Table", "ARN": "arn:aws:dynamodb:ap-southeast-2:0----------1:table/listings-out-of-order-dynamodb-prod"}], "eventType": "AwsApiCall", "apiVersion": "2012-08-10", "managementEvent": true, "recipientAccountId": "0----------1", "eventCategory": "Management"}

example: push results to splunk

➜ python parse_trails.py --query '.Records[] |select(.errorCode != null) | select(.userIdentity.accessKeyId=="MYACCESSKEYID-----01")' --splunk
2020-06-12 15:30:08,263 [INFO] query: .Records[] |select(.errorCode != null) | select(.userIdentity.accessKeyId=="MYACCESSKEYID-----01")
2020-06-12 15:30:08,263 [INFO] target_dir: trails
2020-06-12 15:30:08,263 [INFO] splunk: True
2020-06-12 15:30:08,263 [INFO] print: False
2020-06-12 15:30:08,266 [INFO] found 208 files to query
2020-06-12 15:30:42,249 [INFO] found 1 matching events in 0----------1_CloudTrail_ap-southeast-2_20200612T0335Z_3bkWdyXX2VbIPmPV.json.gz
2020-06-12 15:30:42,454 [INFO] splunk: 363f05d5-XXXX-XXXX-XXXX-bb6d82be5d1f [HTTP 200]
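
before a long run it can help to sanity-check the hec credentials; a minimal sketch that posts one throwaway event using the same header and envvars the script relies on (the test payload is illustrative):

    import os
    import requests

    # post a single test event to the hec endpoint and report the http status
    token = os.environ['SPLUNK_TOKEN']
    endpoint = os.environ['SPLUNK_ENDPOINT']
    r = requests.post(endpoint,
                      headers={'Authorization': f'Splunk {token}'},
                      json={'event': 'parse_trails connectivity test'})
    print(r.status_code, r.text)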

get_trails.py

import argparse
import datetime
import logging
import os
import sys

import boto3
import dateparser
import pytz


def parse_time_string(time_string):
    """Parse human readable strings (e.g. "now", "2017-01-01" and "one hour ago") into datetime.

    thanks: https://github.com/flosell/trailscraper/blob/master/trailscraper/s3_download.py
    """
    return dateparser.parse(time_string, settings={'RETURN_AS_TIMEZONE_AWARE': True})


def _s3_key_prefix(prefix, date, account_id, region):
    """Build the standard cloudtrail object prefix for one account/region/day.

    thanks: https://github.com/flosell/trailscraper/blob/master/trailscraper/s3_download.py
    """
    return f"{prefix}/AWSLogs/{account_id}/CloudTrail/{region}/{date.year}/{date.month:02d}/{date.day:02d}"


def _s3_key_prefixes(prefix, account_ids, regions, from_date, to_date):
    """Expand account x region x day into the list of prefixes to download.

    thanks: https://github.com/flosell/trailscraper/blob/master/trailscraper/s3_download.py
    """
    delta = to_date.astimezone(pytz.utc) - from_date.astimezone(pytz.utc)
    days = [to_date - datetime.timedelta(days=delta_days)
            for delta_days in range(delta.days + 1)]
    return [_s3_key_prefix(prefix, day, account_id, region)
            for account_id in account_ids
            for day in days
            for region in regions]


def _s3_download_recursive(client, bucket, prefix, target_dir):
    """Download every object under a prefix, skipping files that already exist locally.

    thanks: https://github.com/flosell/trailscraper/blob/master/trailscraper/s3_download.py
    """
    def _download_file(object_info):
        key = object_info.get('Key')
        target = target_dir + os.sep + key
        if not os.path.exists(os.path.dirname(target)):
            os.makedirs(os.path.dirname(target))
        if not os.path.exists(target):
            logging.info(f"downloading {bucket}/{key}")
            client.download_file(bucket, key, target)
        else:
            logging.info(f"skipping, already exists: {bucket}/{key}")

    def _download_dir(dist):
        paginator = client.get_paginator('list_objects')
        for result in paginator.paginate(Bucket=bucket, Prefix=dist):
            if result.get('CommonPrefixes') is not None:
                for subdir in result.get('CommonPrefixes'):
                    _download_dir(subdir.get('Prefix'))
            if result.get('Contents') is not None:
                for content in result.get('Contents'):
                    _download_file(content)

    _download_dir(prefix)


def main():
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] %(message)s",
        handlers=[
            logging.FileHandler(f"{sys.argv[0]}.log"),
            logging.StreamHandler()
        ]
    )
    parser = argparse.ArgumentParser(
        description='download cloudtrail logs from s3',
        epilog="example date strings:\n"
               "    --from 'one day ago' --to 'now'\n"
               "    --from 'two weeks ago' --to 'one week ago'\n"
               "    --from '2019-10-05' --to '2019-10-19'\n"
               "    --from 'today'",
        formatter_class=argparse.RawTextHelpFormatter
    )
    parser.add_argument("--profile",
                        help="the aws named profile to use",
                        action="store",
                        required=True
                        )
    parser.add_argument("--bucket",
                        help="s3 bucket name",
                        action="store",
                        required=True
                        )
    parser.add_argument("--prefix",
                        help="s3 key prefix",
                        action="store",
                        required=True
                        )
    parser.add_argument("--account",
                        help="aws account\ncan be specified multiple times",
                        action="append",
                        required=True
                        )
    parser.add_argument("--region",
                        help="aws region\ncan be specified multiple times",
                        action="append",
                        required=True
                        )
    parser.add_argument("--from",
                        dest="from_s",
                        help="start date\ndefault: 'one day ago'",
                        action="store",
                        default="one day ago"
                        )
    parser.add_argument("--to",
                        dest="to_s",
                        help="end date\ndefault: 'now'",
                        action="store",
                        default="now"
                        )
    parser.add_argument("--target_dir",
                        help="destination directory\ndefault: ./trails/",
                        action="store",
                        default="trails"
                        )
    args = parser.parse_args()
    input_args = vars(args)
    for arg in input_args:
        logging.info(f"{arg}: {input_args[arg]}")
    session = boto3.session.Session(profile_name=args.profile)
    s3 = session.client("s3")
    from_date = parse_time_string(args.from_s)
    to_date = parse_time_string(args.to_s)
    logging.info(f"parsed start date: {from_date}")
    logging.info(f"parsed end date: {to_date}")
    prefixes = _s3_key_prefixes(
        prefix=args.prefix,
        account_ids=args.account,
        regions=args.region,
        from_date=from_date,
        to_date=to_date)
    logging.info(f"found {len(prefixes)} prefixes for download")
    for prefix in prefixes:
        logging.debug(f"prefix: {prefix}")
    for prefix in prefixes:
        _s3_download_recursive(client=s3, bucket=args.bucket,
                               prefix=prefix, target_dir=args.target_dir)


if __name__ == "__main__":
    main()

parse_trails.py

import argparse
import glob
import gzip
import json
import logging
import os
import sys

import pyjq
import requests


def splunk(payload, identifier):
    """Send a payload to the splunk hec endpoint."""
    splunk_token = os.getenv('SPLUNK_TOKEN')
    if not splunk_token:
        raise ValueError('unable to find SPLUNK_TOKEN envvar')
    splunk_endpoint = os.getenv('SPLUNK_ENDPOINT')
    if not splunk_endpoint:
        raise ValueError('unable to find SPLUNK_ENDPOINT envvar')
    splunk_auth = {'Authorization': 'Splunk ' + splunk_token}
    r = requests.post(splunk_endpoint, headers=splunk_auth, json=payload)
    logging.info(f"splunk: {identifier} [HTTP {r.status_code}]")


def main():
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] %(message)s",
        handlers=[
            # logging.FileHandler(f"{sys.argv[0]}.log"),
            logging.StreamHandler()
        ]
    )
    parser = argparse.ArgumentParser(
        description='query cloudtrail logs in target directory using jq syntax',
        epilog="default action is to display a count of matching events per file\n\n"
               "this utility expects cloudtrail logs to be gzipped (no need to unzip logs downloaded from s3)\n\n"
               "if using --splunk the SPLUNK_TOKEN and SPLUNK_ENDPOINT envvars must be set. eg:\n"
               "   export SPLUNK_TOKEN='XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'\n"
               "   export SPLUNK_ENDPOINT='https://hec-input.splunkcloud.com:443/services/collector/event'\n\n",
        formatter_class=argparse.RawTextHelpFormatter
    )
    parser.add_argument("--query",
                        help="jq query to run against found trails",
                        action="store",
                        default="default"
                        )
    parser.add_argument("--target_dir",
                        help="directory to look for trails\ndefault: ./trails/",
                        action="store",
                        default="trails"
                        )
    parser.add_argument("--splunk",
                        help="send result to splunk",
                        action="store_true",
                        default=False
                        )
    parser.add_argument("--print",
                        help="print matching events to console",
                        action="store_true",
                        default=False
                        )
    args = parser.parse_args()
    input_args = vars(args)
    for arg in input_args:
        logging.info(f"{arg}: {input_args[arg]}")
    # every gzipped trail file under the target directory is queried in turn
    files = glob.glob(args.target_dir + '/**/*.gz', recursive=True)
    logging.info(f"found {len(files)} files to query")
    for file in files:
        filename = os.path.basename(file)
        with gzip.open(file, 'rb') as f:
            data = json.load(f)
        result = pyjq.all(args.query, data)
        if len(result) > 0:
            logging.info(f"found {len(result)} matching events in {filename}")
            for event in result:
                if args.splunk:
                    payload = {
                        "index": "markv_testing",
                        "sourcetype": "cloudtrail:event",
                        "event": event
                    }
                    splunk(payload, event['eventID'])
                if args.print:
                    print(f"{json.dumps(event)}\n")


if __name__ == "__main__":
    main()

requirements.txt

boto3==1.14.0
dateparser==0.7.5
pyjq==2.4.0
requests==2.23.0