A lot of us are interested in doing more analysis with our service logs so I thought I'd share an experiment I'm doing with Sync. The main idea is to transform the raw logs into something that'll be nice to query and generate reports with in Redshift.
Logs make their way into an S3 bucket (lets call it the 'raw' bucket) where we've got a lambda listening for new data. This lambda reads the raw heka protobuf gzipped data, does some transformation and writes a new file to a different S3 bucket (the 'processed' bucket) in a format that is redshift friendly (like json or csv). There's another lambda listening on the processed bucket that loads this data into Redshift.
