Recently an opportunity at work arose to run some ad hoc data analysis/discovery on our event data. The required analysis was a bit more complicated than what was provided by our current reporting capabilities. This complication was primarily due to both the quality and the layout/structure of the existing data. We wanted to understand how much of the revenue was coming from an internal (machine learning supported) recommendation service, for given products over various time boundaries. From these sales, we wanted to further understand which recommendation algorithm was used to drive the purchase. Measuring the performance of these algorithms would allow us to fine tune them in the future and have the ability to ascertain whether or not those changes made a positive or negative impact on the overall revenue.
It was determined early on that trying to do this in plain SQL wasn't a great fit, and having this data stored in Amazon Redshift constrained the approach to just SQL. Luckily, I had been pulling the