Day 2 of the Beat 5 Kaggle Benchmarks in 5 Days Challenge.
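The Random Acts of Pizza competition is about predicting whether a request for a free pizza on the Random Acts of Pizza subreddit is granted. The benchmark simply makes the same guess for every request: that no pizzas are given (or that all are). A constant guess cannot rank any granted request above a denied one, so this results in an AUC of 0.5 (50 on a percentage scale).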
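To see why a constant guess lands on exactly that score, here is a minimal sketch that computes AUC by hand as the fraction of (granted, denied) pairs ranked correctly, with ties counting as half. The `auc` helper and the labels are made up for illustration; this is not the competition's scoring code.

```python
def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie is worth half a win
    return wins / (len(pos) * len(neg))


# Made-up labels: 1 = pizza granted, 0 = not granted.
labels = [1, 0, 0, 1, 0, 0, 0, 1]

all_zero = auc(labels, [0.0] * len(labels))  # "no pizzas are given"
all_one = auc(labels, [1.0] * len(labels))   # "all pizzas are given"
print(all_zero, all_one)  # 0.5 0.5
```

Every pair is a tie under a constant guess, so the score is 0.5 no matter which constant is used or how many requests were actually granted.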
To beat that benchmark with a simple model, I first looked at the training and test data to find simple features. I decided to use the word counts of the request title and comment text, on the theory that readers might skip longer comments.
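The word-count features could be computed along these lines. This is only a sketch: the field names `request_title` and `request_text` are my assumption about the JSON keys, the helper names are hypothetical, and splitting on whitespace is just one simple way to count words.

```python
def word_count(text):
    """Count whitespace-separated tokens; missing text counts as zero words."""
    return len((text or "").split())


def length_features(request):
    """Build the two word-count features for one request record.

    The keys 'request_title' and 'request_text' are assumed field names.
    """
    return {
        "title_words": word_count(request.get("request_title")),
        "text_words": word_count(request.get("request_text")),
    }


# Made-up request for illustration.
example = {
    "request_title": "[Request] Hungry student in Ohio",
    "request_text": "Rent was due this week and I would love a pizza.",
}
features = length_features(example)
print(features)  # {'title_words': 5, 'text_words': 11}
```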
To build the model, I first used jq to extract only the desired fields from the original JSON files, then used json2csv to write them out as CSV.
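For readers without jq and json2csv installed, roughly the same extraction can be sketched with Python's standard library. This is a stand-in for the command-line tools, not the commands I ran. It assumes each JSON file holds a single top-level array of request objects, and the field names in `FIELDS` are assumptions as well.

```python
import csv
import io
import json

# Assumed field names; swap in whichever keys you actually want to keep.
FIELDS = ["request_id", "request_title", "request_text"]


def json_to_csv(json_text, fields, out):
    """Keep only `fields` from each record and write the records as CSV rows."""
    records = json.loads(json_text)  # assumes a top-level JSON array
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells; keys not in `fields` are dropped.
        writer.writerow({name: record.get(name, "") for name in fields})


# Made-up sample data standing in for the real JSON file.
sample = json.dumps([
    {"request_id": "t3_a", "request_title": "Pizza please",
     "request_text": "Broke until Friday, thanks!", "unused_field": 1},
    {"request_id": "t3_b", "request_title": "Hungry family",
     "request_text": "Anything helps."},
])

buffer = io.StringIO()
json_to_csv(sample, FIELDS, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With real files, you would pass the contents of the JSON file as `json_text` and an open file (opened with `newline=""`) as `out` instead of the in-memory buffer.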