Day 2 of the Match 5 Kaggle Benchmarks in 5 Days challenge
In the Random Acts of Pizza competition on Kaggle, the goal is to predict whether people posting on Reddit's Random Acts of Pizza subreddit will actually receive a free pizza, based on their post. For this binary classification problem, the evaluation metric is AUC (area under the ROC curve).
I recreated the all-zeros benchmark using a couple of Unix command-line tools.
- Create the CSV header:
echo "request_id,requester_received_pizza" > zero-benchmark.csv
- Extract the request_id from each JSON entry in the test set and set the prediction to 0, i.e. no pizza. For this I used the powerful jq tool to process the JSON and extract the request_id values. The -r flag makes jq print raw strings rather than JSON-quoted ones.
jq -r '.[].request_id + ",0"' test.json >> zero-benchmark.csv
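Before uploading, it can be worth checking that the file looks the way the two steps above intend. The following is only a minimal sketch: check_submission is a hypothetical helper name of my own, not something from Kaggle or jq, and it assumes every request_id is a non-empty string without commas.

```shell
#!/bin/sh
# check_submission FILE
# Hypothetical helper (not part of the original post): verifies that FILE
# starts with the expected CSV header and that every following row has the
# form "<request_id>,0". Assumes request_ids contain no commas.
check_submission() {
    awk -F, '
        # Line 1 must be the exact header written by the echo step.
        NR == 1 {
            if ($0 != "request_id,requester_received_pizza") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        # Every other line needs exactly two fields: a non-empty id and "0".
        !(NF == 2 && $1 != "" && $2 == "0") {
            print "malformed row " NR ": " $0
            bad = 1
        }
        END {
            if (NR == 0) { print "empty file"; bad = 1 }
            if (!bad) print "OK: " (NR - 1) " predictions"
            exit bad + 0
        }
    ' "$1"
}
```

Calling check_submission zero-benchmark.csv prints the number of predictions when the file is well formed, and otherwise lists the offending lines and returns a non-zero exit status.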