What follows is a technical test for this job offer at CARTO: https://boards.greenhouse.io/cartodb/jobs/705852#.WSvORxOGPUI
Build the following and make it run as fast as you possibly can using Python 3 (vanilla). The faster it runs, the more you will impress us!
Your code should:
- Download this ~2GB file: https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv
- Count the lines in the file
- Calculate the average value of the tip_amount field.
All of that in the most efficient way you can come up with.
That's it. Make it fly!
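The download step listed above is not covered by the script that follows, which reads a local data.csv. A minimal sketch of that step using only the standard library might look like this; the function name `download` and the 1 MiB chunk size are illustrative choices, not part of the original.

```python
import shutil
import urllib.request

URL = 'https://s3.amazonaws.com/carto-1000x/data/yellow_tripdata_2016-01.csv'


def download(url, filename, chunk_size=1024 * 1024):
    """Stream `url` to `filename` chunk by chunk (hypothetical helper).

    Copying in fixed-size chunks keeps memory use flat, so the ~2GB
    response body is never held in memory all at once.
    """
    with urllib.request.urlopen(url) as response, open(filename, 'wb') as out:
        shutil.copyfileobj(response, out, chunk_size)
    return filename


# Usage (fetches the full file, so it is left commented out here):
# download(URL, 'data.csv')
```

Streaming to disk first keeps the download separate from the counting and averaging steps, so each one can be timed on its own.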
import time
import pandas as pd
filename = 'data.csv'
t0 = time.time()
# Binary mode avoids decoding every line; the context manager closes the file.
with open(filename, 'rb') as f:
    n = sum(1 for _ in f)
print('Number of lines: ', n)
print('Elapsed time : ', time.time() - t0)
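# Start the timer before loading so the elapsed time includes the read.
t0 = time.time()
# Only the tip_amount column is needed, so skip parsing the others.
df = pd.read_csv(filename, usecols=['tip_amount'])
# mean() divides by the number of data rows; dividing by the line count
# would also count the header line and skew the average.
print('Average of tip_amount column: ', df['tip_amount'].mean())
print('Elapsed time : ', time.time() - t0)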
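The task statement asks for vanilla Python 3, while the script above relies on pandas and reads the file twice. As a hedged alternative, the sketch below counts lines and averages tip_amount in a single pass with no third-party imports. The function name `count_and_mean` is hypothetical, and the naive comma split assumes the file has no quoted fields containing commas.

```python
def count_and_mean(filename, column='tip_amount'):
    """Return (total_lines, mean of `column`) from one pass over the file.

    Assumes a plain CSV with a header row and no quoted commas.
    """
    with open(filename, 'rb') as f:
        # Locate the target column from the header line.
        header = f.readline().rstrip(b'\r\n').split(b',')
        idx = header.index(column.encode())
        ncols = len(header)

        lines = 1      # the header counts as a line
        rows = 0       # data rows that were actually parsed
        total = 0.0
        for line in f:
            lines += 1
            fields = line.split(b',')
            if len(fields) < ncols:
                # Skip blank or truncated lines instead of failing.
                continue
            # float() accepts bytes and ignores surrounding whitespace.
            total += float(fields[idx])
            rows += 1

    mean = total / rows if rows else float('nan')
    return lines, mean


# Usage, assuming the CSV has already been saved locally:
# lines, mean = count_and_mean('data.csv')
# print('Number of lines: ', lines)
# print('Average of tip_amount column: ', mean)
```

Working on bytes avoids decoding the whole file to text, and a single pass means the data is read from disk only once. Whether this beats the pandas version would need to be measured on the real file.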