Last active April 11, 2023 16:23
gjreda/7433f5f70299610d9b6b
pandas' read_csv parse_dates vs explicit date conversion
# When you're sure of the format, it's much quicker to explicitly convert your dates than to use `parse_dates`.
# Makes sense; I was just surprised by the time difference (roughly 5x: 1min 30s vs. 17.9 s wall time).
# Run in IPython/Jupyter: `%time` is an IPython magic, not plain Python.
import pandas as pd
from datetime import datetime


def to_datetime(d):
    """Parse one 'MM/DD/YYYY HH:MM' string using a fixed, known format."""
    return datetime.strptime(d, '%m/%d/%Y %H:%M')


# 1) Let pandas work out the date format itself.
%time trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv', parse_dates=['starttime', 'stoptime'])
# CPU times: user 1min 29s, sys: 331 ms, total: 1min 29s
# Wall time: 1min 30s

# 2) Apply the fixed-format parser to each value via `converters`.
%time trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv', converters={'starttime': to_datetime, 'stoptime': to_datetime})
# CPU times: user 17.6 s, sys: 269 ms, total: 17.9 s
# Wall time: 17.9 s

# $ wc -l divvy/Divvy_Trips_2013.csv
# 759789 divvy/Divvy_Trips_2013.csv
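A related way to do the explicit conversion is sketched below: read the date columns as plain strings, then call `pd.to_datetime` once per column with a fixed `format`, so pandas parses the whole column in one call instead of calling a Python function per value. This is a swapped-in technique, not what the gist timed. The helper names (`make_csv`, `parse_one`, `read_with_converters`, `read_with_explicit_format`) and the generated rows are hypothetical stand-ins for the Divvy file, and the printed timings will depend on your machine and pandas version. On pandas 2.0 and later, `read_csv` also accepts a `date_format=` argument next to `parse_dates`, which should give a similar effect inside the reader.

```python
import io
import timeit
from datetime import datetime

import pandas as pd

FMT = "%m/%d/%Y %H:%M"
DATE_COLS = ["starttime", "stoptime"]


def make_csv(n_rows: int) -> str:
    """Build hypothetical CSV text shaped like the Divvy trip file (not real data)."""
    header = "trip_id,starttime,stoptime\n"
    rows = (
        f"{i},06/27/2013 {i % 24:02d}:{i % 60:02d},"
        f"06/27/2013 {i % 24:02d}:{(i + 5) % 60:02d}\n"
        for i in range(n_rows)
    )
    return header + "".join(rows)


def parse_one(d: str) -> datetime:
    """Per-value parser, equivalent to the gist's `to_datetime` helper."""
    return datetime.strptime(d, FMT)


def read_with_converters(csv_text: str) -> pd.DataFrame:
    """Call strptime on every value via `converters`, as in the gist."""
    return pd.read_csv(
        io.StringIO(csv_text), converters={col: parse_one for col in DATE_COLS}
    )


def read_with_explicit_format(csv_text: str) -> pd.DataFrame:
    """Read dates as strings, then convert each column with one to_datetime call."""
    df = pd.read_csv(io.StringIO(csv_text))
    for col in DATE_COLS:
        # A fixed format means pandas does not have to infer one.
        df[col] = pd.to_datetime(df[col], format=FMT)
    return df


if __name__ == "__main__":
    text = make_csv(20_000)
    for name, reader in [
        ("converters + strptime", read_with_converters),
        ("to_datetime(format=...)", read_with_explicit_format),
    ]:
        # number=1: a single run per approach is enough for a rough comparison.
        seconds = timeit.timeit(lambda: reader(text), number=1)
        print(f"{name}: {seconds:.3f} s")
```

One design note: `pd.to_datetime` has a `cache` parameter (on by default) that parses each distinct string only once, which helps with trip data where many rows share the same minute-level timestamp.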
Nanoseconds (as an offset from the Unix epoch, I think).
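A quick way to check the nanoseconds-since-epoch reading is sketched below: take an instant one second after 1970-01-01 00:00:00 and look at its integer value in pandas and in NumPy. This assumes a tz-naive timestamp; also note that since pandas 2.0, datetime64 columns can use coarser units (`s`, `ms`, `us`) as well, so the nanosecond interpretation applies to the `ns` case.

```python
import numpy as np
import pandas as pd

# One second after the Unix epoch (1970-01-01 00:00:00), tz-naive.
ts = pd.Timestamp("1970-01-01 00:00:01")

# Timestamp.value is the integer number of nanoseconds since the epoch.
print(ts.value)  # 1000000000

# The same instant as a nanosecond-resolution NumPy datetime64, cast to int64.
as_ns = np.datetime64("1970-01-01T00:00:01", "ns").astype(np.int64)
print(as_ns)  # 1000000000
```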