Last active
August 21, 2024 09:48
Delete column(s) from a very large CSV file using pandas [How to delete columns in a CSV file?]
# Source: https://stackoverflow.com/questions/38149288/delete-columns-from-very-large-csv-file-using-pandas-or-blaze
# pip3 install pandas
import pandas as pd

cols_to_keep = [
    'email',
    'phonenumber',
    'name'
]  # columns you want to have in the resulting CSV file
chunksize = 10**5  # rows read per chunk; you may want to adjust it ...

reader = pd.read_csv('user.csv', chunksize=chunksize, usecols=cols_to_keep)
for i, chunk in enumerate(reader):
    # Create the output file on the first chunk, then append.
    # Write the header only once so it is not repeated for every chunk.
    chunk.to_csv('user_n.csv', mode='w' if i == 0 else 'a', header=(i == 0), index=False)
# For a 500 MB file it takes about 25 seconds
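
The script above lists the columns to keep. When it is easier to name the columns to delete, `usecols` also accepts a callable that pandas evaluates against each column name. The following is a minimal sketch of that variant, not part of the original gist: the helper name `drop_columns` and its parameters are hypothetical, and it assumes the input file has a header row.

```python
import pandas as pd


def drop_columns(src, dst, cols_to_drop, chunksize=10**5):
    """Copy the CSV at src to dst, leaving out the columns in cols_to_drop.

    Hypothetical helper for illustration; assumes src has a header row.
    """
    drop = set(cols_to_drop)
    reader = pd.read_csv(
        src,
        chunksize=chunksize,
        # Callable form of usecols: keep a column when this returns True.
        usecols=lambda name: name not in drop,
    )
    for i, chunk in enumerate(reader):
        chunk.to_csv(
            dst,
            mode='w' if i == 0 else 'a',  # overwrite first, then append
            header=(i == 0),              # write the header only once
            index=False,
        )
```

A call such as `drop_columns('user.csv', 'user_n.csv', ['password'])` would copy every column except a hypothetical `password` column. The remaining columns keep the order they have in the source file.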
Thanks! It works fast as well