@dr5hn
Last active August 21, 2024 09:48
Delete column(s) from a very large CSV file using pandas [How to delete columns in a CSV file?]
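# Source: https://stackoverflow.com/questions/38149288/delete-columns-from-very-large-csv-file-using-pandas-or-blaze
# pip3 install pandas
import pandas as pd

cols_to_keep = [
    'email',
    'phonenumber',
    'name',
]  # columns you want to have in the resulting CSV file
# (output columns follow their order in user.csv, not the order of this list)

chunksize = 10**5  # rows read per chunk; lower it if memory is tight

# Stream user.csv chunk by chunk, keeping only cols_to_keep.
# The first chunk creates (or truncates) user_n.csv and writes the header;
# later chunks are appended without repeating the header.
for i, chunk in enumerate(pd.read_csv('user.csv', chunksize=chunksize, usecols=cols_to_keep)):
    chunk.to_csv('user_n.csv', mode='w' if i == 0 else 'a', header=(i == 0), index=False)

# For a 500 MB file it takes about 25 seconds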
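The snippet above lists the columns to keep. When it is easier to name the columns to delete, `pd.read_csv` also accepts a callable for `usecols`: it is called with each column name and the column is kept when it returns `True`. The following is a minimal sketch of that variant, using the same chunked streaming and write-the-header-once pattern; the function name `drop_columns` is hypothetical and not part of the original gist.

```python
import pandas as pd


def drop_columns(src, dst, cols_to_drop, chunksize=10**5):
    """Copy the CSV at `src` to `dst`, omitting the columns in `cols_to_drop`.

    The input is streamed in chunks of `chunksize` rows, so memory use stays
    bounded by the chunk size rather than the file size.
    """
    drop = set(cols_to_drop)
    # usecols as a callable: keep every column whose name is not in `drop`.
    with pd.read_csv(src, chunksize=chunksize, usecols=lambda c: c not in drop) as reader:
        for i, chunk in enumerate(reader):
            # First chunk: create/truncate dst and write the header.
            # Later chunks: append rows only, so the header appears once.
            chunk.to_csv(dst, mode='w' if i == 0 else 'a', header=(i == 0), index=False)
```

For example, `drop_columns('user.csv', 'user_n.csv', ['password'])` would copy `user.csv` without a `password` column (that column name is only an illustration). One design difference worth noting: with a callable, a name in `cols_to_drop` that does not exist in the file is silently ignored, whereas passing a list of names that are missing from the file to `usecols` makes `read_csv` raise a `ValueError`.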
@albertllonch

Thanks! It works fast as well
