@arianagiorgi
Created December 21, 2017 21:47
Uploading Large CSVs in Postgres


When importing a large CSV file into Postgres, importing with csvkit alone can sometimes stall. In that case it's useful to use csvkit just to build the table, then let the Postgres COPY command load the rest of the data.

You will need to pip install csvkit as well as psycopg2, preferably in a virtualenv.

  1. First, we'll generate the CREATE TABLE statement and pipe it into a SQL file. This example uses the first 100 lines of the file to infer the data type of each field, but the sample size can be adjusted as needed.
  • csvfile.csv = large data file
  • yourtable = table you want to create
  • outputfile.sql = file which will contain CREATE TABLE statement
$ head -n 100 csvfile.csv | csvsql --no-constraints --table yourtable > outputfile.sql
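
For illustration, the generated outputfile.sql might look something like this (the column names here are hypothetical; with --no-constraints, csvsql emits bare column types without NOT NULL or length limits):

```sql
CREATE TABLE yourtable (
    id DECIMAL,
    name VARCHAR,
    created_at TIMESTAMP
);
```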

Since csvsql only reads the first 100 lines, it may miscast a field for lack of data. Open up outputfile.sql and correct any type cast by hand before running it.
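One way to catch a miscast before loading is to check every value in a column against the type csvsql guessed. A minimal sketch using only the standard library (the file path and column name are placeholders for your own):

```python
import csv

def column_fits_type(path, column, caster):
    """Check every value in `column` against a type caster (e.g. int, float).

    Returns False as soon as a non-empty value fails to cast, meaning the
    type inferred from the first 100 rows won't hold for the whole file.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            value = row[column].strip()
            if not value:
                continue  # empty cells are acceptable for any type
            try:
                caster(value)
            except ValueError:
                return False
    return True
```

Run it once per numeric column before step 3; any column that returns False should be widened to VARCHAR in outputfile.sql.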

  2. Connect to the Postgres database and run the file, which will create the table.
  • yourdb = an already existing database where you'd like to put your table
$ psql yourdb
yourdb=# \i outputfile.sql
  3. Insert the data from the CSV into the new table using the COPY command. Note that COPY reads the file from the database server's filesystem, so the path must be visible to the server; if the file only exists on your client machine, use psql's \copy instead.
yourdb=# COPY yourtable FROM '/path/to/csvfile.csv' DELIMITER ',' CSV HEADER;
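
If you'd rather run this step from Python (the psycopg2 install above makes that convenient), here is a minimal sketch assuming an open psycopg2 connection; the function names are illustrative. COPY ... FROM STDIN streams the file from the client, so it works even when the CSV isn't on the database host:

```python
def copy_sql(table):
    # Mirrors the psql COPY command above, but reads from STDIN so the
    # file is streamed from the client rather than read on the server.
    return "COPY {} FROM STDIN DELIMITER ',' CSV HEADER".format(table)

def copy_csv(conn, table, path):
    """Load a local CSV into `table` over an open psycopg2 connection.

    `conn` is assumed to come from psycopg2.connect(...); the cursor's
    copy_expert() does the actual streaming.
    """
    with conn.cursor() as cur, open(path) as f:
        cur.copy_expert(copy_sql(table), f)
    conn.commit()
```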
  4. If you get errors about numeric values wrapped in double quotation marks, I've found the following command helpful for removing all doubled quotes from the file:

$ sed 's/""//g' csvfile.csv > csvfile_clean.csv
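
The sed one-liner is equivalent to this Python sketch. One caveat worth knowing: inside a quoted CSV field, "" is the standard escape for an embedded quote character, so this rewrite also strips legitimately escaped quotes, and the output is worth spot-checking before loading:

```python
def strip_doubled_quotes(text):
    """Remove every occurrence of "" (the same edit as the sed one-liner).

    Caution: "" inside a quoted field is the standard CSV escape for an
    embedded quote, so this also rewrites legitimately escaped quotes.
    """
    return text.replace('""', '')

# Hypothetical usage on the file from the steps above:
# with open('csvfile.csv') as src, open('csvfile_clean.csv', 'w') as dst:
#     dst.write(strip_doubled_quotes(src.read()))
```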
