@arianagiorgi
Created December 21, 2017 21:47
Uploading Large CSVs in Postgres


When importing a large CSV file into Postgres, importing with csvkit alone can sometimes stall. In that case it's useful to use csvkit just to build the table, then let the Postgres COPY command load the rest of the data.

You will need to pip install csvkit as well as psycopg2, preferably in a virtualenv.

  1. First, we'll generate the CREATE TABLE statement and pipe it into a SQL file. This example uses the first 100 lines of the file to infer the data type of each field, but the sample size can be adjusted as needed.
  • csvfile.csv = large data file
  • yourtable = table you want to create
  • outputfile.sql = file which will contain CREATE TABLE statement
$ head -n 100 csvfile.csv | csvsql --no-constraints --table yourtable > outputfile.sql
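
For illustration, the generated outputfile.sql might look something like this (the column names here are hypothetical; with --no-constraints, csvsql emits bare column types without NOT NULL or length limits):

```sql
CREATE TABLE yourtable (
    id DECIMAL,
    name VARCHAR,
    created_at TIMESTAMP
);
```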

Since csvsql only reads the first 100 lines, it may miscast a field for lack of data. Open up outputfile.sql and correct any type cast by hand before running it.
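One way to catch a miscast before loading is to check every value in a column against the type csvsql guessed. A minimal sketch using only the standard library (the file path and column name are placeholders for your own):

```python
import csv

def column_fits_type(path, column, caster):
    """Check every value in `column` against a type caster (e.g. int, float).

    Returns False as soon as a non-empty value fails to cast, meaning the
    type inferred from the first 100 rows won't hold for the whole file.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            value = row[column].strip()
            if not value:
                continue  # empty cells are acceptable for any type
            try:
                caster(value)
            except ValueError:
                return False
    return True
```

Run it once per numeric column before step 3; any column that returns False should be widened to VARCHAR in outputfile.sql.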

  2. Connect to the Postgres database and run the file, which will create the table.
  • yourdb = an already existing database where you'd like to put your table
$ psql yourdb
yourdb=# \i outputfile.sql
  3. Insert the data from the CSV into the new table using the COPY command. Note that COPY reads the file from the database server's filesystem, so the path must be visible to the server; if the file only exists on your client machine, use psql's \copy instead.
yourdb=# COPY yourtable FROM '/path/to/csvfile.csv' DELIMITER ',' CSV HEADER;
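
If you'd rather run this step from Python (the psycopg2 install above makes that convenient), here is a minimal sketch assuming an open psycopg2 connection; the function names are illustrative. COPY ... FROM STDIN streams the file from the client, so it works even when the CSV isn't on the database host:

```python
def copy_sql(table):
    # Mirrors the psql COPY command above, but reads from STDIN so the
    # file is streamed from the client rather than read on the server.
    return "COPY {} FROM STDIN DELIMITER ',' CSV HEADER".format(table)

def copy_csv(conn, table, path):
    """Load a local CSV into `table` over an open psycopg2 connection.

    `conn` is assumed to come from psycopg2.connect(...); the cursor's
    copy_expert() does the actual streaming.
    """
    with conn.cursor() as cur, open(path) as f:
        cur.copy_expert(copy_sql(table), f)
    conn.commit()
```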
  4. If you get errors about numeric values wrapped in double quotation marks, I've found the following command helpful for removing all doubled quotes from the file:

$ sed 's/""//g' csvfile.csv > csvfile_clean.csv
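
The sed one-liner is equivalent to this Python sketch. One caveat worth knowing: inside a quoted CSV field, "" is the standard escape for an embedded quote character, so this rewrite also strips legitimately escaped quotes, and the output is worth spot-checking before loading:

```python
def strip_doubled_quotes(text):
    """Remove every occurrence of "" (the same edit as the sed one-liner).

    Caution: "" inside a quoted field is the standard CSV escape for an
    embedded quote, so this also rewrites legitimately escaped quotes.
    """
    return text.replace('""', '')

# Hypothetical usage on the file from the steps above:
# with open('csvfile.csv') as src, open('csvfile_clean.csv', 'w') as dst:
#     dst.write(strip_doubled_quotes(src.read()))
```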
