When importing a large csv file into postgres, importing with csvkit alone will sometimes stall, so it's useful to use csvkit just to build the table and the postgres COPY command to load the data itself.
You will need to `pip install csvkit` as well as `psycopg2`, preferably in a virtualenv.
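A minimal setup sketch, assuming Python 3's built-in venv module (the environment name dataenv is arbitrary):

$ python3 -m venv dataenv
$ source dataenv/bin/activate
$ pip install csvkit psycopg2

If psycopg2 won't build on your machine, the precompiled psycopg2-binary package is a commonly used alternative for local work.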
- First we'll make the `CREATE TABLE` statement and pipe it into a sql file. This example uses the first 100 lines of the file to determine the data type of each field, but this can be adjusted as needed.
  - `csvfile.csv` = the large data file
  - `yourtable` = the table you want to create
  - `outputfile.sql` = the file which will contain the `CREATE TABLE` statement
$ head -n 100 csvfile.csv | csvsql --no-constraints --table yourtable > outputfile.sql
Since it's only reading the first 100 lines, it's likely that one of the fields will be miscast for lack of data (for example, a text column whose first 100 values happen to look numeric). You can open up `outputfile.sql` and directly change any type in the file before running it.
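As an illustration only (the actual output depends entirely on your csv), the generated outputfile.sql for a hypothetical file with id, zip_code, and amount columns might look roughly like this:

CREATE TABLE yourtable (
    id DECIMAL,
    zip_code DECIMAL,
    amount DECIMAL
);

Here zip_code was inferred as a number; changing that line to zip_code VARCHAR by hand would preserve leading zeros.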
- Connect to the database with psql and run the file, which will create the table.
  - `yourdb` = an already existing database where you'd like to put your table
$ psql yourdb
yourdb=# \i outputfile.sql
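Equivalently, the file can be run without opening an interactive session, using psql's -f flag:

$ psql yourdb -f outputfile.sql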
- Insert the data from the csv into the existing table using the `COPY` command.
yourdb=# COPY yourtable FROM '/path/to/csvfile.csv' DELIMITER ',' CSV HEADER;
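Note that COPY ... FROM reads the file as the postgres server process, so it can fail with a permission error if the server can't read your path. psql's client-side \copy meta-command takes the same options and reads the file as your local user instead:

yourdb=# \copy yourtable FROM '/path/to/csvfile.csv' DELIMITER ',' CSV HEADER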
- If you get an error regarding numeric values in quotation marks throughout the file, I've found the following command helpful for removing the doubled double-quote characters:
sed 's/\""//g' csvfile.csv > csvfile_clean.csv
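As a quick sanity check on a made-up row (the values are purely illustrative), the command strips the doubled inner quotes while leaving each field's outer quotes alone:

$ echo '1,"Acme ""Inc""",2500' | sed 's/""//g'
1,"Acme Inc",2500

Remember to point the COPY command at csvfile_clean.csv afterwards.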