When importing a large CSV file into Postgres, importing with csvkit alone will sometimes stall. In that case it's useful to have csvkit build the table and then use the Postgres `COPY` command to load the data.
You will need to pip install csvkit as well as psycopg2, preferably in a virtualenv.
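A minimal setup might look like this (the virtualenv directory name `venv` here is just a placeholder):

$ virtualenv venv
$ source venv/bin/activate
$ pip install csvkit psycopg2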
- First we'll make the `CREATE TABLE` statement and pipe it into a sql file. This example uses the first 100 lines of the file to determine the data type of each field, but this can be adjusted as needed.

`csvfile.csv` = large data file
`yourtable` = table you want to create
`outputfile.sql` = file which will contain the `CREATE TABLE` statement
$ head -n 100 csvfile.csv | csvsql --no-constraints --table yourtable > outputfile.sql
Since it only reads the first 100 lines, it's likely that one of the fields will be given the wrong type based on the limited sample. You can open up outputfile.sql and directly change any type cast in the file.
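For example (this is a hypothetical outputfile.sql with made-up column names, not output from any particular file), a zip code column inferred as an integer could be fixed by editing the statement by hand:

CREATE TABLE yourtable (
	id INTEGER,
	zip_code INTEGER,  -- change to VARCHAR to preserve leading zeros
	amount DECIMAL
);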
- Connect to the Postgres database with psql and run the file, which will create the table.
`yourdb` = an already existing database where you'd like to put your table
$ psql yourdb
yourdb=# \i outputfile.sql
- Insert the data from the csv into the existing table using the `COPY` command.
yourdb=# COPY yourtable FROM '/path/to/csvfile.csv' DELIMITER ',' CSV HEADER;
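One thing to keep in mind: `COPY ... FROM` a file path is read by the Postgres server process, so it needs permission to read that path on the server's filesystem. If that's a problem (for example, the file lives on your local machine), psql's client-side `\copy` takes the same options and reads the file from the client instead:

yourdb=# \copy yourtable FROM '/path/to/csvfile.csv' DELIMITER ',' CSV HEADER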
- If you get an error about numeric values being wrapped in quotation marks throughout the file, I've found the following command helpful for removing the double quotation marks:
sed 's/\""//g' csvfile.csv > csvfile_clean.csv