For when you need to insert or update millions of rows and don't want to mess around with scripting batches of individual SQL commands, it's worth using Postgres's built-in, super-fast support for importing and exporting data in CSV format.
Note that we're using the `\COPY` command and not the `COPY` command, since it lets us work with files on our local machine rather than worrying about transferring them to the database server's file system.
Via the psql manual on `\COPY`:

> Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
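Concretely, the difference is where the file path gets resolved. A quick illustration (both paths here are placeholders):

```sql
-- Server-side COPY: the path is resolved on the database server's file
-- system, and requires superuser privileges (or the pg_read_server_files
-- role on Postgres 11+).
COPY table_name FROM '/var/lib/postgresql/import.csv' (FORMAT csv);

-- Client-side \COPY: psql reads the file from *your* machine and streams
-- it to the server, so no special privileges are needed.
\COPY table_name FROM '~/Downloads/import.csv' (FORMAT csv);
```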
Open a `psql` connection to your source database and run:

```sql
\COPY (SELECT id, field_1, ... FROM ...) TO '~/Downloads/export.csv' (FORMAT csv, HEADER);
```
et voila, you'll have a CSV in your downloads folder with the columns you specified, filled with the rows you selected.
--> via stackoverflow.
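Just to make the output shape concrete: with the `HEADER` option, the first line of the file is the column names from your SELECT, followed by one line per row. The values below are made up:

```
id,field_1
1,hello
2,world
```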
Doing a simple import into a totally empty table is easy:

```sql
\COPY table_name FROM '/path/to/import.csv' (FORMAT csv);
```
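If your CSV file starts with a header row, add the `HEADER` option so Postgres skips over it (more on what `HEADER` does, and doesn't do, below):

```sql
\COPY table_name FROM '/path/to/import.csv' (FORMAT csv, HEADER);
```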
Note that if you're adding a bunch of new rows to a table with an autoincrementing primary key, and your CSV supplies explicit values for that key, you'll have to reset the sequence on that column afterwards so the database doesn't generate duplicate IDs:
```
SELECT MAX(id) FROM tbl;
-[ RECORD 1 ]
max | {some_int}

ALTER SEQUENCE tbl_id_seq RESTART WITH {some_int + 1};
```
--> via stackoverflow
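An alternative (my addition, not from the answer above): since `ALTER SEQUENCE ... RESTART` only accepts a constant, you can skip the manual MAX lookup entirely with `setval()`, which does accept a subquery:

```sql
-- Bump the sequence to the highest existing ID in one statement;
-- nextval() will then return MAX(id) + 1.
SELECT setval('tbl_id_seq', (SELECT MAX(id) FROM tbl));
```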
Note that if your source CSV and the table you're loading data into have different columns (or even the same columns in a different order), you may need to specify the column names explicitly when loading:

```sql
\COPY table_name (col1, col2, ...) FROM STDIN WITH (FORMAT CSV, HEADER)
```
Note that adding `HEADER` to this command does NOT automatically map the first row of the CSV (i.e. the column headers) to the correct database columns; it [basically just tells postgres to skip the first line of the CSV file](https://stackoverflow.com/a/33271507/625840).
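For example (the table and file names here are hypothetical, not from the cited answer), if the CSV's columns arrive in a different order than the table defines them, the column list in the `\COPY` command is what does the mapping:

```sql
-- The table defines (id, name, email), but the CSV is ordered
-- email, name, id, so the column list must match the CSV's order.
CREATE TABLE users (id int, name text, email text);
\COPY users (email, name, id) FROM '~/Downloads/users.csv' (FORMAT csv, HEADER);
```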
To update existing rows in a table from a CSV, rather than inserting new ones, the trick is to stage the data in a temporary table. First, we'll create that temporary table with a schema that matches the CSV file:

```sql
CREATE TEMP TABLE tmp_x (id int, field_1 text, ...);
```
Next, we'll load the CSV data into our new temporary table:

```sql
\COPY tmp_x FROM '~/Downloads/import.csv' (FORMAT csv);
```
Then, we'll update our target table using the data in our temporary table:

```sql
UPDATE actual_table
SET field_1 = tmp_x.field_1
FROM tmp_x
WHERE actual_table.id = tmp_x.id;
```
Finally, we'll drop the temporary table we made:

```sql
DROP TABLE tmp_x;
```
--> via stackoverflow.
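One extension worth noting (my addition, assuming `actual_table.id` has a primary key or unique constraint): if the CSV also contains rows that don't exist in the target table yet, you can upsert from the temporary table with `INSERT ... ON CONFLICT` instead of a plain UPDATE:

```sql
-- Insert new rows and update existing ones in a single statement
-- (requires Postgres 9.5+ and a unique constraint on actual_table.id).
INSERT INTO actual_table (id, field_1)
SELECT id, field_1 FROM tmp_x
ON CONFLICT (id) DO UPDATE
SET field_1 = EXCLUDED.field_1;
```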