Created
May 17, 2019 10:50
-
-
Save verajosemanuel/2fa94b014e7a49ab5a085ed32f84b5b5 to your computer and use it in GitHub Desktop.
#append data to a #PostgreSQL #table with `dplyr` without `collect()` in #R
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
dplyr does not include commands to insert or update records in a database, so there is not a complete native dplyr solution for this. But you could combine dplyr with regular SQL statements to avoid bringing the data to R. | |
Let's start by reproducing your steps before the collect() statement | |
library(dplyr) | |
pg <- src_postgres() | |
reg_data <- tbl(pg, "reg_data") | |
reg_results <- | |
reg_data %>% | |
summarize(r_squared=regr_r2(y, x), | |
num_obs=regr_count(y, x), | |
constant=regr_intercept(y, x), | |
slope=regr_slope(y, x), | |
mean_analyst_fog=regr_avgx(y, x), | |
mean_manager_fog=regr_avgy(y, x)) | |
Now, you could use compute() instead of collect() to create a temporary table in the database. | |
temp.table.name <- paste0(sample(letters, 10, replace = TRUE), collapse = "") | |
reg_results <- reg_results %>% compute(name=random.table.name) | |
Where temp.table.name is a random table name. Using the option name= temp.table.name in compute we assign this random name to the temporary table created. | |
Now, we will use the library RPostgreSQL to create an insert query that uses the results stored in the temporary table. As the temporary table only lives in the connection created by src_postgresql() we need to reuse it. | |
library(RPostgreSQL) | |
copyconn <- pg$con | |
class(copyconn) <- "PostgreSQLConnection" # I get an error if I don't fix the class | |
Finally the insert query | |
sql <- paste0("INSERT INTO destination_table SELECT * FROM ", temp.tbl.name,";") | |
dbSendQuery(copyconn, sql) | |
So, everything is happening in the database and the data is not brought into R. | |
EDIT | |
Previous versions of this post did break encapsulation when we obtained temp.tbl.name from reg_results. This is avoided using the option name=in compute. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment