Created
June 23, 2016 17:52
-
-
Save bborgesr/c07211d00c52b305c3fcf549eb3ec766 to your computer and use it in GitHub Desktop.
Comparing the time that a query takes to run using three different models of database connection: one connection per program/app; one connection per query; using a connection pool.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Benchmarking database queries" | |
author: "Barbara Borges Ribeiro" | |
date: "June 23, 2016" | |
output: html_document | |
--- | |
```{r setup, include=FALSE} | |
knitr::opts_chunk$set(echo = TRUE) | |
library(DBI) | |
library(pool) | |
``` | |
### One connection per app | |
```{r app} | |
app_times <- numeric(0) | |
conn <- DBI::dbConnect( | |
drv = RMySQL::MySQL(), | |
dbname = "shinydemo", | |
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com", | |
username = "guest", | |
password = "guest" | |
) | |
for (id in 1:30) { | |
time <- system.time({ | |
sql <- paste0("SELECT * FROM City WHERE ID = ", id, ";") | |
dbGetQuery(conn, sql) | |
}) | |
app_times <- c(app_times, time["elapsed"]) | |
} | |
invisible(DBI::dbDisconnect(conn)) | |
``` | |
The average of these 30 measurements is `r mean(app_times)` seconds. | |
### One connection per query | |
```{r query} | |
query_times <- numeric(0) | |
args <- list( | |
drv = RMySQL::MySQL(), | |
dbname = "shinydemo", | |
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com", | |
username = "guest", | |
password = "guest" | |
) | |
for (id in 1:30) { | |
time <- system.time({ | |
conn <- do.call(DBI::dbConnect, args) | |
sql <- paste0("SELECT * FROM City WHERE ID = ", id, ";") | |
dbGetQuery(conn, sql) | |
DBI::dbDisconnect(conn) | |
}) | |
query_times <- c(query_times, time["elapsed"]) | |
} | |
``` | |
The average of these 30 measurements is `r mean(query_times)` seconds. | |
### Using pool | |
```{r pool} | |
pool_times <- numeric(0) | |
pool <- dbPool( | |
drv = RMySQL::MySQL(), | |
dbname = "shinydemo", | |
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com", | |
username = "guest", | |
password = "guest" | |
) | |
for (id in 1:30) { | |
time <- system.time({ | |
sql <- paste0("SELECT * FROM City WHERE ID = ", id, ";") | |
dbGetQuery(pool, sql) | |
}) | |
pool_times <- c(pool_times, time["elapsed"]) | |
} | |
poolClose(pool) | |
``` | |
The average of these 30 measurements is `r mean(pool_times)` seconds. | |
## Overall comparison | |
```{r times} | |
(times <- data.frame(app_times, query_times, pool_times)) | |
``` | |
As you can see, the app-level measurements are the fastest. This is no suprise. The pool measurements take a little over double that time, since you still do have to fetch and return the connection to the pool (in addition to calculating the query itself). Finally, the query-level measurements are the longest. Again, this makes sense because for each query, you're actually fetching a connection from the remote database. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment