Dplyr backends: the ultimate collection

Dplyr is a well-known R package for working with structured data, whether in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities not found in the others, so the notion of backend is used with a bit of poetic license. Even the different database and cluster backends differ in subtle ways. But it sure beats writing SQL directly! Below is a list of backends, with links to the packages that implement them where necessary. I've done my best to point to active projects, but I am not endorsing any of them; do your own testing. Enjoy, and please contribute any corrections or additions in the comments.

Backend	Package
data.frame	builtin
data.table	builtin
arrays	builtin
SQLite	builtin
PostgreSQL/Redshift	builtin
MySQL/MariaDB	builtin
BigQuery	bigrquery
MonetDB	MonetDB.R
Presto	RPresto
Spark	dplyr.spark.hive
Hive	dplyr.spark.hive
Impala	dplyrimpaladb
Vertica	vertica.dplyr
Teradata	teradata.dplyr
Calcite	dplyr-calcite
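
To make the "one grammar, many backends" idea concrete, here is a minimal sketch that runs the same pipeline against a data.frame and against SQLite. It rests on a few assumptions: a recent dplyr in which the database backends are provided by the separate dbplyr package, and the DBI, dbplyr and RSQLite packages being installed. The helper name `mean_mpg_by_cyl` and the in-memory database are purely illustrative.

```r
library(dplyr)

# In-memory backend: a plain data.frame shipped with R.
local_tbl <- mtcars

# Database backend: copy the same data into a throwaway in-memory
# SQLite database (assumes DBI, dbplyr and RSQLite are installed).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)
remote_tbl <- tbl(con, "mtcars")

# Hypothetical helper: one pipeline, written once, independent of the backend.
mean_mpg_by_cyl <- function(data) {
  data %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
    arrange(cyl)
}

# On the data.frame the pipeline is evaluated immediately.
local_result <- mean_mpg_by_cyl(local_tbl)

# On the database table it only builds a lazy query.
remote_query <- mean_mpg_by_cyl(remote_tbl)
show_query(remote_query)                 # print the SQL generated for SQLite
remote_result <- collect(remote_query)   # run the query and fetch the rows

DBI::dbDisconnect(con)
```

The subtle differences mentioned above show up as soon as a pipeline uses a function the database cannot translate, so it is worth inspecting the generated SQL with `show_query()` before trusting a result on a new backend.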

piccolbo/dplyr-backends.md
