dplyr is a well-known R package for working with structured data, whether in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities that are not found in the others, so the notion of a backend is used here with a bit of poetic license. Even the different database and cluster backends differ in subtle ways. But it sure is better than writing SQL directly! Here is a list of backends, with links to the packages that implement them where necessary. I've done my best to link to active projects, but I am not endorsing any of them. Do your own testing. Enjoy, and please contribute any corrections or additions in the comments.
Backend | Package |
---|---|
data.frame | builtin |
data.table | builtin |
arrays | builtin |
SQLite | builtin |
PostgreSQL/Redshift | builtin |
MySQL/MariaDB | builtin |
BigQuery | bigrquery |
MonetDB | MonetDB.R |
Presto | RPresto |
Spark | dplyr.spark.hive |
Hive | dplyr.spark.hive |
Impala | dplyrimpaladb |
Vertica | vertica.dplyr |
Teradata | teradata.dplyr |
Calcite | dplyr-calcite |
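
To give a feel for what "better than writing SQL directly" means in practice, here is a minimal sketch against an in-memory SQLite database. It assumes a recent dplyr installation, where the database support lives in the companion dbplyr package, plus the DBI and RSQLite packages; older dplyr versions used `src_sqlite()` instead. The table name and the query are only illustrative.

```r
library(dplyr)
library(DBI)

# Assumption: DBI, RSQLite and dbplyr are installed alongside dplyr.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into the database so there is something to query.
dbWriteTable(con, "mtcars", mtcars)

# A lazy reference to the remote table; no rows are fetched yet.
cars_db <- tbl(con, "mtcars")

# The usual dplyr verbs build up a query instead of running it.
query <- cars_db %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

# Print the SQL that dplyr generated for this backend.
show_query(query)

# collect() runs the query and pulls the result into a local data frame.
result <- collect(query)
print(result)

dbDisconnect(con)
```

The same pipeline should work largely unchanged against the other database backends in the table once the connection is swapped, although, as noted above, the backends differ in which functions they can translate.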
There are some packages that let you reference and manipulate data directly in Teradata.
Try https://github.com/hoxo-m/dplyr.teradata. It's still in beta, I guess.
It is a dplyr wrapper for Teradata and supports lazy execution.
However, I didn't find it as robust as dplyr's support for its built-in databases, and Teradata is not one of them.