Connecting to Spark on a local cluster, and other basic sparklyr functions
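Before running the snippet below, the R packages need to be installed, and sparklyr can download a local Spark distribution for you. A minimal one-time setup sketch (the exact package list is an assumption; spark_install() is sparklyr's bundled installer):

# One-time setup (assumed, not part of the original gist):
# install the R packages and a local Spark 2.1.0 distribution
install.packages(c("sparklyr", "dplyr", "nycflights13", "DBI"))
sparklyr::spark_install(version = "2.1.0")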
# Load the sparklyr library in the R environment
library(sparklyr)
# Connect to a local Spark cluster
sc <- spark_connect(master = "local", version = "2.1.0")
# Print the Spark version
spark_version(sc)
# Check the data tables in the Spark local cluster
src_tbls(sc) # if no table has been copied to the cluster yet, character(0) is returned
# Copy data to the local Spark instance
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)
# Check the data tables in the Spark local cluster again
src_tbls(sc) # "flights"
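# flights_tbl behaves like a remote dplyr table: verbs such as group_by()
# and summarise() are translated to Spark SQL and run inside the cluster.
# A minimal sketch (this delay summary is illustrative, not from the
# original gist):
library(dplyr)
flights_tbl %>%
  group_by(carrier) %>%
  summarise(mean_dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(desc(mean_dep_delay)) %>%
  head(5) %>%
  collect() # collect() brings the small result back into R as a tibble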
# Check the size of the flights_tbl object
# (note: flights_tbl is only a reference to the table in Spark, so
# object.size() reports the size of the R handle, not of the data itself)
object.size(flights_tbl)
# Check the column names of the data table
colnames(flights_tbl)
# USING SQL
# It's also possible to execute SQL queries directly against tables within
# a Spark cluster. The spark_connection object implements a DBI interface
# for Spark, so you can use dbGetQuery() to execute SQL and return the
# result as an R data frame.
library(DBI)
flights2013 <- dbGetQuery(sc, "SELECT flight, tailnum, origin, dest FROM flights WHERE year = 2013")
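# The same query can also be kept lazy in Spark with dplyr::tbl() and sql(),
# which returns a remote table reference instead of pulling the rows into R:
flights2013_tbl <- dplyr::tbl(sc, dplyr::sql(
  "SELECT flight, tailnum, origin, dest FROM flights WHERE year = 2013"
))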
# Write the query result to a local CSV file
write.csv(flights2013, "flights2013.csv", row.names = FALSE) # writes to local storage
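# To write from the Spark side instead (a sketch; the output path below is
# an assumption), sparklyr provides spark_write_csv():
spark_write_csv(flights_tbl, path = "flights_csv")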
# Stop the local Spark cluster
spark_disconnect(sc)