@vascoosx
Created November 13, 2017 10:30
sparkR setup
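# Optional: loads the external spark-csv package. Spark 2.x reads CSV natively, and Parquet needs no extra package.
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.11:1.2.0" "sparkr-shell"')
# NOTE: lib.loc here and sparkHome below should point to the same Spark installation; the two home directories differ.
library(SparkR, lib.loc = "/home/user1/programs/spark-2.2.0-bin-hadoop2.7/R/lib/")
# Start a local session. In local mode everything runs inside the driver JVM, so spark.driver.memory is the setting that matters.
sc <- sparkR.session(master = "local", appName = "myapp", sparkHome = "/home/ubuntu/programs/spark-2.2.0-bin-hadoop2.7",
                     sparkConfig = list(spark.executor.memory = "6g", spark.driver.memory = "12g"))
# collect() copies the entire dataset into a local R data.frame, so the file must fit in driver and R memory.
df <- collect(read.parquet("big/parquet/file1"))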
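# A possible lighter-weight alternative, sketched under assumptions: if the file is
# too large to collect in full, inspect it as a Spark DataFrame first and bring only
# a bounded sample into R. The names sdf, n_rows and preview and the 1000-row cap
# are illustrative and not part of the original gist.
sdf <- read.parquet("big/parquet/file1")  # Spark DataFrame; nothing is copied into R yet
printSchema(sdf)                          # print column names and types
n_rows <- count(sdf)                      # row count, computed by Spark
preview <- collect(limit(sdf, 1000L))     # local data.frame with at most 1000 rows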
sparkR.stop()