

Created by @Btibert3 on October 11, 2018 22:18
Install Spark for data science in Python and R

Python

Create a conda environment for Python 3 (using Miniconda or Anaconda)

conda create -n spark python=3

Activate the environment

conda activate spark

On older conda releases (before 4.4), use source activate spark instead.
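Once the environment is active, a quick check from inside Python can confirm you are in the right place. This is a minimal sketch, assuming conda's activate scripts export the active environment name in the CONDA_DEFAULT_ENV variable; check_environment is a hypothetical helper, not part of conda.

```python
import os
import sys


def check_environment(expected_env="spark", environ=None):
    """Hypothetical helper: return a list of problems (empty list means OK).

    Assumes conda sets CONDA_DEFAULT_ENV to the active environment's name.
    """
    environ = os.environ if environ is None else environ
    problems = []
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    if sys.version_info.major < 3:
        problems.append(f"Python {sys.version_info.major} found, Python 3 required")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```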

Install IPython

conda install ipython

Now install PySpark. Spark runs on the JVM, so Java must also be installed.

pip install pyspark
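After the install, it can help to confirm both halves of the requirement at once: that pyspark is importable and that a java executable is on the PATH. The sketch below assumes Python 3.8 or newer (for importlib.metadata); pyspark_status is a hypothetical helper, and finding java on the PATH is only a rough proxy, since a JDK configured via JAVA_HOME also works.

```python
import importlib.util
import shutil
from importlib.metadata import PackageNotFoundError, version


def pyspark_status():
    """Hypothetical helper: report pyspark availability and Java location."""
    installed = importlib.util.find_spec("pyspark") is not None
    try:
        pyspark_version = version("pyspark")  # version string of the installed distribution
    except PackageNotFoundError:
        pyspark_version = None
    return {
        "pyspark_installed": installed,
        "pyspark_version": pyspark_version,
        # Path to a `java` executable on PATH, or None if none was found.
        "java_on_path": shutil.which("java"),
    }


if __name__ == "__main__":
    for key, value in pyspark_status().items():
        print(f"{key}: {value}")
```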

Start an IPython shell

ipython

Confirm that the package imports with import pyspark.

To check that the module loaded, type pyspark. and press Tab; IPython will list the module's contents.
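Tab completion only shows that the module can be imported. A stronger check starts a local SparkSession (roughly the PySpark counterpart of spark_connect(master = "local") in the R section) and runs a tiny job. This is a sketch assuming Java is installed; run_smoke_test and the app name are illustrative, not part of PySpark.

```python
# Hypothetical smoke test (not from the original gist): start a local
# SparkSession, run a tiny job, and shut it down again.
try:
    from pyspark.sql import SparkSession
except ImportError:  # pyspark is not installed in this environment
    SparkSession = None


def run_smoke_test():
    """Return the row count of a 5-row DataFrame computed by local Spark."""
    if SparkSession is None:
        raise RuntimeError("pyspark is not installed; run: pip install pyspark")
    spark = (
        SparkSession.builder
        .master("local[*]")  # run Spark in local mode, using all available cores
        .appName("pyspark-smoke-test")
        .getOrCreate()
    )
    try:
        # range(5) yields ids 0..4; count() forces Spark to actually run a job.
        return spark.range(5).count()
    finally:
        spark.stop()  # stop the underlying SparkContext
```

Calling run_smoke_test() from IPython should return 5. If session startup fails with an error mentioning the Java gateway, the Java installation is the first thing to check.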

R

http://spark.rstudio.com/

Install the package from CRAN and load it

install.packages("sparklyr")
library(sparklyr)

sparklyr can download and manage a local Spark installation for you:

spark_install(version = "2.1.0")

Then open a connection to a local Spark instance:

sc <- spark_connect(master = "local")

Running class(sc) should print:

[1] "spark_connection"       "spark_shell_connection" "DBIConnection"  