
@javierluraschi
Last active February 2, 2020 17:18
Using sparklyr with Kubernetes and Google Cloud

Adapted for sparklyr from cloud.google.com/solutions/spark-on-kubernetes-engine:

First, create and configure the cluster using the Google Cloud CLI (gcloud) and kubectl:

gcloud config set compute/zone us-central1-f
gcloud container clusters create spark-on-gke --machine-type n1-standard-2

Next, bind the cluster-admin role to your account email and to the default service account (you can retrieve your account email with gcloud config get-value account if needed):

kubectl create clusterrolebinding user-admin-binding --clusterrole=cluster-admin --user=<your-email>
kubectl create clusterrolebinding --clusterrole=cluster-admin --serviceaccount=default:default spark-admin

Retrieve the remote cluster information, which includes the master IP:

gcloud container clusters list --filter name=spark-on-gke
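
The listing prints a MASTER_IP column. If you would rather pull just that address in a script, something like the following sketch may help. It assumes the cluster name and zone used earlier in this guide, and `k8s_master_url` is a hypothetical helper name, not part of gcloud or sparklyr.

```shell
# Hypothetical helper: format an address as the master URL that
# spark_config_kubernetes() takes as its first argument.
k8s_master_url() {
  printf 'k8s://https://%s\n' "$1"
}

# Only query the cluster when gcloud is installed.
# Assumes the cluster name (spark-on-gke) and zone (us-central1-f) from above.
if command -v gcloud >/dev/null 2>&1; then
  MASTER_IP=$(gcloud container clusters describe spark-on-gke \
    --zone us-central1-f --format='value(endpoint)')
  k8s_master_url "$MASTER_IP"
fi
```

The printed URL can then be pasted into the R connection call in place of the `k8s://https://<k8s-ip>` placeholder.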

Then connect from R as follows, replacing <k8s-ip> with the MASTER_IP printed in the previous step.

remotes::install_github("rstudio/sparklyr")
library(sparklyr)

sc <- spark_connect(config = spark_config_kubernetes(
    "k8s://https://<k8s-ip>",
    account = "default",
    image = "docker.io/jluraschi/spark:sparklyr",
    version = "2.4"))
@Pofferbacco

Hello to all!
I'm working on a project where we have to explain the use of SparkR, so we created a Kubernetes cluster using gcloud, following the guide posted above step by step (very helpful!).

When we try to connect from RStudio to the Kubernetes cluster, we get the following error: "Error in f(...): Terminal is not running and cannot accept input".

Could someone help us? We've tried pretty much everything, so we're kind of backed into a corner.
