Adapted for sparklyr from cloud.google.com/solutions/spark-on-kubernetes-engine:
First, configure the cluster using the Google Cloud client (gcloud) and kubectl:
gcloud config set compute/zone us-central1-f
gcloud container clusters create spark-on-gke --machine-type n1-standard-2
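Next, bind the cluster-admin role to your account email (if needed, you can retrieve it with gcloud config get-value account), and grant the same role to the default service account, which the R code below uses through account = "default":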
kubectl create clusterrolebinding user-admin-binding --clusterrole=cluster-admin --user=<your-email>
kubectl create clusterrolebinding --clusterrole=cluster-admin --serviceaccount=default:default spark-admin
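The setup commands above can be collected into one parameterized script. This is a minimal sketch, not part of the original guide: the helper names `run` and `setup_cluster` and the variables `ZONE`, `CLUSTER`, `MACHINE_TYPE` and `DRY_RUN` are hypothetical. The defaults are the guide's values, and `DRY_RUN=1` prints each command instead of executing it, so you can review what would run first.

```shell
#!/bin/sh
# Sketch: the cluster setup from the guide as one parameterized script.
# ZONE, CLUSTER and MACHINE_TYPE (hypothetical names) default to the guide's values.
ZONE="${ZONE:-us-central1-f}"
CLUSTER="${CLUSTER:-spark-on-gke}"
MACHINE_TYPE="${MACHINE_TYPE:-n1-standard-2}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: the account email to bind to the cluster-admin role.
setup_cluster() {
  account="$1"
  if [ -z "$account" ]; then
    echo "usage: setup_cluster <account-email>" >&2
    return 1
  fi
  run gcloud config set compute/zone "$ZONE" || return 1
  run gcloud container clusters create "$CLUSTER" --machine-type "$MACHINE_TYPE" || return 1
  # Bind your own account to cluster-admin.
  run kubectl create clusterrolebinding user-admin-binding \
    --clusterrole=cluster-admin --user="$account" || return 1
  # Grant cluster-admin to the default service account used by Spark.
  run kubectl create clusterrolebinding \
    --clusterrole=cluster-admin --serviceaccount=default:default spark-admin
}
```

After sourcing the script, `DRY_RUN=1 setup_cluster "$(gcloud config get-value account)"` prints the four commands, and dropping `DRY_RUN=1` runs them.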
Retrieve the remote cluster information, including its MASTER_IP:
gcloud container clusters list --filter name=spark-on-gke
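If you only need the IP, the MASTER_IP column can be extracted from the table this command prints. This is a sketch that assumes the default whitespace-separated table layout with a MASTER_IP header row; `master_ip_from_table` is a hypothetical helper name, not part of gcloud.

```shell
#!/bin/sh
# Sketch: print the MASTER_IP column of `gcloud container clusters list`
# table output read from stdin. Assumes whitespace-separated columns and a
# header row containing MASTER_IP; exits 1 if that header is missing.
master_ip_from_table() {
  awk '
    NR == 1 {
      # Locate the MASTER_IP column in the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "MASTER_IP") col = i
      }
      next
    }
    col { print $col }   # one IP per cluster row
    END { if (!col) exit 1 }
  '
}
```

Usage: `gcloud container clusters list --filter name=spark-on-gke | master_ip_from_table`.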
Then, from R, connect as follows, making sure to replace <k8s-ip> with the MASTER_IP printed in the previous step.
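To avoid a copy-paste mistake when filling in <k8s-ip>, the master URL can be built in the shell. This is a minimal sketch; `k8s_master_url` is a hypothetical helper name, and it only prepends the `k8s://https://` prefix used in the R call.

```shell
#!/bin/sh
# Sketch: turn the MASTER_IP value into the master URL that is passed as
# the first argument of spark_config_kubernetes() in the R code.
k8s_master_url() {
  ip="$1"
  if [ -z "$ip" ]; then
    echo "usage: k8s_master_url <master-ip>" >&2
    return 1
  fi
  printf 'k8s://https://%s\n' "$ip"
}
```

Assuming the cluster resource's `endpoint` field (which backs the MASTER_IP column) is available through gcloud's `--format` projection, the two steps combine as `k8s_master_url "$(gcloud container clusters list --filter name=spark-on-gke --format='value(endpoint)')"`.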
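remotes::install_github("rstudio/sparklyr")      # development version of sparklyr
library(sparklyr)
sc <- spark_connect(config = spark_config_kubernetes(
  "k8s://https://<k8s-ip>",                      # replace <k8s-ip> with MASTER_IP
  account = "default",                           # service account bound to cluster-admin above
  image = "docker.io/jluraschi/spark:sparklyr",  # Docker image used for the Spark pods
  version = "2.4"))                              # Spark version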
Hello to all!
I'm working on a project where we have to explain the use of SparkR, so we created a Kubernetes cluster using gcloud, following the step-by-step guide posted above (very helpful!).
When we try to connect from RStudio to the Kubernetes cluster, we get the following error: "Error in f(...): Terminal is not running and cannot accept input".
Could someone help us? We've tried pretty much everything, so we're kind of stuck.