```bash
# Add the Kubeflow Helm chart repo
helm repo add kubeflow https://charts.kubeflow.org
helm repo update

# Install Spark Operator in the specified namespace (kubeflow)
helm install spark-operator kubeflow/spark-operator --namespace kubeflow --set operator.enabled=true
```
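Before wiring up RBAC, it is worth confirming that the operator actually came up. A quick check, assuming the release name and namespace used above (the label selector is an assumption about the chart's labels and may differ between chart versions):

```bash
# Check the Helm release status
helm status spark-operator --namespace kubeflow

# Check the operator pod; adjust the label selector to match your chart version
kubectl get pods --namespace kubeflow -l app.kubernetes.io/name=spark-operator
```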
The Spark Operator needs a ClusterRole and a ClusterRoleBinding to manage Spark jobs across multiple namespaces. Create the RBAC resources:
```yaml
# rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-role
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "secrets", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-operator-role-binding
subjects:
  - kind: ServiceAccount
    name: spark-operator-service-account
    namespace: kubeflow
roleRef:
  kind: ClusterRole
  name: spark-operator-role
  apiGroup: rbac.authorization.k8s.io
```

Apply the RBAC configuration:

```bash
kubectl apply -f rbac.yaml
```
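One thing to double-check: the operator also manages the SparkApplication custom resources themselves, so depending on what RBAC the Helm chart already installs, the role above may need to cover the `sparkoperator.k8s.io` API group as well. A hedged sketch of the extra rule (whether it is needed depends on your chart version):

```yaml
# Additional rule for the ClusterRole above (only if the chart does not already grant it)
- apiGroups: ["sparkoperator.k8s.io"]
  resources: ["sparkapplications", "sparkapplications/status", "scheduledsparkapplications", "scheduledsparkapplications/status"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```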
The Spark Operator will use a ServiceAccount to interact with resources across namespaces.

```yaml
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-operator-service-account
  namespace: kubeflow # Adjust namespace if necessary
```

Apply the ServiceAccount configuration:

```bash
kubectl apply -f serviceaccount.yaml
```
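With the ServiceAccount and binding in place, `kubectl auth can-i` gives a quick sanity check that the operator's identity really has the permissions it needs in another namespace (the tenant namespace below is just a placeholder, and running the check requires impersonation rights):

```bash
# Should print "yes" if the ClusterRoleBinding took effect
kubectl auth can-i create pods \
  --as=system:serviceaccount:kubeflow:spark-operator-service-account \
  -n <user-namespace>
```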
If you want to manage resource usage per namespace, apply resource quotas and limit ranges for Spark jobs. Example resource quotas:
```yaml
# resource-quotas.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-job-quota
  namespace: kubeflow
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

Apply the resource quotas:

```bash
kubectl apply -f resource-quotas.yaml
```
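Once Spark pods start landing in the namespace, you can watch consumption against the quota; `kubectl describe` shows used versus hard values for the quota created above:

```bash
# Shows current usage vs. the hard limits defined in spark-job-quota
kubectl describe resourcequota spark-job-quota -n kubeflow
```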
You can define default resource requests and limits for Spark jobs to ensure they do not exceed certain thresholds. Example limit range:
```yaml
# limit-range.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: spark-job-limits
  namespace: kubeflow
spec:
  limits:
    - default:
        memory: 4Gi
        cpu: "2"
      defaultRequest:
        memory: 2Gi
        cpu: "1"
      type: Container
```

Apply the limit range:

```bash
kubectl apply -f limit-range.yaml
```
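To see the defaults that will be injected into containers that don't set their own requests or limits, describe the LimitRange:

```bash
# Prints the default request/limit values applied to new containers in the namespace
kubectl describe limitrange spark-job-limits -n kubeflow
```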
You can use Kustomize to manage the configurations for different environments (e.g., staging, production) or user-specific namespaces.

```
spark-operator/
  base/
    kustomization.yaml
    spark-operator.yaml
    rbac.yaml
    serviceaccount.yaml
  overlays/
    multi-tenant/
      kustomization.yaml
      resource-quotas.yaml
      limit-range.yaml
  argo-cd/
    app-manifest.yaml
```

Base kustomization.yaml:
```yaml
# kustomization.yaml (base)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - spark-operator.yaml
  - rbac.yaml
  - serviceaccount.yaml
```

Overlay kustomization.yaml:
```yaml
# kustomization.yaml (overlay for multi-tenant)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
  - ../../base
  - resource-quotas.yaml
  - limit-range.yaml
```
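Before applying, it can be useful to render the overlay locally and review exactly what will be created; `kubectl kustomize` does this without touching the cluster (the path assumes you are in the spark-operator/ directory, as in the apply command below):

```bash
# Render the multi-tenant overlay to stdout without applying it
kubectl kustomize overlays/multi-tenant
```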
Apply the Kustomize overlay:

```bash
kubectl apply -k overlays/multi-tenant
```
Once the configuration is in place, you can define an ArgoCD Application to manage the deployment using Kustomize. Example ArgoCD Application:
```yaml
# argo-cd/app-manifest.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: spark-operator
  namespace: argocd
spec:
  project: default
  destination:
    server: https://kubernetes.default.svc
    namespace: kubeflow
  source:
    repoURL: '[email protected]:<your-org>/k8s-configs.git'
    targetRevision: HEAD
    path: spark-operator/overlays/multi-tenant
    kustomize:
      namePrefix: "spark-"
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Apply the ArgoCD application:

```bash
kubectl apply -f argo-cd/app-manifest.yaml
```
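After the Application is created, you can check whether ArgoCD has synced it, either from kubectl or from the ArgoCD CLI (the second command assumes the `argocd` CLI is installed and you are logged in):

```bash
# The Application object reports sync and health status
kubectl get application spark-operator -n argocd

# Equivalent view through the ArgoCD CLI
argocd app get spark-operator
```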
Users can now submit Spark jobs to their own namespaces by specifying the namespace in the SparkApplication YAML:

```yaml
# spark-job.yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-job
  namespace: <user-namespace> # Replace with the actual user namespace
spec:
  type: Python
  pythonVersion: "3"
  mainApplicationFile: "local:///opt/spark/examples/src/main/python/pi.py"
  driver:
    cores: 1
    memory: "2Gi"
  executor:
    cores: 1
    memory: "2Gi"
    instances: 2
  restartPolicy:
    type: Never
```

Submit the job:

```bash
kubectl apply -f spark-job.yaml -n <user-namespace>
```

By following these steps, you'll be able to deploy the Spark Operator in a multi-tenant Kubernetes environment, configure RBAC, set up resource quotas, and allow users to submit Spark jobs to their own namespaces. Use Kustomize for environment-specific configurations, and ArgoCD for continuous deployment and management.