Spark Instructions

Step-by-Step Guide to Install Spark Operator

1. Add Kubeflow Helm Chart Repository

# Add the Helm repo that hosts the Kubeflow Spark Operator chart
helm repo add kubeflow https://kubeflow.github.io/spark-operator
helm repo update
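
To confirm the chart is visible, you can search the newly added repo (the chart path below assumes the `kubeflow` repo alias used above):

# List Spark Operator charts available from the repo
helm search repo kubeflow/spark-operator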

2. Install Spark Operator Using Helm

# Install Spark Operator in the specified namespace (kubeflow)
helm install spark-operator kubeflow/spark-operator --namespace kubeflow --create-namespace
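
Once the release is installed, check that the operator pod is running. The label selector below is an assumption and may differ between chart versions; `kubectl get pods -n kubeflow` without a selector also works.

# Confirm the Helm release and the operator pod
helm list -n kubeflow
kubectl get pods -n kubeflow -l app.kubernetes.io/name=spark-operator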

3. Configure RBAC for Spark Operator

The Spark Operator needs a ClusterRole and a ClusterRoleBinding so it can manage Spark job resources across multiple namespaces.

Create the RBAC resources:

# rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-operator-role
rules:
  - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
    apiGroups: [""]
    resources: ["pods", "pods/logs", "services", "secrets", "configmaps"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-operator-role-binding
subjects:
  - kind: ServiceAccount
    name: spark-operator-service-account
    namespace: kubeflow
roleRef:
  kind: ClusterRole
  name: spark-operator-role
  apiGroup: rbac.authorization.k8s.io

Apply the RBAC configuration:

kubectl apply -f rbac.yaml
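
A quick way to confirm the role exists and that the binding points at the intended ServiceAccount:

# Verify the ClusterRole and its binding
kubectl get clusterrole spark-operator-role
kubectl describe clusterrolebinding spark-operator-role-binding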

4. Create ServiceAccount for Spark Operator

The Spark Operator will use a ServiceAccount to interact with resources across namespaces.

# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-operator-service-account
  namespace: kubeflow  # Adjust namespace if necessary

Apply the ServiceAccount configuration:

kubectl apply -f serviceaccount.yaml
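
To confirm the ServiceAccount was created in the expected namespace:

kubectl get serviceaccount spark-operator-service-account -n kubeflow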

5. Optional: Set Resource Quotas for User Namespaces

If you want to manage resource usage per namespace, apply resource quotas and limit ranges for Spark jobs.

Example resource quotas:

# resource-quotas.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-job-quota
  namespace: kubeflow
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi

Apply the resource quotas:

kubectl apply -f resource-quotas.yaml
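
After the quota is in place, kubectl can show current usage against the hard limits, which is handy when debugging why Spark pods are stuck pending:

# Show used vs. hard limits for the quota
kubectl describe resourcequota spark-job-quota -n kubeflow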

6. Optional: Set LimitRange for Spark Jobs

You can define default resource requests and limits for Spark jobs to ensure they do not exceed certain thresholds.

Example limit range:

# limit-range.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: spark-job-limits
  namespace: kubeflow
spec:
  limits:
    - default:
        memory: 4Gi
        cpu: "2"
      defaultRequest:
        memory: 2Gi
        cpu: "1"
      type: Container

Apply the limit range:

kubectl apply -f limit-range.yaml
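
You can confirm the defaults that will be injected into containers that omit their own requests and limits:

kubectl describe limitrange spark-job-limits -n kubeflow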

7. Use Kustomize to Manage Configurations (Optional)

You can use Kustomize to manage the configurations for different environments (e.g., staging, production) or user-specific namespaces.

Example directory structure:
spark-operator/
  base/
    kustomization.yaml
    spark-operator.yaml
    rbac.yaml
    serviceaccount.yaml
  overlays/
    multi-tenant/
      kustomization.yaml
      resource-quotas.yaml
      limit-range.yaml
  argo-cd/
    app-manifest.yaml

Base kustomization.yaml:

# kustomization.yaml (base)
resources:
  - spark-operator.yaml
  - rbac.yaml
  - serviceaccount.yaml

Overlay kustomization.yaml:

# kustomization.yaml (overlay for multi-tenant)
# Note: "bases" is deprecated in recent Kustomize versions; reference the base under "resources" instead.
resources:
  - ../../base
  - resource-quotas.yaml
  - limit-range.yaml
namespace: kubeflow

Apply Kustomize overlay:

kubectl apply -k overlays/multi-tenant
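
Before applying (or handing the overlay to ArgoCD), it can help to render the manifests locally and review what will actually be created:

# Render the overlay without applying it
kubectl kustomize overlays/multi-tenant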

8. Deploy Using ArgoCD

Once the configuration is in place, you can define an ArgoCD application to manage the deployment using Kustomize.

Example ArgoCD Application:

# argo-cd/app-manifest.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: spark-operator
  namespace: argocd
spec:
  destination:
    server: https://kubernetes.default.svc
    namespace: kubeflow
  source:
    repoURL: 'git@github.com:<your-org>/k8s-configs.git'
    targetRevision: HEAD
    path: spark-operator/overlays/multi-tenant
    kustomize:
      namePrefix: "spark-"
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Apply ArgoCD application:

kubectl apply -f argo-cd/app-manifest.yaml
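
After a short while you can check whether ArgoCD has synced the application. The second command assumes the `argocd` CLI is installed and logged in to your ArgoCD server.

# Check sync and health status via kubectl ...
kubectl get application spark-operator -n argocd

# ... or via the ArgoCD CLI
argocd app get spark-operator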

9. Submit Spark Jobs in User Namespaces

Users can now submit Spark jobs to their own namespaces by specifying the namespace in the SparkApplication YAML:

# spark-job.yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-job
  namespace: <user-namespace>  # Replace with the actual user namespace
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "<your-pyspark-image>"  # A Spark image with Python support that is available to your cluster
  sparkVersion: "3.5.1"          # Example value; should match the Spark version baked into the image
  mainApplicationFile: "local:///opt/spark/examples/src/main/python/pi.py"
  driver:
    cores: 1
    memory: "2g"  # Spark-style size string (e.g. "2g"), not a Kubernetes quantity like "2Gi"
  executor:
    cores: 1
    memory: "2g"
    instances: 2
  restartPolicy:
    type: Never

Submit the job:

kubectl apply -f spark-job.yaml -n <user-namespace>
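
Once submitted, the operator creates a driver pod and executor pods in the user namespace. You can watch progress with kubectl; the driver pod name below assumes the operator's usual `<application-name>-driver` convention.

# Check the SparkApplication status
kubectl get sparkapplication spark-job -n <user-namespace>
kubectl describe sparkapplication spark-job -n <user-namespace>

# Tail the driver logs
kubectl logs -f spark-job-driver -n <user-namespace>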

Conclusion

By following these steps, you'll be able to deploy the Spark Operator in a multi-tenant Kubernetes environment, configure RBAC, set up resource quotas, and allow users to submit Spark jobs to their own namespaces. Use Kustomize for environment-specific configurations, and ArgoCD for continuous deployment and management.
