Running Spark job in Batch vs Streaming mode against Kerberos

Variables

BROKER="broker1:9092"
SPARK_URL="yarn"
MODE="cluster"
APP_NAME="ApplicationName"
SECURITYPROTOCOL="SASL_PLAINTEXT"
COMMON_JAR="common-jar.jar"

The commands below also reference $TOPIC, $MAX_RATE_PER_PARTITION, $NUM_EXECUTORS, $EXECUTOR_CORES, $EXECUTOR_MEMORY and $DRIVER_MEMORY, which are not set here; define them to suit your job before submitting.

Important

  • Get a Kerberos ticket first (e.g. with kinit); without a valid ticket the job cannot authenticate.
  • Make sure the permissions on the keytab and JAAS files are set correctly, i.e. both files are readable by the user submitting the job.
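
The two points above can be folded into a small pre-flight check. This is only a sketch: it assumes the MIT Kerberos `kinit` and `klist` tools are on the PATH and reuses the file and principal names from the samples in this gist. `check_readable` and `ensure_ticket` are hypothetical helper names, not part of Spark or Kerberos.

```shell
# Hypothetical helper: fail if any given file is missing or unreadable.
check_readable() {
  rc=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "not readable: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Hypothetical helper: get a ticket from a keytab unless a valid one is cached.
# Arguments: KEYTAB PRINCIPAL
ensure_ticket() {
  # klist -s prints nothing and exits non-zero if no valid ticket is cached
  klist -s 2>/dev/null || kinit -kt "$1" "$2"
}

# Possible usage before spark2-submit (names taken from the samples here):
#   check_readable jaas.conf keytabFile.keytab && \
#     ensure_ticket keytabFile.keytab achintya@REALM
```

Running the check before the submit makes a missing ticket or an unreadable file fail fast, instead of surfacing later as an authentication error inside the YARN logs.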

Batch mode

spark2-submit --files jaas.conf,keytabFile.keytab                                       \
--driver-java-options "-Djava.security.auth.login.config=jaas.conf                      \
        -Dsun.security.krb5.debug=true"                                                 \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf     \
        -Dsun.security.krb5.debug=true"                                                 \
--jars $COMMON_JAR --class com.achintya.ClassName --master $SPARK_URL                   \
--deploy-mode $MODE spark-batch-application.jar                                         \
-n $APP_NAME -b $BROKER -t $TOPIC ...
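
The command above points the driver and the executors at jaas.conf, but the gist never shows that file. As a rough sketch, a Kafka client JAAS file for keytab login often looks like the one generated below. It assumes the JDK's Krb5LoginModule, a Kafka service principal named "kafka", and a keytab that is shipped with --files so it sits in the container's working directory; adjust all three to your cluster. `write_jaas` is a hypothetical helper name.

```shell
# Hypothetical helper: write a Kafka client JAAS file for keytab login.
# Arguments: OUTPUT_FILE KEYTAB_PATH PRINCIPAL
write_jaas() {
  cat > "$1" <<EOF
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="$2"
  principal="$3"
  serviceName="kafka";
};
EOF
}

# Possible usage (assumed names, matching the samples in this gist):
#   write_jaas jaas.conf ./keytabFile.keytab achintya@REALM
```

A relative keyTab path is used here because files passed with --files are placed in each container's working directory on YARN.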

Streaming mode (this sample connects to Kafka and HBase)

spark2-submit --principal achintya@REALM --keytab keytabFile.keytab --files jaas.conf,keytabFile.keytab                  \
--driver-java-options "-Djava.security.auth.login.config=./jaas.conf"                                                    \
--conf "spark.streaming.kafka.maxRatePerPartition=$MAX_RATE_PER_PARTITION"                                               \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf"                                   \
--conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/lib/hbase/lib/" \
--driver-class-path "/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/lib/hbase/lib/"                  \
--class com.ultratendency.StreamingScorer --master $SPARK_URL --deploy-mode $MODE                                        \
--num-executors $NUM_EXECUTORS --executor-cores $EXECUTOR_CORES                                                          \
--executor-memory $EXECUTOR_MEMORY --driver-memory $DRIVER_MEMORY                                                        \
--jars $COMMON_JAR                                                                                                       \
spark-streaming-application.jar -n $APP_NAME -b $BROKER ...
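
The streaming command expands many shell variables unquoted, so an unset one silently disappears and shifts the remaining spark2-submit arguments. A minimal guard, sketched here under the assumption of a POSIX shell, is to check the variables before submitting. `require_vars` is a hypothetical helper name.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: copy the value of the variable called $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage before the streaming submit:
#   require_vars BROKER SPARK_URL MODE APP_NAME COMMON_JAR \
#     MAX_RATE_PER_PARTITION NUM_EXECUTORS EXECUTOR_CORES \
#     EXECUTOR_MEMORY DRIVER_MEMORY || exit 1
```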