Running a Spark job in batch vs. streaming mode against a Kerberized cluster
BROKER="broker1:9092"
SPARK_URL="yarn"
MODE="cluster"
APP_NAME="ApplicationName"
SECURITYPROTOCOL="SASL_PLAINTEXT"
COMMON_JAR="common-jar.jar"
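Both submit commands below ship a jaas.conf via --files, but its contents are not shown. A minimal sketch of what that file might contain for a Kerberized Kafka client, using this document's placeholder principal and keytab names (the relative keyTab path works because --files places the file in each container's working directory):

```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="./keytabFile.keytab"
  principal="achintya@REALM"
  serviceName="kafka";
};
```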
Obtain a Kerberos ticket first, e.g. kinit -kt keytabFile.keytab achintya@REALM.
Make sure the permissions on the keytab and jaas files are set correctly, i.e. they are readable by the submitting user.
Batch mode
spark2-submit --files jaas.conf,keytabFile.keytab \
--driver-java-options "-Djava.security.auth.login.config=jaas.conf" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf \
-Dsun.security.krb5.debug=true" \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf \
-Dsun.security.krb5.debug=true" \
--jars $COMMON_JAR --class com.achintya.ClassName --master $SPARK_URL \
--deploy-mode $MODE spark-batch-application.jar \
-n $APP_NAME -b $BROKER -t $TOPIC ...
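A quick pre-flight check of the files passed to --files catches the permission problem mentioned above before YARN does. A sketch, using this document's placeholder file names (the `touch` lines only create stand-ins for the demo; in practice the files already exist):

```shell
#!/bin/sh
# Verify that each auth file exists and is readable before calling spark2-submit.
check_auth_files() {
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
}

# Example run with placeholder files:
touch jaas.conf keytabFile.keytab
chmod 644 jaas.conf          # readable so the JVM can load it
chmod 600 keytabFile.keytab  # keytab stays owner-only
check_auth_files jaas.conf keytabFile.keytab
```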
Streaming mode (this sample connects to Kafka and HBase)
spark2-submit --principal achintya@REALM --keytab keytabFile.keytab --files jaas.conf,keytabFile.keytab \
--driver-java-options "-Djava.security.auth.login.config=./jaas.conf" \
--conf "spark.streaming.kafka.maxRatePerPartition=$MAX_RATE_PER_PARTITION" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" \
--conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/lib/hbase/lib/" \
--driver-class-path "/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/lib/hbase/lib/" \
--class com.ultratendency.StreamingScorer --master $SPARK_URL --deploy-mode $MODE \
--num-executors $NUM_EXECUTORS --executor-cores $EXECUTOR_CORES \
--executor-memory $EXECUTOR_MEMORY --driver-memory $DRIVER_MEMORY \
--jars $COMMON_JAR \
spark-streaming-application.jar -n $APP_NAME -b $BROKER ...
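On the application side, the Kafka consumer settings that correspond to this submit look roughly like the following. This is a sketch: `kafka` is the usual service principal name on Kerberized clusters, but check yours, and note that with newer Kafka clients the service name is set here rather than in jaas.conf:

```
bootstrap.servers=broker1:9092
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
```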