@bradfordcp
Last active July 28, 2023 10:10
Setting up Apache Spark to use Apache Shiro for authentication of the Spark master dashboard.

Securing Apache Spark with Apache Shiro

  1. Download shiro-core-1.2.5.jar from Apache Shiro Downloads
  2. Download shiro-web-1.2.5.jar from Apache Shiro Downloads
  3. Note the location of the JAR files and shiro.ini. I placed the JARs in the lib directory and shiro.ini in the root of my Spark download
  4. Update conf/spark-env.sh: add the Shiro JARs and the directory containing shiro.ini to SPARK_CLASSPATH, and register IniShiroFilter through spark.ui.filters in SPARK_MASTER_OPTS
  5. Start the Spark master with sbin/start-master.sh
  6. Navigate to the Spark master dashboard (http://<master-host>:8080 by default)
  7. Authenticate with the credentials defined in shiro.ini
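
Steps 1 to 3 and step 5 can be scripted roughly as follows. This is a sketch under assumptions, not the author's procedure: it fetches the JARs from Maven Central (org.apache.shiro:shiro-core and org.apache.shiro:shiro-web, version 1.2.5) instead of the Shiro downloads page, it assumes the layout that spark-env.sh expects (JARs in $SPARK_HOME/lib, shiro.ini in $SPARK_HOME), and `fetch_shiro_jars`, `check_layout` and `start_secured_master` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-3 and 5. Assumed layout (matching spark-env.sh):
#   $SPARK_HOME/lib/shiro-{core,web}-1.2.5.jar and $SPARK_HOME/shiro.ini

SHIRO_VERSION="1.2.5"
# Assumption: download from Maven Central rather than the Shiro downloads page
MAVEN_BASE="https://repo1.maven.org/maven2/org/apache/shiro"

# Steps 1-2: download shiro-core and shiro-web into $SPARK_HOME/lib
fetch_shiro_jars() {
  local spark_home="$1" artifact
  mkdir -p "${spark_home}/lib" || return 1
  for artifact in shiro-core shiro-web; do
    curl -fsSL -o "${spark_home}/lib/${artifact}-${SHIRO_VERSION}.jar" \
      "${MAVEN_BASE}/${artifact}/${SHIRO_VERSION}/${artifact}-${SHIRO_VERSION}.jar" || return 1
  done
}

# Step 3: confirm the JARs and shiro.ini are where spark-env.sh looks for them
check_layout() {
  local spark_home="$1" f missing=0
  for f in "lib/shiro-core-${SHIRO_VERSION}.jar" "lib/shiro-web-${SHIRO_VERSION}.jar" "shiro.ini"; do
    if [ ! -f "${spark_home}/${f}" ]; then
      echo "missing: ${spark_home}/${f}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Step 5: start the master only if the layout checks out
# (step 4, editing $SPARK_HOME/conf/spark-env.sh, is done by hand)
start_secured_master() {
  local spark_home="$1"
  check_layout "$spark_home" && "${spark_home}/sbin/start-master.sh"
}

# Example usage:
#   fetch_shiro_jars "$SPARK_HOME"
#   start_secured_master "$SPARK_HOME"
```

Running `check_layout` before starting the master catches a misplaced JAR or shiro.ini up front, instead of discovering it when the dashboard fails to load or skips authentication.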

Note: this was developed and tested with Apache Spark 1.4.1, but should work with newer versions as well.

shiro.ini

# =======================
# Shiro INI configuration
# =======================
[main]
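# Objects and their properties are defined here,
# such as the securityManager, realms and anything
# else needed to build the SecurityManager.
# Define the securityManager first so the properties set below apply to it.
securityManager = org.apache.shiro.web.mgt.DefaultWebSecurityManager
securityManager.realms = $iniRealm
# Keep no server-side session: each request is authenticated from its Basic credentials.
securityManager.subjectDAO.sessionStorageEvaluator.sessionStorageEnabled = false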
[users]
# The 'users' section is for simple deployments
# when you only need a small, statically-defined
# set of user accounts. Format: username = password[, role1, role2, ...]
admin = secret
[roles]
# The 'roles' section is for simple deployments
# when you only need a small number of statically-defined
# roles.
[urls]
# The 'urls' section is used for URL-based security
# in web applications. '/** = authcBasic' requires HTTP Basic
# authentication for every path served by the master UI.
/** = authcBasic
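
With `/** = authcBasic`, Shiro's Basic authentication filter challenges unauthenticated requests with a 401, and the browser (or any HTTP client) resends the request with an `Authorization: Basic` header carrying the base64 encoding of `user:password`. The sketch below builds that header for the `admin`/`secret` entry in [users] and adds a quick curl check. `basic_auth_header` and `ui_status` are hypothetical helpers, and `ui_status` assumes the master UI runs on localhost at the default port 8080.

```shell
#!/usr/bin/env bash
# Build the header a client sends for HTTP Basic authentication.
# printf (rather than echo) keeps a trailing newline out of the encoded value.
basic_auth_header() {
  local user="$1" pass="$2"
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$user" "$pass" | base64)"
}

# Print the HTTP status code returned by the master UI.
# Assumption: master UI on localhost:8080; extra arguments are passed to curl.
ui_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$@" "http://localhost:8080/"
}

# Usage against a running master:
#   ui_status                    # no credentials: should be rejected by authcBasic
#   ui_status -u admin:secret    # credentials from the [users] section
```

Base64 is an encoding, not encryption, so anyone who can observe the traffic can recover the password; keep the dashboard on a trusted network or behind TLS.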

spark-env.sh

#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos
# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of executors to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: 1)
# - SPARK_EXECUTOR_MEMORY, Memory per executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for the driver (e.g. 1000M, 2G) (Default: 512 MB)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The Hadoop queue to use for allocation requests (Default: 'default')
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public DNS name of the master or workers
# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
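# Spark root: parent of the directory holding the calling script (sbin/ or bin/)
spark_root="$(cd "$(dirname "$0")/.." && pwd)"
# Shiro JARs, appended to any existing classpath entries
SPARK_CLASSPATH="${SPARK_CLASSPATH:+${SPARK_CLASSPATH}:}${spark_root}/lib/shiro-web-1.2.5.jar:${spark_root}/lib/shiro-core-1.2.5.jar"
# Directory containing shiro.ini, so IniShiroFilter can load it as classpath:shiro.ini
SPARK_CLASSPATH="${SPARK_CLASSPATH}:${spark_root}/"
# Servlet filter that authenticates every request to the master web UI
SPARK_MASTER_OPTS="${SPARK_MASTER_OPTS:+${SPARK_MASTER_OPTS} }-Dspark.ui.filters=org.apache.shiro.web.servlet.IniShiroFilter"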
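
If one of these paths is wrong, the master either cannot load IniShiroFilter or Shiro cannot find shiro.ini, and the cause is not always obvious from the logs. The helper below is a hypothetical sanity check, not part of Spark: it splits a colon-separated classpath, reports entries that do not exist, and notes which directory entry provides shiro.ini. It assumes bash (for `read -a`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report each entry of a colon-separated classpath,
# returning non-zero if any entry is missing on disk.
check_classpath() {
  local cp="$1" entry status=0
  local -a entries
  IFS=':' read -r -a entries <<< "$cp"
  for entry in "${entries[@]}"; do
    [ -n "$entry" ] || continue   # skip empty segments such as "a::b"
    if [ -e "$entry" ]; then
      printf 'ok:      %s\n' "$entry"
      # A directory entry is where IniShiroFilter would find classpath:shiro.ini
      if [ -d "$entry" ] && [ -f "${entry%/}/shiro.ini" ]; then
        printf 'ini:     %s\n' "${entry%/}/shiro.ini"
      fi
    else
      printf 'missing: %s\n' "$entry"
      status=1
    fi
  done
  return "$status"
}

# Example usage with the same entries spark-env.sh builds:
#   check_classpath "$SPARK_HOME/lib/shiro-web-1.2.5.jar:$SPARK_HOME/lib/shiro-core-1.2.5.jar:$SPARK_HOME/"
```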