@DGrady
Last active June 26, 2017 18:47
Configure PySpark environment variables and such
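# See also https://github.com/minrk/findspark
import os
import sys
from pathlib import Path
from typing import Optional


def configure_spark(spark_home: Optional[str] = None, python_path: Optional[str] = None) -> bool:
    """Set the Spark environment variables and put PySpark and py4j on sys.path."""
    # Fall back to the existing SPARK_HOME; raises KeyError if it is unset.
    if not spark_home:
        spark_home = os.environ['SPARK_HOME']
    # Default to the interpreter running this code so Spark workers use the same Python.
    if not python_path:
        python_path = sys.executable
    os.environ['SPARK_HOME'] = spark_home
    os.environ['PYSPARK_PYTHON'] = python_path
    # PySpark lives in $SPARK_HOME/python; py4j ships as a zip under python/lib.
    spark_python_path = Path(spark_home).joinpath('python')
    py4j_paths = spark_python_path.joinpath('lib').glob('py4j-*.zip')
    sys.path = [str(spark_python_path)] + list(map(str, py4j_paths)) + sys.path
    return True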
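
The path logic can be sanity-checked without a real Spark install by pointing it at a throwaway directory that mimics the `$SPARK_HOME/python/lib/py4j-*.zip` layout. The sketch below is only an illustration under stated assumptions: `spark_path_entries` and `make_fake_spark_home` are hypothetical helpers that are not part of the gist, and the py4j file name is made up. The helper recomputes the entries that would be prepended to `sys.path` without mutating it.

```python
import tempfile
from pathlib import Path


def spark_path_entries(spark_home):
    """Hypothetical helper: list the entries that would be prepended to sys.path."""
    spark_python_path = Path(spark_home) / 'python'
    # Sort so the result is deterministic if several py4j zips are present.
    py4j_paths = sorted((spark_python_path / 'lib').glob('py4j-*.zip'))
    return [str(spark_python_path)] + [str(p) for p in py4j_paths]


def make_fake_spark_home(root):
    """Hypothetical helper: build a directory tree shaped like a Spark install."""
    lib = Path(root) / 'python' / 'lib'
    lib.mkdir(parents=True)
    # Placeholder file name; real installs ship a versioned py4j zip.
    (lib / 'py4j-0.0.0-src.zip').touch()
    return str(root)


with tempfile.TemporaryDirectory() as tmp:
    entries = spark_path_entries(make_fake_spark_home(tmp))
    print(entries)
```

If the list contains only the `python` directory and no zip, the glob found nothing, which usually means `spark_home` points at the wrong directory.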