@gacardinal
Created February 14, 2019 17:16
Python code to initialize a SparkSession with MongoDB
from pyspark.sql import SparkSession

# The option the docs often omit is spark.jars.packages: it tells Spark to
# fetch the MongoDB Spark connector (and its dependencies) at startup.
# Change the coordinate to match the connector version you are using; the
# _2.11 suffix is the Scala version and must match your Spark build.
spark = (
    SparkSession.builder
    .appName("pysparktestapp")
    # Replace <database> and <collection> with your own names.
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/<database>.<collection>")
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/<database>.<collection>")
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.4.0")
    .getOrCreate()
)
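
Since the connector coordinate and the URIs have to be adjusted per setup, one option is to build them from their parts instead of editing the strings by hand. The following is only a sketch: the helper names (`connector_coordinate`, `mongo_uri`, `mongo_spark_options`) and the `test`/`people` database and collection names are hypothetical and not part of the original snippet or of any library.

```python
def connector_coordinate(scala_version: str, connector_version: str) -> str:
    """Build the Maven coordinate expected by spark.jars.packages.

    The artifact name carries the Scala version as a suffix, e.g.
    mongo-spark-connector_2.11 for a Spark build compiled against Scala 2.11.
    """
    return (
        "org.mongodb.spark:"
        f"mongo-spark-connector_{scala_version}:{connector_version}"
    )


def mongo_uri(host: str, database: str, collection: str) -> str:
    """Build a mongodb:// URI in the <database>.<collection> form used above."""
    return f"mongodb://{host}/{database}.{collection}"


def mongo_spark_options(host, database, collection,
                        scala_version, connector_version):
    """Return the three config entries from the snippet above as a dict."""
    uri = mongo_uri(host, database, collection)
    return {
        "spark.mongodb.input.uri": uri,
        "spark.mongodb.output.uri": uri,
        "spark.jars.packages": connector_coordinate(scala_version,
                                                    connector_version),
    }


# Hypothetical values: a local MongoDB with database "test", collection "people".
options = mongo_spark_options("127.0.0.1", "test", "people", "2.11", "2.4.0")
for key, value in options.items():
    print(f"{key} = {value}")
```

Each key/value pair can then be passed to `.config(key, value)` on the builder before calling `.getOrCreate()`.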