Created
February 14, 2019 17:16
Python code to initialize a SparkSession with MongoDB
from pyspark.sql import SparkSession

# The docs often omit the spark.jars.packages option, which tells Spark to
# download the MongoDB connector.
# Change the Maven coordinate to match the connector you are using: the
# "_2.11" suffix is the Scala version and "2.4.0" is the connector version.
# Replace <database> and <collection> with your own names.
spark = SparkSession \
    .builder \
    .appName("pysparktestapp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/<database>.<collection>") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/<database>.<collection>") \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.4.0") \
    .getOrCreate()
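
The version-dependent parts of the snippet above (the Scala suffix, the connector version, and the database and collection names) can be gathered in one place. The following is a minimal sketch, not part of the original gist: `mongo_spark_conf` and `build_session` are hypothetical helper names, and `mydb` and `mycoll` are placeholder values. It only assembles the same three config keys used above.

```python
def mongo_spark_conf(database, collection, host="127.0.0.1",
                     scala_version="2.11", connector_version="2.4.0"):
    """Return the three Spark config entries from the gist as a dict.

    Hypothetical helper: the defaults mirror the values in the gist and
    should be changed to match your Spark build and connector release.
    """
    uri = "mongodb://{}/{}.{}".format(host, database, collection)
    package = "org.mongodb.spark:mongo-spark-connector_{}:{}".format(
        scala_version, connector_version)
    return {
        "spark.mongodb.input.uri": uri,
        "spark.mongodb.output.uri": uri,
        "spark.jars.packages": package,
    }


def build_session(app_name, conf):
    """Apply every entry in conf to a SparkSession builder."""
    # Imported lazily so the dict helper above works without pyspark.
    from pyspark.sql import SparkSession
    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Placeholder names; print the settings without starting Spark.
    for key, value in sorted(mongo_spark_conf("mydb", "mycoll").items()):
        print(key, "=", value)
```

Calling `build_session("pysparktestapp", mongo_spark_conf("mydb", "mycoll"))` should produce a session configured like the one above, assuming pyspark is installed and the machine can reach a Maven repository to download the connector.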