@Laxman-SM
Created February 15, 2020 17:10
cell: 1
import os
import sys
cell: 2
os.environ["SPARK_HOME"] = "/usr/spark2.4.3"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 instead if you want Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
sys.path.insert(0, os.environ["PYLIB"] +"/py4j-0.10.7-src.zip")
sys.path.insert(0, os.environ["PYLIB"] +"/pyspark.zip")
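The cell above hardcodes the py4j version (0.10.7), which matches this Spark 2.4.3 install but differs between Spark releases. A minimal sketch of one way to avoid that, assuming the usual `python/lib` layout under `SPARK_HOME`; the helper names `spark_python_zips` and `add_spark_to_path` are hypothetical and not part of the original notebook:

```python
import glob
import os
import sys


def spark_python_zips(spark_home):
    """Return [pyspark.zip, py4j-*-src.zip] paths under spark_home/python/lib."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    # The py4j version differs between Spark releases, so match it by pattern.
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")
    if not py4j_zips or not os.path.isfile(pyspark_zip):
        raise FileNotFoundError("no pyspark/py4j zips found in " + lib_dir)
    return [pyspark_zip, py4j_zips[-1]]


def add_spark_to_path(spark_home):
    """Put the Spark Python libraries at the front of sys.path."""
    zips = spark_python_zips(spark_home)
    # Insert in reverse so pyspark.zip ends up first, as in the cell above.
    for path in reversed(zips):
        if path not in sys.path:
            sys.path.insert(0, path)
    return zips
```

Under these assumptions, calling `add_spark_to_path(os.environ["SPARK_HOME"])` could stand in for the two `sys.path.insert` lines, and it fails early with a clear error if the libraries are not where expected.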
cell: 3
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext  # sc is used below, so bind it from the session
rdd = sc.textFile("/data/mr/wordcount/input/")
print(rdd.take(10))
print(sc.version)
output:- ['The Project Gutenberg EBook of The Adventures of Sherlock Holmes', 'by Sir Arthur Conan Doyle', '(#15 in our series by Sir Arthur Conan Doyle)', '', 'Copyright laws are changing all over the world. Be sure to check the', 'copyright laws for your country before downloading or redistributing', 'this or any other Project Gutenberg eBook.', '', 'This header should be the first thing seen when viewing this Project', 'Gutenberg file. Please do not remove it. Do not change or edit the']
2.4.3
cell: 4
print(sc.version)
output:-
2.4.3