Created December 17, 2018 17:51
import os
import sys

import numpy as np
from pyspark import SparkConf, SparkContext


def create_spark_context():
    # The running pex archive appears on sys.path; grab its basename so it can
    # be shipped to the executors and used as their Python interpreter.
    pex_file = os.path.basename([path for path in sys.path if path.endswith('.pex')][0])
    conf = SparkConf() \
        .setMaster("yarn") \
        .set("spark.submit.deployMode", "client") \
        .set("spark.yarn.dist.files", pex_file) \
        .set("spark.executorEnv.PEX_ROOT", "./.pex")
    # Must be set before the SparkContext is created: executors run the
    # distributed pex file as their Python.
    os.environ['PYSPARK_PYTHON'] = "./" + pex_file
    return SparkContext(conf=conf)


if __name__ == "__main__":
    sc = create_spark_context()
    rdd = sc.parallelize([np.array([1, 2, 3]), np.array([1, 2, 3])], numSlices=2)
    print(rdd.reduce(lambda x, y: np.dot(x, y)))
    sys.exit(0)
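The `pex_file` lookup above assumes the script is running inside a pex archive, so that exactly one `.pex` entry is on `sys.path`; run outside a pex, the `[0]` raises an `IndexError`. A minimal, Spark-free sketch of that lookup (the function name and the guard for the no-pex case are my additions, not part of the gist):

```python
import os


def find_pex_basename(paths):
    # Return the basename of the first '.pex' entry in the given path list,
    # or None when no pex archive is present (e.g. a plain virtualenv run).
    matches = [p for p in paths if p.endswith('.pex')]
    return os.path.basename(matches[0]) if matches else None


print(find_pex_basename(['/tmp/myarchive.pex', '/usr/lib/python3']))  # myarchive.pex
print(find_pex_basename(['/usr/lib/python3']))  # None
```

Returning `None` instead of indexing blindly makes the failure mode explicit when the script is launched outside a pex.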
Does this approach already work with spark-submit? I see these 2 tickets in Jira:
It seems it's not yet achievable.
Hi, I put this script into the subfolder userlib/userlib/startup.py, then execute: pex pyspark==2.3.2 numpy userlib -o myarchive.pex with:
I am converting from the Scala/Java Spark world to Python and think it's just some package classpath search issue... thanks for the hint, and thank you for the article on Medium!
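A quick way to check the suspected classpath issue is to ask the interpreter inside the pex whether the package can be found at all, before involving Spark. A small sketch (`userlib` is the commenter's package name; whether it resolves depends on how the pex was built):

```python
import importlib.util


def module_available(name):
    # True if `name` resolves to an importable module in the current
    # environment (e.g. when run via ./myarchive.pex).
    return importlib.util.find_spec(name) is not None


print(module_available('os'))                      # stdlib: True
print(module_available('no_such_package_12345'))   # missing: False
```

Running this inside the pex (e.g. `./myarchive.pex -c "..."`) distinguishes a packaging problem (the package never made it into the archive) from a runtime path problem on the executors.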