Tested with Apache Spark 1.3.1, Python 2.7.9 and Java 1.8.0_45
Download and install Java from oracle.com
import sys
from java.io import *
import java.io.InputStream
import java.io.FileInputStream
import java.lang.String # blah....converting String types between Java/Python is tedious
sys.path.append("pdfbox-1.0.0.jar") # or wherever you stashed it
import org.apache.pdfbox
"""
This method merges the FileInputStreams that the streamList points to, into the