Tested with Apache Spark 1.3.1, Python 2.7.9 and Java 1.8.0_45
Download and install Java from oracle.com.
```python
# Java package imports (Jython-style): these need a JVM-backed Python runtime
import sys
from java.io import *
import java.io.InputStream
import java.io.FileInputStream
import java.lang.String  # converting String types between Java and Python is tedious
sys.path.append("pdfbox-1.0.0.jar")  # adjust to wherever you saved the jar
import org.apache.pdfbox
"""
This method merges the FileInputStreams that the streamList points to, into the
```