@prashanth-sams
Forked from krmahadevan/PlayWithPDF.java
Last active August 29, 2015 14:14
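import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Written against the PDFBox 1.x API: PDFParser(InputStream) and
// org.apache.pdfbox.util.PDFTextStripper.
public class PlayWithPDF {

    /**
     * Downloads a PDF and prints its extracted text to standard output.
     *
     * @param args unused
     * @throws IOException if the URL is malformed or the PDF cannot be read or parsed
     */
    public static void main(String[] args) throws IOException {
        URL url = new URL("https://bitcoin.org/bitcoin.pdf");
        System.out.println(getTextFromPDF(url));
    }

    /**
     * Returns the text content of the PDF at the given URL.
     * Printing is left to the caller so the text is not written twice.
     */
    public static String getTextFromPDF(URL url) throws IOException {
        // try-with-resources closes the network stream even if parsing fails.
        try (InputStream fileToParse = new BufferedInputStream(url.openStream())) {
            PDFParser parser = new PDFParser(fileToParse);
            parser.parse();
            // Keep one document reference so the instance that is read is the one closed.
            PDDocument document = parser.getPDDocument();
            try {
                return new PDFTextStripper().getText(document);
            } finally {
                document.close();
            }
        }
    }
}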