This helper function converts a PDF file blob to text, with options to save the original PDF, the intermediate Google Doc, and/or the final plain text file. The language used for Optical Character Recognition (OCR) can also be specified; it defaults to 'en' (English).
Note: Updated 12 May 2015 due to the deprecation of DocsList. Thanks to Bruce McPherson for the getDriveFolderFromPath() utility.
// Start with a Blob object
var blob = gmailAttachment.getAs(MimeType.PDF);
// fileId will be the ID of a saved text file (default behavior):
var fileId = pdfToText( blob );
// filetext will contain the text from the PDF file; no residual files are saved:
var filetext = pdfToText( blob, {keepTextfile: false} );
// we can save other converted file types, too:
var options = {
  keepPdf : true,             // Keep a copy of the original PDF file.
  keepGdoc : true,            // Keep a copy of the OCR Google Doc file.
  keepTextfile : true,        // Keep a copy of the text file. (default)
  path : "attachments/today"  // Folder path to store file(s) in.
};
filetext = pdfToText( blob, options );
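The body of pdfToText() is not shown here, so the following is only a minimal sketch of how the options described above could be merged with their defaults. The names DEFAULT_OPTIONS, resolveOptions, and the ocrLanguage property are assumptions made for illustration, as is the empty default for path; the keepPdf and keepGdoc defaults are inferred from the "no residual files" comment above.

```javascript
// Hypothetical defaults, inferred from the usage comments above.
// The real pdfToText() may use different property names or values;
// ocrLanguage and the empty path are assumptions.
var DEFAULT_OPTIONS = {
  keepPdf: false,      // do not keep the original PDF unless asked
  keepGdoc: false,     // do not keep the intermediate OCR Google Doc
  keepTextfile: true,  // keep the plain text file (stated default)
  path: "",            // assumed: no folder path given
  ocrLanguage: "en"    // stated default OCR language (English)
};

// Return a new object holding every known option, taking the caller's
// value when one is supplied and the default otherwise.
function resolveOptions(options) {
  var supplied = options || {};
  var resolved = {};
  Object.keys(DEFAULT_OPTIONS).forEach(function (key) {
    resolved[key] = Object.prototype.hasOwnProperty.call(supplied, key)
      ? supplied[key]
      : DEFAULT_OPTIONS[key];
  });
  return resolved;
}
```

Checking with hasOwnProperty rather than a truthiness test matters here, because a caller passing keepTextfile: false must override the true default.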
I recommend using pdftotext from the Poppler package.
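As a rough sketch of that approach outside Apps Script, pdftotext can be called from Node.js, assuming the Poppler utilities are installed and on the PATH. The helper names buildPdftotextArgs and pdfFileToText are made up for this example. Note that pdftotext extracts an existing text layer and does not perform OCR, so it will not help with scanned, image-only PDFs.

```javascript
var execFileSync = require("child_process").execFileSync;

// Build the argument list for pdftotext.
// Passing "-" as the output file makes pdftotext write to stdout.
function buildPdftotextArgs(pdfPath, preserveLayout) {
  var args = [];
  if (preserveLayout) {
    args.push("-layout"); // keep the original physical layout of the text
  }
  args.push(pdfPath, "-");
  return args;
}

// Run pdftotext on a local PDF file and return the extracted text.
// Throws if pdftotext is missing or exits with a non-zero status.
function pdfFileToText(pdfPath, preserveLayout) {
  return execFileSync("pdftotext", buildPdftotextArgs(pdfPath, preserveLayout), {
    encoding: "utf8"
  });
}
```

A call might look like var text = pdfFileToText("attachment.pdf", true); where attachment.pdf is a placeholder file name.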