This helper function converts a PDF file blob into text. Options let you save the original PDF, the intermediate Google Doc, and/or the final plain-text file. The language used for Optical Character Recognition (OCR) can also be specified; it defaults to 'en' (English).
Note: Updated 12 May 2015 due to deprecation of DocsList. Thanks to Bruce McPherson for the getDriveFolderFromPath() utility.
// Start with a Blob object
var blob = gmailAttachment.getAs(MimeType.PDF);
// fileId will be the ID of a saved text file (default behavior):
var fileId = pdfToText( blob );
// filetext will contain the text from the PDF file; no residual files are saved:
var filetext = pdfToText( blob, {keepTextfile: false} );
// we can save other converted file types, too:
var options = {
  keepPdf : true,            // Keep a copy of the original PDF file.
  keepGdoc : true,           // Keep a copy of the OCR Google Doc file.
  keepTextfile : true,       // Keep a copy of the text file. (default)
  path : "attachments/today" // Folder path to store file(s) in.
};
// keepTextfile is true here, so the return value is the ID of the saved text file:
fileId = pdfToText( blob, options );
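To make the default behavior described above concrete, here is a minimal sketch of how the options might be merged with defaults. The helper name resolvePdfToTextOptions is hypothetical and not part of pdfToText itself. Only the keepTextfile and 'en' defaults come from the description above; the ocrLanguage property name and the remaining defaults are assumptions.

```javascript
// Hypothetical helper, not part of the original utility: fills in any
// option the caller left out with a default value.
function resolvePdfToTextOptions(options) {
  var defaults = {
    keepPdf: false,      // assumed default
    keepGdoc: false,     // assumed default
    keepTextfile: true,  // documented default
    ocrLanguage: 'en',   // documented default value; property name is assumed
    path: ''             // assumed: empty string means no subfolder
  };
  var supplied = options || {};
  var resolved = {};
  Object.keys(defaults).forEach(function (key) {
    // Use the caller's value only when the key was explicitly supplied.
    resolved[key] = Object.prototype.hasOwnProperty.call(supplied, key)
      ? supplied[key]
      : defaults[key];
  });
  return resolved;
}

// Usage: only keepTextfile is overridden; everything else keeps its default.
var resolvedOpts = resolvePdfToTextOptions({ keepTextfile: false });
```

Checking for explicitly supplied keys, rather than testing for truthiness, lets a caller pass keepTextfile: false without it being replaced by the default.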
I've been using this for a few months, but yesterday it stopped working completely. Is anyone else having this problem?
It fails on this line:
var gdocFile = Drive.Files.insert(resource, pdfFile, insertOpts);
It just says "internal error". I've also noticed that I can't right-click a PDF in Drive and choose "Open with > Google Docs", as that errors too. I hope this gets fixed, as I use this a lot.
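If the internal error turns out to be intermittent rather than a service-side outage, one possible mitigation is to retry the failing call with exponential backoff. The sketch below is an assumption, not a confirmed fix: withRetries is a hypothetical helper, and a retry will not help if the conversion is failing for every request.

```javascript
// Hypothetical helper: calls fn(), retrying with exponential backoff
// whenever it throws. sleepFn is passed in so the helper has no
// dependency on a particular runtime.
function withRetries(fn, maxAttempts, baseDelayMs, sleepFn) {
  var lastError;
  for (var attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return fn();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, doubling the delay each time.
      if (attempt < maxAttempts - 1) {
        sleepFn(baseDelayMs * Math.pow(2, attempt));
      }
    }
  }
  // Every attempt failed; surface the most recent error.
  throw lastError;
}

// In Apps Script the failing call could be wrapped like this
// (left commented out because Drive and Utilities exist only there):
// var gdocFile = withRetries(function () {
//   return Drive.Files.insert(resource, pdfFile, insertOpts);
// }, 3, 1000, function (ms) { Utilities.sleep(ms); });
```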