Update: I now use a slightly different version of this script, which creates a single zip file instead of one per document, and puts a timestamp in the filename rather than overwriting the previous backup file. That version can be found at https://github.com/brokensandals/export-google-docs.
Google Apps Script that exports all your Google Docs/Sheets/Slides into docx/xlsx/pptx files and PDFs into a folder in your Google Drive. For more info and step-by-step setup instructions, see here: http://brokensandals.net/google-docs-backup
Replace INSERT_FOLDER_ID_HERE with the ID of the folder you want backups to be placed in.
Create a trigger to run the `backupAll` function if you want to do this on a schedule (e.g. nightly).
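If you prefer to set the trigger up from code rather than through the Apps Script UI, a one-off function like this should work (a sketch; the 2 a.m. hour is an arbitrary choice):

```javascript
// Run this once manually to install a nightly trigger for backupAll.
function installNightlyTrigger() {
  ScriptApp.newTrigger('backupAll')
      .timeBased()
      .everyDays(1)
      .atHour(2) // in the script's time zone; pick any hour you like
      .create();
}
```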
Notes:
- By default, only files that you own (as opposed to files others have shared with you) will be backed up.
  Remove the `file.getOwner()` check from the `backupAll` method if you want to change that.
- For each file, both an Office file (docx/xlsx/pptx) and a PDF are generated, and combined into a zip file that's placed in the backup folder (see the sketch after these notes). Zipping the backup files ensures that they don't clutter up the recent activity list for Docs/Sheets/Slides.
- The script depends on the lastUpdated dates being correct on both the input files and the files in the backup directory. If that seems problematic, you could change the `createOrUpdateFileForBlob` method to delete existing backup files rather than updating them.
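For reference, the zip step mentioned above presumably boils down to something like this (a simplified sketch; `officeBlob`, `pdfBlob`, and the naming are illustrative, not the gist's exact code):

```javascript
// Given the two exported blobs for one document, bundle them into a
// single zip blob named after the source file.
function zipBackups(name, officeBlob, pdfBlob) {
  // Utilities.zip takes an array of blobs and an optional zip file name.
  return Utilities.zip([officeBlob, pdfBlob], name + '.zip');
}
```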
As always, this code may have defects that prevent it from working properly. Use at your own risk and remember to periodically verify that your backups are actually working as expected.
Thanks for your answer @brokensandals!
I've been reading into the whole thing today and made some adjustments, mainly for speed, since the script gets terminated when it hits Apps Script's execution time limit after a few minutes. Since I don't want to zip or create PDFs, I removed those steps from my version. A probably fragile way to get more speed is to hard-code the export links, skipping one query per file (I also removed the `.` from the extension list):
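The hard-coded export URLs could look roughly like this (a sketch; these URL patterns are undocumented assumptions and could change, and the helper names are mine):

```javascript
// Build an export URL for a Google Docs/Sheets/Slides file directly,
// instead of asking the Drive API for its exportLinks.
// NOTE: these URL patterns are not officially documented and may change.
function getExportUrl(file) {
  var id = file.getId();
  switch (file.getMimeType()) {
    case MimeType.GOOGLE_DOCS:
      return 'https://docs.google.com/document/d/' + id + '/export?format=docx';
    case MimeType.GOOGLE_SHEETS:
      return 'https://docs.google.com/spreadsheets/d/' + id + '/export?format=xlsx';
    case MimeType.GOOGLE_SLIDES:
      return 'https://docs.google.com/presentation/d/' + id + '/export/pptx';
    default:
      return null;
  }
}

// Download the exported blob, authenticating with the script's OAuth token.
function fetchExport(file) {
  var response = UrlFetchApp.fetch(getExportUrl(file), {
    headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() }
  });
  return response.getBlob();
}
```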
I got another big speed gain by checking whether the existing backup file is newer before downloading the blobs, and removed the corresponding check from the update function:
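That skip check might look something like this (a sketch; `backupFolder` and the backup naming convention are assumptions):

```javascript
// Skip a file entirely if its backup already exists and is at least as new.
function needsBackup(file, backupFolder, backupName) {
  var existing = backupFolder.getFilesByName(backupName);
  if (existing.hasNext()) {
    var backup = existing.next();
    // getLastUpdated() must be reliable on both sides for this to be safe.
    if (backup.getLastUpdated() >= file.getLastUpdated()) {
      return false; // backup is current; nothing to download
    }
  }
  return true;
}
```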
This way, existing files are skipped over almost instantly, which means the script can run multiple times and has a better chance of reaching the end, since the files at the front were already handled by a previous run. It's not ideal, as it still suffers from the same problem as before if there are too many files. I read that values can be cached between script runs, so I'll look into that next. (Or not, depending on how much time I can spend on it.)
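One possible shape for that, if it turns out to be worth it, is persisting the file iterator's continuation token in script properties so a later run picks up where the last one stopped (a sketch; the property key and the time budget are arbitrary):

```javascript
// Resume iterating over Drive files across multiple runs by persisting
// the iterator's continuation token in script properties.
function backupSomeFiles() {
  var props = PropertiesService.getScriptProperties();
  var token = props.getProperty('CONTINUATION_TOKEN');
  var files = token ? DriveApp.continueFileIterator(token)
                    : DriveApp.getFiles();
  var deadline = Date.now() + 4 * 60 * 1000; // stop well before the limit
  while (files.hasNext() && Date.now() < deadline) {
    var file = files.next();
    // ... back up the file here ...
  }
  if (files.hasNext()) {
    props.setProperty('CONTINUATION_TOKEN', files.getContinuationToken());
  } else {
    props.deleteProperty('CONTINUATION_TOKEN'); // done; start fresh next time
  }
}
```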
Thanks for the links on folder structure; the multiple parents threw me off. I thought it was supposed to list the full ancestor chain, kind of like [Root, Subfolder, SubSubfolder], which it doesn't: it only gives the file's direct parents. I think walking the tree is probably faster than reconstructing the hierarchy from each file.
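Walking the tree could be as simple as a recursive descent from the root, carrying the path along instead of querying parents per file (a sketch; `handleFile` is a placeholder for whatever per-file backup logic is used):

```javascript
// Recursively visit every folder under `folder`, building each file's
// path from the walk itself instead of looking up parents per file.
function walkTree(folder, path, handleFile) {
  var files = folder.getFiles();
  while (files.hasNext()) {
    handleFile(files.next(), path);
  }
  var subfolders = folder.getFolders();
  while (subfolders.hasNext()) {
    var sub = subfolders.next();
    walkTree(sub, path + '/' + sub.getName(), handleFile);
  }
}

// Example: walkTree(DriveApp.getRootFolder(), '', function(file, path) { /* ... */ });
```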
Thanks again, the script was a fantastic starting point to reverse engineer! I will report back if I get the folders to work.
Kind Regards