Created
March 13, 2018 22:55
Simple Google Drive backup script with automatic authentication for Google Colaboratory (Python 3)
# Simple Google Drive backup script with automatic authentication
# for Google Colaboratory (Python 3)

# Instructions:
# 1. Run this cell and authenticate via the link and text box.
# 2. Copy the JSON output below this cell into the `mycreds_file_contents`
#    variable. Authentication will occur automatically from now on.
# 3. Create a new folder in Google Drive and copy the ID of this folder
#    from the URL bar to the `folder_id` variable.
# 4. Specify the directory to be backed up in `dir_to_backup`.

# Caveats:
# 1. The backup/restore functions overwrite existing files both locally and
#    remotely without warning.
# 2. Empty directories and files are ignored.
# 3. Use at your own risk.

!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import google.colab
from oauth2client.client import GoogleCredentials
import glob, os

folder_id = 'GOOGLE_DRIVE_FOLDER_ID_HERE'
dir_to_backup = 'LOCAL_BACKUP_DIRECTORY_HERE'
mycreds_file_contents = 'PASTE_JSON_STRING_HERE'
mycreds_file = 'mycreds.json'

with open(mycreds_file, 'w') as f:
    f.write(mycreds_file_contents)

def authenticate_pydrive():
    gauth = GoogleAuth()

    # https://stackoverflow.com/a/24542604/5096199
    # Try to load saved client credentials
    gauth.LoadCredentialsFile(mycreds_file)
    if gauth.credentials is None:
        # Authenticate if they're not there
        google.colab.auth.authenticate_user()
        gauth.credentials = GoogleCredentials.get_application_default()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile(mycreds_file)

    drive = GoogleDrive(gauth)
    return drive

def backup_pydrive():
    drive = authenticate_pydrive()
    paths = list(glob.iglob(os.path.join(dir_to_backup, '**'), recursive=True))
    print(paths)

    # Delete existing files
    files = drive.ListFile({'q': "'%s' in parents" % folder_id}).GetList()
    for file in files:
        if file['title'] in paths:
            file.Delete()

    for path in paths:
        if os.path.isdir(path) or os.stat(path).st_size == 0:
            continue
        file = drive.CreateFile({'title': path, 'parents':
            [{"kind": "drive#fileLink", "id": folder_id}]})
        file.SetContentFile(path)
        file.Upload()
        print('Backed up %s' % path)

def restore_pydrive():
    drive = authenticate_pydrive()
    files = drive.ListFile({'q': "'%s' in parents" % folder_id}).GetList()
    for file in files:
        os.makedirs(os.path.dirname(file['title']), exist_ok=True)
        file.GetContentFile(file['title'])
        print('Restored %s' % file['title'])

authenticate_pydrive()
!cat {mycreds_file}

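Before uploading anything, it can help to see which files the backup loop would pick up. The following is a minimal sketch of a dry-run helper that performs no Google Drive calls; the name `list_backup_candidates` is hypothetical and not part of the script. It mirrors the recursive `glob` pattern used in `backup_pydrive` and the rule that directories and empty files are skipped, but it tests with `os.path.isfile` so that a missing directory gives an empty list instead of an exception. The path `/content/data` is only an example value.

```python
import glob
import os


def list_backup_candidates(directory):
    """Return the non-empty regular files under `directory` (hypothetical helper)."""
    pattern = os.path.join(directory, '**')
    candidates = []
    for path in glob.iglob(pattern, recursive=True):
        # Skip directories, missing paths, and empty files,
        # similar to the filter inside backup_pydrive.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            continue
        candidates.append(path)
    return sorted(candidates)


if __name__ == '__main__':
    # Example directory only; replace with the value of dir_to_backup.
    print(list_backup_candidates('/content/data'))
```

Note that the cell as posted ends by calling only `authenticate_pydrive()`. `backup_pydrive()` and `restore_pydrive()` are defined but never invoked, so they have to be called explicitly, for example in a following cell.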
Hello, I'm a complete beginner (I can't find the same problem anywhere else).
The script can't find dir_to_backup, even if it's the root directory:
in backup_pydrive()
     49
     50     for path in paths:
---> 51         if os.path.isdir(path) or os.stat(path).st_size == 0:
     52             continue
     53         file = drive.CreateFile({'title': path, 'parents':
FileNotFoundError: [Errno 2] No such file or directory: '~/'
What am I doing wrong?
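A possible explanation, offered as an assumption rather than a confirmed diagnosis: Python's `glob` and `os` functions do not expand `~` the way a shell does, so `'~/'` is treated as a literal directory named `~` relative to the current working directory. The minimal sketch below uses `os.path.expanduser` from the standard library to turn the value into an absolute path before it is assigned to `dir_to_backup`. The helper name `resolve_backup_dir` is hypothetical.

```python
import os


def resolve_backup_dir(path):
    """Expand a leading '~' and return an absolute path (hypothetical helper)."""
    return os.path.abspath(os.path.expanduser(path))


# '~/' is only an example value, taken from the error message above.
dir_to_backup = resolve_backup_dir('~/')
print(dir_to_backup)
# Confirm the directory exists before running the backup.
print(os.path.isdir(dir_to_backup))
```

In Colab, an absolute path such as `/content/data` avoids the question entirely.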
Hi Robin, your technique seems to persist data in the Colab notebook. I used the following config and don't see any data in the Google Drive folder (ID obfuscated below for privacy reasons), but the data persists in the Colab notebook across browser reloads and GPU runtime restarts. How do we store a copy of the data in Google Drive? I originally downloaded the data from Kaggle and used your code to attempt to sync it into a Google Drive folder.
The refresh token changes every time I run the code in Colab, but I don't see any link or text box. Is it because I am already logged into Google and using Google Drive in another browser session?
folder_id = '************************'
dir_to_backup = '/content/data'
mycreds_file_contents = '{"_module": "oauth2client.client", "scopes": [], "token_expiry": null, "id_token": null, "user_agent": "Python client library", "access_token": null, "token_uri": "https://oauth2.googleapis.com/token", "invalid": false, "token_response": null, "client_id": ".apps.googleusercontent.com", "token_info_uri": null, "client_secret": "####################", "revoke_uri": "https://oauth2.googleapis.com/revoke", "_class": "GoogleCredentials", "refresh_token": "$$$$$$$$$$$$$$$$$$$$$$$$$$$$$", "id_token_jwt": null}'
mycreds_file = 'mycreds.json'
Will the data be synced and deleted from Google Drive, and only kept in the Colab folder? Or should I begin with the data in Google Drive (meaning download the data into Google Drive and then sync with Colab)?
Thanks for your time!