The JavaScript engine in Google Apps Script is decidedly synchronous and sequential. There is a time limit on how long a script can execute, and every API endpoint is subject to a variety of quotas.
In use cases where there is a large amount of information to be retrieved from APIs, the developer will need to deploy techniques to work around the discussed limitations.
This gist explores the methods available in the stack for working around these limitations. Specifically, we will use the Google Drive API as a learning tool to explore this topic.
Throughout this gist, only the minimal amount of code needed to illustrate concurrent processing is used. No attempt is made to save extraneous information that is not essential to the concept being illustrated, nor to install triggers that ensure the code is re-executed.
Some utility functions have been written for these examples, which are released with an MIT license.
- The Pause/Resume Method
- Main and Worker Tasks
- Asynchronous concurrency with child scripts
This method utilizes PropertiesService in order to keep a running record of our progress while retrieving the data. In addition, it creates a trigger that executes a few minutes later to continue where we left off.
The Google Drive API files: list endpoint can be reached through the DriveApp service. The project will need the https://www.googleapis.com/auth/drive.readonly scope. DriveApp is a built-in service rather than an advanced one, so nothing further has to be enabled. We can collect the name of every file in the user's Drive as below:
function returnFileNamesInDrive() {
    var files, file, fileNames = [];
    files = DriveApp.getFiles();
    while (files.hasNext()) {
        file = files.next();
        fileNames.push(file.getName());
    }
    return fileNames;
}
As with any service that wraps a Google API, developers can choose to interact with the same endpoints in a more raw style, by utilizing UrlFetchApp.fetch. In this case, we need both the https://www.googleapis.com/auth/script.external_request and https://www.googleapis.com/auth/drive.readonly scopes:
function returnFileNamesInDrive() {
    var url, response, json, nextPageToken = null, fileNames = [];
    do {
        // setup the url for fetching
        url = 'https://www.googleapis.com/drive/v3/files?corpora=user&pageSize=100';
        if (nextPageToken)
            // tokens may contain characters that are not URL-safe
            url += '&pageToken=' + encodeURIComponent(nextPageToken);
        // reach out to the internet, parse the body as JSON so we can use it
        response = UrlFetchApp.fetch(url, {
            headers: {
                "Authorization": "Bearer " + ScriptApp.getOAuthToken()
            },
            method: 'get'
        });
        json = JSON.parse(response.getContentText());
        // process, and continue
        json.files.forEach(function (file) {
            fileNames.push(file.name);
        });
        nextPageToken = json.nextPageToken;
    } while (nextPageToken);
    return fileNames;
}
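The URL assembly in the loop above can be factored into a small helper. The following is a minimal sketch and not part of the original utilities: the name buildListUrl is hypothetical, and the fields query parameter (used by the Drive API for partial responses) is an assumption added here to trim each response down to the file names and the next page token.

```javascript
/**
 * Hypothetical helper: build a Drive API v3 files.list URL.
 *
 * @param {number} pageSize   How many files to request per page
 * @param {string} pageToken  Token from the previous page, if any
 * @return {string}
 */
function buildListUrl(pageSize, pageToken) {
    var params = [
        'corpora=user',
        'pageSize=' + pageSize,
        // assumption: only names and the paging token are needed
        'fields=' + encodeURIComponent('nextPageToken,files(name)')
    ];
    if (pageToken) {
        // encode the token so reserved characters survive the query string
        params.push('pageToken=' + encodeURIComponent(pageToken));
    }
    return 'https://www.googleapis.com/drive/v3/files?' + params.join('&');
}
```

Inside the loop, the two lines that build url could then be replaced with url = buildListUrl(100, nextPageToken).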
Since we need to detect when a certain amount of time has elapsed, we use the following utility function:
/**
 * Continuously call function callback from now until time has elapsed
 * callback return true indicates early completion
 *
 * @param {object} timeObject If omitted entirely, defaults to { seconds: 1 }
 * @param {number} timeObject.minutes How many minutes from now, additive, default is 0
 * @param {number} timeObject.seconds How many seconds from now, additive, default is 0
 * @param {number} timeObject.milliseconds How many milliseconds from now, additive, default is 0
 * @param {function} callback The function to execute continuously
 * @param {any} args Any additional arguments passed will be passed to callback
 * @return {void}
 */
function continueUntil(timeObject, callback /*, args */) {
    var args, endTime, done = false;
    timeObject = timeObject || { seconds: 1 };
    endTime = new Date();
    // everything after the first two parameters is forwarded to callback
    args = Array.prototype.slice.call(arguments, 2);
    timeObject.minutes = timeObject.minutes || 0;
    timeObject.seconds = timeObject.seconds || 0;
    timeObject.milliseconds = timeObject.milliseconds || 0;
    // setMinutes accepts minutes, seconds and milliseconds, and rolls over as needed
    endTime.setMinutes(
        endTime.getMinutes() + timeObject.minutes,
        endTime.getSeconds() + timeObject.seconds,
        endTime.getMilliseconds() + timeObject.milliseconds
    );
    while (!done && (new Date()) < endTime) {
        done = callback.apply(callback, args);
    }
}
We can use the above continueUntil in the following way:
continueUntil({minutes: 5}, function Callback () {
    // read in files
    // return true if there are no more
    // will cease execution when 5 minutes have expired
});
We choose five minutes in our example, because we have a script run-time limitation of 6 minutes.
Now we need to write the body of the Callback function, with some extra overhead processing to read in and save the token.
Using DriveApp, we can obtain the relevant token from the file iterator with files.getContinuationToken() and store it in the script properties. When we know we have exhausted the results, we clear it from the properties.
function pauseResume_DriveApp() {
    var properties, file, files, fileNames = [], token, tokenKey = 'tokenKey';
    // read in the token, if available
    properties = PropertiesService.getScriptProperties();
    token = properties.getProperty(tokenKey);
    if (!token) {
        // must be the first time executing
        files = DriveApp.getFiles();
    } else {
        // there is a token saved from our previous execution
        files = DriveApp.continueFileIterator(token);
    }
    continueUntil({minutes: 5}, function () {
        // guard against an empty iterator, since next() throws when nothing is left
        if (!files.hasNext()) return true;
        file = files.next();
        fileNames.push(file.getName());
        return !files.hasNext();
    });
    if (files.hasNext()) {
        // we stopped because we ran out of time, which we can tell b/c there are still items left
        properties.setProperty(tokenKey, files.getContinuationToken());
    } else {
        // no more, so delete key
        properties.deleteProperty(tokenKey);
    }
    Logger.log(fileNames);
}
This is the equivalent, using the manual method:
function pauseResume_Manual() {
    var properties, fileNames = [], nextPageToken, tokenKey = 'tokenKey';
    properties = PropertiesService.getScriptProperties();
    nextPageToken = properties.getProperty(tokenKey);
    continueUntil({minutes: 5}, function () {
        var url, response, json;
        url = 'https://www.googleapis.com/drive/v3/files?corpora=user&pageSize=5';
        if (nextPageToken)
            url += '&pageToken=' + encodeURIComponent(nextPageToken);
        response = UrlFetchApp.fetch(url, {
            headers: {
                "Authorization": "Bearer " + ScriptApp.getOAuthToken()
            },
            method: 'get'
        });
        json = JSON.parse(response.getContentText());
        // process, and continue
        json.files.forEach(function (file) {
            fileNames.push(file.name);
        });
        nextPageToken = json.nextPageToken;
        // true means "done": stop early only when there is no further page
        return !nextPageToken;
    });
    if (nextPageToken) {
        // ran out of time with pages remaining, so save our place
        properties.setProperty(tokenKey, nextPageToken);
    } else {
        properties.deleteProperty(tokenKey);
    }
    Logger.log(fileNames);
}
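Neither function above arranges for its own re-execution. A minimal sketch of how a follow-up run might be scheduled once a token has been saved is given below. The helper name scheduleResume and its parameters are hypothetical. The ScriptApp object is passed in as an argument so that the sketch can be exercised outside Apps Script, and the calls it makes (newTrigger, timeBased, after, create) are those of the time-driven trigger builder.

```javascript
/**
 * Hypothetical helper: schedule a one-off run of functionName.
 *
 * @param {object} scriptApp     The ScriptApp service (injected)
 * @param {string} functionName  Name of the function to re-run
 * @param {number} minutes       Delay before the run, in minutes
 * @return {object} The trigger returned by create()
 */
function scheduleResume(scriptApp, functionName, minutes) {
    // after() expects a duration in milliseconds
    var delayMs = minutes * 60 * 1000;
    return scriptApp
        .newTrigger(functionName)
        .timeBased()
        .after(delayMs)
        .create();
}

// Inside Apps Script, at the end of pauseResume_DriveApp, one might write:
//     if (files.hasNext()) scheduleResume(ScriptApp, 'pauseResume_DriveApp', 2);
```

Scheduling only when work remains means the chain of executions ends by itself once the token has been cleared.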
The main technical advantage of the DriveApp pause/resume method is that it consumes the least quota. The PropertiesService stores have the largest quotas available to the developer, and tokens fit within the size constraints for values in the stores. You are least likely to hit any of the quota constraints with this method.
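The token bookkeeping shared by both pause/resume functions can be isolated into a small helper. This is a sketch under stated assumptions: the names makeTokenStore, load and save are hypothetical, the properties store is passed in so the code can run outside Apps Script, and the size limit is supplied by the caller rather than hard-coded, since the per-value limit should be checked against the current quota table.

```javascript
/**
 * Hypothetical helper: wrap a properties store for one token key.
 *
 * @param {object} properties  e.g. PropertiesService.getScriptProperties()
 * @param {string} key         Property key the token is stored under
 * @param {number} maxChars    Caller-supplied size limit for a value (assumption)
 */
function makeTokenStore(properties, key, maxChars) {
    return {
        // returns the saved token, or null when there is none
        load: function () {
            return properties.getProperty(key);
        },
        // saves the token, or clears the key when the token is empty
        save: function (token) {
            if (!token) {
                properties.deleteProperty(key);
                return;
            }
            if (token.length > maxChars) {
                throw new Error('Token too large for a property value');
            }
            properties.setProperty(key, token);
        }
    };
}
```

With such a helper, the if/else at the end of each function reduces to a single call such as store.save(nextPageToken).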
Meanwhile, the manual method is subject to the UrlFetchApp.fetch quota, which is significantly smaller than the properties store quotas, and thus for pause/resume it may appear there is no real advantage to using the manual method. However, with the manual method you are able to define the pageSize, and thus increase it, resulting in faster processing.
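The effect of pageSize on the UrlFetchApp.fetch quota can be estimated with simple arithmetic: each call returns at most pageSize files, so listing N files costs at least ceil(N / pageSize) calls. The sketch below uses a hypothetical helper name and illustrative file counts, not measurements, and pageSize remains subject to whatever upper bound the files.list documentation specifies.

```javascript
/**
 * Hypothetical helper: lower bound on the number of fetch calls
 * needed to list totalFiles files at a given pageSize.
 */
function fetchCallsNeeded(totalFiles, pageSize) {
    if (pageSize < 1) {
        throw new Error('pageSize must be at least 1');
    }
    // even an empty listing costs one call to discover that it is empty
    return Math.max(1, Math.ceil(totalFiles / pageSize));
}

// Illustrative numbers for a Drive holding 10,000 files:
var atFive = fetchCallsNeeded(10000, 5);        // 2000 calls
var atHundred = fetchCallsNeeded(10000, 100);   // 100 calls
```

Pages may come back with fewer items than requested, so the real number of calls can be higher than this estimate.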
The main observation to be made here is that pause/resume is probably the best option for cases, such as this one, where all of the information we need is available in one call, and only then can the next token be retrieved. The only bit of information we are interested in is the name of the file, which is provided to us. The only speedup to be found is in the use of pageSize, which is possible with the manual method but not with DriveApp.
However, if we wanted to download information about the file beyond just the name (for example the metadata or the content), we would need to reach out to an additional API. This is where the use case opens up more possibilities for concurrent processing.
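As a rough preview of that situation, one request per file can be prepared up front and handed to UrlFetchApp.fetchAll, which issues a batch of requests in a single call. The following is a sketch only: buildMetadataRequests is a hypothetical helper, the chosen fields are just an example, and it presumes that the listing step collected file.id alongside file.name, which the examples above do not do.

```javascript
/**
 * Hypothetical helper: build one files.get request object per file ID,
 * in the shape accepted by UrlFetchApp.fetchAll.
 *
 * @param {string[]} fileIds     Drive file IDs gathered while listing
 * @param {string}   oauthToken  e.g. ScriptApp.getOAuthToken()
 * @return {object[]}
 */
function buildMetadataRequests(fileIds, oauthToken) {
    // example selection of metadata fields (assumption)
    var fields = encodeURIComponent('id,name,mimeType,size');
    return fileIds.map(function (id) {
        return {
            url: 'https://www.googleapis.com/drive/v3/files/' +
                encodeURIComponent(id) + '?fields=' + fields,
            method: 'get',
            headers: { Authorization: 'Bearer ' + oauthToken },
            // let one failed request surface as a response instead of throwing
            muteHttpExceptions: true
        };
    });
}

// Inside Apps Script, the batch might be sent with:
//     var responses = UrlFetchApp.fetchAll(
//         buildMetadataRequests(ids, ScriptApp.getOAuthToken()));
```

Each request in the batch still counts against the UrlFetchApp quota, so batching trades quota for elapsed time rather than saving both.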