
@brainysmurf
Created January 14, 2019 11:39

A Tour of Concurrent Processing Techniques in Google Apps Script

The JavaScript engine in Google Apps Script is decidedly synchronous and sequential; there is a time limit on how long a script can execute, and a variety of quotas apply to every API endpoint.

In use cases where a large amount of information must be retrieved from APIs, the developer needs techniques to work around these limitations.

This gist explores the variety of methods available in the stack that work around them. Specifically, we will use the Google Drive API as a learning tool to explore this topic.

A note on the Code

Throughout this gist, only the minimal amount of code needed to illustrate each concurrent-processing technique is used. No attempt is made to save information that is not essential to the concept being illustrated, nor to install the triggers that would re-execute the code in production.

Some utility functions have been written for these examples, which are released with an MIT license.


Pause and Resume

This method uses PropertiesService to keep a running record of our progress throughout the process of retrieving the data. In addition, it creates a trigger that executes a few minutes later to continue where we left off.

Use Case: Downloading from the Drive API with tokens

The Google Drive API files.list endpoint can be reached through the built-in DriveApp service. The project will need the https://www.googleapis.com/auth/drive.readonly scope. We can collect the name of every file in the user's Drive as below:

function returnFileNamesInDrive() {
  var files, file, fileNames = [];
  files = DriveApp.getFiles();
  while (files.hasNext()) {
    file = files.next();
    fileNames.push(file.getName());
  }
  return fileNames;
}

As with any of the advanced services, developers can choose to interact with the same endpoints in a more raw style by utilizing UrlFetchApp.fetch. In this case, we need both the https://www.googleapis.com/auth/script.external_request and https://www.googleapis.com/auth/drive.readonly scopes:

function returnFileNamesInDrive() {
  var url, response, json, nextPageToken = null, fileNames = [];

  do {
    // setup the url for fetching
    url = 'https://www.googleapis.com/drive/v3/files?corpora=user&pageSize=100';
    if (nextPageToken)
      url += '&pageToken=' + nextPageToken;
    
    // reach out to the internet, convert to a json so we can use it
    response = UrlFetchApp.fetch(url, {
      headers: {
        "Authorization": "Bearer " + ScriptApp.getOAuthToken(),
      },
      method: 'get'
    });
    json = JSON.parse(response.getContentText());
    
    // process, and continue
    json.files.forEach(function (file) {
      fileNames.push(file.name);
    });
    nextPageToken = json.nextPageToken;
    
  } while (nextPageToken);
  
  return fileNames;
}

Since we need a way to detect that a certain amount of time has elapsed, consider the following utility function:

/**
 * Continuously call function callback from now until time has elapsed
 *   callback return true indicates early completion
 * 
 * @param {object} timeObject
 * @param {number} timeObject.minutes How many minutes from now, additive, default is 0
 * @param {number} timeObject.seconds How many seconds from now, additive, default is 1 
 * @param {number} timeObject.milliseconds How many milliseconds from now, additive, default is 0
 * @param {function} callback The function to execute continuously
 * @param {any} args Any additional arguments passed will be passed to callback
 * @return {void}
 */
function continueUntil(timeObject, callback /*, args */) {
  var args, endTime, done = false;
  timeObject = timeObject || { seconds: 1 };
  endTime = new Date();
  args = Array.prototype.slice.call(arguments, 2);

  timeObject.minutes = timeObject.minutes || 0;
  timeObject.seconds = timeObject.seconds || 0;
  timeObject.milliseconds = timeObject.milliseconds || 0;
  endTime.setMinutes(
    endTime.getMinutes() + timeObject.minutes,
    endTime.getSeconds() + timeObject.seconds,
    endTime.getMilliseconds() + timeObject.milliseconds
  );

  while (!done && (new Date()) < endTime) {
    done = callback.apply(callback, args);
  }
}

We can use the above continueUntil in the following way:

continueUntil({minutes: 5}, function Callback () {
  // read in files
  // return true if there are no more
  // will cease execution when 5 minutes has expired
});

We choose five minutes in our example because scripts have a run-time limit of six minutes.
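As a sanity check of the helper's early-completion contract, here is a self-contained toy run in plain JavaScript (the helper is repeated verbatim so the snippet runs standalone; the callback and the 200 ms budget are arbitrary choices for the demo):

```javascript
// The continueUntil helper from above, repeated so this snippet runs standalone.
function continueUntil(timeObject, callback /*, args */) {
  var args, endTime, done = false;
  timeObject = timeObject || { seconds: 1 };
  endTime = new Date();
  args = Array.prototype.slice.call(arguments, 2);
  endTime.setMinutes(
    endTime.getMinutes() + (timeObject.minutes || 0),
    endTime.getSeconds() + (timeObject.seconds || 0),
    endTime.getMilliseconds() + (timeObject.milliseconds || 0)
  );
  while (!done && (new Date()) < endTime) {
    done = callback.apply(callback, args);
  }
}

// Toy callback: declares itself finished after 3 invocations,
// well inside the 200 ms budget, so the loop exits early.
var calls = 0;
continueUntil({ milliseconds: 200 }, function () {
  calls += 1;
  return calls >= 3;   // true signals early completion
});
```

Because the callback returns true on the third call, `calls` ends up at 3 rather than however many iterations fit into 200 ms.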

Now we need to write the body of the Callback function, with some extra overhead processing to read in and save the token.

Using the DriveApp service, we can obtain the relevant token with files.getContinuationToken() and store it in the script properties. Once we know we have exhausted the results, we clear it from the properties.

function pauseResume_DriveApp() {
  var properties, file, files, fileNames = [], token, tokenKey = 'tokenKey';
  
  // read in the token, if available
  properties = PropertiesService.getScriptProperties();
  token = properties.getProperty(tokenKey);  
  
  if (!token)
    // must be the first time executing
    files = DriveApp.getFiles();
  else
    // there is a token saved from our previous execution
    files = DriveApp.continueFileIterator(token);

  continueUntil({minutes: 5}, function () {
    if (!files.hasNext()) return true;  // nothing left (e.g. an empty Drive)
    file = files.next();
    fileNames.push(file.getName());
    return !files.hasNext();
  });
  
  if (files.hasNext())
    // we stopped because we ran out of time, which we can tell b/c there are still items left
    properties.setProperty(tokenKey, files.getContinuationToken());
  else
    // no more, so delete key
    properties.deleteProperty(tokenKey);

  Logger.log(fileNames);
}

This is the equivalent, using the manual method:

function pauseResume_Manual() {
  var properties, file, files, fileNames = [], nextPageToken, tokenKey = 'tokenKey';
  
  properties = PropertiesService.getScriptProperties();
  nextPageToken = properties.getProperty(tokenKey);  
  
  continueUntil({minutes: 5}, function () {
    var url, response, json;
    url = 'https://www.googleapis.com/drive/v3/files?corpora=user&pageSize=5';
    if (nextPageToken) 
      url += '&pageToken=' + nextPageToken;

    response = UrlFetchApp.fetch(url, {
      headers: {
        "Authorization": "Bearer " + ScriptApp.getOAuthToken(),
      },
      method: 'get'
    });
    json = JSON.parse(response);
    
    // process, and continue
    json.files.forEach(function (file) {
      fileNames.push(file.name);
    });
    nextPageToken = json.nextPageToken;
    
    return !nextPageToken;  // true (done) only when there are no more pages
  });
  
  if (nextPageToken)
    properties.setProperty(tokenKey, nextPageToken);
  else
    properties.deleteProperty(tokenKey);

  Logger.log(fileNames);
}
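To see the pause/resume mechanics end to end, here is a hypothetical simulation in plain JavaScript, with mock stand-ins for PropertiesService and the paged Drive endpoint (the page data, token scheme, and helper names are all invented for illustration; the time budget is omitted for brevity):

```javascript
// Mock stand-in for PropertiesService.getScriptProperties().
var mockStore = {
  data: {},
  getProperty: function (k) { return this.data.hasOwnProperty(k) ? this.data[k] : null; },
  setProperty: function (k, v) { this.data[k] = v; },
  deleteProperty: function (k) { delete this.data[k]; }
};

// Mock stand-in for the paged API: three pages, the last without a nextPageToken.
var PAGES = [
  { files: [{ name: 'a' }, { name: 'b' }], nextPageToken: 'p1' },
  { files: [{ name: 'c' }, { name: 'd' }], nextPageToken: 'p2' },
  { files: [{ name: 'e' }] }
];
function fetchPage(token) {
  // Token 'pN' means "resume at page index N"; no token means start at page 0.
  return PAGES[token ? Number(token.slice(1)) : 0];
}

var allNames = [];

// One "execution": process a single page, then save or clear the token,
// mirroring the shape of pauseResume_Manual above.
function oneExecution() {
  var token = mockStore.getProperty('tokenKey');
  var json = fetchPage(token);
  json.files.forEach(function (f) { allNames.push(f.name); });
  if (json.nextPageToken) mockStore.setProperty('tokenKey', json.nextPageToken);
  else mockStore.deleteProperty('tokenKey');
}

// Simulate three separate trigger-driven executions.
oneExecution();
oneExecution();
oneExecution();
```

After the third simulated execution, all five names have been collected and the token has been cleared, which is exactly the end state the real script relies on to know the work is finished.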

Discussion

The main technical advantage of the DriveApp pause/resume method is that it incurs the least quota debt. The PropertiesService stores have the largest quotas available to the developer, and continuation tokens fit comfortably within the size limits on stored values. You are least likely to hit any quota constraint with this method.

Meanwhile, the manual method is subject to the UrlFetchApp.fetch quota, which is significantly smaller than the properties-store quotas, so for pause/resume it may appear there is no real advantage to the manual method. However, the manual method lets you define, and thus increase, the pageSize, resulting in faster processing.
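For instance, Drive API v3 accepts pageSize values up to 1000, so the manual method's URL construction could be pulled into a small helper that requests the maximum page size (a sketch; the helper name is illustrative):

```javascript
// Build the Drive v3 files.list URL with the maximum page size (1000),
// appending the continuation token when resuming.
function buildFilesUrl(pageToken) {
  var url = 'https://www.googleapis.com/drive/v3/files?corpora=user&pageSize=1000';
  if (pageToken) url += '&pageToken=' + encodeURIComponent(pageToken);
  return url;
}
```

At 1000 files per fetch instead of 100, the same listing takes a tenth as many UrlFetchApp calls.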

The main observation here is that pause/resume is probably the best option for cases such as this one, where all of the information we need is available in a single call, and only then can the next token be retrieved. The only piece of information we are interested in is the file name, which that call provides. The only speedup available is a larger pageSize, which is possible with the manual method but not with the service iterator.

However, if we wanted to download information about a file beyond just its name, for example its metadata or its content, we would need to reach out to an additional API. This is where the use case opens up more possibilities for concurrent processing.
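Apps Script's UrlFetchApp.fetchAll accepts an array of request objects and issues them in parallel, which is the natural tool for that follow-up fetch. A sketch of building such an array of per-file metadata requests (the fields selection and helper name are assumptions for illustration; only the array construction is shown here, since fetchAll itself requires the Apps Script runtime):

```javascript
// Build one request object per file id, suitable for UrlFetchApp.fetchAll,
// which would issue all of them concurrently in Apps Script.
function buildMetadataRequests(fileIds, oauthToken) {
  return fileIds.map(function (id) {
    return {
      url: 'https://www.googleapis.com/drive/v3/files/' +
           encodeURIComponent(id) + '?fields=name,mimeType,size',
      method: 'get',
      headers: { Authorization: 'Bearer ' + oauthToken }
    };
  });
}
```

In a real script, the ids collected during the listing phase would be passed in along with ScriptApp.getOAuthToken(), and the resulting array handed to UrlFetchApp.fetchAll.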
