@zeddee
Last active January 3, 2020 07:47
removed z-iam-roles file for shorter gist

Setting up nytimes/library

This gist is just me trying to make sense of installing and running https://github.com/nytimes/library and figuring out where I've gone wrong.

So far, I've found that at v1.2.0, the library:

  • Doesn't properly accept shared folders as a configuration option (see config below). [edit 3 jan: this turned out to be partly my fault. I didn't remove the comments in .env, and the app deployed on App Engine doesn't seem to read from .env at all; it reads environment variables from app.yaml instead. So using shared folders is fine.]
  • Still runs into configuration issues with using shared folders from the environment variables in our .env file. [edit 3 jan: I can't figure out how to restrict permissions, i.e. require users to log in via OAuth in order to see the docs, or get OAuth to work as expected at all. The UI now just displays the logged-in user as user@your_organization.com.]
  • Docs are not clear enough about requiring Datastore mode when setting up the Cloud Datastore/Firestore API. (To be fair, this is another instance of Google/GCP changing their APIs and breaking their users' stuff.)

To follow along, you'll need to:

  • Get a Google Cloud account. G Suite account holders can activate a free 365-day trial with ~US$400 credits.
  • Install the gcloud command-line tool (CLI). (Google Cloud SDK)
  • (Optional) Install Node 10.x if you intend to try and run Library locally (not recommended).
    • I recommend using nvm to manage your Node versions.
    • Once nvm is installed, add a .nvmrc file to explicitly set the Node version to use for this directory.
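
A minimal sketch of the .nvmrc step, assuming you're in the directory where you'll clone Library and that a bare major version ("10") is specific enough for your setup:

```shell
# Pin Node 10 for this directory; nvm reads the version from .nvmrc.
echo "10" > .nvmrc

# With nvm installed, these pick up the pinned version:
#   nvm install
#   nvm use
```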

You can interact with the Google Cloud Platform (GCP) using the web ui at https://console.cloud.google.com, or by using the gcloud CLI.

Because I prefer using the CLI, this guide will walk you through setting up your GCP project using gcloud only.

For a guide to how to do this using the web ui, look at the NYTimes's own guide here: https://nyt-library-demo.herokuapp.com/

To sign in to GCP using gcloud for the first time, run:

gcloud auth login # opens an OAuth page asking you for your GCP credentials

Otherwise, you can find a list of saved credentials by running:

gcloud auth list

To select a set of credentials from this list, run:

gcloud config set account <account_email>

Create a new GCP project, and give it a name. For this guide, we'll call our GCP project nytimes-library-test-2 (2 because project IDs must be unique, and I've messed up many many times while writing this).

gcloud projects create nytimes-library-test-2

# Output
Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/nytimes-library-test-2].
Waiting for [operations/cp.6973727416446203228] to finish...done.
Enabling service [cloudapis.googleapis.com] on project [nytimes-library-test-2]...
Operation "operations/acf.c5bd12b3-f1c0-4826-871f-5bd77c209984" finished successfully.
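
Since I've burned a few project IDs already, here's a small sketch for sanity-checking an ID before running gcloud projects create. The function name valid_project_id is my own. The pattern encodes the project ID format rules as I understand them (6 to 30 characters; lowercase letters, digits and hyphens; starts with a letter; doesn't end with a hyphen). It can't tell you whether the ID is already taken.

```shell
# Returns 0 if $1 looks like a well-formed GCP project ID:
# 6-30 characters, lowercase letters, digits, and hyphens,
# starting with a letter and not ending with a hyphen.
# (valid_project_id is a hypothetical helper, not part of gcloud.)
valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}
```

Usage: valid_project_id nytimes-library-test-2 && gcloud projects create nytimes-library-test-2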

To list all projects available to the GCP user:

gcloud projects list

# Output
PROJECT_ID                NAME                    PROJECT_NUMBER
...
nytimes-library-test-2    nytimes-library-test-2  517690955308

Set the project as the default project for the user signed into gcloud by running:

gcloud config set project nytimes-library-test-2

# Output
Updated property [core/project].

To check the active project for your GCP user:

gcloud config list project

# Output
[core]
project = nytimes-library-test-2

Your active configuration is: [default]

Create a service account (SA) to interact with your GCP project.

Create an SA by running:

gcloud iam service-accounts create nytimes-library-user

# Output
Created service account [nytimes-library-user].

List SAs:

gcloud iam service-accounts list

# Output
NAME  EMAIL                                                                DISABLED
  nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com  False

We'll need to assign a role to the SA. To get a list of all possible roles, we can run:

gcloud iam roles list

This gives us a massive list of all possible roles that we can assign to a user/service account in GCP. What we're interested in is the "datastore user" role.

gcloud iam roles list | grep datastore

# Output
name: roles/datastore.importExportAdmin
name: roles/datastore.indexAdmin
name: roles/datastore.owner
name: roles/datastore.user
name: roles/datastore.viewer

To get a granular view of what permissions a role provides, run:

gcloud iam roles describe roles/datastore.user

# Output

description: Provides read/write access to data in a Cloud Datastore database. Intended
  for application developers and service accounts.
etag: AA==
includedPermissions:
- appengine.applications.get
- datastore.databases.get
- datastore.entities.allocateIds
- datastore.entities.create
- datastore.entities.delete
- datastore.entities.get
- datastore.entities.list
- datastore.entities.update
- datastore.indexes.list
- datastore.namespaces.get
- datastore.namespaces.list
- datastore.statistics.get
- datastore.statistics.list
- resourcemanager.projects.get
- resourcemanager.projects.list
name: roles/datastore.user
stage: GA
title: Cloud Datastore User

To grant a role to a service account:

gcloud projects add-iam-policy-binding nytimes-library-test-2 --member serviceAccount:nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com --role roles/datastore.user

# Output
Updated IAM policy for project [nytimes-library-test-2].
bindings:
- members:
  - serviceAccount:nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com
  role: roles/datastore.user
- members:
  - user:<gsuite_user_name>
  role: roles/owner
etag: <etag>
version: 1
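
The SA email follows the pattern seen in the output above (<sa_name>@<project_id>.iam.gserviceaccount.com), so the grant can be wrapped in a couple of small functions. This is just a sketch; sa_email and grant_role are names I made up, and the gcloud call is the same one used above.

```shell
# Build the service account email from its name and the project ID.
# (sa_email and grant_role are hypothetical helper names.)
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Grant a role to a service account on a project.
# Usage: grant_role <project_id> <sa_name> <role>
grant_role() {
  gcloud projects add-iam-policy-binding "$1" \
    --member "serviceAccount:$(sa_email "$2" "$1")" \
    --role "$3"
}
```

Usage: grant_role nytimes-library-test-2 nytimes-library-user roles/datastore.user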

To allow our library application to use this service account, we have to create a new key:

gcloud iam service-accounts keys create key.json --iam-account nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com

# Output
created key [46849c420c8295df9d73764739e137784ae0813e] of type [json] as [key.json] for [nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com]

This creates a new key and saves a JSON key file.
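
The key file is a credential, so it's probably worth locking it down before going further. A small sketch, assuming key.json sits in a directory that is (or will become) a git working tree; protect_key is a name I made up:

```shell
# Restrict the key file to the current user and keep it out of version control.
# (protect_key is a hypothetical helper name.)
protect_key() {
  chmod 600 "$1"
  # Append to .gitignore only if the entry is not already there.
  grep -qxF "$1" .gitignore 2>/dev/null || printf '%s\n' "$1" >> .gitignore
}
```

Usage: protect_key key.json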

Copy the contents of this JSON key file, and enter it as the value of GOOGLE_APPLICATION_JSON in the .env file of the library repository.

NOTE: When pasting the contents of the JSON key file, make sure you remove all newlines so that the contents of the key file fit in a single line. Otherwise, the application cannot parse the JSON key and will throw an 'Unexpected end of JSON input' error.
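
Removing the newlines by hand is error-prone, so here's a sketch that does it with tr. It assumes the key file is the pretty-printed JSON that gcloud writes, and that the .env loader accepts an unquoted value (I haven't verified that for every loader). flatten_key and write_env_key are names I made up.

```shell
# Print the JSON key on a single line by dropping newline characters.
# The private key stores its line breaks as the two-character escape "\n",
# so those survive and the result is still valid JSON.
# (flatten_key and write_env_key are hypothetical helper names.)
flatten_key() {
  tr -d '\n' < "$1"
}

# Append the flattened key to .env in the current directory.
write_env_key() {
  printf 'GOOGLE_APPLICATION_JSON=%s\n' "$(flatten_key "$1")" >> .env
}
```

Usage, from the root of the library repository: write_env_key key.json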

Next, share the team drive or Google Drive folder that you want to have library publish.

Go to your Google Drive account, right click on the resource you want to publish to library, and select "Share".

There, enter the email of the SA we created earlier to give it access to the resource.

Before trying to enable APIs using gcloud, make sure that the active project for your gcloud session is our nytimes-library-test-2 project. Run:

gcloud config list project

# Output

[core]
project = nytimes-library-test-2

Your active configuration is: [default]

If the value of project is not the project we're using for library, go back to Create a new Google Cloud project and follow the steps there to set an active project.
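
If you'd rather not eyeball the output each time, the check and the fix can be combined. A sketch, using gcloud config get-value to read the active project; ensure_project is a name I made up:

```shell
# Switch the active gcloud project to $1 unless it is already active.
# (ensure_project is a hypothetical helper name.)
ensure_project() {
  current="$(gcloud config get-value project 2>/dev/null)"
  if [ "$current" != "$1" ]; then
    gcloud config set project "$1"
  fi
}
```

Usage: ensure_project nytimes-library-test-2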

Next, we want to enable two APIs:

  • the Google Drive API
  • the Cloud Datastore API

The Cloud Datastore API is usually enabled by default. To check what APIs are already enabled for your project, run:

gcloud services list --enabled

# Output
NAME                              TITLE
bigquery.googleapis.com           BigQuery API
bigquerystorage.googleapis.com    BigQuery Storage API
cloudapis.googleapis.com          Google Cloud APIs
clouddebugger.googleapis.com      Stackdriver Debugger API
cloudtrace.googleapis.com         Stackdriver Trace API
datastore.googleapis.com          Cloud Datastore API
logging.googleapis.com            Stackdriver Logging API
monitoring.googleapis.com         Stackdriver Monitoring API
servicemanagement.googleapis.com  Service Management API
serviceusage.googleapis.com       Service Usage API
sql-component.googleapis.com      Cloud SQL
storage-api.googleapis.com        Google Cloud Storage JSON API
storage-component.googleapis.com  Cloud Storage

To see a list of all available APIs for a project, run:

gcloud services list --available

We're interested in drive.googleapis.com.

To enable the Google Drive API for our project, run:

gcloud services enable drive.googleapis.com

# Output

Operation "operations/acf.dafa9f39-5c9f-4ecc-81d4-6f62ec825b69" finished successfully.
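
To avoid re-enabling something that's already on, the two commands above can be combined. This sketch matches on the first column (NAME) of the default gcloud services list --enabled output shown earlier, so it assumes that table layout; enable_service is a name I made up.

```shell
# Enable a service only if it is not already in the enabled list.
# Matches on the first column (NAME) of `gcloud services list --enabled`.
# (enable_service is a hypothetical helper name.)
enable_service() {
  if gcloud services list --enabled | awk '{print $1}' | grep -qxF "$1"; then
    echo "$1 already enabled"
  else
    gcloud services enable "$1"
  fi
}
```

Usage: enable_service drive.googleapis.com; enable_service datastore.googleapis.com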

From this point on, we'll need files from the NYTimes/Library repository to configure our GCP project.

Clone the repository by running:

git clone https://github.com/nytimes/library

# (optional) Switch to the latest release
# In this case, v1.2.0
cd library
git checkout v1.2.0

Library is written to use the Cloud Datastore API. Looking at the indices that we initialise using config/index.yaml, we're probably using it to store user data and view history.

NOTE: It's probably possible to run a local instance of the datastore using the datastore emulator, but I haven't figured it out. What we'd eventually want to do is to figure out how to decouple Library from GCP, which requires us to figure out how to modify the code to use another datastore.

Before running gcloud datastore ... to configure our datastore, we first have to go to the Google Cloud console to initialise it: https://console.cloud.google.com/

In the Google Cloud console, go to Datastore -> Entities. If this is the first time you're setting up the datastore, the console asks you to choose between Firestore Native mode or Firestore Datastore mode (GCP appears to be deprecating the Cloud Datastore). Right now, only Datastore mode works with Library, so pick that.

./z-select-cloud-datastore.jpg

If you're not asked to select a mode to use Firestore in, then check if your project has been set to use Datastore mode. If your project has already been set to use Firestore Native mode, then you cannot use this project with Library. You'll have to create a new project.

Once you've initialised Cloud Firestore in Datastore mode, go to your terminal and navigate to your clone of NYTimes/library. Run:

gcloud datastore indexes create config/index.yaml

We need to set up OAuth so that our users can sign into their organisation's Google accounts to interact with both the Library web interface and the backing Google Drive resources.

NOTE: I haven't figured out how to restrict access to the Library web interface using OAuth. In fact, there appears to be a bug that causes the ui to constantly display user@your_organization.com as the current user, whether you are signed in or not.

But first, we need to set up the OAuth consent screen for our project.

In the Google Cloud console, go to APIs & Services -> OAuth consent screen.

If this is the first time you're setting this up for your project, you'll be asked to select the User Type that you expect for your project.

Since we intend to use our Library app for internal documents only (so far), we can safely select Internal here.

Select Create, and enter the following in the next page:

Application name:

Pick a name

Authorized domains:

This should be the domain your app will be published on. For this guide, we'll assume that our app is deployed on the App Engine default address, which is <project_name>.appspot.com.

For this project, we should then enter here: nytimes-library-test-2.appspot.com.

Application Homepage link:

Same as "Authorized domains", but prefix with https://: https://nytimes-library-test-2.appspot.com

Application Privacy Policy link:

Same as "Authorized domains", but prefix with https://: https://nytimes-library-test-2.appspot.com. We should change this for a production-ready application.

Once we've set up the OAuth consent screen, we can set up OAuth credentials.

  1. Go to APIs & Services -> Credentials and select the Create credentials drop down menu near the top left of the screen.

  2. From the drop down menu, select OAuth client ID.

  3. Then, select Web application from the Application type list.

  4. (Optional) Change the Name of your application.

  5. Set Authorized JavaScript origins to https://nytimes-library-test-2.appspot.com

  6. Set Authorized redirect URIs to http://nytimes-library-test-2.appspot.com/auth/redirect

    NOTE: Setting this to https://... produces a redirect URI error, because for some reason either Library or GCP reads a redirect from http://<project_name>.... Haven't been able to figure this out.

  7. Select Create.

Once done, you'll see an OAuth client window that shows your client ID and client secret.

Store these credentials somewhere safe. We'll need to use them in a bit.
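
Both OAuth URLs derive from the project ID, so they can be printed rather than typed. A sketch that assumes the app lives on the default <project_name>.appspot.com address; oauth_urls is a name I made up.

```shell
# Print the OAuth client settings for an app on the default App Engine domain.
# (oauth_urls is a hypothetical helper name.)
oauth_urls() {
  echo "Authorized JavaScript origin: https://$1.appspot.com"
  # http, not https: see the note on the redirect URI step.
  echo "Authorized redirect URI: http://$1.appspot.com/auth/redirect"
}
```

Usage: oauth_urls nytimes-library-test-2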

Clone the library repository if you haven't already:

git clone https://github.com/nytimes/library.git

Once done, open the cloned directory and open app.yaml. You should see just this (as of 4483dec on 20 Mar 2019):

runtime: nodejs10
instance_class: F4

We need to add an env_variables field to app.yaml. Add the following to app.yaml and replace values where indicated:

runtime: nodejs10
instance_class: F4
env_variables:
  NODE_ENV: 'development'
  GOOGLE_CLIENT_ID: <'OAuth client ID'>
  GOOGLE_CLIENT_SECRET: <'OAuth client secret'>
  GCP_PROJECT_ID: 'nytimes-library-test-2'
  APPROVED_DOMAINS: 'appspot.com, nytimes-library-test-2.appspot.com'
  SESSION_SECRET: 'supersupersupersecret'
  DRIVE_TYPE: <'folder' | 'team'>
  DRIVE_ID: <'id of team drive or folder'>
  # see Configuring Library to deploy on App Engine
  GOOGLE_APPLICATION_JSON: <copy contents of SA JSON key>
  GOOGLE_APPLICATION_CREDENTIALS: 'parse_json'
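
Since a missing variable only shows up after a deploy, a quick check of app.yaml beforehand can save a round trip. This sketch only greps for the key names listed above; it doesn't validate the values or the YAML itself. check_app_yaml is a name I made up.

```shell
# Report any expected env_variables key that is missing from an app.yaml file.
# Returns non-zero if at least one key is missing.
# (check_app_yaml is a hypothetical helper name.)
check_app_yaml() {
  missing=0
  for key in NODE_ENV GOOGLE_CLIENT_ID GOOGLE_CLIENT_SECRET GCP_PROJECT_ID \
             APPROVED_DOMAINS SESSION_SECRET DRIVE_TYPE DRIVE_ID \
             GOOGLE_APPLICATION_JSON GOOGLE_APPLICATION_CREDENTIALS; do
    if ! grep -Eq "^[[:space:]]+$key:" "$1"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_app_yaml app.yaml && gcloud app deploy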

Once done, open your terminal and run in the project root directory:

gcloud app deploy

If you deploy to Google App Engine, running your application off *.appspot.com gives you an SSL cert for free, so be sure to point all your OAuth and App Engine links at HTTPS URLs.

Also, a seemingly persistent OAuth behaviour/bug (now observed separately in GCP and Azure) is that redirects somehow prefer to send a GET request containing a ?redirect=http... query rather than enforcing HTTPS, which causes a URI redirect error. Which is weird, because why, if you're expecting HTTPS anyways, would you use HTTP? Don't know if this is application level i.e. boilerplate code that hasn't been written over properly, or if this is a platform thing.

To view the logs for the active app engine instance:

gcloud app logs tail -s default

When deploying to App Engine, set your environment variables in the app.yaml file instead of in .env.

The app.yaml file also requires the value for the GOOGLE_APPLICATION_JSON field to be formatted correctly. It should look like this:

GOOGLE_APPLICATION_JSON: |
  {
  "type":"service_account",
  "project_id":"nytimes-library-try2",
  "private_key_id":"9460682c3af49b3a43b2798044b21a8ccdcbb27b",
  "private_key":"-----BEGIN PRIVATE KEY-----\n....\n",
  "client_email":"my-library-service-account@nytimes-library-try2.iam.gserviceaccount.com",
  "client_id":"104407737591677470914",
  "auth_uri":"https://accounts.google.com/o/oauth2/auth",
  "token_uri":"https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url":"https://www.googleapis.com/robot/v1/metadata/x509/my-library-service-account%40nytimes-library-try2.iam.gserviceaccount.com"
  }
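
The value is a YAML block scalar (the | after the field name), so every line of the key has to be indented deeper than the field name itself. A sketch that prints the key file in that shape, ready to paste under env_variables; key_to_yaml is a name I made up, and the two-space/four-space indentation is an assumption that matches the app.yaml layout above.

```shell
# Print key file $1 as an indented YAML block scalar for env_variables.
# (key_to_yaml is a hypothetical helper name.)
key_to_yaml() {
  echo "  GOOGLE_APPLICATION_JSON: |"
  # Indent every line of the key so it stays inside the block scalar.
  sed 's/^/    /' "$1"
}
```

Usage: key_to_yaml key.json >> app.yaml (only if env_variables is the last section in the file).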

If edits to a document don't show up in Library, give it a few minutes. While Google Drive signals that the document has been saved, changes are not propagated and pulled through the Google Drive API until a few minutes later.

If the issue persists:

  • Check that the parent folder (and the document itself) is shared with the service account you've set up for the Library instance.
  • Check that you have the correct environment variables set for your deployment platform.
  • Check the logs. If deploying on App Engine, run gcloud app logs tail -s default to see the latest logs. If you find errors that indicate a JSON parse error, then you might have mis-configured one of the set environment variables.
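
To narrow the logs down to JSON-related errors, a plain grep is enough. A sketch; json_errors is a name I made up, and it simply matches any line mentioning JSON (case-insensitive), so expect some false positives.

```shell
# Keep only log lines that mention JSON, such as 'Unexpected end of JSON input'.
# Reads log lines from stdin. (json_errors is a hypothetical helper name.)
json_errors() {
  grep -i 'json'
}
```

Usage: gcloud app logs read -s default --limit 100 | json_errors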