This gist is just me trying to make sense of installing and running https://github.com/nytimes/library and figuring out where I've gone wrong.
So far, I've found that at v1.2.0, the library:
- Doesn't properly accept shared folders as a configuration option (see config below). [edit 3 Jan: this turned out to be partially my fault -- I didn't remove the comments in .env; and it turns out that the deployed app on App Engine doesn't read from .env, or at least I'm not seeing that it does, but instead reads env variables from app.yaml. So using shared folders is fine.]
- Still runs into configuration issues with using shared folders from the environment variables in our .env file. [edit 3 Jan: I can't figure out how to restrict permissions, i.e. require users to log in via OAuth in order to see the docs (or get OAuth to work as expected at all). It now just displays the logged-in user as user@your_organization.com.]
- Docs are not clear enough about requiring Datastore mode when setting up the Cloud Datastore/Firestore API. (To be fair, this is another instance of Google/GCP changing their APIs and breaking their users' stuff.)
Contents
- Preflight
- Set up Google Cloud
- Sign in with the gcloud console
- Create a new Google Cloud project
- Create a service account
- Assign role to service account
- Create a service account key
- Share team drive or folder with service account
- Enable APIs
- Clone the repository
- Set up Google Cloud Datastore
- Set up OAuth consent screen
- Set up OAuth credentials
- Deploy App Engine application
- SSL
- Useful gcloud operations
- Troubleshooting
- Get a Google Cloud account. G Suite account holders can activate a free 365-day trial with ~US$400 credits.
- Install the gcloud command-line tool (CLI), which is part of the Google Cloud SDK.
- (Optional) Install Node 10.x if you intend to try and run Library locally (not recommended).
You can interact with the Google Cloud Platform (GCP) using the web UI at https://console.cloud.google.com, or by using the gcloud CLI.
Because I prefer using the CLI, this guide will walk you through setting up your GCP project using gcloud wherever possible.
For a guide on how to do this using the web UI, see the NYTimes's own guide here: https://nyt-library-demo.herokuapp.com/
To sign in to GCP using gcloud for the first time, run:
gcloud auth login # opens an OAuth page asking you for your GCP credentials
Otherwise, you can find a list of saved credentials by running:
gcloud auth list
To select a set of credentials from this list, run:
gcloud config set account <account_email>
Create a new GCP project, and give it a name. For this guide, we'll call our GCP project nytimes-library-test-2 (2 because project IDs must be unique, and I've messed up many many times while writing this).
gcloud projects create nytimes-library-test-2
# Output
Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/nytimes-library-test-2].
Waiting for [operations/cp.6973727416446203228] to finish...done.
Enabling service [cloudapis.googleapis.com] on project [nytimes-library-test-2]...
Operation "operations/acf.c5bd12b3-f1c0-4826-871f-5bd77c209984" finished successfully.
To list all projects available to the GCP user:
gcloud projects list
# Output
PROJECT_ID              NAME                    PROJECT_NUMBER
...
nytimes-library-test-2  nytimes-library-test-2  517690955308
Set the project as the default project for the user signed into gcloud by running:
gcloud config set project nytimes-library-test-2
# Output
Updated property [core/project].
To check the active project for your GCP user:
gcloud config list project
# Output
[core]
project = nytimes-library-test-2

Your active configuration is: [default]
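If you end up scripting any of the later steps, it can help to fail fast when the wrong project is active. Here is a minimal sketch, assuming that gcloud config get-value project prints the bare project ID. The helper name require_project is my own and not part of gcloud or Library.

```shell
# require_project is a hypothetical helper, not part of gcloud.
# It compares the active gcloud project with the one we expect.
require_project() {
  expected="$1"
  # get-value prints only the property value; stderr is silenced
  # so that informational notices do not end up in the comparison.
  active="$(gcloud config get-value project 2>/dev/null)"
  if [ "$active" != "$expected" ]; then
    echo "Active project is '$active', expected '$expected'" >&2
    return 1
  fi
}

# Usage: stop a script early if the wrong project is selected.
# require_project nytimes-library-test-2 || exit 1
```

The function only reads configuration, so it is safe to call before any command that changes the project.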
Create a service account (SA) to interact with your GCP project.
Create an SA by running:
gcloud iam service-accounts create nytimes-library-user
# Output
Created service account [nytimes-library-user].
List SAs:
gcloud iam service-accounts list
# Output
NAME  EMAIL                                                                DISABLED
      nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com  False
We'll need to assign a role to the SA. To get a list of all possible roles, we can run:
gcloud iam roles list
This gives us a massive list of all possible roles that we can assign to a user/service account in GCP. What we're interested in is the "datastore user" role.
gcloud iam roles list | grep datastore
# Output
name: roles/datastore.importExportAdmin
name: roles/datastore.indexAdmin
name: roles/datastore.owner
name: roles/datastore.user
name: roles/datastore.viewer
To get a granular view of what permissions a role provides, run:
gcloud iam roles describe roles/datastore.user
# Output
description: Provides read/write access to data in a Cloud Datastore database. Intended for application developers and service accounts.
etag: AA==
includedPermissions:
- appengine.applications.get
- datastore.databases.get
- datastore.entities.allocateIds
- datastore.entities.create
- datastore.entities.delete
- datastore.entities.get
- datastore.entities.list
- datastore.entities.update
- datastore.indexes.list
- datastore.namespaces.get
- datastore.namespaces.list
- datastore.statistics.get
- datastore.statistics.list
- resourcemanager.projects.get
- resourcemanager.projects.list
name: roles/datastore.user
stage: GA
title: Cloud Datastore User
To grant a role to a service account:
gcloud projects add-iam-policy-binding nytimes-library-test-2 \
  --member serviceAccount:nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com \
  --role roles/datastore.user
# Output
Updated IAM policy for project [nytimes-library-test-2].
bindings:
- members:
  - serviceAccount:nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com
  role: roles/datastore.user
- members:
  - user:<gsuite_user_name>
  role: roles/owner
etag: <etag>
version: 1
To allow our library application to use this service account, we have to create a new key:
gcloud iam service-accounts keys create key.json --iam-account nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com
# Output
created key [46849c420c8295df9d73764739e137784ae0813e] of type [json] as [key.json] for [nytimes-library-user@nytimes-library-test-2.iam.gserviceaccount.com]
This creates a new key and saves a JSON key file. Copy the contents of this JSON key file, and enter it as the value of GOOGLE_APPLICATION_JSON in the .env file of the library repository.
NOTE: When pasting the contents of the JSON key file, make sure you remove all newlines so that the contents of the key file fit in a single line. Otherwise, the application cannot parse the JSON key and will throw an 'Unexpected end of JSON input' error.
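Removing the newlines by hand is error-prone, so here is one way to do it from the terminal. This is a sketch only: flatten_json is a name I made up, and it assumes the key file is the key.json created above.

```shell
# flatten_json is a hypothetical helper name.
# It deletes real newline (and carriage return) characters from a file
# and prints the result on a single line.
# The "\n" sequences inside private_key are two literal characters
# (backslash followed by n), so tr leaves them untouched.
flatten_json() {
  tr -d '\r\n' < "$1"
}

# Usage (key.json is the file created by the command above):
# flatten_json key.json > key.oneline.json
```

The indentation spaces stay in the output, which is fine because JSON ignores whitespace between tokens.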
Next, share the team drive or Google Drive folder that you want to have library publish.
Go to your Google Drive account, right click on the resource you want to publish to library, and select "Share".
There, enter the email of the SA we created earlier to give it access to the resource.
Before trying to enable APIs using gcloud, make sure that the active project for your gcloud session is our nytimes-library-test-2 project. Run:
gcloud config list project
# Output
[core]
project = nytimes-library-test-2

Your active configuration is: [default]
If the value of project is not the project we're using for library, go back to Create a new Google Cloud project and follow the steps there to set an active project.
Next, we want to enable two APIs:
- the Google Drive API
- the Cloud Datastore API
The Cloud Datastore API is usually enabled by default. To check what APIs are already enabled for your project, run:
gcloud services list --enabled
# Output
NAME                              TITLE
bigquery.googleapis.com           BigQuery API
bigquerystorage.googleapis.com    BigQuery Storage API
cloudapis.googleapis.com          Google Cloud APIs
clouddebugger.googleapis.com      Stackdriver Debugger API
cloudtrace.googleapis.com         Stackdriver Trace API
datastore.googleapis.com          Cloud Datastore API
logging.googleapis.com            Stackdriver Logging API
monitoring.googleapis.com         Stackdriver Monitoring API
servicemanagement.googleapis.com  Service Management API
serviceusage.googleapis.com       Service Usage API
sql-component.googleapis.com      Cloud SQL
storage-api.googleapis.com        Google Cloud Storage JSON API
storage-component.googleapis.com  Cloud Storage
To see a list of all available APIs for a project, run:
gcloud services list --available
We're interested in drive.googleapis.com.
To enable the Google Drive API for our project, run:
gcloud services enable drive.googleapis.com
# Output
Operation "operations/acf.dafa9f39-5c9f-4ecc-81d4-6f62ec825b69" finished successfully.
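To confirm from a script that an API is enabled, rather than eyeballing the list, something like the following might work. It is a sketch under two assumptions: the helper name api_enabled is mine, and the service name is exposed as the config.name field in the output of gcloud services list.

```shell
# api_enabled is a hypothetical helper, not a gcloud command.
# It succeeds when the given service name appears in the list of
# services enabled for the active project.
api_enabled() {
  # value(config.name) is assumed to print one service name per line;
  # grep -xF then looks for an exact, whole-line match.
  gcloud services list --enabled --format="value(config.name)" \
    | grep -qxF "$1"
}

# Usage: enable the Drive API only if it is not enabled yet.
# api_enabled drive.googleapis.com || gcloud services enable drive.googleapis.com
```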
From this point on, we'll need files from the NYTimes/Library repository to configure our GCP project.
Clone the repository by running:
git clone https://github.com/nytimes/library
# (optional) Switch to the latest release
# In this case, v1.2.0
cd library
git checkout v1.2.0
Library is written to use the Cloud Datastore API. Looking at the indices that we initialise using config/index.yaml, we're probably using it to store user data and view history.
NOTE: It's probably possible to run a local instance of the datastore using the datastore emulator, but I haven't figured it out. What we'd eventually want to do is to figure out how to decouple Library from GCP, which requires us to figure out how to modify the code to use another datastore.
Before running gcloud datastore ... to configure our datastore, we first have to go to the Google Cloud console to initialise it: https://console.cloud.google.com/
In the Google Cloud console, go to Datastore -> Entities. If this is the first time you're setting up the datastore, the console asks you to choose between Firestore Native mode or Firestore Datastore mode (GCP appears to be deprecating the Cloud Datastore). Right now, only Datastore mode works with Library, so pick that.
If you're not asked to select a mode to use Firestore in, then check if your project has been set to use Datastore mode. If your project has already been set to use Firestore Native mode, then you cannot use this project with Library. You'll have to create a new project.
Once you've initialised Cloud Firestore in Datastore mode, go to your terminal and navigate to your clone of NYTimes/library.
Run:
gcloud datastore indexes create config/index.yaml
We need to set up OAuth so that our users can sign into their organisation's Google accounts to interact with both the Library web interface and the backing Google Drive resources.
NOTE: I haven't figured out how to restrict access to the Library web interface using OAuth. In fact, there appears to be a bug that causes the UI to constantly display user@your_organization.com as the current user, whether you are signed in or not.
But first, we need to set up the OAuth consent screen for our project.
In the Google Cloud console, go to APIs & Services -> OAuth consent screen.
If this is the first time you're setting this up for your project, you'll be asked to select the User Type that you expect for your project.
Since we intend to use our Library app for internal documents only (so far), we can safely select Internal here.
Select Create, and enter the following on the next page:
Application name: | Pick a name |
---|---|
Authorized domains: | This should be the domain your app will be published on. For this guide, we'll assume that our app is deployed on the App Engine default address, which is <project_id>.appspot.com. For this project, we should then enter nytimes-library-test-2.appspot.com here. |
Application Homepage link: | Same as "Authorized domains", but prefix with https:// |
Application Privacy Policy link: | Same as "Authorized domains", but prefix with https:// |
Once we've set up the OAuth consent screen, we can set up OAuth credentials.
Go to APIs & Services -> Credentials and select the Create credentials drop down menu near the top left of the screen.
From the drop down menu, select OAuth client ID.
Then, select Web application from the Application type list.
(Optional) Change the Name of your application.
Set Authorized JavaScript origins to https://nytimes-library-test-2.appspot.com
Set Authorized redirect URIs to http://nytimes-library-test-2.appspot.com/auth/redirect
NOTE: Setting this to https://... produces a redirect URI error, because for some reason either Library or GCP reads a redirect from http://<project_name>.... I haven't been able to figure this out.
Select Create.
Once done, you'll see an OAuth client window that shows your client ID and client secret.
Store these credentials somewhere safe. We'll need to use them in a bit.
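For reference, both OAuth values follow from the project ID. The snippet below is a small sketch, assuming the app is served from the default <project_id>.appspot.com address used in this guide. The variable names are mine.

```shell
# PROJECT_ID is the project used throughout this guide.
PROJECT_ID="nytimes-library-test-2"

# Assumed default App Engine host for the project.
APP_HOST="${PROJECT_ID}.appspot.com"

# Values as entered in the OAuth client form in this guide.
# Note the http:// scheme on the redirect URI, per the NOTE above.
JS_ORIGIN="https://${APP_HOST}"
REDIRECT_URI="http://${APP_HOST}/auth/redirect"

echo "Authorized JavaScript origins: ${JS_ORIGIN}"
echo "Authorized redirect URIs:      ${REDIRECT_URI}"
```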
Clone the library repository if you haven't already:
git clone https://github.com/nytimes/library.git
Once done, open the cloned directory and open app.yaml. You should see just this (as of 4483dec on 20 Mar 2019):
runtime: nodejs10
instance_class: F4
We need to add an env_variables field to app.yaml. Add the following to app.yaml and replace values where indicated:
runtime: nodejs10
instance_class: F4
env_variables:
  NODE_ENV: 'development'
  GOOGLE_CLIENT_ID: <'OAuth client ID'>
  GOOGLE_CLIENT_SECRET: <'OAuth client secret'>
  GCP_PROJECT_ID: 'nytimes-library-test-2'
  APPROVED_DOMAINS: 'appspot.com, nytimes-library-test-2.appspot.com'
  SESSION_SECRET: 'supersupersupersecret'
  DRIVE_TYPE: <'folder' | 'team'>
  DRIVE_ID: <'id of team drive or folder'>
  # see Configuring Library to deploy on App Engine
  GOOGLE_APPLICATION_JSON: <copy contents of SA JSON key>
  GOOGLE_APPLICATION_CREDENTIALS: 'parse_json'
Once done, open your terminal and run in the project root directory:
gcloud app deploy
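Since a missing variable only shows up after a slow deploy, a quick pre-flight check of app.yaml can save some time. This is a rough sketch: check_app_yaml is a hypothetical helper, and the list of names is simply the set of variables used in this guide, not an authoritative list of what Library requires.

```shell
# check_app_yaml is a hypothetical helper. It greps the given file for
# each variable name used in this guide and reports any that are missing.
check_app_yaml() {
  file="$1"
  missing=0
  for key in NODE_ENV GOOGLE_CLIENT_ID GOOGLE_CLIENT_SECRET GCP_PROJECT_ID \
             APPROVED_DOMAINS SESSION_SECRET DRIVE_TYPE DRIVE_ID \
             GOOGLE_APPLICATION_JSON GOOGLE_APPLICATION_CREDENTIALS; do
    # Match "KEY:" at the start of a line, allowing leading indentation.
    if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
      echo "missing: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: deploy only if every variable is present.
# check_app_yaml app.yaml && gcloud app deploy
```

It only checks that the names are present, not that the values are valid.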
If you deploy to Google App Engine, running your application off *.appspot.com gives you an SSL cert for free, so be sure that all your OAuth and App Engine links point to https URLs.
Also, a seemingly persistent OAuth behaviour/bug (now observed separately in GCP and Azure) is that redirects somehow prefer to send a GET request containing a ?redirect=http... query rather than enforcing HTTPS, which causes a URI redirect error. Which is weird, because why, if you're expecting HTTPS anyways, would you use HTTP? I don't know if this is at the application level, i.e. boilerplate code that hasn't been written over properly, or if this is a platform thing.
To view the logs for the active App Engine instance:
gcloud app logs tail -s default
When deploying to App Engine, set your environment variables in the app.yaml file instead of in .env.
The app.yaml file also requires the value for the GOOGLE_APPLICATION_JSON field to be formatted correctly. It should look like this:
GOOGLE_APPLICATION_JSON: |
  {
    "type":"service_account",
    "project_id":"nytimes-library-try2",
    "private_key_id":"9460682c3af49b3a43b2798044b21a8ccdcbb27b",
    "private_key":"-----BEGIN PRIVATE KEY-----\n....\n",
    "client_email":"my-library-service-account@nytimes-library-try2.iam.gserviceaccount.com",
    "client_id":"104407737591677470914",
    "auth_uri":"https://accounts.google.com/o/oauth2/auth",
    "token_uri":"https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url":"https://www.googleapis.com/robot/v1/metadata/x509/my-library-service-account%40nytimes-library-try2.iam.gserviceaccount.com"
  }
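Rather than indenting the key by hand, the block can be generated from the key file. This is a minimal sketch: yaml_key_block is a name I made up, and it assumes the entries under env_variables are indented by two spaces, so the key body needs four.

```shell
# yaml_key_block is a hypothetical helper. It prints a JSON key file as a
# YAML block scalar, indented to sit under env_variables in app.yaml.
yaml_key_block() {
  echo "  GOOGLE_APPLICATION_JSON: |"
  # Indent every line of the key by four spaces so that YAML treats it
  # as the body of the block scalar.
  sed 's/^/    /' "$1"
}

# Usage: print the block, then paste it under env_variables in app.yaml.
# yaml_key_block key.json
```

Because the block scalar keeps line breaks as they are, the key file does not need to be flattened to a single line first.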
Give it a few minutes. While Google Drive signals that the document has been saved, changes are not propagated and pulled through the Google Drive API until a few minutes later.
If the issue persists, check that:
- The parent folder (and the document itself) is shared with the service account you've set up for the Library instance.
- You have the correct environment variables set for your deployment platform.
- The logs are free of errors. If deploying on App Engine, run gcloud app logs tail -s default to see the latest logs. If you find errors that indicate a JSON parse error, then you might have misconfigured one of the environment variables.
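If the logs point at a JSON parse error, it may be worth checking the key file locally before redeploying. This sketch assumes python3 is installed, since it borrows Python's built-in json.tool module as the parser. The helper name json_ok is mine.

```shell
# json_ok is a hypothetical helper. It succeeds only if the given file
# parses as JSON, using Python's json.tool module as the parser.
json_ok() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Usage:
# json_ok key.json && echo "key.json parses" || echo "key.json is broken"
```

This only tells you that the file on disk is valid JSON. The value pasted into app.yaml or .env can still be mangled, so compare the two if the error persists.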