Tony Carnevale elavenrac

elavenrac / Offline-Dataflow.md
Last active June 29, 2022 07:02
GCP Dataflow processing with no external IPs

GCP Dataflow Pipelines

This gist is a detailed walkthrough of how to deploy Python Dataflow pipelines in GCP so that they run without external IPs. Full code samples are available below.

This walkthrough assumes you have already authenticated with the gcloud login commands and have the appropriate IAM privileges to execute these operations.

Step 1 - Gather application dependencies

Since we are planning to use no external IPs on our Dataflow worker nodes, we must package up all our application dependencies for an offline deployment. I highly recommend using a virtual environment, as your global environment will typically contain many more packages than your single application requires.
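As a rough illustration of the virtual environment recommendation, here is a minimal sketch that uses Python's standard `venv` module. The helper names `create_env` and `in_virtualenv` and the `.venv` directory are illustrative assumptions, not part of the original walkthrough.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return its path."""
    path = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(path, with_pip=with_pip)
    return path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    env = create_env(".venv")  # ".venv" is an illustrative directory name
    print(f"Created {env}; activate it before installing the pipeline's dependencies.")
```

On Linux or macOS the environment is typically activated with `source .venv/bin/activate`. Checking `in_virtualenv()` before dumping dependencies is one way to avoid capturing globally installed packages by accident.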

Dump your application dependencies into a single file.