polleyg / app.yaml
Last active September 29, 2018 06:27
GAE (flex) config
runtime: custom
api_version: '1.0'
env: flexible
threadsafe: true
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 2
  cpu_utilization:
    target_utilization: 0.5
polleyg / TweetPipeline.java
Last active September 30, 2018 12:38
Tweet pipeline Java
/**
 * Dataflow streaming pipeline to read tweets from PubSub topic and write the payload to BigQuery
 */
public class TweetPipeline {
    private static final String TOPIC = "projects/grey-sort-challenge/topics/twitter";
    private static final String BIGQUERY_DESTINATION = "%s:twitter.tweets";

    public static void main(String[] args) {
        PipelineOptionsFactory.register(DataflowPipelineOptions.class);
        DataflowPipelineOptions options = PipelineOptionsFactory
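The rest of the pipeline (see the linked repo for the full source) reads from TOPIC and streams each payload into the table named by BIGQUERY_DESTINATION. The per-element work reduces to formatting the destination string and wrapping the raw Pub/Sub payload into a one-column row. A stdlib-only sketch of that mapping, with hypothetical method names (`destinationFor`, `toRow` are illustrative, not from the gist):

```java
import java.util.Map;

public class TweetRowSketch {
    // Same format string as the gist: "%s:twitter.tweets" -> "<project>:twitter.tweets"
    static String destinationFor(String project) {
        return String.format("%s:twitter.tweets", project);
    }

    // Hypothetical per-element transform: wrap the raw Pub/Sub payload
    // into a single-column row, mirroring a BigQuery TableRow with one field.
    static Map<String, String> toRow(String payload) {
        return Map.of("tweet", payload);
    }

    public static void main(String[] args) {
        System.out.println(destinationFor("grey-sort-challenge")); // grey-sort-challenge:twitter.tweets
        System.out.println(toRow("hello").get("tweet"));           // hello
    }
}
```

In the real pipeline this mapping runs inside a Beam transform between the Pub/Sub read and the BigQuery write; the sketch only isolates the data-shaping step.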
polleyg / cloudbuild.yaml
Created September 30, 2018 12:16
Config for building and deploying this app
steps:
- name: gcr.io/cloud-builders/git
  args: ['clone', 'https://github.com/polleyg/gcp-tweets-streaming-pipeline.git']
- name: gcr.io/cloud-builders/gcloud
  args: ['app', 'deploy', '--version=tweets']
  dir: 'twitter-to-pubsub'
- name: gcr.io/cloud-builders/gradle
  args: ['build', 'run']
polleyg / build_and_deploy.sh
Created September 30, 2018 12:18
Cloud Build command for build and deploy
gcloud builds submit --config=cloudbuild.yaml .
polleyg / build_output.log
Created September 30, 2018 12:22
Log of the build
SVN-18-148:gcp-tweets-streaming-pipeline grahampolley$ gcloud builds submit --config=cloudbuild.yaml .
Creating temporary tarball archive of 15 file(s) totalling 77.5 KiB before compression.
Some files were not included in the source upload. Check the gcloud log [/Users/grahampolley/.config/gcloud/logs/2018.09.30/22.13.22.932440.log] to see which files and the contents of the default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn more).
Uploading tarball of [.] to [gs://grey-sort-challenge_cloudbuild/source/1538309603.86-62473cec2d1f41a69edff2d7304b48e2.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/grey-sort-challenge/builds/81befc56-b3b6-4377-ae77-a2e7a30301b6].
polleyg / config.yaml
Last active November 7, 2018 23:20
YAML config for reading some BigQuery tables
# [required] The GCP project id (not the number). You can find this in the GCP console.
project: grey-sort-challenge
# [required] The type of runner. One of:
# - dataflow (runs on GCP)
# - local (runs on local machine)
runner: dataflow
# The actual tables to copy. Options:
#
polleyg / DataflowCopyBQ_part_1.java
Last active June 29, 2019 10:41
This code works out the location of the buckets and also the storage class
//imports & doc omitted for brevity. See repo for full source file.
//https://github.com/polleyg/gcp-dataflow-copy-bigquery/blob/master/src/main/java/org/polleyg/BQTableCopyPipeline.java
public class BQTableCopyPipeline {
    private static final Logger LOG = LoggerFactory.getLogger(BQTableCopyPipeline.class);
    private static final String DEFAULT_NUM_WORKERS = "1";
    private static final String DEFAULT_MAX_WORKERS = "3";
    private static final String DEFAULT_TYPE_WORKERS = "n1-standard-1";
    private static final String DEFAULT_ZONE = "australia-southeast1-a";
    private static final String DEFAULT_WRITE_DISPOSITION = "truncate";
    private static final String DEFAULT_DETECT_SCHEMA = "true";
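These `DEFAULT_*` constants presumably act as fallbacks when the `config.yaml` shown earlier omits a setting: each pipeline option takes the user-supplied value if present, otherwise the default. A minimal, stdlib-only sketch of that lookup pattern (the key names and `setting` helper are hypothetical, not from the repo):

```java
import java.util.HashMap;
import java.util.Map;

public class CopyDefaultsSketch {
    // Defaults copied from the constants in the gist above.
    static final Map<String, String> DEFAULTS = Map.of(
            "numWorkers", "1",
            "maxNumWorkers", "3",
            "workerMachineType", "n1-standard-1",
            "zone", "australia-southeast1-a",
            "writeDisposition", "truncate",
            "detectSchema", "true");

    // Value from the parsed config if present, else the built-in default.
    static String setting(Map<String, String> config, String key) {
        return config.getOrDefault(key, DEFAULTS.get(key));
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("zone", "europe-west1-b");               // user override
        System.out.println(setting(config, "zone"));        // europe-west1-b
        System.out.println(setting(config, "numWorkers"));  // 1
    }
}
```

Keeping the defaults as strings (rather than ints/booleans) matches the gist's constants and lets one lookup path serve every option before type conversion.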
polleyg / cloudbuild.yaml
Created October 19, 2018 03:18
Cloud Build file for copy BQ tables using Dataflow
steps:
- name: gcr.io/cloud-builders/git
  args: ['clone', 'https://github.com/polleyg/gcp-dataflow-copy-bigquery.git']
- name: gcr.io/cloud-builders/gradle
  args: ['build', 'run']
polleyg / pull_the_trigger.sh
Created October 19, 2018 05:28
Pull the trigger
gcloud builds submit --config=cloudbuild.yaml .
polleyg / SO_54226149.md
Last active January 17, 2019 11:14
SO_54226149.md