@ntakouris
Created July 14, 2020 15:37

DATAFLOW_BEAM_PIPELINE_ARGS = [
    '--project=' + '<your project id>',
    '--runner=DataflowRunner',
    '--temp_location=' + 'gs://<bucket name>/tmp',
    '--staging_location=' + 'gs://<bucket name>/staging',
    '--region=' + 'us-central1',  # same region as our buckets
    # '--disk_size_gb=50',  # no fine-tuning needed
    # If you are blocked by the IP address quota, a bigger machine_type will
    # reduce the number of IPs needed.
    # '--machine_type=n1-standard-8',
    '--max_num_workers=' + '1',  # kept at 1 for demonstration
    # Use the service-based Dataflow Shuffle, which runs the shuffle in the
    # Dataflow backend instead of on the worker VMs (beta at the time of writing).
    '--experiments=shuffle_mode=service',
    '--job_name=' + '<your job name or auto generated>',
]
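
If you build these flags for several projects or jobs, a small helper keeps the placeholders in one place. The sketch below is a hypothetical `make_dataflow_args` function (not part of Beam); it assumes the Beam Python SDK's underscore-style flag names (`--max_num_workers`, `--job_name`, `--machine_type`), and `my-project`, `my-bucket` and `demo-job` are illustrative values only.

```python
from typing import List, Optional


def make_dataflow_args(
    project: str,
    bucket: str,
    region: str = "us-central1",
    job_name: Optional[str] = None,
    max_num_workers: int = 1,
    machine_type: Optional[str] = None,
) -> List[str]:
    """Build Beam/Dataflow command-line flags (hypothetical helper, not a Beam API)."""
    args = [
        f"--project={project}",
        "--runner=DataflowRunner",
        f"--temp_location=gs://{bucket}/tmp",
        f"--staging_location=gs://{bucket}/staging",
        f"--region={region}",  # keep this in the same region as the bucket
        f"--max_num_workers={max_num_workers}",
        "--experiments=shuffle_mode=service",
    ]
    if machine_type is not None:
        # Optional: a bigger machine type can reduce the number of IPs needed.
        args.append(f"--machine_type={machine_type}")
    if job_name is not None:
        # When --job_name is omitted, a job name is generated automatically.
        args.append(f"--job_name={job_name}")
    return args


if __name__ == "__main__":
    for flag in make_dataflow_args("my-project", "my-bucket", job_name="demo-job"):
        print(flag)
```

The resulting list can be handed to Beam the same way as `DATAFLOW_BEAM_PIPELINE_ARGS`, for example as the flags argument of `PipelineOptions` from `apache_beam.options.pipeline_options`.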