Skip to content

Instantly share code, notes, and snippets.

@ibrezm1
Created April 22, 2022 23:31
Show Gist options
  • Save ibrezm1/8ff4007905605d16bcbab571ca8367ba to your computer and use it in GitHub Desktop.
Save ibrezm1/8ff4007905605d16bcbab571ca8367ba to your computer and use it in GitHub Desktop.
Basic beam read file and operations
# https://beam.apache.org/documentation/transforms/python/aggregation/combinevalues/
#
import apache_beam as beam
import csv
if __name__ == '__main__':
with beam.Pipeline('DirectRunner') as pipeline:
(pipeline
| "Read File" >> beam.io.ReadFromText('test.txt')
| 'Split' >> (
beam.FlatMap(
lambda x: x.split(" "))
| 'PairWithOne' >> beam.Map(lambda x: (x, 1))
| 'GroupAndSum' >> beam.CombinePerKey(sum))
| "Get all" >> beam.Map(lambda x: x[1])
| "Sum" >> beam.CombineGlobally(sum)
| "Print" >> beam.Map(print)
#| "Write to file" >> beam.io.WriteToText('test_data.txt')
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment