@JairoDuarte
Created September 29, 2019 18:51
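import sys
from pyspark import SparkContext

# Expect the input path as the first command-line argument.
if len(sys.argv) < 2:
    sys.exit(f"usage: {sys.argv[0]} <input-path>")

sc = SparkContext()
lines = sc.textFile(sys.argv[1])
# split() with no argument splits on any run of whitespace and drops empty tokens.
word_counts = lines.flatMap(lambda line: line.split()) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda count1, count2: count1 + count2) \
    .collect()  # brings every (word, count) pair back to the driver
for (word, count) in word_counts:
    print(word, count)
sc.stop()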