@chathuranga94
Created November 15, 2017 17:40
SPARK AvgById
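# Average of column 2 per integer id in column 0.
# Assumes `sc` is an existing SparkContext (as in the pyspark shell) and the CSV has no header row.
# combineByKey builds a (sum, count) pair per id: start a pair, fold in a value, merge two pairs.
sc.textFile("/home/bibi/Desktop/distributed.csv") \
.map(lambda line: line.split(",")) \
.map(lambda fields: (int(fields[0]), float(fields[2]))) \
.combineByKey(lambda value: (value, 1),
lambda acc, value: (acc[0] + value, acc[1] + 1),
lambda a, b: (a[0] + b[0], a[1] + b[1])) \
.map(lambda kv: (kv[0], kv[1][0] / kv[1][1])) \
.collectAsMap()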
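
To see what the three functions passed to combineByKey do without a Spark cluster, here is a minimal pure-Python sketch. It is an illustration only, not how Spark is implemented: the helper names (create_combiner, merge_value, merge_combiners, combine_partition, average_by_key) and the sample data are hypothetical, and the nested lists stand in for RDD partitions.

```python
# Hypothetical emulation of the (sum, count) pattern used with combineByKey.

def create_combiner(value):
    # First value seen for a key within a partition: start a (sum, count) pair.
    return (value, 1)

def merge_value(acc, value):
    # Another value for a key already seen in the same partition.
    return (acc[0] + value, acc[1] + 1)

def merge_combiners(a, b):
    # Combine the (sum, count) pairs produced by two different partitions.
    return (a[0] + b[0], a[1] + b[1])

def combine_partition(pairs):
    # Fold one partition's (key, value) pairs into {key: (sum, count)}.
    combined = {}
    for key, value in pairs:
        if key in combined:
            combined[key] = merge_value(combined[key], value)
        else:
            combined[key] = create_combiner(value)
    return combined

def average_by_key(partitions):
    # Merge the per-partition results, then divide sum by count for each key.
    merged = {}
    for partition in partitions:
        for key, acc in combine_partition(partition).items():
            merged[key] = merge_combiners(merged[key], acc) if key in merged else acc
    return {key: total / count for key, (total, count) in merged.items()}

# Made-up sample data, split into two "partitions".
partitions = [[(1, 10.0), (2, 4.0), (1, 20.0)], [(1, 30.0), (2, 6.0)]]
result = average_by_key(partitions)
print(result)  # {1: 20.0, 2: 5.0}
```

Carrying a (sum, count) pair rather than a running average is what makes the final merge across partitions correct: sums and counts can be added together, averages cannot.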