import sys

salesTotal = 0
oldKey = None

for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) != 2:
        # Something has gone wrong. Skip this line.
        continue

    thisKey, thisSale = data
    if oldKey and oldKey != thisKey:
        print oldKey, "\t", salesTotal
        oldKey = thisKey
        salesTotal = 0

    oldKey = thisKey
    salesTotal += float(thisSale)

if oldKey != None:
    print oldKey, "\t", salesTotal
import sys

salesTotal = 0.0
oldKey = None
dummy_Data = ["Miami 12.34", "Miami 99.07", "Miami 55.07", "NYC 88.97", "NYC 33.56"]

for line in dummy_Data:
    data = line.strip().split(" ")
    if len(data) != 2:
        # Something has gone wrong. Skip this line.
        continue

    thisKey, thisSale = data
    if oldKey and oldKey != thisKey:
        print oldKey, ":", salesTotal
        oldKey = thisKey
        salesTotal = 0

    oldKey = thisKey
    salesTotal += float(thisSale)

if oldKey != None:
    print oldKey, ":", salesTotal
reducer.py https://gist.github.com/sanoops/9471084
Would it be cleaner to store this info in a dictionary? You wouldn't have to keep track of oldKey vs thisKey, and it would also work if the sort is imperfect. I'm not sure whether it would break anything specific to MapReduce, though.
import sys

salesTotals = {}

for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) != 2:
        # Something has gone wrong. Skip this line.
        continue

    store, sale = data
    salesTotals.setdefault(store, 0)
    salesTotals[store] += float(sale)

for store in salesTotals:
    print "{0}\t{1}".format(store, salesTotals[store])
Line 13:
if oldKey and oldKey != thisKey:
Would be better written with an explicit check against None:
if oldKey is not None and oldKey != thisKey:
As it is, this code malfunctions if given input where the key is the empty string, e.g.:
NY\t100
\t200
SF\t300
Will yield an output of:
SF:600
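A small sketch for checking this claim, written in Python 3 and working on already-parsed (key, value) pairs rather than raw lines. The function name reduce_pairs and its explicit_none_check flag are made up for this illustration. In this sketch the empty key's sales get folded into the next key's total instead of being reported on their own, while the explicit None check reports all three keys. Note that in the original script line.strip() also removes a leading tab, so a raw line such as "\t200" would most likely be dropped by the len(data) != 2 check before it ever reaches this comparison.

```python
def reduce_pairs(pairs, explicit_none_check):
    """Sum values per key over (key, value) pairs that arrive grouped by key."""
    results = []
    old_key = None
    total = 0.0
    for key, value in pairs:
        if explicit_none_check:
            key_changed = old_key is not None and old_key != key
        else:
            # Mirrors "if oldKey and oldKey != thisKey": an empty key is falsy.
            key_changed = bool(old_key) and old_key != key
        if key_changed:
            results.append((old_key, total))
            total = 0.0
        old_key = key
        total += value
    if old_key is not None:
        results.append((old_key, total))
    return results


# Same order as the example input above; the values are illustrative.
pairs = [("NY", 100.0), ("", 200.0), ("SF", 300.0)]

print(reduce_pairs(pairs, explicit_none_check=False))
# [('NY', 100.0), ('SF', 500.0)]
print(reduce_pairs(pairs, explicit_none_check=True))
# [('NY', 100.0), ('', 200.0), ('SF', 300.0)]
```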
Can anyone explain what the line below does? I understand one part, but I don't get the first condition.
"if oldKey and oldKey != thisKey"
I don't get what the first condition, "if oldKey and", does. Thanks in advance.
Senthil
Hi @senthil1988
The expression "if oldKey..." tests whether the variable oldKey holds a truthy value. It is false while oldKey is still None (before the first valid line has been processed), but it is also false for an empty string.
It would be clearer to write "if oldKey is not None..." instead of "if oldKey...".
Regards!
Just tested the code locally. To me, line 15 is not necessary, is it? In this example (and presumably in general, with the keys sorted), when a new city gets processed, the assignment oldKey = thisKey will be done in line 18 anyway. Setting salesTotal = 0 is necessary, though.
Happy coding!
I'm a bit confused here. The reducer script reads from sys.stdin, so how does the mapper pass its output to the reducer? The mapper code only prints each line; it doesn't store the lines anywhere to hand them over. Since the reducer reads from stdin, wouldn't it be reading from the keyboard rather than the lines produced by the mapper?
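One way to look at it: stdin is only the keyboard when nothing else is connected to it. When two programs are joined by a pipe, whatever the first one prints becomes the second one's stdin, and Hadoop streaming wires the mapper output (sorted by key) into the reducer in the same spirit. The Python 3 sketch below imitates that chain inside one process. The names mapper_output and reducer, and the sample records, are hypothetical stand-ins, and io.StringIO plays the role of the pipe.

```python
import io


def mapper_output():
    """Stand-in for the mapper: the tab-separated lines it would print."""
    # Hypothetical records, deliberately unsorted.
    return ["NYC\t88.97", "Miami\t12.34", "NYC\t33.56", "Miami\t99.07"]


def reducer(stream):
    """Same bookkeeping as the reducer above, reading from any file-like object."""
    totals = []
    old_key, total = None, 0.0
    for line in stream:
        data = line.strip().split("\t")
        if len(data) != 2:
            continue
        key, sale = data
        if old_key is not None and old_key != key:
            totals.append((old_key, total))
            total = 0.0
        old_key = key
        total += float(sale)
    if old_key is not None:
        totals.append((old_key, total))
    return totals


# Sorting stands in for the sort step between mapper and reducer.
piped_text = "\n".join(sorted(mapper_output())) + "\n"

# A pipe hands the reducer a stream of text; StringIO imitates that here.
fake_stdin = io.StringIO(piped_text)
result = reducer(fake_stdin)
print(result)
```

On a shell, the equivalent chain would look something like "cat data.txt | ./mapper.py | sort | ./reducer.py", where the file names are placeholders.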
Hi, I'm very new to this and I was wondering why we need these lines of code at lines 21 and 22.
if oldKey != None:
    print oldKey, "\t", salesTotal
This is for printing the last key's total. Inside the loop a total is only printed when the key changes, so when the loop ends the final key's total has not been printed yet.
"oldKey != None" tests whether oldKey was ever given a value. Normally it will have one once the loop has finished.
You might ask why the if condition is needed at all for printing the last line. This is where it gets interesting: if the input is empty, or every line is malformed so that len(data) != 2 is always true, oldKey is still None and the program simply prints nothing.
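To make that concrete, here is a minimal Python 3 sketch of the same loop wrapped in a function. The name reduce_lines and the flush_last switch are invented for this illustration and are not part of the original script. It shows what is lost without the final block, and what happens when no valid line arrives.

```python
def reduce_lines(lines, flush_last=True):
    """Return the reducer's output lines for an iterable of input lines."""
    out = []
    old_key, total = None, 0.0
    for line in lines:
        data = line.strip().split("\t")
        if len(data) != 2:
            # Malformed line: skip it, exactly as the original does.
            continue
        key, sale = data
        if old_key is not None and old_key != key:
            out.append("{0}\t{1}".format(old_key, total))
            total = 0.0
        old_key = key
        total += float(sale)
    # The block in question: emit the total for the last key, if there was one.
    if flush_last and old_key is not None:
        out.append("{0}\t{1}".format(old_key, total))
    return out


lines = ["Miami\t10.0", "Miami\t5.0", "NYC\t7.0"]  # illustrative data

print(reduce_lines(lines))                    # ['Miami\t15.0', 'NYC\t7.0']
print(reduce_lines(lines, flush_last=False))  # ['Miami\t15.0']  (NYC is lost)
print(reduce_lines(["garbage line", ""]))     # []  (nothing valid was read)
```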
I have implemented the code, and this is the output I get:
newyork 28
amazon 22
washdc 1
I wonder why the tab doesn't work and the numbers are not lined up. Thanks.
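One possible contributor, offered as a guess rather than a diagnosis: passing the key, "\t" and the total to print as separate arguments adds separator spaces around the tab. The Python 3 sketch below shows the effect with print(); as far as I can tell the Python 2 print statement likewise writes a space between the key and the tab. Building the line with format, as in the dictionary version above, avoids the stray spaces. The sample values are taken from the output quoted above.

```python
import io

key, total = "newyork", 28

# Comma-separated arguments: print() puts a space between each pair of items.
buf = io.StringIO()
print(key, "\t", total, file=buf)
comma_line = buf.getvalue().rstrip("\n")

# One formatted string: exactly one tab between key and total.
format_line = "{0}\t{1}".format(key, total)

print(repr(comma_line))    # 'newyork \t 28'
print(repr(format_line))   # 'newyork\t28'

# Splitting on the tab shows the stray spaces that end up in the fields.
print(comma_line.split("\t"))    # ['newyork ', ' 28']
print(format_line.split("\t"))   # ['newyork', '28']
```

Even with a clean tab, columns only line up when the keys fall within the same tab stop, so keys of very different lengths can still look ragged in a terminal.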
Can I use groupby function in Pandas? That was my first thought
I think this is probably how the groupby function in pandas works.
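This is not pandas, but the standard library's itertools.groupby is a close match for what the reducer does by hand: it groups consecutive items that share a key, so like the oldKey/thisKey bookkeeping it relies on the input already being sorted by key. A minimal Python 3 sketch, with a hypothetical parse helper and made-up sample lines:

```python
from itertools import groupby

# Hypothetical input, already sorted by key as the reducer expects.
lines = ["Miami\t12.5", "Miami\t7.5", "NYC\t30.0"]


def parse(lines):
    """Yield (store, sale) pairs, skipping malformed lines."""
    for line in lines:
        data = line.strip().split("\t")
        if len(data) == 2:
            yield data[0], float(data[1])


# groupby only merges neighbouring items with equal keys, hence the sorted input.
totals = [
    (store, sum(sale for _, sale in group))
    for store, group in groupby(parse(lines), key=lambda pair: pair[0])
]

print(totals)  # [('Miami', 20.0), ('NYC', 30.0)]
```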
The ";" isn't needed on line 13
"sys" needs to be imported at the beginning