In the distributed world, while writing MapReduce code, there are many situations where the input data seems non-partitionable: although the data is picked up by multiple mappers, every record gets mapped to the same key. Once all the mappers have run, if we end up with one key holding a huge list of values, we burden the reducers. By burdening the reducers, I mean burdening every step after the mappers, all the way until the data enters the reducer nodes.
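As a quick illustration of the problem, here is a minimal Hadoop mapper sketch (class and key names are hypothetical) where every input record is emitted under one and the same key, so the shuffle routes the entire data set to a single reducer:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper illustrating key skew: every record is emitted
// under the same key, so one reducer receives the whole value list.
public class SingleKeyMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    private static final Text SAME_KEY = new Text("all");

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        long number = Long.parseLong(line.toString().trim());
        // Every mapper, on every node, emits the identical key,
        // so all values pile up behind a single reducer.
        context.write(SAME_KEY, new LongWritable(number));
    }
}
```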
Let's work through an example to explore this situation further and see how it can be resolved.
Challenge: Finding the average of natural numbers.
Bird's-eye view: