17 Sep 2014 - This is a post on my blog.
MapReduce is a powerful algorithm for processing large sets of data in a distributed, parallel manner. It has proven very popular for many data processing tasks, particularly using the open source Hadoop implementation.
The most basic idea powering MapReduce is to break large data sets into smaller chunks, which are then processed separately (in parallel). The results of the chunk processing are then collected.