The number of mappers per job is a function of the number of blocks across all the files used as input for the MapReduce job. It may be necessary to set the number of mappers per job explicitly when, for instance, the inputs are just references to files: the input file containing the references occupies just one HDFS block, so by default only one mapper would be started.
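The relationship between input blocks and mappers can be illustrated with a rough sketch. The function name `estimate_mappers` and the 64 MB block size are assumptions made for illustration only; the actual count depends on the cluster's configured block size, the input format, and the split-size settings, and edge cases such as empty files are ignored here.

```python
def estimate_mappers(file_sizes_bytes, block_size_bytes=64 * 1024 * 1024):
    """Approximate the mapper count as one mapper per HDFS block per input file.

    Hypothetical helper: the 64 MB default is an assumed block size,
    not a value read from any cluster configuration.
    """
    # Integer ceiling division: a partially filled block still needs a mapper.
    return sum(-(-size // block_size_bytes) for size in file_sizes_bytes)


MB = 1024 * 1024

# Two data files of 200 MB and 10 MB: 4 blocks plus 1 block.
print(estimate_mappers([200 * MB, 10 * MB]))

# A 4 KB file that only lists references to other files fits in one block.
print(estimate_mappers([4 * 1024]))
```

Under these assumptions the second case yields a single mapper however much data the referenced files contain, which is the situation where setting the number of mappers explicitly becomes useful.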
- Add the following property to the mapred-site.xml configuration file on all the TaskTracker nodes of the Hadoop cluster: