Setting up the number of mappers per job

Rationale

The number of mappers per job is a function of the number of blocks across all the files used as input for the MapReduce job. It may be necessary to set the number of mappers per job explicitly when, for instance, the inputs are just references to files (and the input file containing the references occupies just one HDFS block, so it would otherwise be processed by a single mapper).

Procedure

Add the following property to the mapred-site.xml configuration file on all the TaskTracker nodes of the Hadoop cluster:
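
A minimal sketch of such an entry, assuming a Hadoop 1.x cluster and assuming the intended property is mapred.map.tasks (the per-job hint for the number of map tasks). The value 10 is purely illustrative and should be adapted to the cluster and the job:

```xml
<!-- mapred-site.xml (fragment, to be placed inside the <configuration> element) -->
<!-- Assumption: mapred.map.tasks is the property meant here; 10 is an example value -->
<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
</property>
```

Note that mapred.map.tasks is only a hint: the actual number of mappers still depends on how the job's InputFormat computes its input splits.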

Cesare Rossi crossi202
