- HDP-2.2 installed by Ambari
- Install HDFS Client
- Patience
SSH into the machine, then run this on the command line:
$ sudo su hdfs
This lets you act as the hdfs user, which is created by the HDP installation. Because hdfs is the HDFS superuser, you can run the terasort benchmarking workflow without being interrupted by permission errors.
Look for hadoop-mapreduce-examples.jar on your local machine. It is usually found under the /usr/hdp folder. You can run the following command to find it:
find /usr/hdp -name 'hadoop-*examples*.jar'
If found, go to its directory and run:
hadoop jar hadoop-mapreduce-examples.jar
The output lists the available programs, including teragen, terasort, and teravalidate.
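The jar lookup described above can also be scripted. The following is a minimal sketch, not part of the original instructions: find_jar is a hypothetical helper name, and it assumes the jar lives somewhere under /usr/hdp as described.

```shell
#!/usr/bin/env bash
# Sketch: locate a jar under a directory by name pattern.
# find_jar is a hypothetical helper, not something HDP provides.

# Print the first file under directory $1 whose name matches pattern $2.
# Prints nothing if the directory is missing or no file matches.
find_jar() {
  find "$1" -name "$2" 2>/dev/null | head -n 1
}

EXAMPLES_JAR="$(find_jar /usr/hdp 'hadoop-*examples*.jar')"

if [ -n "$EXAMPLES_JAR" ]; then
  echo "Found: $EXAMPLES_JAR"
else
  echo "No examples jar found under /usr/hdp" >&2
fi
```

If a jar is found, passing its full path to hadoop jar avoids having to change into its directory first.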
Look for hadoop-mapreduce-client-jobclient-tests.jar on your local machine. It is usually found under the /usr/hdp folder. You can run the following command to find it:
find /usr/hdp -name 'hadoop-mapreduce-client-jobclient*tests.jar'
If found, go to its directory and run:
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar
The output lists the available test programs, including TestDFSIO.
- teragen creates sample data and places it in an output directory for terasort. terasort sorts the data in that directory and writes the reduce output to another output directory. teravalidate checks that terasort mapped and reduced correctly.
- TestDFSIO is a test for the IO throughput of the cluster. -write creates sample files, -read reads them, and -clean deletes the test outputs.
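To make the three TestDFSIO phases concrete, here is a minimal sketch that prints one command per phase instead of running it. The file count and file size are illustrative assumptions, and dfsio_command is a hypothetical helper. The -nrFiles and -size options are taken from the Hadoop 2.x TestDFSIO usage message, so check the usage printed by your own version before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: print a TestDFSIO write/read/clean sequence.
# NR_FILES and FILE_SIZE are assumed example values, not recommendations.

TESTS_JAR="${TESTS_JAR:-hadoop-mapreduce-client-jobclient-tests.jar}"
NR_FILES="${NR_FILES:-10}"        # hypothetical number of test files
FILE_SIZE="${FILE_SIZE:-128MB}"   # hypothetical size of each file

# Print the command for one phase: write, read, or clean.
dfsio_command() {
  case "$1" in
    write|read)
      echo "hadoop jar $TESTS_JAR TestDFSIO -$1 -nrFiles $NR_FILES -size $FILE_SIZE"
      ;;
    clean)
      echo "hadoop jar $TESTS_JAR TestDFSIO -clean"
      ;;
    *)
      echo "unknown phase: $1" >&2
      return 1
      ;;
  esac
}

# Write first so that the read phase has files to read, then clean up.
for phase in write read clean; do
  dfsio_command "$phase"
done
```

Printing the commands first makes it easy to review them before pasting them into a shell as the hdfs user.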
Create a /benchmarks directory in HDFS:
hadoop fs -mkdir /benchmarks
This is where the TestDFSIO, teragen, terasort, and teravalidate commands place their output and read their input.
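As an illustration of how the three terasort commands chain together under /benchmarks, here is a minimal sketch that prints them rather than running them. The /benchmarks/terasort subdirectory and the 1 GB data size are assumptions, and rows_for_bytes and terasort_commands are hypothetical helper names. teragen takes a row count as its first argument, and each generated row is 100 bytes.

```shell
#!/usr/bin/env bash
# Sketch: print a teragen -> terasort -> teravalidate sequence.
# BASE_DIR and the data size are assumed example values.

EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-mapreduce-examples.jar}"
BASE_DIR="${BASE_DIR:-/benchmarks/terasort}"   # hypothetical subdirectory

# teragen expects a number of rows; each row is 100 bytes.
rows_for_bytes() {
  echo $(( $1 / 100 ))
}

# Print the three commands for a data set of $1 bytes.
terasort_commands() {
  local rows
  rows="$(rows_for_bytes "$1")"
  # Generate the input data.
  echo "hadoop jar $EXAMPLES_JAR teragen $rows $BASE_DIR/input"
  # Sort the generated data.
  echo "hadoop jar $EXAMPLES_JAR terasort $BASE_DIR/input $BASE_DIR/output"
  # Check that the sorted output is in order and write a report.
  echo "hadoop jar $EXAMPLES_JAR teravalidate $BASE_DIR/output $BASE_DIR/validate"
}

terasort_commands 1000000000   # about 1 GB of input
```

The input directory given to terasort must be the same one teragen wrote to, and the output directories must not already exist when each job starts.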
Hi folks,
When I run the terasort command, it gives me the error below.
Sampling 0 splits of 0
18/01/29 10:57:07 ERROR terasort.TeraSort: / by zero
Any help on this would be appreciated.