- HDP-2.2 installed by Ambari
- Install HDFS Client
- Patience
SSH into the machine and run the following on the command line:
$ sudo su hdfs
This switches you to the hdfs user, which is created by the HDP installation. Because hdfs is the HDFS superuser, you can run the terasort benchmarking workflow without much interruption.
Look for hadoop-mapreduce-examples.jar on the local machine. It's usually found under the /usr/hdp folder. You can run the following command to find it:
find /usr/hdp -name "hadoop-*examples*.jar"
If found, go to its directory and run:
hadoop jar hadoop-mapreduce-examples.jar
The output lists the available example programs, including teragen, terasort, and teravalidate.
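The lookup above can also be scripted so that the jar path is kept in a variable instead of changing directories. This is only a sketch: find_jar is a hypothetical helper name, and the /usr/hdp root is the layout assumed by this article.

```shell
#!/bin/sh
# find_jar is a hypothetical helper, not part of Hadoop or HDP.
# It prints the first file under ROOT whose name matches PATTERN.
find_jar() {
  root="$1"
  pattern="$2"
  # Print nothing if the root directory does not exist.
  [ -d "$root" ] || return 0
  # The pattern is quoted so that find, not the shell, expands the wildcard.
  find "$root" -name "$pattern" 2>/dev/null | head -n 1
}

# Assumed layout: HDP is installed under /usr/hdp.
EXAMPLES_JAR="$(find_jar /usr/hdp 'hadoop-mapreduce-examples*.jar')"
if [ -n "$EXAMPLES_JAR" ]; then
  echo "Found: $EXAMPLES_JAR"
  echo "Run:   hadoop jar $EXAMPLES_JAR"
else
  echo "No examples jar found under /usr/hdp" >&2
fi
```

If several versions are installed, the helper returns whichever match find reports first, so check the printed path before using it.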
Look for hadoop-mapreduce-client-jobclient-tests.jar on the local machine. It's usually found under the /usr/hdp folder. You can run the following command to find it:
find /usr/hdp -name "hadoop-mapreduce-client-jobclient*tests.jar"
If found, go to its directory and run:
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar
The output lists the available test programs, including TestDFSIO.
- teragen creates sample data and places it in an output directory for terasort. terasort runs through the directory and creates the reduce output in an output directory. teravalidate ensures that terasort reduced and mapped correctly.
- TestDFSIO is a test for IO throughput of the cluster. -write creates sample files, -read reads them, and -clean deletes the test outputs.
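As a rough illustration of the three TestDFSIO modes, the sketch below prints one command per mode rather than running anything. The jar name, the file count, and the file size are assumptions chosen for the example. The -nrFiles and -fileSize options are the ones commonly shown for Hadoop 2.x, so confirm them against the usage message printed by your version.

```shell
#!/bin/sh
# Sketch of a TestDFSIO write/read/clean cycle. It only prints the commands.
# TESTS_JAR, NR_FILES and FILE_SIZE_MB are illustrative assumptions.
TESTS_JAR="${TESTS_JAR:-hadoop-mapreduce-client-jobclient-tests.jar}"
NR_FILES="${NR_FILES:-10}"
FILE_SIZE_MB="${FILE_SIZE_MB:-1000}"

# dfsio_command is a hypothetical helper that builds the command line for one mode.
dfsio_command() {
  mode="$1"   # one of: write, read, clean
  case "$mode" in
    write|read)
      echo "hadoop jar $TESTS_JAR TestDFSIO -$mode -nrFiles $NR_FILES -fileSize $FILE_SIZE_MB" ;;
    clean)
      echo "hadoop jar $TESTS_JAR TestDFSIO -clean" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}

# Write first so that the read pass has files to read, then clean up.
for mode in write read clean; do
  dfsio_command "$mode"
done
```

Use the same -nrFiles and -fileSize values for the write and read passes so that the read pass covers exactly the files the write pass created.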
Create a /benchmarks directory in HDFS:
hadoop fs -mkdir /benchmarks
This is where the TestDFSIO, teragen, terasort and teravalidate commands write their output and read their input.
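With the directory in place, the teragen, terasort and teravalidate sequence might look like the sketch below. The sub-directory names under /benchmarks, the row count, and the run and terasort_suite helpers are assumptions made for illustration. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the teragen -> terasort -> teravalidate sequence.
# EXAMPLES_JAR, ROWS and the sub-directory names are illustrative assumptions.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-mapreduce-examples.jar}"
ROWS="${ROWS:-10000000}"      # teragen writes 100-byte rows, so this is about 1 GB
BASE="${BASE:-/benchmarks}"
DRY_RUN="${DRY_RUN:-1}"       # 1 prints the commands, 0 executes them

# run is a hypothetical helper: it echoes the command in dry-run mode, else runs it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Each step runs only if the previous one succeeded.
terasort_suite() {
  run hadoop jar "$EXAMPLES_JAR" teragen "$ROWS" "$BASE/terasort-input" &&
  run hadoop jar "$EXAMPLES_JAR" terasort "$BASE/terasort-input" "$BASE/terasort-output" &&
  run hadoop jar "$EXAMPLES_JAR" teravalidate "$BASE/terasort-output" "$BASE/terasort-validate"
}

terasort_suite
```

Set DRY_RUN=0 to execute the commands once the printed paths look right. Each job expects its output directory not to exist yet, so remove the directories left by an earlier run before repeating the sequence.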
Hi folks,
When I run the terasort command, it gives me the error below.
Sampling 0 splits of 0
18/01/29 10:57:07 ERROR terasort.TeraSort: / by zero
Any help on this would be appreciated.