Apache Tez: A Framework for YARN-Based Data Processing Applications in Hadoop
Apache™ Tez is an extensible framework for high-performance batch and interactive data processing applications in Hadoop, targeting terabyte- to petabyte-scale datasets. It allows projects in the Hadoop ecosystem (including Apache Hive, Apache Pig, and various third-party vendor software) to express fit-to-purpose data processing applications in a way that meets their unique demands for fast response times and extreme throughput at petabyte scale.
To deploy, run the following from the bundle home directory:

juju quickstart bundles.yaml
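The bundles.yaml file describes the services and relations that juju quickstart deploys. The charm URLs, unit counts, and relation below are a hypothetical sketch (not taken from this charm's actual bundle); only the service names hdp-tez and compute-node appear in the commands in this README:

```yaml
# Hypothetical bundles.yaml fragment: charm URLs and unit counts
# are assumptions for illustration only.
hdp-tez-cluster:
  services:
    hdp-tez:
      charm: cs:trusty/hdp-tez
      num_units: 1
    compute-node:
      charm: cs:trusty/hdp-hadoop
      num_units: 2
  relations:
    - [hdp-tez, compute-node]
```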
To increase the number of slave nodes, add units. To add one unit:
juju add-unit compute-node
Or you can add multiple units at once:
juju add-unit -n4 compute-node
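After scaling, you can confirm that the new units have come up by inspecting the service with juju status (the service name compute-node is from the commands above):

```shell
# List the state of the compute-node service and all of its units;
# newly added units should progress to the "started" state.
juju status compute-node
```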
Verify the Tez libraries installed on HDFS:

juju run "sudo su hdfs -c 'hdfs dfs -ls /apps/tez'" --unit hdp-tez/0
hdfs users ... /apps/tez/conf
hdfs users ... /apps/tez/lib
hdfs users ... /apps/tez/tez-api-0.4.0.2.1.3.0-563.jar
hdfs users ... /apps/tez/tez-common-0.4.0.2.1.3.0-563.jar
hdfs users ... /apps/tez/tez-dag-0.4.0.2.1.3.0-563.jar
hdfs users ... /apps/tez/tez-mapreduce-0.4.0.2.1.3.0-563.jar
hdfs users ... /apps/tez/tez-mapreduce-examples-0.4.0.2.1.3.0-563.jar
hdfs users ... /apps/tez/tez-runtime-internals-0.4.0.2.1.3.0-563.jar
hdfs users ... /apps/tez/tez-runtime-library-0.4.0.2.1.3.0-563.jar
hdfs users ... /apps/tez/tez-tests-0.4.0.2.1.3.0-563.jar
-
Check remote HDFS cluster health:
juju run "su hdfs -c 'hdfs dfsadmin -report'" --unit hdp-tez/0
** Validate the returned report **
-
Validate that a directory can be created on the HDFS cluster:
juju run "su hdfs -c 'hdfs dfs -mkdir /tmp'" --unit hdp-tez/0
-
Copy a test data file to the HDFS cluster:
juju run "su hdfs -c 'hdfs dfs -put /home/ubuntu/pg4300.txt /tmp '" --unit hdp-tez/0
-
Run the Tez word-count example:
juju run "/home/ubuntu/runtez_wc.sh" --unit hdp-tez/0
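The contents of runtez_wc.sh are not shown in this README. A minimal sketch of what it might contain, assuming the ordered word count example shipped in the tez-mapreduce-examples jar listed above and the /tmp paths used in this walkthrough (the jar location on the local filesystem is also an assumption):

```shell
#!/bin/bash
# Hypothetical sketch of runtez_wc.sh: runs the Tez ordered word count
# example as the hdfs user against the test file uploaded earlier.
# The jar path and example name are assumptions based on the jars
# listed under /apps/tez, not the actual script contents.
su hdfs -c "hadoop jar /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.3.0-563.jar \
    orderedwordcount /tmp/pg4300.txt /tmp/pg4300.out"
```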
-
View the results saved on the HDFS cluster:
juju run "su hdfs -c 'hdfs dfs -cat /tmp/pg4300.out/* '" --unit hdp-tez/0
- Amir Sanjar <[email protected]>