suggested edits for tez bundle README

hdp-hadoop-tez

Apache Tez: A Framework for YARN-based Data Processing Applications in Hadoop.

Apache™ Tez is an extensible framework for high-performance batch and interactive data processing applications in Hadoop, operating on terabyte- to petabyte-scale datasets. It allows projects in the Hadoop ecosystem (including Apache Hive, Apache Pig, and various third-party vendor software) to express fit-to-purpose data processing applications in a way that meets their unique demands for fast response times and extreme throughput at petabyte scale.

Deploy

From the bundle's home directory, run:

    juju quickstart bundles.yaml
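
Deployment takes several minutes. To follow progress, you can poll the environment with standard juju tooling (not specific to this bundle):

    # Show all deployed services, units, and their agent states
    juju status

    # Or poll every 10 seconds until the units report "started"
    watch -n 10 juju status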

Scale Out

To increase the number of slave nodes, add more compute-node units. To add a single unit:

    juju add-unit compute-node

Or add multiple units at once:

    juju add-unit -n4 compute-node
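
To confirm the new slave units came up, you can filter the environment status by service name (standard juju usage; compute-node is the service name used in the commands above):

    # Show only the compute-node service and its units
    juju status compute-node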

Verifying the Deployment: Tez (on YARN) and the Remote HDFS Cluster

Validate Tez Application Presence

 juju run "sudo su hdfs -c 'hdfs dfs -ls /apps/tez'" --unit hdp-tez/0

A successful result looks similar to:

 hdfs users   ...  /apps/tez/conf
 hdfs users   ...  /apps/tez/lib
 hdfs users   ...  /apps/tez/tez-api-0.4.0.2.1.3.0-563.jar
 hdfs users   ...  /apps/tez/tez-common-0.4.0.2.1.3.0-563.jar
 hdfs users   ...  /apps/tez/tez-dag-0.4.0.2.1.3.0-563.jar
 hdfs users   ...  /apps/tez/tez-mapreduce-0.4.0.2.1.3.0-563.jar
 hdfs users   ...  /apps/tez/tez-mapreduce-examples-0.4.0.2.1.3.0-563.jar
 hdfs users   ...  /apps/tez/tez-runtime-internals-0.4.0.2.1.3.0-563.jar
 hdfs users   ...  /apps/tez/tez-runtime-library-0.4.0.2.1.3.0-563.jar
 hdfs users   ...  /apps/tez/tez-tests-0.4.0.2.1.3.0-563.jar
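
To inspect the Tez configuration published alongside the jars, you can read files out of the /apps/tez/conf directory listed above. The filename tez-site.xml below is an assumption about what that directory contains, not something confirmed by this bundle:

    # Dump the Tez site configuration stored on HDFS (filename is an assumption)
    juju run "sudo su hdfs -c 'hdfs dfs -cat /apps/tez/conf/tez-site.xml'" --unit hdp-tez/0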

HDFS Validation from the Tez Client

  1. Check remote HDFS cluster health:

    juju run "su hdfs -c 'hdfs dfsadmin -report'" --unit hdp-tez/0

    Validate the returned information.

  2. Validate that a directory can be created on the HDFS cluster:

    juju run "su hdfs -c 'hdfs dfs -mkdir /tmp'" --unit hdp-tez/0

  3. Copy a test data file to the HDFS cluster:

    juju run "su hdfs -c 'hdfs dfs -put /home/ubuntu/pg4300.txt /tmp'" --unit hdp-tez/0

  4. Run the Tez word-count example (see the sketch after this list for a guess at what the script wraps):

    juju run "/home/ubuntu/runtez_wc.sh" --unit hdp-tez/0

  5. View the results saved on the HDFS cluster:

    juju run "su hdfs -c 'hdfs dfs -cat /tmp/pg4300.out/*'" --unit hdp-tez/0
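
The contents of runtez_wc.sh are supplied by the charm and are not shown here; the sketch below is only a guess at the kind of command it wraps, assuming a local copy of the Tez MapReduce examples jar and the orderedwordcount example (both assumptions, not taken from this bundle):

    #!/bin/bash
    # Hypothetical word-count run against the file staged in step 3.
    # The jar path is an assumption; output lands in /tmp/pg4300.out,
    # which step 5 reads back.
    su hdfs -c "hadoop jar /usr/lib/tez/tez-mapreduce-examples-*.jar \
        orderedwordcount /tmp/pg4300.txt /tmp/pg4300.out"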

Contact Information

Upstream Tez Project Info
