thanooj kalathuru (thanoojgithub)
@thanoojgithub
thanoojgithub / Hive_SH.hql
Created January 27, 2016 16:00
How to run Hive queries from a shell script (.sh file)
## select.hql
create schema ${hiveconf:DB_NAME};
show schemas;
use ${hiveconf:DB_NAME};
CREATE TABLE ${hiveconf:DB_NAME}.${hiveconf:TABLE_NAME} (ID INT, name STRING, dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
show tables;
LOAD DATA LOCAL INPATH '/home/ubuntu/input/raw.txt' OVERWRITE INTO TABLE ${hiveconf:DB_NAME}.${hiveconf:TABLE_NAME};
select * from ${hiveconf:DB_NAME}.${hiveconf:TABLE_NAME};
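The description mentions a .sh wrapper, but only the .hql file is shown. A minimal sketch of such a wrapper, assuming the script above is saved as select.hql (the DB_NAME and TABLE_NAME values here are made-up examples):

```shell
#!/bin/bash
# run_select.sh -- sketch only; DB_NAME and TABLE_NAME values are hypothetical.
DB_NAME=mydb
TABLE_NAME=users

# Build the hive invocation that supplies the ${hiveconf:...} variables
# referenced inside select.hql.
CMD="hive -hiveconf DB_NAME=$DB_NAME -hiveconf TABLE_NAME=$TABLE_NAME -f select.hql"
echo "$CMD"
# On a machine with Hive installed, execute it with: eval "$CMD"
```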
@thanoojgithub
thanoojgithub / GroupBy_in_Hive.hql
Last active May 29, 2018 08:02
GroupBy in Hive
hive> select * from raw_bucketed_users;
OK
raw_bucketed_users.id raw_bucketed_users.name raw_bucketed_users.dt
1 sriram 12-10-2015
2 seeta 12-11-2015
3 sriram 12-10-2015
4 seeta 12-11-2015
5 sriram 12-10-2015
6 seeta 12-11-2015
7 sriram 12-10-2015
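The listing above shows only the raw rows; a GROUP BY over that table might look like the following sketch (column names taken from the header printed above):

```sql
-- Count rows per name in raw_bucketed_users (illustrative).
SELECT name, count(*) AS cnt
FROM raw_bucketed_users
GROUP BY name;
-- For the seven rows shown above: sriram 4, seeta 3
```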
@thanoojgithub
thanoojgithub / Joins_in_Hive.hql
Last active May 29, 2018 08:02
Joins in Hive
hive> select * from raw1 r1 join raw2 r2 on r1.id=r2.id where r1.name like "s%" and r2.dt like "%2015";
Query ID = ubuntu_20160128153623_8773b4d1-5b8a-4f41-b9bf-8dc7ed17fc77
Total jobs = 1
2016-01-28 15:36:25,997 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/ubuntu/ubuntu_20160128153623_8773b4d1-5b8a-4f41-b9bf-8dc7ed17fc77.log
2016-01-28 15:36:26 Starting to launch local task to process map join; maximum memory = 518979584
2016-01-28 15:36:27 Dump the side-table for tag: 0 with group count: 2 into file: file:/tmp/hive/a6180571-652c-4ef6-8150-63c6872bb912/hive_2016-01-28_15-36-23_009_1042588085665505632-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile150--.hashtable
2016-01-28 15:36:27 Uploaded 1 File to: file:/tmp/hive/a6180571-652c-4ef6-8150-63c6872bb912/hive_2016-01-28_15-36-23_009_1042588085665505632-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile150--.hashtable (336 bytes)
2016-01-28 15:36:2
@thanoojgithub
thanoojgithub / ORDER BY and SORT BY
Last active May 29, 2018 08:02
ORDER BY and SORT BY in Hive
ORDER BY and SORT BY
ORDER BY performs a total ordering of the query result set. This means all the data is passed through a single reducer, which may take an unacceptably long time for larger data sets.
Hive offers an alternative, SORT BY, which orders the data only within each reducer, performing a local ordering: each reducer's output is sorted, but the overall result is not. Total ordering is traded away for better performance.
In both cases the syntax differs only in the ORDER or SORT keyword. You can list any columns you wish and mark each as ascending with the ASC keyword (the default) or descending with the DESC keyword.
Here is an example using ORDER BY:
hive> SELECT s.ymd, s.symbol, s.price_close
    > FROM stocks s
    > ORDER BY s.ymd ASC, s.symbol DESC;
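For contrast, the same query written with SORT BY, which orders rows only within each reducer; this is a sketch, not output from the session above:

```sql
hive> SELECT s.ymd, s.symbol, s.price_close
    > FROM stocks s
    > SORT BY s.ymd ASC, s.symbol DESC;
```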
@thanoojgithub
thanoojgithub / Casting_in_Hive
Created January 28, 2016 13:25
Casting in Hive
hive> set hive.cli.print.header=true;
hive> select * from raw;
OK
raw.id raw.name raw.dt
1 sriram 2015-10-12
2 seetamma 2015-09-12
3 lakshmana 2015-11-12
4 bharatha 2015-12-12
5 Shatrughna 2015-12-12
6 hanuma 2015-11-12
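The listing stops before any actual cast. A sketch of Hive's cast(expr AS type) syntax against the columns shown above (the target types are illustrative):

```sql
-- dt is stored as a STRING in yyyy-MM-dd form; cast it to DATE,
-- and widen id from INT to BIGINT (both casts are examples).
SELECT cast(id AS bigint) AS id_big,
       name,
       cast(dt AS date)  AS dt_date
FROM raw;
```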
@thanoojgithub
thanoojgithub / Views in Hive
Last active May 29, 2018 08:02
Views in Hive
hive> CREATE VIEW raw_view AS SELECT r1.id, r2.name,r1.dt FROM raw1 r1 JOIN raw2 r2 ON (r1.id=r2.id) WHERE r1.name like "s%";
OK
id name dt
Time taken: 0.251 seconds
hive> select * from raw_view;
Query ID = ubuntu_20160128185605_ee25eca7-b580-42a5-89af-18d412352f08
Total jobs = 1
2016-01-28 18:56:09,141 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/ubuntu/ubuntu_20160128185605_ee25eca7-b580-42a5-89af-18d412352f08.log
@thanoojgithub
thanoojgithub / EXPORT query result into a directory in Hive
Created January 28, 2016 14:39
EXPORT query result into a directory in Hive
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/ubuntu/input/raw' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from raw;
Query ID = ubuntu_20160128195617_0b1801f7-00eb-4b61-b8ac-4e190a6a00a5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1453963805097_0040, Tracking URL = http://ubuntu:8088/proxy/application_1453963805097_0040/
Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1453963805097_0040
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-01-28 19:56:24,017 Stage-1 map = 0%, reduce = 0%
2016-01-28 19:56:29,416 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.7 sec
@thanoojgithub
thanoojgithub / HBase basics
Last active May 29, 2018 08:00
HBase basics
ubuntu@ubuntu:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop2/logs/hadoop-ubuntu-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop2/logs/hadoop-ubuntu-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop2/logs/hadoop-ubuntu-secondarynamenode-ubuntu.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop2/logs/yarn-ubuntu-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /usr/local/hadoop2/logs/yarn-ubuntu-nodemanager-ubuntu.out
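The listing shows only the Hadoop daemons starting. The HBase steps that would typically follow look like this sketch (the table and column-family names are made up):

```
ubuntu@ubuntu:~$ start-hbase.sh
ubuntu@ubuntu:~$ hbase shell
hbase(main):001:0> create 'users', 'cf'
hbase(main):002:0> put 'users', 'row1', 'cf:name', 'sriram'
hbase(main):003:0> get 'users', 'row1'
hbase(main):004:0> scan 'users'
hbase(main):005:0> disable 'users'
hbase(main):006:0> drop 'users'
```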
@thanoojgithub
thanoojgithub / Scala setup
Last active March 7, 2016 07:37
Scala setup
Scala environment:
1. Install Scala
1.1 Download Scala (latest version) from http://www.scala-lang.org/
1.2 Uncompress it
1.3 Add the Scala bin folder to the PATH variable
2. Install Eclipse Mars or Luna
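Step 1.3 above can be sketched as follows; the install directory is a made-up example:

```shell
# Hypothetical Scala install path -- substitute your own.
SCALA_HOME=/home/ubuntu/scala-2.11.7
export PATH="$PATH:$SCALA_HOME/bin"
# Verify the bin folder is now on PATH:
echo "$PATH" | grep -q "$SCALA_HOME/bin" && echo "scala on PATH"
```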
@thanoojgithub
thanoojgithub / Hive Basics.sql
Last active May 29, 2018 08:01
Hive Basics
% tar xzf apache-hive-x.y.z-bin.tar.gz
% export HIVE_HOME=/home/ubuntu/hive-1.2.1
% export PATH=$PATH:$HIVE_HOME/bin
ubuntu@ubuntu:~$ whereis hadoop
hadoop: /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop.cmd
ubuntu@ubuntu:~$ whereis hive
hive: /home/ubuntu/softwares/apache-hive-2.1.0-SNAPSHOT-bin/bin/hive /home/ubuntu/softwares/apache-hive-2.1.0-SNAPSHOT-bin/bin/hive.cmd