thanooj kalathuru (thanoojgithub)
@thanoojgithub
thanoojgithub / Hive_SH.hql
Created January 27, 2016 16:00
How to run Hive queries from a shell script (.sh file)
## select.hql
create schema ${hiveconf:DB_NAME};
show schemas;
use ${hiveconf:DB_NAME};
CREATE TABLE ${hiveconf:DB_NAME}.${hiveconf:TABLE_NAME} (ID INT, name STRING, dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
show tables;
LOAD DATA LOCAL INPATH '/home/ubuntu/input/raw.txt' OVERWRITE INTO TABLE ${hiveconf:DB_NAME}.${hiveconf:TABLE_NAME};
select * from ${hiveconf:DB_NAME}.${hiveconf:TABLE_NAME};
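The description mentions a .sh wrapper, but only the .hql file is shown. A minimal sketch of such a wrapper, assuming the script above is saved as select.hql (the DB_NAME and TABLE_NAME values here are made-up examples):

```shell
#!/bin/bash
# run_select.sh -- sketch only; DB_NAME and TABLE_NAME values are hypothetical.
DB_NAME=mydb
TABLE_NAME=users

# Build the hive invocation that supplies the ${hiveconf:...} variables
# referenced inside select.hql.
CMD="hive -hiveconf DB_NAME=$DB_NAME -hiveconf TABLE_NAME=$TABLE_NAME -f select.hql"
echo "$CMD"
# On a machine with Hive installed, execute it with: eval "$CMD"
```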
@thanoojgithub
thanoojgithub / GroupBy_in_Hive.hql
Last active May 29, 2018 08:02
GroupBy in Hive
hive> select * from raw_bucketed_users;
OK
raw_bucketed_users.id raw_bucketed_users.name raw_bucketed_users.dt
1 sriram 12-10-2015
2 seeta 12-11-2015
3 sriram 12-10-2015
4 seeta 12-11-2015
5 sriram 12-10-2015
6 seeta 12-11-2015
7 sriram 12-10-2015
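The listing above shows only the raw rows; a GROUP BY over that table might look like the following sketch (column names taken from the header printed above):

```sql
-- Count rows per name in raw_bucketed_users (illustrative).
SELECT name, count(*) AS cnt
FROM raw_bucketed_users
GROUP BY name;
-- For the seven rows shown above: sriram 4, seeta 3
```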
@thanoojgithub
thanoojgithub / Joins_in_Hive.hql
Last active May 29, 2018 08:02
Joins in Hive
hive> select * from raw1 r1 join raw2 r2 on r1.id=r2.id where r1.name like "s%" and r2.dt like "%2015";
Query ID = ubuntu_20160128153623_8773b4d1-5b8a-4f41-b9bf-8dc7ed17fc77
Total jobs = 1
2016-01-28 15:36:25,997 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/ubuntu/ubuntu_20160128153623_8773b4d1-5b8a-4f41-b9bf-8dc7ed17fc77.log
2016-01-28 15:36:26 Starting to launch local task to process map join; maximum memory = 518979584
2016-01-28 15:36:27 Dump the side-table for tag: 0 with group count: 2 into file: file:/tmp/hive/a6180571-652c-4ef6-8150-63c6872bb912/hive_2016-01-28_15-36-23_009_1042588085665505632-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile150--.hashtable
2016-01-28 15:36:27 Uploaded 1 File to: file:/tmp/hive/a6180571-652c-4ef6-8150-63c6872bb912/hive_2016-01-28_15-36-23_009_1042588085665505632-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile150--.hashtable (336 bytes)
2016-01-28 15:36:2
@thanoojgithub
thanoojgithub / ORDER BY and SORT BY
Last active May 29, 2018 08:02
ORDER BY and SORT BY in Hive
ORDER BY and SORT BY
ORDER BY performs a total ordering of the query result set. This means all the data is passed through a single reducer, which may take an unacceptably long time for larger data sets.
Hive offers an alternative, SORT BY, which orders the data only within each reducer, performing a local ordering: each reducer's output is sorted, but the overall result is not. Total ordering is traded away for better performance.
In both cases the syntax differs only in the ORDER or SORT keyword. You can list any columns you wish and mark each as ascending with the ASC keyword (the default) or descending with the DESC keyword.
Here is an example using ORDER BY:
hive> SELECT s.ymd, s.symbol, s.price_close
    > FROM stocks s
    > ORDER BY s.ymd ASC, s.symbol DESC;
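For contrast, the same query written with SORT BY, which orders rows only within each reducer; this is a sketch, not output from the session above:

```sql
hive> SELECT s.ymd, s.symbol, s.price_close
    > FROM stocks s
    > SORT BY s.ymd ASC, s.symbol DESC;
```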
@thanoojgithub
thanoojgithub / Casting_in_Hive
Created January 28, 2016 13:25
Casting in Hive
hive> set hive.cli.print.header=true;
hive> select * from raw;
OK
raw.id raw.name raw.dt
1 sriram 2015-10-12
2 seetamma 2015-09-12
3 lakshmana 2015-11-12
4 bharatha 2015-12-12
5 Shatrughna 2015-12-12
6 hanuma 2015-11-12
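The listing stops before any actual cast. A sketch of Hive's cast(expr AS type) syntax against the columns shown above (the target types are illustrative):

```sql
-- dt is stored as a STRING in yyyy-MM-dd form; cast it to DATE,
-- and widen id from INT to BIGINT (both casts are examples).
SELECT cast(id AS bigint) AS id_big,
       name,
       cast(dt AS date)  AS dt_date
FROM raw;
```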
@thanoojgithub
thanoojgithub / Views in Hive
Last active May 29, 2018 08:02
Views in Hive
hive> CREATE VIEW raw_view AS SELECT r1.id, r2.name,r1.dt FROM raw1 r1 JOIN raw2 r2 ON (r1.id=r2.id) WHERE r1.name like "s%";
OK
id name dt
Time taken: 0.251 seconds
hive> select * from raw_view;
Query ID = ubuntu_20160128185605_ee25eca7-b580-42a5-89af-18d412352f08
Total jobs = 1
2016-01-28 18:56:09,141 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/ubuntu/ubuntu_20160128185605_ee25eca7-b580-42a5-89af-18d412352f08.log
@thanoojgithub
thanoojgithub / EXPORT query result into a directory in Hive
Created January 28, 2016 14:39
EXPORT query result into a directory in Hive
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/ubuntu/input/raw' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from raw;
Query ID = ubuntu_20160128195617_0b1801f7-00eb-4b61-b8ac-4e190a6a00a5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1453963805097_0040, Tracking URL = http://ubuntu:8088/proxy/application_1453963805097_0040/
Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1453963805097_0040
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-01-28 19:56:24,017 Stage-1 map = 0%, reduce = 0%
2016-01-28 19:56:29,416 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.7 sec
@thanoojgithub
thanoojgithub / HBase basics
Last active May 29, 2018 08:00
HBase basics
ubuntu@ubuntu:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop2/logs/hadoop-ubuntu-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop2/logs/hadoop-ubuntu-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop2/logs/hadoop-ubuntu-secondarynamenode-ubuntu.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop2/logs/yarn-ubuntu-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /usr/local/hadoop2/logs/yarn-ubuntu-nodemanager-ubuntu.out
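The listing shows only the Hadoop daemons starting. The HBase steps that would typically follow look like this sketch (the table and column-family names are made up):

```
ubuntu@ubuntu:~$ start-hbase.sh
ubuntu@ubuntu:~$ hbase shell
hbase(main):001:0> create 'users', 'cf'
hbase(main):002:0> put 'users', 'row1', 'cf:name', 'sriram'
hbase(main):003:0> get 'users', 'row1'
hbase(main):004:0> scan 'users'
hbase(main):005:0> disable 'users'
hbase(main):006:0> drop 'users'
```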
@thanoojgithub
thanoojgithub / Scala setup
Last active March 7, 2016 07:37
Scala setup
Scala environment:
1. Install Scala
1.1 Download Scala (latest version) from http://www.scala-lang.org/
1.2 Uncompress it
1.3 Add the Scala bin folder to the PATH variable
2. Install Eclipse Mars or Luna
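Step 1.3 above can be sketched as follows; the install directory is a made-up example:

```shell
# Hypothetical Scala install path -- substitute your own.
SCALA_HOME=/home/ubuntu/scala-2.11.7
export PATH="$PATH:$SCALA_HOME/bin"
# Verify the bin folder is now on PATH:
echo "$PATH" | grep -q "$SCALA_HOME/bin" && echo "scala on PATH"
```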
@thanoojgithub
thanoojgithub / Hive Basics.sql
Last active May 29, 2018 08:01
Hive Basics
% tar xzf apache-hive-x.y.z-bin.tar.gz
% export HIVE_HOME=/home/ubuntu/hive-1.2.1
% export PATH=$PATH:$HIVE_HOME/bin
ubuntu@ubuntu:~$ whereis hadoop
hadoop: /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop.cmd
ubuntu@ubuntu:~$ whereis hive
hive: /home/ubuntu/softwares/apache-hive-2.1.0-SNAPSHOT-bin/bin/hive /home/ubuntu/softwares/apache-hive-2.1.0-SNAPSHOT-bin/bin/hive.cmd