@thanoojgithub
Created March 20, 2017 16:22
Nth highest value in Hive
hive> CREATE TABLE emp_sal(id INT, salary DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
OK
Time taken: 0.261 seconds
hive> LOAD DATA INPATH 'maprfs:/home/thanooj/emp_sal.txt' INTO TABLE emp_sal;
Loading data to table thanooj.emp_sal
Table thanooj.emp_sal stats: [numFiles=1, numRows=0, totalSize=139, rawDataSize=0]
OK
Time taken: 0.504 seconds
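For reference, emp_sal.txt is a plain comma-delimited file matching the table's field terminator. Judging from the SELECT output below, its first lines would look like this (the full file carries all 20 rows):

1,423.0
2,23.0
3,67878.0

Note that LOAD DATA INPATH moves the file from the given MapR-FS path into the table's warehouse directory; LOAD DATA LOCAL INPATH would instead copy a file from the local filesystem.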
hive> select * from emp_sal;
OK
1 423.0
2 23.0
3 67878.0
4 84.0
5 9999999.0
6 1.0
7 578.0
8 8877.0
9 4445.0
10 756.0
11 423.0
12 23.0
13 67878.0
14 84.0
15 9999999.0
16 1.0
17 578.0
18 8877.0
19 4445.0
20 756.0
Time taken: 0.459 seconds, Fetched: 20 row(s)
hive> select salary, ROW_NUMBER() over (ORDER BY salary) as row_no from emp_sal group by salary;
Query ID = tkalathu_20170320040147_1a78cd29-1925-4d08-9540-9a185d80b63b
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1489591891391_109952, Tracking URL = http://localhost:8088/proxy/application_1489591891391_109952/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489591891391_109952
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-03-20 04:01:53,254 Stage-1 map = 0%, reduce = 0%
2017-03-20 04:01:59,449 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.02 sec
2017-03-20 04:02:06,642 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.34 sec
MapReduce Total cumulative CPU time: 9 seconds 340 msec
Ended Job = job_1489591891391_109952
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1489591891391_109964, Tracking URL = http://localhost:8088/proxy/application_1489591891391_109964/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489591891391_109964
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-03-20 04:02:12,289 Stage-2 map = 0%, reduce = 0%
2017-03-20 04:02:18,441 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.43 sec
2017-03-20 04:02:24,588 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1489591891391_109964
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 9.34 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.95 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 290 msec
OK
1.0 1
23.0 2
84.0 3
423.0 4
578.0 5
756.0 6
4445.0 7
8877.0 8
67878.0 9
9999999.0 10
Time taken: 38.156 seconds, Fetched: 10 row(s)
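Here the GROUP BY salary collapses the duplicates (each salary appears twice in the table) before ROW_NUMBER() assigns ranks, which is why 20 input rows yield 10 ranked rows. An equivalent, arguably more explicit, formulation deduplicates in a subquery first; a sketch:

select salary, ROW_NUMBER() over (ORDER BY salary) as row_no
from (select distinct salary from emp_sal) s;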
hive> select salary, ROW_NUMBER() over (ORDER BY salary DESC) as row_no from emp_sal group by salary;
Query ID = tkalathu_20170320041238_988084e9-b1af-431a-87e6-bad8ffa47e49
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1489591891391_110316, Tracking URL = http://localhost:8088/proxy/application_1489591891391_110316/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489591891391_110316
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-03-20 04:12:43,224 Stage-1 map = 0%, reduce = 0%
2017-03-20 04:12:49,367 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.33 sec
2017-03-20 04:12:55,514 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.68 sec
MapReduce Total cumulative CPU time: 9 seconds 680 msec
Ended Job = job_1489591891391_110316
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1489591891391_110328, Tracking URL = http://localhost:8088/proxy/application_1489591891391_110328/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489591891391_110328
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-03-20 04:13:01,069 Stage-2 map = 0%, reduce = 0%
2017-03-20 04:13:07,206 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.37 sec
2017-03-20 04:13:13,344 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.79 sec
MapReduce Total cumulative CPU time: 2 seconds 790 msec
Ended Job = job_1489591891391_110328
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 9.68 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.79 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 470 msec
OK
9999999.0 1
67878.0 2
8877.0 3
4445.0 4
756.0 5
578.0 6
423.0 7
84.0 8
23.0 9
1.0 10
Time taken: 36.021 seconds, Fetched: 10 row(s)
hive> select * from (select salary, ROW_NUMBER() over (ORDER BY salary DESC) as row_no from emp_sal group by salary) res where res.row_no = 4;
Query ID = tkalathu_20170320041346_875476c7-40c7-4684-91c4-aacdedcee178
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1489591891391_110345, Tracking URL = http://localhost:8088/proxy/application_1489591891391_110345/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489591891391_110345
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-03-20 04:13:52,428 Stage-1 map = 0%, reduce = 0%
2017-03-20 04:13:58,567 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.73 sec
2017-03-20 04:14:04,694 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.01 sec
MapReduce Total cumulative CPU time: 3 seconds 10 msec
Ended Job = job_1489591891391_110345
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1489591891391_110352, Tracking URL = http://localhost:8088/proxy/application_1489591891391_110352/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1489591891391_110352
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-03-20 04:14:10,263 Stage-2 map = 0%, reduce = 0%
2017-03-20 04:14:16,419 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.38 sec
2017-03-20 04:14:22,549 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.35 sec
MapReduce Total cumulative CPU time: 3 seconds 350 msec
Ended Job = job_1489591891391_110352
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.01 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.35 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 360 msec
OK
4445.0 4
Time taken: 37.302 seconds, Fetched: 1 row(s)
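The same result can be obtained without the GROUP BY by using DENSE_RANK(), which gives tied salaries the same rank; a sketch, assuming a Hive version (0.11+) where the windowing functions are available:

select distinct salary, rnk
from (select salary, DENSE_RANK() over (ORDER BY salary DESC) as rnk from emp_sal) t
where t.rnk = 4;

Unlike ROW_NUMBER(), DENSE_RANK() keeps duplicate salaries at the same rank, so the filter still returns the 4th highest distinct salary even though the inner query ranks all 20 rows.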
hive> select * from emp_sal_dept;
OK
1 423.0 2
2 23.0 1
3 67878.0 1
4 84.0 2
5 9999999.0 3
6 1.0 3
7 578.0 2
8 8877.0 4
9 4445.0 5
10 756.0 2
11 423.0 1
12 23.0 3
13 67878.0 4
14 84.0 2
15 9999999.0 1
16 1.0 2
17 578.0 1
18 8877.0 4
19 4445.0 3
20 756.0 2
Time taken: 0.295 seconds, Fetched: 20 row(s)
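The DDL for emp_sal_dept is not shown in this transcript; judging from the output it presumably mirrors emp_sal with an added department column, something like:

CREATE TABLE emp_sal_dept(id INT, salary DOUBLE, dept INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;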
hive> select * from (select salary, ROW_NUMBER() over (ORDER BY salary DESC) as row_no from emp_sal_dept group by dept, salary) res;
OK
9999999.0 1
9999999.0 2
67878.0 3
67878.0 4
8877.0 5
4445.0 6
4445.0 7
756.0 8
578.0 9
578.0 10
423.0 11
423.0 12
84.0 13
23.0 14
23.0 15
1.0 16
1.0 17
Time taken: 37.1 seconds, Fetched: 17 row(s)
hive> select * from (select salary, ROW_NUMBER() over (ORDER BY salary DESC) as row_no from emp_sal_dept group by dept, salary) res where res.row_no = 4;
OK
67878.0 4
Time taken: 39.167 seconds, Fetched: 1 row(s)
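The query above ranks the distinct (dept, salary) pairs globally, so row_no = 4 picks the 4th entry of that combined list rather than a per-department rank. To get the Nth highest salary within each department, the window would be partitioned by dept; a sketch, assuming that is the intent (N = 2 here):

select distinct dept, salary
from (select dept, salary, DENSE_RANK() over (PARTITION BY dept ORDER BY salary DESC) as rnk
      from emp_sal_dept) t
where t.rnk = 2;

This returns the 2nd highest distinct salary for every department in one pass; the DISTINCT guards against a department holding the same salary twice at the same rank.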