collect_set and collect_list in Hive
hive> select * from thanooj.cust_credit;
OK
cust_credit.id cust_credit.name cust_credit.doa cust_credit.location
1 sriram 2015-10-12 ayodhya
2 seeta 2015-09-12 midhila
3 lakshman 2015-11-12 ayodhya
4 bharata 2015-12-12 ayodhya
5 sathrugna 2015-12-12 ayodhya
6 hanuma 2015-11-12 ayodhya
7 sriram 2015-10-12 ayodhya
8 seeta 2015-09-12 midhila
9 lakshman 2015-11-12 ayodhya
10 bharata 2015-12-12 ayodhya
11 sathrugna 2015-12-12 ayodhya
12 hanuma 2015-11-12 ayodhya
Time taken: 0.085 seconds, Fetched: 12 row(s)
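
For context, a minimal DDL sketch that would match the layout shown above; the column types and the delimited text format are assumptions inferred from the query output, since the gist does not show the original DDL:

-- untested sketch: a table definition matching the columns listed above
CREATE TABLE IF NOT EXISTS thanooj.cust_credit (
  id       INT,
  name     STRING,
  doa      STRING,   -- assumed to be stored as a string here; could equally be DATE
  location STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
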
hive> select doa, collect_set(location) as c_location from thanooj.cust_credit group by doa;
Query ID = ubuntu_20160304013923_5b743ed0-5ce2-4663-ad7a-9127fc1b8f65
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1457070465534_0011, Tracking URL = http://ubuntu:8088/proxy/application_1457070465534_0011/
Kill Command = /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop job -kill job_1457070465534_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-03-04 01:39:30,505 Stage-1 map = 0%, reduce = 0%
2016-03-04 01:39:49,275 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.28 sec
2016-03-04 01:39:55,563 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.82 sec
MapReduce Total cumulative CPU time: 2 seconds 820 msec
Ended Job = job_1457070465534_0011
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.82 sec HDFS Read: 8357 HDFS Write: 76 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 820 msec
OK
doa c_location
2015-09-12 ["midhila"]
2015-10-12 ["ayodhya"]
2015-11-12 ["ayodhya"]
2015-12-12 ["ayodhya"]
Time taken: 33.306 seconds, Fetched: 4 row(s)
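
Because collect_set eliminates duplicates within each group, every doa above collapses to a single distinct location. A related sketch (not run here) that reports the number of distinct locations per doa and renders the set as a comma-separated string, using the built-in size() and concat_ws() functions:

-- untested sketch: distinct count and a readable string per doa
select doa,
       size(collect_set(location))           as distinct_locations,
       concat_ws(',', collect_set(location)) as location_csv
from thanooj.cust_credit
group by doa;
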
hive> select doa, collect_list(location) as c_location from thanooj.cust_credit group by doa;
Query ID = ubuntu_20160304014426_86800032-1c12-4d9d-8b42-b3f14984ef8d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1457070465534_0012, Tracking URL = http://ubuntu:8088/proxy/application_1457070465534_0012/
Kill Command = /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop job -kill job_1457070465534_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-03-04 01:44:34,866 Stage-1 map = 0%, reduce = 0%
2016-03-04 01:44:53,565 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.35 sec
2016-03-04 01:45:00,850 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.87 sec
MapReduce Total cumulative CPU time: 2 seconds 870 msec
Ended Job = job_1457070465534_0012
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.87 sec HDFS Read: 8365 HDFS Write: 140 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 870 msec
OK
doa c_location
2015-09-12 ["midhila","midhila"]
2015-10-12 ["ayodhya","ayodhya"]
2015-11-12 ["ayodhya","ayodhya","ayodhya","ayodhya"]
2015-12-12 ["ayodhya","ayodhya","ayodhya","ayodhya"]
Time taken: 35.462 seconds, Fetched: 4 row(s)
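
collect_list keeps duplicates, so the length of each array above equals the number of rows in that group. A sketch (not run here) that makes the contrast explicit by comparing list length with set size in a single query:

-- untested sketch: rows per group vs. distinct values per group
select doa,
       size(collect_list(location)) as rows_per_doa,
       size(collect_set(location))  as distinct_per_doa
from thanooj.cust_credit
group by doa;
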
hive> select doa, collect_set(location)[0] as c_location from thanooj.cust_credit group by doa;
Query ID = ubuntu_20160304013713_6b5f6836-aea4-4f3d-9db7-8bc48afdc25a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1457070465534_0009, Tracking URL = http://ubuntu:8088/proxy/application_1457070465534_0009/
Kill Command = /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop job -kill job_1457070465534_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-03-04 01:37:21,641 Stage-1 map = 0%, reduce = 0%
2016-03-04 01:37:51,873 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.1 sec
2016-03-04 01:38:07,512 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.0 sec
MapReduce Total cumulative CPU time: 3 seconds 0 msec
Ended Job = job_1457070465534_0009
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.0 sec HDFS Read: 8662 HDFS Write: 76 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 0 msec
OK
doa c_location
2015-09-12 midhila
2015-10-12 ayodhya
2015-11-12 ayodhya
2015-12-12 ayodhya
Time taken: 55.03 seconds, Fetched: 4 row(s)
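
Indexing with [0] picks the first element of the collected array. Here each group holds only one distinct location, so the result is unambiguous, but Hive does not guarantee the element order produced by collect_set, so with several distinct values per group the picked element is effectively arbitrary. An untested sketch that sorts the array first so [0] deterministically returns the smallest value:

-- untested sketch: make [0] deterministic by sorting the collected set
select doa,
       sort_array(collect_set(location))[0] as first_location
from thanooj.cust_credit
group by doa;
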
hive> select doa, collect_list(location)[0] as c_location from thanooj.cust_credit group by doa;
Query ID = ubuntu_20160304013821_aadf5d59-2744-4399-a112-6f0b168dc49c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1457070465534_0010, Tracking URL = http://ubuntu:8088/proxy/application_1457070465534_0010/
Kill Command = /home/ubuntu/softwares/hadoop-2.7.2/bin/hadoop job -kill job_1457070465534_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-03-04 01:38:28,539 Stage-1 map = 0%, reduce = 0%
2016-03-04 01:38:42,097 Stage-1 map = 100%, reduce = 0%
2016-03-04 01:38:43,121 Stage-1 map = 0%, reduce = 0%
2016-03-04 01:38:48,278 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.51 sec
2016-03-04 01:38:55,556 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.51 sec
MapReduce Total cumulative CPU time: 1 seconds 510 msec
Ended Job = job_1457070465534_0010
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.35 sec HDFS Read: 8670 HDFS Write: 76 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 350 msec
OK
doa c_location
2015-09-12 midhila
2015-10-12 ayodhya
2015-11-12 ayodhya
2015-12-12 ayodhya
Time taken: 36.058 seconds, Fetched: 4 row(s)
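
Going the other way, a collected array can be flattened back into one row per element with LATERAL VIEW explode(). A sketch (not run here) that round-trips the collect_list result back into individual rows:

-- untested sketch: explode a collected list back into one row per element
select t.doa, loc
from (
  select doa, collect_list(location) as locs
  from thanooj.cust_credit
  group by doa
) t
lateral view explode(t.locs) e as loc;
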
collect_set(col): returns a set of objects (an array) with duplicate elements eliminated.
collect_list(col): returns a list of objects (an array) with duplicates retained (available as of Hive 0.13.0).
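
Both functions return array<col_type>, so several columns can be collected together per group by wrapping them in a struct with the built-in named_struct(); an untested sketch:

-- untested sketch: collect multiple columns per doa as an array of structs
select doa,
       collect_list(named_struct('id', id, 'name', name)) as people
from thanooj.cust_credit
group by doa;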