Last active August 29, 2015 13:57
Word count: Hadoop Hive using lateral views and string operators
-- Hive queries for word count
drop table if exists doc;

-- 1) create a table to hold the whole file, one line of text per row
create table doc(
  text string
) row format delimited fields terminated by '\n' stored as textfile;

-- 2) load the plain-text file
-- if the file is a .csv, replace '\n' with ',' in step 1 (creation of the doc table)
load data local inpath './lorem.txt' overwrite into table doc;

-- 3) word count in a single query
SELECT word, COUNT(*) FROM doc LATERAL VIEW explode(split(lower(text), '\\W+')) lTable as word GROUP BY word;
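
-- 4) optional variant (a sketch, not part of the original queries):
-- split() can yield empty strings, for example on blank lines or lines that
-- start with punctuation, so this version filters them out and lists the
-- most frequent words first. The alias cnt and the LIMIT of 20 are
-- illustrative assumptions; adjust them as needed.
SELECT word, COUNT(*) AS cnt
FROM doc LATERAL VIEW explode(split(lower(text), '\\W+')) lTable AS word
WHERE word != ''
GROUP BY word
ORDER BY cnt DESC
LIMIT 20;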