@natbusa
Last active August 29, 2015 13:57
word count: hadoop hive using lateral views and string operators
-- Hive queries for Word Count
drop table if exists doc;
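-- 1) create a table to hold the whole file, one line of text per row
--    ('\n' as the field delimiter keeps each line in the single text column)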
create table doc(
text string
) row format delimited fields terminated by '\n' stored as textfile;
-- 2) load the plain text file
-- if the file is a .csv, replace '\n' with ',' in step 1 (creation of the doc table)
load data local inpath './lorem.txt' overwrite into table doc;
-- 3) word count in a single query
SELECT word, COUNT(*) FROM doc LATERAL VIEW explode(split(lower(text), '\\W+')) lTable as word GROUP BY word;
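-- 4) optional variant (a sketch, not part of the original gist):
--    split() can emit empty strings, e.g. for blank lines or lines that start
--    with punctuation, and the query above counts them as a word.
--    Assuming that is unwanted, this version filters them out and sorts by
--    frequency; the alias cnt is an illustrative name.
SELECT word, COUNT(*) AS cnt
FROM doc LATERAL VIEW explode(split(lower(text), '\\W+')) lTable AS word
WHERE word <> ''
GROUP BY word
ORDER BY cnt DESC;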