Last active
August 29, 2015 13:56
-
-
Save evidanary/9148018 to your computer and use it in GitHub Desktop.
Extract Schema from a JSON row. Generates a hive table. Flattens based on {"event" => ....., "properties" = > ....}
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# This utilityi extracts schema of a JSON object | |
# It returns the keys of the JSON object separated by commas | |
# It also returns a create table statement in Hive | |
require '/var/lib/jenkins/scripts/lib/hive_utils.rb' | |
require 'json' | |
# This function should probably be in hive_utils.rb | |
def create_table_string(params) | |
header = "CREATE TABLE #{params[:hive_schema]}.#{params[:hive_table]}(" | |
footer = ")" + "\n" + | |
"PARTITIONED BY (dt string)\n" + | |
"ROW FORMAT DELIMITED" + "\n" + | |
" FIELDS TERMINATED BY '\\t'" + "\n" + | |
" LINES TERMINATED BY '\\n';" | |
header + params[:fields].join(",\n") + footer | |
end | |
def drop_table_string(params) | |
"DROP TABLE IF EXISTS #{params[:hive_schema]}.#{params[:hive_table]};" | |
end | |
def flatten_event(jevent) | |
new_event = {} | |
new_event["event"] = jevent["event"] | |
new_event.merge!(jevent["properties"]) | |
new_event | |
end | |
def sanitize_to_hive_columns(field_name) | |
#replacce $ with '' and space with _, : with _ | |
field_name.gsub(/\$+/, '').gsub(' ', '_').gsub(':', '_') | |
end | |
json_payload = flatten_event(JSON.load(gets.split("\n").first)) | |
keys = json_payload.keys | |
3.times {puts} | |
puts "JSON_KEYS: " | |
puts keys.join(',') | |
3.times {puts} | |
#sanitize keys to column_names | |
keys.map! {|x| " " + sanitize_to_hive_columns(x) + " string"} | |
params = { | |
:hive_schema => 'sandbox_yranadive', | |
:hive_table => 'mixpanel_AirplaneModeChanged', | |
:fields => keys | |
} | |
puts "HIVE CREATE TABLE STMT: " | |
puts drop_table_string(params) | |
puts create_table_string(params) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Where do I find the hive_utils.rb file?