Skip to content

Instantly share code, notes, and snippets.

@y2k-shubham
Last active April 22, 2019 16:22
Show Gist options
  • Save y2k-shubham/5cdf9d03c97f8ae5eab9ea5a5c78fa00 to your computer and use it in GitHub Desktop.
Save y2k-shubham/5cdf9d03c97f8ae5eab9ea5a5c78fa00 to your computer and use it in GitHub Desktop.
Hive Create table DDL statement for CSV file
-- DO NOT use this; see comment below
CREATE EXTERNAL TABLE `db_name.table_name`(
`column_1` string,
`column_2` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'escapeChar'='\\',
'quoteChar'='"',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://my-bucket/db_name.db/table_name'
TBLPROPERTIES (
'skip.header.line.count'='1');
@y2k-shubham
Copy link
Author

note that in above (hive.serde2.OpenCSVSerde)

  • columns will automatically be forced into string regardless of DDL statement
  • its acceptable to enclose strings in double quotes (in input CSV)

to preserve types, use following

CREATE EXTERNAL TABLE `my_db.my_table`(
  `col_1` bigint,
  `col_2` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION
  's3://my-bucket/my_db.db/my_table'
TBLPROPERTIES (
  'skip.header.line.count'='1');

note that in above

  • the input CSV must NOT enclose string fields in quotes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment