@cherniag
Last active January 24, 2017 18:35
impala
1. Upload the CSV file to HDFS:
export HADOOP_USER_NAME=impala
./hadoop fs -put /Users/gennadii/dev/performance_impala.csv hdfs://<host>:<port>/user/impala/performance_impala.csv
2. In IntelliJ IDEA, create an Impala data source using the Impala JDBC driver (add all jars from the driver zip).
3. Create the connection with this JDBC URL:
jdbc:impala://<host>:<port>
4. Create the table (same name as the file):
create table performance_impala(
field_string_1 string,
field_integer_1 int,
field_double_1 double,
field_date_1 timestamp)
row format delimited
fields terminated by ',';
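Before loading, it can help to confirm that every line of the CSV has the four comma-separated fields the table expects, because a text table generally returns NULL for values it cannot parse instead of failing. A minimal sketch, assuming a simple CSV with no quoted commas; the sample file name and its rows are made up for illustration:

```shell
# Hypothetical sample file mirroring the table's columns (string, int, double, timestamp).
cat > sample_performance_impala.csv <<'EOF'
field_string_1,field_integer_1,field_double_1,field_date_1
alpha,1,1.5,2017-01-24 18:35:00
beta,2,2.5,2017-01-25 09:00:00
EOF

# Count the lines whose field count differs from the 4 columns in the table.
bad_rows=$(awk -F',' 'NF != 4 { n++ } END { print n + 0 }' sample_performance_impala.csv)
echo "rows with unexpected column count: $bad_rows"
```

Point the awk command at the real performance_impala.csv to run the same check on the actual data.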
5. Load the data from the CSV (use the absolute HDFS path; LOAD DATA moves the file into the table's directory):
LOAD DATA INPATH '/user/impala/performance_impala.csv' INTO TABLE performance_impala;
6. Check:
select * from performance_impala limit 10;
7. Skip the header row (otherwise the first line of the file is read as data):
alter table performance_impala set tblproperties('skip.header.line.count'='1');
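If the Impala version in use does not honor skip.header.line.count, an alternative is to drop the header line locally before uploading the file. A minimal sketch; the file names and sample rows are hypothetical:

```shell
# Hypothetical input with a header row, standing in for performance_impala.csv.
printf '%s\n' \
  'field_string_1,field_integer_1,field_double_1,field_date_1' \
  'alpha,1,1.5,2017-01-24 18:35:00' \
  'beta,2,2.5,2017-01-25 09:00:00' > with_header.csv

# tail -n +2 prints everything from the second line onward, i.e. without the header.
tail -n +2 with_header.csv > no_header.csv

# Show the first remaining line to confirm the header is gone.
head -n 1 no_header.csv
```

The header-free file can then be uploaded with the same hadoop fs -put command as in step 1.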