Skip to content

Instantly share code, notes, and snippets.

@nsabharwal
Created May 22, 2015 16:10
Show Gist options
  • Save nsabharwal/ba7d875e058add1b4843 to your computer and use it in GitHub Desktop.
Save nsabharwal/ba7d875e058add1b4843 to your computer and use it in GitHub Desktop.
Phoenix
Please see the following details on Apache Phoenix “sql skin for HBase” .
Phoenix
The following details are based on a test done in one of my lab environments. You can see that we can run sql, secondary indexes, explain plan, data load and bulk load by using phoenix.
Table definition
drop table if exists crime;
create table crime (
caseid varchar,
Date varchar,
block varchar,
description varchar,
sdesc varchar,
ldesc varchar,
arrest char(2),
domestic char(2),
lat float,
long float
constraint PK PRIMARY KEY (caseid)
);
Sql prompt can be launched from your desktop by using
sqlline.py <zookeeper server>:<zookeeper port>
0: jdbc:phoenix:lab> !describe CRIME
+————+————-+————+————-+————+————+————-+—————+—————-+—————-+————+————+—————+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | COLUMN_DEF | SQL_DATA_TYPE |
+————+————-+————+————-+————+————+————-+—————+—————-+—————-+————+————+—————+
| null | null | CRIME | CASEID | 12 | VARCHAR | null | null | null | null | 0 | null | null |
| null | null | CRIME | DATE | 12 | VARCHAR | null | null | null | null | 1 | null | null |
| null | null | CRIME | BLOCK | 12 | VARCHAR | null | null | null | null | 1 | null | null |
| null | null | CRIME | DESCRIPTION | 12 | VARCHAR | null | null | null | null | 1 | null | null |
| null | null | CRIME | SDESC | 12 | VARCHAR | null | null | null | null | 1 | null | null |
| null | null | CRIME | LDESC | 12 | VARCHAR | null | null | null | null | 1 | null | null |
| null | null | CRIME | ARREST | 1 | CHAR | 2 | null | null | null | 1 | null | null |
| null | null | CRIME | DOMESTIC | 1 | CHAR | 2 | null | null | null | 1 | null | null |
| null | null | CRIME | LAT | 6 | FLOAT | null | null | null | null | 1 | null | null |
| null | null | CRIME | LONG | 6 | FLOAT | null | null | null | null | 1 | null | null |
+————+————-+————+————-+————+————+————-+—————+—————-+—————-+————+————+———————+
0: jdbc:phoenix:lab> select count(1) from CRIME where ARREST=’Y';
+————+
| COUNT(1) |
+————+
|1092172 |
+————+
1 row selected (9.964 seconds)
Secondary Indexes : http://phoenix.apache.org/secondary_indexing.html
0: jdbc:phoenix:lab> create index index_arrest on CRIME (arrest);
3,837,776 rows affected (95.428 seconds)
0: jdbc:phoenix:lab> select count(1) from CRIME where ARREST=’Y';
+————+
| COUNT(1) |
+————+
|1092172 |
+————+
1 row selected (2.169 seconds)
0: jdbc:phoenix:lab> select count(1) from CRIME where ARREST=’N';
+————+
| COUNT(1) |
+————+
|2745604 |
+————+
1 row selected (4.055 seconds)
explain select count(1) from CRIME where ARREST=’Y';
+————+
| PLAN |
+————+
| CLIENT PARALLEL 32-WAY RANGE SCAN OVER INDEX_ARREST [‘Y’] |
| SERVER AGGREGATE INTO SINGLE ROW |
+————+
2 rows selected (0.03 seconds)
Statistics: http://phoenix.apache.org/update_statistics.html
Tuning : http://phoenix.apache.org/tuning.html
Infomarion: http://phoenix.apache.org/language/index.html
You can see table in HBASE
hbase(main):004:0> list
TABLE
CRIME
Data load
/usr/lib/phoenix/bin/psql.py -t CRIME lab:2181 crim4.csv
Bulk load – Be careful about ignore-errors parameter
HADOOP_CLASSPATH=/usr/lib/hbase/lib/hbase-protocol.jar:/etc/hbase/conf hadoop jar /usr/lib/phoenix/phoenix-4.0.0.2.1.4.0-632-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool –table CRIME –input /tmp/crim4.csv -z lab:2181 –ignore-errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment