@steeve
Created September 22, 2011 22:37
Brisk + Cassandra + get_slice
Brisk's Hive lets you transpose a row into a table of (row_key, column_name, value) tuples, one tuple per column.
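Here is a minimal Python sketch of that transposition. It assumes a column family can be modeled as a dict mapping each row key to {column name: value}. `transpose`, `my_cf` and the sample data are hypothetical; in Brisk the real work is done by the CassandraStorageHandler.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a column family with integer column names:
# row key -> {column name -> column value}
ColumnFamily = Dict[str, Dict[int, str]]

def transpose(cf: ColumnFamily) -> List[Tuple[str, int, str]]:
    """Emit one (row_key, column_name, value) tuple per column, in the spirit
    of the ":key,:column,:value" columns mapping."""
    tuples = []
    for row_key, columns in cf.items():
        # Cassandra keeps a row's columns sorted by its comparator; sort to mimic that.
        for column_name in sorted(columns):
            tuples.append((row_key, column_name, columns[column_name]))
    return tuples

my_cf = {"idx": {900: "k3", 10: "k1", 60: "k2"}}
print(transpose(my_cf))
# [('idx', 10, 'k1'), ('idx', 60, 'k2'), ('idx', 900, 'k3')]
```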
Now we can leverage Cassandra's get_slice to return only a subset of each row's columns.
This is very useful when using wide rows as Cassandra indexes.
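To make the slice semantics concrete, here is a minimal Python sketch of what a slice range does to a single wide row. It assumes the usual Cassandra rules: start and finish are inclusive, an empty bound is open-ended, and with reversed set the columns come back in descending order with start as the upper bound. `get_slice` here is a hypothetical stand-in, not the Thrift call itself, and it ignores SliceRange's count limit. Passing `comparator=int` plays the role of IntegerType, while `comparator=str` approximates byte/UTF8 ordering for ASCII digits, which shows why the comparator property in the table definition matters.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# One wide row: column name -> column value (hypothetical sample data).
Row = Dict[str, str]

def get_slice(
    row: Row,
    start: Optional[str] = None,
    finish: Optional[str] = None,
    reversed_: bool = False,
    comparator: Callable[[str], Any] = str,
) -> List[Tuple[str, str]]:
    """Return the (name, value) pairs whose names lie in the inclusive slice range."""
    # Columns are ordered by the comparator, descending when reversed_ is set.
    ordered = sorted(row.items(), key=lambda kv: comparator(kv[0]), reverse=reversed_)
    # A reversed slice names its upper bound first.
    lo, hi = (finish, start) if reversed_ else (start, finish)
    return [
        (name, value)
        for name, value in ordered
        if (lo is None or comparator(name) >= comparator(lo))
        and (hi is None or comparator(name) <= comparator(hi))
    ]

row = {"6": "k0", "50": "k1", "100": "k2", "700": "k3", "800": "k4", "900": "k5"}

as_integers = get_slice(row, "50", "800", comparator=int)
as_strings = get_slice(row, "50", "800", comparator=str)
descending = get_slice(row, "800", "50", reversed_=True, comparator=int)

print([name for name, _ in as_integers])  # ['50', '100', '700', '800']
print([name for name, _ in as_strings])   # ['50', '6', '700', '800']
print([name for name, _ in descending])   # ['800', '700', '100', '50']
```

With string ordering, "100" sorts before "50" and "6" sorts after it, so the same bounds select a different set of columns.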
See the pull request: https://github.com/riptano/hive/pull/3
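Let's say you have a wide-row index in MyCF: each column name is an indexed integer value, and each column value is the key of a row in MyTable.
So instead of filtering the whole table with:
```sql
SELECT * FROM MyTable WHERE a > x and b < y;
```
You can do: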
```sql
-- Map each MyCF row to (key, col_value, foreign_key) tuples, fetching only
-- the columns whose names fall between 50 and 800, compared as integers.
CREATE EXTERNAL TABLE MyDB.MyTmpIdx(key string, col_value int, foreign_key string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
  "cassandra.cf.name" = "MyCF",
  "cassandra.columns.mapping" = ":key,:column,:value")
TBLPROPERTIES (
  "cassandra.ks.name" = "MyKS",
  "cassandra.slice.predicate.range.start" = "50",
  "cassandra.slice.predicate.range.finish" = "800",
  "cassandra.slice.predicate.range.reversed" = "false",
  "cassandra.slice.predicate.range.comparator" = "org.apache.cassandra.db.marshal.IntegerType" );

-- Keep only the MyTable rows referenced by the sliced index.
SELECT MyTable.* FROM MyTable LEFT SEMI JOIN MyTmpIdx ON (MyTable.key = MyTmpIdx.foreign_key);
```
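The LEFT SEMI JOIN then acts as a filter on MyTable. Here is a minimal Python sketch of that step, assuming the sliced index has already been read as (key, col_value, foreign_key) tuples; `left_semi_join` and the sample rows are hypothetical. As with Hive's LEFT SEMI JOIN, only MyTable's columns are returned and each MyTable row appears at most once, even when several index entries point at it.

```python
from typing import Dict, List, Tuple

def left_semi_join(
    my_table: List[Dict[str, str]],
    my_tmp_idx: List[Tuple[str, int, str]],
) -> List[Dict[str, str]]:
    """Keep the MyTable rows whose 'key' equals some foreign_key in the index."""
    # Collect the referenced keys once; a set makes each MyTable row match at most once.
    wanted = {foreign_key for _key, _col_value, foreign_key in my_tmp_idx}
    return [row for row in my_table if row["key"] in wanted]

# Index entries that survived the [50, 800] slice (hypothetical data).
my_tmp_idx = [("idx", 50, "k1"), ("idx", 100, "k2"), ("idx", 700, "k2")]
my_table = [
    {"key": "k1", "a": "1"},
    {"key": "k2", "a": "2"},
    {"key": "k3", "a": "3"},
]

print(left_semi_join(my_table, my_tmp_idx))
# [{'key': 'k1', 'a': '1'}, {'key': 'k2', 'a': '2'}]
```

An inner join on the same data would return the k2 row twice, once per matching index entry.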
Boom: only the requested subset of columns comes back, straight from Cassandra, before the Hive mapping even begins :)