steeve · September 22, 2011 22:37
diff --git a/gistfile1.txt b/gistfile1.txt
 Brisk's Hive lets you transpose a Cassandra row into a table of (row_key, column_name, value),
 one table row per column. With this change we can also leverage Cassandra's get_slice to return
 only a subset of those columns. Very useful when querying Cassandra indexes stored as wide rows.
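
To make the idea concrete, here is a toy in-memory model of what the transposition plus the slice give you. It is only a sketch, not the storage handler's code: the row contents, `get_slice` and `transpose` below are hypothetical names made up for illustration, and it assumes integer column names sorted ascending with both range bounds inclusive.

```python
from bisect import bisect_left, bisect_right

# Hypothetical wide row: one row key, many columns.
# Column names are ints kept in sorted order (think IntegerType comparator),
# column values are the foreign keys the index points to.
row_key = "idx_a"
columns = [(10, "k1"), (50, "k2"), (400, "k3"), (800, "k4"), (900, "k5")]


def get_slice(cols, start, finish):
    """Return the columns whose name lies in [start, finish], bounds inclusive.

    Assumes cols is sorted by column name, as columns are inside a row.
    """
    names = [name for name, _ in cols]
    lo = bisect_left(names, start)     # first column with name >= start
    hi = bisect_right(names, finish)   # one past the last column with name <= finish
    return cols[lo:hi]


def transpose(key, cols):
    """Turn one wide row into (row_key, column_name, value) tuples."""
    return [(key, name, value) for name, value in cols]


# Only the sliced columns are ever turned into table rows.
rows = transpose(row_key, get_slice(columns, 50, 800))
for r in rows:
    print(r)
```

The point of the sketch is the order of operations: the range is applied on the sorted column names first, so columns 10 and 900 never become table rows at all.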

 See the pull request: https://github.com/riptano/hive/pull/3

 Let's say you have a wide row index.
 Instead of running:
    SELECT * FROM MyTable WHERE a > x AND b < y;

 You can do:
    CREATE EXTERNAL TABLE MyDB.MyTmpIdx(key string, col_value int, foreign_key string)
    STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
    WITH SERDEPROPERTIES (
        "cassandra.cf.name" = "MyCF",
        "cassandra.columns.mapping" = ":key,:column,:value")
    TBLPROPERTIES (
        "cassandra.ks.name" = "MyKS",
        "cassandra.slice.predicate.range.start" = "50",
        "cassandra.slice.predicate.range.finish" = "800",
        "cassandra.slice.predicate.range.reversed" = "false",
        "cassandra.slice.predicate.range.comparator" = "org.apache.cassandra.db.marshal.IntegerType" );
    
    SELECT MyTable.* FROM MyTable LEFT SEMI JOIN MyTmpIdx ON (MyTable.key = MyTmpIdx.foreign_key);

 Boom, you only get a subset, straight from Cassandra, before the mapping even begins :)
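
If LEFT SEMI JOIN is unfamiliar, the following sketch models what the final query keeps. It is a plain Python illustration, not how Hive executes the join: the sample rows and the `left_semi_join` helper are hypothetical, and `my_tmp_idx` stands in for the already-sliced index table.

```python
def left_semi_join(left_rows, right_rows, left_key, right_key):
    """Keep each left row that has at least one match on the right.

    A left row is returned at most once, and no right-hand columns are added.
    """
    wanted = {r[right_key] for r in right_rows}
    return [row for row in left_rows if row[left_key] in wanted]


# Hypothetical contents of MyTable.
my_table = [
    {"key": "k1", "payload": "one"},
    {"key": "k2", "payload": "two"},
    {"key": "k3", "payload": "three"},
    {"key": "k5", "payload": "five"},
]

# Hypothetical contents of MyTmpIdx after the slice (column names 50..800).
# k3 appears twice on purpose: a semi join must not duplicate it.
my_tmp_idx = [
    {"key": "idx_a", "col_value": 50, "foreign_key": "k2"},
    {"key": "idx_a", "col_value": 400, "foreign_key": "k3"},
    {"key": "idx_a", "col_value": 800, "foreign_key": "k3"},
]

result = left_semi_join(my_table, my_tmp_idx, "key", "foreign_key")
for row in result:
    print(row)
```

Since the index side is already reduced to the sliced range, the join only has to look at the foreign keys that survived the slice.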