@ndimiduk
Last active December 11, 2015 00:29
An example of consuming HBase tables from Cascalog using my patched HBaseScheme in ndimiduk/maple.
(ns cascalog-repl.core
"An example of consuming an HBase tap from Cascalog. First, create and
populate the table:
$ hbase shell
> create 't1', 'f1'
> put 't1', 'TheRealMT', 'f1:user', 'Mark Twain'
> put 't1', 'TheRealMT', 'f1:email', 'samuel@clemens.org'
> put 't1', 'TheRealMT', 'f1:pass', 'abc123'
> put 't1', 'GrandpaD', 'f1:user', 'Fyodor Dostoyevsky'
> put 't1', 'GrandpaD', 'f1:email', 'fyodor@brothers.net'
> put 't1', 'GrandpaD', 'f1:pass', 'abc123'
> put 't1', 'SirDoyle', 'f1:user', 'Sir Arthur Conan Doyle'
> put 't1', 'SirDoyle', 'f1:email', 'art@TheQueensMen.co.uk'
> put 't1', 'SirDoyle', 'f1:pass', 'abc123'
> put 't1', 'HMS_Surprise', 'f1:user', 'Patrick OBrian'
> put 't1', 'HMS_Surprise', 'f1:email', 'aubrey@sea.com'
> put 't1', 'HMS_Surprise', 'f1:pass', 'abc123'
> ^D
Now compile and run this example. `lein run` will suffice."
  (:require
   [cascalog [api :as api] [workflow :as w]])
  (:import
   [com.twitter.maple.hbase HBaseScheme HBaseTap]
   [org.apache.hadoop.hbase.util Bytes])
  (:gen-class))

(defn stringify
  "Convert ImmutableBytesWritable and similar to String."
  [bs]
  (-> (.get bs)
      (Bytes/toString)))

(defn -main [& args]
  (api/with-job-conf {"cascading.hbase.cascalogsafe" true}
    (let [keyfields    (w/fields "?rowkey")
          valuefields  (w/fields ["?user" "?email" "?pass"])
          hbase-scheme (HBaseScheme. keyfields "f1" valuefields)
          hbase-tap    (HBaseTap. "localhost" "t1" hbase-scheme)]
      (println "example 1: dump the tap -- Fields are instances of ImmutableBytesWritable.")
      (api/?- (api/stdout) hbase-tap)
      (println "example 2: select a couple fields, convert to Strings.")
      (api/?<- (api/stdout) [?rowkey ?user]
               (hbase-tap ?rowkey-bytes ?user-bytes _ _)
               (stringify ?rowkey-bytes :> ?rowkey)
               (stringify ?user-bytes :> ?user)))))
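
The second example generalizes to any of the value columns by moving the bound variable to that column's position in the tuple. What follows is a minimal sketch, not part of the original gist: the function name `email-query` is hypothetical, and it assumes the same `t1` table, the `stringify` helper, and the same `HBaseScheme` and `HBaseTap` constructor calls used in `-main` above. It needs a running HBase on `localhost`, so treat it as an illustration rather than tested code.

```clojure
;; Hypothetical helper, intended to live in the same namespace as -main.
(defn email-query
  "Print each row key with its email address, converted to Strings."
  []
  (api/with-job-conf {"cascading.hbase.cascalogsafe" true}
    (let [hbase-scheme (HBaseScheme. (w/fields "?rowkey")
                                     "f1"
                                     (w/fields ["?user" "?email" "?pass"]))
          hbase-tap    (HBaseTap. "localhost" "t1" hbase-scheme)]
      ;; The tap emits four fields in order: rowkey, user, email, pass.
      ;; Bind the first and third; ignore the others with `_`.
      (api/?<- (api/stdout) [?rowkey ?email]
               (hbase-tap ?rowkey-bytes _ ?email-bytes _)
               (stringify ?rowkey-bytes :> ?rowkey)
               (stringify ?email-bytes :> ?email)))))
```

Calling `(email-query)` from a REPL started in this project should run the query the same way `-main` runs its two examples.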

;; project.clj
(defproject cascalog-repl "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :repositories [["conjars" "http://conjars.org/repo/"]]
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [cascalog "1.9.0"]]
  :profiles {:dev
             {:dependencies
              [[midje "1.3.1" :exclusions [org.clojure/clojure]]
               [org.apache.hadoop/hadoop-core "1.0.3"]
               [asm "3.2"]
               [org.apache.hbase/hbase "0.94.3"
                :exclusions [org.apache.hadoop/hadoop-core asm]]
               [cascading/cascading-hadoop "2.0.0"
                :exclusions [org.codehaus.janino/janino
                             org.apache.hadoop/hadoop-core]]
               ;; be sure to build and install the patched version
               [com.twitter/maple "0.2.7-SNAPSHOT"]]}}
  :aot [cascalog-repl.core]
  :main cascalog-repl.core)