/**
 * Same workflow but using Cascading, output fields in Parquet file are obviously fine and not prepended with '?'
 */
package cascading.sandbox;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
You have an array of integers, and for each index you want to find the product of every integer except the integer at that index.
Write a function get_products_of_all_ints_except_at_index() that takes an array of integers and returns an array of the products.
For example, given:
[1, 7, 3, 4]
your function would return:
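One possible approach, sketched in Python under a few assumptions: the input is a plain list of integers, division is avoided so that zeros in the input are handled, and lists with fewer than two elements are rejected with a `ValueError` (that last choice is an assumption of this sketch, not part of the problem statement). Each output entry is built as the product of everything to the left of the index times the product of everything to the right.

```python
def get_products_of_all_ints_except_at_index(int_list):
    """Return a list whose entry i is the product of all values except int_list[i]."""
    # Assumption of this sketch: fewer than two values has no meaningful answer.
    if len(int_list) < 2:
        raise ValueError("need at least two integers")

    products = [1] * len(int_list)

    # Forward pass: products[i] holds the product of everything left of i.
    product_so_far = 1
    for i, value in enumerate(int_list):
        products[i] = product_so_far
        product_so_far *= value

    # Backward pass: multiply in the product of everything right of i.
    product_so_far = 1
    for i in range(len(int_list) - 1, -1, -1):
        products[i] *= product_so_far
        product_so_far *= int_list[i]

    return products


print(get_products_of_all_ints_except_at_index([1, 7, 3, 4]))  # [84, 12, 28, 21]
```

The two passes keep the running time linear in the input length and use no storage beyond the output list.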
(def test-tap [["a" "x1" 0 1 0]
               ["a" "x2" 1 0 1]
               ["b" "bar" 0 1 0]
               ["b" "foo" 1 1 0]])

(defaggregatefn dosum
  ([] 0)
  ([state val] (+ state val))
  ([state] [state]))
(defparallelagg collect-set
  :init-var (mapfn [s] #{s})
  :combine-var into
  :present-var identity)

(let [set->string (mapfn [s1 s2] (clojure.string/join "," (clojure.set/union s1 s2)))]
  (??<- [?id ?fruits]
        ([["1" "banana" "grape"]
          ["1" "apple" "apple"]
          ["1" "apple" "lemon"]
(defn- transform-values
  "Replaces all expressions in the parsed tree with values from a given map."
  [parse-tree values-map]
  (loop [loc (zip/vector-zip parse-tree)]
    (if (zip/end? loc)
      (zip/root loc)
      (if (zip/branch? loc)
        (let [id (last (zip/children loc))]
          (if (contains? values-map id)
            (recur (zip/next (zip/replace loc (zip/node [(get values-map id)]))))
            (recur (zip/next (zip/edit loc #(into [] (butlast %)))))))
(defproject myapp "0.1.0-SNAPSHOT"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :repositories [["conjars.org" "http://conjars.org/repo"]]
  :dependencies [[org.clojure/clojure "1.6.0"]]
  :aot [myapp.core]
  :main myapp.core
  :profiles {:provided {:dependencies [[org.apache.hadoop/hadoop-client "2.4.0"]
                                       [org.apache.hadoop/hadoop-mapreduce-client-core "2.4.0"]]}
(let [source1 [[1 2 3]]
      source2 [[4 5 6]]
      fields ["?a" "?b" "?c"]
      query (combine
             (<- fields
                 (source1 :>> fields))
             (<- fields
                 (source2 :>> fields)))]
  (:available-fields query))
;; => nil
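;; Imports the snippet relies on.
(import '[org.apache.hadoop.conf Configuration]
        '[org.apache.hadoop.fs FileSystem Path])

;; List the paths on the local filesystem that match the glob.
(let [fs (FileSystem/getLocal (Configuration.))
      path (Path. "/data/in/*.txt.gz")]
  (map #(.getPath %) (.globStatus fs path)))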
(ns cascalog-class.core
  (:require [cascalog.api :refer :all]
            [cascalog.ops :as c]))

(defmapcatop split
  [^String sentence]
  (.split sentence "\\s+"))

;; -main is defined as a function so the query runs when invoked, not at namespace load.
(defn -main [& args]
  (?<- (stdout)
2015-02-04 12:47:45,300 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(581)) - Using ResourceCalculatorProcessTree : null
2015-02-04 12:47:45,327 INFO [LocalJobRunner Map Task Executor #0] io.MultiInputSplit (MultiInputSplit.java:readFields(161)) - current split input path: file:/tmp/staging/data/staging-path/part-00000
2015-02-04 12:47:45,328 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:updateJobWithSplit(462)) - Processing split: cascading.tap.hadoop.io.MultiInputSplit@29033afe
2015-02-04 12:47:45,333 WARN [LocalJobRunner Map Task Executor #0] io.MultiInputFormat (Util.java:retry(768)) - unable to get record reader, but not retrying
java.io.IOException: file:/tmp/staging/data/staging-path/part-00000 not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1850)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
a |