🇺🇦

Dmytro Kobza erasmas
@erasmas
erasmas / gist:cbc348b3c95d961a9b19
Created December 26, 2014 15:20
Why doesn't a combined query have :available-fields? #cascalog
(use 'cascalog.api)

(let [source1 [[1 2 3]]
      source2 [[4 5 6]]
      fields  ["?a" "?b" "?c"]
      query   (combine
                (<- fields
                    (source1 :>> fields))
                (<- fields
                    (source2 :>> fields)))]
  (:available-fields query))
;; => nil
erasmas / project.clj
Created December 17, 2014 11:18
Example project.clj for a Cascalog app
(defproject myapp "0.1.0-SNAPSHOT"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :repositories [["conjars.org" "http://conjars.org/repo"]]
  :dependencies [[org.clojure/clojure "1.6.0"]]
  :aot [myapp.core]
  :main myapp.core
  :profiles {:provided {:dependencies [[org.apache.hadoop/hadoop-client "2.4.0"]
                                       [org.apache.hadoop/hadoop-mapreduce-client-core "2.4.0"]]}})
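(require '[clojure.zip :as zip])

(defn- transform-values
  "Replaces all expressions in a parsed tree with values from a given map."
  [parse-tree values-map]
  (loop [loc (zip/vector-zip parse-tree)]
    (if (zip/end? loc)
      (zip/root loc)
      (if (zip/branch? loc)
        ;; the last child of each branch node is the expression id
        (let [id (last (zip/children loc))]
          (if (contains? values-map id)
            ;; known id: replace the whole branch with its value
            (recur (zip/next (zip/replace loc (get values-map id))))
            ;; unknown id: drop the id and keep walking into the children
            (recur (zip/next (zip/edit loc #(into [] (butlast %)))))))
        ;; leaf node: leave it untouched and move on
        (recur (zip/next loc))))))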
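transform-values uses the usual clojure.zip edit loop: step with zip/next until zip/end? is true, then rebuild the edited tree with zip/root. The sketch below isolates that pattern on a simpler, hypothetical task (the name replace-leaves and the keyword-leaf tree shape are assumptions for illustration, not part of the original gist): every leaf that is a key of the map gets swapped for its value.

```clojure
(require '[clojure.zip :as zip])

(defn replace-leaves
  "Hypothetical helper: walks a nested vector depth-first and replaces every
  leaf that is a key of values-map with the corresponding value."
  [tree values-map]
  (loop [loc (zip/vector-zip tree)]
    (cond
      ;; walk finished: zip/root rebuilds the tree with all edits applied
      (zip/end? loc) (zip/root loc)
      ;; leaf whose value is a known key: replace it, then continue
      (and (not (zip/branch? loc)) (contains? values-map (zip/node loc)))
      (recur (zip/next (zip/replace loc (get values-map (zip/node loc)))))
      ;; anything else: move on in depth-first order
      :else (recur (zip/next loc)))))
```

For example, `(replace-leaves [1 [:a 2] [:b [:a]]] {:a 10 :b 20})` rewrites both `:a` leaves and the `:b` leaf while leaving the vector nesting intact.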
erasmas / gist:dd94da1404f46abdcdca
Last active August 29, 2015 14:09
Collecting unique values from multiple columns in Cascalog. https://groups.google.com/forum/#!topic/cascalog-user/CjpApzUiwHw
(defparallelagg collect-set
  :init-var (mapfn [s] #{s})
  :combine-var into
  :present-var identity)

(let [set->string (mapfn [s1 s2] (clojure.string/join "," (clojure.set/union s1 s2)))]
  (??<- [?id ?fruits]
        ([["1" "banana" "grape"]
          ["1" "apple" "apple"]
          ["1" "apple" "lemon"]
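collect-set is a parallel aggregator: each value is first wrapped in a one-element set (the init step), and partial sets are then merged with `into` (the combine step). The plain-Clojure sketch below models only those semantics outside Cascalog; the function names are hypothetical, and it is not how Cascalog actually schedules the work. It shows why the approach is safe to parallelize: because merging sets with `into` is associative, combining per-chunk partial results gives the same answer as one sequential pass.

```clojure
;; init: wrap a single value in a set (mirrors :init-var)
(defn init-set [v] #{v})

;; combine: merge two partial sets (mirrors :combine-var into)
(defn combine-sets [s1 s2] (into s1 s2))

(defn collect-set-local
  "Hypothetical model: distinct values of a sequence via init + combine."
  [values]
  (reduce combine-sets #{} (map init-set values)))

(defn collect-set-chunked
  "Same result, but builds a partial set per chunk first and merges those,
  similar in spirit to combining partial results before the final reduce."
  [values chunk-size]
  (->> (partition-all chunk-size values)
       (map collect-set-local)
       (reduce combine-sets #{})))
```

With the sample rows above, `(mapcat rest rows)` yields the fruits from both columns, and either function collapses them to `#{"banana" "grape" "apple" "lemon"}`.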
erasmas / gist:eaff8ee6f9feb7700622
Created November 12, 2014 16:54
Using aggregations in Cascalog
(def test-tap [["a" "x1" 0 1 0]
               ["a" "x2" 1 0 1]
               ["b" "bar" 0 1 0]
               ["b" "foo" 1 1 0]])

(defaggregatefn dosum
  ([] 0)
  ([state val] (+ state val))
  ([state] [state]))
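dosum defines three arities: the zero-argument one produces the initial state, the two-argument one folds a value into the state, and the one-argument one presents the final result as a tuple. The sketch below drives an identical plain function by hand so the role of each arity is visible. It assumes the aggregator is applied once per group, which is roughly how a grouped Cascalog query uses it. The names dosum-local, run-aggregator and aggregate-by are hypothetical and do not belong to the Cascalog API.

```clojure
(defn dosum-local
  "Plain-Clojure stand-in with the same three arities as dosum."
  ([] 0)                          ; init: starting state for a group
  ([state val] (+ state val))     ; step: fold one value into the state
  ([state] [state]))              ; present: emit the result as a one-element tuple

(defn run-aggregator
  "Runs init, then step for every value, then present, for one group."
  [agg values]
  (agg (reduce agg (agg) values)))

(defn aggregate-by
  "Groups rows by key-fn and aggregates the val-fn column of each group."
  [agg rows key-fn val-fn]
  (into {} (for [[k group] (group-by key-fn rows)]
             [k (run-aggregator agg (map val-fn group))])))
```

On the test-tap rows, summing the third column (index 2) per first-column key gives `{"a" [1] "b" [1]}`, and an empty group presents the initial state as `[0]`.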
You have an array of integers, and for each index you want to find the product of every integer except the integer at that index.
Write a function get_products_of_all_ints_except_at_index() that takes an array of integers and returns an array of the products.
For example, given:
[1, 7, 3, 4]
your function would return:
[84, 12, 28, 21]
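One common way to solve this without division, which also handles zeros, is to multiply a running product of everything before each index by a running product of everything after it. Below is a minimal Clojure sketch of that prefix/suffix approach; the function name is the exercise's name translated to Clojure's kebab-case. It uses `*'` so that large products are promoted to BigInt rather than overflowing.

```clojure
(defn get-products-of-all-ints-except-at-index
  "Returns a vector whose i-th entry is the product of every element of ints
  except the one at index i. Runs in O(n) without using division."
  [ints]
  (let [v      (vec ints)
        ;; before[i] = product of v[0..i-1]; (reductions *' 1 v) yields n+1 prefix products
        before (vec (butlast (reductions *' 1 v)))
        ;; after[i] = product of v[i+1..n-1], built the same way from the reversed vector
        after  (vec (reverse (butlast (reductions *' 1 (rseq v)))))]
    (mapv *' before after)))
```

For `[1 7 3 4]` the prefix products are `[1 1 7 21]` and the suffix products are `[84 12 4 1]`. Multiplying them pairwise gives `[84 12 28 21]`.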
erasmas / WriteToParquetExample.java
Last active March 29, 2017 05:47
Cascalog workflow to copy data from CSV to Parquet. How do I fix it so that the schema field names are not prefixed with '?'?
/**
* The same workflow written in plain Cascading; here the output fields in the Parquet file are fine and are not prefixed with '?'.
*/
package cascading.sandbox;
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
erasmas / gist:0d946ca76a17fb309bee
Last active August 29, 2015 14:07
Matrix trace using map-indexed from the clatrix library.
(require '[clatrix.core :as cl])
(import '[clatrix.core Matrix])

(defn matrix-trace
  [^Matrix m]
  (->> m
       (cl/map-indexed (fn [i j v] (if (= i j) v 0)))
       (mapv #(reduce + %))
       (reduce +)))
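The clatrix version visits every entry and zeroes out the off-diagonal ones before summing. The trace only needs the n diagonal entries, though. For comparison, here is a sketch that assumes the matrix is stored as a plain vector of row vectors instead of a clatrix Matrix. It reads entry i of row i directly and uses only clojure.core.

```clojure
(defn trace
  "Sum of the main diagonal of a square matrix given as a vector of row vectors.
  Assumes every row i has at least i+1 entries."
  [rows]
  (reduce + (map-indexed (fn [i row] (nth row i)) rows)))
```

For example, `(trace [[1 2 3] [4 5 6] [7 8 9]])` sums 1, 5 and 9. For an empty matrix, `reduce +` returns 0.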

Keybase proof

I hereby claim:

  • I am erasmas on github.
  • I am dmorozov (https://keybase.io/dmorozov) on keybase.
  • I have a public key whose fingerprint is 6618 55DB F07F 1E3D A04B 3114 ED24 AA27 0634 E2A0

To claim this, I am signing this object:

1. Writool - a tool for writers

An online Markdown text editor capable of generating PDF and HTML. It offers a set of useful features for writers:

  • spelling correction
  • grammar checker
  • translations
  • synonyms

2. StoryTeller