BOOM Analytics: Exploring Data-Centric, Declarative Programming for the Cloud
(ns creator.core
  (:require-macros [cljs.core.async.macros :refer [go]])
  (:require [om.core :as om :include-macros true]
            [cljs.core.async :as async :refer [chan <! >! put!]]
            [om-tools.core :refer-macros [defcomponent]]
            [cljs.reader :as reader]
            [goog.dom :as gdom]
            [om-tools.dom :as dom :include-macros true])
  (:import [goog.net XhrIo]))
Quick summary:
Alienation is one of the ways that capitalism sucks. It's a symptom that something's not right, not the underlying cause. Alienation is something that happens because of the way that capitalism is built.
In short, alienation is a separation between things that should be together. This separation causes tension.
Four ways that capitalism is alienating:
{
  "1": "7642c4c2e24b176ed442153e3097e819bf5ca80e",
  "2": "b3d784bbaf1286612ad44308231aa58350da17a6",
  "3": "a05cb173c1ba1c6ae30bcf5edbd5d5e19566e764",
  "4": "55d264f8671b50b5dbaffa56d1ee719fd429e8f4",
  "5": "478ec952d186329f825db4ee7978e31dd42de622",
  "6": "e82bd73e8c188d151ed790b5ad5a24fa01533fa0",
  "7": "28a9dfb84ef81592f21ed285d7cf09be0b605988",
  "8": "e98c13fc9df6a893efa0c57bcfa548be664ab8c8",
  "9": "af316e6df5bded14f9b268fcbffe7892eb5f244c",
stems = ARGF.read
  .split
  .each_cons(2)
  .group_by { |word_pair| word_pair[0] }
def next_word ary
  ary[rand(ary.length).to_i][1]
end
e = Enumerator.new do |e|
N.B. This is now a library, thanks to the efforts of the wonderful @mtnygard. And the README does a good job of making clear just how terrible an idea it is to actually do this. :)
As any Clojurist knows, the REPL is an incredibly handy development tool. It can also be useful as a tool for debugging running programs. Of course, this raises the question of how to limit access to the REPL to authorized parties. With the Apache SSHD library, you can embed an SSH server in any JVM process. It takes only a little code to hook this up to a REPL, and to limit access either by public key or
This simple script will take a picture of a whiteboard and use parts of the ImageMagick library with sane defaults to clean it up tremendously.
The script is here:
#!/bin/bash
convert "$1" -morphology Convolve DoG:15,100,0 -negate -normalize -blur 0x1 -channel RBG -level 60%,91%,0.1 "$2"
Allocation = Struct.new(:project, :person, :start, :finish) do
  include Comparable
  def mergable_with?(other)
    if start > other.start
      other.mergable_with?(self)
    else
      same_assignment?(other) &&
        finish >= other.start - 1
    end
Recent versions of Cloudera's Impala added NDV, a "number of distinct values" aggregate function that uses the HyperLogLog algorithm to estimate this number, in parallel, in a fixed amount of space.
This can make a really, really big difference: in a large table I tested this on, which had roughly 100M unique values of mycolumn, using NDV(mycolumn) got me an approximate answer in 27 seconds, whereas the exact answer using count(distinct mycolumn) took ... well, I don't know how long, because I got tired of waiting for it after 45 minutes.
It's fun to note, though, that because of another recent addition to Impala's dialect of SQL, the fnv_hash function, you don't actually need to use NDV; instead, you can build HyperLogLog yourself from mathematical primitives.
HyperLogLog hashes each value it sees and then assigns it to a bucket based on the low-order bits of the hash. It's common to use 1024 buckets, so we can get the bucket by using a bitwise & with 1023:
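Before the SQL, here is a rough sketch of the same bucketing step outside the database, in Python. It uses a hand-rolled 64-bit FNV-1a hash as a stand-in; that it produces the same values as Impala's fnv_hash is an assumption, not something verified here, and the names fnv1a_64 and bucket_for are made up for illustration.

```python
# Sketch only: a 64-bit FNV-1a hash used as a stand-in for Impala's fnv_hash.
# Whether the two agree bit for bit is an assumption, not a verified fact.
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF

NUM_BUCKETS = 1024  # a power of two, so NUM_BUCKETS - 1 is an all-ones bit mask


def fnv1a_64(data: bytes) -> int:
    """Hash bytes with 64-bit FNV-1a (hypothetical helper for this sketch)."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64  # keep the running hash within 64 bits
    return h


def bucket_for(value: str) -> int:
    """Pick a bucket from the low-order 10 bits of the hash."""
    return fnv1a_64(value.encode("utf-8")) & (NUM_BUCKETS - 1)


if __name__ == "__main__":
    for v in ["apple", "banana", "cherry"]:
        print(v, bucket_for(v))
```

Masking with 1023 works only because 1024 is a power of two; for a non-negative hash it gives the same result as taking the hash modulo 1024.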
select