- Cube is awesome
- streaming dashboards, yay
- Etsy says so
- flexible, pretty, and streaming: meets our needs
- what's so great?
- metrics and events
- events are the raw data points
- metrics are calculation caches over those events
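
A rough sketch of that split, assuming a stock Cube install (collector on localhost:1080, evaluator on localhost:1081) and a made-up 'request' event type with made-up fields -- events are the raw facts you write once, while metrics are calculations Cube caches per time-slice so repeated queries don't rescan the raw events:

require 'net/http'
require 'json'
require 'time'
require 'uri'

# An event is a raw, immutable fact. Cube's collector takes a JSON array of
# {type, time, data} documents; the 'request' type and its data fields here
# are placeholders, not anything Cube prescribes.
event = [{
  type: 'request',
  time: Time.now.utc.iso8601,
  data: { path: '/search', duration_ms: 42 }
}]
Net::HTTP.post(URI('http://localhost:1080/1.0/event/put'),
               JSON.generate(event),
               'Content-Type' => 'application/json')

# A metric is a calculation over those events, computed on demand and cached
# per time-slice by the evaluator, so asking for the same hourly sums again
# hits the cache instead of rescanning the raw events.
metric = URI('http://localhost:1081/1.0/metric')
metric.query = URI.encode_www_form(
  expression: 'sum(request(duration_ms))',
  step:       36e5.to_i,   # one-hour slices, in milliseconds
  limit:      24           # most recent 24 slices
)
puts Net::HTTP.get(metric)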
package com.infochimps.kafka.consumers;

import java.io.IOException;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
Assuming these goals, from most to least important:
- that you, having finally learned (mostly) how to switch inputs, continue to have a system whose behavior you can predict
- reasonable-quality sound in den and living room ...
- ... while playing CDs
- ... while playing radio
- ---- dividing line: essential ^^^ vs. important vvv ----
- ... while playing songs from computer
From this Language Log post about the mysterious language on two postcards from the 1910s --
[core]
	excludesfile = ~/.gitignore
	editor = nano
	askpass = /Users/flip/bin/git-password
	# pager = less -FRSX
	# whitespace = fix,-indent-with-non-tab,trailing-space,cr-at-eol
[credential]
	helper = osxkeychain
[color]
	diff = auto
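Any of these can be set from the shell instead of editing the file by hand; for example, git config --global credential.helper osxkeychain writes the [credential] entry above into your global config.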
require 'configliere' ; Settings.use :commandline
require 'gorillib'
require 'gorillib/data_munging'
require 'pry'

Settings.define :data_root, default: 's3n://bigdata.chimpy.us', description: "directory root for data to process"
Settings.resolve!

Pathname.register_paths(
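Since Settings.use :commandline is in effect, Settings.resolve! also folds in command-line flags, so the data_root default above can be overridden per run; for example (hypothetical script name and local path):

ruby process_data.rb --data_root=file:///tmp/chimpmark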
#!/usr/bin/env ruby
warn [ARGV.inspect, $0]

WULIGN_VERSION = "1.0"

USAGE = %Q{
# h1. wulign -- format a tab-separated file as aligned columns
#
# wulign will intelligently reformat a tab-separated file into a tab-separated,
{
  "fancy_chars.rb": {},
  "hello.txt|jinja": { "allinputs": true }
}
== Overview of Datasets ==
The examples in this book use the "Chimpmark" datasets: a set of freely-redistributable datasets, converted to simple standard formats, with traceable provenance and documented schema. They are the same datasets used in the upcoming Chimpmark Challenge big-data benchmark. The datasets are:

- Wikipedia English-language Article Corpus (wikipedia_corpus; 38 GB, 619 million records, 4 billion tokens) -- the full text of every English-language Wikipedia article, in
- Wikipedia Pagelink Graph (wikipedia_pagelinks; ) --
- Wikipedia Pageview Stats (wikipedia_pageviews; 2.3 TB, about 250 billion records (FIXME: verify num records)) -- hour-by-hour pageview
require 'formula'

class Kindlegen < Formula
  url 'http://s3.amazonaws.com/kindlegen/KindleGen_Mac_i386_v2_5.zip'
  homepage 'http://www.amazon.com/gp/feature.html?docId=1000234621'
  md5 '8daf6956d54df8030b12ec9116945482'
  version '2.5'
  skip_clean 'bin'
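
Assuming the rest of the formula supplies an install step, saving this as kindlegen.rb where Homebrew looks for formulas lets brew install kindlegen fetch the zip from the URL above and verify it against the md5; skip_clean 'bin' tells Homebrew's post-install cleanup pass to leave the bin directory alone.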