
I want to spend a week (during Hacker School alumni reunion week) better understanding performance (probably of things in the Hadoop ecosystem) on a few different dataset sizes (8GB, 100GB, 1TB). I have $1000 of AWS credit that I can spend on this (yay!).

Some things I want:

  • Get a much better grasp on the performance of in-memory operations (put 8GB of data into memory and be done) vs running a distributed map/reduce.
  • Understand what goes into the performance (how much time is spent copying data? sending data over the network? CPU?)
  • Learn something about tradeoffs

I'd love suggestions for experiments to run and setups to use. At work I've been using HDFS / Impala / Scalding, so my current thought is to spend time looking in depth at running a map/reduce with Scalding vs an Impala query vs running a non-distributed job in memory, because I already know about those things. But I'm open to other ideas!
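One way to start on the "what goes into the performance" question before touching a cluster is to time each phase of a small in-memory job separately. The sketch below is only an illustration under stated assumptions: the phase names (`load`, `copy`, `aggregate`), the helper `timed`, and the synthetic key/value records are all hypothetical stand-ins, not anything from Scalding or Impala, and the "copy" phase is just a rough proxy for serialization or shuffle cost.

```python
import time
from collections import Counter


def timed(label, fn, timings):
    """Run fn(), record its wall-clock duration under label, return its result."""
    start = time.perf_counter()
    result = fn()
    timings[label] = time.perf_counter() - start
    return result


def run_in_memory_job(num_records=200_000, num_keys=100):
    """Count records per key in memory, timing each phase separately."""
    timings = {}

    # Phase 1: "load". Build synthetic (key, 1) records, standing in for a file read.
    records = timed(
        "load",
        lambda: [(i % num_keys, 1) for i in range(num_records)],
        timings,
    )

    # Phase 2: "copy". Duplicate the list, a rough proxy for moving data around.
    copied = timed("copy", lambda: list(records), timings)

    # Phase 3: "aggregate". The actual CPU work: sum the values for each key.
    def aggregate():
        counts = Counter()
        for key, value in copied:
            counts[key] += value
        return counts

    counts = timed("aggregate", aggregate, timings)
    return counts, timings


if __name__ == "__main__":
    _, phase_timings = run_in_memory_job()
    for phase, seconds in phase_timings.items():
        print(f"{phase}: {seconds:.4f}s")
```

Running the same breakdown at several record counts gives a baseline to compare against the distributed runs, where the "copy" share would be replaced by real network and disk time.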

@jberkus
jberkus / gist:6b1bcaf7724dfc2a54f3
Last active March 7, 2026 17:59
Finding Unused Indexes
WITH table_scans as (
    SELECT relid,
        tables.idx_scan + tables.seq_scan as all_scans,
        ( tables.n_tup_ins + tables.n_tup_upd + tables.n_tup_del ) as writes,
        pg_relation_size(relid) as table_size
    FROM pg_stat_user_tables as tables
),
all_writes as (
    SELECT sum(writes) as total_writes
    FROM table_scans
@tsiege
tsiege / The Technical Interview Cheat Sheet.md
Last active May 17, 2026 17:30
This is my technical interview cheat sheet. Feel free to fork it or do whatever you want with it. PLEASE let me know if there are any errors or if anything crucial is missing. I will add more links soon.

ANNOUNCEMENT

I have moved this over to the Tech Interview Cheat Sheet Repo, where it has been expanded and even has code challenges you can run and practice against!







This tool is used to compare microbenchmarks across two versions of code. It's
paranoid about nulling out timing error, so the numbers should be meaningful.
It runs the benchmarks many times, scaling the iterations up if the benchmark
is extremely short, and it nulls out its own timing overhead while doing so. It
reports results graphically with a text interface in the terminal.
You first run it with --record, which generates a JSON dotfile with runtimes
for each of your benchmarks. Then you change the code and run again with
--compare, which re-runs and generates comparison plots between your recorded
and current times. In the example output, I did a --record on the master
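The two ideas described above, scaling the iteration count for very short benchmarks and subtracting the harness's own timing overhead, can be sketched roughly as follows. This is not the tool's actual implementation: the function names (`measure`, `_time_loop`, `_noop`) and the thresholds (`min_seconds`, `max_iterations`) are assumptions made up for illustration.

```python
import time


def _time_loop(fn, iterations):
    """Return the wall-clock seconds taken to call fn() `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return time.perf_counter() - start


def _noop():
    """Empty function used to estimate loop and call overhead."""
    pass


def measure(fn, min_seconds=0.01, max_iterations=1_000_000):
    """Estimate seconds per call of fn, with harness overhead subtracted.

    The iteration count grows tenfold until one timed run lasts at least
    min_seconds (or the iteration cap is reached), so very short
    benchmarks are not dominated by timer resolution.
    """
    iterations = 1
    while True:
        elapsed = _time_loop(fn, iterations)
        if elapsed >= min_seconds or iterations >= max_iterations:
            break
        iterations *= 10

    # Time an empty function for the same number of iterations and
    # subtract it, clamping at zero in case noise makes it larger.
    overhead = _time_loop(_noop, iterations)
    per_call = max(elapsed - overhead, 0.0) / iterations
    return per_call, iterations


if __name__ == "__main__":
    seconds, n = measure(lambda: sum(range(100)))
    print(f"{seconds * 1e9:.1f} ns per call over {n} iterations")
```

A record/compare workflow would store the per-call numbers from one run and compare them with a later run; repeating `measure` several times and looking at the spread is what makes such comparisons trustworthy.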
@camillebaldock
camillebaldock / codebar.md
Last active August 29, 2015 14:02
Codebar notes

MEDIUM-TERM QUESTIONS

  • Define a high level mission statement
  • What are we focusing on, and what do we not plan to teach?
  • Why we are not doing Rails, and why you should go back to basics: blog post? disclaimer? (e.g. wordpress/tools/startup-type helpers vs learning how to program from the ground up)
  • Coach inductions
  • Explicit paths/tracks through our training content
  • Development environment surgeries
  • Define roles such as course coordinator/tutorial coordinator/tutorial owner
@ptaoussanis
ptaoussanis / transducers.clj
Last active April 24, 2026 20:17
Quick recap/commentary: Clojure transducers
(comment ; Fun with transducers, v2
;; Still haven't found a brief + approachable overview of Clojure 1.7's new
;; transducers in the particular way I would have preferred myself - so here goes:
;;;; Definitions
;; Looking at the `reduce` docstring, we can define a 'reducing-fn' as:
(fn reducing-fn ([]) ([accumulation next-input])) -> new-accumulation
;; (The `[]` arity is actually optional; it's only used when calling
;; `reduce` w/o an init-accumulator).
@progrium
progrium / prog
Last active January 20, 2017 18:48
playing around with a little bash subcommand environment
#!/bin/bash
cmd-hello() {
  declare desc="Displays a friendly hello"
  declare firstname="$1" lastname="$2"
  echo "Hello, $firstname $lastname."
}
cmd-help() {
declare desc="Shows help information for a command"
@vdm
vdm / ixgbevf-upgrade.sh
Last active November 28, 2019 21:35
ixgbevf 2.16.1 upgrade for AWS EC2 SR-IOV "Enhanced Networking" on Ubuntu 14.04 (Trusty) LTS
ssh ubuntu@n.n.n.n "bash -s -x" -- <ixgbevf-upgrade.sh
@imjasonh
imjasonh / markdown.css
Last active September 3, 2025 22:12
Render Markdown as unrendered Markdown (see http://jsbin.com/huwosomawo)
* {
  font-size: 12pt;
  font-family: monospace;
  font-weight: normal;
  font-style: normal;
  text-decoration: none;
  color: black;
  cursor: default;
}