Andrei Dragomir adragomir

Spying on Hadoop with strace

As you may already know, I really like strace. (It has a whole category on this blog.) So when the people at Big Data Montreal asked if I wanted to give a talk about stracing Hadoop, the answer was YES OBVIOUSLY.

I set up a small Hadoop cluster (1 master, 2 workers, replication set to 1) on Google Compute Engine to get this working, so that's what we'll be talking about. It has one 14GB CSV file, which contains part of a Wikipedia revision history dataset.

Let's start diving into HDFS! (If this is familiar to you, I talked about a lot of this already in Diving into HDFS. There are new things, though! At the end of this we edit the blocks on the data node and see what happens and it's GREAT.)

$ snakebite ls -h /

Internet Scale Services Checklist

A checklist for designing and developing internet-scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."

http://mvdirona.com/jrh/talksandpapers/jamesrh_lisa.pdf

Basic tenets

Does the design expect failures to happen regularly and handle them gracefully?
Have we kept things as simple as possible?

Channels Are Not Enough

... or Why Pipelining Is Not That Easy

Golang Concurrency Patterns for brave and smart.

By @kachayev

Intro
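
For context, here is a minimal channel pipeline (a generator stage feeding a squaring stage). It is a sketch in the widely used "done channel" style, not code from the article; `gen` and `square` are hypothetical names. It shows one piece of extra plumbing a bare channel pipeline needs: a way to tell upstream stages to stop when the consumer quits early.

```go
package main

import "fmt"

// gen emits the given numbers on a channel, stopping early if done is closed.
func gen(done <-chan struct{}, nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			select {
			case out <- n:
			case <-done:
				return
			}
		}
	}()
	return out
}

// square reads from in and emits n*n, also honoring cancellation via done.
func square(done <-chan struct{}, in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			select {
			case out <- n * n:
			case <-done:
				return
			}
		}
	}()
	return out
}

func main() {
	done := make(chan struct{})
	defer close(done) // unblocks any stage still trying to send

	// Take only the first two results; closing done lets the
	// upstream goroutines exit instead of blocking forever.
	results := square(done, gen(done, 1, 2, 3, 4))
	fmt.Println(<-results, <-results) // 1 4
}
```

Without the `done` channel, a consumer that stops reading early would leave both goroutines blocked on their sends for the life of the program.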

Operational PGP

This is a guide on how to email securely.

There are many guides on how to install and use PGP to encrypt email. This is not one of them. This is a guide on secure communication using email with PGP encryption. If you are not familiar with PGP, please read another guide first. If you are comfortable using PGP to encrypt and decrypt emails, this guide will raise your security to the next level.

	/* http://redd.it/2z68di */
	#define _DEFAULT_SOURCE // MAP_ANONYMOUS on glibc >= 2.19
	#define _BSD_SOURCE     // MAP_ANONYMOUS on older glibc
	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <string.h>
	#include <sys/mman.h>

	#define PAGE_SIZE 4096

	#include <stdlib.h>
	#include <stdio.h>
	#include <stdarg.h>
	#include <new>

	// compile and run:
	// g++ -std=c++11 -nodefaultlibs -fno-exceptions -fno-rtti -o test.o -c test.cpp && gcc -o test test.o && ./test

	static void panic(const char *format, ...) __attribute__ ((noreturn)) __attribute__ ((format (printf, 1, 2)));
	static void panic(const char *format, ...) {

	var allDone = function(done){ // takes a callback that will be called as callback(errors, values) when all async parallel operations are finished.
	var context = {task: {}, data: {}};
	context.end = function(e,v){ return done(e,v), done = function(){} }; // this can always be called if you want to terminate early, like because an error.
	context.add = function(fn, id){ // if the async operation you are doing replies with standard fn(err, value) then just pass in a string ID, else your own callback and ID.
	context.task[id = (typeof fn == 'string')? fn : id] = false;
	var next = function(err, val){
	context.task[id] = true; // good, we're done with this one!
	if(err){ (context.err = context.err || {})[id] = err } // record errors.
	context.data[id] = val; // record the values.
	for(var i in context.task){ if(context.task.hasOwnProperty(i)){ // loop over the async task checker

	require "erb"
	require "pathname"

	DOT_TEMPLATE=<<-END
	digraph {
	size="20,20";
	overlap=false;
	sep=0.4;
	graph [fontname=Helvetica,fontsize=10];
	node [fontname=Helvetica,fontsize=10];

	(Chapters marked with * are already written. This gets reorganized constantly
	and 10 or so written chapters that I'm on the fence about aren't listed.)

	Programmer Epistemology
	* Dispersed Cost vs. Reduced Cost
	* Verificationist Fallacy
	* Mistake Metastasis
	The Overton Window
	Epicycles All The Way Down
	The Hyperspace Gates Were Just There