
MapReduce Patterns

Roy Keyes

17 Sep 2014 - This is a post on my blog.

MapReduce is a powerful programming model for processing large data sets in a distributed, parallel manner. It has proven very popular for many data processing tasks, particularly through the open source Hadoop implementation.

MapReduce basics

The most basic idea powering MapReduce is to break large data sets into smaller chunks, which are then processed separately (in parallel). The results of the chunk processing are then collected.
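
The chunk-then-collect idea can be sketched in plain Python, without Hadoop. What follows is a minimal, single-machine illustration that uses word counting as the task. The function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) and the sample data are assumptions made up for this example and are not part of any MapReduce framework.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(items, chunk_size):
    """Break a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2):
    """Hypothetical driver: split, map each chunk, then collect the results."""
    chunks = split_into_chunks(lines, chunk_size)
    # Each chunk is processed on its own; a real cluster would run these in parallel.
    partial_results = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partial_results, Counter())


if __name__ == "__main__":
    sample = ["the quick brown fox", "jumps over the lazy dog", "the end"]
    print(word_count(sample))
```

Because `map_chunk` only ever looks at its own chunk, the list comprehension could in principle be swapped for a parallel map (for example a process pool) without touching the reduce step.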

	from math import sqrt

	def put_kernels_on_grid(kernel, pad=1):
	    '''Visualize conv. filters as an image (mostly for the 1st layer).

	    Arranges filters into a grid, with some padding between adjacent filters.

	    Args:
	        kernel: tensor of shape [Y, X, NumChannels, NumKernels]
	        pad: number of black pixels around each filter (between them)
	    '''

	# 0 is too far from ` ;)
	set -g base-index 1

	# Automatically set window title
	set-window-option -g automatic-rename on
	set-option -g set-titles on

	#set -g default-terminal screen-256color
	set -g status-keys vi
	set -g history-limit 10000