ionas

Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. It does not cover setting up Hadoop and related components, only the bare minimum needed to crawl and index websites on a single machine.

Terms

Nutch - the crawler (fetches and parses websites)
HBase - the data store for Nutch (a database from the Hadoop ecosystem)

PostgreSQL on Lion

Much to my dismay, PostgreSQL is not as trouble-free to install and configure on OS X as it has been for me on a variety of Linux distributions. This Gist documents how to overcome these issues using the Homebrew installation of PostgreSQL.

This is an effort to document the setup for a coworker who was recently having trouble with Postgres on Lion.

Installation

This is usually the easy part:

OS X Preferences

Most of these require a logout or restart to take effect.

# Enable character repeat on keydown
defaults write -g ApplePressAndHoldEnabled -bool false

# Set a shorter delay until key repeat

	#!/bin/bash
	# Warning: This is NOT a production script; it is for local dev environments only!

	echo "### INSTALL/UPDATE ###";
	php composer.phar selfupdate

	git pull

	php composer.phar install --prefer-dist --no-dev --optimize-autoloader --no-interaction

	# Google Art Project fullsize image downloader.
	# By Henrik Nyh <http://henrik.nyh.se> 2011-02-05 under the MIT license.
	# Requires Ruby and ImageMagick.
	#
	# Usage e.g.:
	# ruby google_art_project.rb http://www.googleartproject.com/museums/tate/portrait-of-william-style-of-langley-174
	#
	# You can specify multiple URLs on the command line, separated by space.
	# Or you can specify no URLs on the command line and instead list them at the end of this file, one on each line,
	# with "__END__" before the list.
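
The two ways of supplying URLs described in the usage notes above (command-line arguments, or a list after `__END__`) can be sketched roughly as follows. This is not the original script: `collect_urls` is a hypothetical helper name, and the sketch assumes one URL per line in the trailing data section, with blank lines ignored.

```ruby
# Hypothetical helper (not taken from the original script).
# Prefers URLs passed on the command line; if there are none, falls back to
# reading one URL per line from the given IO (e.g. Ruby's DATA constant).
def collect_urls(argv, data_io)
  return argv.dup unless argv.empty?
  return [] if data_io.nil?

  # Strip surrounding whitespace and skip blank lines.
  data_io.each_line.map(&:strip).reject(&:empty?)
end

if __FILE__ == $PROGRAM_NAME
  # DATA is only defined when the running script contains an __END__ marker.
  data = defined?(DATA) ? DATA : nil
  collect_urls(ARGV, data).each { |url| puts url }
end
```

Passing the IO object in as a parameter, rather than reading `DATA` directly inside the helper, keeps the fallback logic easy to try out with any in-memory list of URLs.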


	/*-
	* fuse_wait - Light wrapper around a FUSE mount program that waits
	* for the "mounted" notification before exiting.
	*
	* Copyright (C) 2007-2009 Erik Larsson
	*
	* This program is free software; you can redistribute it and/or
	* modify it under the terms of the GNU General Public License
	* as published by the Free Software Foundation; either version 2

	class ApplicationModel < ActiveRecord::Base
	  self.abstract_class = true

	  class_attribute :formatted_attributes_options
	  self.formatted_attributes_options = {}

	  def self.formatted_attributes(*attributes)
	    options = attributes.extract_options!
	    attributes.each do |attribute|
	      formatted_attributes_options[attribute] = options