Edgar Marca matiskay

OS X Screencast to animated GIF

This gist shows how to create a GIF screencast using only free OS X tools: QuickTime, ffmpeg, and gifsicle.

Instructions

To capture the video (filesize: 19MB), using the free "QuickTime Player" application:

Text Classification

To demonstrate text classification with scikit-learn, we'll build a simple spam filter. The filters that services like Gmail run in production are far more sophisticated, but the model we'll have by the end of this chapter is effective and surprisingly accurate.

Spam filtering is the "hello world" of document classification, but keep in mind that we aren't limited to two classes. The classifier we'll be using supports multi-class classification, which opens up applications such as author identification and support email routing. In this example, however, we'll stick to two classes: SPAM and HAM.

For this exercise, we'll use a combination of the Enron-Spam data sets and the SpamAssassin public corpus. Both are publicly available and are retrieved from the internet during the setup phase of the example code that accompanies this chapter.
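As a rough illustration of the idea, here is a minimal sketch of a two-class text classifier in plain Python. It is not the scikit-learn model this chapter builds: it assumes a multinomial naive Bayes classifier with add-one smoothing (the text above does not name the classifier), and the `NaiveBayes` class, the `tokenize` helper, and the four training messages are all made up for the example rather than taken from the Enron-Spam or SpamAssassin data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper for illustration only; not scikit-learn's API.
    """

    def fit(self, texts, labels):
        # Number of training documents per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Per-class word frequencies.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every word seen in any class during training.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add log likelihoods per word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        # Return the class with the highest log score.
        return max(scores, key=scores.get)


# Made-up training messages, not drawn from the real corpora.
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
train_labels = ["SPAM", "SPAM", "HAM", "HAM"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction_spam = model.predict("claim your free money")
prediction_ham = model.predict("agenda for the team meeting")
print(prediction_spam, prediction_ham)
```

Nothing in the sketch is specific to two labels: `fit` accepts any set of label strings, which is the sense in which a classifier of this kind extends to multi-class problems.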

Loading Examples

Dataset Resources

UCI Datasets: http://archive.ics.uci.edu/ml/datasets.html
Enron Spam Datasets: http://csmining.org/index.php/enron-spam-datasets.html
SpamAssassin Spam Dataset: http://spamassassin.apache.org/publiccorpus/
2006 ECML Discovery Challenge: http://ecmlpkdd2006.org/challenge.html

Setting up a new Ubuntu server

	# Mathieu Blondel, September 2010
	# License: BSD 3 clause

	import numpy as np
	from numpy import linalg
	import cvxopt
	import cvxopt.solvers

	def linear_kernel(x1, x2):
	    # Linear kernel: the plain dot product of the two inputs
	    return np.dot(x1, x2)

	library('testthat');

	test_that('The Standard Deviation of n numbers', {
	  x <- c(1, 2, 3, 4, 5, 6, 7, 8);
	  sd <- sd(x);

	  # sd(x) is sqrt(6); compare with a tolerance that matches the rounded literal
	  expect_that(sd, equals(2.449490, tolerance = 1e-6))
	});

	javascript:( function() {
	  console.group( 'Performance Information for all entries of ' + window.location.href );
	  console.log( '\n-> Duration is displayed in ms\n ' )

	  var entries = window.performance.getEntries();

	  entries = entries.sort( function( a, b ) {
	    return b.duration - a.duration;
	  } );

	#!/bin/bash

	# config
	IMGS=(
	"
	+ o + + o \n\
	+ o o + + \n\
	+ o + o + \n\
	o + 0 \n\
	o + + + + \n\

	SESSION_PAGE="https://streamza.com/api/sessions/_login"


	COOKIE_FILE="wget-cookies.txt"
	USERNAME="USERNAME"
	PASSWORD="PASSWORD"

	FILE_URL="FILE_TO_DOWNLOAD"

	if [[ ! -f $COOKIE_FILE ]]; then

	require 'mechanize'

	@username = 'USERNAME'
	@password = 'PASSWORD'
	@download_path = File.expand_path '~/videos'
	@wget_cookie = File.expand_path(File.dirname(__FILE__)) + '/wget-cookies.txt'

	unless File.directory? @download_path
	  puts "#{@download_path} doesn't exist!"
	  exit