Edgar Marca matiskay

@matiskay
matiskay / svm.py
Created August 28, 2014 03:12 — forked from mblondel/svm.py
# Mathieu Blondel, September 2010
# License: BSD 3 clause
import numpy as np
from numpy import linalg
import cvxopt
import cvxopt.solvers
def linear_kernel(x1, x2):
    return np.dot(x1, x2)
library('testthat');
test_that('The Standard Deviation of n numbers', {
  x <- c(1, 2, 3, 4, 5, 6, 7, 8);
  sd <- sd(x);
  expect_that(sd, equals(2.449490))
});
javascript:( function() {
  console.group( 'Performance Information for all entries of ' + window.location.href );
  console.log( '\n-> Duration is displayed in ms\n ' );
  var entries = window.performance.getEntries();
  entries = entries.sort( function( a, b ) {
    return b.duration - a.duration;
  } );

OS X Screencast to animated GIF

This gist shows how to create a GIF screencast using only free OS X tools: QuickTime, ffmpeg, and gifsicle.


Instructions

To capture the video (filesize: 19MB), using the free "QuickTime Player" application:

#!/bin/bash
# config
IMGS=(
"
+ o + + o \n\
+ o o + + \n\
+ o + o + \n\
o + 0 \n\
o + + + + \n\
SESSION_PAGE="https://streamza.com/api/sessions/_login"
COOKIE_FILE="wget-cookies.txt"
USERNAME="USERNAME"
PASSWORD="PASSWORD"
FILE_URL="FILE_TO_DOWNLOAD"
if [[ ! -f $COOKIE_FILE ]]; then
matiskay / peepcode.rb
Last active January 1, 2016 11:28 — forked from gertig/peepcode.rb
require 'mechanize'
@username = 'USERNAME'
@password = 'PASSWORD'
@download_path = File.expand_path '~/videos'
@wget_cookie = File.expand_path(File.dirname(__FILE__)) + '/wget-cookies.txt'
unless File.directory? @download_path
  puts "#{@download_path} doesn't exist!"
  exit

Text Classification

To demonstrate text classification with scikit-learn, we'll build a simple spam filter. The filters used in production by services like Gmail are far more sophisticated, but the model we'll have by the end of this chapter is effective and surprisingly accurate.

Spam filtering is the "hello world" of document classification, but keep in mind that we aren't limited to two classes. The classifier we will be using supports multi-class classification, which opens up applications such as author identification and support email routing. In this example, however, we'll stick to two classes: SPAM and HAM.

For this exercise, we'll be using a combination of the Enron-Spam data sets and the SpamAssassin public corpus. Both are publicly available for download and are retrieved from the internet during the setup phase of the example code that goes with this chapter.
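
Before loading the real corpora, it may help to see the overall shape of such a classifier. The following is only a minimal sketch: the text above does not name the classifier, so the choice of `CountVectorizer` with `MultinomialNB` is an assumption, and the six example messages are invented for illustration rather than taken from the Enron-Spam or SpamAssassin data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-written corpus, invented for illustration only
# (NOT drawn from the Enron-Spam or SpamAssassin data).
texts = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting agenda for monday",
    "lunch on monday with the team",
    "please review the project agenda",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Assumed model: bag-of-words counts fed into a multinomial naive Bayes
# classifier. The chapter's actual classifier may differ.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Classify two unseen messages.
predictions = model.predict(["free prize money", "agenda for the team meeting"])
print(predictions.tolist())
```

Nothing in this sketch is specific to two classes: adding messages with a third label to `texts` and `labels` would train a three-way classifier with the same code.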

Loading Examples