@mattb
mattb / gist:3888345
Created October 14, 2012 11:53
Some pointers for Natural Language Processing / Machine Learning

Here are the areas I've been researching, some things I've read and some open source packages...

Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model
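As a rough illustration of the vector space idea, here is a minimal bag-of-words sketch in plain Python. The function names (`build_vocabulary`, `to_vector`) and the whitespace tokenisation are assumptions made for this example, not something taken from the linked article; real pipelines usually tokenise more carefully.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def to_vector(document, vocabulary):
    """Return a term-count vector with one entry per vocabulary token."""
    counts = Counter(document.lower().split())
    # dicts preserve insertion order, so this follows the column indices
    return [counts.get(tok, 0) for tok in vocabulary]


docs = ["the cat sat", "the cat saw the dog"]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
```

Each document becomes a list of counts over the same vocabulary, so documents can be compared with ordinary vector operations. Word order is lost at this step.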

The vectors are often reweighted with a transform such as TF-IDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
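To make the weighting concrete, here is a small sketch of one common TF-IDF variant: term frequency (count divided by document length) times `log(N / df)`. There are many variants with different smoothing, so treat this as one possible formulation under those assumptions, not the definitive one. The function name `tf_idf` and the sample documents are made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            tok: (count / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
weights = tf_idf(docs)
```

With these documents, "the" appears everywhere and gets a weight of zero, while "sat" (unique to one document) outweighs "cat" (shared by two).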

Collocation detection is a technique for finding two or more words that occur together more often than they do separately (e.g. "wishy-washy" in English). I use this to group words into n-gram tokens, because many NLP techniques treat each word as if it were independent of all the others in a document, ignoring order: http://matpalm.com/blog/2011/10/22/collocations_1/
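A minimal sketch of the idea, using pointwise mutual information (PMI) over adjacent word pairs. PMI is one common scoring choice and is an assumption here; the linked post may score pairs differently. The names `bigram_pmi` and `merge_collocations`, the `min_count` cut-off and the threshold of 1.0 are all illustrative.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by log2(P(a,b) / (P(a) * P(b)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def merge_collocations(tokens, collocations):
    """Greedily join pairs found in `collocations` into single tokens."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "new york is big . new york is old . the big city is new".split()
scores = bigram_pmi(tokens)
collocations = {pair for pair, score in scores.items() if score > 1.0}
merged = merge_collocations(tokens, collocations)
```

After merging, a pair such as "new york" becomes a single token ("new_york") that downstream bag-of-words steps see as one unit.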

@max-mapper
max-mapper / helloworld.js
Created November 27, 2012 06:55
droneduino
var serialport = require('node-serialport')
var sp = new serialport.SerialPort("/dev/ttyO3", {
  parser: serialport.parsers.raw,
  baud: 9600
})
sp.on('data', function(chunk) {
  console.log(chunk.toString('hex'), chunk.toString(), chunk)
})

Changing the Public API

When I started looking for ways to help on the "github.com/alecthomas/gozmq" package a few months ago, a recurring topic in discussions was that an earlier decision to provide a hidden struct type through a public interface type had become restrictive. We couldn't just drop the interface and make the struct public because that would be backwards-incompatible. Or could we?

No

This had to be the first option considered, and it was the standing decision at the time. If we could continue providing what the package aims to provide without breaking backwards compatibility, then by all means we should not break it.

Unfortunately this meant we couldn't provide all that we aimed to provide. Or could we?

@creationix
creationix / run.js
Last active March 7, 2017 18:36
universal callback/continuable/thunk generator runner
function run(generator) {
  // Pass in resume for no-wrap function calls
  var iterator = generator(resume);
  var data = null, yielded = false;
  next();
  check();
  function next(item) {
    var cont = iterator.next(item).value;
@karlseguin
karlseguin / dnscache.go
Created June 12, 2013 07:43
Cache DNS responses and refresh them using a single goroutine on a 1 minute timer. Avoids having a spike of threads from cgo launched under load.
package dnscache

import (
	"math/rand"
	"net"
	"sync"
	"time"
)

var (