@dpressel
dpressel / gist:d2961a8775b3ed7798d0
Created February 14, 2015 21:12
Distance between points on the earth (Vincenty algorithm) in JS
// Vincenty distance in JS
(function (geof) {
var A = 6378137.0 // Meters
// Flattening
var F = 1/298.257223563
// Semi-minor axis
var B = A * (1.0 - F) // 6356752.31424518
// First eccentricity squared
var ESQR = 2*F - (F*F)
var METERS_TO_MILES = 0.000621371192
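The preview above stops after the ellipsoid constants. As a rough illustration of how those constants are typically used, here is a minimal Python sketch of the standard Vincenty inverse formula. It is not the gist's JavaScript: the function name `vincenty_meters` and the `max_iter` and `tol` parameters are assumptions made for this example, and only the constants are taken from the preview.

```python
import math

# WGS-84 constants, same values as in the preview above.
A = 6378137.0                    # semi-major axis, meters
F = 1 / 298.257223563            # flattening
B = A * (1.0 - F)                # semi-minor axis, meters
METERS_TO_MILES = 0.000621371192


def vincenty_meters(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Ellipsoidal distance in meters between two points given in degrees.

    Hypothetical helper: a sketch of Vincenty's inverse formula.
    """
    # Reduced latitudes.
    u1 = math.atan((1.0 - F) * math.tan(math.radians(lat1)))
    u2 = math.atan((1.0 - F) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            # Coincident points (exact antipodes are not handled in this sketch).
            return 0.0
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos_sq_alpha = 1.0 - sin_alpha ** 2
        # On equatorial lines cos_sq_alpha is 0 and this term is taken as 0.
        cos_2sm = (cos_sigma - 2.0 * sin_u1 * sin_u2 / cos_sq_alpha
                   if cos_sq_alpha > 0.0 else 0.0)
        c = F / 16.0 * cos_sq_alpha * (4.0 + F * (4.0 - 3.0 * cos_sq_alpha))
        lam_prev = lam
        lam = big_l + (1.0 - c) * F * sin_alpha * (
            sigma + c * sin_sigma * (
                cos_2sm + c * cos_sigma * (-1.0 + 2.0 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge "
                         "(nearly antipodal points?)")

    u_sq = cos_sq_alpha * (A * A - B * B) / (B * B)
    big_a = 1.0 + u_sq / 16384.0 * (
        4096.0 + u_sq * (-768.0 + u_sq * (320.0 - 175.0 * u_sq)))
    big_b = u_sq / 1024.0 * (
        256.0 + u_sq * (-128.0 + u_sq * (74.0 - 47.0 * u_sq)))
    delta_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4.0 * (
        cos_sigma * (-1.0 + 2.0 * cos_2sm ** 2)
        - big_b / 6.0 * cos_2sm
        * (-3.0 + 4.0 * sin_sigma ** 2) * (-3.0 + 4.0 * cos_2sm ** 2)))
    return B * big_a * (sigma - delta_sigma)
```

Multiplying the result by `METERS_TO_MILES` gives miles. As a sanity check, one degree of longitude along the equator should come out at about 111.32 km, which is the semi-major axis times π/180.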
dpressel / counts.py
Last active August 29, 2015 14:19
PMI, Sent score for AffLex/NegLex s140 lexicon
import os.path
from collections import Counter
from csv import *
import math
import sys
LOG2_SCALE = 1 / math.log(2)
NEG = 'neg'
POS = 'pos'
dpressel / counts.py
Last active August 29, 2015 14:19
Sentiment score calculation as described in State-of-the-Art in Sentiment Analysis of Short Informal Texts
import os.path
from collections import Counter
from csv import *
import math
import sys
LOG2_SCALE = 1 / math.log(2)
NEG = 'neg'
POS = 'pos'
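The preview is cut off before any scoring logic. The cited paper defines a term's sentiment score as PMI(w, positive) minus PMI(w, negative), which simplifies to a log ratio of class-conditional frequencies. The sketch below shows that computation under stated assumptions: the function name `sentiment_scores`, its arguments, and the add-one `smoothing` term are hypothetical and not taken from the gist. Only `LOG2_SCALE` mirrors the preview.

```python
import math
from collections import Counter

LOG2_SCALE = 1 / math.log(2)  # converts natural log to log base 2


def sentiment_scores(pos_counts, neg_counts, smoothing=1.0):
    """Return {term: PMI(term, pos) - PMI(term, neg)} in bits.

    pos_counts / neg_counts map each term to its frequency in the positive
    and negative corpus. `smoothing` (an assumption of this sketch) is added
    to every term count so that unseen terms do not produce log(0).
    """
    n_pos = sum(pos_counts.values())  # total tokens in the positive corpus
    n_neg = sum(neg_counts.values())  # total tokens in the negative corpus
    scores = {}
    for term in set(pos_counts) | set(neg_counts):
        f_pos = pos_counts.get(term, 0) + smoothing
        f_neg = neg_counts.get(term, 0) + smoothing
        # PMI(w,pos) - PMI(w,neg) = log2( f(w,pos) * N_neg / (f(w,neg) * N_pos) )
        scores[term] = LOG2_SCALE * math.log((f_pos * n_neg) / (f_neg * n_pos))
    return scores


pos = Counter({"good": 3, "ok": 1})
neg = Counter({"bad": 3, "ok": 1})
scores = sentiment_scores(pos, neg)
```

A positive score means the term is more strongly associated with the positive corpus, a negative score with the negative corpus, and a term that is equally frequent in both scores zero.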
dpressel / counts-single.py
Last active August 29, 2015 14:19
Actual sentiment score calculation performed for State-of-the-Art in Sentiment Analysis of Short Informal Texts. Rather than starting from two separately split corpora, it assumes a single corpus in which negated contexts are already marked with a suffix.
import os.path
from collections import Counter
from csv import *
import math
import sys
LOG2_SCALE = 1 / math.log(2)
"""
This version of counts assumes a single corpus, with negated contexts already written.
Here we generate a single lexicon file and
dpressel / ConsumerProducerQueue.h
Created September 16, 2015 14:42
C++11 Consumer Producer Buffer with a single Condition Variable
#ifndef __CONSUMERPRODUCERQUEUE_H__
#define __CONSUMERPRODUCERQUEUE_H__
#include <queue>
#include <mutex>
#include <condition_variable>
/*
* Some references in order
*
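The header preview ends before the class itself. To illustrate the idea named in the description, a bounded buffer where producers and consumers wait on one shared condition variable, here is a small Python sketch. It is not a translation of the gist's C++: the method names `add` and `consume` and the `max_size` parameter are assumptions made for this example.

```python
import threading
from collections import deque


class ConsumerProducerQueue:
    """Bounded FIFO buffer guarded by a single condition variable (sketch)."""

    def __init__(self, max_size=20):
        self._items = deque()
        self._max_size = max_size
        # One condition (with its own lock) serves both "not full" and "not empty".
        self._cond = threading.Condition()

    def add(self, item):
        with self._cond:
            # Block while the buffer is full; re-check after every wakeup.
            while len(self._items) >= self._max_size:
                self._cond.wait()
            self._items.append(item)
            # Wake all waiters, since producers and consumers share the condition.
            self._cond.notify_all()

    def consume(self):
        with self._cond:
            # Block while the buffer is empty; re-check after every wakeup.
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()
            return item

    def __len__(self):
        with self._cond:
            return len(self._items)
```

Because both kinds of waiters sleep on the same condition, the sketch uses `notify_all()` and re-checks each predicate in a `while` loop. Waking a single waiter could pick a thread of the wrong kind and leave the right one asleep.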
dpressel / SumWordVecDatasetReader.java
Created November 12, 2015 13:26
Turn a set of space delimited words and a label into a sum of dense (word vector) representations using medallia's Word2Vec impl.
package org.n3rd.util;
import com.google.common.collect.ImmutableList;
import com.medallia.word2vec.Searcher;
import com.medallia.word2vec.Word2VecModel;
import org.sgdtk.ArrayDouble;
import org.sgdtk.DenseVectorN;
import org.sgdtk.FeatureVector;
import java.io.BufferedReader;
dpressel / OrderedEmbeddedDatasetReader.java
Created November 12, 2015 13:30
Read in a sentence of word vectors with optional padding for conv. using medallia's Word2vec impl.
package org.n3rd.util;
import com.google.common.collect.ImmutableList;
import com.medallia.word2vec.Searcher;
import com.medallia.word2vec.Word2VecModel;
import org.sgdtk.DenseVectorN;
import org.sgdtk.FeatureVector;
import java.io.BufferedReader;
import java.io.File;
dpressel / train_from_json.py
Created April 1, 2016 19:33
Train on preprocessed politeness input
# Check performance of baseline on a held out set that is decimation sampled
# Over the ranks descending
#
# http://www.mpi-sws.org/~cristian/Politeness_files/politeness.pdf
#
import sys
import cPickle
import numpy as np
import nltk.data
import json
dpressel / train_from_csv.py
Created April 1, 2016 19:43
Train on preprocessed politeness corpus sentences to make Linear SVM BoW (baseline)
# Check performance of baseline on a held out set that is decimation sampled
# Over the ranks descending
#
# http://www.mpi-sws.org/~cristian/Politeness_files/politeness.pdf
#
#-----------------------------------------------------------------
# Sample output with and without Punkt sentence processing seems
# to not have much effect
import cPickle
import numpy as np
dpressel / polite-deps.js
Created April 1, 2016 19:49
Preprocess TSV files using Stanford Core NLP to generate dependency and sentence features required to train politeness API
// Nashorn (JS) script to create input documents from post-processed TSV files
// from Stanford Politeness corpus.
// TSV looks like
// 1|-1\tText content
//
// Target data looks much like the data described here:
// https://github.com/sudhof/politeness
//
// Original paper is here
// http://www.mpi-sws.org/~cristian/Politeness_files/politeness.pdf