
David Marx dmarx

dmarx / undirected_to_directed_bipartite_projection.R
Last active January 3, 2017 20:54
Novel (?) technique for inferring a directed bipartite projection from an undirected bipartite graph. Code is for pedagogical demonstration to accompany the article here: http://dmarx.github.io/map-of-reddit-by-active-users/
library(igraph)
# Experiment parameters
n=10 # Primary class (i.e. subreddits)
m=100 # Secondary class (i.e. users)
threshold = .5 # edge threshold
######################################
set.seed(123)
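A minimal sketch of one way to infer a directed projection, in Python. It assumes the directed weight from primary node i to primary node j is the share of i's neighbours (users) that are also neighbours of j. That makes the weight asymmetric: a small community nested inside a large one gets a strong edge toward the large one but not the reverse. This weight definition and the helper names are illustrative assumptions, not necessarily the article's exact method. The `threshold` parameter plays the role of the R parameter above.

```python
import random


def random_bipartite(n, m, p=0.1, seed=123):
    """Random undirected bipartite graph: primary node i -> set of linked secondary nodes."""
    rng = random.Random(seed)
    return {i: {u for u in range(m) if rng.random() < p} for i in range(n)}


def directed_projection(adjacency, threshold=0.5):
    """Project onto the primary class as a directed, weighted graph.

    Assumed weight: w(i -> j) = |N(i) & N(j)| / |N(i)|, the share of i's
    neighbours that are also neighbours of j. Edges whose weight falls
    below `threshold` are dropped.
    """
    edges = {}
    for i, nbrs_i in adjacency.items():
        if not nbrs_i:
            continue  # weight is undefined for a primary node with no neighbours
        for j, nbrs_j in adjacency.items():
            if i != j:
                w = len(nbrs_i & nbrs_j) / len(nbrs_i)
                if w >= threshold:
                    edges[(i, j)] = w
    return edges


# Same sizes as the R parameters: n=10 primary nodes, m=100 secondary nodes
graph = random_bipartite(10, 100, p=0.3)
edges = directed_projection(graph, threshold=0.5)
```

For example, with `{'A': {1, 2, 3, 4}, 'B': {1, 2}}` the weight B -> A is 1.0 while A -> B is only 0.5, so a threshold of 0.6 keeps just the edge from B to A.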
dmarx / election_rage.py
Last active November 8, 2016 22:10
Take the pulse of election-related conversation on reddit by streaming sentences that contain relevant terms
import praw
import string
import re
import nltk
r = praw.Reddit('anger fuel comment monitor, by /u/shaggorama')
targets = ['hillary', 'trump', 'hilary', 'election']
punc_pat = re.compile('[' + re.escape(string.punctuation) + ']')  # escape ], \, ^, - so they are literal inside the class
blacklist = ['AutoModerator', '2016VoteBot']
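A minimal sketch of the filtering step, in Python. It assumes each streamed comment arrives as an `(author, body)` pair, and the praw streaming call itself is omitted. `targets` and `blacklist` are copied from the snippet above, and the punctuation pattern uses `re.escape`. The regex sentence splitter is a hypothetical stand-in for an NLTK tokenizer such as `nltk.sent_tokenize`.

```python
import re
import string

targets = ['hillary', 'trump', 'hilary', 'election']
blacklist = {'AutoModerator', '2016VoteBot'}
# re.escape keeps characters like ] and - literal inside the character class
punc_pat = re.compile('[' + re.escape(string.punctuation) + ']')


def matching_sentences(author, body):
    """Return the sentences of `body` that mention any target term."""
    if author in blacklist:
        return []  # skip bot accounts entirely
    # Naive split after . ! ? followed by whitespace (stand-in for a real tokenizer)
    sentences = re.split(r'(?<=[.!?])\s+', body.strip())
    hits = []
    for sentence in sentences:
        # Replace punctuation with spaces so "trump's" still yields the token "trump"
        tokens = punc_pat.sub(' ', sentence.lower()).split()
        if any(term in tokens for term in targets):
            hits.append(sentence)
    return hits
```

For instance, `matching_sentences('someone', "I like pie. Trump said things! The weather is nice.")` keeps only `"Trump said things!"`.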
dmarx / dynamic_edgelist_demo.r
Created September 2, 2016 20:57
Given an edgelist of a dynamic graph in the form of (timestamp, source, target) triplets, construct a compressed edgelist in the form (onset, terminus, source, target)
#' Try to construct a dynamic network object from an edgelist with sequential timestamps, for use with render.d3movie as described in:
#' https://rpubs.com/kateto/netviz
#'
#install.packages('statnet')
#install.packages("ndtv")
library(igraph)
library(statnet)
library(ndtv)
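The compression step described in the gist summary can be sketched in Python. The approach is to group timestamps by `(source, target)` and merge each run of consecutive integer timestamps into a single spell. Two conventions here are assumptions of this sketch: timestamps are treated as integers, and `terminus` is set to one past the last observed timestamp, giving half-open spells `[onset, terminus)`. Adjust these if the ndtv/networkDynamic input expects something different.

```python
from collections import defaultdict


def compress_edgelist(triplets):
    """(timestamp, source, target) triplets -> sorted (onset, terminus, source, target) spells."""
    times = defaultdict(set)
    for t, source, target in triplets:
        times[(source, target)].add(t)

    spells = []
    for (source, target), ts in times.items():
        ordered = sorted(ts)
        onset = prev = ordered[0]
        for t in ordered[1:]:
            if t == prev + 1:
                prev = t  # still inside the current run of consecutive timestamps
                continue
            spells.append((onset, prev + 1, source, target))  # close the run
            onset = prev = t
        spells.append((onset, prev + 1, source, target))  # close the final run
    return sorted(spells)
```

An edge seen at times 1, 2 and 4 becomes two spells, `(1, 3, ...)` and `(4, 5, ...)`.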
dmarx / binomial_algorithm_benchmarking.py
Created February 29, 2016 19:37
Experiments in homebrewing a binomial CDF (or an approximation to one) to enable a Poisson hypothesis test for "burst" scoring in an Oracle environment, where the following functions are unavailable: factorial, nCr, dbinom, pbinom, dnorm, pnorm.
'''
Binomial coefficient algorithm tests
Doing this in Python just for basic prototyping, but we'll ultimately need to
port this to Oracle.
'''
from __future__ import division
import timeit
import math
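A minimal sketch of one approach that needs only basic arithmetic. The binomial coefficient comes from the multiplicative formula, and the CDF is accumulated with the pmf recurrence pmf(j+1) = pmf(j) · (n − j)/(j + 1) · p/(1 − p). Neither step requires factorial, nCr, or pbinom. The function names are hypothetical, and the sketch is plain Python 3 rather than anything Oracle-specific.

```python
def n_choose_k(n, k):
    """Binomial coefficient via the multiplicative formula (no factorials)."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # use the symmetric, shorter product
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), which is always an integer
        result = result * (n - k + i) // i
    return result


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf via its recurrence."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    pmf = (1 - p) ** n  # pmf(0); can underflow to 0.0 for very large n
    cdf = pmf
    ratio = p / (1 - p)
    for j in range(k):
        pmf *= (n - j) / (j + 1) * ratio  # pmf(j) -> pmf(j + 1)
        cdf += pmf
    return cdf
```

For example, `binom_cdf(2, 4, 0.5)` is (1 + 4 + 6)/16 = 0.6875. For very large n, the same recurrence can be carried out on logarithms to avoid the underflow noted in the comment.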
dmarx / TPOT.export.py
Last active December 18, 2015 17:07
Anticipated form of TPOT.export after refactoring is completed
def export(self, output_file_name):
    """Exports the current optimized pipeline as Python code.

    Parameters
    ----------
    output_file_name: string
        String containing the path and file name of the desired output file

    Returns
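A hypothetical sketch of what such an export body might do: render the pipeline steps as scikit-learn source text and write it to `output_file_name`. The `(class_name, module_name, params)` step representation and the helper names are assumptions for illustration, not TPOT's actual internals. `sklearn.pipeline.make_pipeline` is a real scikit-learn function, but the sketch only writes its name into the generated text and never imports it.

```python
def render_pipeline_code(steps):
    """Render a pipeline as Python source text.

    `steps` is a hypothetical representation: a list of
    (class_name, module_name, params) tuples, in pipeline order.
    """
    imports = sorted({f"from {module} import {name}" for name, module, _ in steps})
    calls = []
    for name, _, params in steps:
        # repr() keeps strings quoted and numbers bare in the generated code
        args = ", ".join(f"{key}={value!r}" for key, value in sorted(params.items()))
        calls.append(f"    {name}({args})")
    lines = imports + [
        "from sklearn.pipeline import make_pipeline",
        "",
        "exported_pipeline = make_pipeline(",
        ",\n".join(calls),
        ")",
    ]
    return "\n".join(lines) + "\n"


def export_pipeline(steps, output_file_name):
    """Write the rendered source to `output_file_name`."""
    with open(output_file_name, "w") as output_file:
        output_file.write(render_pipeline_code(steps))
```

Keeping rendering separate from file I/O makes the generated text easy to inspect or test without touching the filesystem.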
dmarx / KNNc.py
Last active December 17, 2015 21:14
Untested (and definitely non-working) demo for proposed class template for TPOT operator modularization (https://github.com/rhiever/tpot)
# tpot/operators/KNNc.py
from .base import LearnerOperator  # relative import within the tpot/operators package
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd


class KNNc(LearnerOperator):
    def __init__(self):
        super(KNNc, self).__init__(
            func=KNeighborsClassifier,
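A hypothetical sketch of the base class such a template implies. `LearnerOperator` stores the estimator class passed as `func`, then instantiates, fits, and predicts with it. The `fit_predict` method, the default-parameter merging, and the `MajorityClassifier` stand-in are all assumptions; the stand-in exists only so the example runs without scikit-learn. A real operator would pass `KNeighborsClassifier`, as in the template above.

```python
from collections import Counter


class LearnerOperator(object):
    """Hypothetical base: wraps any estimator class exposing fit/predict."""

    def __init__(self, func, **default_params):
        self.func = func
        self.default_params = default_params

    def fit_predict(self, X_train, y_train, X_test, **params):
        merged = dict(self.default_params, **params)  # call-site params override defaults
        model = self.func(**merged)
        model.fit(X_train, y_train)
        return model.predict(X_test)


class MajorityClassifier(object):
    """Stand-in estimator following the fit/predict protocol: predicts the most common label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class Majority(LearnerOperator):
    def __init__(self):
        super(Majority, self).__init__(func=MajorityClassifier)
```

Under this design, each concrete operator only declares which estimator it wraps, and the shared fitting logic lives in one place.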
#' # Constructing a naive Bayes classifier from scratch
#'
#' ## Background: Bayes' rule
#'
#' Recall Bayes' rule:
#'
#' $$P(\theta|X) = \frac{P(X|\theta)P(\theta)}{P(X)}$$
#'
#' Each component of this formula has a name:
#'
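The rule can be made concrete with a small categorical naive Bayes sketch in Python. Each part of the formula maps to a step:

- the prior P(θ) is the class frequency;
- the likelihood P(X|θ) is a product of per-feature conditional frequencies (the "naive" independence assumption);
- the evidence P(X) is the sum of the numerators over all classes.

Add-one (Laplace) smoothing and the function names are assumptions of this sketch, not necessarily the approach the write-up goes on to take.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(X, y, alpha=1.0):
    """Return a function mapping a feature tuple to posterior P(class | features)."""
    class_counts = Counter(y)
    n = len(y)
    priors = {c: count / n for c, count in class_counts.items()}  # P(theta)
    counts = {c: defaultdict(Counter) for c in class_counts}     # counts[c][j][v]
    values = defaultdict(set)                                     # distinct values of feature j
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            values[j].add(v)

    def likelihood(c, j, v):
        # P(feature j == v | class c) with add-alpha (Laplace) smoothing
        return (counts[c][j][v] + alpha) / (class_counts[c] + alpha * len(values[j]))

    def posterior(row):
        joint = {}
        for c in class_counts:
            p = priors[c]
            for j, v in enumerate(row):
                p *= likelihood(c, j, v)  # naive independence: multiply per-feature terms
            joint[c] = p                  # numerator: P(X | theta) P(theta)
        evidence = sum(joint.values())    # P(X)
        return {c: p / evidence for c, p in joint.items()}

    return posterior
```

With training rows `('sunny',), ('sunny',), ('rainy',), ('rainy',)` labelled `play, play, stay, play`, the posterior for `('sunny',)` works out as follows:

- P(play) = 3/4 and the smoothed P(sunny | play) = 3/5;
- P(stay) = 1/4 and P(sunny | stay) = 1/3;
- so P(play | sunny) = 0.45 / (0.45 + 1/12) = 0.84375.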
install.packages('caret')
install.packages('ccd')
install.packages('d3Network')
install.packages('data.table')
install.packages('dplyr')
install.packages('DMwR')
install.packages('e1071')
install.packages('ergm')
install.packages('ff')
install.packages('foreach')
dmarx / denseMatrix_to_sparseMatrix.R
Last active March 30, 2017 19:25
Code snippet demonstrating a vectorized method for transforming a dense matrix to a sparse matrix in R
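#' This is the right way: find the (row, col) coordinates of every nonzero entry in one vectorized call and pass them straight to sparseMatrix()
dense_to_sparse = function(m, binary=FALSE){
  library(Matrix)
  # Two-column matrix of (row, col) indices for every nonzero entry
  xy = which(abs(m)>0, arr.ind=TRUE)
  # Note: despite its name, `dense` holds the sparse result
  if(binary){
    dense = sparseMatrix(i=xy[,1], j=xy[,2], x=1, dims=dim(m))
  } else {
    dense = sparseMatrix(i=xy[,1], j=xy[,2], x=m[xy], dims=dim(m))
  }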
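The same vectorized idea can be sketched in Python, assuming NumPy. `np.nonzero` plays the role of `which(..., arr.ind=TRUE)`, returning the coordinates of every nonzero entry without an explicit loop. If SciPy is available, the resulting triplets can be passed to `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=shape)`. Unlike R, the indices are 0-based and come back in row-major rather than column-major order. The function name is hypothetical.

```python
import numpy as np


def dense_to_coo(m, binary=False):
    """Return (rows, cols, vals, shape) COO triplets for the nonzero entries of m."""
    m = np.asarray(m)
    rows, cols = np.nonzero(m)  # vectorized, like which(m != 0, arr.ind=TRUE)
    if binary:
        vals = np.ones(rows.size, dtype=m.dtype)  # keep only the sparsity pattern
    else:
        vals = m[rows, cols]  # fancy indexing pulls all nonzero values at once
    return rows, cols, vals, m.shape
```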
#' Arrival rate of new max values given some generating distribution
#'
#' To do: write a wrapper function that performs repeated simulations from the same generating distribution.
#' After each iteration, convert the output into pairs of (time last max observed, wait time to next max).
#' Make a spaghetti plot to try to infer the conditional distribution of the wait time to the next max, given when
#' the last max was observed. Probably some kind of extreme value distribution, e.g. Gumbel or Fréchet...
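A minimal sketch of the to-do above, in Python. It simulates i.i.d. draws and, at each new running maximum, records the pair (time last max observed, wait time to next max). The function names and the default uniform sampler are illustrative assumptions. One known reference point for checking a simulation: for i.i.d. continuous draws, the probability that draw t is a new maximum is 1/t, regardless of the generating distribution.

```python
import random


def record_waits(draws):
    """Pairs of (time last max observed, wait time to next max); times are 1-indexed."""
    pairs = []
    best = last_t = None
    for t, x in enumerate(draws, start=1):
        if best is None or x > best:
            if last_t is not None:
                pairs.append((last_t, t - last_t))
            best, last_t = x, t
    return pairs


def simulate(n_iter, length, sampler=lambda rng: rng.random(), seed=123):
    """Repeat the experiment n_iter times from the same generating distribution."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_iter):
        draws = [sampler(rng) for _ in range(length)]
        pairs.extend(record_waits(draws))
    return pairs


# e.g. standard normal draws instead of the default uniform
pairs = simulate(200, 1000, sampler=lambda rng: rng.gauss(0, 1))
```

The pooled `pairs` can then be plotted as wait time against time of last max to look for the conditional distribution.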
```{r}
set.seed(123)