coppelia machine learning and analytics coppeliaMLA

@coppeliaMLA
coppeliaMLA / visGAExamples.R
Created June 27, 2014 10:52
Examples for visualising the path of a genetic algorithm
#Maximize a mixture of multivariate normal distributions
library(mvtnorm)
mnMix<-function(args){
  mean.vec.d1<-rep(0.3,5)
  std.vec.d1<-diag(rep(1,5))
  mean.vec.d2<-rep(1,5)
  std.vec.d2<-diag(rep(1.5,5))
  mean.vec.d3<-c(1, 5, 2, 1, 0)
  std.vec.d3<-diag(rep(0.5, 5))
  if (args[1]<0){
coppeliaMLA / visGAPath.R
Last active August 29, 2015 14:03
Visualising the path of a genetic algorithm
# *--------------------------------------------------------------------
# | FUNCTION: visGAPath
# | Function for visualising the path of a genetic algorithm using
# | principal components analysis
# *--------------------------------------------------------------------
# | Version |Date |Programmer |Details of Change
# | 01 |18/04/2012|Simon Raper |first version.
# *--------------------------------------------------------------------
# | INPUTS: func The function to be optimised
# | npar The number of parameters to optimise over
coppeliaMLA / bagHclust.R
Created June 26, 2014 16:28
Bagging algorithm for hclust
library(reshape2)
#Bagging hierarchical clustering
bagHClust<-function(data, n, k, size, outlier.th) {
  clus.bs<-NULL
  for (i in 1:n) {
coppeliaMLA / SankeyClusComp
Last active August 29, 2015 14:03
Generates the data for comparing two clusterings using a Sankey diagram
clusComp<-function(cl1, cl2, num.clus){
  #Set up object for recording clusters
  clus.change<-NULL
  ct1<-cutree(cl1, k=num.clus)
  add.1<-data.frame(size=rep(1, length(ct1)), ind=names(ct1), cluster=paste0(1, ".", ct1))
  ct2<-cutree(cl2, k=num.clus)
  add.2<-data.frame(size=rep(2, length(ct2)), ind=names(ct2), cluster=paste0(2, ".", ct2))
coppeliaMLA / compCorrMI.R
Created June 25, 2014 16:00
Look at the relationship between MI and correlation for binary vars (since it's quicker than doing the maths)
#Check the relationship between correlation and mutual information for binary vars
library(entropy) #provides mi.empirical
store<-NULL
for (i in 1:1000){
  prob.1<-runif(1)
  prob.2<-runif(1)
  x<-rbinom(10000, 1, prob.1)
  y<-rbinom(10000, 1, prob.2)
  c<-cor(x,y)
  m<-mi.empirical(table(x,y))
  store<-rbind(store, data.frame(c=c, m=m))
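For comparison with the simulation above, both quantities can also be computed exactly from a 2x2 joint probability table. The following is a minimal sketch in Python rather than R; the function names `correlation` and `mutual_information` are hypothetical, and natural logarithms are assumed for the mutual information.

```python
import math

def correlation(joint):
    """Pearson correlation of two binary variables.

    joint[i][j] is assumed to hold P(X = i, Y = j) for i, j in {0, 1}.
    """
    px1 = joint[1][0] + joint[1][1]  # P(X = 1)
    py1 = joint[0][1] + joint[1][1]  # P(Y = 1)
    cov = joint[1][1] - px1 * py1    # E[XY] - E[X]E[Y]
    return cov / math.sqrt(px1 * (1 - px1) * py1 * (1 - py1))

def mutual_information(joint):
    """Mutual information (in nats) of two binary variables."""
    px = [sum(row) for row in joint]                     # marginal of X
    py = [joint[0][j] + joint[1][j] for j in range(2)]   # marginal of Y
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if joint[i][j] > 0:  # zero cells contribute nothing
                mi += joint[i][j] * math.log(joint[i][j] / (px[i] * py[j]))
    return mi
```

With an independent table both functions return zero; with a perfectly dependent table such as `[[0.5, 0], [0, 0.5]]` the correlation is 1 and the mutual information is log 2.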
coppeliaMLA / confusion.htm
Created June 24, 2014 07:52
Exploration of a confusion matrix using tangle.js
<!DOCTYPE html>
<html>
<head>
<title>Tangle: a JavaScript library for reactive documents</title>
<link rel="stylesheet" href="http://worrydream.com/Tangle/TangleKit/TangleKit.css" type="text/css">
<script type="text/javascript" src="http://worrydream.com/Tangle/TangleKit/mootools.js"></script>
<script type="text/javascript" src="http://worrydream.com/Tangle/TangleKit/sprintf.js"></script>
<script type="text/javascript" src="http://worrydream.com/Tangle/TangleKit/BVTouchable.js"></script>
coppeliaMLA / DendToForce.R
Created June 20, 2014 16:30
Converts an hclust dendrogram into a JSON graph for input into D3
#Run hclust
hc <- hclust(dist(USArrests[1:40,]), "ave")
#Function for extracting nodes and links
extractGraph<-function(hc){
  n<-length(hc$order)
  m<-hc$merge
  links<-data.frame(source=as.numeric(), target=as.numeric(), value=as.numeric())
coppeliaMLA / clusterSankey.R
Last active August 29, 2015 14:02
Visualising cluster stability using a Sankey diagram
#Sequence for adding new data
s<-seq(20,50, by=5)
#Set up object for recording clusters
clus.change<-NULL
#Cycle through the clustering solutions
for (i in s){
  hc <- hclust(dist(USArrests[1:i,]), "ave")
coppeliaMLA / binDiff.R
Created March 21, 2014 08:14
A function that gives the probability mass function for the difference between two binomially distributed random variables
modBin<-function(k, n, p){
  if (k<=n) {
    return(dbinom(k, n, p))
  } else {
    return(0)
  }
}
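The preview stops before the difference itself is computed. One way the probability mass function of X - Y could be built from a guarded binomial PMF like the one above is a convolution over the values of Y. This is a minimal sketch in Python rather than R, assuming X and Y are independent; `binom_pmf` and `diff_pmf` are hypothetical names and not necessarily how the gist proceeds.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def diff_pmf(d, n1, p1, n2, p2):
    """P(X - Y = d) for independent X ~ Bin(n1, p1) and Y ~ Bin(n2, p2).

    Sums P(X = y + d) * P(Y = y) over every possible value y of Y.
    """
    return sum(binom_pmf(y + d, n1, p1) * binom_pmf(y, n2, p2)
               for y in range(n2 + 1))
```

The difference takes values from -n2 to n1, so summing `diff_pmf` over that range should give 1.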
coppeliaMLA / csvToPipe.py
Created March 7, 2014 12:50
Another useful bit of code for preparing flat files for Hive. Takes in CSVs with double-quote text delimiters and outputs pipe-delimited files.
import os, csv
progDir = '/pathToFolderContainingCSVs/'
for filename in os.listdir(progDir):
    if filename != '.DS_Store':
        with open(progDir+filename, 'rb') as csvfile:
            progReader = csv.reader(csvfile, delimiter=',', quotechar='"')
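The preview ends before any output is written. A minimal Python 3 sketch of the conversion for a single file might look like the following; the function name `csv_to_pipe` and its arguments are hypothetical, and the files are opened in text mode with `newline=''`, as the Python 3 `csv` module expects, rather than `'rb'`.

```python
import csv

def csv_to_pipe(src_path, dst_path):
    """Read a comma-separated file with double-quoted text fields
    and write the same rows to dst_path with '|' as the delimiter."""
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter=',', quotechar='"')
        # QUOTE_MINIMAL quotes a field only if it contains '|', a quote
        # character or a line break, so ordinary fields are written bare.
        writer = csv.writer(dst, delimiter='|', quoting=csv.QUOTE_MINIMAL,
                            lineterminator='\n')
        for row in reader:
            writer.writerow(row)
```

Note that a field which itself contains a pipe is still written in double quotes, so whatever reads the output would need to handle that case.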