coppelia machine learning and analytics coppeliaMLA

UK statistician into machine learning, simulation, visualisation, maths and coding. Building a business specialising in machine learning and analytics.

87 followers · 1 following

London
www.coppelia.io


coppeliaMLA / visGAExamples.R

Created June 27, 2014 10:52

Examples for visualising the path of a genetic algorithm

	#Maximize a mixture of multivariate normal distributions
	library(mvtnorm)
	mnMix<-function(args){
	mean.vec.d1<-rep(0.3,5)
	std.vec.d1<-diag(rep(1,5))
	mean.vec.d2<-rep(1,5)
	std.vec.d2<-diag(rep(1.5,5))
	mean.vec.d3<-c(1, 5, 2, 1, 0)
	std.vec.d3<-diag(rep(0.5, 5))
	if (args[1]<0){

coppeliaMLA / visGAPath.R

Last active August 29, 2015 14:03

Visualising the path of a genetic algorithm

	# *--------------------------------------------------------------------
	# | FUNCTION: visGAPath
	# | Function for visualising the path of a genetic algorithm using
	# | principal components analysis
	# *--------------------------------------------------------------------
	# | Version |Date |Programmer |Details of Change
	# | 01 |18/04/2012|Simon Raper |first version.
	# *--------------------------------------------------------------------
	# | INPUTS: func The function to be optimised
	# | npar The number of parameters to optimise over

coppeliaMLA / bagHclust.R

Created June 26, 2014 16:28

Bagging algorithm for hclust

	library(reshape2)

	#Bagging hierarchical clustering

	bagHClust<-function(data, n, k, size, outlier.th) {

	clus.bs<-NULL

	for (i in 1:n) {

coppeliaMLA / SankeyClusComp

Last active August 29, 2015 14:03

Generates the data for comparing two clusters using a Sankey diagram

	clusComp<-function(cl1, cl2, num.clus){

	#Set up object for recording clusters
	clus.change<-NULL

	ct1<-cutree(cl1, k=num.clus)
	add.1<-data.frame(size=rep(1, length(ct1)), ind=names(ct1), cluster=paste0(1, ".", ct1))
	ct2<-cutree(cl2, k=num.clus)
	add.2<-data.frame(size=rep(2, length(ct2)), ind=names(ct2), cluster=paste0(2, ".", ct2))
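The comparison boils down to labelling each observation by its cluster in both solutions and counting how many observations move between each pair of clusters. The following is a minimal Python sketch of that counting step, not the gist's own code: `sankey_links` and its input lists are hypothetical, and the `1.x` / `2.x` labels simply mirror the `paste0` calls above.

```python
from collections import Counter

def sankey_links(labels1, labels2):
    """Count observations flowing from each cluster in solution 1
    to each cluster in solution 2 (hypothetical helper)."""
    if len(labels1) != len(labels2):
        raise ValueError("both solutions must label the same observations")
    # Prefix the labels so cluster 1 of solution 1 differs from cluster 1 of solution 2
    flows = Counter((f"1.{a}", f"2.{b}") for a, b in zip(labels1, labels2))
    return [{"source": s, "target": t, "value": v}
            for (s, t), v in sorted(flows.items())]

# Five observations clustered twice; one observation moves from cluster 1 to cluster 2
for link in sankey_links([1, 1, 2, 2, 3], [1, 2, 2, 2, 3]):
    print(link)
```

Each dictionary is one band of the Sankey diagram, with `value` giving its width.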

coppeliaMLA / compCorrMI.R

Created June 25, 2014 16:00

Look at the relationship between MI and correlation for binary vars (since it's quicker than doing the maths)

	#Check the relationship between correlation and mutual information for binary vars
	library(entropy) #provides mi.empirical
	store<-NULL
	for (i in 1:1000){
	prob.1<-runif(1)
	prob.2<-runif(1)
	x<-rbinom(10000, 1, prob.1)
	y<-rbinom(10000, 1, prob.2)
	c<-cor(x,y)
	m<-mi.empirical(table(x,y))
	store<-rbind(store, data.frame(c=c, m=m))
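For two binary variables both quantities can be read straight off the 2x2 table of counts, which makes the comparison easy to check by hand. The sketch below is a hedged Python illustration rather than a port of the gist: `phi_and_mi` is a hypothetical helper, it uses the fact that the Pearson correlation of two 0/1 variables equals the phi coefficient, and it reports mutual information in natural-log units (an assumption about the units wanted).

```python
from math import log, sqrt

def phi_and_mi(table):
    """table[i][j] is the count of observations with x == i and y == j.
    Returns (Pearson correlation, empirical mutual information in nats)."""
    (n00, n01), (n10, n11) = table
    n = n00 + n01 + n10 + n11
    r0, r1 = n00 + n01, n10 + n11   # row totals (x = 0, x = 1)
    c0, c1 = n00 + n10, n01 + n11   # column totals (y = 0, y = 1)
    denom = sqrt(r0 * r1 * c0 * c1)
    # Correlation is undefined when either variable is constant
    phi = (n11 * n00 - n10 * n01) / denom if denom else float("nan")
    mi = 0.0
    for nij, ri, cj in ((n00, r0, c0), (n01, r0, c1), (n10, r1, c0), (n11, r1, c1)):
        if nij:  # empty cells contribute nothing to the sum
            mi += (nij / n) * log(nij * n / (ri * cj))
    return phi, mi

print(phi_and_mi([[50, 0], [0, 50]]))    # perfectly dependent variables
print(phi_and_mi([[25, 25], [25, 25]]))  # independent variables
```

Perfect dependence gives a correlation of 1 and a mutual information of log 2; the independent table gives 0 for both.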

coppeliaMLA / confusion.htm

Created June 24, 2014 07:52

Exploration of a confusion matrix using tangle.js


	<!DOCTYPE html>
	<html>
	<head>

	<title>Tangle: a JavaScript library for reactive documents</title>
	<link rel="stylesheet" href="http://worrydream.com/Tangle/TangleKit/TangleKit.css" type="text/css">
	<script type="text/javascript" src="http://worrydream.com/Tangle/TangleKit/mootools.js"></script>
	<script type="text/javascript" src="http://worrydream.com/Tangle/TangleKit/sprintf.js"></script>
	<script type="text/javascript" src="http://worrydream.com/Tangle/TangleKit/BVTouchable.js"></script>

coppeliaMLA / DendToForce.R

Created June 20, 2014 16:30

Converts a hclust dendrogram into a graph in JSON for input into D3

	#Run hclust
	hc <- hclust(dist(USArrests[1:40,]), "ave")

	#Function for extracting nodes and links
	extractGraph<-function(hc){

	n<-length(hc$order)
	m<-hc$merge

	links<-data.frame(source=as.numeric(), target=as.numeric(), value=as.numeric())
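The conversion rests on how `hclust` encodes its `merge` matrix: in row i a negative entry -j refers to observation j, while a positive entry j refers to the cluster formed at the earlier step j. The Python sketch below shows one way to turn such a matrix into a nodes-and-links structure. It is an illustration under assumptions, not the gist's `extractGraph`: the function name, the node names and the zero-based node indexing are all hypothetical.

```python
import json

def merge_to_graph(merge):
    """Turn an hclust-style merge matrix (list of pairs) into nodes and links."""
    n = len(merge) + 1  # n observations produce n - 1 merge steps
    nodes = [{"name": f"leaf{i + 1}"} for i in range(n)]
    links = []
    for step, pair in enumerate(merge, start=1):
        parent = n + step - 1  # internal nodes are numbered after the leaves
        nodes.append({"name": f"merge{step}"})
        for child in pair:
            # Negative entry: original observation; positive entry: earlier merge step
            target = -child - 1 if child < 0 else n + child - 1
            links.append({"source": parent, "target": target, "value": 1})
    return {"nodes": nodes, "links": links}

# Observations 1 and 2 merge first, then observation 3 joins that cluster
graph = merge_to_graph([(-1, -2), (-3, 1)])
print(json.dumps(graph, indent=2))
```

Index-based `source` and `target` fields follow the convention used by many D3 force-layout examples; whether the gist uses the same convention is not visible in the preview.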

coppeliaMLA / clusterSankey.R

Last active August 29, 2015 14:02

Visualising cluster stability using a Sankey diagram

	#Sequence for adding new data
	s<-seq(20,50, by=5)

	#Set up object for recording clusters
	clus.change<-NULL

	#Cycle through the clustering solutions
	for (i in s){

	hc <- hclust(dist(USArrests[1:i,]), "ave")

coppeliaMLA / binDiff.R

Created March 21, 2014 08:14

A function that gives the probability mass function for the difference between two binomially distributed random variables

	modBin<-function(k, n, p){

	  if (k<=n) {
	    return(dbinom(k, n, p))
	  }
	  else {
	    return(0)
	  }
	}
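A helper like `modBin` is what lets the difference be written as a single sum. Assuming the two variables are independent (the description does not say so), P(X - Y = d) is the sum over k of P(X = k + d) P(Y = k). The Python sketch below follows that formula; `binom_pmf` and `binom_diff_pmf` are hypothetical names and this is not the rest of the gist.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_diff_pmf(d, n1, p1, n2, p2):
    """P(X - Y = d) for independent X ~ Bin(n1, p1) and Y ~ Bin(n2, p2)."""
    # Y can only take the values 0..n2; X must then equal k + d
    return sum(binom_pmf(k + d, n1, p1) * binom_pmf(k, n2, p2)
               for k in range(n2 + 1))

# The difference ranges over -n2..n1, so these probabilities should sum to 1
n1, p1, n2, p2 = 5, 0.3, 4, 0.6
total = sum(binom_diff_pmf(d, n1, p1, n2, p2) for d in range(-n2, n1 + 1))
print(total)
```

Returning zero outside the support, as `modBin` does for `k > n`, means the sum needs no special handling at the edges.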

coppeliaMLA / csvToPipe.py

Created March 7, 2014 12:50

Another useful bit of code for preparing flat files for Hive. Takes in csvs with double quote text delimiters and outputs pipe delimited files.

	import os, csv

	progDir = '/pathToFolderContainingCSVs/'



	for filename in os.listdir(progDir):
	    if filename != '.DS_Store':
	        with open(progDir+filename, 'rb') as csvfile:
	            progReader = csv.reader(csvfile, delimiter=',', quotechar='"')
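The core of the conversion is reading each row with a comma delimiter and double-quote text qualifier, then writing it back with a pipe delimiter. The following is a minimal Python 3 sketch of that step working on text streams. It is not the remainder of the gist: `csv_to_pipe` is a hypothetical name, and leaving the writer on minimal quoting (so a field containing a pipe would itself be quoted) is an assumption about what the downstream loader tolerates.

```python
import csv
import io

def csv_to_pipe(src, dst):
    """Copy rows from a comma-delimited text stream to a pipe-delimited one."""
    reader = csv.reader(src, delimiter=',', quotechar='"')
    writer = csv.writer(dst, delimiter='|', quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for row in reader:
        writer.writerow(row)

# The quoted comma survives as data; the field separators become pipes
src = io.StringIO('id,name\n1,"Smith, John"\n')
dst = io.StringIO()
csv_to_pipe(src, dst)
print(dst.getvalue())
```

With real files, open both with `newline=''` as the `csv` module documentation recommends, and pass the file objects in place of the `StringIO` streams.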
