Alrecenk’s gists

Alrecenk / LogisticRegressionInitialGuess.java

Created May 17, 2014 08:25

A simple initial guess for a logistic regression. P and N are the average inputs of the positives and negatives respectively.

	for(int k=0;k<pos.length;k++){
	pp += pos[k]*pos[k] ;
	pn += pos[k]*neg[k] ;// calculate relevant dot products
	nn += neg[k]*neg[k] ;
	}
	//assuming w = alpha * (pos- neg) with b in the last w slot
	//solve for alpha and b so that pos returns 0.75 and neg returns 0.25
	double alphab[] = lineIntersection(pp-pn,1,sinv(0.75),pn-nn,1,sinv(0.25)) ;
	w = new double[input[0].length] ;
	for(int k=0;k<w.length-1;k++){
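
The preview above is cut off, so here is a minimal, self-contained sketch of the same idea. It assumes that `sinv` is the inverse sigmoid (logit) and that `lineIntersection` solves the two linear equations in alpha and b; neither helper is shown in the preview. The class and method names are hypothetical, and unlike the gist the sketch takes `pos` and `neg` without a bias slot and appends the bias itself.

```java
// Hypothetical sketch, not the gist's code: w = alpha * (pos - neg), bias b in the last slot.
final class InitialGuessSketch {
    // Inverse of the logistic sigmoid; assumed to be what sinv computes in the gist.
    static double logit(double p) {
        return Math.log(p / (1.0 - p));
    }

    // pos and neg are the mean positive and mean negative inputs (no bias slot here).
    static double[] initialGuess(double[] pos, double[] neg) {
        double pp = 0, pn = 0, nn = 0;
        for (int k = 0; k < pos.length; k++) {
            pp += pos[k] * pos[k];
            pn += pos[k] * neg[k];
            nn += neg[k] * neg[k];
        }
        // Two equations in alpha and b:
        //   alpha * (pp - pn) + b = logit(0.75)
        //   alpha * (pn - nn) + b = logit(0.25)
        double denom = (pp - pn) - (pn - nn); // equals |pos - neg|^2
        if (denom == 0) {
            throw new IllegalArgumentException("pos and neg must differ");
        }
        double alpha = (logit(0.75) - logit(0.25)) / denom;
        double b = logit(0.75) - alpha * (pp - pn);
        double[] w = new double[pos.length + 1];
        for (int k = 0; k < pos.length; k++) {
            w[k] = alpha * (pos[k] - neg[k]);
        }
        w[pos.length] = b;
        return w;
    }

    // Sigmoid of the linear score, reading the bias from the last slot of w.
    static double predict(double[] w, double[] x) {
        double z = w[w.length - 1];
        for (int k = 0; k < x.length; k++) {
            z += w[k] * x[k];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }
}
```

With this guess, `predict` returns 0.75 on `pos` and 0.25 on `neg` by construction, which gives Newton's method a starting point that already separates the two class means.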

Alrecenk / LogisticRegressionSimple.java

Last active August 29, 2015 14:01

A logistic regression algorithm for binary classification implemented using Newton's method and a Wolfe condition based inexact line-search.

	/* A logistic regression algorithm for binary classification implemented using Newton's method and
	* a Wolfe condition based inexact line-search.
	*created by Alrecenk for inductivebias.com May 2014
	*/
	public class LogisticRegressionSimple {

	double w[] ; //the weights for the logistic regression
	int degree ; // degree of polynomial used for preprocessing

	//preprocessed list of input/output used for calculating error and its gradients
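
As a rough illustration of the line-search test mentioned in the description, the sketch below checks the (weak) Wolfe conditions for a candidate step. The class name, method name, and parameters are hypothetical and are not taken from the gist; `c1` and `c2` are the usual constants with 0 < c1 < c2 < 1.

```java
// Hypothetical sketch of a weak Wolfe condition check for an inexact line search.
final class WolfeSketch {
    // f0, fa: objective value at step 0 and at step alpha along the search direction.
    // d0, da: directional derivatives (gradient dot direction) at those two points.
    static boolean satisfiesWolfe(double f0, double d0, double fa, double da,
                                  double alpha, double c1, double c2) {
        // Sufficient decrease (Armijo): the step must reduce the objective enough.
        boolean sufficientDecrease = fa <= f0 + c1 * alpha * d0;
        // Curvature: the slope along the direction must have flattened enough.
        boolean curvature = da >= c2 * d0;
        return sufficientDecrease && curvature;
    }
}
```

A line search would typically shrink or grow `alpha` until this check passes, then take the step along the Newton direction.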

Alrecenk / RotationForestsplit.java

Created November 5, 2013 04:43

The core learning algorithm for the rotation forest that calculates the best split based on approximate information gain.

	//splits this node if it should and returns whether it did
	//data is assumed to be a set of presorted lists where data[k][j] is the jth element of data when sorted by axis[k]
	public boolean split(int minpoints){
	//if already split or one class or not enough points remaining then don't split
	if (branchnode || totalpositive == 0 || totalnegative == 0 || totalpositive + totalnegative < minpoints){
	return false;
	}else{
	int bestaxis = -1, splitafter=-1;
	double bestscore = Double.MAX_VALUE;//any valid split will beat no split
	int bestLp=0, bestLn=0;
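
The preview stops before the scoring loop, and the gist's approximate information gain formula is not shown. As a stand-in, the sketch below scans one presorted axis and scores each candidate split by exact size-weighted entropy (lower is better). All names are hypothetical, and ties between equal feature values are ignored for brevity.

```java
// Hypothetical sketch: best split position along one presorted axis,
// scored by size-weighted entropy rather than the gist's approximation.
final class SplitScoreSketch {
    // Entropy (in nats) of a set with p positives and n negatives.
    static double entropy(int p, int n) {
        if (p == 0 || n == 0) {
            return 0.0;
        }
        double total = p + n;
        double fp = p / total, fn = n / total;
        return -(fp * Math.log(fp) + fn * Math.log(fn));
    }

    // sortedLabels[i] is the class of the ith point when sorted along the axis.
    // Returns i such that splitting after element i gives the lowest score,
    // or -1 when there are fewer than two points.
    static int bestSplitAfter(boolean[] sortedLabels) {
        int totalP = 0;
        for (boolean label : sortedLabels) {
            if (label) totalP++;
        }
        int totalN = sortedLabels.length - totalP;
        int best = -1;
        double bestScore = Double.MAX_VALUE; // any valid split will beat no split
        int lp = 0, ln = 0; // positives and negatives on the left side so far
        for (int i = 0; i < sortedLabels.length - 1; i++) {
            if (sortedLabels[i]) lp++; else ln++;
            int rp = totalP - lp, rn = totalN - ln;
            double score = (lp + ln) * entropy(lp, ln) + (rp + rn) * entropy(rp, rn);
            if (score < bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

Because the labels arrive presorted, the left and right counts update in constant time per candidate, so each axis costs one linear pass.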

Alrecenk / normalize.java

Last active December 26, 2015 22:09

Normalizing a vector to length one, normalizing a data point to a distribution with mean zero and standard deviation one, and generating a vector from a normal distribution: three different operations that are named similarly and are easily confused.

	//makes a vector of length one
	public static void normalize(double a[]){
	double scale = 0 ;
	for(int k=0;k<a.length;k++){
	scale+=a[k]*a[k];
	}
	scale = 1/Math.sqrt(scale);
	for(int k=0;k<a.length;k++){
	a[k]*=scale ;
	}
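
The preview only reaches the first of the three operations. A minimal sketch of the other two might look like the following; it assumes per-axis `mean` and `std` arrays are computed elsewhere, and the class and method names are hypothetical rather than taken from the gist.

```java
import java.util.Random;

// Hypothetical sketch of the two operations not shown in the preview.
final class NormalizeVariantsSketch {
    // Shifts and scales a data point in place so that each axis has
    // mean zero and standard deviation one under the given statistics.
    static void standardize(double[] x, double[] mean, double[] std) {
        for (int k = 0; k < x.length; k++) {
            x[k] = (x[k] - mean[k]) / std[k];
        }
    }

    // Generates a vector whose entries are independent standard normal draws.
    static double[] normalVector(int n, Random rand) {
        double[] v = new double[n];
        for (int k = 0; k < n; k++) {
            v[k] = rand.nextGaussian();
        }
        return v;
    }
}
```

Note that a vector from `normalVector` generally does not have length one; it would still need the unit-length normalization above if a random direction is wanted.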

Alrecenk / pseudobootstrap.java

Last active December 26, 2015 04:39

Bootstrap aggregation for a random forest algorithm.

	//bootstrap aggregating of training data for a random forest
	Random rand = new Random(seed);
	treenode tree[] = new treenode[trees] ;
	for(int k=0;k<trees;k++){
	ArrayList<Datapoint> treedata = new ArrayList<Datapoint>();
	for (int j = 0; j < datapermodel; j++){
	//add a random data point to the training data for this tree
	int nj = rand.nextInt(alldata.size()); // uniform index; avoids the negative result of Math.abs(Integer.MIN_VALUE)
	treedata.add(alldata.get(nj)) ;
	}
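
For reference, the sampling step can be written as a small self-contained helper. This is a sketch with hypothetical names, using a generic element type in place of the gist's `Datapoint`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: draw a bootstrap sample (with replacement) for one tree.
final class BootstrapSketch {
    static <T> List<T> bootstrapSample(List<T> all, int sampleSize, Random rand) {
        List<T> sample = new ArrayList<>(sampleSize);
        for (int j = 0; j < sampleSize; j++) {
            // nextInt(bound) is uniform on [0, bound), so no abs or modulo is needed
            sample.add(all.get(rand.nextInt(all.size())));
        }
        return sample;
    }
}
```

Each tree would get its own call, so different trees see different resamplings of the same training set.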

Alrecenk / pseudonaivetreelearn.java

Last active December 26, 2015 01:59

Pseudocode for a naive implementation of a decision tree learning algorithm.

	int splitvariable=-1; // split on this variable
	double splitvalue ;//split at this value
	// total positives and negatives used for leaf node probabilities
	int totalpositives,totalnegatives ;
	Datapoint trainingdata[]; //the training data in this node
	treenode leftnode,rightnode;//This node's children if it's a branch

	//splits this node greedily using approximate information gain
	public void split(){
	double bestscore = Double.MAX_VALUE ;//lower is better, so the default is a very high number

Alrecenk / RotationForestSimple.java

Last active December 25, 2015 17:49

An optimized rotation forest algorithm for binary classification on fixed length feature vectors.

	/*A rotation forest algorithm for binary classification with fixed length feature vectors.
	*created by Alrecenk for inductivebias.com Oct 2013
	*/
	import java.util.ArrayList;
	import java.util.Arrays;
	import java.util.Random;

	public class RotationForestSimple{

	double mean[] ; //the mean of each axis for normalization

Alrecenk / basicLDLsolve.java

Last active December 23, 2015 17:09

Solves for C given an LDL decomposition in the form LDL^T C = X^T Y.

	public double[] solvesystem(double L[][], double D[], double XTY[]){
	//forward substitution with L (unit lower triangular): solve L p = X^T Y
	double p[] = new double[XTY.length] ;
	for (int j = 0; j < inputs; j++){
	p[j] = XTY[j] ;
	for (int i = 0; i < j; i++){
	p[j] -= L[j][i] * p[i];
	}
	}
	//Multiply by inverse of D matrix
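
Since the preview ends partway through, here is a sketch of all three stages of solving L D L^T C = X^T Y for a single right-hand side. The class and method names are hypothetical, and L is assumed to be unit lower triangular with D holding the diagonal.

```java
// Hypothetical sketch: solve L D L^T c = rhs given an LDL decomposition.
final class LdlSolveSketch {
    static double[] solve(double[][] L, double[] D, double[] rhs) {
        int n = rhs.length;
        double[] p = new double[n];
        // Stage 1, forward substitution: L p = rhs
        for (int j = 0; j < n; j++) {
            p[j] = rhs[j];
            for (int i = 0; i < j; i++) {
                p[j] -= L[j][i] * p[i];
            }
        }
        // Stage 2, diagonal scaling: D q = p (q overwrites p)
        for (int j = 0; j < n; j++) {
            p[j] /= D[j];
        }
        // Stage 3, back substitution: L^T c = q
        double[] c = new double[n];
        for (int j = n - 1; j >= 0; j--) {
            c[j] = p[j];
            for (int i = j + 1; i < n; i++) {
                c[j] -= L[i][j] * c[i];
            }
        }
        return c;
    }
}
```

For multiple outputs, the same routine can be called once per column of X^T Y, reusing the decomposition.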

Alrecenk / basicLDL.java

Last active December 23, 2015 17:09

A basic LDL decomposition of a matrix X times its transpose.

	double[][] L = new double[inputs][ inputs];
	double D[] = new double[inputs] ;
	//for each column j
	for (int j = 0; j < inputs; j++){
	D[j] = XTX[j][j];//calculate Dj
	for (int k = 0; k < j; k++){
	D[j] -= L[j][k] * L[j][k] * D[k];
	}
	//calculate jth column of L
	L[j][j] = 1 ; // don't really need to save this, but it's a 1
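
The preview cuts off before the off-diagonal entries are filled in. A complete column-by-column version could be sketched as follows, assuming the input matrix is symmetric positive definite (as X^T X is when X has full column rank). The names here are hypothetical.

```java
// Hypothetical sketch: LDL^T decomposition of a symmetric positive definite matrix A.
final class LdlDecomposeSketch {
    // Fills L (unit lower triangular) and D (diagonal) so that A = L * diag(D) * L^T.
    static void decompose(double[][] A, double[][] L, double[] D) {
        int n = A.length;
        for (int j = 0; j < n; j++) {
            // Diagonal entry: D_j = A_jj - sum_{k<j} L_jk^2 D_k
            D[j] = A[j][j];
            for (int k = 0; k < j; k++) {
                D[j] -= L[j][k] * L[j][k] * D[k];
            }
            L[j][j] = 1.0;
            // Column j below the diagonal:
            // L_ij = (A_ij - sum_{k<j} L_ik L_jk D_k) / D_j
            for (int i = j + 1; i < n; i++) {
                double s = A[i][j];
                for (int k = 0; k < j; k++) {
                    s -= L[i][k] * L[j][k] * D[k];
                }
                L[i][j] = s / D[j];
            }
        }
    }
}
```

Unlike a plain Cholesky factorization, this form needs no square roots, which is one common reason to prefer it.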

Alrecenk / LeastSquaresTrain.java

Last active August 1, 2016 04:16

This code provides all functions necessary to perform and apply a least squares fit of a polynomial from multiple inputs to multiple outputs. The fit is performed using an in-place LDL Cholesky decomposition based on the Cholesky–Banachiewicz algorithm.

	//performs a least squares fit of a polynomial function of the given degree
	//mapping each input[k] vector to each output[k] vector
	//returns the coefficients in a matrix
	public static double[][] fitpolynomial(double input[][], double output[][], int degree){
	double[][] X = new double[input.length][];
	//Run the input through the polynomialization and add the bias term
	for (int k = 0; k < input.length; k++){
	X[k] = polynomial(input[k], degree);
	}
	int inputs = X[0].length ;//number of inputs after the polynomial
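
The rest of the fit is not visible in the preview. A least squares fit of this kind usually goes through the normal equations X^T X C = X^T Y, so the sketch below shows one way to form the two products that an LDL solve would then consume. The class and method names are hypothetical, and the gist may build these matrices differently (for example in place).

```java
// Hypothetical sketch: form the normal equation matrices for a least squares fit.
final class NormalEquationsSketch {
    // Returns X^T X, an n-by-n symmetric matrix where n is the number of columns of X.
    static double[][] xtx(double[][] X) {
        int n = X[0].length;
        double[][] out = new double[n][n];
        for (double[] row : X) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    out[i][j] += row[i] * row[j];
                }
            }
        }
        return out;
    }

    // Returns X^T Y, an n-by-m matrix where m is the number of outputs per data point.
    static double[][] xty(double[][] X, double[][] Y) {
        int n = X[0].length, m = Y[0].length;
        double[][] out = new double[n][m];
        for (int r = 0; r < X.length; r++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < m; j++) {
                    out[i][j] += X[r][i] * Y[r][j];
                }
            }
        }
        return out;
    }
}
```

Each column of X^T Y gives one right-hand side, so multiple outputs share a single decomposition of X^T X.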
