tdunning’s gists

tdunning / figure.r

Created June 18, 2021 07:30

Snippet of R to recreate an analysis of t-digest interpolation on real data

	# Analysis of how two t-digests see some sample data
	png("figure.png", width=1200, height=1000, pointsize=30)
	# the first few actual data points with filler for the remainder
	d = c(241, 543, 575, 702, 890, 1530, 1940, 2166, 2168, rep(3000,33))
	# the cumulative distribution function
	f = ecdf(d)
	# plot the actual CDF
	plot(x=d, y=f(d), xlim = c(700, 2300), ylim = c(0.08, 0.25), type='s',
	     xlab="Sample value", ylab="Cumulative Distribution Function",
	     cex.lab=1.3)

tdunning / lorenz-animator.jl

Created May 4, 2021 00:49

Animates the evolution of an initially tight group of points ... my intro to Julia


	using DifferentialEquations
	using Plots
	using Statistics
	using LinearAlgebra

	function lorenz!(du, u, p, t)
	    x, y, z = u
	    σ, ρ, β = p

tdunning / shift-detection.r

Last active December 6, 2020 02:15

Sample code that shows how distributional changes in a single tail can be detected accurately using counts targeted at particular parts of a reference dataset

	### Draws a figure illustrating change detection in the distribution of synthetic data.
	### Each dot represents a single time period with 1000 samples. Before the change,
	### the data is sampled from a unit normal distribution. After the change, 20 samples
	### in each time period are taken from N(3,1). A chi^2 test on the counts that
	### is robust to small expected counts reliably detects this shift.

	### log-likelihood ratio test for multinomial data
	llr = function(k) {
	    2 * sum(k) * (H(k) - H(rowSums(k)) - H(colSums(k)))
	}
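
The preview ends before `H` is defined. The following is a minimal sketch, assuming `H` is the sum of p * log(p) over the normalized counts with 0 * log(0) taken as 0. That definition is an assumption, not text from the gist. The counts in `k` are hypothetical and only illustrate the call. Under this assumption the score matches the textbook G statistic, which the sketch cross-checks.

```r
# Assumed helper (not shown in the preview): sum of p * log(p) over the
# normalized counts, treating 0 * log(0) as 0.
H = function(k) {
  p = k / sum(k)
  sum(p * log(p + (p == 0)))
}

# Same form as the llr function in the preview
llr = function(k) {
  2 * sum(k) * (H(k) - H(rowSums(k)) - H(colSums(k)))
}

# Hypothetical counts: rows are the reference and test periods,
# columns are samples below and above a cut point in the upper tail.
k = rbind(reference = c(980, 20), test = c(960, 40))
score = llr(k)

# Cross-check against the G statistic, 2 * sum(observed * log(observed / expected))
expected = outer(rowSums(k), colSums(k)) / sum(k)
g = 2 * sum(k * log(k / expected))

# Approximate p-value from a chi^2 distribution with (rows - 1) * (cols - 1) = 1 degree of freedom
p_value = pchisq(score, df = 1, lower.tail = FALSE)
```

Adding `(p == 0)` inside the logarithm keeps empty cells from producing `NaN`, so the score stays finite when some counts are zero.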

tdunning / mcem.r

Last active December 7, 2020 22:59

Implementation of Monte Carlo EM algorithm for reconstructing a standard distribution from censored observations

	### This is a demonstration of a Monte Carlo Expectation Maximization
	### algorithm that can recover the mean and standard deviation of
	### truncated normally distributed data. We get 10,000 samples from
	### a unit normal distribution, but every sample below 0.5 is truncated
	### to that value. Every sample above 2.5 is truncated to that value.
	### These choices were made to get quick and visually appealing convergence
	### but the algorithm still converges for any choice. The convergence
	### could be very, very slow if there is little information in the samples
	### and the final answer could have substantial uncertainty. For instance,
	### if we truncated at 4 and 6, almost all samples would be piled up at
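
The preview ends before any code appears. The loop described in the comment could be sketched as follows, assuming one Monte Carlo draw per clamped sample and inverse-CDF sampling from the restricted normal. The names `lo`, `hi`, `n_iter` and `history` are hypothetical, and this is not a reconstruction of the gist's own implementation.

```r
set.seed(1)
lo = 0.5; hi = 2.5                       # clamping points from the description
x = pmin(pmax(rnorm(10000), lo), hi)     # samples outside [lo, hi] are clamped
low = x <= lo
high = x >= hi

# starting guess taken from the clamped data
mu = mean(x)
sigma = sd(x)

n_iter = 100                             # hypothetical iteration count
history = matrix(NA, nrow = n_iter, ncol = 2,
                 dimnames = list(NULL, c("mu", "sigma")))
for (i in 1:n_iter) {
  z = x
  # Monte Carlo E-step: replace each clamped value with a draw from the
  # current normal restricted to the clamped region (inverse-CDF sampling)
  u = runif(sum(low), 0, pnorm(lo, mu, sigma))
  z[low] = qnorm(u, mu, sigma)
  u = runif(sum(high), 0, pnorm(hi, mu, sigma, lower.tail = FALSE))
  z[high] = qnorm(u, mu, sigma, lower.tail = FALSE)
  # M-step: refit the normal to the completed data
  mu = mean(z)
  sigma = sd(z)
  history[i, ] = c(mu, sigma)
}

# average the later iterations to smooth out Monte Carlo noise
estimate = colMeans(history[(n_iter / 2 + 1):n_iter, ])
```

Working with the upper tail directly (`lower.tail = FALSE`) avoids losing precision when the upper clamping point is far from the current mean.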

tdunning / tesla-range-sim

Last active July 27, 2020 23:21

	### This code builds a simple physical model of the range of an 85kWh Tesla Model S and
	### compares it to real data. The data here is digitized from
	### https://www.tesla.com/blog/model-s-efficiency-and-range

	### The model here accounts for aerodynamic drag, viscous drag, constant
	### friction and constant power drain

	### First the digitized data
	x = read.csv(text="v,range
	10.22976354700292, 393.9005561997566
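
The four terms named in the comment can be combined into a single range function. The sketch below assumes SI units, and every coefficient is a placeholder chosen only so the code runs. None of the values is fitted to the digitized data or taken from the gist, and the function and parameter names are hypothetical.

```r
# Range model sketch: energy per meter is the sum of four terms.
# All coefficient values are illustrative placeholders, not fitted values.
range_km = function(v, energy_kwh = 85,
                    aero = 0.35,      # aerodynamic drag coefficient, N / (m/s)^2
                    viscous = 2,      # viscous drag coefficient, N / (m/s)
                    friction = 150,   # constant friction force, N
                    drain = 1000) {   # constant power drain, W
  joules = energy_kwh * 3.6e6
  # Force in newtons is energy per meter; a constant drain of P watts
  # costs P / v joules per meter travelled.
  force = aero * v^2 + viscous * v + friction + drain / v
  joules / force / 1000
}

v = seq(2, 40, by = 0.5)               # speeds in m/s
ranges = range_km(v)

# speed that maximizes range under the placeholder coefficients
best = optimize(range_km, interval = c(2, 40), maximum = TRUE)
```

The `drain / v` term is what makes range fall off at low speed. Without it the model would predict ever longer range as speed approaches zero.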

tdunning / viewpoints.r

Last active July 12, 2019 20:31

How different definitions of distance change our view of clustering

	# you can run this script with the following R command:
	# source('https://gist.githubusercontent.com/tdunning/badb88043d41d916a3148c669f2fb0cd/raw/8d3289fdbf2a7999bd5d9687002488b904e1d82f/viewpoints.r')

	set.seed(1)
	noise = matrix(nrow=2000, ncol=8, data=rnorm(4*8*500))
	offsets = matrix(
	c(rep(-1,1000), rep(1,1000),
	rep(-1, 500), rep(1, 500), rep(-1, 500), rep(1, 500)),
	ncol=2)
	xy = rbind(matrix(nrow=2000, ncol=2, data=rnorm(2*2000))) + offsets * 8
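
The preview stops before any clustering happens. As a small illustration of the idea in the description, the sketch below reweights the two coordinates before computing Euclidean distance and cuts a hierarchical clustering into two groups. Using `hclust` with complete linkage, the weights, and the blob size `n` are all assumptions made for this example, not the gist's method.

```r
set.seed(1)
n = 25                                   # points per blob (hypothetical size)
centers = expand.grid(x = c(-8, 8), y = c(-8, 8))
xy = as.matrix(centers[rep(1:4, each = n), ]) +
  matrix(rnorm(4 * n * 2), ncol = 2)

# Cluster into two groups after rescaling each coordinate by a weight,
# which amounts to changing the definition of distance.
two_groups = function(w) {
  scaled = sweep(xy, 2, w, "*")
  cutree(hclust(dist(scaled), method = "complete"), k = 2)
}

by_x = two_groups(c(1, 0.1))             # distance dominated by the first coordinate
by_y = two_groups(c(0.1, 1))             # distance dominated by the second coordinate

# compare each grouping with the sign of the corresponding coordinate
table(by_x, left = xy[, 1] < 0)
table(by_y, low = xy[, 2] < 0)
```

The same four blobs split left against right under one weighting and bottom against top under the other, so the grouping reflects the distance as much as the data.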

tdunning / Summarizer.java

Created April 12, 2019 23:18

Demonstrates the summarization of database fields using t-digest

	package com.tdunning.tdigest.quality;

	import com.google.common.collect.ImmutableList;
	import com.google.common.io.Resources;
	import com.tdunning.math.stats.MergingDigest;
	import com.tdunning.math.stats.TDigest;
	import org.junit.Test;

	import java.io.File;
	import java.io.IOException;

tdunning / MomentSketchOffsetTest.java

Created March 25, 2019 22:12

Test for moment sketches versus offset distribution

	public class MomentSketchOffsetTest {
	    @Test
	    public void testOffsetUniform() throws Exception {
	        MomentSketch ms = new MomentSketch(1e-10);
	        ms.setSizeParam(7);
	        ms.initialize();

	        double[] data = TestDataSource.getUniform(20e1, 20e1 + 1, 1_000_000);
	        ms.add(data);

tdunning / HighDynamicRangeQuantile.java

Last active August 25, 2017 06:50 — forked from oertl/HighDynamicRangeQuantile.java

Simpler and slightly faster version of Otmar Oertl's idea for improving FastHistogram / HdrHistogram

	public class HighDynamicRangeQuantile {
	    private final long[] counts;
	    private double minimum = Double.POSITIVE_INFINITY;
	    private double maximum = Double.NEGATIVE_INFINITY;
	    private long underFlowCount = 0;
	    private long overFlowCount = 0;
	    private final double factor;
	    private final double offset;
	    private final double minExpectedQuantileValue;

tdunning / replication-diagram.graffle

Last active August 4, 2017 17:46 — forked from tgrall/replication.md

Ted Dunning tdunning