Christian Theil Have cth

Objectives

Data on DNA variation, such as single nucleotide polymorphisms (SNPs), can be very large: thousands or millions of SNPs, measured on potentially thousands of individuals. Typical genotyping platforms examine from 50K (K = thousand) to 2.5M (M = million) SNPs, and some platforms are even denser. At each position an individual carries two nucleotides, one on each chromosome of the pair, and each is one of A, C, G or T. If the genotyping read is not good enough, a missing value may be recorded for one or both chromosomes at that position/SNP. A frequently used recoding of the nucleotide data replaces the characters (i.e. alleles) with the count of the allele that has the lower frequency in the sample (the minor allele). Alternatively, it uses the count of a pre-specified allele as determined by the genotyping platform and software. Thus, instead of storing a pair of nucleotides (e.g., AA, AG, GG), researchers store the individual's genotype as 0, 1, 2, or NA. In thi
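The recoding described above can be sketched for a single SNP as follows. This is a minimal illustration under stated assumptions, not a platform's actual pipeline. The sketch assumes a biallelic SNP, genotypes given as two-letter strings such as "AG", and a hypothetical `-` marker for an allele that could not be called. Any genotype with a missing allele becomes NA, represented here as `None`. The function name `recode_snp` and its `counted` parameter are illustrative only. When `counted` is omitted, the minor allele is estimated from the fully called genotypes, and ties are broken alphabetically.

```python
from collections import Counter
from typing import List, Optional

MISSING = "-"  # hypothetical marker for an allele that could not be called


def recode_snp(genotypes: List[str], counted: Optional[str] = None) -> List[Optional[int]]:
    """Recode two-letter genotypes at one biallelic SNP as 0/1/2 copies of an allele.

    If `counted` is None, the allele with the lower frequency among fully called
    genotypes (the minor allele) is used; None in the output plays the role of NA.
    """
    if counted is None:
        # Count alleles only over genotypes where both alleles were called
        counts = Counter(a for g in genotypes if MISSING not in g for a in g)
        if len(counts) > 1:
            # Least frequent allele; sorting first makes ties resolve alphabetically
            counted = min(sorted(counts), key=counts.__getitem__)
        # With a single observed allele (monomorphic SNP) the minor allele is absent,
        # so every called genotype carries 0 copies of it.
    return [
        None if MISSING in g else (g.count(counted) if counted else 0)
        for g in genotypes
    ]


if __name__ == "__main__":
    print(recode_snp(["AA", "AG", "GG", "AA", "A-", "AG"]))  # [0, 1, 2, 0, None, 1]
    print(recode_snp(["AA", "AG", "GG"], counted="A"))       # [2, 1, 0]
```

Storing one small integer (or NA) per individual and SNP, instead of a pair of characters, is what makes the very large genotype matrices described above manageable.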

General Background and Overview

Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspective and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
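Lossy Counting comes from the Manku & Motwani paper listed above. The following is a minimal Python sketch of it, written from the algorithm's general outline rather than taken from the paper's code. It assumes a stream of hashable items and an error parameter `epsilon`. The class and method names (`LossyCounter`, `add`, `frequent`) are hypothetical.

The stream is cut into buckets of width ⌈1/ε⌉. Each tracked item stores an observed count `f` and an error bound `delta`, which is the number of buckets that may have passed before the item was first recorded. At every bucket boundary, entries with `f + delta` no greater than the current bucket id are pruned.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Sketch of Lossy Counting: approximate frequency counts over a stream."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        # item -> (observed count f, maximum undercount delta)
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            f, delta = self.entries[item]
            self.entries[item] = (f + 1, delta)
        else:
            # The item may have been seen and pruned up to bucket - 1 times before
            self.entries[item] = (1, bucket - 1)
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent
            self.entries = {
                k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket
            }

    def frequent(self, support: float) -> List[Hashable]:
        """Items whose observed count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return [k for k, (f, _) in self.entries.items() if f >= threshold]


if __name__ == "__main__":
    counter = LossyCounter(epsilon=0.1)
    for x in ["a"] * 50 + [f"x{i}" for i in range(50)]:
        counter.add(x)
    print(counter.frequent(support=0.3))  # ['a']
```

The memory footprint stays small because rare items are repeatedly evicted. The price is that reported counts can underestimate true counts by at most ε·n, and items with frequency between (s − ε)·n and s·n may be reported as false positives.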

	function mycd()
	{
	# If the target directory is writable, write history to a per-directory file;
	# otherwise fall back to the usual home-based history file.
	tmpDir=$PWD
	# Log the departure, preceded by a bash-style "#<epoch seconds>" timestamp line
	echo "#$(date '+%s')" >> "$HISTFILE"
	echo "$USER has exited $PWD for $*" >> "$HISTFILE"
	builtin cd "$@" || return # do the actual cd; stop if it fails
	if [ -w "$PWD" ]; then
		export HISTFILE="$PWD/.dir_bash_history"
		touch "$HISTFILE"
		chmod --silent 777 "$HISTFILE"
	else
		export HISTFILE="$HOME/.bash_history"
	fi
	}

	import jpl.Atom;
	import jpl.Compound;
	import jpl.JPL;
	import jpl.Query;
	import jpl.Term;


	/**
	* This class shows how to configure Logtalk in SWI-Prolog or YAP using the JPL library.
	* You need to have the JPL jar on your classpath to compile and run this file.

	using DataFrames
	using Dates # I am on 0.3

	# Note the quoting style and the custom time style
	# sed strips symlink targets, e.g. "dir" -> "../dir" becomes just "dir"
	# (|> is the Julia 0.3 command-pipe syntax; later versions use pipeline(a, b))
	cmd = `ls -1 -l --quoting-style=c --time-style='+%Y-%m-%d_%H:%M'` |> `sed 's/ -> ".*"$//g'`

	df = open(cmd, "r", STDOUT) do io
	readtable(io, header=false,
	separator=' ',