Olga Pustovalova olp-cs

General Background and Overview

Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
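
To make the frequent-items problem from the last two entries concrete, here is a minimal sketch of the Misra-Gries summary, a classic counter-based technique for finding frequent items in a stream. It is an illustration only, not code taken from any of the linked papers; the function name `misra_gries` and the sample stream are made up for this example. With `k - 1` counters, any item that occurs more than `n / k` times in a stream of length `n` is guaranteed to survive, and each stored count undershoots the true count by at most `n / k`.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Hypothetical helper for illustration: every item occurring more than
    len(stream) / k times is guaranteed to be a key of the result, but the
    result may also contain items that are not actually frequent.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A counter slot is free: start tracking the item.
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop the ones
            # that reach zero. The new item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Sample data (assumed): "a" makes up half of a 12-item stream.
stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
candidates = misra_gries(stream, k=3)
print(candidates)
```

Because the summary can contain false positives, a second pass over the data (when one is possible) is the usual way to confirm the exact counts of the surviving candidates.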

I wrote this in early January 2012, but never finished it. The research and thinking in this area led to a lot of the design of Yeoman and talks like "Javascript Development Workflow of 2013", "Web Application Development Workflow" and "App development stack for JS developers" (surprisingly little overlap in those talks, btw).

Now it's June 2013 and the state of web app tooling has matured quite a bit. But here's a snapshot of the story from 18 months ago, even if a little ugly and incomplete. :p

In the beginning…

Intro to tooling

Backend Development

node.js
- Installation paths: use one of these techniques to install node and npm without having to sudo.
- Node.js HOWTO: Install Node+NPM as user (not root) under Unix OSes
- Felix's Node.js Guide
- Creating a REST API using Node.js, Express, and MongoDB
- Node Cellar Sample Application with Backbone.js, Twitter Bootstrap, Node.js, Express, and MongoDB
- JavaScript Event Loop
- Node.js for PHP programmers

Frontend Development

Attention: the list was moved to

https://github.com/dypsilon/frontend-dev-bookmarks

This page is not maintained anymore, please update your bookmarks.

	import selenium
	import time
	from selenium import webdriver

	browser = webdriver.PhantomJS("phantomjs")
	browser.get("https://twitter.com/StackStatus")
	print(browser.title)

	pause = 3

	from sklearn.metrics import confusion_matrix

	def print_cm(cm, labels, hide_zeroes=False, hide_diagonal=False, hide_threshold=None):
	    """Pretty-print a confusion matrix."""
	    columnwidth = max([len(x) for x in labels] + [5])  # 5 is value length
	    empty_cell = " " * columnwidth
	    # Print header
	    print(" " + empty_cell, end=" ")
	    for label in labels:
	        print("%{0}s".format(columnwidth) % label, end=" ")

	-- Build a sorted word frequency list from a file, trimmed to a given quantile.
	--
	-- Usage: WordStats <book.txt> <quantile>
	--
	-- `quantile` is a number between 0 and 1.
	--
	-- Example:
	-- ./WordStats "Don Quijote.txt" 0.85 > "Don Quijote.words.85"

	import Control.Applicative

	######## 3 reps of 10-fold CV to determine feature sparsity percentage via RFE ########

	#X = concatenated text features for training set (title, body, url) transformed via TfidfVectorizer
	#y = training set classification (0, 1)

	import numpy as np
	import pandas as pd
	import sklearn.linear_model as lm
	from sklearn.model_selection import KFold  # sklearn.cross_validation was removed in scikit-learn 0.20
	from sklearn import metrics

	var doctors = [
	  { number: 1, actor: "William Hartnell", begin: 1963, end: 1966 },
	  { number: 2, actor: "Patrick Troughton", begin: 1966, end: 1969 },
	  { number: 3, actor: "Jon Pertwee", begin: 1970, end: 1974 },
	  { number: 4, actor: "Tom Baker", begin: 1974, end: 1981 },
	  { number: 5, actor: "Peter Davison", begin: 1982, end: 1984 },
	  { number: 6, actor: "Colin Baker", begin: 1984, end: 1986 },
	  { number: 7, actor: "Sylvester McCoy", begin: 1987, end: 1989 },
	  { number: 8, actor: "Paul McGann", begin: 1996, end: 1996 },
	  { number: 9, actor: "Christopher Eccleston", begin: 2005, end: 2005 },

	#!/bin/sh

	# System update
	sudo apt-get update

	# Curl
	sudo apt-get -y install curl

	# Git
	sudo apt-get -y install git