stephenLee

*2vec

word2vec https://arxiv.org/abs/1310.4546
sentence2vec, paragraph2vec, doc2vec http://arxiv.org/abs/1405.4053
tweet2vec http://arxiv.org/abs/1605.03481
tweet2vec https://arxiv.org/abs/1607.07514
author2vec http://dl.acm.org/citation.cfm?id=2889382
item2vec http://arxiv.org/abs/1603.04259
lda2vec https://arxiv.org/abs/1605.02019
illustration2vec http://dl.acm.org/citation.cfm?id=2820907

A Few Useful Things to Know about Machine Learning

The paper presents some key lessons and "folk wisdom" that machine learning researchers and practitioners have learnt from experience and which are hard to find in textbooks.

1. Learning = Representation + Evaluation + Optimization

All machine learning algorithms have three components:

Representation for a learner is the set of classifiers/functions that can possibly be learnt. This set is called the hypothesis space. If a function is not in the hypothesis space, it cannot be learnt.
Evaluation function tells how good a candidate machine learning model is.
Optimisation is the method used to search the hypothesis space for the highest-scoring model.
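
As a rough illustration of the three components (not taken from the paper), the sketch below uses a toy 1-D problem. The threshold classifier, the accuracy score, the exhaustive search, and the data are all assumptions chosen for brevity.

```python
import numpy as np

# Representation: the hypothesis space is the set of 1-D threshold
# classifiers "predict 1 if x >= threshold, else 0".
def make_classifier(threshold):
    return lambda x: (x >= threshold).astype(int)

# Evaluation: score a candidate classifier by its accuracy on labelled data.
def accuracy(classifier, x, y):
    return float(np.mean(classifier(x) == y))

# Optimisation: exhaustive search over a finite list of candidate thresholds.
def fit(x, y, candidates):
    return max(candidates, key=lambda t: accuracy(make_classifier(t), x, y))

# Toy data (hypothetical): small values are class 0, large values are class 1.
x = np.array([0.1, 0.35, 0.55, 0.8, 0.9])
y = np.array([0, 0, 1, 1, 1])

candidates = [i / 10 for i in range(11)]
best_threshold = fit(x, y, candidates)
best_accuracy = accuracy(make_classifier(best_threshold), x, y)
print(best_threshold, best_accuracy)
```

Swapping any one piece (a richer hypothesis space, a different score, gradient descent instead of exhaustive search) gives a different learner while the other two stay unchanged.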

General Background and Overview

Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep
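
To give a flavour of the frequent-items problem these references cover, here is a minimal sketch of the Misra-Gries summary, one well-known counter-based approach. It is an illustrative assumption on my part, not code from any of the linked resources, and the function name and sample stream are hypothetical.

```python
def misra_gries(stream, k):
    """Summarise a stream using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to survive in the returned dict. The stored counts
    are underestimates of the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a"]
summary = misra_gries(stream, k=3)
print(summary)
```

The memory used depends only on `k`, not on the stream length, which is the trade-off these approximation techniques make: bounded space in exchange for approximate counts.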

OS X Screencast to animated GIF

This gist shows how to create a GIF screencast using only free OS X tools: QuickTime, ffmpeg, and gifsicle.

Instructions

To capture the video (filesize: 19MB), using the free "QuickTime Player" application:

	Hey, I'm stephenLee-1144508 and I have contributed to the Semaphore Binary Merkle Root Fix MPC Phase2 Trusted Setup ceremony.
	The following are my contribution signatures:

	Circuit # 1 (semaphore-1)
	Contributor # 563
	Contribution Hash: d886c054 5c8337c3 22a67d42 eea214a8
	abf3f5c1 fb02e898 c1aac8d5 d0573c76
	9c660670 b702b150 9cdc1e5d 86819d81
	f990ec81 c18e5ee3 c3eb5651 de300c3e

	"""Information Retrieval metrics

	Useful Resources:
	http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
	http://www.nii.ac.jp/TechReports/05-014E.pdf
	http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
	http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
	Learning to Rank for Information Retrieval (Tie-Yan Liu)
	"""
	import numpy as np

	import pandas as pd

	# List unique values in a DataFrame column
	df['column_name'].unique()

	# Convert a Series to numeric, turning any non-numeric values into NaN
	df['col'] = pd.to_numeric(df['col'], errors='coerce')

	# Grab DataFrame rows where a column has certain values
	value_list = ['value1', 'value2', 'value3']
	df = df[df.column.isin(value_list)]

	"""
	Leverages the work of briancappello and the quantopian team
	(especially twiecki, eddie, and fawce)
	"""
	import pandas as pd
	from zipline.gens.utils import hash_args
	from zipline.sources.data_source import DataSource
	import datetime
	import csv
	import numpy as np

	#!/usr/bin/env python
	"""
	Downloads and cleans up a CSV file from a Google Trends query.

	Usage:
	trends.py [email protected] google.password /path/to/filename query1 [query2 ...]

	Requires mechanize:
	pip install mechanize
	"""