chris wiggins chrishwiggins

nice NPR story illustrating a conceptual and methodological
difference between AI and ML, using some of the more
press-grabbing, (human) game-beating systems:
http://www.npr.org/blogs/alltechconsidered/2015/01/08/375736513/look-out-this-poker-playing-computer-is-unbeatable
this story's pretty interesting in general but one particular
part grabs my attention:
Oren Etzioni, the head of Seattle's Allen Institute for
Q: what are "single tree-based" (as opposed to forest-based) supervised learning methods?
A: some of my favorites:
- ADT
+ wiki: http://en.wikipedia.org/wiki/Alternating_decision_tree
+ ref: http://perun.pmf.uns.ac.rs/radovanovic/dmsem/cd/install/Weka/doc/classifiers-papers/trees/ADTree/atrees.pdf
- rpart in R
+ http://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf
FAQ:
where are some fun datasets to play with?
1. CMU:
http://lib.stat.cmu.edu/datasets/
2. UCI:
a) MLR@UCI (machine learning repository / machine learning archive)
For current information please
- see http://modelingsocialdata.org/ and
- follow @CUSocialData ( https://twitter.com/CUSocialData )
official bulletin URL:
http://www.columbia.edu/cu/bulletin/uwb/subj/APMA/E4990-20151-001/
Q: what book should I use to learn ML?
A: use several, and find the one that speaks to you.
the list below assumes you know a bit of math but
are not very mathematical, and are interested in learning
enough to be practical. that is, it is not at the
mathematical level of MIJ's alleged list
(cf. https://news.ycombinator.com/item?id=1055389 )
Q: I want to sign up for 3900 (supervised research). How many
credits will you give me?
A: If you want to take 3900 with me, we need to come to a
contract, and this contract needs to be closed before the start
of the semester. The contract will stipulate:
- Who is the scientific advisor (if not me)
- What is the deliverable (e.g., technical report, oral report)
- tukey's 1962 paper on the tension between
mathematical statistics and applied computational statistics
http://web.stanford.edu/~gavish/documents/Tukey_the_future_of_data_analysis.pdf
- william cleveland's 2001 "data science" paper
http://www.datascienceassn.org/sites/default/files/Data%20Science%20An%20Action%20Plan%20for%20Expanding%20the%20Technical%20Areas%20of%20the%20Field%20of%20Statistics.pdf
- interview w/leo breiman, heretical statistician
http://projecteuclid.org/euclid.ss/1009213290
learning mixtures of ranking models
consistency of spectral partitioning of uniform hypergraphs under
optimal rates for $k$-nn density and mode estimation
bayesian inference for structured spike and slab priors
grouping-based low-rank video completion and 3d reconstruction
tightening after relax: minimax-optimal sparse pca in polynomial
belief propagation recursive neural networks
communication efficient distributed machine learning with the
on the statistical consistency of plug-in classifiers for
distributed context-aware bayesian posterior sampling via
The Bayesian approach to model selection is a subject you'll
like. The basic idea is to compute the "Bayes Factor":
http://en.wikipedia.org/wiki/Bayes_factor .
As the page says "Bayesian inference has been put forward as a
theoretical justification for and generalization of Occam's
razor".
( http://en.wikipedia.org/wiki/Occam%27s_razor )
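
To make the Occam's razor point concrete, here is a minimal sketch for
a toy problem that is not from the notes above: coin flips, comparing
M0 (the coin is exactly fair) against M1 (unknown bias with a
Uniform(0, 1) prior). The model choice, the prior, and the function
names are all assumptions made for illustration.

```python
from math import exp, lgamma, log


def log_evidence_fair(n_heads, n_tails):
    """Log evidence of one observed sequence under M0: p = 0.5, no free parameters."""
    return (n_heads + n_tails) * log(0.5)


def log_evidence_uniform(n_heads, n_tails):
    """Log evidence under M1: bias p unknown, prior p ~ Uniform(0, 1).

    Integrating p^h (1 - p)^t over p gives the Beta function B(h + 1, t + 1),
    written here with log-gamma functions for numerical stability.
    """
    return (lgamma(n_heads + 1) + lgamma(n_tails + 1)
            - lgamma(n_heads + n_tails + 2))


def bayes_factor(n_heads, n_tails):
    """Bayes factor of M1 (free bias) over M0 (fair coin)."""
    return exp(log_evidence_uniform(n_heads, n_tails)
               - log_evidence_fair(n_heads, n_tails))


if __name__ == "__main__":
    # Balanced data: the factor is below 1, so the simpler model is preferred.
    print(bayes_factor(5, 5))
    # Lopsided data: the factor is well above 1, so the flexible model wins.
    print(bayes_factor(10, 0))
```

With 5 heads and 5 tails the factor comes out below 1: the flexible
model spreads its prior mass over many biases, so it is penalized when
the data do not need the extra parameter. With 10 heads and 0 tails
the factor is far above 1.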
The Bayes factor can be approximated under some assumptions,
leading to a simple penalized maximum likelihood called the