Start here.
- Intro to Data Science UW / Coursera
- Topics: Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
- Linear Algebra / Levandosky Stanford / Book
- Linear Programming (Math 407) University of Washington / Course
- Statistics Stats in a Nutshell / Book
- Forecasting: Principles and Practice Monash University / Book *uses R
- Problem-Solving Heuristics "How To Solve It" Polya / Book
- Coding the Matrix: Linear Algebra through Computer Science Applications Brown / Coursera
- Think Bayes Allen Downey / Book
- Think Stats: Probability and Statistics for Programmers Allen Downey / Book
- Algorithms
  - Introduction to Algorithms Udacity
  - Algorithms Design & Analysis I Stanford / Coursera
  - Algorithm Design Kleinberg & Tardos / Book
- Databases
  - SQL Tutorial W3Schools / Tutorials
  - Another SQL Tutorial SQLZOO
  - Introduction to Databases Stanford / Coursera
- Data Mining
  - Mining Massive Data Sets Stanford / Book
  - Mining The Social Web O'Reilly / Book
  - Introduction to Information Retrieval Stanford / Book
- Machine Learning
  - Machine Learning / Ng Stanford / Coursera
  - A Course in Machine Learning / Hal Daumé III UMD Online Book
  - Programming Collective Intelligence O'Reilly / Book
  - Statistics The Elements of Statistical Learning / Book
  - Machine Learning / CaltechX Caltech / edX
- Probabilistic Graphical Models
  - Probabilistic Programming and Bayesian Methods for Hackers [GitHub / Tutorials](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)
  - PGMs / Koller Stanford / Coursera
- Natural Language Processing
  - NLP with Python O'Reilly / Book
- Analysis
  - Python for Data Analysis O'Reilly / Book
  - Big Data Analysis with Twitter UC Berkeley / Lectures
  - ~~Social Network Analysis University of Michigan~~
  - Social and Economic Networks: Models and Analysis / Stanford / Coursera
  - Information Visualization "Envisioning Information" Tufte / Book
- Python (Learning)
  - New To Python: Learn Python the Hard Way, Google's Python Class
- Python (Libraries)
  - Basic Packages Python, virtualenv, NumPy, SciPy, matplotlib and IPython
  - Data Science in IPython Notebooks (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering)
  - Bayesian Inference pymc
  - Labeled data structure objects, statistical functions, etc. pandas (See: Python for Data Analysis)
  - Python wrapper for the Twitter API twython
  - Tools for Data Mining & Analysis scikit-learn
  - Network Modeling & Viz networkx
  - Natural Language Toolkit NLTK
- Coursework
- Sentiment analysis, trending topics, and friendship mapping with Twitter API
- Joins and Matrix Manipulation in MapReduce (AWS EC2)
- In-database Text analysis (SQL)
- Sentiment analysis of movie tweets (Python)
- Coursera
- Khan Academy
- Metacademy
- Wolfram Alpha
- Wikipedia
- Quora
- Kindle .mobis
- Great PopSci Read: The Signal and The Noise Nate Silver
- Zipfian Academy's List of Resources
- A Software Engineer's Guide to Getting Started w Data Science
- Data Scientist Interviews Metamarkets
- Harvard's Data Viz Class [Harvard CS 171](http://cs171.org)
- D3 Tips and Tricks [Leanpub](https://leanpub.com/D3-Tips-and-Tricks)
- Scott Murray's Tutorial on D3 [Scott Murray's Blog](http://alignedleft.com/tutorials/)
- Berkeley's Viz Class [UC Berkeley](http://vis.berkeley.edu/courses/cs294-10-sp11/wiki/index.php/CS294-10_Visualization)
- Rice University's Data Viz Class [Rice University](http://had.co.nz/stat645/)
- Interactive Data Visualization Book [O'Reilly](http://chimera.labs.oreilly.com/books/1230000000345/index.html)
This is an introduction geared toward those with at least a minimal understanding of programming and (perhaps obviously) an interest in the components of Data Science, like statistics and distributed computing. Out of personal preference and a need for focus, I geared the original curriculum toward Python tools and resources, so I've explicitly marked resources that use other tools (like R) to teach conceptual material.
From Open Source Data Science Masters. Follow me on Twitter @cChpmnSiu