- Describe the data mining workflow and the key traits of a successful data scientist.
- Extract, format, and preprocess data using UNIX command-line tools.
- Explore & visualize data.
- Explain the concepts and applications of supervised & unsupervised learning techniques.
- Describe categorical and continuous feature spaces, including examples and techniques for each.
- Discuss the purpose of machine learning and the interpretation of predictive modeling results.
- Describe the setting and goal of a classification task.
- Minimize prediction error using training & test sets, and optimize predictive performance using cross-validation.
- Understand the kNN classification algorithm, including its intuition and implementation.
- Implement the "hello world" of machine learning (kNN classification of iris dataset).
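The "hello world" exercise above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's assigned implementation: it uses a handful of made-up iris-like (petal length, petal width) measurements instead of the full iris dataset, Euclidean distance, and a simple majority vote among the k nearest training points.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train: list of ((feature, ...), label) pairs
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Tiny hand-made stand-in for iris: (petal length, petal width) -> species
train = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.1), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.1, 1.0), "versicolor"),
    ((6.0, 2.5), "virginica"), ((5.9, 2.1), "virginica"), ((5.6, 2.4), "virginica"),
]

print(knn_predict(train, (1.5, 0.3)))  # short, narrow petal -> "setosa"
```

In practice a library implementation (e.g. scikit-learn's `KNeighborsClassifier`) would be used on the real dataset; the value of the from-scratch version is seeing that kNN is nothing more than "sort by distance, vote".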
- Outline the basic principles of probability, including conditional probability and Bayes’ theorem.
- Describe inference in the Bayesian setting, including the prior and posterior distributions and the likelihood function.
- Understand the naive Bayes classifier and its assumptions.
- Implement a spam filter using the naive Bayes technique.
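A naive Bayes spam filter of the kind described above can be sketched as follows. This is a toy illustration with four invented training messages: it estimates a class prior and per-word likelihoods with add-one (Laplace) smoothing, and classifies by the larger log posterior, exactly the quantities named in the objectives on priors, likelihoods, and posteriors.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, word counts, vocabulary."""
    labels = Counter(label for _, label in docs)
    words = {lbl: Counter() for lbl in labels}
    vocab = set()
    for tokens, lbl in docs:
        words[lbl].update(tokens)
        vocab.update(tokens)
    return labels, words, vocab

def classify_nb(model, tokens):
    labels, words, vocab = model
    total = sum(labels.values())
    def log_posterior(lbl):
        # log prior + sum of log likelihoods, with add-one (Laplace) smoothing
        lp = math.log(labels[lbl] / total)
        denom = sum(words[lbl].values()) + len(vocab)
        for t in tokens:
            lp += math.log((words[lbl][t] + 1) / denom)
        return lp
    return max(labels, key=log_posterior)

# Invented miniature corpus
train = [
    ("win money now".split(), "spam"),
    ("free prize win".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("lunch later today".split(), "ham"),
]
model = train_nb(train)
print(classify_nb(model, "win free money".split()))  # "spam"
```

The "naive" assumption is visible in `log_posterior`: word likelihoods are simply multiplied (added in log space), i.e. words are treated as conditionally independent given the class.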
- Explain the concepts of regression models, including their assumptions and applications.
- Discuss the motivation for regularization techniques and their use.
- Implement a regularized fit.
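One common regularized fit is ridge regression, which adds an L2 penalty on the coefficients to ordinary least squares. A minimal sketch, using a tiny made-up dataset (the course's choice of data and penalty may differ):

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Ridge regression: minimize ||Xw - y||^2 + alpha * ||w||^2.
    Closed form: w = (X^T X + alpha I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# y is roughly 2*x with a little noise
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(ridge_fit(X, y, alpha=0.0))   # ~[1.99], plain least squares
print(ridge_fit(X, y, alpha=10.0))  # smaller coefficient: shrunk by the penalty
```

Increasing `alpha` trades a little bias for lower variance, which is the motivation for regularization discussed above: it stabilizes fits when features are many or collinear.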
- Describe the applications of logistic regression to classification problems and probability estimation.
- Introduce the concepts underlying logistic regression, including its relation to other regression models.
- Predict the probability of a user action on a website using logistic regression.
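The click-prediction task above can be sketched with a from-scratch logistic regression. Everything here is illustrative: the single feature (pages viewed per visit) and the labels are invented, and the fit uses plain stochastic gradient descent on the log-loss rather than a library solver.

```python
import math

def sigmoid(z):
    """Squash a linear score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Stochastic gradient descent on the log-loss; returns weights plus intercept."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the intercept
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi  # gradient of log-loss w.r.t. the linear score
            for j in range(len(xi)):
                w[j] -= lr * err * xi[j]
            w[-1] -= lr * err
    return w

# Hypothetical data: pages viewed -> did the user click?
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
p = sigmoid(w[0] * 7.0 + w[1])
print(round(p, 2))  # a heavy browser gets a high click probability
```

The relation to linear regression mentioned above is direct: the model is still a linear score in the features, but passed through the sigmoid so the output is interpretable as a probability.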
- Explain the purpose of exploratory data analysis, its applications in continuous and categorical feature spaces, and the interpretation and use of clustering results.
- Discuss the importance of the distance function in cluster formation, as well as the importance of scale normalization.
- Implement a k-means clustering algorithm.
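The k-means objective above can be sketched with Lloyd's algorithm in pure Python. The six 2-D points are invented, and the centroids are initialized to the first k points for determinism; real implementations use random restarts or k-means++, and, per the normalization point above, assume features are on comparable scales.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # deterministic init for illustration only
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # squared Euclidean distance; scale-sensitive, so normalize features first
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        centroids = [tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two well-separated invented blobs
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]: the two blobs are recovered
```

The distance function in the inner `min` is exactly the lever discussed above: swapping squared Euclidean distance for another metric changes which clusters form.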
- Describe general ensemble techniques such as bagging and boosting.
- Build an enhanced classification algorithm using AdaBoost.
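The AdaBoost idea above can be sketched with one-dimensional decision stumps as the weak learners. The data is an invented sanity check; the point is the boosting loop itself: each round fits the stump with lowest weighted error, weights it by its accuracy, and reweights the examples so later rounds focus on what is still misclassified.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: a single threshold rule with labels in {-1, +1}."""
    return polarity if x > threshold else -polarity

def adaboost(X, y, rounds=10):
    """AdaBoost: reweight examples each round so the next stump focuses on
    the ones the ensemble still gets wrong."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # exhaustive search over thresholds (the data values) and polarities
        for t in X:
            for pol in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump_predict(xi, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = max(err, 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # stump weight in the vote
        ensemble.append((alpha, t, pol))
        # up-weight misclassified points, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score > 0 else -1

X = [1, 2, 3, 7, 8, 9]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y)
print([predict(model, x) for x in X])  # recovers the training labels
```

On harder data the reweighting is what makes the ensemble "enhanced": stumps that would be useless alone combine into a much stronger classifier.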
- Describe the use and construction of decision trees for classification tasks.
- Create a random forest model for ensemble classification.
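A toy version of the random forest above can be sketched by bagging randomized depth-1 trees. Two simplifications to be clear about: real forests grow much deeper trees, and this sketch randomizes by picking one feature per tree rather than a feature subset per split. The six labeled points are invented.

```python
import random
from collections import Counter

def fit_stump(data, rng):
    """A depth-1 'tree': pick a random feature, then the threshold with the
    lowest training error (real forests grow full trees)."""
    f = rng.randrange(len(data[0][0]))
    best = None
    for x, _ in data:
        t = x[f]
        left = [lbl for xi, lbl in data if xi[f] <= t]
        right = [lbl for xi, lbl in data if xi[f] > t]
        pl = Counter(left).most_common(1)[0][0] if left else Counter(right).most_common(1)[0][0]
        pr = Counter(right).most_common(1)[0][0] if right else pl
        err = sum(1 for xi, lbl in data if (pl if xi[f] <= t else pr) != lbl)
        if best is None or err < best[0]:
            best = (err, f, t, pl, pr)
    _, f, t, pl, pr = best
    return lambda x: pl if x[f] <= t else pr

def random_forest(data, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap resample, majority-vote at predict time."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data], rng) for _ in range(n_trees)]
    def predict(x):
        return Counter(tree(x) for tree in trees).most_common(1)[0][0]
    return predict

# Invented 2-D points: class "a" sits top-left, class "b" bottom-right
data = [((1, 5), "a"), ((2, 4), "a"), ((1, 4), "a"),
        ((6, 1), "b"), ((7, 2), "b"), ((8, 1), "b")]
model = random_forest(data)
print(model((1.5, 4.5)), model((7, 1)))  # a b
```

The two sources of randomness (bootstrap samples and feature choice) decorrelate the trees, which is why the averaged vote generalizes better than any single tree.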
- Explain the practical and conceptual difficulties in working with very high-dimensional data.
- Understand the application of dimensionality reduction techniques.
- Draw inferences from high-dimensional datasets using principal components analysis.
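Principal components analysis as described above can be sketched via an eigen-decomposition of the covariance matrix. The five 2-D points are invented to lie near the line y = x, so one component should capture nearly all the variance:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the directions of greatest variance (principal components)."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]       # sort components by descending variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, eigvals[order]

# Invented points lying almost on y = x
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]])
projected, variances = pca(X, n_components=1)
print(variances[0] / variances.sum())  # close to 1.0: one direction explains the data
```

The ratio printed at the end is the "explained variance" used in practice to decide how many components a high-dimensional dataset actually needs.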
- Explain the use of recommendation systems, and discuss several familiar examples.
- Understand the underlying concepts, including collaborative & content-based filtering.
- Implement a recommendation system.
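The collaborative-filtering idea above can be sketched with user-based filtering: score each unseen item by the ratings of similar users, weighted by cosine similarity. The ratings dictionary is invented; content-based filtering, also mentioned above, would instead compare item attributes.

```python
import math

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def recommend(ratings, user, top_n=1):
    """User-based collaborative filtering: rank unseen items by the
    similarity-weighted ratings of all other users."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Invented ratings on a 1-5 scale
ratings = {
    "alice": {"matrix": 5, "inception": 4},
    "bob":   {"matrix": 5, "inception": 5, "memento": 5},
    "carol": {"titanic": 5, "notebook": 4},
}
print(recommend(ratings, "alice"))  # ['memento']: bob shares alice's taste
```

Production systems differ mainly in scale: the same similarity-weighted sums are computed over sparse matrices with millions of users, often via matrix factorization.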
- Introduce concepts and use of relational databases, alternative database technologies such as NoSQL, and popular examples of each.
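The relational side of the objective above can be illustrated with Python's built-in `sqlite3` module: tables, inserts, and a join with aggregation. The schema and rows are made up for illustration; NoSQL stores trade this rigid schema and join capability for flexibility and horizontal scale.

```python
import sqlite3

# An in-memory relational database: two tables linked by a foreign key
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (user_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "book"), (1, "pen"), (2, "lamp")])

# A join plus aggregation: orders per user
rows = conn.execute(
    "SELECT users.name, COUNT(*) FROM users "
    "JOIN orders ON users.id = orders.user_id "
    "GROUP BY users.name ORDER BY users.name"
).fetchall()
print(rows)  # [('ada', 2), ('bob', 1)]
```

The join in the query is precisely what document stores like MongoDB de-emphasize: there, the orders would typically be embedded inside each user document instead.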
- Describe the use of graphs and graph theory to analyze problems in network analysis.
- Explore network visualization.
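The graph-analysis objective above can be illustrated with two staples of network analysis on a small invented graph: degree centrality (how connected a node is) and breadth-first search (shortest path length in hops). Libraries such as NetworkX provide these, but the from-scratch versions are short:

```python
from collections import deque

# Invented undirected graph as an adjacency list
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"],
}

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph) - 1
    return {node: len(neighbors) / n for node, neighbors in graph.items()}

def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of hops on a shortest path, or None."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

print(degree_centrality(graph)["c"])          # 1.0: "c" touches every other node
print(shortest_path_length(graph, "a", "d"))  # 2: a -> c -> d
```

Centrality scores like these are what network visualizations typically encode with node size or color.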
- Describe the concepts of parallel computing and applications to problems in big data.
- Introduce the map-reduce framework.
- Implement and explore examples of map-reduce tasks.
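The canonical map-reduce example is a word count, and its three phases can be sketched in plain Python. The three "documents" are invented; a real framework (Hadoop, Spark) runs the map and reduce phases in parallel across machines, with the shuffle moving data between them.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all counts for one word into a total."""
    return key, sum(values)

docs = ["the cat sat", "the dog sat", "the cat ran"]
pairs = chain.from_iterable(map_phase(d) for d in docs)        # map (parallelizable)
grouped = shuffle(pairs)                                       # shuffle
counts = dict(reduce_phase(k, v) for k, v in grouped.items())  # reduce
print(counts["the"], counts["cat"])  # 3 2
```

The design constraint that makes this scale is visible in the signatures: each mapper sees only one document, and each reducer sees only one key's values, so neither needs global state.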
- Where To Go Next
- Review of concepts and examples from preceding weeks.
- Discussion of resources & tools for further study.