- What is R?
- Why R?
- How to get help
- R language resources
- RStudio
- Installing and using packages
- Workspace
- Data Objects: Vectors, Matrices, Data Frames, and Lists
- Local data import/export
- Functions
- Control Statements
- Descriptive statistics
- Hypothesis testing
- Linear Regression
- Logistic Regression
- Introducing non-parametric statistics
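For quick reference, the two model forms behind the regression topics above: simple linear regression models the response directly, while logistic regression models the log-odds of a binary outcome.

```latex
% Simple linear regression
y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)

% Logistic regression (logit link)
\log\frac{p}{1-p} = \beta_0 + \beta_1 x, \qquad p = P(y = 1 \mid x)
```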
Day 1: Learn Git init, add, push, pull, and merge
Day 2: Learn GitHub features
Day 3: Work in a team using branching and merging
Day 4: Build a student portfolio page
- Data sorting
- Merging Data
- Remodeling Data
- String manipulation
- Dates and time stamps
- Web data capture
- API data sources
- Connecting to an external database
- Histograms
- Point graphics
- Columnar graphics
- Line charts
- Pie charts
- Box Plots
- Scatter plots
- Visualizing multivariate data
- Matrix-based visualizations
- Maps
- What is data mining and how to do it
- Steps to apply data mining to your data
- Supervised versus unsupervised learning
- Regression versus classification problems
- Review of linear models
- Simple linear regression
- Logistic regression
- Generalized linear models
- Evaluating model performance
- Confusion matrices
- Beyond accuracy (see the worked example after this list)
- Estimating future performance
- Extensions of linear models
- Subset selection
- Shrinkage methods
- Dimension reduction methods
- The k-Nearest Neighbors model
- Understanding the kNN algorithm
- Calculating distance
- Choosing an appropriate k
- Case study
- Naive Bayes models
- Understanding joint probability
- The Naive Bayes algorithm
- The Laplace estimator
- Case study
- Tree models
- Regression trees and classification trees
- Tree models with party
- Tree models with rpart
- Random Forest models
- GBM models
- Support Vector Machines
- Maximal margin classifiers
- Support vector classifiers
- Support vector machines
- Market Basket Analysis
- Understanding association rules
- The Apriori algorithm
- Case study
- Unsupervised learning
- K-means clustering
- Hierarchical clustering
- Case study
- Time series models
- Stationary time series
- The ARIMA model
- The seasonal model
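To make the model-evaluation topics above concrete, here is a minimal sketch with made-up counts from a hypothetical imbalanced test set, showing why accuracy alone can be misleading; the numbers are invented purely for illustration.

```python
# Hypothetical confusion matrix for a binary classifier on an imbalanced test set:
# 990 true negatives, 10 false positives, 6 false negatives, 4 true positives.
tn, fp, fn, tp = 990, 10, 6, 4

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # fraction of all predictions that are correct
precision = tp / (tp + fp)                    # of predicted positives, how many are truly positive
recall    = tp / (tp + fn)                    # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# Accuracy is roughly 0.98 even though the classifier finds only 40% of the
# positives, which is why the course looks at measures beyond accuracy.
```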
Day 1: Development Tools, Scatter plots
Day 2: Loading Data, Bar charts – Grouped and Stacked
Day 3: Line charts, Brushing, Reusable charts
Day 4: Choropleth maps, Projections
The D3.js library is one of the more exciting visualization libraries released in the last few years. Based on the concept of building data-driven documents, D3 skills are highly useful for any data scientist who wants to build top-quality interactive visualizations for the web. This class covers the basics of designing good visualizations and leveraging the browser to communicate data effectively. Students will explore a variety of data sets using D3.js, including plotting dynamic geographic data with a variety of projections. Finally, we'll explore other libraries built on top of D3 that make building time-series visualizations simple.
Day 1: Knitr – Dynamic and Reproducible Reporting
Day 2: Shiny – Building Web Applications
Day 3: rCharts – Bringing R and D3.js Together
Day 4: QuantMod – R for Finance
Day 5: Slidify – Making HTML5 Slides with R
- Overview of syntax, built-in functions, and data structures
- Introduction to the standard library
- Object oriented programming
- Review of probability and statistics
- Hypothesis testing
- Introduction to Pandas
- The exploratory data analysis process
- Working with real world data
- Data visualization with Matplotlib
- Web scraping
- Accessing APIs
- Building web applications
- What is machine learning?
- The Scikit-Learn API
- Image Processing / Text Classification
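As a flavor of the pandas and Matplotlib topics above, here is a minimal exploratory-analysis sketch; the file name and column names (listings.csv, neighborhood, price) are hypothetical placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load a dataset (hypothetical file and columns, for illustration only).
df = pd.read_csv("listings.csv")

# First look: shape, dtypes, summary statistics, missing values.
print(df.shape)
print(df.dtypes)
print(df.describe())
print(df.isnull().sum())

# A simple grouped summary: mean price by neighborhood.
by_hood = df.groupby("neighborhood")["price"].mean().sort_values(ascending=False)
print(by_hood.head())

# Quick visual check of the price distribution.
df["price"].hist(bins=50)
plt.xlabel("price")
plt.ylabel("count")
plt.show()
```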
Mathematics Review, Linear Regression, Multivariate Linear Regression. Lab: NumPy/Scikit-Learn
Naive Bayes Classifiers, k-Nearest Neighbors, Logistic Regression, Linear Discriminant Analysis. Lab: Supervised Learning
Cross-Validation, Bootstrap, Feature Selection. Lab: Model Selection and Regularization
Support Vector Machines, Decision Trees, Random Forests. Lab: Decision Trees and SVMs
Principal Component Analysis, Clustering with K-Means, State Estimation. Lab: PCA and Clustering
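A minimal scikit-learn sketch of the kind of supervised-learning lab listed above, fitting a k-Nearest Neighbors classifier on the bundled iris data; k=5 and the 70/30 split are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load a small bundled dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a k-Nearest Neighbors classifier (k=5 is an arbitrary illustrative choice).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Evaluate on the held-out data.
print(accuracy_score(y_test, knn.predict(X_test)))
```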
| Methods | Algorithms |
|---|---|
| Regression | linear_model.LinearRegression, linear_model.Ridge, linear_model.Lasso, linear_model.ElasticNet |
| Classification (Discriminant Analysis) | lda.LDA, qda.QDA |
| Classification (Tree-based models) | tree.DecisionTreeClassifier, ensemble.RandomForestClassifier |
| Classification (Other) | linear_model.LogisticRegression, svm.SVC |
| Classification (Nearest Neighbors) | neighbors.KNeighborsClassifier, neighbors.RadiusNeighborsClassifier |
| Classification (Naive Bayes) | naive_bayes.GaussianNB, naive_bayes.MultinomialNB, naive_bayes.BernoulliNB |
| Unsupervised Learning | decomposition.PCA, cluster.KMeans, cluster.AgglomerativeClustering |
| Feature Selection | feature_selection.VarianceThreshold, feature_selection.SelectKBest, feature_selection.SelectPercentile |
| Cross-Validation | cross_validation.KFold, cross_validation.StratifiedKFold, cross_validation.cross_val_score, cross_validation.train_test_split |
| Model Selection | linear_model.RidgeCV, linear_model.LassoCV, linear_model.ElasticNetCV, grid_search.GridSearchCV |
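To show how the pieces in the table fit together, here is a minimal model-selection sketch; note that in current scikit-learn releases the cross-validation and grid-search helpers live under sklearn.model_selection, which the sketch uses, and the parameter grid is an arbitrary illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Baseline: 5-fold cross-validated accuracy of a plain logistic regression.
base = LogisticRegression(max_iter=5000)
print(cross_val_score(base, X_train, y_train, cv=5).mean())

# Grid search over the regularization strength C (grid chosen only for illustration).
grid = GridSearchCV(LogisticRegression(max_iter=5000),
                    param_grid={"C": [0.01, 0.1, 1, 10]},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```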
- Introduction to the origin and functions of Hadoop
- How to build a Hadoop cluster on Amazon cloud
- The principles and operation of the Hadoop Distributed File System (HDFS)
- HDFS API programming
- The principles and working mechanisms of Map-Reduce
- Hadoop data flow
- Map-Reduce programming (a minimal streaming sketch follows this list)
- Connecting Eclipse to a Hadoop cluster
- Advanced Hadoop applications
- Installation and applications of Pig
- Architecture and installation of Hive
- Applications of HiveQL
- Data Mining with Mahout
- Architecture of HBase and Zookeeper
- Installation and management of HBase
- The data model of HBase
- Review of Hadoop basics
- Summary of Hadoop applications
- Analysis of high-volume website log systems
- Retrieving KPI data (using Map-Reduce)
- LBS applications for telecommunication companies
- Analysis of users' mobile phone traces (using Map-Reduce)
- User analysis for telecommunication companies
- Labeling duplicate users by call fingerprints (using Map-Reduce)
- Recommendation systems for E-commerce companies (using Map-Reduce)
- More complex recommendation system applications (using Mahout)
- Social networks
- Distance between users
- Community detection (using Pig)
- Importance of nodes in a social network (using Map-Reduce)
- Application of clustering algorithms
- Analysis of VIP (using Map-Reduce, Mahout)
- Financial data analysis
- Retrieving reverse repurchase information from historical data (using Hive)
- Setting stock strategies with data analysis (using Map-Reduce, Hive)
- GPS applications
- Sign-in data analysis (using Pig)
- Implementation and optimization of sorting (using Map-Reduce)
- Middleware development
- Cooperation between multiple Hadoop clusters
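Map-Reduce programming in this course is typically written against the Java API; as a language-neutral illustration of the same data flow, here is a minimal Hadoop Streaming style mapper and reducer in Python that count page hits per URL in a web-server log. The assumed log format (request path in the seventh whitespace-separated field) and the script names are placeholders for the sketch.

```python
# mapper.py -- emit "<url>\t1" for every request line on stdin
# (assumes an Apache-style access log where the request path is field 7).
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) > 6:
        print(f"{fields[6]}\t1")
```

```python
# reducer.py -- sum the counts for each URL; Hadoop Streaming delivers
# mapper output sorted by key, so equal keys arrive consecutively.
import sys

current_url, count = None, 0
for line in sys.stdin:
    url, value = line.rstrip("\n").split("\t")
    if url == current_url:
        count += int(value)
    else:
        if current_url is not None:
            print(f"{current_url}\t{count}")
        current_url, count = url, int(value)
if current_url is not None:
    print(f"{current_url}\t{count}")
```

Locally the pair can be smoke-tested without a cluster via `cat access.log | python mapper.py | sort | python reducer.py`; on a cluster the same scripts are submitted through the Hadoop Streaming jar.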
Alan Perlis once wrote, "I think that it's extraordinarily important that we in computer science keep fun in computing." Learning about the Raspberry Pi is exactly that: it's supposed to be FUN! It's also an inexpensive ticket to discovering more about hardware hacking, operating systems, and programming languages. Need a small web server? Done. Want to build a small amateur weather station? Done. Want to watch your home DVD collection? Done, all from the same device. We'll cover the basics of this credit-card-sized computer and explore fun applications in software and hardware. This series focuses on the RPi Model B+, with kits provided for students. The first few classes cover setup and installation, later classes cover installing new programs and packages, and finally students will interface the RPi with hobby electronics and sensors. You will create your own data-collecting machine and be able to leverage your new data science skills to make sense of it. Raspberry Pi is a trademark of the Raspberry Pi Foundation. This class is not officially endorsed by the Raspberry Pi Foundation.
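As a taste of the "data-collecting machine" idea above, here is a minimal Python sketch that logs the Pi's CPU temperature to a CSV file once a minute; the sysfs path is the usual one on Raspberry Pi OS, and the output file name and one-minute interval are arbitrary choices for the sketch.

```python
import csv
import time
from datetime import datetime

# Standard sysfs location for the SoC temperature on Raspberry Pi OS
# (an assumption of this sketch; adjust if your image differs).
SENSOR = "/sys/class/thermal/thermal_zone0/temp"
LOGFILE = "cpu_temp_log.csv"   # hypothetical output file

def read_temp_c():
    """Return the CPU temperature in degrees Celsius (value is in millidegrees)."""
    with open(SENSOR) as f:
        return int(f.read().strip()) / 1000.0

with open(LOGFILE, "a", newline="") as out:
    writer = csv.writer(out)
    while True:
        writer.writerow([datetime.now().isoformat(), read_temp_c()])
        out.flush()
        time.sleep(60)   # one sample per minute
```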
2-Week Student Project guided by the instructor and TAs
Network and promote yourself to our many hiring partners in New York City on our digital hiring platform. Leverage a network of mentors, alumni, and partner companies. If a firm is interested in your projects, an interview will be scheduled through our platform.
We will focus on practice interviews, professional resume feedback, and presentation coaching.