- What is R?
- Why R?
- How to get help
- R language resources
- RStudio
- Installing and using packages
- Workspace
- Data Objects: Vectors, Matrices, Data Frames, and Lists
- Local data import/export
- Functions
- Control Statements
- Descriptive statistics
- Hypothesis testing
- Linear Regression
- Logistic Regression
- Introducing non-parametric statistics
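For quick reference, the two model forms behind the regression topics above: simple linear regression models the response directly, while logistic regression models the log-odds of a binary outcome.

```latex
% Simple linear regression
y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)

% Logistic regression (logit link)
\log\frac{p}{1-p} = \beta_0 + \beta_1 x, \qquad p = P(y = 1 \mid x)
```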
Day 1: Learn Git init, add, push, pull, and merge
Day 2: Learn GitHub features
Day 3: Work in a team using branching and merging
Day 4: Build a student portfolio page
- Data sorting
- Merging Data
- Remodeling Data
- String manipulation
- Dates and time stamps
- Web data capture
- API data sources
- Connecting to an external database
- Histograms
- Point graphics
- Columnar graphics
- Line charts
- Pie charts
- Box Plots
- Scatter plots
- Visualizing multivariate data
- Matrix-based visualizations
- Maps
- What is data mining and how to do it
- Steps to apply data mining to your data
- Supervised versus unsupervised learning
- Regression versus classification problems
- Review of linear models
- Simple linear regression
- Logistic regression
- Generalized linear models
- Evaluating model performance
- Confusion matrices
- Beyond accuracy (see the worked example after this list)
- Estimating future performance
- Extensions of linear models
- Subset selection
- Shrinkage methods
- Dimension reduction methods
- The k-Nearest Neighbors model
- Understanding the kNN algorithm
- Calculating distance
- Choosing an appropriate k
- Case study
- Naive Bayes models
- Understanding joint probability
- The Naive Bayes algorithm
- The Laplace estimator
- Case study
- Tree models
- Regression trees and classification trees
- Tree models with party
- Tree models with rpart
- Random Forest models
- GBM models
- Support Vector Machines
- Maximal margin classifiers
- Support vector classifiers
- Support vector machines
- Market Basket Analysis
- Understanding association rules
- The Apriori algorithm
- Case study
- Unsupervised learning
- K-means clustering
- Hierarchical clustering
- Case study
- Time series models
- Stationary time series
- The ARIMA model
- The seasonal model
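To make the model-evaluation topics above concrete, here is a minimal sketch with made-up counts from a hypothetical imbalanced test set, showing why accuracy alone can be misleading; the numbers are invented purely for illustration.

```python
# Hypothetical confusion matrix for a binary classifier on an imbalanced test set:
# 990 true negatives, 10 false positives, 6 false negatives, 4 true positives.
tn, fp, fn, tp = 990, 10, 6, 4

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # fraction of all predictions that are correct
precision = tp / (tp + fp)                    # of predicted positives, how many are truly positive
recall    = tp / (tp + fn)                    # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# Accuracy is roughly 0.98 even though the classifier finds only 40% of the
# positives, which is why the course looks at measures beyond accuracy.
```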
Day 1: Development Tools, Scatter plots
Day 2: Loading Data, Bar charts – Grouped and Stacked
Day 3: Line charts, Brushing, Reusable charts
Day 4: Choropleth maps, Projections
The D3.js library is one of the more exciting visualization libraries released in the last few years. Based on the concept of building data-driven documents, D3 skills are highly useful for any data scientist who wants to build top-quality interactive visualizations for the web. This class covers the basics of designing good visualizations and leveraging the browser to communicate data effectively. Students will explore a variety of data sets using D3.js, including plotting dynamic geographic data with a variety of projections. Finally, we'll explore other libraries built on top of D3 that make building time-series visualizations simple.
Day 1: Knitr – Dynamic and Reproducible Reporting
Day 2: Shiny – Building Web Applications
Day 3: rCharts – Bringing R and D3.js Together
Day 4: QuantMod – R for Finance
Day 5: Slidify – Making HTML5 Slides with R
- Overview of syntax, built-in functions, and data structures
- Introduction to the standard library
- Object oriented programming
- Review of probability and statistics
- Hypothesis testing
- Introduction to Pandas
- The exploratory data analysis process
- Working with real world data
- Data visualization with Matplotlib
- Web scraping
- Accessing APIs
- Building web applications
- What is machine learning?
- The Scikit-Learn API
- Image Processing / Text Classification
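As a flavor of the pandas and Matplotlib topics above, here is a minimal exploratory-analysis sketch; the file name and column names (listings.csv, neighborhood, price) are hypothetical placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load a dataset (hypothetical file and columns, for illustration only).
df = pd.read_csv("listings.csv")

# First look: shape, dtypes, summary statistics, missing values.
print(df.shape)
print(df.dtypes)
print(df.describe())
print(df.isnull().sum())

# A simple grouped summary: mean price by neighborhood.
by_hood = df.groupby("neighborhood")["price"].mean().sort_values(ascending=False)
print(by_hood.head())

# Quick visual check of the price distribution.
df["price"].hist(bins=50)
plt.xlabel("price")
plt.ylabel("count")
plt.show()
```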
Mathematics Review, Linear Regression, Multivariate Linear Regression. Lab: NumPy/Scikit-Learn
Naive Bayes Classifiers, k-Nearest Neighbors, Logistic Regression, Linear Discriminant Analysis. Lab: Supervised Learning
Cross-Validation, Bootstrap, Feature Selection. Lab: Model Selection and Regularization
Support Vector Machines, Decision Trees, Random Forests. Lab: Decision Trees and SVMs
Principal Component Analysis, Clustering with K-Means, State Estimation. Lab: PCA and Clustering
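A minimal scikit-learn sketch of the kind of supervised-learning lab listed above, fitting a k-Nearest Neighbors classifier on the bundled iris data; k=5 and the 70/30 split are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load a small bundled dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a k-Nearest Neighbors classifier (k=5 is an arbitrary illustrative choice).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Evaluate on the held-out data.
print(accuracy_score(y_test, knn.predict(X_test)))
```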
| Methods | Algorithms |
|---|---|
| Regression | linear_model.LinearRegression, linear_model.Ridge, linear_model.Lasso, linear_model.ElasticNet |
| Classification (Discriminant Analysis) | lda.LDA, qda.QDA |
| Classification (Tree-based models) | tree.DecisionTreeClassifier, ensemble.RandomForestClassifier |
| Classification (Other) | linear_model.LogisticRegression, svm.SVC |
| Classification (Nearest Neighbors) | neighbors.KNeighborsClassifier, neighbors.RadiusNeighborsClassifier |
| Classification (Naive Bayes) | naive_bayes.GaussianNB, naive_bayes.MultinomialNB, naive_bayes.BernoulliNB |
| Unsupervised Learning | decomposition.PCA, cluster.KMeans, cluster.AgglomerativeClustering |
| Feature Selection | feature_selection.VarianceThreshold, feature_selection.SelectKBest, feature_selection.SelectPercentile |
| Cross-Validation | cross_validation.KFold, cross_validation.StratifiedKFold, cross_validation.cross_val_score, cross_validation.train_test_split |
| Model Selection | linear_model.RidgeCV, linear_model.LassoCV, linear_model.ElasticNetCV, grid_search.GridSearchCV |
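To show how the pieces in the table fit together, here is a minimal model-selection sketch; note that in current scikit-learn releases the cross-validation and grid-search helpers live under sklearn.model_selection, which the sketch uses, and the parameter grid is an arbitrary illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Baseline: 5-fold cross-validated accuracy of a plain logistic regression.
base = LogisticRegression(max_iter=5000)
print(cross_val_score(base, X_train, y_train, cv=5).mean())

# Grid search over the regularization strength C (grid chosen only for illustration).
grid = GridSearchCV(LogisticRegression(max_iter=5000),
                    param_grid={"C": [0.01, 0.1, 1, 10]},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```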
- Introduction to the origin and functions of Hadoop
- How to build a Hadoop cluster on Amazon cloud
- The principles and operation of the Hadoop Distributed File System (HDFS)
- HDFS API programming
- The principles and working mechanisms of Map-Reduce
- Hadoop data flow
- Map-Reduce programming (a minimal streaming sketch follows this list)
- Connecting Eclipse to a Hadoop cluster
- Advanced Hadoop applications
- Installation and applications of Pig
- Architecture and installation of Hive
- Applications of HiveQL
- Data Mining with Mahout
- Architecture of HBase and Zookeeper
- Installation and management of HBase
- The data model of HBase
- Review of Hadoop basics
- Summary of Hadoop applications
- Analysis of high-volume website log systems
- Retrieving KPI data (using Map-Reduce)
- LBS applications for telecommunication companies
- Analysis of users' mobile phone traces (using Map-Reduce)
- User analysis for telecommunication companies
- Labeling duplicate users by call fingerprints (using Map-Reduce)
- Recommendation systems for E-commerce companies (using Map-Reduce)
- More complex recommendation system applications (using Mahout)
- Social networks
- Distance between users
- Community detection (using Pig)
- Importance of nodes in a social network (using Map-Reduce)
- Application of clustering algorithms
- Analysis of VIP (using Map-Reduce, Mahout)
- Financial data analysis
- Retrieving reverse repurchase information from historical data (using Hive)
- Setting stock strategies with data analysis (using Map-Reduce, Hive)
- GPS applications
- Sign-in data analysis (using Pig)
- Implementation and optimization of sorting (using Map-Reduce)
- Middleware development
- Cooperation between multiple Hadoop clusters
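Map-Reduce programming in this course is typically written against the Java API; as a language-neutral illustration of the same data flow, here is a minimal Hadoop Streaming style mapper and reducer in Python that count page hits per URL in a web-server log. The assumed log format (request path in the seventh whitespace-separated field) and the script names are placeholders for the sketch.

```python
# mapper.py -- emit "<url>\t1" for every request line on stdin
# (assumes an Apache-style access log where the request path is field 7).
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) > 6:
        print(f"{fields[6]}\t1")
```

```python
# reducer.py -- sum the counts for each URL; Hadoop Streaming delivers
# mapper output sorted by key, so equal keys arrive consecutively.
import sys

current_url, count = None, 0
for line in sys.stdin:
    url, value = line.rstrip("\n").split("\t")
    if url == current_url:
        count += int(value)
    else:
        if current_url is not None:
            print(f"{current_url}\t{count}")
        current_url, count = url, int(value)
if current_url is not None:
    print(f"{current_url}\t{count}")
```

Locally the pair can be smoke-tested without a cluster via `cat access.log | python mapper.py | sort | python reducer.py`; on a cluster the same scripts are submitted through the Hadoop Streaming jar.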
Alan Perlis once wrote, "I think that it's extraordinarily important that we in computer science keep fun in computing." Learning about the Raspberry Pi is exactly that: it's supposed to be FUN! It's also an inexpensive ticket to discovering more about hardware hacking, operating systems, and programming languages. Need a small web server? Done. Want to build a small amateur weather station? Done. Want to watch your home DVD collection? Done, all from the same device. We'll cover the basics of this credit-card-sized computer and explore fun applications in software and hardware. This series focuses on the RPi Model B+, with kits provided for students. The first few classes cover setup and installation, later classes cover installing new programs and packages, and finally students will interface the RPi with hobby electronics and sensors. You will create your own data-collecting machine and be able to leverage your new data science skills to make sense of it. Raspberry Pi is a trademark of the Raspberry Pi Foundation. This class is not officially endorsed by the Raspberry Pi Foundation.
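As a taste of the "data-collecting machine" idea above, here is a minimal Python sketch that logs the Pi's CPU temperature to a CSV file once a minute; the sysfs path is the usual one on Raspberry Pi OS, and the output file name and one-minute interval are arbitrary choices for the sketch.

```python
import csv
import time
from datetime import datetime

# Standard sysfs location for the SoC temperature on Raspberry Pi OS
# (an assumption of this sketch; adjust if your image differs).
SENSOR = "/sys/class/thermal/thermal_zone0/temp"
LOGFILE = "cpu_temp_log.csv"   # hypothetical output file

def read_temp_c():
    """Return the CPU temperature in degrees Celsius (value is in millidegrees)."""
    with open(SENSOR) as f:
        return int(f.read().strip()) / 1000.0

with open(LOGFILE, "a", newline="") as out:
    writer = csv.writer(out)
    while True:
        writer.writerow([datetime.now().isoformat(), read_temp_c()])
        out.flush()
        time.sleep(60)   # one sample per minute
```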
2-Week Student Project guided by the instructor and TAs
Network and promote yourself to our many hiring partners in New York City on our digital hiring platform. Leverage a network of mentors, alumni, and partner companies. If a firm is interested in your projects, an interview will be scheduled through our platform.
We will focus on practice interviews, professional resume feedback, and presentation coaching.