Sadaf Azad sfaz

This gist contains out.tex, a TeX file that adds a PDF outline ("bookmarks") to the freely available PDF of the book

An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

http://www-bcf.usc.edu/~gareth/ISL/index.html

The bookmarks let you navigate the contents of the book while reading it on a screen.

Linear Regression
[Linear Regression using Python - Towards Data Science](https://towardsdatascience.com/linear-regression-us

Basic machine learning prerequisites:

Python http://docs.python-guide.org/en/latest/intro/learning/
Jupyter http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/
SciPy http://www.scipy-lectures.org/
Pandas https://pandas.pydata.org/pandas-docs/stable/10min.html
Scikit-Learn http://scikit-learn.org/stable/user_guide.html
Seaborn charts https://seaborn.pydata.org/tutorial.html
Brief intro to neural nets https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
Spark MLlib https://spark.apache.org/mllib/
Mathematical Foundations of Machine Learning https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA

Scikit Learn

Supports NumPy arrays, SciPy sparse matrices, and pandas DataFrames as input.
Estimator - learns from data: can be a classification, regression, or clustering algorithm, or a transformer that extracts/filters useful features from raw data - implements set_params, fit(X, y), predict(T), score (judges the quality of the fit / prediction), and, where available, predict_proba (confidence level).
Transformer - implements transform (e.g. reduce dimensionality) / inverse_transform - used to clean (sklearn.preprocessing), reduce (unsupervised dimensionality reduction, e.g. sklearn.decomposition), expand (sklearn.kernel_approximation), or generate (sklearn.feature_extraction) feature representations.
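A minimal sketch of the estimator and transformer methods listed above. The toy data, the choice of StandardScaler and LogisticRegression, and the value C=10.0 are assumptions picked for illustration, not something these notes prescribe.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two well-separated groups.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [3.0, 3.1], [3.2, 2.9], [2.9, 3.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Transformer: fit learns per-column mean and scale, transform applies them.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_back = scaler.inverse_transform(X_scaled)  # maps back to the original units

# Estimator: set_params configures it, fit learns from (X, y).
clf = LogisticRegression()
clf.set_params(C=10.0)
clf.fit(X_scaled, y)

# New samples T must go through the same fitted transformer first.
T = scaler.transform(np.array([[0.1, 0.1], [3.0, 3.0]]))
pred = clf.predict(T)                # predicted class labels
proba = clf.predict_proba(T)         # one row per sample, one column per class
accuracy = clf.score(X_scaled, y)    # mean accuracy for a classifier

# The same estimator also accepts a SciPy sparse matrix as input.
pred_sparse = clf.predict(sparse.csr_matrix(T))
```

Note that score means different things for different estimators: mean accuracy for classifiers, R^2 for regressors.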

sklearn.cluster

Properties: labels_, cluster_centers_ (where the algorithm has centers). Distance metrics - the aim is to maximize the distance between samples in different classes and minimize it within each class: Euclidean distance (l2); Manhattan distance (l1), good for sparse features; cosine distance, invariant to global scalings; or any precomputed affinity matrix.
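A small sketch of the two fitted properties and the three metrics named above. KMeans is used here only as an example of a clusterer that exposes both labels_ and cluster_centers_; the points and the vectors a and b are made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

# Two obvious groups of points (illustrative data).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_             # cluster index assigned to each sample
centers = km.cluster_centers_   # one row per cluster

# Compare the metrics before and after scaling both vectors by 10.
a = np.array([[1.0, 2.0]])
b = np.array([[2.0, 1.0]])
dists = {}
for metric in ("euclidean", "manhattan", "cosine"):
    d = pairwise_distances(a, b, metric=metric)[0, 0]
    d_scaled = pairwise_distances(10 * a, 10 * b, metric=metric)[0, 0]
    dists[metric] = (d, d_scaled)
    print(metric, d, d_scaled)
```

Only the cosine distance stays the same after scaling; the Euclidean and Manhattan distances grow by the same factor of 10.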

DBSCAN - deterministically separate areas of high density from
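A rough sketch of DBSCAN on made-up points, assuming eps=0.5 and min_samples=3 (both values chosen only to suit this toy data). Points in dense regions get a cluster label; points that belong to no dense region are labelled -1 (noise).

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [20.0, 20.0]])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_   # cluster index per sample, -1 for noise

# Same data in the same order gives the same labels on a second run.
labels_again = DBSCAN(eps=0.5, min_samples=3).fit(X).labels_
print(labels)
```

Unlike KMeans, the number of clusters is not passed in; it falls out of eps and min_samples.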

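	def normalize(df):
	    """
	    Min-max scale a pandas DataFrame.

	    @param df: a pandas DataFrame with numeric columns
	    Returns: a normalized DataFrame, along with a dict containing
	    the rescaling coefficients (for use by a separate rescaling function).
	    """
	    result = df.copy()
	    coef = {}
	    # Assumed completion: the original snippet stops after the copy.
	    # Scale each column to [0, 1]; a constant column would divide by zero.
	    for col in df.columns:
	        lo, hi = df[col].min(), df[col].max()
	        coef[col] = (lo, hi)
	        result[col] = (df[col] - lo) / (hi - lo)
	    return result, coef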

	# List unique values in a DataFrame column
	# h/t @makmanalp for the updated syntax!
	df['Column Name'].unique()

	# Convert Series datatype to numeric (will error if column has non-numeric values)
	# h/t @makmanalp
	pd.to_numeric(df['Column Name'])

	# Convert Series datatype to numeric, changing non-numeric values to NaN
	# h/t @makmanalp for the updated syntax!
	pd.to_numeric(df['Column Name'], errors='coerce')

	#!/usr/bin/python

	#
	# LICENSE: MIT
	#
	# Copyright (C) 2014 Samuel Stauffer
	#
	# Permission is hereby granted, free of charge, to any person obtaining a copy
	# of this software and associated documentation files (the "Software"), to
	# deal in the Software without restriction, including without limitation