Massimo Albani Mashimo

Software product manager. Book author. Passionate about everything data related. Curious.

Officina Mutante

Mashimo / Random Forest

Last active April 29, 2018 22:38

Random forest

	A single decision tree tasked with learning a dataset might not perform well because of outliers and the breadth and depth of the data's complexity. So instead of relying on a single tree, random forests rely on a forest of cleverly grown decision trees. Each tree within the forest is allowed to become highly specialized in a specific area, but still retains some general knowledge about most areas. When a random forest classifies a sample, it is actually each tree in the forest working together, casting votes on which label the sample should be assigned.
	Instead of sharing the entire dataset with each decision tree, the forest performs an operation that is essentially a train / test split of the training data. Each decision tree in the forest randomly samples from the overall training data set. As a result, each tree exists in an independent subspace and the variation between trees is controlled. This technique is known as tree bagging, or bootstrap aggregating.
	In addition to the tree bagg
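
The bagging and voting described above can be sketched in plain Python. This is a minimal illustration, not the gist's own code: `bootstrap_sample`, `train_stump` and `forest_predict` are hypothetical names, the "tree" is reduced to a one-feature threshold rule for two classes, and the toy data is invented.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement (the 'bagging' step)."""
    return [rng.choice(rows) for _ in rows]

def train_stump(rows):
    """Hypothetical stand-in for a tree: a threshold rule on one feature.

    Each row is (feature_value, label) and exactly two labels are assumed.
    The threshold is the midpoint between the two per-class feature means.
    """
    by_label = {}
    for x, label in rows:
        by_label.setdefault(label, []).append(x)
    if len(by_label) < 2:
        # The bootstrap sample held a single class, so always predict it.
        only = next(iter(by_label))
        return lambda x: only
    means = {label: sum(xs) / len(xs) for label, xs in by_label.items()}
    low, high = sorted(means, key=means.get)
    threshold = (means[low] + means[high]) / 2
    return lambda x: low if x < threshold else high

def forest_predict(stumps, x):
    """Every stump casts one vote; the most common label wins."""
    votes = Counter(stump(x) for stump in stumps)
    return votes.most_common(1)[0][0]

# Invented toy data: one numeric feature and a two-class label.
data = [(1.0, "small"), (1.5, "small"), (2.0, "small"), (2.2, "small"),
        (7.8, "large"), (8.0, "large"), (8.5, "large"), (9.0, "large")]

rng = random.Random(0)  # fixed seed so the run is repeatable
forest = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(forest_predict(forest, 1.2), forest_predict(forest, 8.8))
```

Each stump sees a different resampled view of the data, so a stump trained on an unlucky sample is outvoted by the others. A real random forest grows full decision trees in place of the stumps.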

Mashimo / Decision Tree

Last active April 29, 2018 22:39

Decision Tree

	Decision trees are supervised, probabilistic machine learning classifiers that are often used as decision support tools. Like any other classifier, they can predict the label of a sample, and they do so by examining the probabilistic outcomes of your samples' features.
	Decision trees are one of the oldest and most used machine learning algorithms, perhaps even pre-dating machine learning. They're very popular and have been around for decades. Following through with sequential cause-and-effect decisions comes very naturally.
	Decision trees are a good tool to use when you want backing evidence to support a decision.
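
To illustrate how a tree might pick one of its sequential decisions, here is a minimal sketch that scores candidate thresholds on a single feature. It assumes Gini impurity as the split criterion, which the text above does not specify; `gini`, `best_split` and the toy measurements are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= t' rule."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped: splitting there leaves the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Invented toy data: one measurement per sample and its class label.
lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor"]

threshold, impurity = best_split(lengths, species)
print(threshold, impurity)
```

The chosen rule ("is the length at most the threshold?") is also the backing evidence for the prediction: every decision on the path to a leaf can be read back as a plain comparison.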

Mashimo / Support Vector Machines

Last active April 29, 2018 22:38

SVM and SVC

Support vector machines are a set of supervised learning algorithms that you can use for classification, regression and outlier detection. Scikit-learn has several SVM classes, depending on your purpose. The one we'll be focusing on is the Support Vector Classifier, SVC.

Mashimo / Regression

Last active April 29, 2018 22:37

Regression

Examples of regression models for prediction

Mashimo / Classification K-nearest neighbours

Last active April 29, 2018 22:37

Supervised classification

	Classification assigns a sample the label of the labelled samples it is most similar to.
	Supervised: data samples have labels associated.
	Use the K-nearest neighbours algorithm.
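
As a rough sketch of the K-nearest idea, the snippet below labels a new point by majority vote among its closest labelled samples. It is plain Python for illustration only: `knn_predict`, the Euclidean distance choice, and the toy points are assumptions, not taken from the gist.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Invented labelled samples forming two well-separated groups.
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]

print(knn_predict(train, (2, 2)), knn_predict(train, (7, 7)))
```

An odd `k` avoids tied votes when there are two classes.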

Mashimo / Clustering unsupervised

Last active August 5, 2022 02:45

Clustering data

	Clustering groups samples that are similar within the same cluster.
	Unsupervised: no label provided in the data samples.
	Use the K-means algorithm.
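
A minimal sketch of K-means on unlabelled points, in plain Python. The function name `kmeans`, the fixed starting centroids and the toy points are assumptions made for illustration; library implementations usually choose the starting centroids for you.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids.

    Returns (centroids, clusters), where clusters[i] holds the points
    assigned to centroid i in the last assignment step.
    """
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Invented unlabelled samples and two hand-picked starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (9, 8)])
print(centroids)
print(clusters)
```

No labels are used anywhere: the groups emerge only from the distances between samples.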

Mashimo / Isomap

Last active November 16, 2020 10:54

Data dimensionality reduction via isomap

	Isomap is a nonlinear dimensionality reduction method.
	The algorithm provides a simple method for estimating the intrinsic geometry of a data manifold based on a rough estimate of each data point’s neighbours

Mashimo / PCA

Last active July 21, 2020 16:57

PCA - Principal Component Analysis

Principal Component Analysis

Mashimo / data visualisation

Last active April 29, 2018 22:36

wheat seeds data visualisation

Check the data-visualisation-README file below.

Mashimo / readNHL.py

Last active April 29, 2018 22:39

Read NHL Historic Player Points Statistics

	import pandas as pd

	# Load up the table for the years 2014-2015, and extract the dataset out of it.
	#
	url = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2"
	table_df = pd.read_html(url, header=1)[0]

	# Columns get automatic names. Rename the columns so that they are similar to the
	# column definitions on the website.
	#