
Last active August 29, 2015 14:16
scipy scikit-learn tutorial draft

Tutorial Topic

This tutorial aims to provide an introduction to machine learning and scikit-learn "from the ground up". We will start with basic concepts of machine learning and implement them using scikit-learn. Going in detail through the characteristics of several methods, we will discuss how to pick an algorithm for your application, how to set its parameters, and how to evaluate performance.

Please provide a more detailed abstract of your tutorial (again, see last year's tutorials).

Machine learning is the task of extracting knowledge from data, often with the goal of generalizing to new, unseen data. Applications of machine learning now touch nearly every aspect of everyday life, from the face detection in our phones and the streams of social media we consume to picking restaurants, partners, and movies. It has also become indispensable to many empirical sciences, from physics, astronomy and biology to social sciences.

Scikit-learn has emerged as one of the most popular toolkits for machine learning, and is now widely used in industry and academia. The goal of the tutorial is to enable participants to use the wide variety of machine learning algorithms available in scikit-learn on their own data sets, for their own domains.

This tutorial will comprise an introductory morning session and an advanced afternoon session. The morning part of the tutorial will cover basic concepts of machine learning, data representation and preprocessing. We will explain different problem settings, and which algorithms to use in each. We will then go through some simple sample applications using algorithms implemented in scikit-learn, including SVMs, Random Forests, K-Means, PCA, t-SNE and others.

In the afternoon session, we will discuss setting hyper-parameters, and how to prevent overfitting. We will go in-depth into the trade-off of model complexity and dataset size. We will also discuss complexity of learning algorithms and how to cope with very large datasets. We will also go through the process of building machine learning pipelines, consisting of feature extraction, preprocessing and supervised learning.

I see a few small tweaks - let me try a PR. Overall it looks pretty good though!

Bah, no PRs from gists :). Here is my edit:

Tutorial Topic

This tutorial aims to provide an introduction to machine learning and scikit-learn "from the ground up". We will start with core concepts of machine learning, some example uses of machine learning, and how to implement them using scikit-learn. Going in detail through the characteristics of several methods, we will discuss how to pick an algorithm for your application, how to set its parameters, and how to evaluate performance.

Please provide a more detailed abstract of your tutorial (again, see last year's tutorials).

Machine learning is the task of extracting knowledge from data, often with the
goal of generalizing to new, unseen data. Applications of machine learning now
touch nearly every aspect of everyday life, from the face detection in our
phones and the streams of social media we consume to picking restaurants,
partners, and movies. It has also become indispensable to many empirical
sciences, including physics, astronomy, biology, and the social sciences.

Scikit-learn has emerged as one of the most popular toolkits for machine learning,
and is now widely used in industry and academia.
The goal of this tutorial is to enable participants to use the wide variety of
machine learning algorithms available in scikit-learn on their own data sets,
for their own domains.

This tutorial will comprise an introductory morning session and an advanced
afternoon session. The morning part of the tutorial will cover basic concepts
of machine learning, data representation and preprocessing. We will explain
different problem settings and which algorithms to use in each. We will then
go through some sample applications using algorithms implemented in
scikit-learn, including SVMs, Random Forests, K-Means, PCA, t-SNE and others.

In the afternoon session, we will discuss setting hyper-parameters and how to
prevent overfitting. We will go in depth into the trade-off between model complexity
and dataset size, and discuss the complexity of learning algorithms and
how to cope with very large datasets. The session will conclude by stepping
through the process of building machine learning pipelines consisting of
feature extraction, preprocessing and supervised learning.
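
To make the afternoon topics concrete, here is a minimal sketch of the kind of pipeline the abstract describes: preprocessing, a feature-extraction step, and a supervised learner, with hyper-parameters chosen by cross-validation and performance checked on held-out data. It is not taken from the tutorial material. The iris dataset, the use of PCA as a stand-in for feature extraction, the step names, and the parameter grid are all assumptions made for illustration, and the imports assume a recent scikit-learn release where `GridSearchCV` lives in `sklearn.model_selection`.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in dataset, chosen only for illustration.
X, y = load_iris(return_X_y=True)

# Hold out a test set so the final score is not measured on data
# that was used for fitting or for choosing hyper-parameters.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing -> feature extraction (PCA as a stand-in) -> classifier.
# The step names are arbitrary labels picked for this sketch.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=2)),
    ("classify", SVC(kernel="rbf")),
])

# Hypothetical grid; "<step name>__<parameter>" addresses a pipeline step.
param_grid = {
    "classify__C": [0.1, 1, 10],
    "classify__gamma": [0.01, 0.1, 1],
}

# 5-fold cross-validation on the training split selects the parameters;
# the whole pipeline is refit inside each fold, so the scaler and PCA
# never see the validation fold during fitting.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("held-out accuracy: %.2f" % test_accuracy)
```

Searching over the pipeline as a whole, rather than over the classifier alone, is one way to keep the preprocessing steps from leaking information from the validation folds, which ties into the overfitting discussion above.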
