@ricklentz
Created July 5, 2017 18:32
Higher leverage activities instead of focusing on grunt work
Best minds in the world focus on the issue (IP development); access to the top 3-5 winning solutions
Go on to hire them or continue in a consulting engagement
Branding, recruiting
Leverage distribution
XGBoost library - part of 50% of winning solutions
Leaderboard forces the question: why are the submissions above mine better?
Anyone can sign up - no correlation between winners and domain expert knowledge (shared passion is for data)
Deep learning extended beyond computer vision problems
Competitive until the competition ends; open source solutions at the end, outgoing video describing the solution
Incredibly useful for beginning students: learn from others, collaboration tooling - Kaggle scripts (R, Python, Julia)
Core skills:
Data programming languages: R, Python
Interactively exploring data and its structure (ggplot2)
Rapid iteration and experimentation
Iterative loop is performed as fast as possible
Python - scientific to production code (last 5 years), Keras library (machine vision), XGBoost (rank ads, predict satisfaction)
R - machine learning - great exploration, a bit harder to take to prod
Understand the problem (feature preprocessing, competitive edge, understanding the distribution of the training datasets); cross-validation
Creatively thinking about the domain (effort and creativity)
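
Cross-validation, mentioned in the core skills above, can be sketched as follows. This is a minimal illustration using only the Python standard library, not anything shown in the talk: the helper names (`k_fold_splits`, `cv_score`), the toy data, and the predict-the-mean model are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_idx, valid_idx) pairs; each row is used for validation exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        valid = folds[i]
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train, valid


def cv_score(xs, ys, fit, predict, k=5):
    """Average held-out mean squared error across the k folds."""
    fold_errors = []
    for train, valid in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in valid]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return mean(ys)


def predict_mean(model, x):
    return model


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
score = cv_score(xs, ys, fit_mean, predict_mean, k=5)
print(f"5-fold CV mean squared error: {score:.2f}")
```

Because the score is computed only on rows the model did not see during fitting, it gives a local estimate of generalization that does not depend on leaderboard submissions.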
Issues:
Overfitting to the public leaderboard: learning from byproducts (e.g. submitting multiple times and tuning to the public score)
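
The public leaderboard issue above can be illustrated with a small simulation. This is a sketch under stated assumptions, not Kaggle's actual scoring setup: the split sizes, the number of submissions, and the use of purely random guesses are all made up for the example. It picks the best of many skill-free submissions by public score and then checks the same submission on the hidden private split.

```python
import random

rng = random.Random(42)

# Assumed sizes for illustration only.
N_PUBLIC, N_PRIVATE, N_SUBMISSIONS = 100, 1000, 200

# Random binary ground truth; the first N_PUBLIC rows play the public split.
truth = [rng.randint(0, 1) for _ in range(N_PUBLIC + N_PRIVATE)]


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


best_public, private_of_best = -1.0, None
for _ in range(N_SUBMISSIONS):
    # Each "submission" is a coin flip per row, so it has no real skill.
    preds = [rng.randint(0, 1) for _ in truth]
    public = accuracy(preds[:N_PUBLIC], truth[:N_PUBLIC])
    private = accuracy(preds[N_PUBLIC:], truth[N_PUBLIC:])
    if public > best_public:
        best_public, private_of_best = public, private

print(f"best public accuracy: {best_public:.3f}")
print(f"same submission on private split: {private_of_best:.3f}")
```

Selecting on the public score alone rewards luck on that small split, so the chosen submission looks better than chance publicly while its private score stays near 0.5.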
Career:
DeepMind hired 4 Kaggle winners; Kaggle runs its own job board (look at top 200-300 in competition pool); swap to more interesting work; executives in large companies use profiles to find standouts within
Showcase the best of your abilities: collaborations showcase clean and well-architected code, give helpful advice, share insights about the data
100s of competitions; winners from outside the USA, not machine learning Ph.D. students; some spent 30 years in another field and became passionate about ML
Future:
Field still in infancy - over a 10 year time horizon this will change; it will be easy to create and use the technology
Help the world learn from data, integrate new data sources