@influentcoder
Last active June 29, 2018 16:34
ML Stuff

Data

Data are observations of real-world phenomena. E.g. stock market data might include observations of daily stock prices, earnings announcements by individual companies, and opinion articles from pundits.

Tasks

Data can help us answer questions. E.g. which stocks should I invest in? Tasks are how we get from the data to the answers.

Models

Raw data are frequently wrong, redundant, or missing. A mathematical model of data describes the relationships between the different aspects of the data. E.g. a model that predicts stock prices might be a formula that maps a company's earnings history, past stock prices, and industry to the predicted stock price. Another example is a spam detection algorithm, which maps an email to a spam or not-spam label.
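
To make the "model as a formula" idea concrete, here is a minimal sketch of a toy stock-price model in Python. It is not a real forecasting method: the function name `predict_price`, the linear form, the `industry_offset` input, and the weights are all assumptions made up for illustration. In practice the weights would be learned from data rather than written by hand.

```python
from statistics import mean


def predict_price(past_prices, earnings_history, industry_offset, weights):
    """Toy linear model (hypothetical): maps inputs to a predicted price.

    The prediction is a weighted sum of the average past price and the
    average past earnings, plus a fixed offset standing in for the industry.
    """
    w_price, w_earnings = weights
    return (
        w_price * mean(past_prices)
        + w_earnings * mean(earnings_history)
        + industry_offset
    )


# Illustrative inputs only; none of these numbers come from real data.
prediction = predict_price(
    past_prices=[10.0, 12.0],
    earnings_history=[1.0, 3.0],
    industry_offset=0.5,
    weights=(1.0, 2.0),
)
print(prediction)
```

The point is only that a model is a mapping from inputs to an output. The inputs it consumes (average price, average earnings) are already features in the sense of the next section.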

Features

A feature is a numeric representation of raw data: an individual measurable property or characteristic of the phenomenon being observed. Feature engineering is the process of formulating the most appropriate features given the data, the model, and the task. E.g. in spam detection algorithms, features may include the presence or absence of certain email headers, the email structure, the language, the frequency of specific terms, and the grammatical correctness of the text.
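
As a rough sketch of what this looks like in code, the Python function below turns a raw email string into a few numeric features of the kinds mentioned above: presence of a header, and frequency of specific terms. The function name `extract_features`, the choice of the `Reply-To` header, and the term list `SPAM_TERMS` are assumptions chosen for illustration, not a recommended feature set.

```python
import re

# Hypothetical term list, picked only for this example.
SPAM_TERMS = ("free", "winner", "urgent")


def extract_features(raw_email: str) -> dict:
    """Map a raw email (headers, blank line, body) to numeric features."""
    # Headers and body are separated by the first blank line.
    header_block, _, body = raw_email.partition("\n\n")
    header_names = {
        line.split(":", 1)[0].strip().lower()
        for line in header_block.splitlines()
        if ":" in line
    }

    # Lowercased word tokens from the body.
    words = re.findall(r"[a-z']+", body.lower())
    total = len(words) or 1  # avoid division by zero on an empty body

    features = {
        # Presence (1.0) or absence (0.0) of a specific header.
        "has_reply_to": 1.0 if "reply-to" in header_names else 0.0,
        # A crude structural feature: body length in words.
        "num_words": float(len(words)),
    }
    # Relative frequency of each term of interest.
    for term in SPAM_TERMS:
        features[f"freq_{term}"] = words.count(term) / total
    return features


example = (
    "From: a@example.com\n"
    "Subject: Hello\n"
    "\n"
    "You are a winner! Claim your free prize, free today."
)
features = extract_features(example)
print(features)
```

Each value in the returned dictionary is a number, so a model can consume it directly. Deciding which headers and terms to include is the feature engineering part.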

The place of feature engineering in the machine learning workflow
