@brianspiering
Last active November 17, 2020 16:19

Data Science Workflow

Step 1 - Ask

Based on domain experience, ask specific and meaningful questions.

Define all terms precisely within the context.

Step 2 - Acquire

Is the organization even collecting the right data?

Do I even have access to the relevant data for the problem?

Can I get the stored data to where the modeling takes place?

Step 3 - Process

Get all the data into a single data structure (typically a data frame).

Understand, organize, and munge.

Identify features and targets.
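As a rough illustration of this step, the sketch below merges two small tables into one data frame, does a little munging, and separates features from the target. It assumes pandas is available; the tables, the column names (`customer_id`, `region`, `amount`, `returned`), and the choice of `returned` as the target are all made up for the example.

```python
import pandas as pd

# Hypothetical source tables; in practice these would come from files or a database.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": [20.0, 35.0, 15.0, 50.0],
    "returned": [0, 1, 0, 0],
})

# Get all the data into a single data frame.
df = orders.merge(customers, on="customer_id", how="left")

# Munge: turn the categorical column into indicator columns.
df = pd.get_dummies(df, columns=["region"])

# Identify features (X) and target (y). The identifier column is dropped
# because it carries no predictive meaning in this made-up example.
target_column = "returned"
X = df.drop(columns=[target_column, "customer_id"])
y = df[target_column]
```

Which columns count as features and which as the target depends on the question asked in Step 1; the split above is only one possible choice.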

Step 4 - Model

Apply statistics, machine learning, and deep learning algorithms.
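A minimal sketch of what "apply an algorithm" can look like, using only the standard library: fit a straight line by ordinary least squares on part of the data and check the error on the held-out rest. The data values, the 4/2 split, and the helper names `fit_line` and `mean_squared_error` are assumptions for illustration; a real project would more likely use a modeling library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and observed values."""
    slope, intercept = model
    return mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


# Made-up feature and target values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# Fit on the first four points, evaluate on the last two.
train_x, test_x = xs[:4], xs[4:]
train_y, test_y = ys[:4], ys[4:]

model = fit_line(train_x, train_y)
holdout_mse = mean_squared_error(model, test_x, test_y)
```

Evaluating on data the model has not seen is what makes the error estimate meaningful; the same pattern carries over to more complex models.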

Step 5 - Deliver

The goal is to add value. Is the value in a report, presentation, or code?

How will the models reach users? Will they go into production? Will the code be handed off to a software development team? Will the model be exported as PMML or exposed through an API?

How does the system update when the data changes?
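If the delivery route is an API, the core is usually a small function that takes a request, runs the model, and returns a response. The sketch below shows that shape with JSON in and JSON out. The `MODEL` parameters, the `"v1"` version label, the input field `x`, and the function name `handle_predict` are all hypothetical; a web framework would wrap this function in an HTTP endpoint.

```python
import json

# Hypothetical fitted parameters; in practice these would be loaded from
# wherever the trained model is stored.
MODEL = {"version": "v1", "slope": 1.94, "intercept": 0.15}


def handle_predict(request_body: str) -> str:
    """Parse a JSON request, apply the model, and return a JSON response."""
    payload = json.loads(request_body)
    x = float(payload["x"])
    prediction = MODEL["slope"] * x + MODEL["intercept"]
    # Returning the model version lets consumers tell which model produced
    # the prediction after a retrain on new data.
    return json.dumps({"model_version": MODEL["version"], "prediction": prediction})


response = handle_predict('{"x": 2}')
```

Tagging each response with a version is one simple way to keep track of model updates when the underlying data changes.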
