Brian Spiering brianspiering

  • San Francisco, CA, USA
@brianspiering
brianspiering / setup_env_py_3.8_tf.md
Last active August 17, 2021 17:30 — forked from Gclabbe/readme_conda_py3.8_tf.md
Setup for 4thBrain Anaconda Tensorflow CPU / GPU

How to set up Python 3.8 with TensorFlow

  • Create conda environment:
$ conda create --name 4thBrain python=3.8
$ conda activate 4thBrain    
  • pandas (will install numpy)
@brianspiering
brianspiering / pep8_example.py
Created June 1, 2021 20:02
An example of PEP 8 -- Style Guide for Python Code
#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""This module's docstring summary line.
This is a multi-line docstring. Paragraphs are separated with blank lines.
Lines conform to 79-column limit.
Module and packages names should be short, lower_case_with_underscores.
Notice that this is not PEP8-cheatsheet.py
Seriously, use flake8. Atom.io with https://atom.io/packages/linter-flake8
is awesome!
See http://www.python.org/dev/peps/pep-0008/ for more PEP-8 details

Notes for the interviewer:

This problem can go in many directions. Allow 5 minutes to brainstorm ideas, but then focus the person on taking action. The goal is for them to run many queries to find out what is happening. There should be a direct connection to business metrics. They should briefly list issues but not get mired in implementation details.

Positive Examples:

  • (Easier to solve) Are customers inappropriately sharing login information?
    • Frame it as Unsupervised - Clustering or anomaly detection
    • Solution - Turn on two-factor authentication or other features.
  • (Harder to solve) Are the customers reposting videos on other websites?
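
The anomaly-detection framing for shared logins could be sketched as follows. This is a minimal illustration, not a prescribed solution: the account IDs, the device counts, the `flag_outliers` helper, and the 3.5 threshold are all assumptions made up for the example. It flags accounts whose number of distinct devices is far from the typical account, using a modified z-score based on the median absolute deviation.

```python
from statistics import median

# Hypothetical data: number of distinct devices seen per account in one month.
devices_per_account = {
    "acct_001": 2,
    "acct_002": 3,
    "acct_003": 1,
    "acct_004": 2,
    "acct_005": 41,  # suspiciously many devices for a single login
    "acct_006": 3,
    "acct_007": 2,
}


def flag_outliers(counts, threshold=3.5):
    """Return the accounts whose modified z-score exceeds the threshold.

    Uses the median absolute deviation (MAD), which is less distorted by the
    outliers themselves than a mean and standard deviation would be.
    """
    values = list(counts.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be flagged this way
    return [
        account
        for account, value in counts.items()
        if 0.6745 * abs(value - center) / mad > threshold
    ]


suspicious_accounts = flag_outliers(devices_per_account)
print(suspicious_accounts)
```

With only ~2,000 customers, a list like this is short enough to review by hand before taking any action against an account.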

Notes for the interviewer:

This problem can go in many directions. Allow 5 minutes to brainstorm ideas, but then focus the person on architecting an end-to-end system that uses a single algorithm. They should describe the specific inputs and outputs, focusing on what the model will predict, and connect it to business metrics. They should briefly list issues but not get mired in implementation details.

Positive Examples:

  • Frame it as a density estimation problem. What is the distribution of likely delivery times, conditional on features?
  • Frame it as a regression problem of accurately estimating arrival time.
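
To make the density-estimation framing concrete, here is a small sketch that estimates delivery-time percentiles conditional on a single feature. Everything in it is hypothetical: the `past_deliveries` records, the "near"/"far" distance buckets, and the `delivery_time_quantiles` helper. A real system would condition on many more features and would likely use a learned model rather than grouped empirical quantiles.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical past deliveries: (distance bucket, minutes from order to door).
past_deliveries = [
    ("near", 18), ("near", 22), ("near", 25), ("near", 20), ("near", 30),
    ("far", 35), ("far", 42), ("far", 55), ("far", 38), ("far", 47),
]


def delivery_time_quantiles(deliveries):
    """Estimate the 10th, 50th, and 90th percentile of delivery time per bucket."""
    minutes_by_bucket = defaultdict(list)
    for bucket, minutes in deliveries:
        minutes_by_bucket[bucket].append(minutes)

    summary = {}
    for bucket, minutes in minutes_by_bucket.items():
        # n=10 returns the 9 cut points between deciles.
        deciles = quantiles(minutes, n=10, method="inclusive")
        summary[bucket] = {"p10": deciles[0], "p50": deciles[4], "p90": deciles[8]}
    return summary


etd = delivery_time_quantiles(past_deliveries)
for bucket, percentiles in etd.items():
    print(bucket, percentiles)
```

Quoting the customer a high percentile (for example p90) instead of the median trades a later promised time for fewer late deliveries, which ties the model output to the refund cost in the problem statement.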

Problem statement for the interviewee:

You have just been hired as the first data scientist and are tasked with finding fraud. The company sells a boutique video subscription service. The company currently has ~2,000 customers and each customer's Annual Recurring Revenue (ARR) is ~$10,000.

How would you approach the problem from a data science perspective?

What are possible types of fraud?
How would you find them empirically?

Problem statement for the interviewee:

You are a lead machine learning engineer (MLE) at a food delivery service, something like Uber Eats or DoorDash.

The #1 problem is that the delivery drivers are not able to deliver the food at the scheduled time. The customers complain, and customer service agents give the complaining customers the food for free plus credit for the next delivery. Currently, the company is losing at least $10 million per year from this specific issue.

Frame the estimated time-of-delivery (ETD) as a machine learning problem that your team can solve.

@brianspiering
brianspiering / instructions.md
Last active April 23, 2021 22:01
Interview instructions

The goal is to write working Python code in CoderPad. Additionally, you must answer the following questions: What is the time and space complexity of your solution? What is the optimal time and space complexity?
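
As an illustration of the kind of complexity answer expected, here is a hypothetical warm-up problem (not the linked Problem #1): decide whether any two numbers in a list sum to a target. The two function names are made up for this sketch. The first solution is the obvious one; the second trades extra memory for speed.

```python
def has_pair_brute_force(numbers, target):
    """Check every pair: O(n^2) time, O(1) extra space."""
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] + numbers[j] == target:
                return True
    return False


def has_pair_with_set(numbers, target):
    """Remember values seen so far: O(n) expected time, O(n) extra space."""
    seen = set()
    for number in numbers:
        if target - number in seen:
            return True
        seen.add(number)
    return False


print(has_pair_brute_force([3, 8, 1, 5], 9))
print(has_pair_with_set([3, 8, 1, 5], 9))
```

A good answer names both the complexity of the submitted code and the best achievable bound, and explains the trade-off between them.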

Decide who is interviewer and who is interviewee. The interviewee can look at the problem statement. Only the interviewer can look at the solution.

Problem #1 Statement - https://gist.github.com/brianspiering/ee99f83c02321d0eb7bf47c45aec0f31
Problem #1 Solution - https://gist.github.com/brianspiering/dcb8cdcf1c45d23731d684d2e8326f28

Time-box the attempt to 7 minutes. Cover the solution in 3 minutes.

@brianspiering
brianspiering / fetch_data.py
Created March 9, 2021 00:10
Look for local data. If not found, download it.
import pandas as pd
filename = 'loans.csv'
remote_location = 'https://raw.githubusercontent.com/DeltaAnalytics/machine_learning_for_good_data/master/'
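try:
    # Local version
    df = pd.read_csv(filename)
except (FileNotFoundError, pd.errors.ParserError):
    # Grab the remote file and save it
    # (assumed completion; the original snippet is cut off here)
    df = pd.read_csv(remote_location + filename)
    df.to_csv(filename, index=False)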
@brianspiering
brianspiering / filter_rows_from_both_X_y_in_scikit-learn.ipynb
Last active February 23, 2021 03:08
Filter rows from both X y in scikit-learn
@brianspiering
brianspiering / advice_for_breaks_in_MSDS_program.md
Created December 22, 2020 16:09
What should I do during breaks in the USF MSDS program


Here is my advice (make a version that works for you):

Take a break to recharge. You are only partway through the program and this is the longest break. Assuming you are not going to burn out...

Don't take any more classes. You have enough classes through University of San Francisco.

Focus on “active recall” and “applied problem solving”. Active recall means being able to explain a concept when prompted. It is the best way to review previous course materials. Applied problem solving is the ability to recognize concepts in real-world problems and use them to accomplish a task. It is also the best way to recognize gaps in your knowledge so that you can fill them in. The standard for knowing something is the ability to explain it and use it to solve challenges. I find flashcards useful. My flashcards are here.