- Create conda environment:
$ conda create --name 4thBrain python=3.8
$ conda activate 4thBrain
- pandas (will install numpy)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""This module's docstring summary line.

This is a multi-line docstring. Paragraphs are separated with blank lines.
Lines conform to the 79-column limit.

Module and package names should be short, lower_case_with_underscores.
Notice that this is not PEP8-cheatsheet.py

Seriously, use flake8. Atom.io with https://atom.io/packages/linter-flake8
is awesome!

See http://www.python.org/dev/peps/pep-0008/ for more PEP-8 details
"""
Notes for the interviewer:
This problem can go in many directions. Allow 5 minutes to brainstorm ideas, then focus the person on taking action. The goal is for them to run many queries to find out what is happening. There should be a direct connection to business metrics. They should briefly list issues but not get mired in implementation details.
Positive Examples:
Notes for the interviewer:
This problem can go in many directions. Allow 5 minutes to brainstorm ideas, then focus the person on architecting an end-to-end system that uses a single algorithm. They should describe the specific inputs and outputs, focusing on what the model will predict, and connect the prediction to business metrics. They should briefly list issues but not get mired in implementation details.
Positive Examples:
Problem statement for the interviewee:
You have just been hired as the first data scientist and are tasked with finding fraud. The company sells a boutique video subscription service. The company currently has ~2,000 customers and each customer's Annual Recurring Revenue (ARR) is ~$10,000.
How would you approach the problem from a data science perspective?
What are possible types of fraud?
How would you empirically find them?
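One way to make "empirically find them" concrete is a simple query over account activity. The sketch below assumes a hypothetical log of login events and an arbitrary threshold on distinct IP addresses per customer as a rough signal of account sharing. The data, the field names, and the threshold are all illustrative assumptions, not part of the problem statement.

```python
from collections import defaultdict

# Hypothetical login events: (customer_id, ip_address). Illustrative data only.
LOGIN_EVENTS = [
    ("c1", "10.0.0.1"),
    ("c1", "10.0.0.1"),
    ("c2", "10.0.0.2"),
    ("c2", "10.0.0.3"),
    ("c2", "10.0.0.4"),
    ("c2", "10.0.0.5"),
    ("c3", "10.0.0.6"),
    ("c3", "10.0.0.7"),
]


def distinct_ips_per_customer(events):
    """Count the distinct IP addresses seen for each customer."""
    ips = defaultdict(set)
    for customer_id, ip_address in events:
        ips[customer_id].add(ip_address)
    return {customer_id: len(addresses) for customer_id, addresses in ips.items()}


def flag_possible_sharing(events, max_ips=3):
    """Return customers whose distinct-IP count exceeds an assumed threshold."""
    counts = distinct_ips_per_customer(events)
    return sorted(c for c, n in counts.items() if n > max_ips)


print(flag_possible_sharing(LOGIN_EVENTS))
```

A flagged customer is only a lead for further queries, not proof of fraud. With ~2,000 customers, a short list like this can be reviewed by hand and tied back to revenue at risk.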
You are a lead machine learning engineer (MLE) at a food delivery service, something like Uber Eats or DoorDash.
The #1 problem is that delivery drivers are not able to deliver the food at the scheduled time. Customers complain, and customer service agents give the complaining customers the food for free plus credit toward the next delivery. Currently, the company is losing at least $10 million per year from this specific issue.
Frame the estimated time-of-delivery (ETD) as a machine learning problem that your team can solve.
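One possible framing is regression: the input is what is known when the order is placed, and the output is the delivery time in minutes. The sketch below assumes hypothetical order records with two features and compares a mean-time baseline against a one-feature least-squares fit, scored by mean absolute error. The feature names and numbers are illustrative assumptions, and the error is measured in-sample for brevity; a real evaluation would use held-out orders.

```python
from statistics import mean

# Hypothetical past orders: (distance_km, items_in_order, actual_minutes).
ORDERS = [
    (1.0, 2, 20.0),
    (2.0, 1, 25.0),
    (3.0, 4, 35.0),
    (4.0, 3, 40.0),
]


def mean_absolute_error(actual, predicted):
    """Average size of the error, in minutes."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_baseline(orders):
    """Baseline model: always predict the mean delivery time."""
    average = mean(minutes for _, _, minutes in orders)
    return lambda distance_km, items: average


def fit_distance_model(orders):
    """Least squares on one feature: minutes = intercept + slope * distance_km."""
    xs = [distance for distance, _, _ in orders]
    ys = [minutes for _, _, minutes in orders]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda distance_km, items: intercept + slope * distance_km


def evaluate(model, orders):
    """Score a model by mean absolute error over the given orders."""
    actual = [minutes for _, _, minutes in orders]
    predicted = [model(distance, items) for distance, items, _ in orders]
    return mean_absolute_error(actual, predicted)


print("baseline MAE:", evaluate(fit_baseline(ORDERS), ORDERS))
print("distance model MAE:", evaluate(fit_distance_model(ORDERS), ORDERS))
```

Mean absolute error in minutes maps directly onto the business problem: the fewer minutes the estimate is off, the fewer late deliveries trigger refunds and credits.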
The goal is to write working Python code in CoderPad. Additionally, you must answer the following questions: What is the time and space complexity of your solution? What is the optimal time and space complexity?
Decide who is the interviewer and who is the interviewee. The interviewee can look at the problem statement. Only the interviewer can look at the solution.
Problem #1 Statement - https://gist.github.com/brianspiering/ee99f83c02321d0eb7bf47c45aec0f31
Problem #1 Solution - https://gist.github.com/brianspiering/dcb8cdcf1c45d23731d684d2e8326f28
Time-box the attempt to 7 minutes. Cover the solution in 3 minutes.
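The complexity questions are easier to practice with a concrete pair of solutions. The sketch below uses a stand-in problem (does a list contain a duplicate?), which is an assumption chosen for illustration and is not Problem #1 from the linked gists. It contrasts a solution that uses no extra memory with one that trades memory for speed.

```python
def has_duplicate_quadratic(values):
    """O(n^2) time, O(1) extra space: compare every pair of positions."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicate_linear(values):
    """O(n) average time, O(n) extra space: remember what has been seen."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False


print(has_duplicate_quadratic([3, 1, 4, 1, 5]))
print(has_duplicate_linear([3, 1, 4, 1, 5]))
```

A good answer names both costs for the submitted solution and then argues why no solution can do better, for example because every element must be read at least once.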
import pandas as pd

filename = 'loans.csv'
remote_location = 'https://raw.githubusercontent.com/DeltaAnalytics/machine_learning_for_good_data/master/'

try:
    # Local version
    df = pd.read_csv(filename)
except (FileNotFoundError, pd.errors.ParserError):
    # Grab the remote file and save it
    df = pd.read_csv(remote_location + filename)
    df.to_csv(filename, index=False)
Here is my advice (make a version that works for you):
Take a break to recharge. You are only part way through the program and this is the longest break. Assuming you are not going to burn out...
Don't take any more classes. You have enough classes through the University of San Francisco.
Focus on “active recall” and “applied problem solving”. Active recall means being able to explain a concept when prompted. It is the best way to review previous course materials. Applied problem solving is the ability to recognize concepts in real-world problems and use them to accomplish a task. It is also the best way to recognize gaps in your knowledge, which you can then fill in. The standard for knowing something is the ability to explain it and use it to solve challenges. I find flashcards useful. My flashcards are here.