from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Get the most valuable customers, from step 2
df = pd.read_csv('high_value_customers.csv')

# 'Churned' is our target: why did some customers churn and others not?
X, y = df.drop('Churned', axis=1), df['Churned']
model = DecisionTreeClassifier(random_state=0)  # fixed seed makes the fitted tree reproducible
model.fit(X, y)
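To answer the "why did they churn?" question, the fitted tree's rules can be printed as text. The sketch below is a minimal illustration, assuming a tiny hypothetical stand-in for `high_value_customers.csv` with two made-up numeric columns (`MonthsSinceLastOrder`, `SupportTickets`); it uses scikit-learn's `export_text` and the model's `feature_importances_` to show which columns drive the splits.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-in for high_value_customers.csv; column names are assumptions
df = pd.DataFrame({
    'MonthsSinceLastOrder': [1, 2, 1, 3, 8, 9, 10, 7],
    'SupportTickets':       [0, 1, 2, 0, 1, 0, 2, 1],
    'Churned':              [0, 0, 0, 0, 1, 1, 1, 1],
})

X, y = df.drop('Churned', axis=1), df['Churned']

# A shallow tree keeps the learned rules short enough to read
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# export_text renders the tree as nested if/else rules, one split per line
rules = export_text(model, feature_names=list(X.columns))
print(rules)

# feature_importances_ shows how much each column contributed to the splits
print(dict(zip(X.columns, model.feature_importances_)))
```

On this toy data a single split on `MonthsSinceLastOrder` separates the two classes, so that column receives all of the importance. On real customer data, limiting `max_depth` trades some accuracy for rules a person can actually read.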
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

df = pd.read_csv('user_history.csv')
# pd.cut groups continuous values into equal-width bins (pd.qcut would give equal-count bins)
df['Frequency'] = pd.cut(df['RequestsPerMonth'], bins=4)
# Since lower recency is better, we need to reverse the order of the bins
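The difference between `pd.cut` and `pd.qcut`, and the idea of reversing the bin order for recency, can be sketched on a small hypothetical dataset. The column `DaysSinceLastRequest`, the six rows, and the integer score labels are assumptions for illustration; reversing the bins is done here by passing the `labels` list in descending order.

```python
import pandas as pd

# Hypothetical user history; DaysSinceLastRequest and all values are assumptions
df = pd.DataFrame({
    'RequestsPerMonth': [1, 2, 3, 10, 20, 100],
    'DaysSinceLastRequest': [1, 5, 10, 30, 60, 90],
})

# pd.cut: 4 equal-width bins; the outlier (100) stretches the range,
# so most users land in the lowest bin
df['FrequencyScore'] = pd.cut(df['RequestsPerMonth'], bins=4, labels=[1, 2, 3, 4])

# pd.qcut: 3 quantile bins, each holding roughly the same number of users
df['FrequencyTercile'] = pd.qcut(df['RequestsPerMonth'], q=3, labels=[1, 2, 3])

# Lower recency is better, so pass the labels in reverse:
# the most recently active users get the highest score
df['RecencyScore'] = pd.cut(df['DaysSinceLastRequest'], bins=4, labels=[4, 3, 2, 1])

print(df)
```

With equal-width bins, five of the six users share frequency score 1, while the quantile version splits them two per bin. Which one is appropriate depends on whether the bins should reflect absolute activity levels or relative rank among users.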
# Don't do this: the function hard-codes its own SQL connection
def get_data_bad(query_text):
    db = SQLDB()
    return db.get(query_text)
# What if you later need a DocDB instance instead? Or a DynamoDB instance?

# Do this instead: let the caller pass the database in (dependency injection)
def get_data(db, query_text):
    return db.get(query_text)
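A practical payoff of injecting `db` is that any object with a `get()` method can stand in for the real database, which makes the function easy to unit test. The sketch below assumes a hypothetical `Database` protocol and an in-memory `InMemoryDB` backend; neither comes from a real library.

```python
from typing import Protocol


class Database(Protocol):
    """Hypothetical interface: anything with get(query_text) can be passed in."""
    def get(self, query_text: str) -> list: ...


def get_data(db: Database, query_text: str) -> list:
    # The caller chooses the backend; this function never constructs one
    return db.get(query_text)


class InMemoryDB:
    """Hypothetical stand-in backend, e.g. for unit tests."""
    def __init__(self, rows: dict):
        self.rows = rows

    def get(self, query_text: str) -> list:
        # Unknown queries return an empty result instead of raising
        return self.rows.get(query_text, [])


fake_db = InMemoryDB({'SELECT name FROM users': ['ada', 'grace']})
print(get_data(fake_db, 'SELECT name FROM users'))  # ['ada', 'grace']
```

Swapping in a SQL, DocDB, or DynamoDB client then only requires an object exposing the same `get()` method; `get_data` itself does not change.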
class QueryBuilder:
    def __init__(self):
        self.select_value = ''
        self.from_table_name = ''
        self.where_value = ''
        self.groupby_value = ''

    def select(self, select_arg):
        self.select_value = select_arg
        return self
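Because each setter returns `self`, calls can be chained. The sketch below is one possible completion of the builder, assuming hypothetical methods `from_table()` (named so because `from` is a Python keyword), `where()`, `groupby()`, and `build()`. It only concatenates strings, so it should not be fed untrusted input; real code would use parameterized queries.

```python
class QueryBuilder:
    """Minimal builder sketch; every method except select() is an assumption."""

    def __init__(self):
        self.select_value = ''
        self.from_table_name = ''
        self.where_value = ''
        self.groupby_value = ''

    def select(self, select_arg):
        self.select_value = select_arg
        return self  # returning self is what enables chaining

    def from_table(self, table_name):
        self.from_table_name = table_name
        return self

    def where(self, where_arg):
        self.where_value = where_arg
        return self

    def groupby(self, groupby_arg):
        self.groupby_value = groupby_arg
        return self

    def build(self):
        # Assemble only the clauses that were set (plain string concatenation)
        query = f'SELECT {self.select_value} FROM {self.from_table_name}'
        if self.where_value:
            query += f' WHERE {self.where_value}'
        if self.groupby_value:
            query += f' GROUP BY {self.groupby_value}'
        return query


query = (QueryBuilder()
         .select('country, COUNT(*)')
         .from_table('users')
         .where('active = 1')
         .groupby('country')
         .build())
print(query)  # SELECT country, COUNT(*) FROM users WHERE active = 1 GROUP BY country
```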
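from functools import wraps
from time import perf_counter

def log_time(func):
    """Logs the time it took for func to execute"""
    @wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = perf_counter()  # high-resolution clock meant for measuring durations
        val = func(*args, **kwargs)
        end = perf_counter()
        duration = end - start
        print(f'{func.__name__} took {duration} seconds to run')
        return val  # pass the wrapped function's result through
    return wrapper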
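A variant of the timing decorator can be sketched with the standard `logging` module instead of `print`, using `functools.wraps` so the decorated function keeps its name and docstring, and `try`/`finally` so a call that raises is still timed. `slow_sum` is a hypothetical function used only to show the decorator applied with `@` syntax.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def log_time(func):
    """Log how long func takes, via the logging module instead of print."""
    @functools.wraps(func)  # keeps func.__name__ and the docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too
            duration = time.perf_counter() - start
            logger.info('%s took %.4f seconds to run', func.__name__, duration)
    return wrapper


@log_time
def slow_sum(n):
    """Sum the integers below n."""
    return sum(range(n))


print(slow_sum(1_000_000))  # 499999500000, plus a log line with the duration
```

Routing the message through `logging` lets the caller control its level and destination (console, file, log aggregator) without touching the decorator.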
from azure.datalake.store import core, lib, multithread
import pandas as pd

class ADLSHelper:
    def __init__(self, store_name='mystorename'):
        """
        Initializing this helper prompts an interactive login to connect to your Data Lake Store.
        It authenticates with Azure Active Directory and uses the token returned by
        the login to connect to your Azure Data Lake instance.
        For production, you can authenticate with a username/password or a service principal instead.
def remove_null_cols(df):
    # Drop columns whose values are all null; dropna returns a new DataFrame,
    # so no explicit copy is needed
    return df.dropna(how='all', axis=1)

def set_category_types(df, columns):
    _df = df.copy()  # avoid mutating the caller's DataFrame
    for col in columns:
        _df[col] = _df[col].astype('category')
    return _df
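Functions that take a DataFrame and return a new one compose naturally with `DataFrame.pipe`, which passes the frame as the first argument and forwards any extra keyword arguments. The sketch below assumes a small hypothetical frame with an all-null `unused` column and two string columns to convert to categories.

```python
import numpy as np
import pandas as pd


def remove_null_cols(df):
    # dropna returns a new DataFrame, so the input is left untouched
    return df.dropna(how='all', axis=1)


def set_category_types(df, columns):
    _df = df.copy()  # copy first so the caller's DataFrame is not mutated
    for col in columns:
        _df[col] = _df[col].astype('category')
    return _df


# Hypothetical raw data; column names are assumptions
raw = pd.DataFrame({
    'plan': ['free', 'pro', 'free'],
    'region': ['eu', 'us', 'us'],
    'unused': [np.nan, np.nan, np.nan],
})

# pipe passes the frame as the first argument, so the steps read top to bottom
clean = (raw
         .pipe(remove_null_cols)
         .pipe(set_category_types, columns=['plan', 'region']))

print(clean.dtypes)
```

Since neither step mutates its input, `raw` still has its `unused` column and original dtypes afterwards, which makes each step safe to rerun in a notebook.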
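FROM ubuntu:latest
# System packages textract uses to extract text from documents, images, and audio.
# python-dev and python-pip are Python 2 package names; newer Ubuntu releases
# ship only the python3-* variants (python3-dev, python3-pip).
RUN apt-get update && \
    apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr \
    flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev python-pip -y && \
    pip install textract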