Abe Flansburg aflansburg

aflansburg / extract_runtime.py
Created August 5, 2021 14:46
Check GridSearchCV fit Runtime
# import time - not the abstract construct of 'time'
# but rather a library built into Python for
# dealing with time
from time import time
# ML stuff
from sklearn.ensemble import AdaBoostClassifier
ada_tuned_clf = AdaBoostClassifier(random_state=1)
# some canned params for hypertuning
parameters = {
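A minimal sketch of the timing pattern this gist sets up, assuming the goal is to read a clock before and after a `fit` call. `fit_with_runtime` and `SlowEstimator` are hypothetical names that do not appear in the gist. The stand-in estimator keeps the example free of scikit-learn, but any object with a `fit(X, y)` method (a `GridSearchCV` instance, for example) could be passed instead. `time.perf_counter` is used here in place of `time.time` because it is monotonic.

```python
from time import perf_counter, sleep


def fit_with_runtime(estimator, X, y):
    """Fit `estimator` on (X, y) and return (fitted_estimator, elapsed_seconds)."""
    start = perf_counter()
    fitted = estimator.fit(X, y)
    elapsed = perf_counter() - start
    return fitted, elapsed


class SlowEstimator:
    """Hypothetical stand-in for a scikit-learn estimator or search object."""

    def fit(self, X, y):
        sleep(0.05)  # simulate training work
        self.is_fitted_ = True
        return self  # scikit-learn's fit() also returns self


if __name__ == "__main__":
    model, seconds = fit_with_runtime(SlowEstimator(), [[0], [1]], [0, 1])
    print(f"fit took {seconds:.3f}s")
```

Returning the elapsed time instead of printing it lets the caller log it or compare several runs.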
aflansburg / gridsearch_runtime.py
Last active August 1, 2021 18:52
Calculate GridSearchCV runtime
# runtime info based on solution below and fit_time results of the gridsearchcv return object
# based on a response on StackExchange Data Science - Naveen Vuppula
# https://datascience.stackexchange.com/a/93524/41883
# from time import time
def gridsearch_runtime(grid_obj, X_train, y_train):
    '''
    Parameters:
        grid_obj: GridSearchCV object that has not yet been fit to the training data
        X_train: training split features (independent variables)
        y_train: training split target (dependent variable)
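One way to turn the fit-time results into a single number, sketched under assumptions: the function below takes a mapping shaped like `GridSearchCV.cv_results_` (its `mean_fit_time` and `mean_score_time` entries, one per candidate) and the fold count from `n_splits_`. The name `estimate_search_seconds` is hypothetical, and this may differ from what the gist or the linked answer actually computes. Plain lists are used so the example runs without scikit-learn.

```python
from statistics import mean


def estimate_search_seconds(cv_results, n_splits):
    """Estimate total sequential fit + score time for a finished grid search.

    cv_results: mapping with 'mean_fit_time' and 'mean_score_time'
        sequences, one entry per parameter candidate.
    n_splits: number of cross-validation folds.
    """
    fit_times = cv_results["mean_fit_time"]
    score_times = cv_results["mean_score_time"]
    per_split = [f + s for f, s in zip(fit_times, score_times)]
    # average time per split, times folds, times number of candidates
    return mean(per_split) * n_splits * len(per_split)


if __name__ == "__main__":
    fake_results = {"mean_fit_time": [1.0, 3.0], "mean_score_time": [0.5, 0.5]}
    print(estimate_search_seconds(fake_results, n_splits=5))
```

This figure sums the work as if it ran sequentially and leaves out the final refit, so it will not match wall-clock time when the search runs with parallel jobs.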
aflansburg / dual_plot.py
Last active July 14, 2021 18:14
Dual histogram + boxplot with KDE for univariate analysis + Mean & Median Lines
# import libs
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# this function will create a boxplot + histogram plot using Seaborn's JointGrid
# we'll also add type annotations to provide hints to future users
def dual_plot(series: pd.Series, figsize: tuple = (16,8),
              bins: int = None, return_plot: bool = False,
aflansburg / 1_typing_docstrings_documentation.py
Last active July 14, 2021 14:44
Typing & Docstrings: Python
# function to iterate over specified variables and view their value counts
# add typing to help understand our function if reused elsewhere
from typing import List
import pandas as pd
def value_count_rep(columns: List, df: pd.DataFrame) -> None:
    '''
    Parameters:
        columns: list of column names to iterate over
        df: DataFrame containing those columns
    Returns: No return value. Prints the value counts of each column (feature) to stdout
    '''
    for column in columns:
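The same typing and docstring conventions can be shown on a pandas-free analog. This is a sketch only: `value_counts` and the rows-as-dictionaries layout are assumptions made for illustration, not the gist's code.

```python
from collections import Counter
from typing import Any, Dict, List


def value_counts(rows: List[Dict[str, Any]], column: str) -> Dict[Any, int]:
    """Count how often each value occurs in one column.

    Parameters:
        rows: records as dictionaries keyed by column name
        column: name of the column to count
    Returns:
        Mapping of value -> count, most common first
    """
    counts = Counter(row[column] for row in rows)
    return dict(counts.most_common())


if __name__ == "__main__":
    sample = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
    print(value_counts(sample, "color"))
```

Returning the counts, with the return type annotated, makes the function easier to test than one that only prints.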
aflansburg / structural-pattern-matching.py
Created June 18, 2021 15:04
Naive Use Case of Structural Pattern Matching (Python 3.10)
# https://www.python.org/dev/peps/pep-0634/
# PEP 634 (accepted) specifies Structural Pattern Matching (match/case,
# comparable to switch/case in other languages) for Python, available in 3.10.
# As of this Gist, prerelease 3.10.0b2 is available
import inspect
F = [
    lambda x, y: x + y,
    lambda x: x + 1,
aflansburg / skewness_plot.py
Last active April 22, 2021 15:17
Plots a series using the Seaborn histogram plot with KDE and mean, mode, and median lines in Jupyter / IPython
## handy multi-plot function for showing mode, median, and mean lines in a distplot
## (but using histplot since distplot is deprecated)
## Author - Abram Flansburg
## Intended for use in Jupyter / IPython
def skewness_plot(series):
    """
    Plots a series using the histogram plot with KDE, plus mean, mode, and median lines.
    *** Dependencies ***
    Series must be a pandas.Series
    Seaborn must be imported as sns
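The three reference lines mark the mean, median, and mode of the series. A standard-library sketch of computing those values, without any plotting, might look like this. `central_tendency` is a hypothetical helper, and the choice to take the first mode when several tie is an assumption that may not match what the gist does.

```python
from statistics import mean, median, multimode


def central_tendency(values):
    """Return the mean, median, and (first) mode of a sequence of numbers."""
    data = list(values)
    return {
        "mean": mean(data),
        "median": median(data),
        # multimode lists tied modes in first-seen order; take the first
        "mode": multimode(data)[0],
    }


if __name__ == "__main__":
    print(central_tendency([1, 2, 2, 3, 10]))
```

Each returned value could then be drawn as a vertical line over the histogram.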
aflansburg / lambda_cloudwatch_slack.js
Created April 20, 2021 13:59
Node.js Lambda to decompress and parse CloudWatch Log Event gzipped data and send to Slack
// In this scenario - a CloudWatch Subscription Filter is hooked up to a Node.js Lambda's ARN
// (see AWS docs around Log Group subscription filters)
// Will need a webhook for the appropriate Slack channel
const https = require('https');
const zlib = require('zlib');
const options = {
  hostname: 'hooks.slack.com',
  // placeholder: replace with the path portion of the Slack webhook URL
  path: 'SLACK_WEBHOOK_URL',
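The decompress-and-parse step can be sketched as follows, in Python to match the other examples on this page and not in the gist's Node.js. It assumes the usual subscription event shape, in which `event["awslogs"]["data"]` holds base64-encoded, gzip-compressed JSON with `logGroup` and `logEvents` fields. `decode_cloudwatch_event` and `slack_text` are hypothetical names, and the message layout is an illustration only. Nothing is sent over the network here.

```python
import base64
import gzip
import json


def decode_cloudwatch_event(event):
    """Return the decoded log payload from a CloudWatch Logs subscription event."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def slack_text(payload):
    """Build a JSON body for a Slack webhook (hypothetical message layout)."""
    lines = [log_event["message"] for log_event in payload["logEvents"]]
    return json.dumps({"text": f"{payload['logGroup']}\n" + "\n".join(lines)})


if __name__ == "__main__":
    sample = {
        "logGroup": "/aws/lambda/demo",
        "logEvents": [{"id": "1", "timestamp": 0, "message": "boom"}],
    }
    encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
    payload = decode_cloudwatch_event({"awslogs": {"data": encoded}})
    print(slack_text(payload))
```

Keeping decoding separate from message formatting means each piece can be tested without a Slack webhook.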
aflansburg / null_value_display.py
Last active April 15, 2021 17:01
Tabular Null Value Check Display Function for Pandas DataFrame
# Not my original function - looking for citation
def missing_check(df):
    null_val_sum = df.isnull().sum()
    total = null_val_sum.sort_values(ascending=False)  # total null values
    # fraction of rows that are null (0 to 1, not multiplied by 100)
    percent = (null_val_sum / df.isnull().count()).sort_values(ascending=False)
    missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    return missing_data
'''
Example:
missing_check(df_with_nulls)
aflansburg / plot.rb
Last active April 7, 2021 20:58
Ruby Modules, require, module_function, and Rails
# imagine this lives at app/lib/complicated_data
module ComplicatedData
  def generate_complicated_plot_data(start_date: 30.days.ago, end_date: Date.today, type: nil)
    # implementation
  end
  # module_function makes the method callable directly on the module
  # (ComplicatedData.generate_complicated_plot_data) and makes the instance-method copy private
  module_function :generate_complicated_plot_data
end
aflansburg / haversine.rb
Last active April 6, 2021 19:07
Haversine - Straight Line Distance Between Two Sets of Geographical Coordinates (Latitude, Longitude)
# radius of earth in meters
R = 6371000
def haversine(coord1:, coord2:)
  # first add our own radians function since Ruby Math does not have one
  radians = ->(degrees) { degrees * (Math::PI / 180) }
  # convert latitude degrees to radians
  phi_1 = radians.call(coord1[:latitude])
  phi_2 = radians.call(coord2[:latitude])
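The preview ends before the distance is computed. For reference, here is the standard haversine formula as a Python sketch (Python to match the other examples on this page). The `longitude` key is assumed by analogy with the `latitude` key shown above, and the function is not a transcription of the rest of the gist.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # same mean radius as the gist's R, in meters


def haversine(coord1, coord2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi_1 = radians(coord1["latitude"])
    phi_2 = radians(coord2["latitude"])
    delta_phi = phi_2 - phi_1
    delta_lambda = radians(coord2["longitude"] - coord1["longitude"])
    # haversine of the central angle between the two points
    a = sin(delta_phi / 2) ** 2 + cos(phi_1) * cos(phi_2) * sin(delta_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


if __name__ == "__main__":
    origin = {"latitude": 0.0, "longitude": 0.0}
    one_degree_east = {"latitude": 0.0, "longitude": 1.0}
    print(haversine(origin, one_degree_east))
```

The result treats the earth as a sphere, so it is an approximation of the true geodesic distance.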