Gary Biggs GDBSD

@GDBSD
GDBSD / compare_bq_table_schemas.py
Created December 12, 2022 18:54
Compare BigQuery table schemas
def compare_table_schemas(client, project: str, dataset: str,
                          table_a: str, table_b: str) -> bool:
    """Compare the schemas of two BigQuery tables. Useful, for instance,
    to confirm that the schemas of the production and development
    tables have not drifted apart.
    :param client: BigQuery client object
    :param project: string - GCP project ID
    :param dataset: string - dataset name
    :param table_a: string - table name
@GDBSD
GDBSD / calc_fbeta.py
Created June 11, 2022 18:21
Calculate F-Beta
def calc_beta_f1(beta, precision, recall):
    """Calculate F-beta given the values for precision and recall and the beta value"""
    beta_sq = pow(beta, 2)
    num = (1 + beta_sq) * precision * recall
    denom = beta_sq * precision + recall
    return num / denom
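As a quick sanity check on the formula, here is a minimal sketch that applies it to a few values. The name `f_beta` and the zero-denominator guard are assumptions added for illustration and are not part of the gist; without such a guard the division raises `ZeroDivisionError` when precision and recall are both 0.

```python
def f_beta(beta: float, precision: float, recall: float) -> float:
    """F-beta score; returns 0.0 when precision and recall are both 0 (assumed convention)."""
    beta_sq = beta ** 2
    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denom


# Same precision and recall, three different beta weights.
f1 = f_beta(1.0, 0.5, 1.0)      # balanced weighting
f2 = f_beta(2.0, 0.5, 1.0)      # beta > 1 weights recall more heavily
f_half = f_beta(0.5, 0.5, 1.0)  # beta < 1 weights precision more heavily
```

With recall higher than precision, the score rises as beta grows, which is the expected behaviour of a recall-weighted metric.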
@GDBSD
GDBSD / nonprint-char_remover.py
Created October 25, 2021 14:24
Remove non-printing characters from a Pandas dataframe
def remove_non_printing_chars(df):
    """Clean a dataframe column to remove any non-printing characters.
    We've encountered values like tabs in some of the data.
    :param df: Pandas dataframe
    :return: Pandas dataframe
    """
    clean_df = df.copy(deep=True)
    clean_df = clean_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
    for col in list(clean_df.columns):
@GDBSD
GDBSD / path_setter.py
Last active July 9, 2021 17:25
jupyter-easy-import-modules-from-higher-directories
import os
import sys
"""Utility function to update sys.path so in our notebooks you can import modules from
any folder in the application. It will also allow you to import any module in your virtual
environment. Note that in my project the virtual environment is named "venv".
In the notebook, in the first cell, import this script. It will run
automatically
@GDBSD
GDBSD / pytest_compare_arrays_floats.py
Created January 12, 2021 23:35
PyTest - comparing arrays and floats
# Use case: a dict "stats" has four keys whose values are arrays and floats, both of which can trip you up in assertions.
# We handle the arrays with NumPy's .all() and the floats with pytest.approx().
import numpy as np
import pytest
assert type(stats['observed']) is np.ndarray
assert type(stats['expected']) is np.ndarray
assert (stats['observed'] == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]).all()
assert (stats['expected'] == [[1.6, 2.0, 2.4], [4.0, 5.0, 6.0], [6.4, 8.0, 9.6]]).all()
assert stats['G'] == pytest.approx(0.49173089057312613)
assert stats['p'] == pytest.approx(0.9743009689044624)
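To show why floats need a tolerance at all, here is a small standard-library sketch. It uses `math.isclose` in place of `pytest.approx` so it runs without any test framework; the helper `rows_close` and the sample values are hypothetical and not part of the gist.

```python
import math

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly.
total = 0.1 + 0.2
exact_match = (total == 0.3)
close_match = math.isclose(total, 0.3, rel_tol=1e-6)


def rows_close(actual, expected, rel_tol=1e-6):
    """Element-wise tolerance check for two nested lists of the same shape."""
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            return False
        if not all(math.isclose(a, e, rel_tol=rel_tol) for a, e in zip(row_a, row_e)):
            return False
    return True


computed = [[0.1 + 0.2, 2.0], [4.0, 1.1 + 2.2]]
wanted = [[0.3, 2.0], [4.0, 3.3]]
nested_exact = (computed == wanted)
nested_close = rows_close(computed, wanted)
```

The exact comparisons fail while the tolerance-based ones pass. For NumPy arrays, `np.allclose` applies the same idea to a whole array at once.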
@GDBSD
GDBSD / get_global.py
Last active November 16, 2020 17:18
"global" makes a previously declared variable global
# Consider this code
x = 5
def func1():
    print(x)
func1()
# Output
# 5
# Since x is declared before the function call, func1 can access it.
# However, if you try to change it:
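The preview is cut off at this point. As a hedged sketch of the behaviour the comment appears to lead into, the following shows an assignment failing without `global` and succeeding with it. The function names `broken_update` and `working_update` are hypothetical and do not come from the gist.

```python
x = 5


def broken_update():
    # Assigning to x anywhere in the function makes x local to it,
    # so reading x on the right-hand side fails before any value is bound.
    x = x + 1
    return x


def working_update():
    global x  # rebind the module-level x instead of creating a local one
    x = x + 1
    return x


try:
    broken_update()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

result = working_update()
```

Reading a module-level name from inside a function needs no declaration; only rebinding it does.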
@GDBSD
GDBSD / compare_dicts.py
Created October 17, 2020 23:35
Python - Compare Dictionaries
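import numpy as np


def test_dict_equality(dict_1, dict_2):
    """Return True if every key present in both dicts maps to equal values."""
    false_matches = 0
    for key in dict_1:
        # Keys that are missing from dict_2 are skipped, not counted.
        if key in dict_2:
            # array_equal handles NumPy arrays as well as plain scalars.
            if not np.array_equal(dict_1[key], dict_2[key]):
                false_matches += 1
    return false_matches == 0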
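The function above only inspects keys of dict_1 that also appear in dict_2, so two dicts with different key sets can still pass. If a stricter check is wanted, one possible variant is sketched below. The name `dicts_match` and the `values_equal` parameter are assumptions for illustration; the sketch defaults to `operator.eq` so that it runs without NumPy.

```python
import operator


def dicts_match(dict_1, dict_2, values_equal=operator.eq):
    """Return True only if both dicts have the same keys and equal values.

    values_equal is a two-argument comparison function (hypothetical hook).
    """
    if dict_1.keys() != dict_2.keys():
        return False
    return all(values_equal(dict_1[key], dict_2[key]) for key in dict_1)


same = dicts_match({"a": 1, "b": [1, 2]}, {"a": 1, "b": [1, 2]})
extra_key = dicts_match({"a": 1}, {"a": 1, "b": 2})
changed_value = dicts_match({"a": 1}, {"a": 2})
```

For dicts whose values are NumPy arrays, passing `values_equal=np.array_equal` keeps the comparison used in the gist.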
@GDBSD
GDBSD / gcp_jupyter_setup.txt
Last active October 15, 2020 00:56
GCP VM - Working With Jupyter Notebook On Your Local Device
# I've seen a lot of posts with instructions for opening a Jupyter Notebook on your
# local device with the Jupyter server running on a GCP VM. They make it seem really
# complicated. It ain't that hard, folks!
1. On the GCP Compute Engine UI, click on the drop-down menu on the upper left side
   under Remote access. Select "view gcloud command" and copy the command.
2. To that command, append -- -L localhost:8887:127.0.0.1:8889
   (this forwards port 8887 on your local device to port 8889 on the VM).
   Example:
   gcloud beta compute ssh --zone "<zone-name>" "<vm-instance-name>" --project "<project-name>" -- -L localhost:8887:127.0.0.1:8889
@GDBSD
GDBSD / compress_dict.py
Last active September 19, 2020 15:20
Compress and decompress a Python dictionary
import gzip
import json
source_dict = {
    "New Year's Day": "Fri, Jan 1, 2021",
    "Martin Luther King Jr. Day": "Mon, Jan 18, 2021",
    "Washington's Birthday": "Mon, Feb 15, 2021",
    "Arbor Day": "Fri, Apr 30, 2021",
    "Memorial Day": "Mon, May 31, 2021",