```python
def compare_table_schemas(client, project: str, dataset: str,
                          table_a: str, table_b: str) -> bool:
    """Compare the schemas of two BigQuery tables. Useful, for instance,
    to confirm that the schemas of the production and development
    tables haven't drifted apart.
    :param client: BigQuery client object
    :param project: string - GCP project ID
    :param dataset: string - dataset name
    :param table_a: string - table name
```
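The preview stops inside the docstring. One possible body is sketched below; it is not the author's code. It assumes `client` behaves like `google.cloud.bigquery.Client`, whose `get_table()` accepts a `"project.dataset.table"` string and returns a table whose `.schema` is a list of `SchemaField` objects. Each column is reduced to its name, type, mode and nested sub-fields, so column descriptions are deliberately ignored; `_field_signature` is a hypothetical helper name.

```python
def _field_signature(field):
    """Reduce a SchemaField-like object to a comparable tuple.

    Only name, type and mode are kept (descriptions are ignored on purpose);
    RECORD/STRUCT columns are handled by recursing into their sub-fields.
    """
    return (
        field.name,
        field.field_type,
        field.mode,
        tuple(_field_signature(sub) for sub in field.fields),
    )


def compare_table_schemas(client, project: str, dataset: str,
                          table_a: str, table_b: str) -> bool:
    """Return True if table_a and table_b in project.dataset have the same schema.

    Assumes client.get_table("project.dataset.table") returns an object with a
    .schema list, as google.cloud.bigquery.Client does. Column order matters.
    """
    schema_a = client.get_table(f"{project}.{dataset}.{table_a}").schema
    schema_b = client.get_table(f"{project}.{dataset}.{table_b}").schema
    return [_field_signature(f) for f in schema_a] == [_field_signature(f) for f in schema_b]
```

With a real client this would be called as `compare_table_schemas(bigquery.Client(), "<project>", "<dataset>", "<prod_table>", "<dev_table>")`. Because the function only needs `get_table()` and `.schema`, it can also be unit-tested with a small stub client instead of a live BigQuery connection.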
```python
def calc_beta_f1(beta, precision, recall):
    """Calculate F-beta given the values for precision and recall and the beta value"""
    beta_sq = pow(beta, 2)
    num = (1 + beta_sq) * precision * recall
    denom = beta_sq * precision + recall
    return num / denom
```
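The function implements F_beta = (1 + beta²)·P·R / (beta²·P + R). With beta > 1 the score leans toward recall, with beta < 1 toward precision, and beta = 1 gives the usual F1. One edge case: when precision and recall are both 0 the denominator is 0 and Python raises `ZeroDivisionError`. The sketch below (hypothetical name `calc_beta_f1_safe`) returns 0.0 in that case, a common convention but an assumption here, not something the original specifies.

```python
def calc_beta_f1_safe(beta: float, precision: float, recall: float) -> float:
    """F-beta score that returns 0.0 instead of dividing by zero."""
    beta_sq = beta ** 2
    denom = beta_sq * precision + recall
    if denom == 0:
        # precision == recall == 0: treat the score as 0 by convention.
        return 0.0
    return (1 + beta_sq) * precision * recall / denom


# Same precision/recall, different beta: recall (1.0) beats precision (0.5),
# so a larger beta gives a higher score.
print(round(calc_beta_f1_safe(0.5, 0.5, 1.0), 4))  # 0.5556
print(round(calc_beta_f1_safe(1, 0.5, 1.0), 4))    # 0.6667
print(round(calc_beta_f1_safe(2, 0.5, 1.0), 4))    # 0.8333
print(calc_beta_f1_safe(2, 0.0, 0.0))              # 0.0
```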
```python
def remove_non_printing_chars(df):
    """Clean every text column of a dataframe to remove any non-printing characters.
    We've encountered values like tabs in some of the data.
    :param df: Pandas dataframe
    :return: Pandas dataframe
    """
    clean_df = df.copy(deep=True)
    # Strip leading/trailing whitespace from every object (string) column.
    clean_df = clean_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
    for col in list(clean_df.columns):
```
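The preview ends at the column loop, so the per-column cleaning isn't shown. A minimal sketch of one way to do it, not the author's implementation: it uses `str.isprintable()`, which is False for tabs, newlines and other control characters but True for spaces and non-ASCII letters, so accented text survives. `_strip_non_printing` is a hypothetical helper, and checking for `pd.StringDtype` as well as `object` is an assumption meant to cover newer pandas versions that store text in a dedicated string dtype.

```python
import pandas as pd


def _strip_non_printing(value):
    """Drop non-printable characters from a string; leave other values as-is."""
    if not isinstance(value, str):
        # Numbers, None and NaN pass through untouched.
        return value
    # str.isprintable() is False for tabs, newlines and other control characters,
    # but True for ordinary spaces and non-ASCII letters such as "é".
    return "".join(ch for ch in value if ch.isprintable()).strip()


def remove_non_printing_chars(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with non-printing characters removed from text columns."""
    clean_df = df.copy(deep=True)
    for col in clean_df.columns:
        dtype = clean_df[col].dtype
        # Text usually lives in object columns; newer pandas may use a string dtype.
        if dtype == object or isinstance(dtype, pd.StringDtype):
            clean_df[col] = clean_df[col].map(_strip_non_printing)
    return clean_df
```

Note that a tab inside a value is removed rather than replaced, so `"Bo\tb"` becomes `"Bob"`; if tabs separate words in your data, replacing them with a space first may be the better choice.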
```python
"""Utility to update sys.path so that, in our notebooks, you can import modules
from any folder in the application. It also lets you import any module installed
in your virtual environment. Note that in my project the virtual environment is
named "venv".
In the notebook, import this script in the first cell. It will run
automatically.
"""
import os
import sys
```
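The body of the script isn't shown in the preview. Here is a sketch of how such a path update could work, under stated assumptions: `add_project_paths` is a hypothetical function, `project_root` is the application's top-level folder, and the virtual environment folder sits inside it. Every non-hidden folder of the application is appended to `sys.path`, along with the venv's `site-packages` directory.

```python
import os
import sys
from pathlib import Path


def add_project_paths(project_root, venv_name="venv"):
    """Append the project's folders and the venv's site-packages to sys.path.

    Hypothetical helper: project_root is assumed to be the application's top-level
    folder and venv_name the virtual-environment folder inside it.
    Returns the list of candidate paths that were considered.
    """
    project_root = Path(project_root).resolve()
    candidates = []

    # Walk the application, pruning the venv and hidden folders (.git,
    # .ipynb_checkpoints, ...) in place so os.walk never descends into them.
    for dirpath, dirnames, _filenames in os.walk(project_root):
        dirnames[:] = [d for d in dirnames if d != venv_name and not d.startswith(".")]
        candidates.append(dirpath)

    # site-packages lives in venv/lib/pythonX.Y/ on Linux/macOS and venv/Lib/ on Windows.
    venv_dir = project_root / venv_name
    for pattern in ("lib/python*/site-packages", "Lib/site-packages"):
        candidates.extend(str(p) for p in venv_dir.glob(pattern))

    for path in candidates:
        if path not in sys.path:
            sys.path.append(path)
    return candidates
```

To get the run-on-import behaviour the docstring describes, the script could end with a module-level call such as `add_project_paths(Path(__file__).resolve().parent)`, assuming the script itself sits in the project root. Appending rather than prepending keeps the standard library and already-installed packages ahead of same-named project modules.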
```python
import numpy as np
import pytest

# Use case: here we have a dict `stats` with four keys whose values are NumPy arrays
# and floats, both of which can trip up a plain `==` assertion.
# We handle them with NumPy's .all() and pytest's approx().
assert isinstance(stats['observed'], np.ndarray)
assert isinstance(stats['expected'], np.ndarray)
assert (stats['observed'] == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]).all()
# Expected counts are computed floats, so compare them approximately as well.
assert stats['expected'] == pytest.approx(np.array([[1.6, 2.0, 2.4], [4.0, 5.0, 6.0], [6.4, 8.0, 9.6]]))
assert stats['G'] == pytest.approx(0.49173089057312613)
assert stats['p'] == pytest.approx(0.9743009689044624)
```
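A self-contained pytest test showing both pitfalls in isolation. The arrays and numbers are made up for illustration and are not the `stats` values above.

```python
import numpy as np
import pytest


def test_array_and_float_assertions():
    observed = np.array([[1, 2, 3], [4, 5, 6]])

    # `==` on an array is element-wise, so a bare `assert` has to turn a whole
    # boolean array into one True/False, and NumPy refuses with a ValueError.
    with pytest.raises(ValueError):
        assert observed == [[1, 2, 3], [4, 5, 6]]

    # Reducing the element-wise result with .all() gives a single bool.
    assert (observed == [[1, 2, 3], [4, 5, 6]]).all()

    # Binary floating point: 0.1 + 0.2 is not exactly 0.3 ...
    assert 0.1 + 0.2 != 0.3
    # ... so compare floats (and float arrays) with a tolerance instead.
    assert 0.1 + 0.2 == pytest.approx(0.3)
    assert np.array([0.1 + 0.2, 1 / 3]) == pytest.approx(np.array([0.3, 0.3333333]))
```

`pytest.approx` uses a default relative tolerance of 1e-6, which is why `0.3333333` is close enough to `1 / 3` here.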
```python
# Consider this code
x = 5

def func1():
    print(x)

func1()
# Output:
# 5

# Since x is a global variable defined before func1 is called, func1 can read it.
# However, if you try to change it:
```
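The preview stops right before the interesting part. This is standard Python behaviour, though not necessarily the author's exact example: assigning to `x` anywhere inside a function makes `x` local to the whole function body, so reading it before the assignment raises `UnboundLocalError`. Declaring the name `global` rebinds the module-level variable instead. `func2` and `func3` are illustrative names.

```python
x = 5


def func2():
    # The assignment below makes x local to func2 for the entire body,
    # so the read on the right-hand side finds an unbound local name.
    x = x + 1
    return x


try:
    func2()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)


def func3():
    global x  # rebind the module-level x instead of creating a local one
    x = x + 1
    return x


print(func3())  # 6
print(x)        # 6
```

The exact error message differs between Python versions, but the exception type is the same. For variables in an enclosing function rather than at module level, `nonlocal` plays the role of `global`.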
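```python
import numpy as np

def test_dict_equality(dict_1, dict_2):
    """Return True if every key present in both dicts maps to equal values.

    np.array_equal handles NumPy arrays as well as scalars and lists.
    Keys that appear in only one of the dicts are ignored.
    """
    false_matches = 0
    for key in dict_1:
        if key in dict_2:
            # Compare dict_1's value against dict_2's (not dict_2 against itself).
            if not np.array_equal(dict_1[key], dict_2[key]):
                false_matches += 1
    return false_matches == 0
```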
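Because the helper skips keys that are missing from either dict, `{"a": 1}` and `{"a": 1, "b": 2}` count as equal. If that isn't the intent, a stricter sketch (hypothetical name `dicts_equal`) can compare the key sets first:

```python
import numpy as np


def dicts_equal(dict_1: dict, dict_2: dict) -> bool:
    """True only if both dicts have the same keys and equal values for each key."""
    # Different key sets mean the dicts differ, unlike the lenient helper.
    if dict_1.keys() != dict_2.keys():
        return False
    # all() stops at the first mismatching value; np.array_equal also handles scalars.
    return all(np.array_equal(dict_1[k], dict_2[k]) for k in dict_1)
```

Exact equality is still used for floats here; for computed float values, `np.allclose` would be the tolerant alternative.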
I've seen a lot of posts with instructions for opening a Jupyter Notebook on your local device with the Jupyter server running on a GCP VM. They make it seem really complicated. It ain't that hard, folks!

1. In the GCP Compute Engine UI, click the drop-down menu on the upper left side, under "Remote access". Select "View gcloud command" and copy the command.
2. Append `-- -L localhost:8887:127.0.0.1:8889` to that command. Everything after `--` is passed to `ssh`, and `-L` forwards port 8887 on your local machine to port 8889 on the VM.
   Example:
   ```shell
   gcloud beta compute ssh --zone "<zone-name>" "<vm-instance-name>" --project "<project-name>" -- -L localhost:8887:127.0.0.1:8889
   ```
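If the notebook won't load in the browser, a quick check from your local machine is whether anything is listening on the forwarded port. A minimal sketch using only the standard library; `tunnel_is_up` is a hypothetical helper, and its defaults assume the 8887 local port from the command above.

```python
import socket


def tunnel_is_up(port: int = 8887, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on host:port.

    With the SSH tunnel running, a successful connection to localhost:8887 shows
    that the forwarded port is open on your machine. It does not prove that
    Jupyter itself is listening on the VM's port 8889.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: the tunnel is not up.
        return False
```

If this returns True but the page still doesn't load, the problem is on the VM side, for example Jupyter running on a port other than 8889.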
```python
import gzip
import json

source_dict = {
    "New Year's Day": "Fri, Jan 1, 2021",
    "Martin Luther King Jr. Day": "Mon, Jan 18, 2021",
    "Washington's Birthday": "Mon, Feb 15, 2021",
    "Arbor Day": "Fri, Apr 30, 2021",
    "Memorial Day": "Mon, May 31, 2021",
```
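The preview cuts off inside `source_dict`, so what the original does with it isn't shown. Given the `gzip` and `json` imports, a plausible use is storing the dict as compressed JSON. Below is a sketch of that round trip with the standard library; `write_json_gz`, `read_json_gz` and the `holidays` subset are illustrative, not the author's code.

```python
import gzip
import json


def write_json_gz(data, path):
    """Serialize data as JSON into a gzip-compressed file."""
    # "wt" opens the compressed stream in text mode so json.dump can write str.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)


def read_json_gz(path):
    """Load JSON back from a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# A subset of the entries from source_dict.
holidays = {
    "New Year's Day": "Fri, Jan 1, 2021",
    "Memorial Day": "Mon, May 31, 2021",
}
```

Usage would be `write_json_gz(holidays, "holidays.json.gz")` followed by `read_json_gz("holidays.json.gz")`, where the file name is only an example. Text mode (`"wt"`/`"rt"`) avoids encoding the JSON string to bytes by hand.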