@Mlawrence95
Mlawrence95 / confusion_matrix.py
Last active March 26, 2024 10:25
Python: create a confusion matrix across two columns in a Pandas dataframe having only categorical data
import pandas as pd

def confusion_matrix(df: pd.DataFrame, col1: str, col2: str):
    """
    Given a dataframe with at least
    two categorical columns, create a
    confusion matrix of the columns'
    cross-counts
    use like:
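The preview is cut off before the function body, so the gist's actual implementation is not visible here. A minimal sketch of the idea, assuming `pd.crosstab` is an acceptable way to build the cross-counts (the name `confusion_matrix_sketch` and the example data are hypothetical):

```python
import pandas as pd


def confusion_matrix_sketch(df: pd.DataFrame, col1: str, col2: str) -> pd.DataFrame:
    """Count how often each value of col1 co-occurs with each value of col2."""
    # pd.crosstab tabulates the frequency of every (col1, col2) pair:
    # rows are the unique values of col1, columns the unique values of col2.
    return pd.crosstab(df[col1], df[col2])


# Hypothetical example data, not taken from the gist.
example = pd.DataFrame(
    {
        "actual": ["cat", "cat", "dog", "dog", "dog"],
        "predicted": ["cat", "dog", "dog", "dog", "cat"],
    }
)
matrix = confusion_matrix_sketch(example, "actual", "predicted")
print(matrix)
```

Each cell of `matrix` holds the number of rows in which that pair of categories appears together, so the cells sum to the number of rows in the frame.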
Mlawrence95 / dataframes_to_latex_columns.py
Last active November 5, 2019 23:17
Converts two Pandas dataframes into a two-column latex table. Supports adding a title for each column and an overall title.
import pandas as pd

def latex_two_column_table(title: str, l_caption: str, r_caption: str, l_df: pd.DataFrame, r_df: pd.DataFrame):
    """
    Use to print out two-columned LaTeX code (or route into a file) for a set of dataframes
    """
    l_df_tex = l_df.to_latex()
    r_df_tex = r_df.to_latex()
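The preview stops once both frames have been rendered with `to_latex()`, so the layout the gist actually emits is not visible. One possible way to combine the two rendered strings, assuming two side-by-side `minipage` environments inside a single `table` float, is sketched below. The helper name `two_column_latex`, its string arguments, and the sample `tabular` bodies are all hypothetical.

```python
def two_column_latex(title: str, l_caption: str, r_caption: str,
                     l_tex: str, r_tex: str) -> str:
    """Place two already-rendered tabular blocks side by side in one table float."""

    def column(caption: str, body: str) -> str:
        # Each column is a minipage slightly under half the text width,
        # with its own bold heading above the tabular body.
        return "\n".join([
            r"\begin{minipage}[t]{0.48\textwidth}",
            r"\centering",
            r"\textbf{" + caption + r"}\\[0.5em]",
            body.strip(),
            r"\end{minipage}",
        ])

    return "\n".join([
        r"\begin{table}[ht]",
        r"\caption{" + title + "}",
        column(l_caption, l_tex),
        r"\hfill",
        column(r_caption, r_tex),
        r"\end{table}",
    ])


# Hypothetical stand-ins for the strings returned by DataFrame.to_latex().
left_tex = r"\begin{tabular}{lr} a & 1 \\ \end{tabular}"
right_tex = r"\begin{tabular}{lr} b & 2 \\ \end{tabular}"
latex = two_column_latex("Overall title", "Left", "Right", left_tex, right_tex)
print(latex)
```

Taking strings rather than dataframes keeps the layout step independent of how the tables were rendered; in the gist's setting the two strings would be `l_df_tex` and `r_df_tex`.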
Mlawrence95 / shallow_flatten_directory.py
Last active October 21, 2019 23:38
** DESTRUCTIVE CODE -- DON'T COPY AND PASTE WITHOUT READING** Unpacks folders at the specified location to one level. Can be applied recursively to flatten everything if desired.
import os
import shutil

def flatten_directory(directory, delete_after=False):
    """
    Flattens all folders in directory, deleting the empty folders after.
    **WARNING**
    This code WILL DELETE YOUR FILES
    if used naively. Seriously.
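The body of `flatten_directory` is not shown in the preview. A cautious sketch of a one-level flatten follows, under two assumptions that may differ from the gist: name clashes are skipped rather than overwritten, and a subfolder is removed only when it ends up empty. The name `flatten_directory_sketch` is hypothetical, and the demonstration runs inside a throwaway temporary directory.

```python
import os
import shutil
import tempfile


def flatten_directory_sketch(directory, delete_after=False):
    """Move the contents of each immediate subfolder up into `directory`."""
    # Snapshot the listing first so items moved up are not visited again.
    for name in os.listdir(directory):
        folder = os.path.join(directory, name)
        if not os.path.isdir(folder):
            continue  # plain files at the top level stay where they are
        for item in os.listdir(folder):
            target = os.path.join(directory, item)
            if os.path.exists(target):
                # Assumption: never overwrite; leave the clashing item in place.
                continue
            shutil.move(os.path.join(folder, item), target)
        # Remove the subfolder only if everything was moved out of it.
        if delete_after and not os.listdir(folder):
            os.rmdir(folder)


# Demonstration in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "nested"))
    for parts in (("sub", "a.txt"), ("sub", "nested", "b.txt")):
        with open(os.path.join(root, *parts), "w") as handle:
            handle.write("demo")
    flatten_directory_sketch(root, delete_after=True)
    result = sorted(os.listdir(root))

print(result)  # ['a.txt', 'nested']
```

Only one level is unpacked per call: `nested` moves up intact with `b.txt` still inside it, which matches the description's note that the function can be applied repeatedly to flatten deeper trees.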
Mlawrence95 / get_word_counts.py
Last active November 5, 2019 19:13
Takes a document (string) or iterable of documents and returns a Pandas dataframe containing the number of occurrences of each unique word. Note that this is not efficient enough to replace Scikit's CountVectorizer class for a bag of words transformer.
import numpy as np
import pandas as pd

def get_word_counts(document: str) -> pd.DataFrame:
    """
    Turns a document into a dataframe of word, counts
    Use preprocessing/lowercasing before this step for best results.
    If passing many documents, use document = '\n'.join(iterable_of_documents)
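The counting logic itself is cut off in the preview. As a rough sketch of the behavior the docstring describes, the version below splits on whitespace and counts with `collections.Counter`; this is a swapped-in technique, not necessarily the NumPy-based approach the gist's imports suggest. The function name and the column names `word` and `count` are assumptions.

```python
from collections import Counter

import pandas as pd


def get_word_counts_sketch(document: str) -> pd.DataFrame:
    """Return one row per unique word with its number of occurrences."""
    # str.split() with no arguments splits on any run of whitespace,
    # so newline-joined documents are handled the same as a single one.
    counts = Counter(document.split())
    # most_common() orders the (word, count) pairs from most to least frequent.
    return pd.DataFrame(counts.most_common(), columns=["word", "count"])


# Hypothetical documents, joined as the docstring recommends.
docs = ["the cat sat", "the dog sat", "the end"]
word_counts = get_word_counts_sketch("\n".join(docs))
print(word_counts)
```

No lowercasing or punctuation stripping happens here, so `The` and `the` would be counted separately unless the text is preprocessed first, as the docstring advises.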
Mlawrence95 / clone_private_repo.txt
Created December 5, 2019 23:45
Trying to access a private repo? Use this format to pull it down. (Yes, it asks for your password at the command line. Only do this in low-risk environments)
git clone https://[insert username]:[insert password]@github.com/[insert organisation name]/[insert repo name].git
Mlawrence95 / make_old_pickles_openable.py
Created December 5, 2019 23:51
Old pickle files can be a pain to work with. This can make SliceTypes and ObjectType exceptions go away in certain circumstances.
import pickle
import dill
dill._dill._reverse_typemap['SliceType'] = slice
dill._dill._reverse_typemap['ObjectType'] = object
Mlawrence95 / pyplot_set_params.py
Created December 16, 2019 17:24
matplotlib allows you to set plot parameters via a param dict. Here's one such example
import matplotlib.pyplot as plt
params = {'legend.fontsize': 'x-large',
          'figure.figsize': (15, 15),
          'axes.labelsize': 'x-large',
          'axes.titlesize': 'x-large',
          'xtick.labelsize': 'x-large',
          'ytick.labelsize': 'x-large'}
plt.rcParams.update(params)
Mlawrence95 / open_files.py
Created March 26, 2020 18:03
Helpers to open file types common in Python data analysis: JSON and pickle. Great addition to your startup.ipy file in ~/.ipython/profile_default/startup/
import json
import pickle

def openJSON(path):
    """
    Safely opens json file at 'path'
    """
    with open(path, 'r') as File:
        data = json.load(File)
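The preview ends before `openJSON` returns anything and before the pickle helper appears. A small sketch of what such a pair of helpers might look like, assuming each simply returns the loaded object (the names `open_json` and `open_pickle` and the round-trip demo are hypothetical):

```python
import json
import os
import pickle
import tempfile


def open_json(path):
    """Load and return the object stored in the JSON file at `path`."""
    # The context manager closes the file even if parsing raises.
    with open(path, "r") as file:
        return json.load(file)


def open_pickle(path):
    """Load and return the object stored in the pickle file at `path`."""
    # Pickle files must be opened in binary mode. Only unpickle files you
    # trust: loading a pickle can execute arbitrary code.
    with open(path, "rb") as file:
        return pickle.load(file)


# Round-trip demonstration in a temporary directory.
record = {"name": "demo", "values": [1, 2, 3]}
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "record.json")
    pickle_path = os.path.join(tmp, "record.pkl")
    with open(json_path, "w") as file:
        json.dump(record, file)
    with open(pickle_path, "wb") as file:
        pickle.dump(record, file)
    from_json = open_json(json_path)
    from_pickle = open_pickle(pickle_path)

print(from_json == record, from_pickle == record)
```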
Mlawrence95 / get_timestamp.py
Created March 31, 2020 22:18
Uses Python's time library to return the date (GMT) as a single string in month_day_year form, e.g. 3_31_2020. Useful for adding timestamps to filenames
import time

def get_timestamp():
    """
    Return the date as a month_day_year string, GMT
    >>> get_timestamp()
    '3_31_2020'
    """
    t = time.gmtime()
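The rest of the function is not shown. Judging by the doctest output `'3_31_2020'`, the remaining step is presumably to join the month, day, and year with underscores and no zero padding. A sketch under that assumption; the optional `t` parameter is a hypothetical addition that makes the result reproducible for a fixed time.

```python
import time


def get_timestamp_sketch(t=None):
    """Return the GMT date as 'month_day_year', e.g. '3_31_2020'."""
    if t is None:
        t = time.gmtime()  # current time as a UTC struct_time
    # tm_mon and tm_mday are plain ints, so no zero padding is added.
    return f"{t.tm_mon}_{t.tm_mday}_{t.tm_year}"


# The Unix epoch makes a deterministic example.
stamp = get_timestamp_sketch(time.gmtime(0))
print(stamp)  # 1_1_1970
```

Underscores rather than slashes keep the string safe to embed in a filename.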
Mlawrence95 / mp3_to_plot.py
Created April 21, 2020 21:30
[python] convert .mp3 file into a .wav, then visualize the sound using a matplotlib plot
import matplotlib.pyplot as plt
import soundfile as sf
from pydub import AudioSegment
# we want to convert source, mp3, into dest, a .wav file
source = "./recordings/test.mp3"
dest = "./recordings/test.wav"
# conversion - check!