wassname (Michael J Clark) wassname

@wassname
wassname / ordered_quantile_loss_for_ml.py
Last active October 4, 2022 13:08
ordered quantile loss for machine learning
"""
Sometimes we want to use a quantile loss in machine learning, but the predicted quantiles are not ordered. This is sometimes called the quantile crossover problem.
Surely it would help to impose the constraint that the quantiles must be ordered?
What's the best way to do this?
Well, it seems to me that we should predict differences from the median,
and apply a softplus so that each difference can only go in one direction.
Note this will NOT work for very small target values. Because we are using a softplus, the model must output very large
negative logits to produce very small differences. This means it will have difficulty with small y values.
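The idea above can be sketched in plain Python. This is a minimal illustration, not the gist's implementation: the names `softplus`, `ordered_quantiles`, `pinball_loss`, and the layout of the raw outputs (lower gaps, then the median, then upper gaps) are assumptions made for the example.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is never negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ordered_quantiles(raw, n_below):
    """Turn unconstrained outputs into ordered quantile predictions.

    Assumed layout of `raw`: n_below lower gaps, the median, then the upper gaps.
    """
    lower_raw = raw[:n_below]
    median = raw[n_below]
    upper_raw = raw[n_below + 1:]

    # Walk outward from the median, subtracting a positive gap each step.
    lower, q = [], median
    for r in reversed(lower_raw):
        q -= softplus(r)
        lower.append(q)
    lower.reverse()

    # Walk outward from the median, adding a positive gap each step.
    upper, q = [], median
    for r in upper_raw:
        q += softplus(r)
        upper.append(q)

    return lower + [median] + upper


def pinball_loss(y, preds, taus):
    # Standard quantile (pinball) loss, averaged over the quantile levels.
    total = 0.0
    for pred, tau in zip(preds, taus):
        err = y - pred
        total += max(tau * err, (tau - 1.0) * err)
    return total / len(taus)
```

Because each gap passes through a softplus, the predictions cannot cross, whatever the raw outputs are. The same property explains the caveat above: a gap close to zero needs a very negative raw output.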
@wassname
wassname / main.py
Last active August 4, 2023 00:03
mikes notebook starter
# autoreload import your package
%load_ext autoreload
%autoreload 2
## secrets
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
## numeric, plotting
import numpy as np
@wassname
wassname / dilate_fast.py
Last active August 9, 2022 14:54
DILATE_cuda
"""
DILATE_fast
DILATE cuda implementation
DILATE: DIstortion Loss with shApe and tImE
WARNING:
- does NOT work for larger batch sizes
- if you're dumpster diving for loss functions in other people's dirty gists, then you deserve what you get
import nibabel as nib
# load mask
maskp = '../data/raw/NBIA/ProstateX/PROSTATEx_masks/Files/lesions/Masks/For_dcm2niix_files/ADC/\
ProstateX-0142-Finding3-ep2d_diff_tra_DYNDIST_ADC0_ROI.nii.gz'
y = nib.load(maskp).get_fdata()
# 1. reverse what nii2dcm did
y = y.transpose((1, 0, 2))[::-1, :, ::-1]
# load dicom
@wassname
wassname / pdshow.py
Last active September 22, 2021 18:48
show a pandas data frame in full
from IPython.display import display
import pandas as pd
def pdshow(df):
    """
    This shows a pandas dataframe in full.
    Also consider .to_html() and https://pbpython.com/dataframe-gui-overview.html
@wassname
wassname / pandas_cache.py
Last active June 18, 2023 03:21
a simple pandas and pickle cache for complex situations, like deep learning where you can't easily cachebust based on the model
"""
Implements on-disk caching of transformed dataframes.
Used on a function that returns a single pandas object,
this decorator will execute the function and cache the dataframe as a pickle
file, named using a hash of the function, subdirectory, arguments, and filename.
The next time the function runs, if the hashes match what is on disk, the decorated function will simply load and return
the pickled pandas object.
This can result in speedups of 10 to 100 times, or more, depending on the
complexity of the function that creates the dataframe.
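A minimal sketch of this kind of decorator, using only the standard library. It is not the gist's code: the name `disk_cache`, the `cache_dir` parameter, and the choice to key on the function's qualified name plus its pickled arguments are all assumptions for illustration. It works for any picklable return value, including a pandas object.

```python
import hashlib
import pickle
from functools import wraps
from pathlib import Path


def disk_cache(cache_dir):
    """Hypothetical decorator: cache a function's return value as a pickle file."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function identity plus its arguments (assumed picklable).
            payload = pickle.dumps(
                (func.__module__, func.__qualname__, args, sorted(kwargs.items()))
            )
            key = hashlib.sha256(payload).hexdigest()[:16]
            path = cache_dir / f"{func.__name__}-{key}.pkl"

            # Cache hit: load the stored object instead of recomputing it.
            if path.exists():
                with path.open("rb") as f:
                    return pickle.load(f)

            # Cache miss: run the function and store its result.
            result = func(*args, **kwargs)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator
```

Note that this key ignores the function body, so editing the function does not invalidate old entries; deleting the cache directory is the blunt way to bust the cache in this sketch.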
@wassname
wassname / dcm2df.py
Last active June 9, 2021 01:12
dicoms2df.py
"""
read dicom header and cache
url https://gist.github.com/wassname/a2cdf0b9b511f8a4769cbbe040a87900
"""
from diskcache import Cache, JSONDisk
import pandas as pd
from tqdm.contrib.concurrent import thread_map, process_map
import logging
from pathlib import Path
from functools import partial
@wassname
wassname / split_by_unique_col.py
Last active May 9, 2021 04:01
split_by_unique_col
from sklearn.model_selection import train_test_split
import pandas as pd
def shuffle_df(df, random_seed=42):
    return df.sample(frac=1, random_state=random_seed, replace=False)
def split_by_unique_col(df, col='patient_id', stratify_cols=[], random_seed=42):
    """
    Make a dataframe of unique ids, with our stratification data
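The core idea of splitting on unique ids, so that no id ends up in both sets, might look like the following pure-Python sketch. It is an illustration only: `split_ids`, `test_frac`, and the list-of-dicts input are assumptions, and it leaves out the stratification that the gist handles.

```python
import random


def split_ids(rows, id_key="patient_id", test_frac=0.25, seed=42):
    """Hypothetical helper: split rows so each id lands in exactly one set."""
    # Collect the unique ids and shuffle them reproducibly.
    ids = sorted({row[id_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(ids)

    # Hold out a fraction of the ids (not of the rows) for the test set.
    n_test = max(1, round(len(ids) * test_frac))
    test_ids = set(ids[:n_test])

    train = [row for row in rows if row[id_key] not in test_ids]
    test = [row for row in rows if row[id_key] in test_ids]
    return train, test
```

Splitting on ids rather than rows keeps every record for a given patient on one side of the split, which avoids leakage between train and test.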
@wassname
wassname / dicom_over_http.py
Last active April 10, 2021 06:24
how to read only metadata from a dicom url
"""
how to read only metadata from a dicom url to save bandwidth
- note the server must support HTTP Range requests, e.g. s3 buckets or azure blobs.
- note that if you don't mind downloading the whole file, it's easier to do that and pass it into pydicom as an io.BytesIO
url: https://gist.github.com/wassname/70106b2d66a7c6e83e4b0300c9d1d4d3
"""
@wassname
wassname / rebalance_df.py
Last active October 16, 2022 02:06
split stratify pandas by unique
"""
If you want to split and sample at the same time, use something else.
But in time series you sometimes want to split by time, then resample to get balanced weights.
@url:https://gist.github.com/wassname/f34321d4797a356a82802bdfb935e6cd/edit
@author:wassname
@lic: meh
"""