wassname (Michael J Clark) wassname

@wassname
wassname / twohot.md
Last active July 22, 2025 00:38
two-hot encoding notes

What is two-hot encoding?

Description

Two-hot encoding was introduced in 2017 by Marc G. Bellemare et al. in "A Distributional Perspective on Reinforcement Learning", but the clearest description is in the 2023 DreamerV3 paper ("Mastering Diverse Domains through World Models") by Danijar Hafner et al., where it is used for reward and value distributions.

Two-hot encoding is a generalization of one-hot encoding to continuous values. It produces a vector of length |B| (the number of bins) in which all elements are 0 except for the two entries closest to the encoded continuous number, at positions k and k + 1. These two entries sum to 1, with more weight given to the entry that is closer to the encoded number.
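As a minimal sketch of the description above (not the code from either paper), the encoding can be written in plain Python. The function names `two_hot` and `two_hot_decode`, the example bin values, and the choice to clamp out-of-range inputs to the outermost bins are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def two_hot(x: float, bins: Sequence[float]) -> List[float]:
    """Encode scalar x over sorted bin values (needs at least two bins)."""
    # Assumption: values outside the bin range are clamped to the end bins.
    x = min(max(x, bins[0]), bins[-1])
    # k is the index of the bin at or below x; cap it so k + 1 stays valid.
    k = min(bisect_right(bins, x) - 1, len(bins) - 2)
    # Weight on the upper bin grows linearly from bins[k] to bins[k + 1].
    upper = (x - bins[k]) / (bins[k + 1] - bins[k])
    encoding = [0.0] * len(bins)
    encoding[k] = 1.0 - upper
    encoding[k + 1] = upper
    return encoding


def two_hot_decode(encoding: Sequence[float], bins: Sequence[float]) -> float:
    """Recover the scalar as the weighted sum of the bin values."""
    return sum(w * b for w, b in zip(encoding, bins))


bins = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(two_hot(0.25, bins))  # [0.0, 0.0, 0.75, 0.25, 0.0]
```

Here 0.25 lies a quarter of the way from bin 0.0 to bin 1.0, so the nearer bin gets weight 0.75 and the farther one 0.25. Decoding the vector with `two_hot_decode` gives back 0.25.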

Code samples

@wassname
wassname / torch_scalar.py
Created December 27, 2023 01:06
wrap sklearn scalers for torch
"""
how to wrap a scikit-learn scaler like RobustScaler for pytorch
"""
import torch
import numpy as np
from einops import rearrange
from sklearn.preprocessing import StandardScaler, RobustScaler
class TorchRobustScaler(RobustScaler):
@wassname
wassname / style_df.py
Created December 23, 2023 22:57
How to style dataframes in vscode
"""
plain display() does not show the styled dataframe in vscode, you need to render it as html explicitly
- see also https://pandas.pydata.org/docs/user_guide/style.html#Builtin-Styles
"""
import pandas as pd
from IPython.display import display, HTML
df = pd.DataFrame({
"strings": ["Adam", "Mike"],
"ints": [1, 3],
@wassname
wassname / justfile
Last active April 11, 2025 01:02
justfile cheatsheet
# see https://cheatography.com/linux-china/cheat-sheets/justfile/
# we can set the shell
set shell := ["zsh", "-cu"]
# settings
set dotenv-load
# Export all just variables as environment variables.
set export
@wassname
wassname / argparse_in_jupyter.py
Last active March 5, 2025 06:41
argparse in jupyter?
"""
sometimes you want to run or adapt a cli script from jupyter, here is a decent way to do it
# https://gist.github.com/wassname/f1c23636cc04b39176bd82f45c6e398a
"""
def jupyter_argparse(s: str) -> list:
"""
Usage
s = '''
--rank 16
@wassname
wassname / gpt4v_on_public_eng_docs.ipynb
Created November 7, 2023 00:29
gpt4v on public domain engineering docs
@wassname
wassname / emojis.json
Created November 4, 2023 01:00
Emojis and their uses according to llama uncensored
{
"🦆": {
"tags": [
"Waterfowl",
"Bird",
"Quack"
],
"usage": [
"🦆🌊: swimming duck",
"🦆🍞: feeding ducks",
@wassname
wassname / STOP_DOING_MATH.md
Last active April 4, 2025 22:55
It turns out LLMs can generate the STOP DOING MATH meme https://knowyourmeme.com/memes/stop-doing-math

STOP DOING MATH

  • NUMBERS WERE NOT SUPPOSED TO BE GIVEN NAMES
  • YEARS OF COUNTING yet NO REAL-WORLD USE FOUND for going higher than your FINGERS
  • Wanted to go higher anyway for a laugh? We had a tool for that: It was called "GUESSING"
  • "Yes please give me ZERO of something. Please give me INFINITE of it" - Statements dreamed up by the utterly Deranged

LOOK at what Mathematicians have been demanding your Respect for all this time, with all the calculators & abacus we built for them

@wassname
wassname / split_by_token.py
Last active October 7, 2023 01:28
Perfect text splitter for LLMs
"""
When splitting text for Language Models, aim for two properties:
- Limit tokens to a maximum size (e.g., 400)
- Use natural boundaries for splits (e.g. ".")
Many splitters don't enforce a token size limit, causing errors like "device assert" or "out of memory." Others focus on character length rather than token length. To address these issues:
- Use RecursiveCharacterTextSplitter from the langchain library
- Set the last separator to an empty string '' to ensure there is always a splitting point, thus maintaining token limits
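The idea in the notes above can be illustrated without any dependencies. This is a simplified stand-in, not langchain's RecursiveCharacterTextSplitter: `split_by_tokens` and `count_tokens` are hypothetical names, and `count_tokens` pretends that every four characters make one token, where a real tokenizer would be used in practice. The point it shows is that ending the separator list with `""` lets the splitter fall back to single characters, so no chunk can exceed the token limit.

```python
import math
from typing import List, Sequence


def count_tokens(text: str) -> int:
    # Hypothetical tokenizer stand-in: roughly one token per 4 characters.
    return math.ceil(len(text) / 4)


def split_by_tokens(
    text: str,
    max_tokens: int,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " ", ""),
) -> List[str]:
    """Split text into chunks of at most max_tokens (max_tokens >= 1).

    Assumes the last separator is "" so a split point always exists.
    """
    if count_tokens(text) <= max_tokens:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        pieces = list(text)  # last resort: split between any two characters
    else:
        parts = text.split(sep)
        # Keep the separator attached so joining the chunks restores the text.
        pieces = [p + sep for p in parts[:-1]] + [parts[-1]]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if count_tokens(piece) > max_tokens:
            # Too big even alone: flush, then retry with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_by_tokens(piece, max_tokens, finer))
        elif count_tokens(current + piece) <= max_tokens:
            current += piece  # still fits, keep packing at this boundary
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Coarse separators are tried first, so chunks break at paragraph or sentence boundaries where possible, and the character-level fallback only applies to pieces that have no natural boundary within the limit.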
@wassname
wassname / cuda_11.8_installation_on_Ubuntu_22.04
Last active October 6, 2023 04:20 — forked from MihailCosmin/cuda_11.8_installation_on_Ubuntu_22.04
Instructions for CUDA v11.8 and cuDNN 8.7 installation on Ubuntu 22.04 for PyTorch 2.0.0
#!/bin/bash
### steps ####
# verify the system has a cuda-capable gpu
# download and install the nvidia cuda toolkit and cudnn
# setup environmental variables
# verify the installation
###
### to verify your gpu is cuda-enabled, check