# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "tiktoken",
#   "typer",
#   "numpy",
# ]
# ///
"""
Verify and analyze a JSONL dataset for fine-tuning with OpenAI models.
import typing

import datasets
import torch
import torch.utils.data


def load_imagenet_v2(
    split: typing.Literal[
        "threshold0.7", "top-images", "matching-frequency"
    ] = "threshold0.7"
# %%
import collections
import random


def predict_next_letter(model, five_gram):
    if len(five_gram) != 5:
        raise ValueError("five_gram must be of length 5")
    m = model[tuple(five_gram)]
    return m[True] > m[False]
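The preview does not show how `model` is built. A minimal sketch, assuming it maps each 5-letter tuple to a `collections.Counter` keyed by `True`/`False`; the `train` helper and the "is the next letter a vowel" target are hypothetical and not taken from the gist:

```python
import collections


def train(text, is_target=lambda ch: ch in "aeiou"):
    # Hypothetical helper: for every 5-letter context, count how often the
    # following character satisfies is_target (True) or does not (False).
    model = collections.defaultdict(collections.Counter)
    for i in range(len(text) - 5):
        context = tuple(text[i:i + 5])
        model[context][is_target(text[i + 5])] += 1
    return model


def predict_next_letter(model, five_gram):
    # Same logic as the function in the preview, repeated so the sketch runs alone.
    if len(five_gram) != 5:
        raise ValueError("five_gram must be of length 5")
    m = model[tuple(five_gram)]
    return m[True] > m[False]


model = train("banana banana banana")
after_banan = predict_next_letter(model, "banan")  # context followed by "a"
after_anana = predict_next_letter(model, "anana")  # context followed by " "
```

Because a `Counter` returns 0 for missing keys, a context that was never seen yields `False` rather than raising an error.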
#!/bin/bash
pdffile="$1"
prefix=$(basename "${pdffile}" .pdf)
convert -density 300 "${pdffile}" "${prefix}-%03d.png"
# Quote only the variable so the shell expands the glob
mogrify -background white -flatten "${prefix}"-*.png
total_pages=$(ls "${prefix}"-*.png | wc -l)
for ((i=0; i<$total_pages; i+=2)); do
    # Format page numbers
"""
MIT License

Copyright (c) 2023 Andreas 'blackhc' Kirsch

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
# !pip install rich markdownify
# https://chat.openai.com/share/5f3f2019-2051-4217-93ea-c926fa3c2749
import markdownify
from IPython.core.getipython import get_ipython
from IPython.display import HTML
from rich import print
from rich.console import Console
from rich.markdown import Markdown
Latency Comparison Numbers (~2023)
----------------------------------
L1 cache reference                         0.5 ns
Branch mispredict                          5   ns
L2 cache reference                         7   ns                14x L1 cache
Mutex lock/unlock                         25   ns
Main memory reference                    100   ns                20x L2 cache, 200x L1 cache
Compress 1K bytes with Snappy          3,000   ns        3 µs
Read 1 MB sequentially from memory    20,000   ns       20 µs    ~50GB/sec DDR5
Read 1 MB sequentially from NVMe     100,000   ns      100 µs    ~10GB/sec NVMe, 5x memory
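The bandwidth notes in the last column follow from dividing the transfer size by the sustained bandwidth. A small sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the function name is illustrative and not part of the gist:

```python
def transfer_time_ns(num_bytes, bytes_per_second):
    # Time to move num_bytes at a sustained bandwidth, in nanoseconds.
    return num_bytes * 10**9 / bytes_per_second


MB = 10**6  # decimal megabyte (assumption)
GB = 10**9  # decimal gigabyte (assumption)

memory_ns = transfer_time_ns(1 * MB, 50 * GB)  # ~50GB/sec DDR5
nvme_ns = transfer_time_ns(1 * MB, 10 * GB)    # ~10GB/sec NVMe
slowdown = nvme_ns / memory_ns                 # NVMe relative to memory
```

With these inputs the results line up with the table's 20,000 ns and 100,000 ns rows and the "5x memory" note.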
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                         0.5 ns
Branch mispredict                          5   ns
L2 cache reference                         7   ns                14x L1 cache
Mutex lock/unlock                         25   ns
Main memory reference                    100   ns                20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy           3,000   ns        3 us
Send 1K bytes over 1 Gbps network     10,000   ns       10 us
Read 4K randomly from SSD*           150,000   ns      150 us    ~1GB/sec SSD
# PoC to cache prompts. Drop in your code.
# Andreas 'blackhc' Kirsch, 2023
from typing import List, Optional

import langchain
from langchain import OpenAI
from langchain.cache import SQLiteCache
from langchain.schema import (
    AIMessage,
### Keybase proof

I hereby claim:

* I am blackhc on github.
* I am akirsch (https://keybase.io/akirsch) on keybase.
* I have a public key ASAL244d-fNnU5c_WrWOrPiQYhXXjYWWL9chxLqzoCIgkgo

To claim this, I am signing this object: