I trained the GPT-2 124M parameter model on three different GPU setups, following karpathy/llm.c#481. Here's how long each run took and how much it cost:
| | 1x A10G | 1x A100 40GB | 8x A100 40GB |
|---|---|---|---|
| Training time | 48h | 15h | 2h |
| Cost | $163 | $27 | $45 |
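For reference, here's a quick back-of-the-envelope check of the hourly rates implied by the table above (derived purely from these numbers; actual pricing varies by cloud provider):

```python
# Implied hourly rates from the runs above (provider pricing will differ).
runs = {
    "1x A10G":      {"hours": 48, "cost": 163, "gpus": 1},
    "1x A100 40GB": {"hours": 15, "cost": 27,  "gpus": 1},
    "8x A100 40GB": {"hours": 2,  "cost": 45,  "gpus": 8},
}

for name, r in runs.items():
    per_hour = r["cost"] / r["hours"]        # total $/hr for the instance
    per_gpu_hour = per_hour / r["gpus"]      # $/hr per GPU
    print(f"{name}: ${per_hour:.2f}/hr total, ${per_gpu_hour:.2f}/GPU-hr")
```

The takeaway: the single A100 was by far the cheapest run, while the 8x A100 node bought a 24x wall-clock speedup over the A10G for roughly a quarter of its cost.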
Most of the completions are fairly nonsensical, but here are some interesting ones: