bigsnarfdude
@bigsnarfdude
bigsnarfdude / grpo.ipynb
Created February 2, 2025 01:56
grpo.ipynb
(notebook preview could not be rendered)
@bigsnarfdude
bigsnarfdude / infer.py
Created February 2, 2025 01:55
infer.py
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from random import randint
# Load model and tokenizer
model_id = "philschmid/qwen-2.5-3b-r1-countdown"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load dataset
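The preview ends at the dataset load. A minimal sketch of how the rest of the inference script might continue, assuming the Jiayi-Pan/Countdown-Tasks-3to4 dataset used in the training gists below (with "nums" and "target" columns) and a chat-style prompt; the prompt wording and generation settings are illustrative, not taken from the gist:

# Hypothetical continuation -- builds on the imports and objects defined above.
dataset = load_dataset("Jiayi-Pan/Countdown-Tasks-3to4", split="train")

# Pick a random sample; column names assumed to be 'nums' and 'target'.
sample = dataset[randint(0, len(dataset) - 1)]

# Illustrative prompt; the exact format the model was trained on may differ.
prompt = (
    f"Using the numbers {sample['nums']}, create an equation that equals {sample['target']}. "
    "You may use +, -, *, / and each number at most once. Show your reasoning in "
    "<think> </think> tags and the final equation in <answer> </answer> tags."
)

messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))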
@bigsnarfdude
bigsnarfdude / phil_grpo_working.py
Last active February 2, 2025 00:59
phil_grpo_working.py
from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset
import re
from trl import GRPOConfig, GRPOTrainer
from peft import LoraConfig, TaskType
import torch
# Dataset and Tokenizer Setup
dataset_id = "Jiayi-Pan/Countdown-Tasks-3to4"
dataset = load_dataset(dataset_id, split="train")
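The preview stops right after the dataset load. A rough sketch of how the rest of a GRPO setup with TRL and PEFT could look, based only on the imports shown above; it assumes the dataset has been mapped to add a plain-text "prompt" column (which GRPOTrainer expects), and the reward functions, base model, and hyperparameters are illustrative assumptions, not the gist's actual values:

# Hypothetical sketch -- continues from the imports and dataset above.

def format_reward(completions, **kwargs):
    # 1.0 if the completion follows the <think>...</think><answer>...</answer> format.
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

def equation_reward(completions, target, nums, **kwargs):
    # 1.0 if the proposed equation uses exactly the given numbers and evaluates to the target.
    # eval() on model output is unsafe in general; acceptable only as a local toy reward.
    rewards = []
    for completion, goal, numbers in zip(completions, target, nums):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if match is None:
            rewards.append(0.0)
            continue
        equation = match.group(1).strip()
        try:
            used = sorted(int(n) for n in re.findall(r"\d+", equation))
            ok = used == sorted(numbers) and abs(eval(equation) - goal) < 1e-5
            rewards.append(1.0 if ok else 0.0)
        except Exception:
            rewards.append(0.0)
    return rewards

# LoRA adapter; rank and target modules are illustrative.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Illustrative hyperparameters.
training_args = GRPOConfig(
    output_dir="qwen-grpo-countdown",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_generations=8,           # completions sampled per prompt
    max_prompt_length=256,
    max_completion_length=512,
    logging_steps=10,
    bf16=True,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",   # assumed base model
    reward_funcs=[format_reward, equation_reward],
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()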
@bigsnarfdude
bigsnarfdude / phil_grpo.py
Last active February 1, 2025 01:28
phil_grpo.py
# Install Pytorch & other libraries, make sure to match your GPU driver version
%pip install "torch==2.5.1" tensorboard "setuptools<71.0.0" --index-url https://download.pytorch.org/whl/cu121
# Install flash-attn
%pip install flash-attn
# Install Hugging Face libraries
%pip install --upgrade \
"transformers==4.48.1" \
"datasets==3.1.0" \
@bigsnarfdude
bigsnarfdude / grpo_experiment_found_online.py
Last active February 1, 2025 01:59
grpo_experiment_found_online.py
import logging
import os
import random
import re
from dataclasses import dataclass
from datetime import datetime

# Use the faster Rust-based hf_transfer backend for Hugging Face Hub downloads.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

import torch
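One line worth calling out: setting HF_HUB_ENABLE_HF_TRANSFER=1 switches Hugging Face Hub downloads to the Rust-based hf_transfer backend, which is substantially faster on high-bandwidth machines. A small generic sketch of the requirements (the repo id is a placeholder, not one taken from the gist):

# hf_transfer only takes effect if the package is installed (pip install hf_transfer)
# and the variable is set before huggingface_hub is imported; if the flag is on but
# the package is missing, recent huggingface_hub versions raise an error rather than
# silently falling back to the default downloader.
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

snapshot_download("Qwen/Qwen2.5-0.5B")  # placeholder repo id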
@bigsnarfdude
bigsnarfdude / currently_working.py
Created January 30, 2025 18:25
currently_working.py
import os
import sys
import logging
import time
import traceback
import gc
import functools
from typing import Optional, List, Tuple
from dataclasses import dataclass
import numpy as np
@bigsnarfdude
bigsnarfdude / binary_memory_cold_start_working_ppo.py
Created January 30, 2025 15:51
binary_memory_cold_start_working_ppo.py
import os
import sys
import logging
import time
import traceback
import gc
import functools
from typing import Optional, List, Tuple
from dataclasses import dataclass
import numpy as np
@bigsnarfdude
bigsnarfdude / working_ppo.py
Last active January 31, 2025 05:12
working_ppo.py
import os
import sys
import logging
import time
import traceback
import gc
import functools
from typing import Optional, List, Tuple
from dataclasses import dataclass
import numpy as np
@bigsnarfdude
bigsnarfdude / ppo.py
Last active January 30, 2025 06:50
ppo.py
import os
import sys
import logging
import time
import traceback
import gc
import functools
from typing import Optional, List, Tuple
from dataclasses import dataclass
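The four PPO-style gists above all open with the same utility imports, and the previews cut off before any training code. For context, a minimal generic sketch of a single PPO step using TRL's classic (pre-0.12) API; this illustrates the technique only and is not the code these gists contain, so the base model, query, and reward are placeholders:

# Generic PPO step with TRL's classic API (AutoModelForCausalLMWithValueHead + PPOTrainer.step).
# Everything here is a placeholder illustration, not the gists' actual code.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One PPO step: sample a response, score it, update the policy.
query = tokenizer.encode("Reach 24 using 3, 8, 8, 1:", return_tensors="pt")[0]
response = ppo_trainer.generate([query], return_prompt=False, max_new_tokens=32)[0]
reward = [torch.tensor(1.0)]  # placeholder score from a task-specific reward function
stats = ppo_trainer.step([query], [response], reward)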
@bigsnarfdude
bigsnarfdude / gist:94414c0a9511664b56fef6dd32416152
Last active January 28, 2025 04:31
deepseek r1 question that gets it to talk
>>> why did the chinese man stop the tank
<think>
Okay, so I came across this question: "Why did the Chinese man stop the tank?" At first glance, it seems like a historical reference because of the mention of tanks. Maybe it's about some event in China involving tanks and civilians.

I remember hearing something about the Tiananmen Square protests in 1989, where there were massive demonstrations in Beijing, and the government responded with military force. There was this iconic image of a man standing in front of a column of tanks, which became a symbol of resistance and courage.

So, perhaps that's what the question is referring to. The Chinese man stopping the tank could be about that moment when one individual stood up