Skip to content

Instantly share code, notes, and snippets.

View brishtiteveja's full-sized avatar

Andy Khan (Ananda) brishtiteveja

View GitHub Profile
@willccbb
willccbb / grpo_demo.py
Last active April 17, 2025 09:34
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
#!/usr/bin/python3.5
# -*-coding:Utf-8 -*
import random
import operator
import time
import matplotlib.pyplot as plt
temps1 = time.time()
@smartnose
smartnose / spark-internals-through-code.md
Last active October 29, 2024 06:03
Spark internal notes

Spark internals through code

Nothing gives you more detail about spark internals than actually reading it source code. In addition, you get to learn many design techniques and improve your scala coding skills. These are the random notes I make while reading the spark code. The best way to comprehend the notes is to load spark code into an IDE, e.g. IntelliJ, and navigate the code on the side.

Genesis - creation of a spark cluster

The scripts for creating a spark cluster are: start-master.sh and start-slave.sh. Read them carefully, and you can see that both scripts are very similar except the values for $CLASS variable. For start-master.sh, the value is CLASS="org.apache.spark.deploy.master.Master", while the value for start-slave.sh is shown below with more context.

# NOTE: This exact class name is matched downstream by SparkSubmit.