Cameron R. Wolfe (wolfecameron)

@wolfecameron
wolfecameron / cross_attention.py
Last active April 2, 2025 03:54
An implementation of cross-attention in PyTorch.
import math
import torch
from torch import nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    def __init__(self, d):
        """
        Arguments:
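The preview above is cut off by the gist viewer, so a minimal sketch of single-head cross-attention follows. The single head, the shared dimension d for both sequences, and the absence of masking and dropout are assumptions made for brevity, not details taken from the gist.

import math

import torch
from torch import nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    def __init__(self, d):
        """
        Arguments:
        d: size of the embedding dimension (assumed shared by both sequences)
        """
        super().__init__()
        self.d = d

        # queries come from one sequence, keys/values from the other
        self.w_q = nn.Linear(d, d, bias=False)
        self.w_k = nn.Linear(d, d, bias=False)
        self.w_v = nn.Linear(d, d, bias=False)

    def forward(self, x_q, x_kv):
        # x_q: [B, T_q, d] (e.g., decoder states); x_kv: [B, T_kv, d] (e.g., encoder states)
        q = self.w_q(x_q)
        k = self.w_k(x_kv)
        v = self.w_v(x_kv)

        # scaled dot-product attention across the two sequences
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d)  # [B, T_q, T_kv]
        attn = F.softmax(scores, dim=-1)
        return attn @ v                                        # [B, T_q, d]

For example, CrossAttention(64)(torch.randn(2, 5, 64), torch.randn(2, 7, 64)) returns a [2, 5, 64] tensor: one attended vector per query token.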
@wolfecameron
wolfecameron / bidir_self_attn.py
Last active March 29, 2025 17:36
An implementation of bidirectional self-attention in PyTorch.
import math
import torch
from torch import nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d):
        """
        Arguments:
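This preview is also truncated, so here is a minimal single-head sketch of bidirectional self-attention. Because it is bidirectional, no causal mask is applied; the fused query/key/value projection and the lack of dropout are assumptions made for brevity.

import math

import torch
from torch import nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d):
        """
        Arguments:
        d: size of the embedding dimension (single attention head assumed)
        """
        super().__init__()
        self.d = d

        # one projection produces queries, keys, and values together
        self.w_qkv = nn.Linear(d, 3 * d, bias=False)

    def forward(self, x):
        # x: [B, T, d]; every token attends to every other token (no causal mask)
        q, k, v = self.w_qkv(x).split(self.d, dim=-1)

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d)  # [B, T, T]
        attn = F.softmax(scores, dim=-1)
        return attn @ v                                        # [B, T, d]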
@wolfecameron
wolfecameron / moe_block.py
Created March 6, 2025 22:24
MoE block for an MoE-based decoder-only transformer model in PyTorch.
from torch import nn

class MoEBlock(nn.Module):
    def __init__(
        self,
        d,
        H,
        C,
        n_exp,
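The preview stops partway through the constructor signature, so the sketch below shows one simplified way an MoE feed-forward block can be built: a linear softmax router picks the top-k experts per token, and each expert is a small two-layer MLP. Treating H as the expert hidden width, dropping the capacity argument C, and leaving out the attention sub-layer of the full decoder block are assumptions made for brevity, not details taken from the gist.

import torch
from torch import nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    def __init__(self, d, H, n_exp=8, top_k=2):
        """
        Arguments (hypothetical meanings):
        d: embedding dimension
        H: hidden dimension of each expert MLP
        n_exp: number of experts
        top_k: number of experts used per token
        """
        super().__init__()
        self.top_k = top_k

        # linear router that scores every expert for every token
        self.router = nn.Linear(d, n_exp, bias=False)

        # each expert is an independent two-layer MLP
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, H), nn.GELU(), nn.Linear(H, d))
            for _ in range(n_exp)
        )

    def forward(self, x):
        # x: [B, T, d] -> flatten tokens for routing
        B, T, d = x.shape
        tokens = x.reshape(-1, d)                          # [B*T, d]

        probs = F.softmax(self.router(tokens), dim=-1)     # [B*T, n_exp]
        top_p, top_i = probs.topk(self.top_k, dim=-1)      # both [B*T, top_k]

        out = torch.zeros_like(tokens)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = top_i[:, k] == e                     # tokens routed to expert e
                if sel.any():
                    out[sel] += top_p[sel, k].unsqueeze(-1) * expert(tokens[sel])
        return out.reshape(B, T, d)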
@wolfecameron
wolfecameron / expert_layer.py
Created March 6, 2025 21:17
PyTorch implementation of a feed-forward expert layer within an MoE.
"""
Based upon ColossalAI OpenMoE
"""
from torch import nn
class MOELayer(nn.Module):
def __init__(
self,
d,
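The preview again ends inside the constructor, so below is a compact sketch of an MoE layer in the Switch-Transformer style: every token is sent to its single highest-probability expert, and the expert output is scaled by that router probability. The d_ff argument and the top-1 routing choice are assumptions for this sketch, not the gist's actual design.

import torch
from torch import nn
import torch.nn.functional as F

class MOELayer(nn.Module):
    def __init__(self, d, d_ff, n_exp=8):
        super().__init__()
        # linear router plus one independent MLP per expert
        self.router = nn.Linear(d, n_exp, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d_ff), nn.GELU(), nn.Linear(d_ff, d))
            for _ in range(n_exp)
        )

    def forward(self, x):
        # x: [B, T, d]
        B, T, d = x.shape
        tokens = x.reshape(-1, d)                          # [B*T, d]

        probs = F.softmax(self.router(tokens), dim=-1)     # [B*T, n_exp]
        gate, expert_idx = probs.max(dim=-1)               # top-1 expert per token

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            sel = expert_idx == e
            if sel.any():
                # scale each expert output by its router probability (the "gate")
                out[sel] = gate[sel].unsqueeze(-1) * expert(tokens[sel])
        return out.reshape(B, T, d)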
@wolfecameron
wolfecameron / router_z_loss.py
Created March 6, 2025 18:52
An implementation of the MoE router z-loss in PyTorch.
"""
Computes ST-MoE router z loss (https://arxiv.org/abs/2202.08906)
See equation (5) on page 7
"""
import torch
# constants
B = 16 # batch size
C = 256 # sequence length
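Completing the idea in the truncated preview, the sketch below computes the ST-MoE router z-loss: for each token, take the log-sum-exp of its router logits, square it, and average over all tokens. The random logits and the value of n_exp are placeholders for illustration.

import torch

# constants (placeholder values)
B = 16       # batch size
C = 256      # sequence length
n_exp = 8    # number of experts

# router logits for every token in the batch: [B, C, n_exp]
logits = torch.randn(B, C, n_exp)

# z-loss penalizes large router logits: mean over tokens of (log-sum-exp)^2
z = torch.logsumexp(logits, dim=-1)   # [B, C]
router_z_loss = (z ** 2).mean()
print(router_z_loss)

In the ST-MoE paper this term is added to the training loss with a small coefficient (1e-3).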
@wolfecameron
wolfecameron / load_balancing_loss.py
Created March 6, 2025 18:31
An implementation of the MoE load balancing loss in PyTorch.
"""
Computes Switch Transformer auxiliary loss (https://arxiv.org/abs/2101.03961)
See equations (4)-(6) on page 7
"""
import torch
import torch.nn.functional as F
# constants
B = 16 # batch size
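The sketch below completes the Switch Transformer auxiliary loss described above: f_i is the fraction of tokens whose top-1 expert is expert i, P_i is the mean router probability assigned to expert i, and the loss is n_exp times their dot product. The random logits and constants are placeholders for illustration.

import torch
import torch.nn.functional as F

# constants (placeholder values)
B = 16       # batch size
C = 256      # sequence length
n_exp = 8    # number of experts

# router logits and probabilities for every token: [B, C, n_exp]
logits = torch.randn(B, C, n_exp)
probs = F.softmax(logits, dim=-1)

# f_i: fraction of tokens dispatched (top-1) to expert i
top1 = probs.argmax(dim=-1)                            # [B, C]
f = F.one_hot(top1, n_exp).float().mean(dim=(0, 1))    # [n_exp]

# P_i: mean router probability assigned to expert i
P = probs.mean(dim=(0, 1))                             # [n_exp]

# auxiliary loss: n_exp * sum_i f_i * P_i (scaled by a coefficient alpha in practice)
load_balancing_loss = n_exp * torch.sum(f * P)
print(load_balancing_loss)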
@wolfecameron
wolfecameron / full_softmax_router.py
Last active March 6, 2025 21:15
Implementation of a fully-functional softmax routing mechanism with expert capacity.
import math
import torch
from torch import nn
from torch.nn import functional as F

class Router(nn.Module):
    def __init__(
        self,
        d,
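The preview ends inside the constructor, so here is a readable (loop-based, deliberately unoptimized) sketch of a softmax router with top-k selection and a per-expert capacity limit. The capacity formula (capacity_factor * top_k * n_tokens / n_exp) and the drop-on-overflow behavior follow the Switch/ST-MoE convention and are assumptions about this gist rather than its exact logic.

import torch
from torch import nn
from torch.nn import functional as F

class Router(nn.Module):
    def __init__(self, d, n_exp=8, top_k=2, capacity_factor=1.25):
        super().__init__()
        self.n_exp = n_exp
        self.top_k = top_k
        self.capacity_factor = capacity_factor
        self.w_g = nn.Linear(d, n_exp, bias=False)   # linear gating layer

    def forward(self, x):
        # x: [B, T, d] -> route each of the B*T tokens
        B, T, d = x.shape
        tokens = x.reshape(-1, d)
        n_tokens = tokens.shape[0]

        # per-expert capacity: how many tokens each expert may accept
        capacity = int(self.capacity_factor * self.top_k * n_tokens / self.n_exp)

        probs = F.softmax(self.w_g(tokens), dim=-1)    # [N, n_exp]
        top_p, top_i = probs.topk(self.top_k, dim=-1)  # both [N, top_k]

        # build a dispatch mask, dropping tokens that overflow an expert's capacity
        dispatch = torch.zeros(n_tokens, self.n_exp, dtype=torch.bool, device=x.device)
        used = [0] * self.n_exp
        for t in range(n_tokens):
            for k in range(self.top_k):
                e = int(top_i[t, k])
                if used[e] < capacity:
                    dispatch[t, e] = True
                    used[e] += 1

        # gate values only for (token, expert) pairs that were actually dispatched
        gates = probs * dispatch.float()               # [N, n_exp]
        return gates, dispatch

Real implementations replace the Python loop with cumulative-sum tricks over one-hot masks, but the dropped-token behavior is the same.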
@wolfecameron
wolfecameron / basic_softmax_router.py
Last active March 6, 2025 18:47
Implementation of a basic softmax routing mechanism for an MoE.
import torch
from torch import nn
from torch.nn import functional as F

class BasicSoftmaxRouter(nn.Module):
    def __init__(
        self,
        d,
        n_exp=8,
        top_k=2,
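A minimal sketch of the basic router described above follows: a single linear gating layer, a softmax over the experts, and a top-k selection per token. Returning both the top-k probabilities and the expert indices is an assumption about the interface.

import torch
from torch import nn
from torch.nn import functional as F

class BasicSoftmaxRouter(nn.Module):
    def __init__(self, d, n_exp=8, top_k=2):
        """
        Arguments:
        d: embedding dimension
        n_exp: number of experts to route between
        top_k: number of experts selected per token
        """
        super().__init__()
        self.top_k = top_k
        self.w_g = nn.Linear(d, n_exp, bias=False)   # linear gating layer

    def forward(self, x):
        # x: [B, T, d] -> expert probabilities for every token
        probs = F.softmax(self.w_g(x), dim=-1)           # [B, T, n_exp]

        # keep only the k most probable experts per token
        top_p, top_i = probs.topk(self.top_k, dim=-1)    # both [B, T, top_k]
        return top_p, top_i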
@wolfecameron
wolfecameron / expert_layer.py
Last active March 6, 2025 22:01
Expert layer for an MoE-based transformer.
"""
Based upon ColossalAI OpenMoE
"""
import torch
from torch import nn
class MLPExperts(nn.Module):
def __init__(
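The preview cuts off at the constructor, so the sketch below shows the usual batched-expert trick from the OpenMoE-style code the gist cites: every expert's MLP weights are stacked into 3-D tensors so all experts run in a single torch.bmm call. The d_ff and bias arguments and the GELU activation are assumptions.

import torch
from torch import nn
import torch.nn.functional as F

class MLPExperts(nn.Module):
    def __init__(self, d, n_exp=8, d_ff=None, bias=False):
        super().__init__()
        d_ff = d_ff or 4 * d
        self.bias = bias

        # one weight matrix per expert, stacked along the first dimension
        self.w1 = nn.Parameter(torch.randn(n_exp, d, d_ff) * 0.02)
        self.w2 = nn.Parameter(torch.randn(n_exp, d_ff, d) * 0.02)
        if bias:
            self.b1 = nn.Parameter(torch.zeros(n_exp, 1, d_ff))
            self.b2 = nn.Parameter(torch.zeros(n_exp, 1, d))

    def forward(self, x):
        # x: [n_exp, tokens_per_expert, d] -- tokens already grouped by expert
        h = torch.bmm(x, self.w1)            # [n_exp, tokens, d_ff]
        if self.bias:
            h = h + self.b1
        h = F.gelu(h)
        out = torch.bmm(h, self.w2)          # [n_exp, tokens, d]
        if self.bias:
            out = out + self.b2
        return out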
import torch
from transformers import AutoTokenizer
# load the llama-3.1 tokenizer
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.1-8B')
# raw text
text = "This raw text will be tokenized"
# create tokens using tokenizer
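The snippet above ends before the tokenization call itself; a hedged completion is below, using only standard Hugging Face tokenizer methods (encode, convert_ids_to_tokens, decode). Access to the gated meta-llama checkpoint on the Hugging Face Hub is assumed.

from transformers import AutoTokenizer

# load the llama-3.1 tokenizer (gated model: requires accepting the license on the Hub)
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.1-8B')

# raw text
text = "This raw text will be tokenized"

# create tokens using the tokenizer
token_ids = tokenizer.encode(text)                     # list of integer token IDs
tokens = tokenizer.convert_ids_to_tokens(token_ids)    # the corresponding string tokens
print(token_ids)
print(tokens)

# map the token IDs back to text
print(tokenizer.decode(token_ids))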