Overview:
- Problem: large models are expensive to retrain, so it's hard to correct individual facts they have learned.
- In a nutshell: MEND trains a small editor network that maps the fine-tuning gradient for a single correction into a weight delta; because the gradient of a linear layer is a rank-1 outer product, the edit stays low-rank, similar in spirit to LoRA (see the sketch after this list).
- "local, reliable, and general"
- "Local" means unrelated output is not changed. "Reliable" means the model takes the desired corrections. "General" meaning variations on similar questions which would need correction also are corrected.
- Scales even to very large models (10B+ parameters).
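
To make the mechanics concrete, here is a minimal PyTorch sketch of a rank-1 edit to a single linear layer. The module name EditNet, its architecture, and the learned step size are illustrative assumptions, not the paper's implementation (the real MEND editor shares parameters across layers and is meta-trained against locality/reliability/generality objectives).

```python
import torch
import torch.nn as nn

class EditNet(nn.Module):
    """Toy editor (assumed architecture): maps the rank-1 gradient factors
    (u, delta) of one linear layer to pseudo-factors used to build the edit."""
    def __init__(self, d_in, d_out, hidden=64):
        super().__init__()
        self.f_u = nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, d_in))
        self.f_d = nn.Sequential(nn.Linear(d_out, hidden), nn.ReLU(), nn.Linear(hidden, d_out))
        self.lr_scale = nn.Parameter(torch.tensor(1e-2))  # learned step size (assumed)

    def forward(self, u, delta):
        return self.f_u(u), self.f_d(delta)

d_in, d_out = 16, 8
layer = nn.Linear(d_in, d_out, bias=False)
editor = EditNet(d_in, d_out)

# One edit example: input x with corrected target y.
x = torch.randn(1, d_in)
y = torch.randn(1, d_out)

# The gradient of a linear layer factors as grad(W) = delta^T @ u,
# where u is the layer input and delta = dL/d(pre-activation).
u = x
pre = layer(x)
loss = nn.functional.mse_loss(pre, y)
delta = torch.autograd.grad(loss, pre)[0]

# The editor maps (u, delta) -> (u~, delta~); the edit is a rank-1 outer
# product, so the editor never has to see or emit a full d_out x d_in matrix.
u_tilde, delta_tilde = editor(u, delta)
delta_W = -editor.lr_scale * (delta_tilde.t() @ u_tilde)  # shape (d_out, d_in)

with torch.no_grad():
    layer.weight += delta_W
```

For sequence inputs the gradient is a sum of per-token rank-1 terms, so the editor processes per-token (u, delta) pairs rather than one pair per example.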
Differences from Prior Art:
- ENN bakes editability into the parameters of the model itself, while MEND provides editability through an independent editor network. (ENN is closer to fine-tuning? MEND closer to LoRA?)
- KE takes the raw edit example as input and produces a single rank-1 mask and rank-1 offset applied to the fine-tuning gradient; MEND instead maps the model's own gradients into edits (shape-level sketch below).
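
A shape-level sketch of the contrast, with random tensors standing in for what each method's editor network would actually produce (the factors a, b, c, d and the scaling constants are placeholders, not either paper's parametrization):

```python
import torch

d_out, d_in = 8, 16
W = torch.randn(d_out, d_in)       # weight being edited
grad_W = torch.randn(d_out, d_in)  # fine-tuning gradient for the edit example

# KE-style: an editor reads the text of the edit example and emits rank-1
# mask/offset factors that reshape the raw fine-tuning gradient.
a, b = torch.randn(d_out, 1), torch.randn(1, d_in)  # rank-1 mask factors
c, d = torch.randn(d_out, 1), torch.randn(1, d_in)  # rank-1 offset factors
delta_ke = (a @ b) * grad_W + (c @ d)

# MEND-style: the editor never touches the full gradient matrix; it transforms
# the gradient's rank-1 factors (layer input u, pre-activation gradient delta)
# and rebuilds a rank-1 edit from the transformed factors.
u = torch.randn(1, d_in)
delta = torch.randn(1, d_out)
u_tilde, delta_tilde = u * 0.9, delta * 1.1  # stand-ins for the editor's outputs
delta_mend = -1e-2 * (delta_tilde.t() @ u_tilde)

W_ke = W + delta_ke
W_mend = W + delta_mend
```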