Douglas Hanley iamlemec

iamlemec / nb2md

Created September 27, 2015 17:52

Markdown diffs for Jupyter notebooks. Requires the nbconvert package.

	#!/usr/bin/env bash

	# Step 1: put this file in your $PATH and make it executable
	# Step 2: add the following to your .gitattributes file
	#   *.ipynb diff=nb2md
	# Step 3: add the following to your .git/config
	#   [diff "nb2md"]
	#   textconv = nb2md
	# or set it globally with
	#   git config --global diff.nb2md.textconv nb2md
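The preview stops before the script body, which presumably hands the notebook to nbconvert. A git textconv filter only has to take the file path git passes as its first argument and print plain text to stdout. As a stdlib-only illustration of that contract (a swapped-in approach, not the gist's nbconvert call), the sketch below renders cells straight from the notebook JSON. `notebook_to_markdown` and `main` are hypothetical names, code cells are assumed to be Python, and cell outputs are deliberately skipped so diffs show only source changes.

```python
import json
import sys

FENCE = "`" * 3  # Markdown code fence (three backticks)


def notebook_to_markdown(nb_text):
    """Render notebook JSON as Markdown: markdown cells verbatim, code cells fenced."""
    nb = json.loads(nb_text)
    parts = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        # nbformat v4 stores source either as one string or as a list of lines
        if isinstance(src, list):
            src = "".join(src)
        if cell.get("cell_type") == "code":
            # assumes a Python kernel for the fence's language tag
            parts.append(f"{FENCE}python\n{src}\n{FENCE}")
        else:
            # markdown and raw cells pass through unchanged; outputs are ignored
            parts.append(src)
    return "\n\n".join(parts) + "\n"


def main(path):
    """Textconv entry point: git passes the notebook path and reads stdout."""
    with open(path, encoding="utf-8") as fh:
        sys.stdout.write(notebook_to_markdown(fh.read()))
```

A two-line wrapper script that calls `main(sys.argv[1])` could then be registered as the textconv in place of nb2md.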

iamlemec / vector.css

Created August 3, 2016 20:09

Custom CSS for Wikipedia's Vector skin that gives pages a clean, modern look.

	@import url(//fonts.googleapis.com/css?family=Open+Sans:400,700,400italic,700italic);

	body {
		background-color: white;
		font-family: 'Open Sans', sans-serif;
	}

	#content {
		width: 700px;
		margin-top: 50px;

iamlemec / cwdiff

Last active October 18, 2017 20:47

Word-level diff between two PDFs. Crude, but useful for comparing revisions. Requires wdiff and pdftotext. See diff.sh for usage.

	#!/bin/sh

	# Use this instead of diff[1] to get colored[2] word-based diffs.
	# Useful for text documents that have reflowed paragraphs.
	# Requires that wdiff is installed in your $PATH.
	#
	# [1] All diff options are ignored. Only replaces simplest usage.
	# [2] Colors are always emitted. If piping into less, use "-R" or set LESS=-R

	# Iain Murray, February 2009, Tweaked in June 2011
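cwdiff itself runs the PDFs through pdftotext and hands the text to wdiff. To show why a word-based diff suits reflowed paragraphs, here is a rough stdlib sketch of the same idea using `difflib` instead of wdiff. It splits on any whitespace, so line breaks that merely moved do not count as changes, and it prints wdiff's default `[-deleted-]` and `{+inserted+}` markers, optionally wrapped in ANSI colors. `word_diff` is a hypothetical helper, not part of the gist.

```python
import difflib


def word_diff(old, new, color=False):
    """Word-level diff of two texts using wdiff-style markers."""
    # whitespace-insensitive tokenization: reflowed lines compare equal
    a, b = old.split(), new.split()
    # red for deletions, green for insertions when color=True
    dl, dr = ("\033[31m[-", "-]\033[0m") if color else ("[-", "-]")
    il, ir = ("\033[32m{+", "+}\033[0m") if color else ("{+", "+}")
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if op in ("delete", "replace"):
            out.append(dl + " ".join(a[i1:i2]) + dr)
        if op in ("insert", "replace"):
            out.append(il + " ".join(b[j1:j2]) + ir)
    return " ".join(out)
```

For example, `word_diff("the quick brown fox", "the slow brown\nfox")` marks only the changed word, even though the second text wraps differently.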

iamlemec / check_tokenizer.py

Created February 22, 2024 18:38

Compare tokenization results between `llama-cpp-python` and Hugging Face `tokenizers`.

	def check_tokenizer(mod_ll, mod_hf, data, max_rows=None):
		from llama_cpp import Llama
		from transformers import AutoTokenizer
		from Levenshtein import editops
		from termcolor import cprint

		# load models
		if type(mod_ll) is str:
			mod_ll = Llama(mod_ll, verbose=False)
		if type(mod_hf) is str:
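`check_tokenizer` imports `editops` from the Levenshtein package, presumably to locate where the two token sequences diverge. A dependency-free sketch of the same comparison with the standard library's `difflib` (a swapped-in technique, not the gist's code) is below. `token_mismatches` and `mismatch_rate` are hypothetical helpers; the tokenizers are passed in as plain callables so the logic can be checked without loading any model.

```python
import difflib


def token_mismatches(toks_a, toks_b):
    """List (op, a_tokens, b_tokens) for every region where two token lists differ."""
    sm = difflib.SequenceMatcher(a=toks_a, b=toks_b, autojunk=False)
    return [
        (op, list(toks_a[i1:i2]), list(toks_b[j1:j2]))
        for op, i1, i2, j1, j2 in sm.get_opcodes()
        if op != "equal"  # skip the spans both tokenizers agree on
    ]


def mismatch_rate(texts, tokenize_a, tokenize_b):
    """Fraction of texts on which two tokenizer callables disagree."""
    if not texts:
        return 0.0
    bad = sum(1 for t in texts if list(tokenize_a(t)) != list(tokenize_b(t)))
    return bad / len(texts)
```

With real models the callables might be `lambda s: llm.tokenize(s.encode("utf-8"))` for llama-cpp-python and `lambda s: tok.encode(s)` for transformers. Whether BOS and other special tokens are added by default can differ between the two, so it is worth aligning that before reading a mismatch as a tokenizer bug.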

iamlemec / attention.py

Created June 8, 2024 21:40

Using KV cache with mixed causal/non-causal attention.

	import torch
	from transformers.models.roberta import RobertaConfig, RobertaModel, RobertaTokenizer

	# load model and tokenizer
	tokenizer = RobertaTokenizer.from_pretrained('FacebookAI/roberta-base')
	model = RobertaModel.from_pretrained('FacebookAI/roberta-base', is_decoder=True).to('cuda')

	# tokenize inputs
	text = 'hello world, this is a test'
	inputs = tokenizer(text, return_tensors='pt').to('cuda')
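The preview stops before the attention logic. One common meaning of mixed causal/non-causal attention is a prefix-LM pattern: a prefix attends bidirectionally within itself, while later tokens attend causally to the prefix and to earlier tokens. Assuming that pattern (the gist's exact setup is not shown), the mask can be built as below; `prefix_lm_mask` and `to_additive` are hypothetical names.

```python
def prefix_lm_mask(n_prefix, n_total):
    """Boolean mask where mask[i][j] is True if query i may attend to key j.

    Prefix keys are visible to every query (the non-causal block); keys after
    the prefix are visible only to queries at the same or a later position.
    """
    return [
        [j < n_prefix or j <= i for j in range(n_total)]
        for i in range(n_total)
    ]


def to_additive(mask):
    """Convert a boolean mask into the additive form added to attention scores."""
    return [[0.0 if ok else float("-inf") for ok in row] for row in mask]
```

This shape is what makes a KV cache usable: prefix rows never see later keys, so the prefix's keys and values stay fixed once computed. The prefix can be run once with its bidirectional block and cached, and each later token fed alone attends to the cache plus itself, reproducing the causal rows of the mask.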

iamlemec / demean_jax_.py

Created January 13, 2025 19:18

Trying out some different JAX options for demeaning.

	from functools import partial

	import jax
	import jax.numpy as jnp
	import numpy as np
	from jax import config

	def _apply_factor(x, f, w, ng):
		"""Process a single factor."""
		wx = x * w[:, None]
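Demeaning here presumably means removing weighted group means for one or more categorical factors, as in fixed-effects regression; that reading is an assumption based on the `_apply_factor(x, f, w, ng)` signature (data, factor codes, weights, number of groups). The sketch below uses plain NumPy rather than the gist's JAX variants, handles only 1-D `x`, and sweeps over factors repeatedly (alternating projections). With one factor a single sweep suffices; with several, the sweeps converge to the residual from a weighted least-squares regression on all the factors' dummy variables. `group_means` and `demean` are hypothetical names.

```python
import numpy as np


def group_means(x, f, w, ng):
    """Weighted mean of x within each of ng groups given integer codes f."""
    num = np.bincount(f, weights=w * x, minlength=ng)
    den = np.bincount(f, weights=w, minlength=ng)
    # empty groups get a harmless denominator of 1 (no row ever reads them)
    return num / np.where(den > 0, den, 1.0)


def demean(x, factors, w=None, tol=1e-10, max_iter=10_000):
    """Sweep out weighted group means for each factor until x stops changing.

    x is a 1-D array; factors is a list of integer-code arrays of the same length.
    """
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    factors = [np.asarray(f, dtype=int) for f in factors]
    ngs = [int(f.max()) + 1 for f in factors]
    out = x.copy()
    for _ in range(max_iter):
        prev = out
        for f, ng in zip(factors, ngs):
            # subtract each row's group mean for this factor
            out = out - group_means(out, f, w, ng)[f]
        if np.max(np.abs(out - prev)) < tol:
            break
    return out
```

`np.bincount` with `weights` does the per-group sums in one pass, which is the NumPy counterpart of a segment sum.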