#!/usr/bin/env bash
# Step 1: put this file in your PATH and make it executable
# Step 2: add the following to your .gitattributes file
#   *.ipynb diff=nb2md
# Step 3: add the following to your .git/config
#   [diff "nb2md"]
#     textconv = nb2md
# or set it globally with
#   git config --global diff.nb2md.textconv nb2md
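
The setup steps above rely on git's textconv contract: git runs the configured program with the path of a file as its only argument and diffs whatever the program writes to standard output. The body of the original bash script is not shown here, so the following is only a minimal Python sketch of what such a converter could look like, assuming nbformat 4 notebook JSON. The names `notebook_to_markdown` and `main` are hypothetical and not part of the original script.

```python
#!/usr/bin/env python3
"""Hypothetical nb2md-style converter: notebook JSON in, Markdown text out."""
import json
import sys

FENCE = "`" * 3  # Markdown code fence, built this way to keep the source readable


def notebook_to_markdown(nb: dict) -> str:
    """Render the cells of an nbformat-4 notebook dict as Markdown text."""
    parts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores a cell's source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            # Outputs and execution counts are dropped so diffs show code only
            parts.append(FENCE + "python\n" + source + "\n" + FENCE)
        else:
            parts.append(source)
    return "\n\n".join(parts) + "\n"


def main(argv: list) -> None:
    """Entry point for textconv use: argv[1] is the file git asks us to convert."""
    with open(argv[1], encoding="utf-8") as fh:
        sys.stdout.write(notebook_to_markdown(json.load(fh)))
```

To use a script like this as the `textconv` program, call `main(sys.argv)` under an `if __name__ == "__main__":` guard, save it as `nb2md` somewhere on your PATH, and make it executable.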
@import url(//fonts.googleapis.com/css?family=Open+Sans:400,700,400italic,700italic);
body {
  background-color: white;
  font-family: 'Open Sans', sans-serif;
}
#content {
  width: 700px;
  margin-top: 50px;
#!/bin/sh
# Use this instead of diff[1] to get colored[2] word-based diffs.
# Useful for text documents that have reflowed paragraphs.
# Requires that wdiff is installed in your $PATH.
#
# [1] All diff options are ignored. Only replaces simplest usage.
# [2] Colors are always emitted. If piping into less, use "-R" or set LESS=-R
# Iain Murray, February 2009, Tweaked in June 2011
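
The header above describes a colored, word-based diff that ignores paragraph reflow. The script itself delegates to wdiff and its body is not shown here; as an illustration of the idea only, here is a small Python sketch that swaps in the standard library's `difflib` instead of wdiff. The function name `word_diff` is hypothetical, and the `[-...-]` and `{+...+}` markers imitate wdiff's default output style.

```python
import difflib

# ANSI escape codes; like the script above, colors are always emitted
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def word_diff(old: str, new: str) -> str:
    """Return a single-line, colored word-level diff of two texts."""
    # Splitting on any whitespace makes reflowed paragraphs compare equal
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            out.append(RED + "[-" + " ".join(a[i1:i2]) + "-]" + RESET)
        if tag in ("insert", "replace"):
            out.append(GREEN + "{+" + " ".join(b[j1:j2]) + "+}" + RESET)
    return " ".join(out)
```

For example, `print(word_diff(open("old.txt").read(), open("new.txt").read()))` prints deleted words in red and inserted words in green. As with the script above, pipe through `less -R` to keep the colors when paging.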
def check_tokenizer(mod_ll, mod_hf, data, max_rows=None):
    from llama_cpp import Llama
    from transformers import AutoTokenizer
    from Levenshtein import editops
    from termcolor import cprint
    # load models
    if type(mod_ll) is str:
        mod_ll = Llama(mod_ll, verbose=False)
    if type(mod_hf) is str:
import torch
from transformers.models.roberta import RobertaConfig, RobertaModel, RobertaTokenizer
# load model and tokenizer
tokenizer = RobertaTokenizer.from_pretrained('FacebookAI/roberta-base')
model = RobertaModel.from_pretrained('FacebookAI/roberta-base', is_decoder=True).to('cuda')
# tokenize inputs
text = 'hello world, this is a test'
inputs = tokenizer(text, return_tensors='pt').to('cuda')