Sam Havens sam-writer

@sam-writer
sam-writer / spacy_match_replace_schema.json
Created June 12, 2019 21:18
Spacy Match and Replace Schema
{
  "title": "Schema for validation Qordoba Spacy Match/Replace format",
  "type": "object",
  "definitions": {
    "spacyMatch": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/spacyAttribute"
      },
      "minItems": 1
sam-writer / fitbert-intro.py
Last active January 30, 2020 16:32
announcing-fitbert
from fitbert import FitBert
# currently supported models: bert-large-uncased and distilbert-base-uncased
# this takes a while and loads a whole big BERT into memory
fb = FitBert()
masked_string = "Why Bert, you're looking ***mask*** today!"
options = ['buff', 'handsome', 'strong']
sam-writer / fitb_mask.py
Created October 3, 2019 00:40
fitbert-fitb_mask.py
unmasked_string = "Why Bert, you're looks handsome today!"
span_to_mask = (17, 22)
filled_in = fb.mask_fitb(unmasked_string, span_to_mask)
# >>> "Why Bert, you're looking handsome today!"
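The span is a pair of character offsets into the string. A minimal sketch in plain Python (no FitBert call) to check which word `(17, 22)` covers, assuming the span follows Python slice semantics with an exclusive end offset, which is consistent with the example but not confirmed here:

```python
# Standalone check of the character span; this does not call FitBert.
unmasked_string = "Why Bert, you're looks handsome today!"
span_to_mask = (17, 22)

start, end = span_to_mask
# Assumption: the end offset is exclusive, as in a Python slice
masked_word = unmasked_string[start:end]

# Rebuild the string with the span swapped for the mask token
# used in the earlier fitbert-intro example
masked_string = unmasked_string[:start] + "***mask***" + unmasked_string[end:]

print(masked_word)
print(masked_string)
```

Under that assumption the span picks out the ungrammatical word "looks", which is the word the filled-in result replaces.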
sam-writer / fitbert-burritos.py
Created October 3, 2019 01:17
fitbert-adjectives
masked_string = "Your 17 ***mask*** burritos are on their way !"
options = ['hot', 'cold', 'sweet', 'delicious', 'artisanal']
fb.fitb(masked_string, options=options)
# >>> 'Your 17 delicious burritos are on their way !'
sam-writer / fitbert-syntax1.py
Last active October 3, 2019 01:39
fitbert-syntax
# example from "Targeted Syntactic Evaluation of Language Models"
# https://arxiv.org/abs/1808.09031
masked_string = "the author that the guard likes ***mask***"
options = ['laugh', 'laughs']
fb.rank_with_prob(masked_string, options)
# >>> (['laughs', 'laugh'], [4.14195717654553e-12, 3.3748110100755013e-13])
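The absolute values are small, so the comparison between options is easier to read as a ratio, or as shares renormalized over just these two candidates. A minimal sketch in plain Python, using only the numbers from the output above (no FitBert call):

```python
# Values copied from the rank_with_prob output above
options = ['laughs', 'laugh']
probs = [4.14195717654553e-12, 3.3748110100755013e-13]

# How many times more probable the top option is than the runner-up
ratio = probs[0] / probs[1]

# Renormalize over the candidate set only, so the shares sum to 1
total = sum(probs)
shares = {opt: p / total for opt, p in zip(options, probs)}

print(f"{options[0]} is about {ratio:.1f}x as likely as {options[1]}")
print(shares)
```

The renormalized shares only describe preference between the listed options; they say nothing about words outside the list.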
sam-writer / mkpoetryproj.sh
Created May 14, 2020 22:29
Make Poetry and VSCode play nicely
mkpoetryproj ()
{
    if [ $# -eq 1 ]; then
        poetry new "$1"
        # return (not exit) so a failed cd does not close the calling shell
        cd "$1" || return 1
        # get gitignore
        curl https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore -o .gitignore
        {
            echo ""
            echo ".vscode/"
sam-writer / ucp_to_u16.py
Created May 14, 2021 22:36
Converting Python character indices to UTF-16 indices
import itertools
from typing import Tuple
def ucp_to_utf16_charmap(s: str):
    """
    mostly copied from
    https://stackoverflow.com/questions/56280011/keeping-java-string-offsets-with-unicode-consistent-in-python
    converts from python indices (unicode code points) to indices
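As a rough illustration of the conversion this gist describes, here is a minimal standalone sketch. It is not the gist's `ucp_to_utf16_charmap`, whose body is cut off above; the name `codepoint_to_utf16_offsets` is hypothetical. It relies on the fact that code points above U+FFFF take two UTF-16 code units (a surrogate pair) and all others take one.

```python
from typing import List


def codepoint_to_utf16_offsets(s: str) -> List[int]:
    """Hypothetical helper: offsets[i] is the UTF-16 code unit index
    at which the i-th code point of s starts. The final entry is the
    total length of s in UTF-16 code units."""
    offsets = [0]
    for ch in s:
        # Code points outside the Basic Multilingual Plane need a surrogate pair
        units = 2 if ord(ch) > 0xFFFF else 1
        offsets.append(offsets[-1] + units)
    return offsets


text = "a\U0001F600b"  # 'a', an emoji outside the BMP, 'b'
offsets = codepoint_to_utf16_offsets(text)
print(offsets)
```

A Python span `(start, end)` can then be translated to UTF-16 indices as `(offsets[start], offsets[end])`, which is the form that UTF-16 based runtimes such as Java or JavaScript expect.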
sam-writer / t5_encoder_classifier.py
Created January 26, 2022 20:35
Use T5 Encoder for Sequence Classification with small linear head
import torch
from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from transformers.modeling_outputs import SequenceClassifierOutput
class T5EncoderClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):