# Asynchronously processes text in a document stored in an S3 bucket. For setup information, see https://docs.aws.amazon.com/textract/latest/dg/async.html
import boto3
import json
import sys
import time
class ProcessType:
    DETECTION = 1
    ANALYSIS = 2
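The preview above is cut off after the `ProcessType` definition. As a rough sketch of how such a flag could select between Textract's two asynchronous operations, the hypothetical helper `start_job` below dispatches to `start_document_text_detection` or `start_document_analysis` on a client supplied by the caller. The helper name, its parameters, and the choice of `FeatureTypes` are assumptions for illustration and are not taken from the original gist.

```python
class ProcessType:
    DETECTION = 1
    ANALYSIS = 2


def start_job(client, process_type, bucket, document):
    """Start an asynchronous Textract job and return its JobId.

    `client` is expected to behave like boto3.client('textract').
    This helper is hypothetical; it is not part of the original gist.
    """
    location = {'S3Object': {'Bucket': bucket, 'Name': document}}
    if process_type == ProcessType.DETECTION:
        response = client.start_document_text_detection(DocumentLocation=location)
    elif process_type == ProcessType.ANALYSIS:
        # Analysis jobs require FeatureTypes; TABLES and FORMS are an assumed choice.
        response = client.start_document_analysis(
            DocumentLocation=location,
            FeatureTypes=['TABLES', 'FORMS'],
        )
    else:
        raise ValueError(f'Unknown process type: {process_type}')
    return response['JobId']
```

With a real client this would be called as `start_job(boto3.client('textract'), ProcessType.DETECTION, bucket, document)`, where `bucket` and `document` name an object you have already uploaded to S3.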
@Felflare
Felflare / XLNet_generate_text
Created February 10, 2020 02:37
Sample function to generate text from the XLNet model as implemented by Hugging Face.
from transformers import XLNetTokenizer, XLNetLMHeadModel
import torch
import torch.nn.functional as F
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
model = XLNetLMHeadModel.from_pretrained('xlnet-large-cased')
# We show how to set up inputs to predict the next token using a bidirectional context.
encoded_text = tokenizer.encode("Quick brown fox jumped over the lazy <mask>.", add_special_tokens=True)
input_ids = torch.tensor(encoded_text).unsqueeze(0) # We will predict the masked token
print(f'Input sequence -- {encoded_text}')
Felflare / XLNet_sequence_classification
Created February 10, 2020 02:46
Demonstration of XLNet with a classification head on top; the XLNet implementation follows Hugging Face's PyTorch build.
from transformers import XLNetTokenizer, XLNetForSequenceClassification
import torch
tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
model = XLNetForSequenceClassification.from_pretrained('xlnet-large-cased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(input_ids, labels=labels)
loss, logits = outputs[:2]
print(f'Current Loss at -- {loss.tolist()}')
# Current Loss at -- 1.1906177997589111
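For a head with more than one label, the loss printed above is a cross-entropy between the logits and the label. The sketch below computes that quantity by hand for a single example, using only the standard library. The logits are made up for illustration and are not the model's output; because the classification head is not part of the pretrained checkpoint, the value from a real run will likely differ between runs.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


# Hypothetical logits for a two-class head; not the values from the run above.
# Equal logits give a loss of log(2), roughly 0.693.
print(cross_entropy([0.0, 0.0], 1))
```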
Felflare / XLNet_span_selection_squad
Created February 10, 2020 03:03
XLNet model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`). Simple demo of loss and logits.
from transformers import XLNetTokenizer, XLNetForQuestionAnsweringSimple
import torch
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetForQuestionAnsweringSimple.from_pretrained('xlnet-base-cased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
print(f'Encoded sequence ids -- {input_ids.tolist()[0]}')
# Encoded sequence ids -- [17, 11368, 19, 94, 2288, 27, 10920, 4, 3]
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])
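The preview stops once the target span is defined. As a complement, here is a minimal sketch of one common way to turn `span start logits` and `span end logits` into a predicted span: pick the pair with the highest combined score, subject to the start not coming after the end. The function name `best_span`, the `max_answer_len` parameter, and the logits are assumptions for illustration, not output from the model above.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) maximising start_logits[s] + end_logits[e] with s <= e."""
    best = None
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Made-up logits, one per token id in the encoded sequence above (9 tokens).
start_logits = [0.1, 2.0, 0.3, 0.0, -1.0, 0.2, 0.1, -2.0, -2.0]
end_logits = [0.0, 0.5, 0.1, 1.5, 0.2, 0.0, 0.3, -2.0, -2.0]
start, end, score = best_span(start_logits, end_logits)
print(start, end)  # 1 3
```

The exhaustive search is quadratic in sequence length, which is fine for a demo; a production decoder would usually restrict the search to the top few start and end candidates.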
Felflare / Bert Abstractive summarization
Last active January 31, 2021 07:16
This snippet incorporates [Text Summarization with Pretrained Encoders](https://arxiv.org/pdf/1908.08345.pdf) by Yang Liu and Mirella Lapata.
# Pull and install Huggingface Transformers Repo
git clone https://github.com/huggingface/transformers && cd transformers
pip install .
pip install nltk py-rouge
cd examples/summarization
#------------------------------
# Download the original summarization datasets. The commands below download from Google Drive on Linux.
wget --save-cookies cookies.txt --keep-session-cookies --no-check-certificate 'https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/Code: \1\n/p'
wget --load-cookies cookies.txt --no-check-certificate 'https://drive.google.com/uc?export=download&confirm=<CONFIRMATION CODE HERE>&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ' -O cnn_stories.tgz
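The first `wget` call pipes the response through `sed` to print a confirmation code, which is then pasted into the second command in place of `<CONFIRMATION CODE HERE>`. The sketch below mirrors that `sed` expression in Python so the extraction step can be inspected in isolation. The function name and the sample response text are assumptions; the real page layout may differ.

```python
import re

# Same pattern as the sed expression above. The leading greedy ".*" means the
# last "confirm=" on a line wins, as it does in sed.
CONFIRM_RE = re.compile(r'.*confirm=([0-9A-Za-z_]+)')


def extract_confirm_codes(page_text):
    """Return the confirmation code found on each line that contains one."""
    codes = []
    for line in page_text.splitlines():
        match = CONFIRM_RE.match(line)
        if match:
            codes.append(match.group(1))
    return codes


# Hypothetical response fragment; FILE_ID stands in for the real file id.
sample = '<a href="/uc?export=download&amp;confirm=AbC_1&amp;id=FILE_ID">Download anyway</a>'
print(extract_confirm_codes(sample))  # ['AbC_1']
```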
Felflare / sentence_similarity_mult.ipynb
Created May 26, 2020 20:38
This snippet demonstrates a cross-lingual sentence embedding system for similarity search and matching that beats LASER embeddings, based on [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/pdf/2004.09813.pdf) by Nils Reimers and Iryna Gurevych.