Felflare’s gists

Felflare / sentence_similarity_mult.ipynb

Created May 26, 2020 20:38

This snippet demonstrates a cross-language sentence embedding system for similarity search and matching that outperforms LASER embeddings, following [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/pdf/2004.09813.pdf) by Nils Reimers and Iryna Gurevych.
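
As a rough illustration of the similarity step, the sketch below ranks candidate sentences by cosine similarity to a query vector. The three-dimensional vectors are made-up stand-ins for real multilingual sentence embeddings, and `cosine_similarity` and `best_match` are hypothetical helper names, not functions from the notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def best_match(query, candidates):
    """Return (sentence, score) for the candidate closest to the query."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy vectors (assumption): real embeddings would come from a sentence encoder.
query = [1.0, 0.0, 1.0]
candidates = {
    "de: Der Hund ist süß": [0.9, 0.1, 1.1],
    "fr: Il pleut aujourd'hui": [-1.0, 1.0, 0.0],
}

name, score = best_match(query, candidates)
print(name, round(score, 3))
```

Because cosine similarity ignores vector length, sentences in different languages can be compared directly as long as the encoder maps them into the same vector space.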

Felflare / sentence_similarity.ipynb

Last active August 11, 2022 15:41

Felflare / Bert Abstractive summarization

Last active January 31, 2021 07:16

This snippet applies [Text Summarization with Pretrained Encoders](https://arxiv.org/pdf/1908.08345.pdf) by Yang Liu and Mirella Lapata.

	# Pull and install Huggingface Transformers Repo
	git clone https://github.com/huggingface/transformers && cd transformers
	pip install .
	pip install nltk py-rouge
	cd examples/summarization

	#------------------------------
	# Download the original summarization datasets from Google Drive (Linux).
	# The first call prints the confirmation code that the second call needs.
	wget --save-cookies cookies.txt --keep-session-cookies --no-check-certificate 'https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/Code: \1\n/p'
	wget --load-cookies cookies.txt --no-check-certificate 'https://drive.google.com/uc?export=download&confirm=<CONFIRMATION CODE HERE>&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ' -O cnn_stories.tgz
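
The `sed` expression in the first `wget` call pulls a confirmation token out of the page that Google Drive returns. The same extraction can be sketched in Python as follows; the sample HTML string is invented for illustration, and `extract_confirm_code` is a hypothetical helper, not part of the original script.

```python
import re


def extract_confirm_code(html):
    """Return the value of the first confirm=... parameter, or None if absent."""
    match = re.search(r"confirm=([0-9A-Za-z_]+)", html)
    return match.group(1) if match else None


# Invented sample page fragment (assumption): real responses will differ.
sample = '<a href="/uc?export=download&amp;confirm=AbC1&amp;id=FILE_ID">Download anyway</a>'
print(extract_confirm_code(sample))
```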

Felflare / XLNet_span_selection_squad

Created February 10, 2020 03:03

XLNet model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`). Simple demo of loss and logits.

	from transformers import XLNetTokenizer, XLNetForQuestionAnsweringSimple
	import torch
	tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
	model = XLNetForQuestionAnsweringSimple.from_pretrained('xlnet-base-cased')
	input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
	print(f'Encoded sequence ids -- {input_ids.tolist()[0]}')
	# Encoded sequence ids -- [17, 11368, 19, 94, 2288, 27, 10920, 4, 3]

	start_positions = torch.tensor([1])
	end_positions = torch.tensor([3])
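
The span head produces one start logit and one end logit per token. One common way to turn those into an answer is to pick the pair that maximises their sum, subject to the start not coming after the end. The sketch below shows that selection in plain Python; the logit values are invented, and `best_span` with its `max_answer_len` parameter is a hypothetical helper rather than a transformers API.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return ((start, end), score) maximising start_logit + end_logit with start <= end."""
    best = None
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best, best_score


# Invented logits for a five-token sequence (assumption).
start_logits = [0.1, 2.5, 0.3, -1.0, 0.0]
end_logits = [0.0, 0.2, 0.4, 3.0, -0.5]

span, score = best_span(start_logits, end_logits)
print(span, score)
```

The `start <= end` constraint matters: taking the argmax of each logit vector independently can yield an end position that precedes the start.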

Felflare / XLNet_sequence_classification

Created February 10, 2020 02:46

Demonstration of XLNet with a classification head on top; the XLNet implementation follows Hugging Face's PyTorch build.

	from transformers import XLNetTokenizer, XLNetForSequenceClassification
	import torch
	tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
	model = XLNetForSequenceClassification.from_pretrained('xlnet-large-cased')
	input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
	labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
	outputs = model(input_ids, labels=labels)
	loss, logits = outputs[:2]
	print(f'Current Loss at -- {loss.tolist()}')
	# Current Loss at -- 1.1906177997589111
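
For a classification head with more than one label, the returned loss is expected to be a cross-entropy between the logits and the target label. The sketch below computes that quantity by hand for invented two-class logits, to show how the loss relates to the logits; `softmax` and `cross_entropy` are hypothetical helpers written for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability that softmax assigns to the true label."""
    return -math.log(softmax(logits)[label])


# Invented logits (assumption): equal scores mean the model is undecided.
undecided = cross_entropy([0.0, 0.0], label=1)
confident_right = cross_entropy([0.0, 4.0], label=1)
confident_wrong = cross_entropy([4.0, 0.0], label=1)
print(undecided, confident_right, confident_wrong)
```

An untrained head tends to sit near the undecided case, so a freshly loaded classifier can report a sizeable loss before any fine-tuning.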

Felflare / XLNet_generate_text

Created February 10, 2020 02:37

Sample function to generate text from the XLNet model as implemented by Hugging Face.

	from transformers import XLNetTokenizer, XLNetLMHeadModel
	import torch
	import torch.nn.functional as F
	tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
	model = XLNetLMHeadModel.from_pretrained('xlnet-large-cased')

	# We show how to set up inputs to predict the next token using a bi-directional context.
	encoded_text = tokenizer.encode("Quick brown fox jumped over the lazy <mask>.", add_special_tokens=True)
	input_ids = torch.tensor(encoded_text).unsqueeze(0) # We will predict the masked token
	print(f'Input sequence -- {encoded_text}')
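
Once a language-model head has produced logits for the masked position, a token is typically chosen by filtering and sampling. The sketch below shows top-k filtering followed by softmax sampling over a toy vocabulary. The vocabulary, the logits, and the `top_k_sample` helper are all hypothetical; the remainder of the gist is not shown here and may select tokens differently.

```python
import math
import random


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Sample an index from the k highest-scoring logits after temperature scaling."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Unnormalised softmax weights; random.choices normalises them.
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy vocabulary and logits (assumption), standing in for model output.
vocab = ["dog", "cat", "fox", "table"]
logits = [3.0, 2.0, 0.5, -4.0]

rng = random.Random(0)  # seeded for repeatable sampling
token_id = top_k_sample(logits, k=2, rng=rng)
print(vocab[token_id])
```

With `k=1` this reduces to greedy decoding; larger `k` or a higher `temperature` makes the output more varied.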

Felflare / textract_async_analyze.py

Created October 25, 2019 04:28

	# Asynchronously processes text in a document stored in an S3 bucket. For setup information, see https://docs.aws.amazon.com/textract/latest/dg/async.html

	import boto3
	import json
	import sys
	import time

	class ProcessType:
	    DETECTION = 1
	    ANALYSIS = 2
