@astariul
astariul / pseudocode
Created July 13, 2022 00:09
Part 2 of the Fleksy NLP challenge
// First, define some constants
VOCAB = ["_", " ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "à", "ç", "è", "é", "í", "ñ", "ò", "ó", "ú"]

// Then, define our helper functions
function greedy_decoding(nn_output):
    // This function uses a greedy approach to decode a word from the output of a neural network trained with CTC.
    // Here, nn_output is the output of the neural network.
    // In the context of this challenge, it's the content of the CSV file.
    // I assume it's a matrix of size [n, v], where n is the number of character positions (40 in this example) and v is the size of the vocabulary (37 in this example).
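For concreteness, here is a minimal NumPy sketch of the greedy CTC decoding the pseudocode describes (my own illustration, not the gist's code), assuming nn_output is an [n, v] array of per-position scores and that "_" at index 0 is the CTC blank:

import numpy as np

BLANK = "_"  # CTC blank token, index 0 of VOCAB

def greedy_decoding(nn_output, vocab):
    """Greedy CTC decode: argmax per position, collapse repeats, drop blanks."""
    best = np.argmax(nn_output, axis=1)  # best vocab index for each of the n positions
    decoded = []
    prev = -1
    for idx in best:
        if idx != prev and vocab[idx] != BLANK:
            decoded.append(vocab[idx])
        prev = idx
    return "".join(decoded)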
@astariul
astariul / README.md
Last active January 4, 2024 19:25
Solution for Facebook coding problem "Missing Mail"

Missing Mail

You are the manager of a mail room which is frequently subject to theft. A period of N days is about to occur, such that on the i-th day, the following sequence of events will occur in order:

  • A package with a value of V_i dollars will get delivered to the mail room (unless V_i = 0, in which case no package will get delivered).
  • You can choose to pay C dollars to enter the mail room and collect all of the packages there (removing them from the room), and then leave the room.
  • With probability S, all packages currently in the mail room will get stolen (and therefore removed from the room).

Note that you're aware of the delivery schedule V_1..V_N, but can only observe the state of the mail room when you choose to enter it, meaning that you won't immediately be aware of whether or not packages were stolen at the end of any given day.
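The gist's actual solution file isn't reproduced above, so as a sketch: since nothing is observed between visits (and each visit leaves the room known-empty), an optimal policy can fix its entry days in advance, which leads to the standard O(N^2) dynamic program below (all names are illustrative).

def max_expected_profit(V, C, S):
    """dp[i] = best expected profit from day i onward, starting with an empty room.

    A package delivered on day k and collected on day j >= k must survive the
    (j - k) theft events in between, so it is worth V[k] * (1 - S) ** (j - k).
    """
    N = len(V)
    dp = [0.0] * (N + 1)  # dp[N] = 0.0: no days left
    for i in range(N - 1, -1, -1):
        best = 0.0        # option: never enter the room again
        expected = 0.0    # expected value in the room when entering on day j
        for j in range(i, N):
            # packages already present survive one more theft, then V[j] arrives
            expected = expected * (1.0 - S) + V[j]
            # enter on day j: collect, pay C, continue with a known-empty room
            best = max(best, expected - C + dp[j + 1])
        dp[i] = best
    return dp[0]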

import time
import torch
import argparse
from tqdm import tqdm
from transformers import BartForConditionalGeneration, BartTokenizer, pipeline
HF_MODEL = "HuggingFace"
PIPELINE_MODEL = "Pipeline"
@astariul
astariul / benchmark.py
Created March 6, 2020 02:06
Quick benchmark comparing the performance of the FairSeq and HuggingFace implementations of BART
import time
import torch
import argparse
from tqdm import tqdm
from transformers import BartForConditionalGeneration, BartTokenizer
FS_MODEL = "FairSeq"
HF_MODEL = "HuggingFace"
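The preview stops at the constants. Purely as an illustration of the kind of timing loop such a benchmark needs (the model name and the function below are my assumptions, not the gist's code), the HuggingFace side could look like:

def time_hf_generation(text, n_runs=10, model_name="facebook/bart-large-cnn"):
    """Average wall-clock time of BART generation over n_runs."""
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)
    model.eval()
    input_ids = tokenizer(text, return_tensors="pt", truncation=True)["input_ids"]
    timings = []
    with torch.no_grad():
        for _ in tqdm(range(n_runs)):
            start = time.time()
            model.generate(input_ids, num_beams=4, max_length=140)
            timings.append(time.time() - start)
    return sum(timings) / len(timings)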
"""Official evaluation script for SQuAD version 2.0.
In addition to basic functionality, we also compute additional statistics and
plot precision-recall curves if an additional na_prob.json file is provided.
This file is expected to map question IDs to the model's predicted probability
that a question is unanswerable.
"""
import argparse
import collections
import json
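As a concrete, hypothetical example of the na_prob.json format the docstring describes, a flat JSON object mapping each question ID to P(unanswerable):

import json

# Made-up question IDs and probabilities, purely for illustration
na_probs = {
    "56ddde6b9a695914005b9628": 0.03,  # model is confident an answer exists
    "5ad39d53604f3c001a3fe8d1": 0.91,  # model believes this one is unanswerable
}
with open("na_prob.json", "w") as f:
    json.dump(na_probs, f)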
import json

# Read entire file
posts = []
i = 1
j = 0
with open('tifu_all_tokenized_and_filtered.json', 'r') as fp:
    for line in fp:
        print("{} / 79,949".format(i))
        i += 1
def _beam_search(self, batch, beam_width, max_len):
    """ Beam search for predicting a sentence."""
    batch_size = batch['input_t'].size(1)  # inputs are [seq_len, batch]
    with torch.no_grad():
        # Encode the source sentence once; the transposes make it batch-first
        encoder_hidden, encoder_final = self.model.encode(
            batch['input_t'].transpose(0, 1),
            batch['input_mask'].transpose(0, 1),
            batch['input_len'])
        # Start every hypothesis with the start-of-sentence token
        prev_y = torch.ones(batch_size, 1).fill_(START_TOKEN_ID).type_as(
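The preview cuts off mid-initialization. For reference, the overall procedure such a method implements is standard beam search; below is a self-contained, model-agnostic sketch (the step_fn contract and all names are assumptions, not this gist's API):

def beam_search(step_fn, start_id, stop_id, beam_width, max_len):
    """step_fn(prefix) is assumed to return (token_id, log_prob) pairs
    scoring every candidate next token for the given prefix."""
    beams = [([start_id], 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live beam by every candidate next token
        candidates = []
        for prefix, score in beams:
            for token_id, log_prob in step_fn(prefix):
                candidates.append((prefix + [token_id], score + log_prob))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best beam_width live hypotheses; retire finished ones
        beams = []
        for prefix, score in candidates:
            if prefix[-1] == stop_id:
                finished.append((prefix, score))
            elif len(beams) < beam_width:
                beams.append((prefix, score))
            if len(beams) == beam_width:
                break
        if not beams:  # every surviving hypothesis has ended
            break
    finished.extend(beams)  # count still-live beams at max_len as finished
    return max(finished, key=lambda c: c[1])[0]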
def _process_tensors(self, data):
    # Truncate it to padding len
    article_seq = [d.article_seq[:self.enc_max_len] for d in data]
    abstract_seq = [d.abstract_seq[:self.dec_max_len - 2] for d in data]
    # -2 is for [START] and [STOP]

    # Add [START] and [STOP] to the target abstract
    for s in abstract_seq:
        s.insert(0, START_TOKEN_ID)
        s.append(STOP_TOKEN_ID)