
Pete Walsh epwalsh

  • Central Oregon
  • X @epwalsh
@epwalsh
epwalsh / ci.yml
Last active November 3, 2020 10:39
Python GitHub Actions with pip cache
steps:
- uses: actions/checkout@v2
- name: Setup Python
  uses: actions/setup-python@v1
  with:
    python-version: 3.7
- uses: actions/cache@v2
  with:
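The preview stops at the cache step's `with:` block. A sketch of one typical way to finish it, keying the pip cache on the requirements file (the path and key below are assumptions of mine, not necessarily what the full gist does):

- uses: actions/cache@v2
  with:
    # Cache pip's download cache, keyed on the requirements file (assumed name).
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-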
@epwalsh
epwalsh / ci.yml
Last active September 24, 2020 18:43
Python GitHub Actions
name: PR
on:
  pull_request:
    branches:
    - master
jobs:
  checks:
    name: Checks
    runs-on: ubuntu-latest
    steps:
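The preview ends at `steps:`. A minimal sketch of how a checks job like this commonly continues (the specific commands are assumptions, not the gist's actual steps):

steps:
- uses: actions/checkout@v2
- name: Setup Python
  uses: actions/setup-python@v1
  with:
    python-version: 3.7
- name: Install and run checks
  # Hypothetical commands; the real workflow may run different tools.
  run: |
    pip install -r requirements.txt
    pytest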
@epwalsh
epwalsh / transformer_qa_multi_gpu.jsonnet
Last active August 19, 2020 21:46
Transformer QA multi GPU
local transformer_model = 'bert-base-cased';
local epochs = 3;
local batch_size = 8;
local train_path = "https://allennlp.s3.amazonaws.com/datasets/squad/squad-train-v1.1.json";
local dev_path = "https://allennlp.s3.amazonaws.com/datasets/squad/squad-dev-v1.1.json";
{
  "dataset_reader": {
    "type": "transformer_squad",
    "transformer_model_name": transformer_model,
@epwalsh
epwalsh / transformer_qa_single_gpu.jsonnet
Last active August 19, 2020 21:46
Transformer QA single GPU
local transformer_model = 'bert-base-cased';
local epochs = 3;
local batch_size = 8;
local train_path = "https://allennlp.s3.amazonaws.com/datasets/squad/squad-train-v1.1.json";
local dev_path = "https://allennlp.s3.amazonaws.com/datasets/squad/squad-dev-v1.1.json";
{
  "dataset_reader": {
    "type": "transformer_squad",
    "transformer_model_name": transformer_model,
@epwalsh
epwalsh / step2.py
Created August 13, 2020 18:19
How to upload transformer weights and tokenizers from AllenNLP models to HuggingFace's model hub: step 2
import os

# save_pretrained() will not expand "~" on its own, so expand it here.
transformer_dir = os.path.expanduser("~/my-cool-transformer")
transformer_embedder.transformer_model.save_pretrained(transformer_dir)
tokenizer.tokenizer.save_pretrained(transformer_dir)
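As a quick sanity check (my addition, not part of the gist), the saved directory can be loaded back with the transformers library before uploading it to the model hub:

from transformers import AutoModel, AutoTokenizer

# If both of these load cleanly, the weight and tokenizer files were written
# in a format the model hub (and other users) can consume.
check_model = AutoModel.from_pretrained(transformer_dir)
check_tokenizer = AutoTokenizer.from_pretrained(transformer_dir)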
@epwalsh
epwalsh / step1.py
Last active August 14, 2020 16:43
How to upload transformer weights and tokenizers from AllenNLP models to HuggingFace's model hub: step 1
from allennlp.common.params import Params
from allennlp.common.plugins import import_plugins
from allennlp.data.tokenizers import Tokenizer, PretrainedTransformerTokenizer
from allennlp.models import load_archive
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder
# Change this to your serialization directory.
serialization_dir = "~/my-trained-model"
# Make sure all of the classes our model and tokenizer use are registered.
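The preview ends mid-script. A hedged sketch of how the loading might continue (the archive filename and attribute paths below are assumptions, not necessarily the gist's actual code):

import os

# Register any custom classes provided by installed plugins.
import_plugins()

# Load the trained model from its archive ("model.tar.gz" is an assumed name).
archive = load_archive(os.path.expanduser(f"{serialization_dir}/model.tar.gz"))
model = archive.model

# Hypothetical attribute path: pull the PretrainedTransformerEmbedder out of
# the model, and build a matching tokenizer for step 2 to save.
transformer_embedder = model._text_field_embedder._token_embedders["tokens"]
tokenizer = PretrainedTransformerTokenizer(model_name="bert-base-cased")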
@epwalsh
epwalsh / dataset_reader.py
Created July 1, 2020 18:25
Dataset Reader API
"""
Proposal for new DatasetReader API.
For this to work, all `Instance`s would have to be efficiently serializable.
So `TextField`s, for example, shouldn't contain `TokenIndexer`s.
The flow of data would look like this (boxes represent separate Python processes):
```
+-----------------------------------------------+
local transformer_model = 'bert-base-cased';
local epochs = 1;
local batch_size = 8;
{
  "dataset_reader": {
    "type": "transformer_squad",
    "transformer_model_name": transformer_model,
    "skip_invalid_examples": true,
@epwalsh
epwalsh / github-labels.sh
Created January 13, 2020 17:13
Create good default labels for a repository. Adapted from https://github.com/amatkivskiy/github-labels-creator.
#!/bin/bash

label_names=(
  'Status: Changes Requested'
  'Status: Do Not Merge'
  'Status: Help Wanted'
  'Status: In Progress'
  'Status: Mergeable'
  'Status: Review Needed'
  'Type: Bug'
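The label list is truncated in the preview; the creation step itself is usually a loop over the GitHub REST API, roughly like the sketch below (OWNER, REPO, GITHUB_TOKEN, and the color are placeholders of mine, not the gist's actual code):

# Sketch: create each label in $OWNER/$REPO via the GitHub API.
for name in "${label_names[@]}"; do
  curl -s -X POST "https://api.github.com/repos/$OWNER/$REPO/labels" \
    -H "Authorization: token $GITHUB_TOKEN" \
    -d "{\"name\": \"$name\", \"color\": \"ededed\"}"
done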
class CopyNetSeq2Seq(Model):
    # snip...

    def _get_ll_contrib(self,
                        generation_scores: torch.Tensor,
                        generation_scores_mask: torch.Tensor,
                        copy_scores: torch.Tensor,
                        target_tokens: torch.Tensor,
                        target_to_source: torch.Tensor,