Skip to content

Instantly share code, notes, and snippets.

View yuchenlin's full-sized avatar
:octocat:

(Bill) Yuchen Lin yuchenlin

:octocat:
View GitHub Profile
@yoavg
yoavg / stochastic-critique.md
Last active November 9, 2023 04:32
A criticism of Stochastic Parrots

A criticism of "On the Dangers of Stochastic Parrots: Can Languae Models be Too Big"

Yoav Goldberg, Jan 23, 2021.

The FAccT paper "On the Dangers of Stochastic Parrots: Can Languae Models be Too Big" by Bender, Gebru, McMillan-Major and Shmitchell has been the center of a controversary recently. The final version is now out, and, owing a lot to this controversary, would undoubtly become very widely read. I read an earlier draft of the paper, and I think that the new and updated final version is much improved in many ways: kudos for the authors for this upgrade. I also agree with and endorse most of the content. This is important stuff, you should read it.

However, I do find some aspects of the paper (and the resulting discourse around it and around technology) to be problematic. These weren't clear to me when initially reading the first draft several months ago, but they became very clear to me now. These points are for the most part

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
from torch.nn import CrossEntropyLoss
from tqdm import trange
max_length = 24
batch_size = 200
@yuchenlin
yuchenlin / gpt_sent_prob.py
Last active June 14, 2024 17:57
Compute sentence probability using GPT-2 with huggingface transformers
import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import numpy as np
from scipy.special import softmax
def model_init(model_string, cuda):
if model_string.startswith("gpt2"):
tokenizer = GPT2Tokenizer.from_pretrained(model_string)
model = GPT2LMHeadModel.from_pretrained(model_string)
@yuchenlin
yuchenlin / masked_word_prediction_bert.py
Last active November 5, 2024 14:23
A simple example script for predicting masked words in a sentence using BERT.
import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM
import logging
logging.basicConfig(level=logging.INFO)# OPTIONAL
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
@chuanconggao
chuanconggao / prefixspan.py
Last active January 22, 2024 06:00
The original minimal 15 lines implementation of PrefixSpan. Full library at https://github.com/chuanconggao/PrefixSpan-py.
from collections import defaultdict
def frequent_rec(patt, mdb):
results.append((len(mdb), patt))
occurs = defaultdict(list)
for (i, startpos) in mdb:
seq = db[i]
for j in range(startpos + 1, len(seq)):
l = occurs[seq[j]]
@peterjc123
peterjc123 / build.ps1
Last active November 12, 2018 16:29
Setup script for Windows PyTorch
# Prerequisites
# 1. MSVC 2017 C++ Build Tools
# 2. CMAKE 3.0 or up
# 3. 64 bits of Windows
# 4. Anaconda / MiniConda 64 bits
# Prerequisites for CUDA
# 1. CUDA 8.0 or up
# 2. NVTX( in CUDA as Visual Studio Integration. if fail to install, you can extract
# the CUDA installer exe and found the NVTX installer under the CUDAVisualStudioIntegration)
@abhishekcs10
abhishekcs10 / install-gcc-5.4.0.sh
Last active September 11, 2024 02:17 — forked from jtilly/install-gcc-4.9.3.sh
Install GCC 5.4.0
#!/bin/bash
# this script installs GCC 5.4.0
# to use it navigate to your home directory and type:
# sh install-gcc-5.4.0.sh
# download and install gcc 4.9.3
wget https://github.com/gcc-mirror/gcc/archive/gcc-5_4_0-release.tar.gz
tar xzf gcc-5_4_0-release.tar.gz
cd gcc-5_4_0-release
@Tushar-N
Tushar-N / pad_packed_demo.py
Last active October 27, 2024 15:17
How to use pad_packed_sequence in pytorch<1.1.0
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
seqs = ['gigantic_string','tiny_str','medium_str']
# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))
# make model
@WeiTang114
WeiTang114 / nvv.sh
Created March 13, 2017 06:43
Show username after each process in nvidia-smi.
#!/bin/bash
# Show username after each process in nvidia-smi
# like:
# ...
# +------------------------------------------------------+
# | Processes: GPU Memory |
# | GPU PID Type Process name Usage |
# |======================================================|
# | 0 150752 C python 830MiB | User: user1
# | 1 2185 C /usr/bin/python 1090MiB | User: user2
@thousandlemons
thousandlemons / how-to-setup-shadowsocks-on-your-ubuntu-server.md
Last active November 20, 2021 00:14
How to setup Shadowsocks on your Ubuntu server