Motoki Wu tokestermw

Resisting the urge to finetune

tokestermw / birnnlm_pytorch.py

Last active May 30, 2020 08:29

Simple example of Bidirectional RNN Language Model in PyTorch. (blog post: https://medium.com/@plusepsilon/the-bidirectional-language-model-1f3961d1fb27)

	import torch, torch.nn as nn
	from torch.autograd import Variable

	text = ['BOS', 'How', 'are', 'you', 'EOS']
	seq_len = len(text)
	batch_size = 1
	embedding_size = 1
	hidden_size = 1
	output_size = 1
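As a rough illustration of what a bidirectional language model conditions on, the sketch below lists, for each inner token of the example sentence, the left context a forward RNN would read and the right context a backward RNN would read. It assumes one common formulation, in which the target token itself is hidden from both directions. The helper name `bidirectional_contexts` is hypothetical and is not taken from the gist.

```python
def bidirectional_contexts(tokens):
    """For each inner token, return (left context, target, right context).

    The right context is reversed, i.e. given in the order a backward RNN
    would consume it. Assumption: the target is excluded from both contexts.
    """
    examples = []
    for t in range(1, len(tokens) - 1):
        left = tokens[:t]              # read left-to-right by the forward RNN
        right = tokens[t + 1:][::-1]   # read right-to-left by the backward RNN
        examples.append((left, tokens[t], right))
    return examples


text = ['BOS', 'How', 'are', 'you', 'EOS']
contexts = bidirectional_contexts(text)
for left, target, right in contexts:
    print(left, '->', target, '<-', right)
```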

tokestermw / preprocess_nlc.py

Created December 19, 2017 20:40

Preprocess NLC data. https://github.com/stanfordmlgroup/nlc/issues/6

tokestermw / machine_learned_index.py

Last active April 9, 2019 08:41

Using deep learning to approximate a B-Tree index from this paper: https://arxiv.org/abs/1712.01208 (The Case for Learned Index Structures)

	import click

	import torch
	import torch.autograd
	import torch.nn.functional as F
	from torch.autograd import Variable

	import os
	import random
	import math
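The core idea of a learned index can be sketched without a neural network: fit a model that maps a key to its position in a sorted array, record the worst prediction error, and then search only inside that error window. The sketch below swaps in a plain least-squares line for the deep model the gist describes; the class `LearnedIndex` and the helper `fit_linear` are hypothetical names, and the key list is assumed to be non-empty.

```python
import bisect


def fit_linear(keys):
    """Least-squares fit of position = slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    var = sum((x - mean_x) ** 2 for x in keys)
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(keys))
    slope = cov / var if var else 0.0
    return slope, mean_y - slope * mean_x


class LearnedIndex:
    """Toy learned index: a linear model plus a bounded binary search."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumption: at least one key
        self.slope, self.intercept = fit_linear(self.keys)
        # Worst-case distance between predicted and true position.
        self.max_error = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of `key`, or None if it is not stored."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_error), n)
        hi = max(lo, min(n, guess + self.max_error + 1))
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < n and self.keys[pos] == key:
            return pos
        return None


index = LearnedIndex([3, 8, 15, 16, 23, 42])
print(index.lookup(16), index.lookup(7))
```

The error window is what keeps lookups correct: every stored key lies within `max_error` positions of its prediction, so the bounded search cannot miss it.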

tokestermw / cantor_set.py

Created November 28, 2017 00:32

Implementation of Cantor set explained here: http://natureofcode.com/book/chapter-8-fractals/
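A minimal recursive sketch of the construction, assuming the usual definition (repeatedly remove the open middle third of every remaining interval). The function name `cantor_set` is hypothetical, and the gist itself may draw the segments rather than return them. `Fraction` is used so the endpoints stay exact.

```python
from fractions import Fraction


def cantor_set(start, end, depth):
    """Return the intervals left after `depth` rounds of removing middle thirds."""
    if depth == 0:
        return [(start, end)]
    third = (end - start) / 3
    # Keep the left and right thirds, drop the middle one.
    return (cantor_set(start, start + third, depth - 1)
            + cantor_set(end - third, end, depth - 1))


segments = cantor_set(Fraction(0), Fraction(1), 2)
for left, right in segments:
    print(left, right)
```

Each round doubles the number of intervals and shrinks each one to a third of its length.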

tokestermw / beam_search.py

Created November 13, 2017 23:31

Simple attempt at beam search.
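For readers unfamiliar with the technique, here is a small self-contained sketch of beam search over a toy next-token table. It is not the gist's code: the function `beam_search`, the table `TOY_MODEL`, and its probabilities are all made up for illustration. In this variant, finished hypotheses compete with open ones for beam slots.

```python
import math


def beam_search(next_log_probs, start, end, beam_width=2, max_len=10):
    """Return (score, sequence) for the best hypothesis found.

    `next_log_probs(seq)` must return a dict mapping each candidate
    next token to its log-probability given `seq`.
    """
    beams = [(0.0, [start])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, logp in next_log_probs(seq).items():
                candidates.append((score + logp, seq + [token]))
        # Keep only the `beam_width` highest-scoring expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == end else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len
    return max(finished, key=lambda c: c[0])


# Hypothetical bigram-style model keyed on the last token only.
TOY_MODEL = {
    'BOS': {'How': math.log(0.6), 'are': math.log(0.4)},
    'How': {'are': math.log(0.9), 'EOS': math.log(0.1)},
    'are': {'you': math.log(0.8), 'EOS': math.log(0.2)},
    'you': {'EOS': math.log(1.0)},
}

best_score, best_seq = beam_search(lambda seq: TOY_MODEL[seq[-1]], 'BOS', 'EOS')
print(best_seq, math.exp(best_score))
```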

tokestermw / morphological_word_embeddings.py

Created September 5, 2017 06:05

# yo

tokestermw / self_attention.py

Last active March 3, 2025 11:36

Implementation of self-attention in the paper "Attention Is All You Need" in TensorFlow.

	"""Example TensorFlow code for Self-Attention mechanism.

	Refs:
	Attention Is All You Need
	Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
	https://arxiv.org/abs/1706.03762

	Transformer: A Novel Neural Network Architecture for Language Understanding
	https://research.googleblog.com/2017/08/transformer-novel-neural-network.html
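The central operation in the paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below spells it out in plain Python rather than TensorFlow so it runs without dependencies. It omits the learned query/key/value projections and the multi-head split, and the function names are hypothetical rather than taken from the gist.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Self-attention: queries, keys, and values all come from the same input.
x = [[1.0, 0.0], [0.0, 1.0]]
attended = scaled_dot_product_attention(x, x, x)
print(attended)
```

Because the attention weights for each query sum to one, every output row is a convex combination of the value rows.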

tokestermw / tf_dataset_api_text.py

Last active December 18, 2018 06:32

Using the new Dataset API from TensorFlow 1.2.0, return padded and batched tensors from text data where each line is a sentence.

	import numpy as np
	import tensorflow as tf

	# Compare (major, minor) as a tuple so that 2.0 and later also pass the check.
	_major_version, _minor_version = map(int, tf.__version__.split('.')[:2])
	assert (_major_version, _minor_version) >= (1, 2), "requires TensorFlow 1.2.0 and above"

	text_data_path = "./z_sentences.txt"

	MAX_SEQUENCE_LENGTH = 10
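What "padded and batched" means can be shown in plain Python, independent of the Dataset API. The sketch below assumes whitespace tokenization and assumes that sentences longer than the maximum length are truncated; neither detail is confirmed by the preview above. The function `padded_batches` and the `<pad>` token are hypothetical.

```python
def padded_batches(lines, batch_size, max_len=10, pad_token="<pad>"):
    """Yield (padded_tokens, lengths) for consecutive groups of lines.

    Each sentence is split on whitespace, truncated to `max_len` tokens,
    and padded to the longest sentence in its own batch.
    """
    tokenized = [line.split()[:max_len] for line in lines]
    for i in range(0, len(tokenized), batch_size):
        batch = tokenized[i:i + batch_size]
        lengths = [len(tokens) for tokens in batch]
        width = max(lengths)
        padded = [tokens + [pad_token] * (width - len(tokens)) for tokens in batch]
        yield padded, lengths


sentences = ["How are you", "Fine", "And you"]
batches = list(padded_batches(sentences, batch_size=2))
for padded, lengths in batches:
    print(padded, lengths)
```

Returning the true lengths alongside the padded tokens lets downstream code mask out the padding.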

tokestermw / pdftotext_w_japanese.sh

Last active May 7, 2018 09:26

Make pdftotext compatible with Japanese text on Mac OS.

	# -- set up repos
	brew install Caskroom/cask/xquartz

	# -- install xpdf
	brew install xpdf

	# -- download japanese package
	wget ftp://ftp.foolabs.com/pub/xpdf/xpdf-japanese.tar.gz

	# -- open

tokestermw / tf_ed_vi_tutorial.py

Last active July 19, 2019 01:18

Variational inference and Bayesian deep learning tutorial (w/ uncertainty intervals) using TensorFlow and Edward.

	""" Some description.
	"""
	from __future__ import absolute_import
	from __future__ import division
	from __future__ import print_function

	import sys
	import json
	import tqdm