Motoki Wu tokestermw
import random
def process_line(line):
    columns = line.split('\t')
    if len(columns) < 6:
        return None
    n_corrections = columns[0]
    serial_number = columns[1]
    url = columns[2]
@tokestermw
tokestermw / machine_learned_index.py
Last active April 9, 2019 08:41
Using deep learning to approximate a B-Tree index from this paper: https://arxiv.org/abs/1712.01208 (The Case for Learned Index Structures)
import click
import torch
import torch.autograd
import torch.nn.functional as F
from torch.autograd import Variable
import os
import random
import math
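To make the idea behind the description concrete, here is a minimal sketch of a learned index. It is not the gist's PyTorch model: it swaps the neural network for a single least-squares line, and the names `fit_linear` and `LearnedIndex` are hypothetical. The model predicts where a key sits in a sorted array, and the recorded worst-case error bounds the window that a final binary search has to cover. It assumes a non-empty list of numeric keys.

```python
import bisect


def fit_linear(keys):
    """Least-squares line mapping key -> position in the sorted array."""
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    var_x = sum((k - mean_x) ** 2 for k in keys)
    cov = sum((k - mean_x) * (i - mean_y) for i, k in enumerate(keys))
    slope = cov / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


class LearnedIndex:
    """Hypothetical stand-in for a learned index: a line plus an error bound."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.slope, self.intercept = fit_linear(self.keys)
        # Worst-case prediction error over the stored keys, so that a
        # lookup only has to search a bounded window around the guess.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

For example, `LearnedIndex([3, 8, 15, 16, 23, 42, 108]).lookup(23)` returns the position of `23` in the sorted keys, while a key that was never stored returns `None`.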
@tokestermw
tokestermw / cantor_set.py
Created November 28, 2017 00:32
Implementation of the Cantor set, as explained here: http://natureofcode.com/book/chapter-8-fractals/
from copy import deepcopy
class Line:
    def __init__(self, length: int, x: int):
        self.length = length
        self.x = x
    def __len__(self):
        return self.length
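The preview stops before the construction itself, so here is a small sketch of how the Cantor set can be generated. It is an assumption about the approach, not the gist's code: the function `cantor` and its `min_length` cutoff are hypothetical, and segments are plain `(x, length)` tuples rather than `Line` objects. Each generation keeps the left and right thirds of every segment and drops the middle third.

```python
def cantor(x, length, min_length=1.0):
    """Return one list of (x, length) segments per generation."""
    generations = [[(x, length)]]
    # Keep subdividing while the next generation's segments stay long enough.
    while generations[-1][0][1] / 3 >= min_length:
        next_generation = []
        for seg_x, seg_length in generations[-1]:
            third = seg_length / 3
            next_generation.append((seg_x, third))              # left third
            next_generation.append((seg_x + 2 * third, third))  # right third
        generations.append(next_generation)
    return generations
```

Calling `cantor(0, 27)` yields generations with 1, 2, 4, and 8 segments, each a third as long as those in the generation before.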
@tokestermw
tokestermw / beam_search.py
Created November 13, 2017 23:31
Simple attempt at beam search.
import numpy as np
import heapq
VOCAB_SIZE = 1000
HIDDEN_DIM = 128
vocab = {
    'the': 5,
    'fox': 35,
    'jumped': 144,
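Since the preview cuts off before the search loop, the following is a self-contained sketch of beam search under stated assumptions. It does not reproduce the gist's code: `beam_search`, the `step_fn` callback, and `toy_model` are hypothetical names, and the probabilities are made up for illustration. At every step each unfinished hypothesis is extended by all candidate tokens, and only the `beam_width` highest-scoring sequences (by summed log-probability) survive.

```python
import heapq
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """step_fn(sequence) -> {next_token: probability} (hypothetical hook)."""
    beams = [(0.0, [start_token])]  # (cumulative log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_token:
                candidates.append((score, seq))  # finished hypotheses carry over
                continue
            for token, prob in step_fn(seq).items():
                if prob > 0:
                    candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the beam_width best hypotheses for the next step.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq[-1] == end_token for _, seq in beams):
            break
    return beams


def toy_model(seq):
    """Made-up next-token distribution that depends only on the last token."""
    table = {
        '<s>': {'the': 0.6, 'a': 0.4},
        'the': {'fox': 0.5, 'dog': 0.5},
        'a': {'fox': 0.9, 'dog': 0.1},
        'fox': {'</s>': 1.0},
        'dog': {'</s>': 1.0},
    }
    return table[seq[-1]]


best_score, best_seq = beam_search(toy_model, '<s>', '</s>', beam_width=2)[0]
```

In this toy model a greedy decoder would commit to `'the'` (probability 0.6) and finish with a total probability of 0.3, whereas a beam of width 2 keeps `'a'` alive and ends on `'a fox'` with probability 0.36.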
@tokestermw
tokestermw / self_attention.py
Last active March 3, 2025 11:36
TensorFlow implementation of self-attention from the paper "Attention Is All You Need".
"""Example TensorFlow code for Self-Attention mechanism.
Refs:
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
https://arxiv.org/abs/1706.03762
Transformer: A Novel Neural Network Architecture for Language Understanding
https://research.googleblog.com/2017/08/transformer-novel-neural-network.html
@tokestermw
tokestermw / tf_dataset_api_text.py
Last active December 18, 2018 06:32
Using the new Dataset API from TensorFlow 1.2.0, return padded and batched tensors from text data where each line is a sentence.
import numpy as np
import tensorflow as tf
# Compare (major, minor) as a tuple so that releases such as 2.0 also pass the check.
_major_version, _minor_version = map(int, tf.__version__.split('-')[0].split('.')[:2])
assert (_major_version, _minor_version) >= (1, 2), "requires TensorFlow 1.2.0 and above"
text_data_path = "./z_sentences.txt"
MAX_SEQUENCE_LENGTH = 10
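As an illustration of what "padded and batched" means for line-per-sentence text, here is a plain-Python sketch. It deliberately does not use the TensorFlow Dataset API, and the function `pad_and_batch`, the `pad_id` and `unk_id` values, and the whitespace tokenization are all assumptions made for the example. Every sentence is mapped to token ids, truncated or padded to a fixed `max_len`, and grouped into batches together with the unpadded lengths.

```python
def pad_and_batch(sentences, vocab, batch_size, max_len, pad_id=0, unk_id=1):
    """Yield (token_ids, lengths) batches whose rows all have max_len entries."""
    batch_ids, batch_lens = [], []
    for sentence in sentences:
        # Whitespace tokenization; unknown tokens map to unk_id.
        ids = [vocab.get(tok, unk_id) for tok in sentence.split()][:max_len]
        batch_lens.append(len(ids))
        batch_ids.append(ids + [pad_id] * (max_len - len(ids)))
        if len(batch_ids) == batch_size:
            yield batch_ids, batch_lens
            batch_ids, batch_lens = [], []
    if batch_ids:  # final, possibly smaller, batch
        yield batch_ids, batch_lens
```

Keeping the original lengths alongside the padded ids lets a downstream model mask out the padding positions.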
@tokestermw
tokestermw / pdftotext_w_japanese.sh
Last active May 7, 2018 09:26
Make pdftotext compatible with Japanese text on Mac OS.
# -- set up repos
brew install Caskroom/cask/xquartz
# -- install xpdf
brew install xpdf
# -- download japanese package
wget ftp://ftp.foolabs.com/pub/xpdf/xpdf-japanese.tar.gz
# -- open
@tokestermw
tokestermw / tf_ed_vi_tutorial.py
Last active July 19, 2019 01:18
Variational inference and Bayesian deep learning tutorial (w/ uncertainty intervals) using TensorFlow and Edward.
""" Some description.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import json
import tqdm
@tokestermw
tokestermw / restore_tf_models.py
Created February 21, 2017 21:09
Restoring frozen models is hard in TensorFlow.
"""
Play with saving.
Closest:
https://github.com/tensorflow/tensorflow/issues/616#issuecomment-205620223
"""
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile