Jonny Li jonnyli1125
@jonnyli1125
jonnyli1125 / preprocess_lang8.py
Last active October 20, 2021 02:04
Preprocessing script for Japanese entries in NAIST Lang8 Corpus
import os
import argparse
import json
import re
import unicodedata
invalid_bytes_re = re.compile(r'[\x00-\x1F]+')
sline_re = re.compile(r'\[sline\].*?\[/sline\]')
color_tags = ['[f-blue]','[/f-blue]',
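The preview above is cut off, so the rest of the script is not visible here. As a minimal sketch of how the two compiled patterns might be applied to a single Lang-8 entry, the example below strips `[sline]...[/sline]` spans and control characters. The helper name `clean_text`, the NFKC normalization step, and the sample sentence are assumptions for illustration, not taken from the original script.

```python
import re
import unicodedata

# Patterns copied verbatim from the preview above.
invalid_bytes_re = re.compile(r'[\x00-\x1F]+')
sline_re = re.compile(r'\[sline\].*?\[/sline\]')


def clean_text(text: str) -> str:
    """Hypothetical helper (not part of the original gist)."""
    # Assumption: normalize full-width/half-width variants with NFKC.
    text = unicodedata.normalize('NFKC', text)
    # Drop struck-through spans, tags included (non-greedy match).
    text = sline_re.sub('', text)
    # Drop ASCII control characters; note this range includes \t and \n.
    text = invalid_bytes_re.sub('', text)
    return text.strip()


sample = 'これは[sline]間違い[/sline]正しい\x07文です。'
cleaned = clean_text(sample)
print(cleaned)
```

Because `invalid_bytes_re` covers the whole `\x00-\x1F` range, it also removes newlines and tabs, so a helper like this would need to run per line rather than on a whole multi-line document.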
jonnyli1125 / .spark-cluster.md
Last active September 16, 2023 20:28
One-node Hadoop, Spark, and Livy cluster setup with Docker Compose, using the official apache/hadoop and apache/spark images.

One-node Spark cluster

This is a Docker Compose definition of a single-node Hadoop + Spark + Livy cluster, intended for development or testing.

Example usage:

docker compose build
docker compose up -d

docker exec -it  /bin/bash
jonnyli1125 / rnn_transducer.py
Last active October 9, 2023 17:56
RNN Transducer in ~100 lines of NumPy code. Paper: https://arxiv.org/abs/1211.3711
from dataclasses import dataclass
import numpy as np
vocab = [None, 'a', 'b', 'c']
null_idx = 0
V = len(vocab)
@dataclass
class LSTMWeights:
    xi: list[list[float]]  # (V-1, V)
jonnyli1125 / neural_network_from_scratch.ipynb
Last active July 22, 2024 00:39
neural_network_from_scratch.ipynb
jonnyli1125 / tictactoe.py
Last active November 30, 2024 06:31
Self-play Reinforcement Learning (Q-Learning) for Tic Tac Toe in 100 lines of code
import argparse
import random
from collections import defaultdict
import numpy as np
from tqdm import tqdm
BOARD_SIZE = 3
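The preview ends before any learning code appears. For orientation, here is a minimal sketch of a tabular Q-learning update with epsilon-greedy action selection, the general technique named in the description. The constants `ALPHA`, `GAMMA`, `EPSILON`, the `q_table` layout, and both function names are assumptions for illustration. The gist's actual self-play logic (for example, how it treats the opponent's turn) is not shown in the preview and may differ.

```python
import random
from collections import defaultdict

ALPHA = 0.5    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration probability (assumed value)

# Q-table: state -> {action: value}; unseen entries default to 0.0.
q_table = defaultdict(lambda: defaultdict(float))


def choose_action(state, legal_actions, rng=random):
    """Epsilon-greedy choice among the legal actions (hypothetical helper)."""
    if rng.random() < EPSILON:
        return rng.choice(legal_actions)
    values = q_table[state]
    return max(legal_actions, key=lambda a: values[a])


def update(state, action, reward, next_state, next_legal_actions):
    """Standard single-agent Q-learning update (hypothetical helper)."""
    # A terminal next state has no legal actions, so its value is 0.
    best_next = max(
        (q_table[next_state][a] for a in next_legal_actions), default=0.0
    )
    target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (target - q_table[state][action])


# Two updates toward a terminal reward of 1.0 for the same move.
update('s0', 4, 1.0, 's1', [])
update('s0', 4, 1.0, 's1', [])
print(q_table['s0'][4])
```

States only need to be hashable, so a board could be encoded as a tuple or string before being used as a key.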