Kyle Gorman kylebgorman
@kylebgorman
kylebgorman / LING78100-lecture01.ipynb
Last active September 12, 2018 16:05
LING78100 Lecture 1
kylebgorman / anagram.py
Created July 8, 2018 13:31
Estimates the ambiguity of anagrammed English
#!/usr/bin/env python
import collections
import functools
import itertools
import re
# Grab this like so:
# curl -O http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b
kylebgorman / interpret.py
Created July 6, 2018 18:18
Prints human-readable summary of an FST channel model
#!/usr/bin/env python
"""Prints human-readable summary of an FST channel model."""
import math
import sys
import unicodedata
import pywrapfst
kylebgorman / spanish.tsv
Last active June 27, 2018 00:29
Spanish g2p covering grammar
## Spanish g2p covering grammar, adapted from:
##
## https://en.wikipedia.org/wiki/Spanish_orthography
## https://en.wikipedia.org/wiki/Spanish_phonology
##
## We don't encode any conditioning information here, though it's present in the
## articles.
b b
b β
c θ
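The covering grammar above pairs each grapheme with one phone it may map to, so a grapheme such as `b` appears on several rows. A minimal sketch of loading such a file into a lookup table follows. The function name `read_covering_grammar` is hypothetical, and the sketch assumes that columns are whitespace-separated and that lines starting with `##` are comments, as in the preview.

```python
import collections


def read_covering_grammar(lines):
    """Maps each grapheme to the set of phones it may correspond to.

    Hypothetical helper, not part of the original gist.
    """
    grammar = collections.defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skips blank lines and "##" comment lines.
        if not line or line.startswith("##"):
            continue
        # Assumption: the first whitespace run separates the two columns.
        grapheme, phone = line.split(None, 1)
        grammar[grapheme].add(phone)
    return dict(grammar)


sample = [
    "## Spanish g2p covering grammar",
    "b\tb",
    "b\tβ",
    "c\tθ",
]
grammar = read_covering_grammar(sample)
```

Storing sets keeps the grammar non-deterministic: every listed phone remains a candidate for its grapheme, which matches the note that no conditioning information is encoded.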
kylebgorman / function_words.py
Created June 22, 2018 18:57
Function words
"""English function words.
Sets of English function words, based on
E.O. Selkirk. 1984. Phonology and syntax: The relationship between
sound and structure. Cambridge: MIT Press. (p. 352f.)
The categories are of my own creation.
"""
kylebgorman / normcheck.py
Created May 27, 2018 20:49
Checks encoding/normalization on a file
#!/usr/bin/env python
"""Applies a given normalization form to file and detects changes.
This script reads text files line by line, decoding them into Unicode using a
specified encoding (by default, UTF-8), and then applying a specified Unicode
normalization (by default, NFC). If, for any line, this normalization is not a
no-op (i.e., if it changes the line), it logs a fatal error with the filename and
affected line number.
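The core check described above can be sketched with the standard library's `unicodedata.normalize`. This is a simplified illustration, not the gist's code: the function name `changed_lines` is hypothetical, the input is assumed to be already decoded strings, and the file handling and logging are omitted.

```python
import unicodedata


def changed_lines(lines, form="NFC"):
    """Yields 1-based numbers of lines that normalization would change.

    Hypothetical helper; the original script also decodes files and logs.
    """
    for linenum, line in enumerate(lines, 1):
        # A line is flagged when normalization is not a no-op for it.
        if unicodedata.normalize(form, line) != line:
            yield linenum


sample = [
    "cafe\u0301\n",  # "e" followed by a combining acute accent.
    "caf\u00e9\n",   # Precomposed "é".
]
nfc_changes = list(changed_lines(sample, "NFC"))
nfd_changes = list(changed_lines(sample, "NFD"))
```

Here NFC flags only the first line, which uses a combining accent, while NFD flags only the second, which uses the precomposed character.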
kylebgorman / dsfst_template.py
Created March 18, 2018 17:40
DSFST template building in Pynini
"""Delimited subsequential finite-state transducer template."""
import pynini
EPSILON = 0
LEFT_DELIMITER = 2 # [STX].
RIGHT_DELIMITER = 3 # [ETX].
kylebgorman / EditTransducerTutorial.ipynb
Created January 25, 2018 19:58
Python string processing with edit transducers
kylebgorman / z408.py
Last active May 25, 2021 14:46
Zodiac cipher 408: freestanding Python 3 script for converting the plaintext and ciphertext to OpenFst assets
#!/usr/bin/env python
#
# Constructs resources for Zodiac cipher 408:
#
# * Plaintext and ciphertext FARs
# * Unweighted "key" FSTs and "channel" (hypothesis space) FSTs
# * A textual symbol table for plaintext and ciphertext
#
# Requires: Pynini and OpenFst with the FAR extension.
kylebgorman / rer.c
Last active July 13, 2021 12:29
Relative error reduction calculation
// Computes relative error reduction given two percentages.
//
// This computes relative error reduction (RER) given two percentages, the
// "before" and "after" accuracy.
//
// This is given by:
//
// RER = 1 - (1 - new_accuracy) / (1 - old_accuracy)
//
// To compile: gcc -O3 -std=c99 -o rer rer.c
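For comparison, the RER formula above can be sketched in Python. The function name `relative_error_reduction` is hypothetical. Because the inputs are assumed to be percentages, the sketch uses `100 - accuracy` as the error rate, which is equivalent to `1 - accuracy` when accuracies are expressed as fractions.

```python
def relative_error_reduction(old_accuracy, new_accuracy):
    """Computes RER from "before" and "after" accuracies given as percentages.

    Hypothetical helper, not a translation of the original C program.
    """
    old_error = 100.0 - old_accuracy
    new_error = 100.0 - new_accuracy
    if old_error == 0:
        # No errors to reduce, so the ratio is undefined.
        raise ValueError("old accuracy is already 100%")
    return 1.0 - new_error / old_error


# Going from 90% to 95% accuracy halves the error rate (10% to 5%).
halved = relative_error_reduction(90.0, 95.0)
unchanged = relative_error_reduction(80.0, 80.0)
```

A negative result indicates that the error rate increased.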