quadrismegistus / gensim_word2vec_procrustes_align.py

Last active November 16, 2023 01:57

Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton. [NOTE: This code is DEPRECATED for the latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…

	def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
	    """Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
	    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
	    (With help from William. Thank you!)

	    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
	    Then do the alignment on the other_embed model.
	    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
	    Return other_embed.
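
The alignment step described in the docstring above can be sketched with plain NumPy. This is a minimal illustration of orthogonal Procrustes, not the gist's actual code: the function name `procrustes_rotation` and the toy matrices are hypothetical, and the sketch assumes both matrices already share the same row order (the same intersected vocabulary).

```python
import numpy as np


def procrustes_rotation(base_vecs, other_vecs):
    """Return the orthogonal matrix R minimising ||other_vecs @ R - base_vecs||.

    Hypothetical helper: both inputs are (n_words, dim) arrays whose rows
    refer to the same words in the same order.
    """
    # Cross-covariance between the two embedding spaces.
    m = other_vecs.T @ base_vecs
    # The SVD of the cross-covariance gives the optimal rotation U @ V^T.
    u, _, vt = np.linalg.svd(m)
    return u @ vt


# Toy data: "other" is an exactly rotated copy of "base".
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 8))
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
other = base @ true_rotation.T

# Rotating "other" by the recovered matrix maps it back onto "base".
aligned = other @ procrustes_rotation(base, other)
```

With real models, the rows would come from the two models' vector matrices restricted to the shared vocabulary, and the aligned matrix would replace the second model's vectors.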

mbollmann / state_transfer_lstm.py

Created June 18, 2016 08:59

StateTransferLSTM for Keras 1.x

	# Source:
	# https://github.com/farizrahman4u/seq2seq/blob/master/seq2seq/layers/state_transfer_lstm.py

	from keras import backend as K
	from keras.layers.recurrent import LSTM

	class StateTransferLSTM(LSTM):
	    """LSTM with the ability to transfer its hidden state.

	    This layer behaves just like an LSTM, except that it can transfer (or

basaundi / multi_bleu.py

Last active September 20, 2020 07:28

Python rewrite of Moses' multi-bleu.perl; usable as a library

	#!/usr/bin/env python
	# Ander Martinez Sanchez

	from __future__ import division, print_function
	from math import exp, log
	from collections import Counter


	def ngram_count(words, n):
	    if n <= len(words):

MajorTal / add_spelling_noise.py

Created March 24, 2016 08:32

	from numpy.random import choice as random_choice, randint as random_randint, rand
	MAX_INPUT_LEN = 40
	AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN
	CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")

	def add_noise_to_string(a_string, amount_of_noise):
	    """Add some artificial spelling mistakes to the string"""
	    if rand() < amount_of_noise * len(a_string):
	        # Replace a character with a random character
	        random_char_position = random_randint(len(a_string))

odashi / mert.py

Last active May 1, 2016 14:17

Minimum error-rate training for statistical machine translation

	#!/usr/bin/python3

	import math
	import random
	import sys
	from argparse import ArgumentParser
	from collections import defaultdict
	from util.functions import trace

	def parse_args():

odashi / bleu.py

Last active September 20, 2019 06:46

BLEU calculator

	# usage (single sentence):
	# ref = ['This', 'is', 'a', 'pen', '.']
	# hyp = ['There', 'is', 'a', 'pen', '.']
	# stats = get_bleu_stats(ref, hyp)
	# bleu = calculate_bleu(stats) # => 0.668740
	#
	# usage (multiple sentences):
	# stats = defaultdict(int)
	# for ref, hyp in zip(refs, hyps):
	#     for k, v in get_bleu_stats(ref, hyp).items():
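
The single-sentence usage in the comments above can be reproduced with a short standard-library sketch. The functions `bleu_stats_sketch` and `bleu_from_stats_sketch` are hypothetical stand-ins, not the gist's `get_bleu_stats` and `calculate_bleu`; the sketch assumes standard BLEU with n-grams up to order 4, clipped matches, and a brevity penalty.

```python
from collections import Counter
from math import exp, log


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_stats_sketch(ref, hyp, max_n=4):
    """Collect lengths plus clipped n-gram matches for one sentence pair."""
    stats = Counter()
    stats["ref_len"] = len(ref)
    stats["hyp_len"] = len(hyp)
    for n in range(1, max_n + 1):
        ref_ngrams = ngram_counts(ref, n)
        hyp_ngrams = ngram_counts(hyp, n)
        # Clipped matches: a hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        stats[("match", n)] = sum((hyp_ngrams & ref_ngrams).values())
        stats[("total", n)] = sum(hyp_ngrams.values())
    return stats


def bleu_from_stats_sketch(stats, max_n=4):
    """Geometric mean of the n-gram precisions times the brevity penalty."""
    if any(stats[("match", n)] == 0 for n in range(1, max_n + 1)):
        return 0.0
    log_precision = sum(
        log(stats[("match", n)] / stats[("total", n)])
        for n in range(1, max_n + 1)
    ) / max_n
    # The penalty only applies when the hypothesis is shorter than the reference.
    brevity = min(0.0, 1.0 - stats["ref_len"] / stats["hyp_len"])
    return exp(log_precision + brevity)


ref = ['This', 'is', 'a', 'pen', '.']
hyp = ['There', 'is', 'a', 'pen', '.']
bleu = bleu_from_stats_sketch(bleu_stats_sketch(ref, hyp))  # about 0.668740
```

For this pair the n-gram precisions are 4/5, 3/4, 2/3 and 1/2, whose product is 0.2, so the score is the fourth root of 0.2. For a corpus-level score, the per-sentence counters could be summed with `Counter.update` before the final call.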

entron / imdb_cnn_kim_small_embedding.py

Last active September 16, 2023 16:23

Keras implementation of Kim's paper "Convolutional Neural Networks for Sentence Classification" with a very small embedding size. The test accuracy is 0.853.

	'''This script implements Kim's paper "Convolutional Neural Networks for Sentence Classification"
	with a much smaller embedding size (20) than the commonly used values (100 - 300), as it gives better
	results with far fewer parameters.

	Run on GPU: THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python imdb_cnn.py

	Get to 0.853 test accuracy after 5 epochs. 13s/epoch on Nvidia GTX980 GPU.
	'''

	from __future__ import print_function

odashi / chainer_encoder_decoder.py

Last active January 2, 2025 19:25

Training and generation processes for neural encoder-decoder machine translation.

	#!/usr/bin/python3

	import datetime
	import sys
	import math
	import numpy as np
	from argparse import ArgumentParser
	from collections import defaultdict

	from chainer import FunctionSet, Variable, functions, optimizers

kachayev / concurrency-in-go.md

Last active May 4, 2025 05:48

Channels Are Not Enough or Why Pipelining Is Not That Easy

Channels Are Not Enough

... or Why Pipelining Is Not That Easy

Golang Concurrency Patterns for the brave and smart.

By @kachayev

Intro

syllog1sm / gist:10343947

Last active September 19, 2024 23:54

A simple Python dependency parser

	"""A simple implementation of a greedy transition-based parser. Released under BSD license."""
	from os import path
	import os
	import sys
	from collections import defaultdict
	import random
	import time
	import pickle

	SHIFT = 0; RIGHT = 1; LEFT = 2;

jim geovedi geovedi
