# coding: utf-8
from __future__ import division
import struct
import sys
FILE_NAME = "GoogleNews-vectors-negative300.bin"
MAX_VECTORS = 200000  # This script takes a lot of RAM (>2GB for 200K vectors); to use all 3M embeddings, you will probably need to insert the vectors into some kind of database
FLOAT_SIZE = 4  # 32-bit float
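The preview stops before the parsing loop. The sketch below shows one way to read the word2vec binary layout with `struct`. It assumes Python 3 (the snippet's `from __future__` import suggests Python 2), an ASCII header `"<vocab_size> <dim>\n"`, each word terminated by a space, and little-endian float32 vectors. `read_word2vec_bin` is a hypothetical helper, not the gist's own code.

```python
import io
import struct

FLOAT_SIZE = 4  # 32-bit float, as in the constant above


def read_word2vec_bin(f, max_vectors=None):
    """Yield (word, vector) pairs from a word2vec binary stream (assumed layout).

    Layout assumed: header line "<vocab_size> <dim>\\n", then per entry the
    word bytes terminated by a space, followed by `dim` little-endian float32
    values. Some writers add a newline after each vector, which is skipped.
    """
    vocab_size, dim = map(int, f.readline().split())
    if max_vectors is not None:
        vocab_size = min(vocab_size, max_vectors)  # e.g. MAX_VECTORS to limit RAM
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = f.read(1)
            if ch in (b" ", b""):  # space ends the word; b"" means end of file
                break
            if ch != b"\n":  # newline left over from the previous vector
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="replace")
        vector = struct.unpack("<%df" % dim, f.read(dim * FLOAT_SIZE))
        yield word, vector


if __name__ == "__main__":
    # Tiny in-memory file with two 3-dimensional vectors, for illustration only.
    demo = (b"2 3\n"
            + b"cat " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
            + b"dog " + struct.pack("<3f", 0.5, -1.0, 4.0) + b"\n")
    for word, vector in read_word2vec_bin(io.BytesIO(demo)):
        print(word, vector)
```

Because it is a generator, only one vector is held at a time; collecting the results into a dict is what drives the RAM cost the `MAX_VECTORS` comment warns about.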
# Tiny example of a 3-layer neural network with dropout in the 2nd hidden layer
# Output layer is linear with L2 cost (regression model)
# Hidden layer activation is tanh
import numpy as np
n_epochs = 100
n_samples = 100
n_in = 10
n_hidden = 5
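A minimal NumPy sketch of how such a network can be trained, under several assumptions not in the preview: a single tanh hidden layer between input and output (the comment's layer count is ambiguous), inverted dropout with a hypothetical keep probability `p_keep`, synthetic regression data, and full-batch gradient descent with a hypothetical `learning_rate`. Only `n_epochs`, `n_samples`, `n_in` and `n_hidden` come from the snippet.

```python
import numpy as np

rng = np.random.default_rng(0)

n_epochs, n_samples, n_in, n_hidden = 100, 100, 10, 5  # sizes from the snippet
p_keep = 0.5          # hypothetical probability of keeping a hidden unit
learning_rate = 0.05  # hypothetical step size

# Hypothetical regression data: a noisy linear target with one output.
X = rng.standard_normal((n_samples, n_in))
y = X @ rng.standard_normal((n_in, 1)) + 0.1 * rng.standard_normal((n_samples, 1))

W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 1))
b2 = np.zeros(1)


def cost(pred, target):
    """Halved mean L2 cost, so its gradient w.r.t. pred is (pred - target) / N."""
    return 0.5 * np.mean((pred - target) ** 2)


def predict(X):
    """Test-time forward pass: inverted dropout needs no rescaling here."""
    return np.tanh(X @ W1 + b1) @ W2 + b2


initial_cost = cost(predict(X), y)
for epoch in range(n_epochs):
    a = np.tanh(X @ W1 + b1)                        # hidden activations
    mask = (rng.random(a.shape) < p_keep) / p_keep  # drop units, rescale survivors
    h = a * mask                                    # hidden layer after dropout
    pred = h @ W2 + b2                              # linear output layer

    d_pred = (pred - y) / n_samples                 # gradient of the L2 cost
    dW2, db2 = h.T @ d_pred, d_pred.sum(axis=0)
    d_z = (d_pred @ W2.T) * mask * (1 - a ** 2)     # back through dropout and tanh
    dW1, db1 = X.T @ d_z, d_z.sum(axis=0)

    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

final_cost = cost(predict(X), y)
print("cost before: %.3f, after: %.3f" % (initial_cost, final_cost))
```

Inverted dropout is used here because it keeps `predict` identical to the training forward pass minus the mask; the classic alternative scales activations by `p_keep` at test time instead.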
# coding: utf-8
# A little demo illustrating the effect of momentum in neural network training.
# Try using different values for the MOMENTUM constant below (e.g. compare 0.0 with 0.9).
# This neural network is actually more like logistic regression, but I have used
# squared error to make the error surface more interesting.
import numpy as np
import pylab
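The preview ends before the update rule. As a hedged illustration of what the `MOMENTUM` constant controls, the sketch below applies classical (heavy-ball) momentum to a hypothetical ill-conditioned quadratic bowl rather than the gist's logistic unit, so the effect can be checked without plotting. `CURVATURE`, `learning_rate` and `n_steps` are assumptions.

```python
import numpy as np

# Hypothetical ill-conditioned bowl: loss(w) = 0.5 * sum(CURVATURE * w**2).
CURVATURE = np.array([1.0, 50.0])


def loss(w):
    return 0.5 * np.sum(CURVATURE * w ** 2)


def grad(w):
    return CURVATURE * w


def train(momentum, learning_rate=0.01, n_steps=100):
    """Gradient descent with classical momentum; returns the loss history."""
    w = np.array([1.0, 1.0])
    velocity = np.zeros_like(w)
    history = [loss(w)]
    for _ in range(n_steps):
        # Velocity is an exponentially decaying sum of past gradient steps,
        # so progress along the shallow direction keeps accumulating.
        velocity = momentum * velocity - learning_rate * grad(w)
        w = w + velocity
        history.append(loss(w))
    return history


for m in (0.0, 0.9):
    print("MOMENTUM = %.1f: final loss %.2e" % (m, train(m)[-1]))
```

With `momentum = 0.0` the shallow direction (curvature 1) shrinks by only a factor of 0.99 per step, while 0.9 lets it build up speed; the step size must still stay below `2 / 50` for the steep direction to remain stable.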
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
# Replace X with your own data if you wish
##########################################
N = 300  # num samples
# coding: utf-8
from __future__ import division
import struct
import sys
import gzip
FILE_NAME = "GoogleNews-vectors-negative300.bin.gz"  # outputs GoogleNews-vectors-negative300.bin.gz.txt
MAX_VECTORS = 100000  # Top words to take
FLOAT_SIZE = 4  # 32-bit float
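A minimal sketch of the conversion this preview sets up: `gzip.open` decompresses the `.bin.gz` stream on the fly, and the first `MAX_VECTORS` entries are written as text. It assumes Python 3, the same binary layout as the word2vec format (space-terminated words, little-endian float32 values), and an output header line `"<count> <dim>"` with six decimal places per value; the header and number formatting are assumptions, and `bin_gz_to_txt` is a hypothetical name.

```python
import gzip
import struct

FLOAT_SIZE = 4  # 32-bit float


def bin_gz_to_txt(in_path, out_path, max_vectors):
    """Write the first `max_vectors` entries of a gzipped word2vec binary
    file as text: a "<count> <dim>" header, then "word v1 ... vdim" lines."""
    with gzip.open(in_path, "rb") as fin, open(out_path, "w", encoding="utf-8") as fout:
        vocab_size, dim = map(int, fin.readline().split())
        count = min(vocab_size, max_vectors)
        fout.write("%d %d\n" % (count, dim))
        for _ in range(count):
            word = b""
            while True:
                ch = fin.read(1)
                if ch in (b" ", b""):  # space ends the word; b"" means EOF
                    break
                if ch != b"\n":  # skip newline after the previous vector
                    word += ch
            values = struct.unpack("<%df" % dim, fin.read(dim * FLOAT_SIZE))
            fout.write(word.decode("utf-8", errors="replace") + " "
                       + " ".join("%.6f" % v for v in values) + "\n")
```

Called as `bin_gz_to_txt(FILE_NAME, FILE_NAME + ".txt", MAX_VECTORS)`, this would produce the `.bin.gz.txt` file named in the comment above while streaming, so memory use stays small regardless of `MAX_VECTORS`.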
# coding: utf-8
from __future__ import division
from nltk.tokenize import word_tokenize
import models
import data
import theano
import tornado.ioloop