- word2vec https://arxiv.org/abs/1310.4546
- sentence2vec, paragraph2vec, doc2vec http://arxiv.org/abs/1405.4053
- tweet2vec http://arxiv.org/abs/1605.03481
- tweet2vec https://arxiv.org/abs/1607.07514
- author2vec http://dl.acm.org/citation.cfm?id=2889382
- item2vec http://arxiv.org/abs/1603.04259
- lda2vec https://arxiv.org/abs/1605.02019
- illustration2vec http://dl.acm.org/citation.cfm?id=2820907
#!/bin/bash
set -e
CONTENTS=$(tesseract -c language_model_penalty_non_dict_word=0.8 --tessdata-dir /usr/local/share/ "$1" stdout -l eng | xml esc)
hex=$((cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
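# Imports assume standalone Keras; with TensorFlow, import from tensorflow.keras.layers instead.
from keras.layers import Input, Embedding, Dense, GlobalMaxPooling1D, concatenate

inp = Input(shape=(maxlen,), name="text_input")  # featurized text comes in here
x = Embedding(embedding_matrix.shape[0], embed_size, weights=[embedding_matrix], trainable=True)(inp)
x = Dense(some_num_here, activation="relu")(x)
# Assumption: pool over the sequence axis so x is 2-D and can be concatenated with extra_data.
x = GlobalMaxPooling1D()(x)
extra_data = Input(shape=(1,), name="extra_data")  # your continuous features come in here
combined = concatenate([x, extra_data])
# maybe some ReLU + Dropout here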
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
    (With help from William. Thank you!)
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
    Return other_embed.
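
The preview above is cut off before the function body, so the alignment step is not visible. As a rough illustration only, the following is a minimal NumPy sketch of orthogonal Procrustes alignment, assuming the two matrices already hold unit-normalized vectors for the same words in the same row order (that is, the vocabulary intersection has been done). The names `procrustes_rotation` and `unit_rows` and the toy data are hypothetical and are not taken from the gist or from gensim.

```python
import numpy as np


def unit_rows(mat):
    """Scale every row of `mat` to unit Euclidean length."""
    return mat / np.linalg.norm(mat, axis=1, keepdims=True)


def procrustes_rotation(base_vecs, other_vecs):
    """Return the orthogonal matrix R that minimizes ||other_vecs @ R - base_vecs||_F.

    Both inputs must have shape (n_words, dim), with row i describing the same word.
    """
    # The SVD of the cross-covariance matrix gives the optimal rotation.
    m = other_vecs.T @ base_vecs
    u, _, vt = np.linalg.svd(m)
    return u @ vt


# Toy data (hypothetical): `other` is an exactly rotated copy of `base`.
rng = np.random.default_rng(0)
base = unit_rows(rng.normal(size=(50, 8)))
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
other = base @ q

rotation = procrustes_rotation(base, other)
aligned = other @ rotation  # mapped into the coordinate system of `base`
```

Because the toy `other` matrix is an exact rotation of `base`, `aligned` matches `base` up to floating-point error. With two independently trained models the fit is only approximate, and the remaining per-word distance is what a cross-model comparison would look at.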
// `promise` is some operation that may succeed (fulfill) or fail (reject)
var newPromise = promise.then(
  function () {
    return delay(1000);
  },
  writeError
);
// If `promise` fulfills, `newPromise` will fulfill in 1000 ms.
// If `promise` rejects and writing to the error log succeeds,
#!/bin/bash
PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
if [[ $EUID -ne 0 ]]; then
  # Send the error message to stderr
  echo "You must be a root user" >&2
  exit 1
else
  apt-get update
  apt-get install -y build-essential chrpath libssl-dev libxft-dev
  apt-get install -y libfreetype6 libfreetype6-dev
  apt-get install -y libfontconfig1 libfontconfig1-dev
SUMMARY
================================================================================
This dataset was constructed to support participants in the Netflix Prize. See
http://www.netflixprize.com for details about the prize.
The movie rating files contain over 100 million ratings from 480 thousand
randomly-chosen, anonymous Netflix customers over 17 thousand movie titles. The
data were collected between October, 1998 and December, 2005 and reflect the
distribution of all ratings received during this period. The ratings are on a