Loreto Parisi loretoparisi

#!/bin/bash
set -e
CONTENTS=$(tesseract -c language_model_penalty_non_dict_word=0.8 --tessdata-dir /usr/local/share/ "$1" stdout -l eng | xml esc)
hex=$((cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
inp = Input(shape=(maxlen,), name="text_input") # featurized text comes in here
x = Embedding(embedding_matrix.shape[0], embed_size, weights=[embedding_matrix], trainable=True)(inp)
x = Dense(some_num_here, activation="relu")(x)
extra_data = Input(shape=(1,), name="extra_data") # your continuous features come in here
combined = concatenate([x, extra_data])
# maybe some ReLU + Dropout here
@loretoparisi
loretoparisi / gensim_word2vec_procrustes_align.py
Created January 3, 2018 20:36 — forked from quadrismegistus/gensim_word2vec_procrustes_align.py
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <[email protected]>.
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
"""Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <[email protected]>.
(With help from William. Thank you!)
First, intersect the vocabularies (see `intersection_align_gensim` documentation).
Then do the alignment on the other_embed model.
Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
Return other_embed.
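The alignment step described in the docstring above can be sketched with plain NumPy. This is a minimal illustration, not the gist's own code: it assumes the two vocabularies have already been intersected so that row i of both matrices refers to the same word, it works on bare arrays rather than gensim models, and the function name `orthogonal_procrustes_align` and the synthetic data are hypothetical.

```python
import numpy as np


def orthogonal_procrustes_align(base_vecs, other_vecs):
    """Rotate other_vecs onto base_vecs with an orthogonal matrix.

    Both inputs are (n_words, dim) arrays whose rows are assumed to be
    in the same word order (hypothetical setup for this sketch).
    Returns the rotated copy of other_vecs and the rotation itself.
    """
    # Cross-covariance between the two embedding spaces.
    m = other_vecs.T @ base_vecs
    # The SVD of the cross-covariance yields the orthogonal matrix that
    # minimizes the Frobenius norm of (other_vecs @ rotation - base_vecs).
    u, _, vt = np.linalg.svd(m)
    rotation = u @ vt
    return other_vecs @ rotation, rotation


# Synthetic check: build "other" as a rotated copy of "base".
rng = np.random.default_rng(0)
base = rng.standard_normal((50, 5))
q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random orthogonal matrix
other = base @ q

aligned, rotation = orthogonal_procrustes_align(base, other)
```

Because the rotation is orthogonal, it preserves distances and cosine similarities within the rotated space, so only the orientation relative to the base space changes.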
@loretoparisi
loretoparisi / 00.howto_install_phantomjs.md
Created April 7, 2016 12:39 — forked from julionc/00.howto_install_phantomjs.md
How to install PhantomJS on Debian/Ubuntu

How to install PhantomJS on Ubuntu

Version: 1.9.8

Platform: x86_64

First, install or update to the latest system software.

sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev
@loretoparisi
loretoparisi / example.js
Created April 6, 2016 10:15 — forked from domenic/example.js
Promise chaining example
// `promise` is some operation that may succeed (fulfill) or fail (reject)
var newPromise = promise.then(
function () {
return delay(1000);
},
writeError
);
// If `promise` fulfills, `newPromise` will fulfill in 1000 ms.
// If `promise` rejects and writing to the error log succeeds,
#!/bin/bash
PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
if [[ $EUID -ne 0 ]]; then
echo "You must be a root user" >&2
exit 1
else
apt-get update
apt-get install -y build-essential chrpath libssl-dev libxft-dev
apt-get install -y libfreetype6 libfreetype6-dev
apt-get install -y libfontconfig1 libfontconfig1-dev
@loretoparisi
loretoparisi / README
Created March 6, 2016 23:11
netflix-prize
SUMMARY
================================================================================
This dataset was constructed to support participants in the Netflix Prize. See
http://www.netflixprize.com for details about the prize.
The movie rating files contain over 100 million ratings from 480 thousand
randomly-chosen, anonymous Netflix customers over 17 thousand movie titles. The
data were collected between October, 1998 and December, 2005 and reflect the
distribution of all ratings received during this period. The ratings are on a