#A Collection of NLP notes
##N-grams
###Calculating unigram probabilities:
P( wi ) = count( wi ) / count( total number of words )
In English..
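The probability of a word is the number of times it appears divided by the total number of words in the text. A minimal sketch of that calculation follows. The function name `unigram_probabilities` and the lowercase, whitespace-based tokenization are assumptions made for illustration; they are not part of the original notes.

```python
from collections import Counter


def unigram_probabilities(text):
    """Return a dict mapping each word to count(word) / total word count."""
    # Assumed tokenizer: lowercase the text and split on whitespace.
    words = text.lower().split()
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


probs = unigram_probabilities("the cat sat on the mat")
print(probs["the"])  # "the" accounts for 2 of the 6 words
```

Because every word is counted exactly once, the probabilities over the whole vocabulary sum to 1.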
# usage: redfin-images "http://www.redfin.com/WA/Seattle/123-Home-Row-12345/home/1234567"
function redfin-images() {
  wget -O - $1 | grep "full:" | awk -F \" '{print $4}' | xargs wget -
}
#!/usr/bin/env fish
# similar script in Fish
# still under construction, need to quiet `git status` more effectively
function update -d 'Update git repo'
    git stash --quiet
    git pull
    git stash apply --quiet
end
#!/usr/bin/env PYTHONIOENCODING=utf-8 python
# encoding: utf-8
"""Git pre-commit hook which lints Python, JavaScript, SASS and CSS"""
from __future__ import absolute_import, print_function, unicode_literals
import os
import subprocess
import sys
This is what I did to get Python 3.4.2 on Ubuntu 14.04. The stock version of Python 3 on Ubuntu is 3.4.0, which is missing some of the best parts (asyncio, etc.). Luckily, I discovered pyenv, which solved my problem.
Pyenv (not to be confused with pyvenv) is the Python equivalent of rbenv. It lets you configure which Python environment/version is available per directory, per user, or through other session variables.
I followed the instructions here to install pyenv in my home directory. Verbatim, those instructions are:
sudo apt-get install git python-pip make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev
"""
The MIT License (MIT)
Copyright (c) 2015 Alec Radford
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
import random
import math
# Configure paths to your dataset files here
DATASET_FILE = 'data.csv'
FILE_TRAIN = 'train.csv'
FILE_VALID = 'validation.csv'
FILE_TESTS = 'test.csv'
# Set to true if you want to copy first line from main
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np
# data I/O
data = open('input.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
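A character-level model works on integer indices, not raw characters, so the vocabulary built above is usually paired with a lookup in each direction. The sketch below shows one way that could look. The helper names `build_vocab`, `encode`, and `decode` are hypothetical and are not taken from the original script, and sorting the characters is an assumption added here so the mapping is reproducible between runs.

```python
def build_vocab(text):
    """Map each distinct character to an integer index, and back again."""
    chars = sorted(set(text))  # sorted so the mapping is stable across runs
    char_to_ix = {ch: i for i, ch in enumerate(chars)}
    ix_to_char = {i: ch for i, ch in enumerate(chars)}
    return char_to_ix, ix_to_char


def encode(text, char_to_ix):
    """Turn a string into a list of integer indices."""
    return [char_to_ix[ch] for ch in text]


def decode(indices, ix_to_char):
    """Turn a list of integer indices back into a string."""
    return "".join(ix_to_char[i] for i in indices)


sample = "hello"
char_to_ix, ix_to_char = build_vocab(sample)
ids = encode(sample, char_to_ix)
print(ids, decode(ids, ix_to_char))
```

Encoding followed by decoding returns the original string, which is a quick sanity check on the two lookup tables.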
This is just a quick list of resources on topological data analysis (TDA) that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc. It is by no means an exhaustive list.
Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.
Mapper algorithm.