Cpop cpoptic

# A Collection of NLP notes

## N-grams

### Calculating unigram probabilities:

P(w_i) = count(w_i) / count(total number of words)
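
As a rough illustration of the formula above, here is a minimal sketch in Python. The function name `unigram_probabilities`, the toy corpus, and the lowercase-and-split-on-whitespace tokenization are all assumptions made for this example, not part of the original notes.

```python
from collections import Counter


def unigram_probabilities(tokens):
    """Map each word to count(word) / total number of words."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # No words, so no probabilities to report.
        return {}
    return {word: count / total for word, count in counts.items()}


# Hypothetical toy corpus; tokenization is a plain lowercase + whitespace split.
corpus = "the cat sat on the mat".lower().split()
probs = unigram_probabilities(corpus)

# "the" occurs 2 times out of 6 words, so P(the) = 2/6.
print(probs["the"])
```

Because every count is divided by the same total, the probabilities over the whole vocabulary sum to 1.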

In English...

This is what I did to get Python 3.4.2 on Ubuntu 14.04. The stock version of Python 3 on Ubuntu is 3.4.0, which is missing some of the best parts (asyncio, etc.). Luckily, I discovered pyenv, which solved my problem.

Install pyenv

Pyenv (not to be confused with pyvenv) is the Python equivalent of rbenv. It lets you configure which Python environment/version is available per directory, user, or other session variables.

I followed the instructions here to install pyenv in my home directory. Verbatim, those instructions are:

sudo apt-get install git python-pip make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is just a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc. It is by no means an exhaustive list.

Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.

Topology and Data by Gunnar Carlsson
Barcodes: The Persistent Topology of Data by Robert Ghrist

Other Papers and Web Resources

Extracting insights from the shape of complex data using topology: a good introductory paper in Nature on the Mapper algorithm.

	#!/usr/bin/env fish
	# similar script in Fish
	# still under construction, need to quiet `git status` more effectively

	function update -d 'Update git repo'
		git stash --quiet
		git pull
		git stash apply --quiet
	end

	#!/usr/bin/env PYTHONIOENCODING=utf-8 python
	# encoding: utf-8
	"""Git pre-commit hook which lints Python, JavaScript, SASS and CSS"""

	from __future__ import absolute_import, print_function, unicode_literals

	import os
	import subprocess
	import sys

	"""
	The MIT License (MIT)

	Copyright (c) 2015 Alec Radford

	Permission is hereby granted, free of charge, to any person obtaining a copy
	of this software and associated documentation files (the "Software"), to deal
	in the Software without restriction, including without limitation the rights
	to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
	copies of the Software, and to permit persons to whom the Software is

	import random
	import math

	# Configure paths to your dataset files here
	DATASET_FILE = 'data.csv'
	FILE_TRAIN = 'train.csv'
	FILE_VALID = 'validation.csv'
	FILE_TESTS = 'test.csv'

	# Set to true if you want to copy first line from main

	"""
	Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
	BSD License
	"""
	import numpy as np

	# data I/O
	data = open('input.txt', 'r').read() # should be simple plain text file
	chars = list(set(data))
	data_size, vocab_size = len(data), len(chars)

	# usage: redfin-images "http://www.redfin.com/WA/Seattle/123-Home-Row-12345/home/1234567"
	function redfin-images() {
		wget -O - "$1" | grep "full:" | awk -F \" '{print $4}' | xargs wget -
	}

	library(MASS) # Boston dataset
	library(neuralnet) # Neuralnet
	library(plyr) # Progress bar

	#-------------------------------------------------------------------------------
	# Cross validating function

	crossvalidate <- function(data,hidden_l=c(5))
	{
	# @params