@sidgan
sidgan / gist:2a61575c6027262f80e4036696dc1cea
Last active January 29, 2018 04:19 — forked from ttezel/gist:4138642
Natural Language Processing Notes

# A Collection of NLP notes

## N-grams

### Calculating unigram probabilities:

P( wi ) = count( wi ) / count( total number of words )

In English..
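
A minimal sketch of the unigram estimate above, assuming the text has already been split into tokens. The function name `unigram_probabilities` and the toy corpus are illustrative and not part of the original notes.

```python
from collections import Counter


def unigram_probabilities(tokens):
    """Return P(w) = count(w) / total number of tokens for every word w."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


# Hypothetical toy corpus, whitespace-tokenised for illustration only.
corpus = "the cat sat on the mat".split()
probs = unigram_probabilities(corpus)
print(probs["the"])  # 2 of the 6 tokens are "the"
```

Because every token is counted exactly once, the probabilities over the vocabulary sum to 1.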

sidgan / pascalvoc.py
Created November 2, 2017 22:46 — forked from kastnerkyle/pascalvoc.py
Wrapper to read pascal voc data
# Author: Kyle Kastner
# License: BSD 3-Clause
# For a reference on parallel processing in Python see tutorial by David Beazley
# http://www.slideshare.net/dabeaz/an-introduction-to-python-concurrency
# Loosely based on IBM example
# http://www.ibm.com/developerworks/aix/library/au-threadingpython/
# If you want to download all the PASCAL VOC data, use the following in bash...
"""
#! /bin/bash
# 2008
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2008/VOCtrainval_14-Jul-2008.tar
# 2009
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2009/VOCtrainval_11-May-2009.tar
# 2010
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar
# 2011
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2011/VOCtrainval_25-May-2011.tar
# 2012
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
# Latest devkit
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCdevkit_18-May-2011.tar
"""
try:
    import Queue
except ImportError:
    import queue as Queue
import threading
import ti
sidgan / gist:75cb2a53b065549e1cf3c1b42988033c
Created October 8, 2016 16:59 — forked from rygorous/gist:9124356
On "Understanding Sources of Inefficiency in General-Purpose Chips"
My problems with the paper:
- There is no comparison of resulting video quality. The amount of encode time (and power
expended) to produce an H.264 bit stream *dramatically* depends on the desired quality level;
e.g. for x264 (state of the art SW encoder, already in 2010 when the paper was written), the
difference between the fastest and best quality settings is close to 2 orders of magnitude
in both speed and power use. This is not negligible!
[NOTE: This is excluding quality-presets like "placebo", which are more demanding still.
Even just comparing between different settings usable for real-time encoding, we still have
at least an order of magnitude difference.]
- They have their encoder, which is apparently based on JM 8.6 (*not* a good encoder!), for
sidgan / KeyValueMemNN.md
Created July 5, 2016 21:52 — forked from shagunsodhani/KeyValueMemNN.md
Summary of paper "Key-Value Memory Networks for Directly Reading Documents"

Key-Value Memory Networks for Directly Reading Documents

Introduction

  • Knowledge Bases (KBs) are effective tools for Question Answering (QA) but are often too restrictive (due to fixed schema) and too sparse (due to limitations of Information Extraction (IE) systems).
  • The paper proposes Key-Value Memory Networks, a neural network architecture based on Memory Networks that can leverage both KBs and raw data for QA.
  • The paper also introduces MOVIEQA, a new QA dataset whose questions can be answered from a perfect KB, from Wikipedia pages, or from an imperfect KB obtained using IE techniques, thereby allowing a comparison between systems using any of the three sources.
  • Link to the paper.
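
As a rough illustration of the key-value idea introduced above, the sketch below addresses memory slots by the dot product between a query and each key, then returns the softmax-weighted sum of the corresponding values. It is a simplified single read under stated assumptions: the learned embedding matrices and the repeated query updates of the full model are left out, and the names `read_memory`, `keys`, and `values`, along with the toy vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, keys, values):
    """Address slots by query-key dot product, then take a weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Made-up 2-D vectors; a real model would learn these embeddings from data.
keys = [(1.0, 0.0), (0.0, 1.0)]
values = [(10.0, 0.0), (0.0, 10.0)]
weights, output = read_memory((5.0, 0.0), keys, values)
print(weights, output)
```

The query here points along the first key, so almost all of the attention weight lands on the first slot and the output is dominated by the first value.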

Related Work

tmux cheatsheet

As configured in my dotfiles.

start new:

tmux

start new with session name:

sidgan / gist:735cc2c4ee1b1e32edcc700c84527003
Created April 2, 2016 06:17 — forked from arvearve/gist:4158578
Mathematics: What do grad students in math do all day?

by Yasha Berchenko-Kogan

A lot of math grad school is reading books and papers and trying to understand what's going on. The difficulty is that reading math is not like reading a mystery thriller, and it's not even like reading a history book or a New York Times article.

The main issue is that, by the time you get to the frontiers of math, the words to describe the concepts don't really exist yet. Communicating these ideas is a bit like trying to explain a vacuum cleaner to someone who has never seen one, except you're only allowed to use words that are four letters long or shorter.

What can you say?

sidgan / moods.md
Created October 31, 2015 02:08 — forked from kylemcdonald/moods.md
List of moods ordered by solving a TSP (travelling salesman problem) using Euclidean distance in word2vec space.

old rushed dashed squashed crushed smothered suffocated trapped rescued saved
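
One way to produce an ordering like this is sketched below, using a greedy nearest-neighbour heuristic as a stand-in for a TSP solver. This is an assumption for illustration and not necessarily how the original list was generated. The 2-D coordinates are invented placeholders for real word2vec embeddings, and `nearest_neighbor_order` is a hypothetical helper name.

```python
import math


def nearest_neighbor_order(vectors, start):
    """Greedy TSP heuristic: repeatedly step to the closest unvisited word."""
    order = [start]
    remaining = set(vectors) - {start}
    while remaining:
        last = vectors[order[-1]]
        # Break distance ties alphabetically so the result is deterministic.
        nxt = min(remaining, key=lambda w: (math.dist(last, vectors[w]), w))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Made-up 2-D vectors standing in for word2vec embeddings.
toy_vectors = {
    "old": (0.0, 0.0),
    "rushed": (1.0, 0.0),
    "dashed": (2.0, 0.0),
    "squashed": (3.0, 1.0),
    "crushed": (4.0, 1.0),
}
print(nearest_neighbor_order(toy_vectors, "old"))
```

The greedy tour is generally not the shortest possible one, but it keeps neighbouring words close in the embedding space, which is the property that makes the sorted list read smoothly.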