- Knowledge Bases (KBs) are effective tools for Question Answering (QA) but are often too restrictive (due to fixed schema) and too sparse (due to limitations of Information Extraction (IE) systems).
- The paper proposes Key-Value Memory Networks, a neural network architecture based on Memory Networks that can leverage both KBs and raw data for QA.
- The paper also introduces MOVIEQA, a new QA dataset whose questions can be answered using a perfect KB, using Wikipedia pages, or using an imperfect KB obtained with IE techniques, thereby allowing a direct comparison between systems that use any of the three sources.
- Link to the paper.
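The core read operation of a Key-Value Memory Network addresses memories by their keys and returns a softmax-weighted sum of the corresponding values. A minimal NumPy sketch of that single read step (toy random embeddings, not the paper's trained model):

```python
import numpy as np

def kv_memory_read(query, keys, values):
    """One key-value memory read: score each key against the query,
    softmax the scores, and return the weighted sum of the values."""
    scores = keys @ query                  # similarity of query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over memory slots
    return weights @ values                # weighted sum of value vectors

# Toy example: 3 memory slots with 4-dimensional embeddings.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 4))
q = rng.normal(size=4)
out = kv_memory_read(q, keys, values)
```

In the paper this step is iterated over several "hops", updating the query between reads; the sketch shows only a single hop.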
```python
#!/usr/bin/env python
"""
Twitter's API doesn't allow you to get replies to a particular tweet. Strange
but true. But you can use Twitter's Search API to search for tweets that are
directed at a particular user, and then search through the results to see if
any are replies to a given tweet. You probably are also interested in the
replies to any replies as well, so the process is recursive. The big caveat
here is that the search API only returns results for the last 7 days. So
```
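The recursive filtering the docstring describes can be sketched independently of the API itself. Here `fetch_mentions` is a hypothetical stand-in for the real Search API call (which would query tweets directed at a user); the recursion logic is the point:

```python
def get_replies(tweet_id, screen_name, fetch_mentions):
    """Recursively collect replies to tweet_id.

    fetch_mentions is a hypothetical stand-in for the Search API call:
    given a screen name, it returns tweets directed at that user as
    dicts with 'id', 'user', and 'in_reply_to_status_id' keys.
    """
    replies = []
    for tweet in fetch_mentions(screen_name):
        if tweet.get("in_reply_to_status_id") == tweet_id:
            replies.append(tweet)
            # Replies to this reply: recurse on the replier's mentions.
            replies.extend(get_replies(tweet["id"], tweet["user"], fetch_mentions))
    return replies
```

With the real API the same caveat applies: only mentions from the last 7 days are searchable, so older branches of the reply tree are simply invisible.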
- Create or find a gist that you own.
- Clone your gist (replace `<hash>` with your gist's hash):

```shell
# with ssh
git clone git@gist.github.com:<hash>.git mygist
# with https
git clone https://gist.github.com/<hash>.git mygist
```
Taught by Brad Knox at the MIT Media Lab in 2014. Course website. Lecture and visiting speaker notes.
- Power to the People: The Role of Humans in Interactive Machine Learning by Knox, Cakmak, Kulesza, Amershi, and Lau
- A Few Useful Things to Know about Machine Learning by Domingos
- Machine Learning that Matters by Wagstaff
- Beyond Concise and Colorful: Learning Intelligible Rules by Pazzani et al.
- [Designing Games with a Purpose](https://www.cs.cmu.edu/~biglou/GWAP_CACM.pdf) by von Ahn and Dabbish
- Human Model Evaluation in Interactive Supervised Learning by Fiebrink et al.
```r
n <- 200
m <- 40
set.seed(1)
x <- runif(n, -1, 1)
library(rafalib)
bigpar(2, 2, mar = c(3, 3, 3, 1))
library(RColorBrewer)
cols <- brewer.pal(11, "Spectral")[as.integer(cut(x, 11))]
plot(x, rep(0, n), ylim = c(-1, 1), yaxt = "n", xlab = "", ylab = "",
     col = cols, pch = 20, main = "underlying data")
```
The MIT License (MIT)

Copyright (c) 2016 Jim Kang

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction,
```r
# Check URLs in a document
## This code will extract URLs from a text document using regex,
## then execute an HTTP HEAD request on each and report whether
## the request failed, whether a redirect occurred, etc. It might
## be useful for cleaning up linkrot.
if (!require("httr")) {
  install.packages("httr", repos = "http://cran.rstudio.com/")
}
```
This is a collection of basic "recipes", many using twurl (the Swiss Army Knife for the Twitter API!) and jq to query the Twitter API and format the results. Also, some scripts to test or automate common actions.
An idea that I proved unable to express in the number of characters on Twitter:
Train two word2vec models on the same corpus with 100 dimensions apiece; one with window size 5, and one with window size 15 (say).
Now you have 2 100-dimensional vector spaces with the same words in each.
That's the same as one 200-dimensional vector space: for each word, you just append its two vectors to each other.

That space contains all the information from both original models: a linear projection onto either block of 100 coordinates recovers the corresponding original space exactly.
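The append-then-project claim is easy to verify mechanically. A NumPy sketch with toy random vectors standing in for the two trained word2vec models:

```python
import numpy as np

# Toy stand-ins for two word2vec models over the same vocabulary:
# each maps a word to a 100-dimensional vector.
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "tree"]
model_w5 = {w: rng.normal(size=100) for w in vocab}   # window size 5
model_w15 = {w: rng.normal(size=100) for w in vocab}  # window size 15

# Appending the two vectors gives one 200-dimensional space...
combined = {w: np.concatenate([model_w5[w], model_w15[w]]) for w in vocab}

# ...and a linear map (here just coordinate projection) recovers
# either original 100-dimensional space exactly.
recovered_w5 = {w: combined[w][:100] for w in vocab}
recovered_w15 = {w: combined[w][100:] for w in vocab}
```

Coordinate slicing is the simplest such linear map; any invertible linear transform of the combined space would preserve the same information.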