# WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia

## Introduction

* Large-scale natural language understanding task - predict textual values from the knowledge base Wikidata by reading the text of the corresponding Wikipedia article.
* Accompanied by a large dataset generated from Wikipedia and Wikidata.
* [Link to the paper](http://www.aclweb.org/anthology/P/P16/P16-1145.pdf)

## Dataset

* WikiReading dataset is built using Wikidata and Wikipedia.
* Wikidata consists of statements of the form (property, value) about different items.
* 80M statements, 16M items and 884 properties.
* These statements are grouped by item to get (item, property, answer) tuples, where the answer is a set of values.
* Items are further replaced by their Wikipedia documents to generate 18.58M instances of the form (document, property, answer).
* Task is to predict the answer given the document and the property.
* Properties are divided into 2 classes:
  * **Categorical properties** - properties with a small number of possible answers, e.g. gender.
  * **Relational properties** - properties with essentially unique answers, e.g. date of birth.
* This classification is done on the basis of the entropy of the answer distribution.
* Properties with entropy less than 0.7 are classified as categorical properties.
* The answer distribution has a small number of very high-frequency answers (head) and a large number of very low-frequency answers (tail).
* 30% of the answers do not appear in the training set and must be inferred from the document.

## Models

### Answer Classification

* Consider WikiReading as a classification task and treat each answer as a class label.

#### Baseline

* Linear model over Bag of Words (BoW) features.
* Two BoW vectors are computed - one for the document and one for the property. These are concatenated into a single feature vector.

#### Neural Network Methods

* Encode the property and the document into a joint representation which is fed into a softmax layer.
* **Average Embeddings BoW**
  * Average the embeddings of the words in the document and in the property and concatenate the two averages to get the joint representation.
* **Paragraph Vectors**
  * As a variant of the previous method, encode the document as a paragraph vector.
* **LSTM Reader**
  * An LSTM reads the property and document sequence, word-by-word, and uses the final state as the joint representation.
* **Attentive Reader**
  * Use an attention mechanism to focus on the parts of the document that are relevant for the given property.
* **Memory Networks**
  * Map a property *p* and a list of sentences x<sub>1</sub>, x<sub>2</sub>, ... x<sub>n</sub> into a joint representation by attention over the sentences in the document.

### Answer Extraction

* For relational properties, it makes more sense to model the problem as information extraction than as classification.
* **RNNLabeler**
  * Use an RNN to read the sequence of words and estimate whether a given word is part of the answer.
* **Basic SeqToSeq (Sequence to Sequence)**
  * Similar to LSTM Reader but augmented with a second RNN that decodes the answer as a sequence of words.
* **Placeholder SeqToSeq**
  * Extends Basic SeqToSeq to handle OOV (Out of Vocabulary) words by adding placeholders to the vocabulary.
  * OOV words in the document and the answer are replaced by placeholders so that input and output sentences are a mixture of in-vocabulary words and placeholders (a small sketch of this substitution follows the model list).
* **Basic Character SeqToSeq**
  * A property encoder RNN reads the property, character-by-character, and transforms it into a fixed-length vector.
  * This vector becomes the initial hidden state for the second layer of a 2-layer document encoder RNN.
  * The final state of this RNN is used by the answer decoder RNN to generate the answer as a character sequence.
* **Character SeqToSeq with pretraining**
  * Train a character-level language model on the input character sequences from the training set and use its weights to initialise the first layer of the encoder and decoder.
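To make the placeholder substitution concrete, here is a minimal sketch. It is not code from the paper; the function name, the fixed size of the placeholder pool, the `<UNK>` fallback and the example sentence are all assumptions made for illustration.

```python
# Hypothetical sketch of the placeholder trick for OOV words: OOV tokens in a
# (document, answer) pair are mapped to a shared pool of placeholder symbols so
# the model only ever sees in-vocabulary words plus placeholders.

def add_placeholders(document_tokens, answer_tokens, vocab, num_placeholders=20):
    """Replace OOV tokens with consistent placeholder symbols.

    The same OOV word gets the same placeholder in both the document and the
    answer, so a SeqToSeq decoder can "copy" an unseen word by emitting the
    corresponding placeholder.
    """
    placeholder_of = {}  # OOV word -> placeholder symbol, per example

    def map_token(token):
        if token in vocab:
            return token
        if token not in placeholder_of:
            if len(placeholder_of) >= num_placeholders:
                return "<UNK>"  # placeholder pool exhausted; fall back to a generic unknown
            placeholder_of[token] = "<PLH_%d>" % len(placeholder_of)
        return placeholder_of[token]

    doc = [map_token(t) for t in document_tokens]
    ans = [map_token(t) for t in answer_tokens]
    return doc, ans


# Example: "Navarro" and "1953" are OOV, so each gets its own placeholder,
# and the answer reuses the same placeholder as the document.
vocab = {"was", "born", "in"}
doc, ans = add_placeholders(["Navarro", "was", "born", "in", "1953"], ["1953"], vocab)
print(doc)  # ['<PLH_0>', 'was', 'born', 'in', '<PLH_1>']
print(ans)  # ['<PLH_1>']
```

Because the same OOV word maps to the same placeholder on both sides, a predicted placeholder can be mapped back to the original surface form as a post-processing step.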
## Experiments

* Evaluation metric is the F1 score (harmonic mean of precision and recall).
* All models perform well on categorical properties, with neural models outperforming the others.
* In the case of relational properties, SeqToSeq models have a clear edge.
* SeqToSeq models also strike a good balance between relational and categorical properties.
* Language-model pretraining enhances the performance of the character SeqToSeq approach.
* Results demonstrate that end-to-end SeqToSeq models are the most promising for WikiReading-like tasks.
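For concreteness, here is a rough sketch of the F1 metric above, computed per instance between the predicted and reference answer sets and then averaged over the dataset. The exact answer normalisation and matching rules used in the paper are not reproduced here; treating answers as exact-match strings and the function names are assumptions for this sketch.

```python
# Per-instance F1 between predicted and gold answer sets, averaged over the data.

def instance_f1(predicted, gold):
    """F1 = harmonic mean of precision and recall over two answer sets."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return float(predicted == gold)  # both empty counts as a perfect match
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, references):
    """Average the instance-level F1 over paired lists of answer sets."""
    scores = [instance_f1(p, g) for p, g in zip(predictions, references)]
    return sum(scores) / len(scores)


# Example: one exact match and one partial match.
preds = [{"male"}, {"london"}]
golds = [{"male"}, {"london", "england"}]
print(mean_f1(preds, golds))  # (1.0 + 0.667) / 2 ≈ 0.833
```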