- Build a supervised reading comprehension data set using a news corpus.
- Compare the performance of neural models with state-of-the-art natural language processing models on the reading comprehension task.
- Link to the paper
- Estimate conditional probability p(a|c, q), where c is a context document, q is a query related to the document, and a is the answer to that query.
- Use online newspapers (CNN and DailyMail) and their matching summaries.
- Parse the summaries and bullet points into Cloze-style questions.
- Generate a corpus of document-query-answer triplets by replacing one entity at a time with a placeholder (a minimal sketch of this step follows below).
- Data is anonymised and randomised using coreference systems, abstract entity markers, and random permutation of the entity markers.
- The processed data set is more focused on evaluating reading comprehension, as models cannot exploit co-occurrence statistics.
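A minimal sketch of the triplet-generation step, assuming a known entity list and plain string replacement (the @entityN / @placeholder marker format follows the paper's description, but the helper names and pipeline details here are illustrative, not the actual preprocessing code):

```python
import random

def make_cloze_example(document, summary_sentence, entities, seed=0):
    """Turn one summary bullet point into a (document, query, answer) triplet.

    Every known entity is mapped to an abstract marker (@entity0, @entity1, ...)
    whose assignment is randomly permuted per example, and one entity in the
    summary sentence is replaced by @placeholder to form the Cloze query.
    Plain string replacement is a simplification of the real pipeline.
    """
    rng = random.Random(seed)

    # Random permutation of the entity markers so that marker identity carries
    # no information across examples.
    shuffled = entities[:]
    rng.shuffle(shuffled)
    marker = {ent: "@entity{}".format(i) for i, ent in enumerate(shuffled)}

    def anonymise(text):
        for ent, mark in marker.items():
            text = text.replace(ent, mark)
        return text

    # Pick one entity occurring in the summary sentence as the answer.
    answer_entity = rng.choice([e for e in entities if e in summary_sentence])

    query = anonymise(summary_sentence.replace(answer_entity, "@placeholder"))
    return anonymise(document), query, marker[answer_entity]

doc = "Alice met Bob in Paris . Bob later flew to Rome ."
bullet = "Bob travelled to Rome"
print(make_cloze_example(doc, bullet, ["Alice", "Bob", "Paris", "Rome"]))
```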
- Majority Baseline
- Picks the most frequently observed entity in the context document.
- Exclusive Majority
- Picks the most frequently observed entity in the context document that is not observed in the query.
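Both baselines reduce to counting entity occurrences; a small sketch, assuming entity mentions have already been extracted as token lists (the function names are illustrative):

```python
from collections import Counter

def majority_baseline(doc_entities):
    """Pick the entity observed most often in the context document."""
    return Counter(doc_entities).most_common(1)[0][0]

def exclusive_majority_baseline(doc_entities, query_entities):
    """Pick the most frequent document entity that does not appear in the query."""
    counts = Counter(e for e in doc_entities if e not in set(query_entities))
    return counts.most_common(1)[0][0]

doc_entities = ["@entity1", "@entity0", "@entity1", "@entity2", "@entity1"]
query_entities = ["@entity1"]
print(majority_baseline(doc_entities))                            # @entity1
print(exclusive_majority_baseline(doc_entities, query_entities))  # @entity0 (ties broken arbitrarily)
```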
- Frame-Semantic Parsing
- Parse the sentences to find predicates and answer questions like "who did what to whom".
- Extract entity-predicate triples (e1, V, e2) from the query q and the context document d.
- Resolve queries using rules like exact match, matching entity, etc.
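A toy sketch of the rule cascade, assuming the (e1, V, e2) triples have already been extracted; the exact rules and their ordering in the paper's benchmark differ, so this only illustrates the "try stricter rules first" idea:

```python
def resolve(query_triple, doc_triples):
    """query_triple contains "@placeholder" in the slot we want to fill.

    Rules are tried in order of strictness; the first match wins.
    """
    qe1, qv, qe2 = query_triple
    slot = 0 if qe1 == "@placeholder" else 2
    other = qe2 if slot == 0 else qe1

    # Rule 1: exact match -- the predicate and the other entity both agree.
    for t in doc_triples:
        if t[1] == qv and t[2 - slot] == other:
            return t[slot]

    # Rule 2: matching entity -- the other entity agrees, predicate ignored.
    for t in doc_triples:
        if t[2 - slot] == other:
            return t[slot]

    return None  # no rule fired

doc_triples = [("@entity0", "met", "@entity1"), ("@entity1", "flew_to", "@entity3")]
print(resolve(("@placeholder", "flew_to", "@entity3"), doc_triples))  # @entity1
```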
- Word Distance Benchmark
- Align the placeholder of the Cloze-form question with each possible entity in the context document and calculate the distance between the question and the context around the aligned entity.
- Sum the distance of every word in q to its nearest aligned word in d.
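One plausible simplification of this scoring scheme (the alignment here is plain string matching rather than the coreference-aided alignment the paper uses, and the miss penalty is an invented detail):

```python
def word_distance_score(doc_tokens, query_tokens, candidate, miss_penalty=10):
    """Score a candidate answer by aligning @placeholder with the candidate.

    For each occurrence of the candidate in the document, every non-placeholder
    query word is charged the distance from that occurrence to the nearest
    matching document word; the best (lowest) total over occurrences is returned.
    """
    positions = {}
    for i, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(i)

    anchors = positions.get(candidate, [])
    if not anchors:
        return float("inf")

    best = float("inf")
    for anchor in anchors:
        total = 0
        for q_tok in query_tokens:
            if q_tok == "@placeholder":
                continue
            occ = positions.get(q_tok)
            total += min(abs(i - anchor) for i in occ) if occ else miss_penalty
        best = min(best, total)
    return best

doc = "@entity0 met @entity1 in @entity2 . @entity1 flew to @entity3".split()
query = "@placeholder flew to @entity3".split()
scores = {e: word_distance_score(doc, query, e) for e in ["@entity0", "@entity1"]}
print(min(scores, key=scores.get))  # @entity1
```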
- Deep LSTM Reader
- Test the ability of Deep LSTM encoders to handle significantly longer sequences.
- Feed the document-query pair as a single long sequence, one word at a time.
- Use a Deep LSTM cell with skip connections from the input to the hidden layers and from the hidden layers to the output.
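A minimal PyTorch sketch of this architecture, assuming a fixed candidate-entity vocabulary; the dimensions are placeholders, and the skip connections are approximated by feeding the word embedding into the second layer and both layers' final outputs into the output layer:

```python
import torch
import torch.nn as nn

class DeepLSTMReader(nn.Module):
    """Simplified Deep LSTM Reader: the document and query are concatenated
    (with a delimiter token) into one sequence and read word by word."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_candidates=600):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm1 = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Skip connection: layer 2 sees the word embedding as well as layer 1's output.
        self.lstm2 = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        # Skip connection from both hidden layers to the output.
        self.out = nn.Linear(2 * hidden_dim, num_candidates)

    def forward(self, doc_query_ids):            # (batch, seq_len) token ids
        x = self.embed(doc_query_ids)
        h1, _ = self.lstm1(x)
        h2, _ = self.lstm2(torch.cat([x, h1], dim=-1))
        # Use the final time step of both layers as the sequence representation.
        g = torch.cat([h1[:, -1], h2[:, -1]], dim=-1)
        return self.out(g)                        # scores over candidate entities

tokens = torch.randint(0, 1000, (2, 50))          # two toy "document ||| query" sequences
print(DeepLSTMReader(vocab_size=1000)(tokens).shape)  # torch.Size([2, 600])
```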
- Attentive Reader
- Employ an attention model to overcome the bottleneck of the fixed-width hidden vector.
- Encode the document and the query using separate bidirectional single-layer LSTMs.
- The query encoding is obtained by concatenating the final forward and backward outputs.
- The document encoding is obtained as a weighted sum of the output vectors (each obtained by concatenating the forward and backward outputs).
- The weights can be interpreted as the degree to which the network attends to a particular token in the document.
- The model is completed by defining a non-linear combination of the document and query embeddings.
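A compact PyTorch sketch of the Attentive Reader's shape: per-token document encodings are scored against the query encoding, normalised with a softmax, and the weighted sum is combined non-linearly with the query. The dimensions, layer names, and the output head over candidate entities are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AttentiveReader(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_candidates=600):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.doc_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.query_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        enc = 2 * hidden_dim                      # forward + backward outputs concatenated
        self.att_doc = nn.Linear(enc, enc, bias=False)
        self.att_query = nn.Linear(enc, enc, bias=False)
        self.att_score = nn.Linear(enc, 1, bias=False)
        self.combine_r = nn.Linear(enc, enc, bias=False)
        self.combine_u = nn.Linear(enc, enc, bias=False)
        self.out = nn.Linear(enc, num_candidates)

    def forward(self, doc_ids, query_ids):
        y_d, _ = self.doc_lstm(self.embed(doc_ids))        # (B, T_d, 2H) per-token encodings
        y_q, _ = self.query_lstm(self.embed(query_ids))    # (B, T_q, 2H)
        # Query encoding: concatenation of the final forward output (last step)
        # and the final backward output (first step).
        hidden = y_q.size(-1) // 2
        u = torch.cat([y_q[:, -1, :hidden], y_q[:, 0, hidden:]], dim=-1)  # (B, 2H)

        # Attention weights over document tokens, conditioned on the query.
        m = torch.tanh(self.att_doc(y_d) + self.att_query(u).unsqueeze(1))
        s = torch.softmax(self.att_score(m).squeeze(-1), dim=-1)          # (B, T_d)
        r = torch.bmm(s.unsqueeze(1), y_d).squeeze(1)                     # weighted sum, (B, 2H)

        g = torch.tanh(self.combine_r(r) + self.combine_u(u))             # joint embedding
        return self.out(g)                                                # candidate scores

doc = torch.randint(0, 1000, (2, 80))
query = torch.randint(0, 1000, (2, 12))
print(AttentiveReader(vocab_size=1000)(doc, query).shape)  # torch.Size([2, 600])
```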
- Impatient Reader
- As an extension of the Attentive Reader, the model can re-read the document as each query token is read.
- The model accumulates information from the document as each query token is seen and finally outputs a joint document-query representation as a non-linear combination of the document embedding and the query embedding.
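A loose sketch of the Impatient Reader's recurrent attention accumulation, reusing document and query encodings of the same shape as in the Attentive Reader sketch above; the layer names and the toy driver at the bottom are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

def impatient_attention(y_d, y_q, att_doc, att_query, att_prev, att_score, recur):
    """Recurrent attention accumulation of the Impatient Reader (simplified).

    y_d: (B, T_d, E) per-token document encodings from a bidirectional LSTM.
    y_q: (B, T_q, E) per-token query encodings.
    The att_* and recur arguments are linear layers mapping E -> E, except
    att_score, which maps E -> 1. Returns the accumulated representation r.
    """
    batch, _, enc = y_d.shape
    r = y_d.new_zeros(batch, enc)
    for i in range(y_q.size(1)):                  # re-read the document per query token
        m = torch.tanh(att_doc(y_d)
                       + att_query(y_q[:, i]).unsqueeze(1)
                       + att_prev(r).unsqueeze(1))
        s = torch.softmax(att_score(m).squeeze(-1), dim=-1)       # (B, T_d)
        r = torch.bmm(s.unsqueeze(1), y_d).squeeze(1) + torch.tanh(recur(r))
    return r                                       # combined with the query encoding u as before

E = 256
layers = [nn.Linear(E, E, bias=False) for _ in range(3)] \
         + [nn.Linear(E, 1, bias=False), nn.Linear(E, E, bias=False)]
y_d, y_q = torch.randn(2, 80, E), torch.randn(2, 12, E)
print(impatient_attention(y_d, y_q, *layers).shape)   # torch.Size([2, 256])
```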
- The Attentive and Impatient Readers outperform all other models, highlighting the benefits of attention modelling.
- The Frame-Semantic pipeline does not scale to cases where several methods are needed to answer a query.
- Moreover, it provides poor coverage, as many relations do not adhere to the default predicate-argument structure.
- The Word Distance approach outperformed the Frame-Semantic approach, as there is significant lexical overlap between the query and the document.
- The paper also includes heat maps over the context documents to visualise the attention mechanism.