- Semantic parsing is the problem of mapping natural language utterances into logical forms that can be executed on a Knowledge Base (KB).
- The paper presents a new approach to semantic parsing that uses paraphrasing to leverage the large amounts of text that are not covered by the KB.
- Link to the paper
-
Given an input utterance x, construct a set of candidate logical forms Zx using a small number of logical form templates.
-
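A minimal sketch of what this template-based candidate generation could look like; the template names, entity linking and property list below are illustrative placeholders, not the paper's actual generation rules:

```python
# Illustrative sketch of template-based construction of candidate logical forms.
# The templates (p.e and p1.p2.e) and the linked entities/properties are
# hypothetical stand-ins, not the paper's exact rules.

def construct_logical_forms(linked_entities, properties):
    candidates = []
    for e in linked_entities:            # e.g. "fb:en.elvis_presley"
        for p in properties:             # e.g. "fb:people.person.place_of_birth"
            candidates.append(f"{p}.{e}")                 # template: p.e
            for p2 in properties:
                candidates.append(f"{p2}.{p}.{e}")        # template: p1.p2.e
    return candidates

z_x = construct_logical_forms(
    linked_entities=["fb:en.elvis_presley"],
    properties=["fb:people.person.place_of_birth"],
)
print(z_x)
```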
For each logical form z in Zx, generate a small set Cz of canonical utterances using the Freebase descriptions of the types, entities and properties involved in the logical form.
-
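Continuing the sketch, one deterministic rule might turn a p.e logical form into canonical utterances by looking up Freebase descriptions; the description table and the rule here are invented for illustration:

```python
# Invented description table and a single hand-written generation rule,
# just to illustrate rule-based canonical utterance generation.
DESCRIPTIONS = {
    "fb:people.person.place_of_birth": "place of birth",
    "fb:en.elvis_presley": "Elvis Presley",
}

def canonical_utterances(logical_form):
    """Generate canonical utterances for a logical form of the p.e template."""
    prop, entity_suffix = logical_form.rsplit(".fb:", 1)
    entity = "fb:" + entity_suffix
    p_desc = DESCRIPTIONS[prop]
    e_desc = DESCRIPTIONS[entity]
    return [f"what is the {p_desc} of {e_desc}?", f"{e_desc}'s {p_desc}"]

print(canonical_utterances("fb:people.person.place_of_birth.fb:en.elvis_presley"))
```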
Note: Both steps above are performed with a small, simple set of deterministic rules, which the authors found sufficient for their datasets.
-
For each z in Zx and each c in Cz, use a paraphrase model to score the pair (c, z) given x.
-
The paraphrase model has two parts:
-
Association Model
- For each pair (x, c), the model goes through all spans of x and c and identifies pairs of potential paraphrases (associations).
- To determine the associations, the model uses:
- Phrase pairs from a phrase table constructed from the Paralex corpus.
- Linguistic features such as lemmas, POS tags and WordNet derivations.
- During training, the model learns to weight the associations appropriately.
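A toy version of this association step, using a two-entry phrase table as a stand-in for the Paralex-derived table and simple lemma equality in place of the full set of linguistic features:

```python
# Toy association extraction: a span pair (s_x, s_c) becomes a candidate
# paraphrase if it appears in a (tiny, stand-in) phrase table or if the two
# spans share a lemma. The real model uses a Paralex-derived table and richer
# features (POS tags, WordNet derivations), with weights learned in training.
PHRASE_TABLE = {("born", "place of birth")}
LEMMAS = {"was": "be", "is": "be", "born": "bear"}

def spans(tokens, max_len=3):
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            yield " ".join(tokens[i:j])

def associations(x, c):
    pairs = []
    for sx in spans(x.lower().split()):
        for sc in spans(c.lower().split()):
            in_table = (sx, sc) in PHRASE_TABLE or (sc, sx) in PHRASE_TABLE
            same_lemma = LEMMAS.get(sx, sx) == LEMMAS.get(sc, sc)
            if in_table or same_lemma:
                pairs.append((sx, sc))
    return pairs

print(associations("where was Elvis Presley born",
                   "what is the place of birth of Elvis Presley"))
```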
-
Vector Space Model
- Assign vector representations to x and c by averaging the word2vec representations of the words in each utterance.
- Estimate the paraphrase score for (x, c) via a weighted combination of their vector representations.
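A sketch of one plausible form of that score, a bilinear function of the averaged vectors; this exact parameterization is an assumption here, and the embeddings below are random placeholders for pre-trained word2vec vectors:

```python
import numpy as np

# Random placeholder embeddings; the paper uses pre-trained word2vec vectors.
rng = np.random.default_rng(0)
WORDS = "where was elvis presley born what is the place of birth".split()
EMBED = {w: rng.normal(size=50) for w in WORDS}

def embed(utterance):
    """Average the word vectors of the utterance, skipping unknown words."""
    vecs = [EMBED[w] for w in utterance.lower().split() if w in EMBED]
    return np.mean(vecs, axis=0)

# W would be learned jointly with the rest of the model; random for the sketch.
W = rng.normal(size=(50, 50))

def vs_score(x, c):
    return float(embed(x) @ W @ embed(c))

print(vs_score("where was Elvis Presley born",
               "what is the place of birth of Elvis Presley"))
```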
-
The two paraphrase models are complementary to each other in terms of the information they capture.
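Both feature sets (plus features of the logical form itself) feed a single log-linear model over the candidate (c, z) pairs. The sketch below uses placeholder feature functions and only shows how the scores would be combined and normalized:

```python
import math

def score(x, c, z, weights, feature_fns):
    """Linear score theta . phi(x, c, z); feature_fns is a list of placeholder
    functions, each returning a dict of named feature values."""
    phi = {}
    for fn in feature_fns:
        phi.update(fn(x, c, z))
    return sum(weights.get(name, 0.0) * value for name, value in phi.items())

def rank_candidates(x, candidates, weights, feature_fns):
    """Softmax-normalize the scores over all (c, z) candidates for utterance x."""
    scored = [(c, z, score(x, c, z, weights, feature_fns)) for c, z in candidates]
    total = sum(math.exp(s) for _, _, s in scored)
    return sorted(((c, z, math.exp(s) / total) for c, z, s in scored),
                  key=lambda t: -t[2])
```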
-
Dataset
- WEBQUESTIONS dataset - 5810 question-answer pairs.
- FREE917 dataset - 917 questions (annotated with logical forms).
-
Learning
- Given question-answer pairs (xi, yi), the objective function maximizes the log-likelihood of the correct answer, with L1 regularization on the parameters.
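Reconstructed from that description (notation may differ slightly from the paper's), the training objective maximizes the regularized log-likelihood of the correct answer, marginalizing over the logical forms and canonical utterances that produce it:

$$
\max_{\theta} \; \sum_{i=1}^{n} \log p_\theta(y_i \mid x_i) \;-\; \lambda \lVert \theta \rVert_1,
\qquad
p_\theta(y \mid x) = \sum_{z \in Z_x,\; c \in C_z} p_\theta(c, z \mid x)\, \mathbb{1}\big[z \text{ executes to } y\big],
$$

with $p_\theta(c, z \mid x) \propto \exp\big(\theta \cdot \phi(x, c, z)\big)$.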
-
Results
- The proposed model improves the accuracy on WEBQUESTIONS by 12% and matches the best results on FREE917.
- Removing the association model results in a much larger degradation of performance as compared to removing the VS model.
- Error analysis suggests that the model:
- cannot handle temporal relations.
- suffers from ambiguity in entity recognition.
- counts some associations multiple times and thus assigns them inflated scores.
-
Comments
- The core idea of using paraphrasing for semantic parsing seems promising and could further benefit from more advanced models like skip-thought vectors, which provide a more natural vector representation for sentences and could help reduce the dependence on handcrafted features.
Is there any good open-source implementation of this available?
I am trying SEMPRE, which is based on the same approach.
However, I am stuck there due to the lack of good documentation on how to create paraphrase models for a custom domain.