Created January 11, 2017 16:06
============================================================================
REVIEWER #1
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 5
Clarity: 5
Originality: 3
Soundness / Correctness: 4
Impact of Ideas / Results: 4
Meaningful Comparison: 4
Substance: 4
Replicability: 4
Recommendation: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper conducts sentence-level abstractive summarization/rewriting with a
neural network approach. The method can in principle handle not just deletion
but also rewriting (using words not in the original sentences), while relying
only minimally on linguistic analysis. The paper is easy to follow and clearly
written. The advantages of the proposed models are supported by the
experimental results. I recommend publication.
I have a couple of suggestions. First (perhaps I missed something), would a
good neural-network-based deletion model be enough to capture the benefit seen
in this paper? The proposed approach can generate summaries with words not
seen in the original sentences, but how much does that help? (The COMPRESS
model discussed in Section 7.2 uses a very different approach and still shows
some gaps.) From a summarization-evaluation viewpoint, ROUGE may not reward
"unseen" words as much as other metrics such as Pyramid do, so the benefit of
the proposed model may not be fully shown. While Section 5 tries to compromise
toward ROUGE, some discussion might help readers think about the issue in
another way.
The paper uses a trick (Section 5) to tune the model toward ROUGE. A little
more discussion would help readers understand why directly optimizing an
objective such as ROUGE is not feasible here. Is it because ROUGE may not be
accurate on a small data set (like BLEU on individual sentences), because of
computational concerns, or for other reasons?
============================================================================
REVIEWER #2
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 5
Clarity: 4
Originality: 3
Soundness / Correctness: 4
Impact of Ideas / Results: 4
Meaningful Comparison: 4
Substance: 4
Replicability: 3
Recommendation: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper addresses abstractive sentence summarization, specifically headline
generation using a neural language model. The work was evaluated on the DUC
2004 dataset. The paper is well written, and the work is interesting and
carefully evaluated. However, while the quantitative evaluation seems
reasonable, the actual summaries (from the examples) seem to have major
grammatical and repetition issues and do not look as good as the true
headlines. That said, the idea is promising, but it needs more work on the
soundness of the generated sentences.
A few questions/comments:
Why was the model trained using only the first line of the text? What is the
intuition for this? Could the last line, which often summarizes the text, be
used as well? It would be nice if this were discussed in the paper.
In Section 7.2 the authors mention a capped ROUGE score but do not explain how
it is computed. Was this used in the DUC 2004 task? If so, please state so;
if not, please provide the exact formula for reproducibility.
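One plausible reading of "capped ROUGE", assuming the DUC 2004 convention of truncating system output to 75 bytes before scoring (an assumption here, not something the review confirms), can be sketched as:

```python
def capped_rouge1_recall(candidate, reference, cap_bytes=75):
    """Unigram recall after truncating the candidate to cap_bytes.

    The 75-byte cap mirrors the DUC 2004 headline-length limit; this is an
    illustrative guess at "capped ROUGE", not a formula from the paper.
    """
    # Truncate at the byte level, dropping any partially cut character.
    truncated = candidate.encode("utf-8")[:cap_bytes].decode("utf-8",
                                                             errors="ignore")
    cand = set(truncated.lower().split())
    ref = reference.lower().split()
    return sum(1 for w in ref if w in cand) / len(ref) if ref else 0.0
```

Under this reading, any candidate words beyond the byte cap simply cannot earn recall credit, which penalizes overly long outputs.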
It seems the authors tried to fit more content than the page limit allows, as
the bottom margin is completely off. Please fix this and make the writing more
concise.
============================================================================
REVIEWER #3
============================================================================
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Appropriateness: 4
Clarity: 3
Originality: 4
Soundness / Correctness: 4
Impact of Ideas / Results: 4
Meaningful Comparison: 3
Substance: 4
Replicability: 3
Recommendation: 4
---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------
This paper uses neural language models to generate sentence summaries word by
word, going beyond previous sentence-based extractive methods and phrase-based
abstractive approaches to sentence summarization. More specifically, their
Attention-Based Summarization (ABS) approach is built on an attention-based
encoder and a beam-search decoder with extractive features, which can be seen
as a tradeoff between abstractive and extractive methods.
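For readers unfamiliar with the decoder side, a generic beam-search sketch (illustrative only; the authors' actual decoder also scores extractive features alongside the language-model probabilities) looks like:

```python
import heapq

def beam_search(step_scores, beam_size=2, length=3):
    """Generic beam search over token sequences.

    step_scores(prefix) -> list of (token, log_prob) continuations.
    At each step, expand every prefix in the beam and keep only the
    beam_size highest-scoring prefixes.
    """
    beam = [(0.0, [])]                      # (cumulative log-prob, prefix)
    for _ in range(length):
        candidates = []
        for score, prefix in beam:
            for tok, lp in step_scores(prefix):
                candidates.append((score + lp, prefix + [tok]))
        beam = heapq.nlargest(beam_size, candidates)
    return max(beam)                        # best (score, sequence) found

# Toy scorer: token 0 always has log-prob -0.1, token 1 has -1.0,
# so the best length-3 sequence is [0, 0, 0] with score -0.3.
best = beam_search(lambda prefix: [(0, -0.1), (1, -1.0)])
```

With beam_size=1 this degenerates to greedy decoding; larger beams trade computation for a better approximation of the highest-probability sequence.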
For the encoder, they present four models step by step, of which two consider
only the input word information while the other two also incorporate an
embedding of the current context. The latter two encoders, which
simultaneously learn embeddings for the input together with distributions
based on the current context, are thus able to show an interpretable alignment
between the summary and the input sentence.
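The context-dependent encoding described above can be sketched roughly as follows: the input representation is an attention-weighted average of input embeddings, with weights conditioned on the current summary context. The matrix names (F, G, P), dimensions, and the uniform context average are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, M, C = 100, 8, 6, 3     # vocab size, embed dim, input length, context length

F = rng.normal(size=(V, D))   # input-word embedding matrix (assumed name)
G = rng.normal(size=(V, D))   # context-word embedding matrix (assumed name)
P = rng.normal(size=(D, D))   # learned alignment matrix (assumed name)

x = rng.integers(0, V, size=M)    # input sentence as word ids
y_c = rng.integers(0, V, size=C)  # current summary context as word ids

x_emb = F[x]                      # (M, D) input embeddings
ybar = G[y_c].mean(axis=0)        # (D,) averaged context embedding

scores = x_emb @ P @ ybar         # (M,) alignment scores
p = np.exp(scores - scores.max())
p /= p.sum()                      # softmax: attention over input positions

enc = p @ x_emb                   # (D,) context-dependent input encoding
```

The distribution `p` is exactly the interpretable alignment the reviewer mentions: for each summary step it shows which input words the encoder is attending to.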
The authors conducted extensive experiments against several strong and
well-known baseline models, achieving promising results. In particular, their
tuned model ABS+, which leverages the fluency advantage of extractive
features, achieves the best scores on the tasks by a significant margin. While
they describe how to tune the weight vector alpha, they do not report the
actual values of alpha for the final best-performing model. Those values would
be helpful for examining the importance of the extractive features, so I am
somewhat hesitant about the analysis of how much the attention-based neural
models contribute.
I am just wondering how grammaticality can be ensured with the proposed
approach.