CL · Dec 22, 2021

Consistency and Coherence from Points of Contextual Similarity

arXiv:2112.11638v20.51 citations

Originality: Incremental advance

AI Analysis

This work addresses factual consistency evaluation in summarization, a problem relevant to researchers and developers. It is incremental in that it builds on an existing method.

The authors address a limitation of the ESTIME measure, which was restricted to text-summary pairs with high dictionary overlap, by generalizing it to work with any text-summary pair. They also analyze BERT layers and find that the information useful for consistency and fluency resides in layers close to the top, while the pattern for coherence and relevance is more complex.

Factual consistency is one of the important summary evaluation dimensions, especially as summary generation becomes more fluent and coherent. The ESTIME measure, recently proposed specifically for factual consistency, achieves high correlations with human expert scores for both consistency and fluency, while in principle being restricted to evaluating text-summary pairs that have high dictionary overlap. This is not a problem for current styles of summarization, but it may become an obstacle for future summarization systems, or for evaluating arbitrary claims against the text. In this work we generalize the method and make a variant of the measure applicable to any text-summary pair. As ESTIME uses points of contextual similarity, it provides insights into the usefulness of information taken from different BERT layers. We observe that useful information exists in almost all of the layers except the several lowest ones. For consistency and fluency, qualities focused on local text details, the most useful layers are close to the top (but not at the top); for coherence and relevance we found a more complicated and interesting picture.
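The notion of "points of contextual similarity" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each summary token is matched to the text token whose embedding is most similar by cosine similarity, and that a disagreement between the two tokens counts as a mismatch. The function names (`cosine`, `count_mismatches`) and the two-dimensional toy vectors are hypothetical stand-ins for embeddings that would in practice come from a chosen BERT layer.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def count_mismatches(summary_tokens, summary_vecs, text_tokens, text_vecs):
    # Assumption for illustration: a summary token "mismatches" when the text
    # token at its most similar context (highest cosine) is a different token.
    mismatches = 0
    for tok, vec in zip(summary_tokens, summary_vecs):
        best = max(range(len(text_vecs)), key=lambda i: cosine(vec, text_vecs[i]))
        if text_tokens[best] != tok:
            mismatches += 1
    return mismatches

# Toy data: hypothetical embeddings, not real BERT outputs.
text_tokens = ["cat", "sat", "mat"]
text_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
summary_tokens = ["cat", "dog"]
summary_vecs = [[0.9, 0.1], [0.1, 0.9]]

result = count_mismatches(summary_tokens, summary_vecs, text_tokens, text_vecs)
print(result)
```

Under these assumptions, a lower count suggests a summary whose tokens sit in contexts that also occur in the text. Repeating the computation with embeddings drawn from different layers is one way to compare how useful each layer is, in the spirit of the layer analysis described above.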
