Sensitivity of BLANC to human-scored qualities of text summaries
This work addresses the need for reliable automated evaluation metrics in natural language processing, specifically for text summarization, by validating BLANC against human assessments. It is incremental, however, in that it focuses on parameter tuning rather than introducing a new method.
The paper investigates how well the BLANC summary quality estimator aligns with human judgments across five summary qualities (fluency, understandability, informativeness, compactness, and factual correctness). It finds that, with optimal parameters, BLANC's sensitivity to almost all of these qualities is comparable to that of a human annotator.
We explore the sensitivity of a document summary quality estimator, BLANC, to human assessments of the qualities of the same summaries. In our human evaluations, we distinguish five summary qualities, defined by how fluent, understandable, informative, compact, and factually correct the summary is. We make the case for optimal BLANC parameters, at which BLANC's sensitivity to almost all of the summary qualities is about as good as that of a human annotator.