CL, Mar 18, 2023

Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain

arXiv:2303.10328v1, 1 citation, h-index: 48
Originality: Synthesis-oriented
AI Analysis

This work addresses how to evaluate summarization quality in the biomedical domain, a question relevant to both researchers and practitioners. Its contribution is incremental: it validates existing metrics rather than introducing new ones.

The study assessed whether existing automatic summarization evaluation metrics, developed for general domains, remain effective for biomedical question summarization, using human evaluations across four quality aspects. From those judgments it identified noteworthy features of current metrics and summarization systems, and it released a human-annotated dataset to support future research in this domain.

Automatic evaluation metrics have facilitated the rapid development of automatic summarization methods by providing instant and fair assessments of summary quality. Most metrics were developed for the general domain, especially news and meeting notes, or for other language-generation tasks. However, these metrics are also applied to summarization systems in other domains, such as biomedical question summarization. To better understand whether commonly used evaluation metrics are capable of evaluating automatic summarization in the biomedical domain, we conduct human evaluations of summarization quality on four different aspects of a biomedical question summarization task. Based on the human judgments, we identify noteworthy features of current automatic metrics and of summarization systems. We also release a dataset of our human annotations to aid research on summarization evaluation metrics in the biomedical domain.
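One common way to check whether an automatic metric tracks human judgment is to rank-correlate the metric's scores with human ratings on the same summaries. The sketch below illustrates that idea with a Spearman correlation in plain Python. It is an assumption for illustration, not the procedure reported by the paper: the text here does not say which agreement statistic the authors use, and the aspect name, metric names, and all numbers are hypothetical toy values.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one human rating and two metric scores per summary.
human_rating = [5, 4, 4, 2, 1, 3]                   # e.g. a 1-5 scale for one aspect
metric_a = [0.62, 0.55, 0.58, 0.30, 0.21, 0.40]     # made-up scores
metric_b = [0.40, 0.42, 0.39, 0.41, 0.43, 0.38]     # made-up scores

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    print(name, round(spearman(human_rating, scores), 3))
```

A metric whose correlation with the human ratings is high on one aspect and low on another would be ranking summaries by something other than that second aspect. Repeating the comparison for each of the four aspects gives a per-aspect picture of a metric's behavior.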

