CL · Jun 14, 2022

CHQ-Summ: A Dataset for Consumer Healthcare Question Summarization

arXiv:2206.06581v2 · 25 citations · h-index: 55
Originality: Synthesis-oriented
AI Analysis

The dataset provides a resource for improving natural language understanding of consumer health posts on social media, but the contribution is incremental: it focuses on dataset creation rather than novel methods.

The authors tackle the challenge of summarizing verbose consumer health questions by introducing CHQ-Summ, a dataset of 1,507 expert-annotated question-summary pairs derived from a community question-answering forum, and benchmark it with state-of-the-art summarization models to demonstrate its effectiveness.

The quest for health information has swamped the web with consumers' health-related questions. Consumers generally use overly descriptive and peripheral information to express their medical conditions or other healthcare needs, which adds to the challenges of natural language understanding. One way to address this challenge is to summarize the questions and distill the key information of the original question. To this end, we introduce a new dataset, CHQ-Summ, that contains 1,507 domain-expert-annotated consumer health questions and corresponding summaries. The dataset is derived from a community question-answering forum and therefore provides a valuable resource for understanding consumer health-related posts on social media. We benchmark multiple state-of-the-art summarization models on the dataset to show its effectiveness.
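Summarization benchmarks of this kind are commonly scored with n-gram overlap metrics such as ROUGE, although the abstract does not say which metrics the authors used. As a minimal sketch under that assumption, the following computes a unigram-overlap (ROUGE-1 style) F1 score between a reference summary and a model output. The function names and the example question pair are hypothetical and are not taken from CHQ-Summ or the authors' code.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference summary and a candidate summary."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    if not ref_counts or not cand_counts:
        return 0.0
    # Clipped overlap: a token counts only as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical question-summary pair, invented for illustration only.
reference = "What are the treatments for migraine?"
candidate = "What treatments are available for migraine?"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.3f}")
```

This simplified scorer skips stemming and the longer n-gram and longest-common-subsequence variants that standard ROUGE packages provide, so its numbers are not comparable to published results.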

Code Implementations: 1 repo
