CL, Mar 21, 2023

Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization

Amazon
arXiv:2303.11660v1 · 269 citations · h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses opinion summarization for users who need to process large volumes of reviews. It is an incremental advance: it builds on existing unsupervised methods with simple modifications.

The paper tackles the challenge of generating aspect-specific and general opinion summaries without annotated data. It proposes two unsupervised approaches that train on synthetic datasets. For aspect-specific summarization, the first method outperforms existing methods by 3.4 ROUGE-L points on SPACE and 0.5 ROUGE-1 points on OPOSUM+.

Opinion summarization is an important way to condense the opinions expressed across a large number of reviews. However, generating aspect-specific and general summaries is challenging due to the lack of annotated data. In this work, we propose two simple yet effective unsupervised approaches that generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed from aspect-related review content. Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words, and it outperforms existing methods by 3.4 ROUGE-L points on SPACE and 0.5 ROUGE-1 points on OPOSUM+ for aspect-specific opinion summarization. Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO), identifies aspect-related sentences with an NLI model in a more general setting that does not use seed words; it outperforms existing approaches by 1.2 ROUGE-L points on SPACE for aspect-specific opinion summarization and remains competitive on other metrics.
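The leave-one-out idea behind both approaches can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each held-out review's aspect-related sentences act as the pseudo-summary and the aspect-related sentences of the entity's remaining reviews act as the input. The seed-word lists, the naive sentence splitter, the NLI hypothesis template, the 0.5 threshold, and the `entail_prob` scorer are all hypothetical placeholders.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical seed words for illustration only; not the paper's actual lists.
SEED_WORDS: Dict[str, List[str]] = {
    "cleanliness": ["clean", "dirty", "spotless"],
    "service": ["staff", "service", "friendly"],
}


def split_sentences(review: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def seed_word_selector(aspect: str) -> Callable[[str], bool]:
    # SW-LOO-style selector: a sentence is aspect-related if any of its
    # lowercased tokens exactly matches one of the aspect's seed words.
    words = {w.lower() for w in SEED_WORDS[aspect]}

    def is_related(sentence: str) -> bool:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return any(t in words for t in tokens)

    return is_related


def nli_selector(
    aspect: str,
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> Callable[[str], bool]:
    # NLI-LOO-style selector: the sentence is the premise and a templated
    # hypothesis names the aspect. entail_prob(premise, hypothesis) is a
    # hypothetical hook for any NLI model returning an entailment probability.
    hypothesis = f"This sentence is about {aspect}."  # hypothetical template

    def is_related(sentence: str) -> bool:
        return entail_prob(sentence, hypothesis) >= threshold

    return is_related


def leave_one_out_pairs(
    reviews: Sequence[str], is_related: Callable[[str], bool]
) -> List[Tuple[List[str], str]]:
    # Keep only the aspect-related sentences of every review.
    filtered = [[s for s in split_sentences(r) if is_related(s)] for r in reviews]
    pairs: List[Tuple[List[str], str]] = []
    for i, target_sents in enumerate(filtered):
        if not target_sents:
            continue  # held-out review says nothing about this aspect
        # Input: aspect-related content of every other non-empty review.
        inputs = [" ".join(s) for j, s in enumerate(filtered) if j != i and s]
        if inputs:
            pairs.append((inputs, " ".join(target_sents)))
    return pairs


if __name__ == "__main__":
    reviews = [
        "The room was spotless. Breakfast was cold.",
        "Friendly staff. The bathroom was dirty.",
        "Great location near the station.",
    ]
    for inputs, target in leave_one_out_pairs(reviews, seed_word_selector("cleanliness")):
        print(inputs, "->", target)
```

Run on the three example reviews, the cleanliness selector yields two (input, pseudo-summary) pairs; the third review is skipped because it contains no cleanliness sentence. Keeping sentence selection behind a single `is_related` callable lets the seed-word and NLI variants share the same leave-one-out loop, and passing `lambda s: True` reduces it to plain leave-one-out over whole reviews.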
