CL · Jul 22, 2020

Massive Multi-Document Summarization of Product Reviews with Weak Supervision

arXiv:2007.11348v1 · 15 citations
AI Analysis

This paper addresses the challenge of summarizing product reviews at scale, which matters to both e-commerce platforms and consumers. The contribution is incremental: it layers weak supervision on top of a standard summarization algorithm rather than introducing a new one.

The paper tackles the problem of summarizing massive sets of product reviews (up to tens of thousands per product), showing that summarizing small samples can lose important information and lead to misleading evaluation results. It proposes a weakly supervised schema whose initial implementation significantly improves ROUGE scores over several baselines and exhibits strong coherence in a manual linguistic quality assessment.

Product reviews summarization is a type of Multi-Document Summarization (MDS) task in which the summarized document sets are often far larger than in traditional MDS (up to tens of thousands of reviews). We highlight this difference and coin the term "Massive Multi-Document Summarization" (MMDS) to denote an MDS task that involves hundreds of documents or more. Prior work on product reviews summarization considered small samples of the reviews, mainly due to the difficulty of handling massive document sets. We show that summarizing small samples can result in loss of important information and provide misleading evaluation results. We propose a schema for summarizing a massive set of reviews on top of a standard summarization algorithm. Since writing large volumes of reference summaries needed for advanced neural network models is impractical, our solution relies on weak supervision. Finally, we propose an evaluation scheme that is based on multiple crowdsourced reference summaries and aims to capture the massive review collection. We show that an initial implementation of our schema significantly improves over several baselines in ROUGE scores, and exhibits strong coherence in a manual linguistic quality assessment.