CL · Jan 20

Comparing Without Saying: A Dataset and Benchmark for Implicit Comparative Opinion Mining from Same-User Reviews

Thanh-Lam T. Nguyen, Ngoc-Quang Le, Quoc-Trung Phu, Thi-Phuong Le, Ngoc-Huyen Pham, Phuong-Nguyen Nguyen, Hoang-Quynh Le

arXiv:2601.13575v11.11 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the underexplored challenge of mining implicit comparisons in real-world reviews, with applications such as recommendation systems. The advance is incremental, since the focus is on dataset creation and benchmarking.

The paper tackles implicit comparative opinion mining from same-user reviews, where users express preferences across separate reviews without explicit comparative cues. It introduces the SUDO dataset of 4,150 annotated review pairs and shows that language model-based baselines outperform traditional methods, although overall performance remains moderate.

Existing studies on comparative opinion mining have mainly focused on explicit comparative expressions, which are uncommon in real-world reviews. This leaves implicit comparisons, where users express preferences across separate reviews, largely underexplored. We introduce SUDO, a novel dataset for implicit comparative opinion mining from same-user reviews, allowing reliable inference of user preferences even without explicit comparative cues. SUDO comprises 4,150 annotated review pairs (15,191 sentences) with a bi-level structure capturing aspect-level mentions and review-level preferences. We benchmark this task using two types of baselines: traditional machine learning-based and language model-based. Experimental results show that while the latter outperform the former, overall performance remains moderate, revealing the inherent difficulty of the task and establishing SUDO as a challenging and valuable benchmark for future research.
