CL · CV · May 27, 2020

A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews

arXiv:2005.13362v2998 citations
AI Analysis

This paper addresses the problem of extracting detailed opinions from video reviews for applications in recommendation systems or market analysis. It represents an incremental advance, extending existing text-based methods to multi-modal data.

The paper tackles fine-grained opinion mining from video reviews by proposing a multi-modal approach that uses audio, video, and text features to identify the aspects under discussion and the sentiment towards them. The approach works at the sentence level without time annotations and shows consistent performance gains over text-only baselines on two datasets.

Despite the recent advances in opinion mining for written reviews, few works have tackled the problem on other sources of reviews. In light of this issue, we propose a multi-modal approach for mining fine-grained opinions from video reviews that is able to determine the aspects of the item under review that are being discussed and the sentiment orientation towards them. Our approach works at the sentence level without the need for time annotations and uses features derived from the audio, video and language transcriptions of its contents. We evaluate our approach on two datasets and show that leveraging the video and audio modalities consistently provides increased performance over text-only baselines, providing evidence these extra modalities are key in better understanding video reviews.

