CV | AI | Apr 19, 2024

Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting

arXiv:2404.12782v1 | 10 citations | h-index: 16 | Has Code | ACM Trans. Multim. Comput. Commun. Appl.
Originality: Incremental advance
AI Analysis

The paper addresses a gap in interactive live video commenting by incorporating sentiment diversity. The advance is incremental because it builds on existing Transformer and VAE methods.

The paper tackles the problem of generating diverse, sentiment-aware comments for live videos. It proposes So-TVAE, which is reported to outperform state-of-the-art methods in comment quality and diversity on the Livebot and VideoIC datasets.

Automatic live video commenting is attracting increasing attention due to its significance in narration generation, topic explanation, etc. However, current methods do not consider the diverse sentiments of the generated comments. Sentimental factors are critical in interactive commenting, yet they have received little research attention so far. Thus, in this paper, we propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which consists of a sentiment-oriented diversity encoder module and a batch attention module, to achieve diverse video commenting with multiple sentiments and multiple semantics. Specifically, our sentiment-oriented diversity encoder combines a VAE and a random mask mechanism to achieve semantic diversity under sentiment guidance, and its output is then fused with cross-modal features to generate live video comments. Furthermore, a batch attention module is proposed to alleviate the problem of missing sentimental samples caused by data imbalance, which is common in live videos because the popularity of videos varies. Extensive experiments on the Livebot and VideoIC datasets demonstrate that the proposed So-TVAE outperforms state-of-the-art methods in terms of the quality and diversity of generated comments. Related code is available at https://github.com/fufy1024/So-TVAE.
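
The abstract names two generic ingredients of the diversity encoder: a VAE and a random mask mechanism. The sketch below illustrates only those two textbook building blocks in plain Python; it is not the authors' implementation (see the linked repository for that). The function names, the `[MASK]` token, and the masking probability are assumptions made for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (standard VAE trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def random_mask(tokens, mask_prob, rng, mask_token="[MASK]"):
    """Replace each token by mask_token independently with prob. mask_prob.

    mask_token and mask_prob are hypothetical choices, not taken from the paper.
    """
    return [mask_token if rng.random() < mask_prob else t for t in tokens]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    comment = ["this", "goal", "was", "amazing"]
    print(random_mask(comment, 0.3, rng))
    z = reparameterize([0.0, 1.0], [0.0, -1.0], rng)
    print(z, kl_to_standard_normal([0.0, 1.0], [0.0, -1.0]))
```

Sampling several latent vectors `z` for the same input is what lets a VAE-style decoder produce different outputs for one video context. How So-TVAE injects sentiment guidance into this process is described in the paper and is not reproduced here.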

Code Implementations: 1 repo
