CV · Jun 18, 2021

Multi-Granularity Network with Modal Attention for Dense Affective Understanding

arXiv:2106.09964v1
Originality: Incremental advance
AI Analysis

The paper addresses video creation and recommendation needs by improving frame-level affective prediction, but the advance is incremental because it builds on an existing challenge task.

The paper tackles dense affective understanding in videos by predicting frame-level evoked expressions. It proposes a multi-granularity network with modal attention that achieves a correlation score of 0.02292 in the EEV challenge.

Video affective understanding, which aims to predict the expressions evoked by video content, is desirable for video creation and recommendation. The recent EEV challenge proposes a dense affective understanding task that requires frame-level affective prediction. In this paper, we propose a multi-granularity network with modal attention (MGN-MA), which employs multi-granularity features to better describe the target frame. Specifically, the multi-granularity features are divided into frame-level, clip-level, and video-level features, which correspond to visually salient content, semantic context, and video theme information, respectively. A modal attention fusion module is then designed to fuse the multi-granularity features and emphasize the more affect-relevant modalities. Finally, the fused feature is fed into a Mixture of Experts (MoE) classifier to predict the expressions. With additional model-ensemble post-processing, the proposed method achieves a correlation score of 0.02292 in the EEV challenge.
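
The pipeline described in the abstract (three granularities of features, attention-weighted fusion, then a Mixture of Experts classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the attention form, feature dimensions, or number of experts, so the dot-product scoring, the shared dimension `DIM`, the names `modal_attention_fuse` and `moe_predict`, and the random values standing in for extracted features and learned parameters are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def modal_attention_fuse(features, score_vector):
    """Fuse per-granularity features with softmax attention weights.

    Assumption: each feature is already projected to a common dimension,
    and relevance is scored by a dot product with a learned vector.
    """
    names = list(features)
    scores = [dot(features[name], score_vector) for name in names]
    attn = softmax(scores)
    fused = [
        sum(w * features[name][i] for w, name in zip(attn, names))
        for i in range(len(score_vector))
    ]
    return fused, dict(zip(names, attn))


def moe_predict(fused, gate_weights, expert_weights):
    """Mixture of Experts for one expression: a softmax gate mixes
    the outputs of several logistic experts."""
    gates = softmax([dot(fused, g) for g in gate_weights])
    experts = [sigmoid(dot(fused, e)) for e in expert_weights]
    return sum(g * p for g, p in zip(gates, experts))


DIM = 4          # hypothetical shared feature dimension
NUM_EXPERTS = 3  # hypothetical number of experts
rng = random.Random(0)


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


# Random placeholders for the frame-, clip-, and video-level features.
features = {"frame": rand_vec(), "clip": rand_vec(), "video": rand_vec()}

fused, weights = modal_attention_fuse(features, rand_vec())
prediction = moe_predict(
    fused,
    [rand_vec() for _ in range(NUM_EXPERTS)],
    [rand_vec() for _ in range(NUM_EXPERTS)],
)
print(weights)
print(prediction)
```

The attention weights sum to one, so the fused vector is a convex combination of the three granularities, and a larger weight marks the granularity that the scoring vector treats as more relevant. In a real model the scoring vector, gates, and experts would be trained jointly, with one gated expert set per predicted expression.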

