CV · AI · Mar 13, 2025

Lightweight Models for Emotional Analysis in Video

arXiv:2503.10530v2 · 3 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work targets real-time emotional analysis on mobile and embedded devices, but it appears incremental because it builds on existing components, namely MobileNetV4 and MLP-Mixer.

The study tackled efficient emotional analysis in video by proposing a lightweight model combining MobileNetV4 and a multi-scale 3D MLP-Mixer, achieving promising performance on the ABAW 8th competition dataset.

In this study, we present an approach for efficient spatiotemporal feature extraction using MobileNetV4 and a multi-scale 3D MLP-Mixer-based temporal aggregation module. MobileNetV4, with its Universal Inverted Bottleneck (UIB) blocks, serves as the backbone for extracting hierarchical feature representations from input image sequences, ensuring both computational efficiency and rich semantic encoding. To capture temporal dependencies, we introduce a three-level MLP-Mixer module, which processes spatial features at multiple resolutions while maintaining structural integrity. Experimental results on the ABAW 8th competition demonstrate the effectiveness of our approach, showing promising performance in affective behavior analysis. By integrating an efficient vision backbone with a structured temporal modeling mechanism, the proposed framework achieves a balance between computational efficiency and predictive accuracy, making it well-suited for real-time applications in mobile and embedded computing environments.
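The temporal aggregation idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the backbone is replaced by random arrays standing in for MobileNetV4 feature maps at three resolutions, and the names `MixerBlock` and `aggregate_multiscale`, the tensor shapes, the expansion factor, and the choice to flatten time and space into one token axis are all assumptions made for illustration.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x, eps=1e-5):
    # Normalise over the last (channel) axis; no learned scale or shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def mlp(x, w1, w2):
    # Two-layer perceptron acting on the last axis.
    return gelu(x @ w1) @ w2


class MixerBlock:
    """One MLP-Mixer block: token mixing followed by channel mixing."""

    def __init__(self, num_tokens, channels, rng, expansion=2):
        scale = 0.02  # small random init; weights are untrained placeholders
        self.tok_w1 = rng.normal(0, scale, (num_tokens, num_tokens * expansion))
        self.tok_w2 = rng.normal(0, scale, (num_tokens * expansion, num_tokens))
        self.ch_w1 = rng.normal(0, scale, (channels, channels * expansion))
        self.ch_w2 = rng.normal(0, scale, (channels * expansion, channels))

    def __call__(self, x):
        # x has shape (tokens, channels).
        # Token mixing: transpose so the MLP mixes information across tokens.
        x = x + mlp(layer_norm(x).T, self.tok_w1, self.tok_w2).T
        # Channel mixing: the MLP mixes information across channels per token.
        x = x + mlp(layer_norm(x), self.ch_w1, self.ch_w2)
        return x


def aggregate_multiscale(feature_maps, blocks):
    # feature_maps: one array per level, each shaped (T, H, W, C).
    pooled = []
    for fmap, block in zip(feature_maps, blocks):
        t, h, w, c = fmap.shape
        tokens = fmap.reshape(t * h * w, c)  # assumed: time and space share one token axis
        mixed = block(tokens)
        pooled.append(mixed.mean(axis=0))    # global average over all tokens
    return np.concatenate(pooled)            # one clip-level embedding


rng = np.random.default_rng(0)
T = 4  # hypothetical number of frames per clip
# Random stand-ins for backbone features at three resolutions (assumed shapes).
shapes = [(T, 8, 8, 16), (T, 4, 4, 32), (T, 2, 2, 64)]
feature_maps = [rng.normal(size=s) for s in shapes]
blocks = [MixerBlock(t * h * w, c, rng) for (t, h, w, c) in shapes]

clip_embedding = aggregate_multiscale(feature_maps, blocks)
print(clip_embedding.shape)  # (112,)
```

The concatenated vector has 16 + 32 + 64 = 112 entries, one pooled descriptor per level. A real pipeline would feed it to a task head (for example, expression classification or valence-arousal regression) and train all weights end to end.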

Code Implementations: 1 repo
