CV AI · Mar 13, 2025

Lightweight Models for Emotional Analysis in Video

Quoc-Tien Nguyen, Hong-Hai Nguyen, Van-Thong Huynh

arXiv:2503.10530v2 · 3 citations · h-index: 10

Originality: Synthesis-oriented

AI Analysis

This work targets real-time emotion analysis on mobile and embedded devices. It appears incremental, since it builds on existing methods such as MobileNetV4 and MLP-Mixer.

The study tackles efficient emotion analysis in video with a lightweight model that combines MobileNetV4 and a multi-scale 3D MLP-Mixer, and it reports promising performance on the 8th ABAW competition dataset.

In this study, we present an approach for efficient spatiotemporal feature extraction using MobileNetV4 and a multi-scale 3D MLP-Mixer-based temporal aggregation module. MobileNetV4, with its Universal Inverted Bottleneck (UIB) blocks, serves as the backbone for extracting hierarchical feature representations from input image sequences, ensuring both computational efficiency and rich semantic encoding. To capture temporal dependencies, we introduce a three-level MLP-Mixer module, which processes spatial features at multiple resolutions while maintaining structural integrity. Experimental results on the ABAW 8th competition demonstrate the effectiveness of our approach, showing promising performance in affective behavior analysis. By integrating an efficient vision backbone with a structured temporal modeling mechanism, the proposed framework achieves a balance between computational efficiency and predictive accuracy, making it well-suited for real-time applications in mobile and embedded computing environments.
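To make the temporal aggregation idea concrete, here is a minimal, dependency-free sketch of Mixer-style mixing over a frame sequence at three feature scales. It is an illustration under stated assumptions, not the authors' implementation. It assumes each backbone level has already been spatially pooled to one vector per frame, so it mixes only along time and channels, whereas the paper's 3D module also handles the spatial dimensions. The channel widths, hidden size, clip length, random weights, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def mlp(x, w1, w2):
    """Two-layer perceptron applied to every row of x."""
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def mixer_block(x, token_w, channel_w):
    """One Mixer block on a T x C matrix (T frames, C channels)."""
    # Token mixing: transpose to C x T so the MLP mixes information across frames.
    x = add(x, transpose(mlp(transpose(layer_norm(x)), *token_w)))
    # Channel mixing: the MLP acts on each frame's feature vector independently.
    return add(x, mlp(layer_norm(x), *channel_w))


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_block_weights(t, c, hidden, rng):
    token_w = (random_matrix(t, hidden, rng), random_matrix(hidden, t, rng))
    channel_w = (random_matrix(c, hidden, rng), random_matrix(hidden, c, rng))
    return token_w, channel_w


def aggregate_multiscale(levels, weights):
    """Mix each T x C_i level separately, average over time, then concatenate."""
    clip_vector = []
    for x, (token_w, channel_w) in zip(levels, weights):
        mixed = mixer_block(x, token_w, channel_w)
        clip_vector.extend(sum(col) / len(mixed) for col in zip(*mixed))
    return clip_vector


rng = random.Random(0)
T = 8                    # frames per clip (assumed)
channels = [16, 32, 64]  # per-level feature widths (assumed, not from the paper)
# Stand-ins for spatially pooled backbone features at three scales.
levels = [random_matrix(T, c, rng, scale=1.0) for c in channels]
weights = [make_block_weights(T, c, 32, rng) for c in channels]
clip_vector = aggregate_multiscale(levels, weights)
```

Here `clip_vector` is a single clip-level descriptor that a small prediction head could consume. Averaging over time and concatenating the three levels is one simple fusion choice made for this sketch; the abstract does not specify how the levels are combined.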
