CV · LG · Dec 30, 2025

MotivNet: Evolving Meta-Sapiens into an Emotionally Intelligent Foundation Model

arXiv:2512.24231v1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the generalization problem in FER, which matters to researchers and practitioners alike. It is an incremental advance, however, because it adapts an existing foundation model to a specific task.

The paper tackles the weak cross-dataset generalization of facial emotion recognition (FER) models. It introduces MotivNet, which uses the Meta-Sapiens backbone to achieve competitive performance without cross-domain training, making FER more viable for real-world applications.
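"Without cross-domain training" means a model trained on one dataset is evaluated unchanged on others. The sketch below illustrates that protocol in plain Python. It is an assumption-laden illustration, not the paper's evaluation code: the `Sample` format, the helper names, the `generalization_gap` metric and the datasets `dataset_A`/`dataset_B` are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical sample format: (input identifier, ground-truth emotion label).
Sample = Tuple[str, str]


def accuracy(predict: Callable[[str], str], samples: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label matches the ground truth."""
    if not samples:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)


def cross_dataset_eval(
    predict: Callable[[str], str],
    test_sets: Dict[str, Sequence[Sample]],
) -> Dict[str, float]:
    """Evaluate one fixed model, trained on a single source dataset, on every test set.

    No target-domain samples are used for training or fine-tuning: the same
    `predict` function is applied to each dataset unchanged.
    """
    return {name: accuracy(predict, samples) for name, samples in test_sets.items()}


def generalization_gap(results: Dict[str, float], source: str) -> float:
    """Hypothetical metric: in-domain accuracy minus mean out-of-domain accuracy."""
    others = [acc for name, acc in results.items() if name != source]
    if not others:
        raise ValueError("need at least one out-of-domain dataset")
    return results[source] - sum(others) / len(others)


def toy_predict(x: str) -> str:
    """Stand-in for a trained FER model (purely illustrative)."""
    return "happy" if "smile" in x else "neutral"


# Two tiny hypothetical datasets; the model is assumed trained on dataset_A only.
sets = {
    "dataset_A": [("smile_1", "happy"), ("flat_1", "neutral")],
    "dataset_B": [("smile_2", "happy"), ("flat_2", "sad")],
}
results = cross_dataset_eval(toy_predict, sets)
gap = generalization_gap(results, source="dataset_A")
```

Here the toy model scores 1.0 on its source set and 0.5 on the unseen set, giving a gap of 0.5. A smaller gap across many unseen datasets is one simple way to read "generalizable across domains."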

In this paper, we introduce MotivNet, a generalizable facial emotion recognition model for robust real-world application. Current state-of-the-art FER models tend to generalize poorly when tested on diverse data, which degrades their real-world performance and hinders FER as a research domain. Although researchers have proposed complex architectures to address this generalization issue, these require cross-domain training to obtain generalizable results, which is inherently at odds with real-world application. Our model, MotivNet, achieves competitive performance across datasets without cross-domain training by using Meta-Sapiens as a backbone. Sapiens is a human vision foundation model whose state-of-the-art real-world generalization comes from large-scale pretraining of a Masked Autoencoder. We propose MotivNet as an additional downstream task for Sapiens and define three criteria to evaluate its viability as a Sapiens task: benchmark performance, model similarity, and data similarity. Throughout this paper, we describe the components of MotivNet, our training approach, and results showing that MotivNet generalizes across domains. We demonstrate that MotivNet can be benchmarked against existing SOTA models and meets the listed criteria, validating it as a Sapiens downstream task and making FER more attractive for in-the-wild application. The code is available at https://github.com/OSUPCVLab/EmotionFromFaceImages.
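The abstract frames MotivNet as a downstream task on top of a pretrained Sapiens backbone. As a rough illustration only, the sketch below shows a generic classification head of that kind: per-patch token embeddings from a backbone are mean-pooled and passed through a linear layer and softmax. The pooling choice, the `LinearHead` class and the `EMOTIONS` label set are assumptions, not MotivNet's documented head. The real backbone would be a large pretrained vision transformer, replaced here by hand-written two-dimensional feature vectors.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label set; the paper's actual classes may differ.
EMOTIONS = ["neutral", "happy", "sad", "surprise", "fear", "disgust", "anger"]


def mean_pool(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Average per-patch token embeddings into a single feature vector."""
    if not tokens:
        raise ValueError("no tokens to pool")
    n, dim = len(tokens), len(tokens[0])
    return [sum(t[d] for t in tokens) / n for d in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Linear classifier mapping a pooled feature vector to per-class logits."""

    def __init__(self, weights: List[List[float]], bias: List[float]) -> None:
        self.weights = weights  # shape: [num_classes][feature_dim]
        self.bias = bias        # shape: [num_classes]

    def __call__(self, feature: Sequence[float]) -> List[float]:
        return [sum(w * f for w, f in zip(row, feature)) + b
                for row, b in zip(self.weights, self.bias)]


def classify(tokens: Sequence[Sequence[float]], head: LinearHead) -> Tuple[str, List[float]]:
    """Pool backbone tokens, apply the head, and return the top label and probabilities."""
    probs = softmax(head(mean_pool(tokens)))
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


# Toy usage: two fake patch tokens and a head that favors "happy".
tokens = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.0, 0.0] for _ in EMOTIONS]
weights[EMOTIONS.index("happy")] = [2.0, 2.0]
head = LinearHead(weights, [0.0] * len(EMOTIONS))
label, probs = classify(tokens, head)
```

Keeping the backbone fixed and training only a light head like this is one common way to reuse a foundation model for a new task; whether MotivNet freezes or fine-tunes the Sapiens backbone is not stated in this summary.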
