CV, Nov 26, 2025

FIELDS: Face reconstruction with accurate Inference of Expression using Learning with Direct Supervision

arXiv:2511.21245v2
AI Analysis

FIELDS addresses a limitation of existing methods for facial expression analysis and reconstruction, though it appears incremental: it builds on self-supervised 2D approaches and adds direct supervision.

The paper tackles the problem that 3D face reconstruction misses subtle emotional details. It proposes FIELDS, which uses direct 3D expression supervision and an emotion recognition branch to bridge the 2D/3D domain gap, producing high-fidelity reconstructions that significantly improve in-the-wild facial expression recognition performance.

Facial expressions convey the bulk of emotional information in human communication, yet existing 3D face reconstruction methods often miss subtle affective details due to reliance on 2D supervision and lack of 3D ground truth. We propose FIELDS (Face reconstruction with accurate Inference of Expression using Learning with Direct Supervision) to address these limitations by extending self-supervised 2D image consistency cues with direct 3D expression parameter supervision and an auxiliary emotion recognition branch. Our encoder is guided by authentic expression parameters from spontaneous 4D facial scans, while an intensity-aware emotion loss encourages the 3D expression parameters to capture genuine emotion content without exaggeration. This dual-supervision strategy bridges the 2D/3D domain gap and mitigates expression-intensity bias, yielding high-fidelity 3D reconstructions that preserve subtle emotional cues. From a single image, FIELDS produces emotion-rich face models with highly realistic expressions, significantly improving in-the-wild facial expression recognition performance without sacrificing naturalness.
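The dual-supervision idea in the abstract can be pictured as a weighted sum of three terms: a 2D image consistency term, a direct 3D expression parameter term, and an emotion term that depends on intensity. The sketch below is only an illustration under stated assumptions and is not the authors' implementation. The function names, the specific loss forms (mean absolute error, mean squared error, intensity-scaled cross-entropy), and the weights `w_img`, `w_exp`, and `w_emo` are all hypothetical, since the abstract does not specify them.

```python
import math

def photometric_loss(rendered, image):
    """Mean absolute difference between rendered and input pixel values
    (assumed stand-in for the 2D image consistency cue)."""
    return sum(abs(r - i) for r, i in zip(rendered, image)) / len(image)

def expression_loss(pred_params, ref_params):
    """Mean squared error between predicted expression parameters and
    reference parameters (assumed form of the direct 3D supervision)."""
    return sum((p - t) ** 2 for p, t in zip(pred_params, ref_params)) / len(pred_params)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def emotion_loss(logits, label, intensity):
    """Cross-entropy scaled by an intensity in [0, 1].
    The scaling is a hypothetical choice, not taken from the paper."""
    probs = softmax(logits)
    return -intensity * math.log(probs[label])

def total_loss(rendered, image, pred_params, ref_params, logits, label,
               intensity, w_img=1.0, w_exp=1.0, w_emo=0.1):
    """Weighted sum of the three terms; the weights are placeholders."""
    return (w_img * photometric_loss(rendered, image)
            + w_exp * expression_loss(pred_params, ref_params)
            + w_emo * emotion_loss(logits, label, intensity))

# Toy usage with made-up numbers.
loss = total_loss(
    rendered=[0.5, 0.5], image=[0.0, 1.0],
    pred_params=[0.2, 0.4], ref_params=[0.2, 0.4],
    logits=[0.0, 0.0], label=0, intensity=1.0,
)
print(loss)
```

In this toy setup the expression term is zero because the predicted parameters match the reference, so the total is driven by the image and emotion terms. In a real system each term would operate on tensors from a differentiable renderer and a trained emotion classifier.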

