CV · Mar 7

Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction

arXiv:2603.07093v1
Predicted impact: top 8% in CV · last 90 days
Originality: Highly original
AI Analysis

This work addresses the problem of generating natural and socially aligned facial expressions for virtual agents, improving the realism of human-computer interaction.

This paper proposes a method for generating facial expressions in dyadic interactions that are aligned with human preference. It frames identity-independent expression generation as an action learning process and uses human feedback to produce contextually and emotionally appropriate expressions, reporting superior performance on two benchmarks.

Achieving natural dyadic interaction requires generating facial expressions that are emotionally appropriate and socially aligned with human preference. Human feedback offers a compelling mechanism to guide such alignment, yet how to effectively incorporate this feedback into facial expression generation remains underexplored. In this paper, we propose a facial expression generation method aligned with human preference by leveraging human feedback to produce contextually and emotionally appropriate expressions for natural dyadic interaction. A key to our method is framing the generation of identity-independent facial expressions as an action learning process, allowing human feedback to assess their validity free from visual or identity bias. We establish a closed feedback loop in which listener expressions dynamically respond to evolving conversational cues of the speaker. Concretely, we train a vision-language-action model via supervised fine-tuning to map the speaker's multimodal signals into controllable low-dimensional expression representations of a 3D morphable model. We further introduce a human-feedback reinforcement learning strategy that integrates the imitation of high-quality expression response with critic-guided optimization. Experiments on two benchmarks demonstrate that our method effectively aligns facial expressions with human preference and achieves superior performance.
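The abstract describes a reinforcement learning strategy that combines imitation of high-quality expression responses with critic-guided optimization, but it does not give the objective. The following is only a minimal sketch of one way such a combination could look, not the authors' formulation: the function names, the mean-squared-error imitation term, the `critic_weight` parameter, and the toy critic are all assumptions introduced for illustration. Expression responses are represented as plain lists of floats standing in for low-dimensional 3D morphable model coefficients.

```python
from typing import Callable, Sequence


def combined_objective(
    predicted: Sequence[float],
    reference: Sequence[float],
    critic: Callable[[Sequence[float]], float],
    critic_weight: float = 0.5,
) -> float:
    """Hypothetical loss: imitation error minus a weighted critic score.

    `predicted` and `reference` stand in for expression coefficient vectors.
    `critic` is any function that scores a predicted expression (higher is
    better). None of this is taken from the paper.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("expression vectors must be non-empty")

    # Imitation term: mean squared error against a high-quality response.
    imitation = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)

    # Critic term: subtracting the score means a higher score lowers the loss.
    return imitation - critic_weight * critic(predicted)


def toy_critic(coeffs: Sequence[float]) -> float:
    """Stand-in critic that prefers small-magnitude expressions (assumption)."""
    return -sum(c * c for c in coeffs) / len(coeffs)


if __name__ == "__main__":
    loss = combined_objective([1.0, 1.0], [0.0, 0.0], toy_critic)
    print(loss)
```

In a real system the critic would presumably be a model trained on human preference judgments, and the loss would be minimized with respect to the generator's parameters; both are outside what this sketch covers.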
