AMII: Adaptive Multimodal Inter-personal and Intra-personal Model for Adapted Behavior Synthesis
This paper addresses the problem of generating realistic non-verbal behavior for virtual agents in human-computer interaction. It offers an incremental improvement: a novel method for a known bottleneck.
The paper tackles the challenge of synthesizing adaptive facial gestures for Socially Interactive Agents (SIAs) that can act as speakers or listeners. It proposes AMII, which uses modality memory encoding and attention mechanisms to capture intra-personal and inter-personal relationships, and validates it with objective evaluations against state-of-the-art approaches.
Socially Interactive Agents (SIAs) are physical or virtual embodied agents that display behavior similar to human multimodal behavior. Modeling SIAs' non-verbal behavior, such as speech and facial gestures, has always been a challenging task, given that a SIA can take the role of a speaker or a listener. In both roles, a SIA must emit appropriate behavior adapted to its own speech, its previous behaviors (intra-personal), and the User's behaviors (inter-personal). We propose AMII, a novel approach to synthesize adaptive facial gestures for SIAs while interacting with Users and acting interchangeably as a speaker or as a listener. AMII is characterized by a modality memory encoding schema (where a modality corresponds to either speech or facial gestures) and makes use of attention mechanisms to capture the intra-personal and inter-personal relationships. We validate our approach by conducting objective evaluations and comparing it with state-of-the-art approaches.
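To make the intra-personal versus inter-personal distinction concrete, the following is a minimal sketch of generic scaled dot-product attention applied to two separate memories. It is not AMII's architecture, which the abstract does not specify. The names `attend`, `agent_memory`, `user_memory`, and `agent_state`, and the toy feature vectors, are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the weighted sum of `values` and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Hypothetical toy memories: each row is the feature vector of one past time step.
agent_memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # agent's own past behavior
user_memory = [[0.5, 0.5], [1.0, 0.0]]               # user's past behavior
agent_state = [1.0, 0.0]                             # agent's current state (the query)

# Intra-personal: the agent attends over its own history.
intra_context, intra_weights = attend(agent_state, agent_memory, agent_memory)
# Inter-personal: the agent attends over the user's history.
inter_context, inter_weights = attend(agent_state, user_memory, user_memory)

# Concatenate both contexts; a real model would pass this to a learned decoder.
fused = intra_context + inter_context
```

In this sketch the same query attends over two different memories, so the weights show which of the agent's own past steps and which of the user's past steps are most relevant to the current state. A trained model would use learned projections for queries, keys, and values rather than raw feature vectors.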