CV · Nov 16, 2025

Hi-Reco: High-Fidelity Real-Time Conversational Digital Humans

arXiv:2511.12662v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of creating believable and responsive digital humans for immersive applications in communication, education, and entertainment, representing an incremental advancement by integrating existing components with novel coordination methods.

The paper tackles the challenge of achieving both visual realism and real-time responsiveness in conversational digital humans, presenting an integrated system that combines a realistic 3D avatar, expressive speech synthesis, and knowledge-grounded dialogue generation with minimal latency.

High-fidelity digital humans are increasingly used in interactive applications, yet achieving both visual realism and real-time responsiveness remains a major challenge. We present a high-fidelity, real-time conversational digital human system that seamlessly combines a visually realistic 3D avatar, persona-driven expressive speech synthesis, and knowledge-grounded dialogue generation. To support natural and timely interaction, we introduce an asynchronous execution pipeline that coordinates multi-modal components with minimal latency. The system supports advanced features such as wake word detection, emotionally expressive prosody, and highly accurate, context-aware response generation. It leverages novel retrieval-augmented methods, including history augmentation to maintain conversational flow and intent-based routing for efficient knowledge access. Together, these components form an integrated system that enables responsive and believable digital humans, suitable for immersive applications in communication, education, and entertainment.
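
The asynchronous execution pipeline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption: the three stages (dialogue, speech, avatar), the function names, the queue sizes, and the simulated latencies are all hypothetical stand-ins. The sketch only shows the general idea of overlapping stages so that speech synthesis can start before the full reply has been generated.

```python
import asyncio

# All stage functions below are hypothetical placeholders, not the paper's components.

async def generate_reply(prompt: str):
    """Yield a reply sentence by sentence, as a streaming dialogue model might."""
    for sentence in ["Hello there.", "How can I help you today?"]:
        await asyncio.sleep(0.01)  # simulated generation latency
        yield sentence

async def synthesize(sentence: str) -> bytes:
    """Placeholder TTS: return fake audio bytes for one sentence."""
    await asyncio.sleep(0.01)  # simulated synthesis latency
    return sentence.encode("utf-8")

async def dialogue_stage(prompt: str, text_q: asyncio.Queue) -> None:
    # Push each sentence downstream as soon as it is available.
    async for sentence in generate_reply(prompt):
        await text_q.put(sentence)
    await text_q.put(None)  # sentinel: no more text

async def speech_stage(text_q: asyncio.Queue, audio_q: asyncio.Queue) -> None:
    # Synthesize sentences while the dialogue stage keeps generating.
    while (sentence := await text_q.get()) is not None:
        await audio_q.put(await synthesize(sentence))
    await audio_q.put(None)  # sentinel: no more audio

async def avatar_stage(audio_q: asyncio.Queue, played: list) -> None:
    # A real system would drive playback and facial animation here.
    while (audio := await audio_q.get()) is not None:
        played.append(audio)

async def run_pipeline(prompt: str) -> list[bytes]:
    text_q: asyncio.Queue = asyncio.Queue(maxsize=4)
    audio_q: asyncio.Queue = asyncio.Queue(maxsize=4)
    played: list[bytes] = []
    # Run all three stages concurrently; bounded queues provide backpressure.
    await asyncio.gather(
        dialogue_stage(prompt, text_q),
        speech_stage(text_q, audio_q),
        avatar_stage(audio_q, played),
    )
    return played

if __name__ == "__main__":
    print(asyncio.run(run_pipeline("hi")))
```

In this sketch the first sentence reaches the avatar stage while the second is still being generated, which is the kind of overlap that reduces perceived latency. A production system would replace the placeholders with real model calls and would likely need cancellation handling for interruptions.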
