CV · Sep 28, 2025

SIE3D: Single-image Expressive 3D Avatar generation via Semantic Embedding and Perceptual Expression Loss

arXiv:2509.24004v1
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific challenge for applications in virtual reality or gaming by providing fine-grained, intuitive control over avatar expressions via text, though it appears incremental as it builds on existing single-image 3D generation methods.

The paper tackles the problem of generating expressive 3D head avatars from a single image with text-based control. It reports improved controllability and realism, outperforming competitive methods in identity preservation and expression fidelity.

Generating high-fidelity 3D head avatars from a single image is challenging, as current methods lack fine-grained, intuitive control over expressions via text. This paper proposes SIE3D, a framework that generates expressive 3D avatars from a single image and descriptive text. SIE3D fuses identity features from the image with semantic embedding from text through a novel conditioning scheme, enabling detailed control. To ensure generated expressions accurately match the text, it introduces an innovative perceptual expression loss function. This loss uses a pre-trained expression classifier to regularize the generation process, guaranteeing expression accuracy. Extensive experiments show SIE3D significantly improves controllability and realism, outperforming competitive methods in identity preservation and expression fidelity on a single consumer-grade GPU. Project page: https://blazingcrystal1747.github.io/SIE3D/
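The perceptual expression loss described above can be illustrated with a minimal sketch. It assumes the pre-trained classifier outputs one logit per expression class for a rendered avatar and that the penalty is a cross-entropy against the expression named in the text. The abstract does not specify the exact formulation, so the label set, the function names, and the `weight` parameter below are hypothetical.

```python
import math

# Hypothetical label set: the classes of the paper's classifier are not given in the abstract.
EXPRESSIONS = ["neutral", "happy", "sad", "angry", "surprised"]


def softmax(logits):
    """Convert raw classifier logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perceptual_expression_loss(classifier_logits, target_expression):
    """Cross-entropy between the classifier's prediction on the rendered avatar
    and the expression requested in the text prompt (assumed formulation)."""
    probs = softmax(classifier_logits)
    target_index = EXPRESSIONS.index(target_expression)
    return -math.log(probs[target_index])


def total_loss(reconstruction_loss, classifier_logits, target_expression, weight=0.1):
    """Add the expression term as a regularizer to a main objective.
    The weight of 0.1 is an illustrative value, not one taken from the paper."""
    expression_term = perceptual_expression_loss(classifier_logits, target_expression)
    return reconstruction_loss + weight * expression_term


if __name__ == "__main__":
    # Logits a frozen classifier might produce for a render that looks "happy".
    logits = [0.2, 3.5, -1.0, -0.5, 0.1]
    print(perceptual_expression_loss(logits, "happy"))  # small: render matches the prompt
    print(perceptual_expression_loss(logits, "sad"))    # large: render contradicts the prompt
```

In a real training loop the classifier would stay frozen and the loss would be computed with a differentiable framework, so that gradients flow back through the rendered image into the generator.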

