CV · AI · Sep 25, 2025

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

arXiv:2509.21245v1 · 14 citations · h-index: 10
Originality: Highly original
AI Analysis

This work addresses the need for fine-grained, cross-modal control in 3D asset generation for games, film, and design, and proposes a novel method for a known bottleneck.

The paper tackles limited controllability in 3D asset generation by introducing Hunyuan3D-Omni, a unified framework that accepts multiple conditioning signals, such as point clouds, voxels, bounding boxes, and skeletal pose. The authors report improved generation accuracy and robustness for production workflows.

Recent advances in 3D-native generative models have accelerated asset creation for games, film, and design. However, most methods still rely primarily on image or text conditioning and lack fine-grained, cross-modal controls, which limits controllability and practical adoption. To address this gap, we present Hunyuan3D-Omni, a unified framework for fine-grained, controllable 3D asset generation built on Hunyuan3D 2.1. In addition to images, Hunyuan3D-Omni accepts point clouds, voxels, bounding boxes, and skeletal pose priors as conditioning signals, enabling precise control over geometry, topology, and pose. Instead of separate heads for each modality, our model unifies all signals in a single cross-modal architecture. We train with a progressive, difficulty-aware sampling strategy that selects one control modality per example and biases sampling toward harder signals (e.g., skeletal pose) while downweighting easier ones (e.g., point clouds), encouraging robust multi-modal fusion and graceful handling of missing inputs. Experiments show that these additional controls improve generation accuracy, enable geometry-aware transformations, and increase robustness for production workflows.
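The abstract describes the training sampler only qualitatively: one control modality per example, harder signals upweighted, easier ones downweighted, applied progressively. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The modality names, the numeric weights in `DIFFICULTY_WEIGHTS`, and the linear ramp from uniform to difficulty-weighted sampling (controlled by a hypothetical `warmup_steps`) are all illustrative choices.

```python
import random

# Hypothetical difficulty weights (assumption): higher = sampled more often.
# The abstract only states that harder signals (e.g. skeletal pose) are
# upweighted and easier ones (e.g. point clouds) downweighted.
DIFFICULTY_WEIGHTS = {
    "point_cloud": 0.5,
    "voxel": 1.0,
    "bbox": 1.0,
    "pose": 2.0,
}


def modality_probs(step, warmup_steps, weights=DIFFICULTY_WEIGHTS):
    """Return per-modality sampling probabilities at a training step.

    Assumption: "progressive" is modeled as a linear blend from uniform
    sampling at step 0 to fully difficulty-weighted sampling once
    `step >= warmup_steps`. The result always sums to 1.
    """
    t = min(step / warmup_steps, 1.0) if warmup_steps > 0 else 1.0
    n = len(weights)
    total = sum(weights.values())
    return {m: (1.0 - t) / n + t * w / total for m, w in weights.items()}


def sample_modality(step, warmup_steps, rng=random):
    """Pick exactly one control modality for a single training example."""
    probs = modality_probs(step, warmup_steps)
    names = list(probs)
    return rng.choices(names, weights=[probs[m] for m in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 500, 1000):
        print(step, modality_probs(step, 1000), sample_modality(step, 1000, rng))
```

With these illustrative weights, sampling is uniform (0.25 each) at step 0 and reaches 2.0/4.5 ≈ 0.44 for pose versus 0.5/4.5 ≈ 0.11 for point clouds after warm-up. Drawing a single modality per example, rather than all of them, is one plausible way to force the shared architecture to cope with any one signal being present or absent.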

