Haotian Mao

2 Papers

83.9 HC Apr 15
LIVE-GS: LLM Powers Interactive VR Experience with Physics-Aware Gaussian Splatting

Haotian Mao, Hangyu Zhou, Zhuoxiong Xu et al.

As 3D Gaussian Splatting (3DGS) emerges as a leading approach for novel view synthesis and scene reconstruction, its potential in digital asset creation has gained significant attention. An increasing number of asset libraries based on GS are being established. However, generating physics-based dynamic assets remains a time-consuming and expertise-intensive task, especially for non-experts. In this paper, we propose LIVE-GS, a highly realistic Virtual Reality (VR) system powered by Large Language Models (LLMs), which enables rapid creation of dynamic Gaussian assets and real-time VR interactions. To inform our system design, we conducted interviews to examine challenges faced by current GS-based VR systems and the specific demands of users. Based on these insights, we employed GPT-4o to analyze key physical properties of objects that significantly impact user interactions, ensuring physics-based interactions in VR align with real-world phenomena. A key innovation of LIVE-GS is its ability to predict reasonable parameters in just 10 seconds from static Gaussian assets while maintaining high-quality VR interactions. To validate our approach, we invited participants experienced in physical simulation to manually adjust physical parameters, providing a baseline for comparison in both asset quality and authoring efficiency. We also conducted a comprehensive user study to evaluate system usability and user satisfaction. Experimental results demonstrate that LIVE-GS, leveraging LLMs' scene understanding capabilities, can achieve efficient physical scene creation and natural interactions without requiring manual design or annotation.

66.2 GR May 16
VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement

Haotian Mao, Yuhan Huang, Jiatao Lin et al.

We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the universality of our method, achieving state-of-the-art physical plausibility and unlocking shape diversity compared to existing layout planners.