Xiang Guo

h-index 35

4 papers

177 citations

Novelty 46%

AI Score 30

Ranked #135,927 of 194,257 authors (top 70%); #44,799 in CV (top 76%)

4 Papers

21.9 CV Jun 15, 2022

Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis

Xiang Guo, Guanying Chen, Yuchao Dai et al.

Neural Radiance Fields (NeRF) have recently revolutionized novel view synthesis (NVS) thanks to their superior performance. In this paper, we aim to synthesize dynamic scenes. Extending methods for static scenes to dynamic scenes is not straightforward, as both the scene geometry and appearance change over time, especially in a monocular setup. In addition, existing dynamic NeRF methods generally require a lengthy per-scene training procedure, in which multi-layer perceptrons (MLPs) are fitted to model both motion and radiance. Building on recent advances in voxel-grid optimization, we propose a fast deformable radiance field method to handle dynamic scenes. Our method consists of two modules. The first module adopts a deformation grid to store 3D dynamic features and a lightweight MLP that uses the interpolated features to decode the deformation mapping a 3D point in the observation space to the canonical space. The second module contains a density grid and a color grid to model the geometry and appearance of the scene. Occlusion is explicitly modeled to further improve the rendering quality. Experimental results show that our method achieves performance comparable to D-NeRF with only 20 minutes of training, which is more than 70x faster than D-NeRF, clearly demonstrating the efficiency of the proposed method.

0.5 CL Sep 6, 2023

GRASS: Unified Generation Model for Speech-to-Semantic Tasks

Aobo Xia, Shuyu Lei, Yushu Yang et al.

This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more, after fine-tuning. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.

34.1 CV Apr 9, 2024

3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis

Zhicheng Lu, Xiang Guo, Le Hui et al.

In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing solutions based on neural radiance fields (NeRF) learn the deformation implicitly and therefore cannot incorporate 3D scene geometry. As a result, the learned deformation is not necessarily geometrically coherent, which leads to unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. 3D Gaussian Splatting has recently provided a new representation of the 3D scene, on which 3D geometry can be exploited to learn complex 3D deformation. Specifically, a scene is represented as a collection of 3D Gaussians, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them into learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. Extensive experimental results on both synthetic and real datasets demonstrate the superiority of our solution, which achieves new state-of-the-art performance. The project is available at https://npucvr.github.io/GaGS/

2.9 HC Feb 27, 2022

Roadway Design Matters: Variation in Bicyclists' Psycho-Physiological Responses in Different Urban Roadway Designs

Xiang Guo, Arash Tavakoli, Erin Robartes et al.

As a healthier and more sustainable mode of mobility, cycling has been advocated in both literature and policy. However, current trends in bicyclist crash fatalities suggest that current roadway designs fall short in protecting these vulnerable road users. The lack of cycling data is a common challenge for studying bicyclists' safety, behavior, and comfort levels under different design contexts. To understand bicyclists' behavioral and physiological responses in an efficient and safe way, this study uses a bicycle simulator within an immersive virtual environment (IVE). Off-the-shelf sensors are used to evaluate bicyclists' cycling performance (speed and lane position) and physiological responses (eye tracking and heart rate (HR)). Participants bike in a simulated virtual environment modeled to scale from a real-world street with a shared bike lane (sharrow), to evaluate how the introduction of a bike lane and of a protected bike lane with pylons may affect perceptions of safety, as well as behavioral and psycho-physiological responses. Results from 50 participants show that the protected bike lane design received the highest perceived safety rating and exhibited the lowest average cycling speed. Furthermore, both the bike lane and the protected bike lane scenarios show a less dispersed gaze distribution than the as-built sharrow scenario, reflecting a higher gaze focus on the biking task than when bicyclists share the right of way with vehicles. Additionally, heart rate change point results suggest that creating dedicated zones for bicyclists (bike lanes or protected bike lanes) has the potential to reduce bicyclists' stress levels.