Shaoyu Huang

AI
h-index24
4papers
4citations
Novelty56%
AI Score46

4 Papers

CVAug 4, 2025Code
StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion

Haoxin Yang, Weihong Chen, Xuemiao Xu et al.

Monocular 3D human pose estimation remains a challenging task due to inherent depth ambiguities and occlusions. Compared to traditional methods based on Transformers or Convolutional Neural Networks (CNNs), recent diffusion-based approaches have shown superior performance, leveraging their probabilistic nature and high-fidelity generation capabilities. However, these methods often fail to account for the spatial and temporal correlations across predicted frames, resulting in limited temporal consistency and inferior accuracy in predicted 3D pose sequences. To address these shortcomings, this paper proposes StarPose, an autoregressive diffusion framework that effectively incorporates historical 3D pose predictions and spatial-temporal physical guidance to significantly enhance both the accuracy and temporal coherence of pose predictions. Unlike existing approaches, StarPose models the 2D-to-3D pose mapping as an autoregressive diffusion process. By synergically integrating previously predicted 3D poses with 2D pose inputs via a Historical Pose Integration Module (HPIM), the framework generates rich and informative historical pose embeddings that guide subsequent denoising steps, ensuring temporally consistent predictions. In addition, a fully plug-and-play Spatial-Temporal Physical Guidance (STPG) mechanism is tailored to refine the denoising process in an iterative manner, which further enforces spatial anatomical plausibility and temporal motion dynamics, rendering robust and realistic pose estimates. Extensive experiments on benchmark datasets demonstrate that StarPose outperforms state-of-the-art methods, achieving superior accuracy and temporal consistency in 3D human pose estimation. Code is available at https://github.com/wileychan/StarPose.

CVMay 7
NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps

Dijia Zhan, Jinyi Li, Chenxi Zheng et al.

Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path planning problem on pre-built top-down maps, supported by our newly constructed R2R-TopDown dataset. To solve this, we introduce NavOne, a unified framework that directly predicts dense path probabilities over multi-modal maps in a single end-to-end forward pass. NavOne features a Top-Down Map Fuser for joint multi-modal map representation, and extends Attention Residuals for spatial-aware depth mixing. Extensive experiments on R2R-TopDown show that NavOne achieves state-of-the-art performance among map-based VLN methods, with a planning-stage speedup of 8x over existing map-based baselines and 80x over egocentric methods, enabling highly efficient global navigation.

AIApr 26
Transferable Human Mobility Network Reconstruction with neuroGravity

Jinming Yang, Shaoyu Huang, Zongyuan Huang et al.

Accurate modeling of human mobility is critical for tackling urban planning and public health challenges. In undeveloped regions, the absence of comprehensive travel surveys necessitates reconstructing mobility networks from publicly available data. Here we develop neuroGravity, a physics-informed deep learning model that reliably reconstructs mobility flows from limited observations and transfers to unobserved cities. Using only urban facility and population distributions, we find that neuroGravity's regional representations strongly correlate with socioeconomic and livability status, offering scalable proxies for costly surveys. Furthermore, we uncover that spatial income segregation plays a key role in model transferability: mobility networks are most reliably reconstructed when target cities share similar segregation levels with the source. We design an index to quantify this segregation and accurately predict transferability. Finally, we generate mobility flow proxies for over 1,200 cities worldwide, highlighting neuroGravity's potential to mitigate critical data shortages in resource-limited, underdeveloped areas.

AIJan 11, 2025
Where to Go Next Day: Multi-scale Spatial-Temporal Decoupled Model for Mid-term Human Mobility Prediction

Zongyuan Huang, Weipeng Wang, Shaoyu Huang et al.

Predicting individual mobility patterns is crucial across various applications. While current methods mainly focus on predicting the next location for personalized services like recommendations, they often fall short in supporting broader applications such as traffic management and epidemic control, which require longer period forecasts of human mobility. This study addresses mid-term mobility prediction, aiming to capture daily travel patterns and forecast trajectories for the upcoming day or week. We propose a novel Multi-scale Spatial-Temporal Decoupled Predictor (MSTDP) designed to efficiently extract spatial and temporal information by decoupling daily trajectories into distinct location-duration chains. Our approach employs a hierarchical encoder to model multi-scale temporal patterns, including daily recurrence and weekly periodicity, and utilizes a transformer-based decoder to globally attend to predicted information in the location or duration chain. Additionally, we introduce a spatial heterogeneous graph learner to capture multi-scale spatial relationships, enhancing semantic-rich representations. Extensive experiments, including statistical physics analysis, are conducted on large-scale mobile phone records in five cities (Boston, Los Angeles, SF Bay Area, Shanghai, and Tokyo), to demonstrate MSTDP's advantages. Applied to epidemic modeling in Boston, MSTDP significantly outperforms the best-performing baseline, achieving a remarkable 62.8% reduction in MAE for cumulative new cases.