Tianyang Zhao

CV
5papers
648citations
Novelty52%
AI Score42

5 Papers

66.2CVMar 18
PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

Yijing Guo, Mengjun Chao, Luo Wang et al.

Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models, built for perspective cameras, generalize poorly to this setting. We propose PanoVGGT, a permutation-equivariant Transformer framework that jointly predicts camera poses, depth maps, and 3D point clouds from one or multiple panoramas in a single forward pass. The model incorporates spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, enabling effective geometric reasoning in the spherical domain. To resolve inherent global-frame ambiguity, we further introduce a stochastic anchoring strategy during training. In addition, we contribute PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations. Extensive experiments on PanoCity and standard benchmarks demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization. Code and dataset will be released.

LGApr 7, 2021
Trajectory Prediction with Latent Belief Energy-Based Model

Bo Pang, Tianyang Zhao, Xu Xie et al.

Human trajectory prediction is critical for autonomous platforms like self-driving cars or social robots. We present a latent belief energy-based model (LB-EBM) for diverse human trajectory forecast. LB-EBM is a probabilistic model with cost function defined in the latent space to account for the movement history and social context. The low-dimensionality of the latent space and the high expressivity of the EBM make it easy for the model to capture the multimodality of pedestrian trajectory distributions. LB-EBM is learned from expert demonstrations (i.e., human trajectories) projected into the latent space. Sampling from or optimizing the learned LB-EBM yields a belief vector which is used to make a path plan, which then in turn helps to predict a long-range trajectory. The effectiveness of LB-EBM and the two-step approach are supported by strong empirical results. Our model is able to make accurate, multi-modal, and social compliant trajectory predictions and improves over prior state-of-the-arts performance on the Stanford Drone trajectory prediction benchmark by 10.9% and on the ETH-UCY benchmark by 27.6%.

OCNov 6, 2019
Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach

Shuhan Yao, Jiuxiang Gu, Peng Wang et al.

Mobile energy storage systems (MESSs) provide mobility and flexibility to enhance distribution system resilience. The paper proposes a Markov decision process (MDP) formulation for an integrated service restoration strategy that coordinates the scheduling of MESSs and resource dispatching of microgrids. The uncertainties in load consumption are taken into account. The deep reinforcement learning (DRL) algorithm is utilized to solve the MDP for optimal scheduling. Specifically, the twin delayed deep deterministic policy gradient (TD3) is applied to train the deep Q-network and policy network, then the well trained policy can be deployed in on-line manner to perform multiple actions simultaneously. The proposed model is demonstrated on an integrated test system with three microgrids connected by Sioux Falls transportation network. The simulation results indicate that mobile and stationary energy resources can be well coordinated to improve system resilience.

LGApr 10, 2019
Energy-Based Continuous Inverse Optimal Control

Yifei Xu, Jianwen Xie, Tianyang Zhao et al.

The problem of continuous inverse optimal control (over finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based model, where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates (1) synthesis step: sample the synthesized trajectories from the current probability density using the Langevin dynamics via back-propagation through time, and (2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Given the fact that an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, where we replace the sampling in the synthesis step by optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the energy-based model simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to fast initialize the synthesis step of the energy-based model. We demonstrate the proposed methods on autonomous driving tasks, and show that they can learn suitable cost functions for optimal control.

CVApr 9, 2019
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction

Tianyang Zhao, Yifei Xu, Mathew Monfort et al.

Accurate prediction of others' trajectories is essential for autonomous driving. Trajectory prediction is challenging because it requires reasoning about agents' past movements, social interactions among varying numbers and kinds of agents, constraints from the scene context, and the stochasticity of human behavior. Our approach models these interactions and constraints jointly within a novel Multi-Agent Tensor Fusion (MATF) network. Specifically, the model encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent interactions while retaining the spatial structure of agents and the scene context. The model decodes recurrently to multiple agents' future trajectories, using adversarial loss to learn stochastic predictions. Experiments on both highway driving and pedestrian crowd datasets show that the model achieves state-of-the-art prediction accuracy.