Hengbo Ma

CV
h-index23
20papers
937citations
Novelty55%
AI Score34

20 Papers

LGOct 13, 2022
Learning Physical Dynamics with Subequivariant Graph Neural Networks

Jiaqi Han, Wenbing Huang, Hengbo Ma et al.

Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics. However, they still encounter several challenges: 1) Physical laws abide by symmetry, which is a vital inductive bias accounting for model generalization and should be incorporated into the model design. Existing simulators either consider insufficient symmetry, or enforce excessive equivariance in practice when symmetry is partially broken by gravity. 2) Objects in the physical world possess diverse shapes, sizes, and properties, which should be appropriately processed by the model. To tackle these difficulties, we propose a novel backbone, Subequivariant Graph Neural Network, which 1) relaxes equivariance to subequivariance by considering external fields like gravity, where the universal approximation ability holds theoretically; 2) introduces a new subequivariant object-aware message passing for learning physical interactions between multiple objects of various shapes in the particle-based representation; 3) operates in a hierarchical fashion, allowing for modeling long-range and complex interactions. Our model achieves on average over 3% enhancement in contact prediction accuracy across 8 scenarios on Physion and 2X lower rollout MSE on RigidFall compared with state-of-the-art GNN simulators, while exhibiting strong generalization and data efficiency.

CVMar 5, 2022
Important Object Identification with Semi-Supervised Learning for Autonomous Driving

Jiachen Li, Haiming Gang, Hengbo Ma et al.

Accurate identification of important objects in the scene is a prerequisite for safe and high-quality decision making and motion planning of intelligent agents (e.g., autonomous vehicles) that navigate in complex and dynamic environments. Most existing approaches attempt to employ attention mechanisms to learn importance weights associated with each object indirectly via various tasks (e.g., trajectory prediction), which do not enforce direct supervision on the importance estimation. In contrast, we tackle this task in an explicit way and formulate it as a binary classification ("important" or "unimportant") problem. We propose a novel approach for important object identification in egocentric driving scenarios with relational reasoning on the objects in the scene. Besides, since human annotations are limited and expensive to obtain, we present a semi-supervised learning pipeline to enable the model to learn from unlimited unlabeled data. Moreover, we propose to leverage the auxiliary tasks of ego vehicle behavior prediction to further improve the accuracy of importance estimation. The proposed approach is evaluated on a public egocentric driving dataset (H3D) collected in complex traffic scenarios. A detailed ablative study is conducted to demonstrate the effectiveness of each model component and the training strategy. Our approach also outperforms rule-based baselines by a large margin.

CVAug 10, 2022
EvolveHypergraph: Group-Aware Dynamic Relational Reasoning for Trajectory Prediction

Jiachen Li, Chuanbo Hua, Jinkyoo Park et al.

While the modeling of pair-wise relations has been widely studied in multi-agent interacting systems, its ability to capture higher-level and larger-scale group-wise activities is limited. In this paper, we propose a group-aware relational reasoning approach (named EvolveHypergraph) with explicit inference of the underlying dynamically evolving relational structures, and we demonstrate its effectiveness for multi-agent trajectory prediction. In addition to the edges between a pair of nodes (i.e., agents), we propose to infer hyperedges that adaptively connect multiple nodes to enable group-aware relational reasoning in an unsupervised manner without fixing the number of hyperedges. The proposed approach infers the dynamically evolving relation graphs and hypergraphs over time to capture the evolution of relations, which are used by the trajectory predictor to obtain future states. Moreover, we propose to regularize the smoothness of the relation evolution and the sparsity of the inferred graphs or hypergraphs, which effectively improves training stability and enhances the explainability of inferred relations. The proposed approach is validated on both synthetic crowd simulations and multiple real-world benchmark datasets. Our approach infers explainable, reasonable group-aware relations and achieves state-of-the-art performance in long-term prediction.

CVJul 19, 2024
M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

Seunggeun Chi, Hyung-gun Chi, Hengbo Ma et al.

We introduce the Multi-Motion Discrete Diffusion Models (M2D2M), a novel approach for human motion generation from textual descriptions of multiple actions, utilizing the strengths of discrete diffusion models. This approach adeptly addresses the challenge of generating multi-motion sequences, ensuring seamless transitions of motions and coherence across a series of actions. The strength of M2D2M lies in its dynamic transition probability within the discrete diffusion model, which adapts transition probabilities based on the proximity between motion tokens, encouraging mixing between different modes. Complemented by a two-phase sampling strategy that includes independent and joint denoising steps, M2D2M effectively generates long-term, smooth, and contextually coherent human motion sequences, utilizing a model trained for single-motion generation. Extensive experiments demonstrate that M2D2M surpasses current state-of-the-art benchmarks for motion generation from text descriptions, showcasing its efficacy in interpreting language semantics and generating dynamic, realistic motions.

RODec 18, 2023
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Zhixuan Liang, Yao Mu, Hengbo Ma et al.

Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end hierarchical planning framework integrating interpretable skill learning with conditional diffusion planning to address this problem. At the higher level, the skill abstraction module learns discrete, human-understandable skill representations from visual observations and language instructions. These learned skill embeddings are then used to condition the diffusion model to generate customized latent trajectories aligned with the skills. This allows generating diverse state trajectories that adhere to the learnable skills. By integrating skill learning with conditional trajectory generation, SkillDiffuser produces coherent behavior following abstract instructions across diverse tasks. Experiments on multi-task robotic manipulation benchmarks like Meta-World and LOReL demonstrate state-of-the-art performance and human-interpretable skill representations from SkillDiffuser. More visualization results and information could be found on our website.

ROMar 26, 2024
CMP: Cooperative Motion Prediction with Multi-Agent Communication

Zehao Wang, Yuping Wang, Zhuoyuan Wu et al.

The confluence of the advancement of Autonomous Vehicles (AVs) and the maturity of Vehicle-to-Everything (V2X) communication has enabled the capability of cooperative connected and automated vehicles (CAVs). Building on top of cooperative perception, this paper explores the feasibility and effectiveness of cooperative motion prediction. Our method, CMP, takes LiDAR signals as model input to enhance tracking and prediction capabilities. Unlike previous work that focuses separately on either cooperative perception or motion prediction, our framework, to the best of our knowledge, is the first to address the unified problem where CAVs share information in both perception and prediction modules. Incorporated into our design is the unique capability to tolerate realistic V2X transmission delays, while dealing with bulky perception representations. We also propose a prediction aggregation module, which unifies the predictions obtained by different CAVs and generates the final prediction. Through extensive experiments and ablation studies on the OPV2V and V2V4Real datasets, we demonstrate the effectiveness of our method in cooperative perception, tracking, and motion prediction. In particular, CMP reduces the average prediction error by 12.3% compared with the strongest baseline. Our work marks a significant step forward in the cooperative capabilities of CAVs, showcasing enhanced performance in complex scenarios. More details can be found on the project website: https://cmp-cooperative-prediction.github.io.

ROJan 22, 2024
Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation

Jiachen Li, Chuanbo Hua, Jianpeng Yao et al.

Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning. While modeling pairwise relations has been widely studied in multi-agent interacting systems, the ability to capture larger-scale group-wise activities is limited. In this paper, we propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures, and we demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation. In addition to the edges between pairs of nodes (i.e., agents), we propose to infer hyperedges that adaptively connect multiple nodes to enable group-wise reasoning in an unsupervised manner. Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states. Meanwhile, we propose to regularize the sharpness and sparsity of the learned relations and the smoothness of the relation evolution, which proves to enhance training stability and model performance. The proposed approach is validated on synthetic crowd simulations and real-world benchmark datasets. Experiments demonstrate that the approach infers reasonable relations and achieves state-of-the-art prediction performance. In addition, we present a deep reinforcement learning (DRL) framework for social robot navigation, which incorporates relational reasoning and trajectory prediction systematically. In a group-based crowd simulation, our method outperforms the strongest baseline by a significant margin in terms of safety, efficiency, and social compliance in dense, interactive scenarios. We also demonstrate the practical applicability of our method with real-world robot experiments. The code and videos can be found at https://relational-reasoning-nav.github.io/.

CVNov 5, 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

We study the problem of estimating the body movements of a camera wearer from egocentric videos. Current methods for ego-body pose estimation rely on temporally dense sensor data, such as IMU measurements from spatially sparse body parts like the head and hands. However, we propose that even temporally sparse observations, such as hand poses captured intermittently from egocentric videos during natural or periodic hand movements, can effectively constrain overall body motion. Naively applying diffusion models to generate full-body pose from head pose and sparse hand pose leads to suboptimal results. To overcome this, we develop a two-stage approach that decomposes the problem into temporal completion and spatial completion. First, our method employs masked autoencoders to impute hand trajectories by leveraging the spatiotemporal correlations between the head pose sequence and intermittent hand poses, providing uncertainty estimates. Subsequently, we employ conditional diffusion models to generate plausible full-body motions based on these temporally dense trajectories of the head and hands, guided by the uncertainty estimates from the imputation. The effectiveness of our method was rigorously tested and validated through comprehensive experiments conducted on various HMD setup with AMASS and Ego-Exo4D datasets.

SYJan 4, 2022
Learning Differentiable Safety-Critical Control using Control Barrier Functions for Generalization to Novel Environments

Hengbo Ma, Bike Zhang, Masayoshi Tomizuka et al.

Control barrier functions (CBFs) have become a popular tool to enforce safety of a control system. CBFs are commonly utilized in a quadratic program formulation (CBF-QP) as safety-critical constraints. A class $\mathcal{K}$ function in CBFs usually needs to be tuned manually in order to balance the trade-off between performance and safety for each environment. However, this process is often heuristic and can become intractable for high relative-degree systems. Moreover, it prevents the CBF-QP from generalizing to different environments in the real world. By embedding the optimization procedure of the exponential control barrier function based quadratic program (ECBF-QP) as a differentiable layer within a deep learning architecture, we propose a differentiable safety-critical control framework that enables generalization to new environments for high relative-degree systems with forward invariance guarantees. Finally, we validate the proposed control design with 2D double and quadruple integrator systems in various environments.

ROSep 13, 2021
Cross Domain Robot Imitation with Invariant Representation

Zhao-Heng Yin, Lingfeng Sun, Hengbo Ma et al.

Animals are able to imitate each others' behavior, despite their difference in biomechanics. In contrast, imitating the other similar robots is a much more challenging task in robotics. This problem is called cross domain imitation learning~(CDIL). In this paper, we consider CDIL on a class of similar robots. We tackle this problem by introducing an imitation learning algorithm based on invariant representation. We propose to learn invariant state and action representations, which aligns the behavior of multiple robots so that CDIL becomes possible. Compared with previous invariant representation learning methods for similar purpose, our method does not require human-labeled pairwise data for training. Instead, we use cycle-consistency and domain confusion to align the representation and increase its robustness. We test the algorithm on multiple robots in simulator and show that unseen new robot instances can be trained with existing expert demonstrations successfully. Qualitative results also demonstrate that the proposed method is able to learn similar representations for different robots with similar behaviors, which is essential for successful CDIL.

CVAug 3, 2021
RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting

Jiachen Li, Fan Yang, Hengbo Ma et al.

Motion forecasting plays a significant role in various domains (e.g., autonomous driving, human-robot interaction), which aims to predict future motion sequences given a set of historical observations. However, the observed elements may be of different levels of importance. Some information may be irrelevant or even distracting to the forecasting in certain situations. To address this issue, we propose a generic motion forecasting framework (named RAIN) with dynamic key information selection and ranking based on a hybrid attention mechanism. The general framework is instantiated to handle multi-agent trajectory prediction and human motion forecasting tasks, respectively. In the former task, the model learns to recognize the relations between agents with a graph representation and to determine their relative significance. In the latter task, the model learns to capture the temporal proximity and dependency in long-term human motions. We also propose an effective double-stage training pipeline with an alternating training strategy to optimize the parameters in different modules of the framework. We validate the framework on both synthetic simulations and motion forecasting benchmarks in different domains, demonstrating that our method not only achieves state-of-the-art forecasting performance, but also provides interpretable and reasonable hybrid attention weights.

CVJun 5, 2021
Spectral Temporal Graph Neural Network for Trajectory Prediction

Defu Cao, Jiachen Li, Hengbo Ma et al.

An effective understanding of the contextual environment and accurate motion forecasting of surrounding agents is crucial for the development of autonomous vehicles and social mobile robots. This task is challenging since the behavior of an autonomous agent is not only affected by its own intention, but also by the static environment and surrounding dynamically interacting agents. Previous works focused on utilizing the spatial and temporal information in time domain while not sufficiently taking advantage of the cues in frequency domain. To this end, we propose a Spectral Temporal Graph Neural Network (SpecTGNN), which can capture inter-agent correlations and temporal dependency simultaneously in frequency domain in addition to time domain. SpecTGNN operates on both an agent graph with dynamic state information and an environment graph with the features extracted from context images in two streams. The model integrates graph Fourier transform, spectral graph convolution and temporal gated convolution to encode history information and forecast future trajectories. Moreover, we incorporate a multi-head spatio-temporal attention mechanism to mitigate the effect of error propagation in a long time horizon. We demonstrate the performance of SpecTGNN on two public trajectory prediction benchmark datasets, which achieves state-of-the-art performance in terms of prediction accuracy.

ROApr 23, 2021
Speed Planning Using Bezier Polynomials with Trapezoidal Corridors

Jialun Li, Xiaojia Xie, Hengbo Ma et al.

To generate safe and real-time trajectories for an autonomous vehicle in dynamic environments, path and speed decoupled planning methods are often considered. This paper studies speed planning, which mainly deals with dynamic obstacle avoidance given the planning path. The main challenges lie in the decisions in non-convex space and the trade-off between safety, comfort and efficiency performances. This work uses dynamic programming to search heuristic waypoints on the S-T graph and to construct convex feasible spaces. Further, a piecewise Bezier polynomials optimization approach with trapezoidal corridors is presented, which theoretically guarantees the safety and optimality of the trajectory. The simulations verify the effectiveness of the proposed approach.

CVFeb 18, 2021
Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking

Jiachen Li, Hengbo Ma, Zhihao Zhang et al.

An effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are indispensable for intelligent mobile systems (e.g. autonomous vehicles and social robots) to achieve safe and high-quality planning when they navigate in highly interactive and crowded scenarios. Due to the existence of frequent interactions and uncertainty in the scene evolution, it is desired for the prediction system to enable relational reasoning on different entities and provide a distribution of future trajectories for each agent. In this paper, we propose a generic generative neural system (called STG-DAT) for multi-agent trajectory prediction involving heterogeneous agents. The system takes a step forward to explicit interaction modeling by incorporating relational inductive biases with a dynamic graph representation and leverages both trajectory and scene context information. We also employ an efficient kinematic constraint layer applied to vehicle trajectory prediction. The constraint not only ensures physical feasibility but also enhances model performance. Moreover, the proposed prediction model can be easily adopted by multi-target tracking frameworks. The tracking accuracy proves to be improved by empirical results. The proposed system is evaluated on three public benchmark datasets for trajectory prediction, where the agents cover pedestrians, cyclists and on-road vehicles. The experimental results demonstrate that our model achieves better performance than various baseline approaches in terms of prediction and tracking accuracy.

ROAug 20, 2020
Expressing Diverse Human Driving Behavior with Probabilistic Rewards and Online Inference

Liting Sun, Zheng Wu, Hengbo Ma et al.

In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important. Human behavior is naturally rich and diverse. Cost/reward learning, as an efficient way to learn and represent human behavior, has been successfully applied in many domains. Most of traditional inverse reinforcement learning (IRL) algorithms, however, cannot adequately capture the diversity of human behavior since they assume that all behavior in a given dataset is generated by a single cost function.In this paper, we propose a probabilistic IRL framework that directly learns a distribution of cost functions in continuous domain. Evaluations on both synthetic data and real human driving data are conducted. Both the quantitative and subjective results show that our proposed framework can better express diverse human driving behaviors, as well as extracting different driving styles that match what human participants interpret in our user study.

CVFeb 14, 2020
Social-WaGDAT: Interaction-aware Trajectory Prediction via Wasserstein Graph Double-Attention Network

Jiachen Li, Hengbo Ma, Zhihao Zhang et al.

Effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are indispensable for intelligent mobile systems (like autonomous vehicles and social robots) to achieve safe and high-quality planning when they navigate in highly interactive and crowded scenarios. Due to the existence of frequent interactions and uncertainty in the scene evolution, it is desired for the prediction system to enable relational reasoning on different entities and provide a distribution of future trajectories for each agent. In this paper, we propose a generic generative neural system (called Social-WaGDAT) for multi-agent trajectory prediction, which makes a step forward to explicit interaction modeling by incorporating relational inductive biases with a dynamic graph representation and leverages both trajectory and scene context information. We also employ an efficient kinematic constraint layer applied to vehicle trajectory prediction which not only ensures physical feasibility but also enhances model performance. The proposed system is evaluated on three public benchmark datasets for trajectory prediction, where the agents cover pedestrians, cyclists and on-road vehicles. The experimental results demonstrate that our model achieves better performance than various baseline approaches in terms of prediction accuracy.

CVMay 5, 2019
Conditional Generative Neural System for Probabilistic Trajectory Prediction

Jiachen Li, Hengbo Ma, Masayoshi Tomizuka

Effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are critical for intelligent systems such as autonomous vehicles and wheeled mobile robotics navigating in complex scenarios to achieve safe and high-quality decision making, motion planning and control. Due to the uncertain nature of the future, it is desired to make inference from a probability perspective instead of deterministic prediction. In this paper, we propose a conditional generative neural system (CGNS) for probabilistic trajectory prediction to approximate the data distribution, with which realistic, feasible and diverse future trajectory hypotheses can be sampled. The system combines the strengths of conditional latent space learning and variational divergence minimization, and leverages both static context and interaction information with soft attention mechanisms. We also propose a regularization method for incorporating soft constraints into deep neural networks with differentiable barrier functions, which can regulate and push the generated samples into the feasible regions. The proposed system is evaluated on several public benchmark datasets for pedestrian trajectory prediction and a roundabout naturalistic driving dataset collected by ourselves. The experimental results demonstrate that our model achieves better performance than various baseline approaches in terms of prediction accuracy.

ROMay 2, 2019
Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling

Jiachen Li, Hengbo Ma, Wei Zhan et al.

Coordination recognition and subtle pattern prediction of future trajectories play a significant role when modeling interactive behaviors of multiple agents. Due to the essential property of uncertainty in the future evolution, deterministic predictors are not sufficiently safe and robust. In order to tackle the task of probabilistic prediction for multiple, interactive entities, we propose a coordination and trajectory prediction system (CTPS), which has a hierarchical structure including a macro-level coordination recognition module and a micro-level subtle pattern prediction module which solves a probabilistic generation task. We illustrate two types of representation of the coordination variable: categorized and real-valued, and compare their effects and advantages based on empirical studies. We also bring the ideas of Bayesian deep learning into deep generative models to generate diversified prediction hypotheses. The proposed system is tested on multiple driving datasets in various traffic scenarios, which achieves better performance than baseline approaches in terms of a set of evaluation metrics. The results also show that using categorized coordination can better capture multi-modality and generate more diversified samples than the real-valued coordination, while the latter can generate prediction hypotheses with smaller errors with a sacrifice of sample diversity. Moreover, employing neural networks with weight uncertainty is able to generate samples with larger variance and diversity.

ROApr 4, 2019
Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction via Adversarial Learning

Jiachen Li, Hengbo Ma, Masayoshi Tomizuka

In order to enable high-quality decision making and motion planning of intelligent systems such as robotics and autonomous vehicles, accurate probabilistic predictions for surrounding interactive objects is a crucial prerequisite. Although many research studies have been devoted to making predictions on a single entity, it remains an open challenge to forecast future behaviors for multiple interactive agents simultaneously. In this work, we take advantage of the Generative Adversarial Network (GAN) due to its capability of distribution learning and propose a generic multi-agent probabilistic prediction and tracking framework which takes the interactions among multiple entities into account, in which all the entities are treated as a whole. However, since GAN is very hard to train, we make an empirical research and present the relationship between training performance and hyperparameter values with a numerical case study. The results imply that the proposed model can capture both the mean, variance and multi-modalities of the groundtruth distribution. Moreover, we apply the proposed approach to a real-world task of vehicle behavior prediction to demonstrate its effectiveness and accuracy. The results illustrate that the proposed model trained by adversarial learning can achieve a better prediction performance than other state-of-the-art models trained by traditional supervised learning which maximizes the data likelihood. The well-trained model can also be utilized as an implicit proposal distribution for particle filtered based Bayesian state estimation.

AISep 9, 2018
Generic Probabilistic Interactive Situation Recognition and Prediction: From Virtual to Real

Jiachen Li, Hengbo Ma, Wei Zhan et al.

Accurate and robust recognition and prediction of traffic situation plays an important role in autonomous driving, which is a prerequisite for risk assessment and effective decision making. Although there exist a lot of works dealing with modeling driver behavior of a single object, it remains a challenge to make predictions for multiple highly interactive agents that react to each other simultaneously. In this work, we propose a generic probabilistic hierarchical recognition and prediction framework which employs a two-layer Hidden Markov Model (TLHMM) to obtain the distribution of potential situations and a learning-based dynamic scene evolution model to sample a group of future trajectories. Instead of predicting motions of a single entity, we propose to get the joint distribution by modeling multiple interactive agents as a whole system. Moreover, due to the decoupling property of the layered structure, our model is suitable for knowledge transfer from simulation to real world applications as well as among different traffic scenarios, which can reduce the computational efforts of training and the demand for a large data amount. A case study of highway ramp merging scenario is demonstrated to verify the effectiveness and accuracy of the proposed framework.