Mingxu Wang

CV
3papers
117citations
Novelty60%
AI Score44

3 Papers

CVSep 20, 2022
GANet: Goal Area Network for Motion Forecasting

Mingkun Wang, Xinge Zhu, Changqian Yu et al.

Predicting the future motion of road participants is crucial for autonomous driving but is extremely challenging due to staggering motion uncertainty. Recently, most motion forecasting methods resort to the goal-based strategy, i.e., predicting endpoints of motion trajectories as conditions to regress the entire trajectories, so that the search space of solution can be reduced. However, accurate goal coordinates are hard to predict and evaluate. In addition, the point representation of the destination limits the utilization of a rich road context, leading to inaccurate prediction results in many cases. Goal area, i.e., the possible destination area, rather than goal coordinate, could provide a more soft constraint for searching potential trajectories by involving more tolerance and guidance. In view of this, we propose a new goal area-based framework, named Goal Area Network (GANet), for motion forecasting, which models goal areas rather than exact goal coordinates as preconditions for trajectory prediction, performing more robustly and accurately. Specifically, we propose a GoICrop (Goal Area of Interest) operator to effectively extract semantic lane features in goal areas and model actors' future interactions, which benefits a lot for future trajectory estimations. GANet ranks the 1st on the leaderboard of Argoverse Challenge among all public literature (till the paper submission), and its source codes will be released.

82.9AIMay 9
Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents

Ying Chen, Rui Jiang, Lihuang Fang et al.

Standard embodied evaluations do not independently score whether an agent correctly commits to task completion at episode closure, a capacity we call terminal commitment. Behaviorally distinct failures--never completing the task, completing it but failing to stop, and reporting success without sufficient evidence--collapse into the same benchmark failure. We introduce VIGIL, an evaluation framework that makes terminal commitment independently measurable. Under VIGIL's default protocol, agents observe only egocentric RGB, receive no action-success signals, and must end each episode with a semantic report checked deterministically against hidden world state. This yields two separate scores: world-state completion (W) and benchmark success (B), where B additionally requires a correct terminal report. This decoupling makes four outcome categories distinguishable: missed execution, post-attainment drift, unsupported commitment, and verified success. Across 20 models on 1,000 frozen episodes, systems with comparable W differ by up to 19.7 pp in B: one model converts achieved states into correct reports, while another with near-identical execution drifts past the goal without closing. An action-feedback intervention further tests the separation: execution-oriented signals improve W broadly, yet commitment failures persist in models that do not already ground terminal reports in the achieved state. VIGIL provides a protocol that makes terminal commitment independently visible and scorable.

CVJun 20, 2024
FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

Mingkun Wang, Xiaoguang Ren, Ruochun Jin et al.

Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to enhance subsequent forecasting. Additionally, most previous motion forecasting works have focused on predicting independent futures for each agent. However, safe and smooth autonomous driving requires accurately predicting the diverse future behaviors of numerous surrounding agents jointly in complex dynamic environments. Given that all agents occupy certain potential travel spaces and possess lane driving priority, we propose Lane Occupancy Field (LOF), a new representation with lane semantics for motion forecasting in autonomous driving. LOF can simultaneously capture the joint probability distribution of all road participants' future spatial-temporal positions. Due to the high compatibility between lane occupancy field prediction and trajectory prediction, we propose a novel network with future context encoding for the joint prediction of these two tasks. Our approach ranks 1st on two large-scale motion forecasting benchmarks: Argoverse 1 and Argoverse 2.