Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving
This work addresses a critical evaluation gap for autonomous driving systems by offering a more practical metric for selecting predictors that enhance safety and decision-making, though its contribution is an incremental refinement of existing evaluation practices.
The paper tackles the problem that current error-based metrics for trajectory predictors in autonomous driving fail to capture the predictors' actual impact on self-driving vehicle performance, especially in complex interactive scenarios. It proposes a scenario-driven evaluation pipeline that combines accuracy and diversity, and shows that this pipeline yields more reasonable evaluations than traditional metrics.
Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, current evaluation practices still rely on error-based metrics such as average displacement error (ADE) and final displacement error (FDE). These metrics measure accuracy from a post-hoc view but ignore the actual effect the predictor has on self-driving vehicles (SDVs), especially in complex interactive scenarios. A high-quality predictor should not only chase accuracy but also capture all the possible directions in which a neighboring agent might move, to support the SDV's cautious decision-making. Because existing metrics hardly account for this criterion, we propose a comprehensive pipeline that adaptively evaluates a predictor's performance along two dimensions: accuracy and diversity. Depending on the criticality of the driving scenario, these two dimensions are dynamically combined into a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics, as its scores correlate better with the SDVs' driving performance. This evaluation pipeline thus offers a robust way to select the predictor that is likely to contribute most to the SDV's driving performance.
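To make the two evaluation dimensions concrete, the following is a minimal sketch in Python, assuming 2D trajectories sampled at aligned timesteps. ADE and FDE follow their standard definitions, and minADE over K predicted modes is a common multimodal variant. The diversity measure (mean pairwise distance between mode endpoints), the exponential mappings into [0, 1], and the linear criticality-weighted blend in the hypothetical `scenario_score` function are illustrative assumptions only; the abstract does not specify how the pipeline actually measures diversity or combines the two dimensions.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def _dist(a: Point, b: Point) -> float:
    # Euclidean distance between two 2D points.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def ade(pred: Trajectory, gt: Trajectory) -> float:
    # Average Displacement Error: mean pointwise distance over all timesteps.
    if not gt or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and time-aligned")
    return sum(_dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred: Trajectory, gt: Trajectory) -> float:
    # Final Displacement Error: distance at the last timestep only.
    return _dist(pred[-1], gt[-1])


def min_ade(modes: List[Trajectory], gt: Trajectory) -> float:
    # Multimodal accuracy: the ADE of the best of the K predicted modes.
    return min(ade(m, gt) for m in modes)


def endpoint_diversity(modes: List[Trajectory]) -> float:
    # HYPOTHETICAL diversity measure: mean pairwise distance between the
    # endpoints of the predicted modes (0.0 for a single-mode predictor).
    pairs = list(combinations(modes, 2))
    if not pairs:
        return 0.0
    return sum(_dist(a[-1], b[-1]) for a, b in pairs) / len(pairs)


def scenario_score(modes: List[Trajectory], gt: Trajectory, criticality: float) -> float:
    # HYPOTHETICAL combination: map both dimensions into [0, 1] and blend them
    # linearly, giving diversity more weight as scenario criticality rises.
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    accuracy = math.exp(-min_ade(modes, gt))  # 1.0 when some mode is exact
    diversity = 1.0 - math.exp(-endpoint_diversity(modes))  # 0.0 when no spread
    return (1.0 - criticality) * accuracy + criticality * diversity
```

For example, if the ground truth continues straight, a predictor that offers both a straight mode and a left-turning mode keeps full accuracy (a minADE of 0) while also earning a diversity term, so at criticality 1.0 it outscores a single-mode predictor, whose diversity, and hence score, is 0. At criticality 0.0 both predictors score 1.0, mirroring how a purely error-based metric cannot tell them apart.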