StretchBEV: Stretching Future Instance Prediction Spatially and Temporally
This work addresses the inherent uncertainty in predicting agent locations and motions over longer time horizons for self-driving systems, an incremental improvement in stochastic modeling.
The paper tackles the degrading quality of long-term future instance prediction for self-driving. It introduces a stochastic temporal model that learns dynamics in a latent space through stochastic residual updates. The resulting predictions are more diverse and more accurate than those of previous work, with improved spatial and temporal coverage.
In self-driving, predicting the future location and motion of all agents around the vehicle is a crucial requirement for planning. Recently, a new joint formulation of perception and prediction has emerged that fuses rich sensory information from multiple cameras into a compact bird's-eye view representation to perform prediction. However, the quality of future predictions degrades over longer time horizons because multiple futures are plausible. In this work, we address this inherent uncertainty in future predictions with a stochastic temporal model. Our model learns temporal dynamics in a latent space through stochastic residual updates at each time step. By sampling from a learned distribution at each time step, we obtain more diverse future predictions that are also more accurate than those of previous work, especially in spatially distant regions of the scene and over longer time horizons. Although each time step is processed separately, our model remains efficient because it decouples the learning of dynamics from the generation of future predictions.
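The idea of stochastic residual updates in a latent space can be illustrated with a minimal sketch. It is not the paper's implementation: the functions prior_params, residual, decode, and predict_futures are hypothetical names, and the hand-written formulas inside them are toy stand-ins for what would be learned networks. The sketch assumes an update of the form y_{t+1} = y_t + f(y_t, z_{t+1}), with z_{t+1} drawn from a Gaussian whose parameters depend on the current state, and it decodes predictions only after the latent rollout has finished.

```python
import math
import random


def prior_params(state):
    """Toy stand-in for a learned prior: per-dimension (mean, std) from the state."""
    means = [0.1 * math.tanh(s) for s in state]
    stds = [0.05 + 0.05 * abs(math.tanh(s)) for s in state]
    return means, stds


def residual(state, noise):
    """Toy stand-in for a learned residual function f(y_t, z_{t+1})."""
    return [0.5 * math.tanh(s) + z for s, z in zip(state, noise)]


def rollout(initial_state, num_steps, rng):
    """Advance the latent state with y_{t+1} = y_t + f(y_t, z_{t+1})."""
    states = [list(initial_state)]
    for _ in range(num_steps):
        current = states[-1]
        means, stds = prior_params(current)
        # A fresh noise vector is sampled at every time step.
        noise = [rng.gauss(m, s) for m, s in zip(means, stds)]
        update = residual(current, noise)
        states.append([c + u for c, u in zip(current, update)])
    return states


def decode(state):
    """Toy decoder: maps a latent state to values in (0, 1), e.g. occupancy."""
    return [1.0 / (1.0 + math.exp(-s)) for s in state]


def predict_futures(initial_state, num_steps, num_samples, seed=0):
    """Sample several futures; each is a list of decoded frames, one per step."""
    rng = random.Random(seed)
    futures = []
    for _ in range(num_samples):
        latent_states = rollout(initial_state, num_steps, rng)
        # Decoding runs only after the rollout, so the dynamics never
        # depend on generated outputs (the decoupling assumed here).
        futures.append([decode(s) for s in latent_states[1:]])
    return futures


if __name__ == "__main__":
    for i, future in enumerate(predict_futures([0.0, 0.5, -0.5], 4, 3)):
        print(i, [round(p, 3) for p in future[-1]])
```

Each call to rollout draws new noise at every step, so repeated samples from the same initial state diverge over time, which is the source of diversity in this toy setting. Because decode is applied only to the finished latent trajectory, the dynamics can be advanced without generating any intermediate outputs.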