FoV-Net: Field-of-View Extrapolation Using Self-Attention and Uncertainty
This work addresses scene prediction for autonomous vehicles and robots, enabling early planning under uncertainty; the contribution is incremental in that it builds on existing extrapolation methods.
The paper tackles the problem of predicting a wider field-of-view scene from narrow field-of-view video sequences. It proposes FoV-Net, which extrapolates temporally consistent views with interpretable pixel-level uncertainty, and reports that it outperforms existing alternatives in experiments.
The ability to make educated predictions about their surroundings, and to associate them with a certain confidence, is important for intelligent systems such as autonomous vehicles and robots. It allows them to plan early and decide accordingly. Motivated by this observation, in this paper we utilize information from a video sequence with a narrow field-of-view to infer the scene at a wider field-of-view. To this end, we propose a temporally consistent field-of-view extrapolation framework, namely FoV-Net, that: (1) leverages 3D information to propagate the observed scene parts from past frames; (2) aggregates the propagated multi-frame information using an attention-based feature aggregation module and a gated self-attention module, simultaneously hallucinating any unobserved scene parts; and (3) assigns an interpretable uncertainty value at each pixel. Extensive experiments show that FoV-Net not only extrapolates the temporally consistent wide field-of-view scene better than existing alternatives, but also provides the associated uncertainty, which may benefit downstream applications that involve critical decision-making. The project page is at http://charliememory.github.io/RAL21_FoV.
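To make step (2) more concrete, the following is a minimal NumPy sketch of per-pixel attention over features propagated from several past frames. It is not the FoV-Net implementation: the function name `aggregate_frames`, the tensor shapes, the externally supplied `scores`, and the validity mask are all assumptions made for illustration. The sketch covers only the aggregation of observed pixels; the gated self-attention module, the hallucination of unobserved regions, and the uncertainty estimate of step (3) are not modeled.

```python
import numpy as np

def aggregate_frames(features, scores, valid):
    """Fuse propagated multi-frame features with a per-pixel softmax (illustrative only).

    features: (T, H, W, C) features warped from T past frames (assumed given).
    scores:   (T, H, W) unnormalized attention scores (hypothetical input).
    valid:    (T, H, W) bool, True where a frame actually observed the pixel.
    Returns the fused (H, W, C) features and an (H, W) bool mask of pixels
    observed in at least one frame.
    """
    # Frames that did not observe a pixel must get zero attention weight.
    masked = np.where(valid, scores, -np.inf)
    observed = valid.any(axis=0)

    # Numerically stable softmax over the frame axis; pixels with no valid
    # frame use a dummy maximum of 0 so that no NaN is produced.
    max_score = np.where(observed, masked.max(axis=0), 0.0)
    weights = np.exp(masked - max_score)
    denom = weights.sum(axis=0)
    weights = weights / np.where(denom > 0, denom, 1.0)

    # Weighted sum over frames; unobserved pixels stay at zero and would
    # have to be filled in (hallucinated) by a later stage.
    fused = (weights[..., None] * features).sum(axis=0)
    return fused, observed
```

Pixels for which `observed` is False receive no information from any past frame; in the framework described above, these are the regions that would need to be hallucinated and would presumably carry higher uncertainty.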