CV AI | Aug 17, 2025

An Initial Study of Bird's-Eye View Generation for Autonomous Vehicles using Cross-View Transformers

Felipe Carlos dos Santos, Eric Aislan Antonelo, Gustavo Claudio Karl Couto

arXiv:2508.12520v1 | 3.6 | h-index: 15 | Anais do XXII Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2025)

Originality: Synthesis-oriented

AI Analysis

This work addresses perception challenges for autonomous vehicles, but it is an incremental application of existing methods to a specific domain.

The study generates Bird's-Eye View maps for autonomous vehicles by using Cross-View Transformers to map camera images to three channels: road, lane markings, and planned trajectory. Trained on a single town, a four-camera setup with L1 loss gave the most robust performance when tested in an unseen town.

Bird's-Eye View (BEV) maps provide a structured, top-down abstraction that is crucial for autonomous-driving perception. In this work, we employ Cross-View Transformers (CVT) to learn a mapping from camera images to three BEV channels (road, lane markings, and planned trajectory) using a realistic simulator for urban driving. Our study examines generalization to unseen towns, the effect of different camera layouts, and two loss formulations (focal and L1). Using training data from only a single town, a four-camera CVT trained with the L1 loss delivers the most robust test performance when evaluated in a new town. Overall, our results underscore CVT's promise for mapping camera inputs to reasonably accurate BEV maps.
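To make the two loss formulations concrete, the following is a minimal sketch in plain Python of an L1 loss and a binary focal loss over the pixels of a single BEV channel. It is an illustration under stated assumptions, not the paper's implementation: the function names, the flattened-list representation, and the `gamma` and `alpha` defaults (taken from the standard focal loss formulation) are all hypothetical here, since the abstract does not give these details.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between predicted and target pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels.

    gamma and alpha are assumed defaults, not values from the paper.
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        p_t = p if t == 1 else 1.0 - p  # probability given to the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma down-weights pixels that are already well classified
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(pred)


# Hypothetical flattened channel (e.g. lane markings): 1 = occupied, 0 = empty
target = [1, 0, 0, 1]
good = [0.9, 0.1, 0.2, 0.8]  # mostly correct prediction
bad = [0.2, 0.8, 0.9, 0.1]   # mostly wrong prediction

print("L1:", l1_loss(good, target), l1_loss(bad, target))
print("focal:", focal_loss(good, target), focal_loss(bad, target))
```

Both losses are lower for the better prediction. The practical difference is that L1 weights every pixel equally, while the focal loss concentrates on pixels the model still gets wrong.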

