CV | AI | Aug 17, 2025

An Initial Study of Bird's-Eye View Generation for Autonomous Vehicles using Cross-View Transformers

arXiv:2508.12520v1 | h-index: 15 | Anais do XXII Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2025)
Originality: Synthesis-oriented
AI Analysis

This work addresses perception challenges for autonomous vehicles, but it is an incremental application of existing methods to a specific domain.

The study generates Bird's-Eye View maps for autonomous vehicles by using Cross-View Transformers to map camera images to three channels: road, lane markings, and planned trajectory. Trained on a single town, a four-camera setup with L1 loss gave the most robust performance when evaluated on an unseen town.

Bird's-Eye View (BEV) maps provide a structured, top-down abstraction that is crucial for autonomous-driving perception. In this work, we employ Cross-View Transformers (CVT) to learn a mapping from camera images to three BEV channels (road, lane markings, and planned trajectory) using a realistic simulator for urban driving. Our study examines generalization to unseen towns, the effect of different camera layouts, and two loss formulations (focal and L1). Using training data from only one town, a four-camera CVT trained with the L1 loss delivers the most robust test performance when evaluated in a new town. Overall, our results underscore CVT's promise for mapping camera inputs to reasonably accurate BEV maps.
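
To make the two loss formulations in the abstract concrete, here is a minimal sketch of an L1 loss and a binary focal loss computed over a three-channel BEV target. It is not the authors' implementation: the function names, the flattened toy grids, and the focal-loss hyperparameters (`gamma=2.0`, `alpha=0.25`, common defaults in the focal-loss literature) are assumptions made for illustration only.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error over every cell of every BEV channel."""
    total, count = 0.0, 0
    for pred_channel, target_channel in zip(pred, target):
        for p, t in zip(pred_channel, target_channel):
            total += abs(p - t)
            count += 1
    return total / count


def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; gamma and alpha are assumed values."""
    total, count = 0.0, 0
    for pred_channel, target_channel in zip(pred, target):
        for p, t in zip(pred_channel, target_channel):
            p = min(max(p, eps), 1.0 - eps)       # clamp to avoid log(0)
            p_t = p if t == 1 else 1.0 - p        # probability of the true class
            a_t = alpha if t == 1 else 1.0 - alpha
            # (1 - p_t)^gamma down-weights cells that are already easy
            total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
            count += 1
    return total / count


# Hypothetical toy data: 3 channels (road, lane markings, planned
# trajectory), each BEV grid flattened to 4 cells.
target = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]
pred = [[0.9, 0.8, 0.2, 0.1], [0.1, 0.6, 0.1, 0.0], [0.2, 0.7, 0.5, 0.1]]

print("L1 loss:   ", l1_loss(pred, target))
print("Focal loss:", focal_loss(pred, target))
```

The L1 loss penalizes every cell in proportion to its error, while the focal loss concentrates on cells the model gets confidently wrong.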

