CV · Feb 9, 2024

CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention

arXiv:2402.06423v2 · 21 citations · h-index: 15 · IEEE Transactions on Intelligent Transportation Systems (Print)
Originality: Incremental advance
AI Analysis

This paper addresses the problem of accurate 3D lane perception for autonomous driving systems, presenting an incremental improvement over existing methods.

The paper tackles 3D lane detection from monocular cameras by proposing CurveFormer++, a single-stage Transformer method that avoids the challenging bird's-eye-view transformation used in prior approaches. The results show outstanding performance on two real-world datasets compared to CNN and Transformer baselines.

In autonomous driving, accurate 3D lane detection using monocular cameras is important for downstream tasks. Recent CNN and Transformer approaches usually apply a two-stage model design. The first stage transforms the image features from a front-view image into a bird's-eye-view (BEV) representation. Subsequently, a sub-network processes the BEV features to generate the 3D detection results. However, these approaches rely heavily on a challenging image feature transformation module from a perspective view to a BEV representation. In our work, we present CurveFormer++, a single-stage Transformer-based method that does not require the view transform module and directly infers 3D lane results from the perspective image features. Specifically, our approach models the 3D lane detection task as a curve propagation problem, where each lane is represented by a curve query with a dynamic and ordered anchor point set. By employing a Transformer decoder, the model can iteratively refine the 3D lane results. A curve cross-attention module is introduced to calculate similarities between image features and curve queries. To handle varying lane lengths, we employ context sampling and anchor point restriction techniques to compute more relevant image features. Furthermore, we apply a temporal fusion module that incorporates selected informative sparse curve queries and their corresponding anchor point sets to leverage historical information. In the experiments, we evaluate our approach on two publicly available real-world datasets. The results demonstrate that our method provides outstanding performance compared with both CNN-based and Transformer-based methods. We also conduct ablation studies to analyze the impact of each component.
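As a rough illustration of the idea that a lane can be carried as an ordered set of 3D anchor points and refined step by step without a BEV transform, the sketch below projects anchor points into the perspective image with a standard pinhole model and applies per-point offsets. It is not the authors' implementation: the function names, the camera-frame convention (z forward), the intrinsics values, and the hand-written offsets standing in for decoder outputs are all assumptions made for this example.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in an assumed camera frame, z forward
Pixel = Tuple[float, float]           # (u, v) image coordinates


def project_anchor_points(anchors: List[Point3D], fx: float, fy: float,
                          cx: float, cy: float) -> List[Pixel]:
    """Project ordered 3D anchor points to pixels with a pinhole camera model.

    The resulting pixel locations are where a model could sample
    perspective-view image features for each anchor point.
    """
    pixels = []
    for x, y, z in anchors:
        if z <= 0:
            continue  # skip points behind the camera
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


def refine_anchor_points(anchors: List[Point3D],
                         offsets: List[Point3D]) -> List[Point3D]:
    """Apply one refinement step: add a per-point 3D offset to each anchor.

    In a real decoder the offsets would be predicted from attended image
    features; here they are supplied by hand (hypothetical values).
    """
    return [(x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(anchors, offsets)]


# Two ordered anchor points along a hypothetical lane.
anchors = [(0.0, 1.5, 10.0), (0.5, 1.5, 20.0)]
pixels = project_anchor_points(anchors, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)

# One refinement step with made-up offsets standing in for a decoder layer.
refined = refine_anchor_points(anchors, [(0.25, 0.0, 0.0), (0.0, 0.0, 0.0)])
```

Keeping the anchor points ordered means each refinement step updates the same point along the curve, which is what allows the lane estimate to be improved iteratively across decoder layers.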

