Tang Kun

CVAug 8, 2023Code

LATR: 3D Lane Detection from Monocular Images with Transformer

Yueru Luo, Chaoda Zheng, Xu Yan et al.

3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's eye view) built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address the above issue, we present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation. Specifically, LATR detects 3D lanes via cross-attention based on query and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance lane information. On the other hand, 3D space information is injected as positional embedding from an iteratively-updated 3D ground plane. LATR outperforms previous state-of-the-art methods on both synthetic Apollo, realistic OpenLane and ONCE-3DLanes by large margins (e.g., 11.4 gain in terms of F1 score on OpenLane). Code will be released at https://github.com/JMoonr/LATR .

CVSep 13, 2022

M$^2$-3DLaneNet: Exploring Multi-Modal 3D Lane Detection

Yueru Luo, Xu Yan, Chaoda Zheng et al.

Estimating accurate lane lines in 3D space remains challenging due to their sparse and slim nature. Previous works mainly focused on using images for 3D lane detection, leading to inherent projection error and loss of geometry information. To address these issues, we explore the potential of leveraging LiDAR for 3D lane detection, either as a standalone method or in combination with existing monocular approaches. In this paper, we propose M$^2$-3DLaneNet to integrate complementary information from multiple sensors. Specifically, M$^2$-3DLaneNet lifts 2D features into 3D space by incorporating geometry information from LiDAR data through depth completion. Subsequently, the lifted 2D features are further enhanced with LiDAR features through cross-modality BEV fusion. Extensive experiments on the large-scale OpenLane dataset demonstrate the effectiveness of M$^2$-3DLaneNet, regardless of the range (75m or 100m).

Tang Kun

2 Papers