CV, Jul 20, 2023

OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios

arXiv:2307.10934v1, 3 citations, h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of low-cost, accurate ranging for autonomous vehicles in unstructured traffic scenarios and represents an incremental improvement over existing monocular depth methods.

The paper tackles the problem of inaccurate depth estimation from monocular cameras in autonomous navigation by proposing OCTraN, a transformer-based architecture that converts 2D image features into 3D occupancy features, achieving improved accuracy without relying on expensive LiDAR sensors.

Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases. Though Light Detection and Ranging (LiDAR) can solve this issue, it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose OCTraN, a transformer architecture that uses iterative attention to convert 2D image features into 3D occupancy features and makes use of convolution and transpose convolution to efficiently operate on spatial information. We also develop a self-supervised training pipeline that generalizes the model to any scene: it eliminates the need for LiDAR ground truth by substituting pseudo-ground-truth labels obtained from boosted monocular depth estimation.
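The quadratic growth of depth error mentioned in the abstract can be illustrated with a short sketch. It assumes the standard pinhole relation between depth and disparity, Z = f * B / d, which the abstract does not spell out. Differentiating gives a first-order depth error of roughly Z^2 / (f * B) times the disparity error. The function names, focal length, baseline, and disparity error below are illustrative assumptions, not values from the paper.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels, assuming Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth error caused by a small disparity error.

    From Z = f * B / d it follows that |dZ/dd| = Z**2 / (f * B),
    so the error grows with the square of the depth.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    # Hypothetical camera parameters, chosen only for illustration.
    FOCAL_PX = 720.0
    BASELINE_M = 0.54
    DISPARITY_ERROR_PX = 0.5

    for depth in (5.0, 10.0, 20.0, 40.0):
        err = depth_error(depth, FOCAL_PX, BASELINE_M, DISPARITY_ERROR_PX)
        print(f"depth {depth:5.1f} m -> approx. error {err:.3f} m")
```

Under this model, doubling the distance quadruples the depth error for the same disparity error, which is the effect that motivates predicting occupancy directly in 3D rather than projecting a disparity map.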
