SoK: Vehicle Orientation Representations for Deep Rotation Estimation
This work addresses orientation estimation in autonomous driving systems. Its contribution is incremental: it primarily reviews and compares existing methods and adds only a minor novel contribution.
The paper addresses the lack of a systematic review of vehicle orientation prediction for 3D object detection by categorizing existing orientation representations and comparing them on the KITTI dataset. It finds that the 2D Cartesian-based representation achieves the highest accuracy, and it proposes a new representation called Tricosine.
In recent years, there has been an influx of deep learning models for 3D vehicle object detection. However, little attention has been paid to orientation prediction. Existing research has proposed various vehicle orientation representations for deep learning; however, no holistic, systematic review has been conducted. Through our experiments, we categorize existing orientation representations and compare their accuracy on the KITTI 3D object detection dataset, and we propose a new form of orientation representation: Tricosine. Among these representations, the 2D Cartesian-based representation, or Single Bin, achieves the highest accuracy, while additional input channels (positional encoding and depth map) do not improve prediction performance. Our code is published on GitHub: https://github.com/umd-fire-coml/KITTI-orientation-learning
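To make the idea of an "orientation representation" concrete, the sketch below shows how a yaw angle might be encoded into a regression target and decoded back. The 2D Cartesian (Single Bin) encoding maps the angle to (cos θ, sin θ) and decodes it with atan2. The Tricosine part is an assumption for illustration only: it supposes three cosines with phase offsets 0, -2π/3, and +2π/3, which may differ from the paper's exact definition. All function names and the `PHASES` constant are hypothetical and not taken from the linked repository.

```python
import math


def encode_single_bin(theta):
    """2D Cartesian (Single Bin) encoding: map an angle to a point on the unit circle."""
    return (math.cos(theta), math.sin(theta))


def decode_single_bin(c, s):
    """Recover the angle in [-pi, pi]; atan2 only uses the direction of (c, s),
    so unnormalized network outputs are tolerated."""
    return math.atan2(s, c)


# ASSUMPTION (hypothetical): a three-phase cosine encoding standing in for Tricosine.
PHASES = (0.0, -2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0)


def encode_tricosine(theta):
    """Encode an angle as three phase-shifted cosines (assumed definition)."""
    return tuple(math.cos(theta + p) for p in PHASES)


def decode_tricosine(c0, c1, c2):
    """Least-squares estimates of cos(theta) and sin(theta) from the three
    cosines, followed by atan2 to recover the angle."""
    cos_t = (2.0 * c0 - c1 - c2) / 3.0
    sin_t = (c1 - c2) / math.sqrt(3.0)
    return math.atan2(sin_t, cos_t)
```

Round-tripping an angle in (-π, π) through either encoding recovers it up to floating-point error. Decoding through atan2 rather than, say, acos of a single component avoids the sign ambiguity of a lone cosine and remains well defined when the predicted vector is not exactly unit length.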