YOLOPoint: Joint Keypoint and Object Detection
YOLOPoint addresses the need for GNSS-independent SLAM and visual odometry in camera-based vehicle systems, though the contribution appears incremental because it combines two existing methods, YOLOv5 and SuperPoint.
The paper tackles the problem of enabling intelligent vehicles to understand their surroundings by proposing YOLOPoint, a CNN model that simultaneously detects keypoints and objects in images, achieving competitive performance on HPatches and KITTI benchmarks.
Intelligent vehicles of the future must be capable of understanding and navigating safely through their surroundings. Camera-based vehicle systems can use keypoints and objects as low- and high-level landmarks, respectively, for GNSS-independent SLAM and visual odometry. To this end, we propose YOLOPoint, a convolutional neural network model that simultaneously detects keypoints and objects in an image. It combines YOLOv5 and SuperPoint into a single forward-pass network that is both real-time capable and accurate. By using a shared backbone and a lightweight network structure, YOLOPoint performs competitively on both the HPatches and KITTI benchmarks.
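The shared-backbone idea can be illustrated with a minimal sketch in plain Python. This is not the YOLOPoint architecture: `toy_backbone`, `toy_keypoint_head`, `toy_object_head`, and `JointDetector` are hypothetical stand-ins chosen for illustration, and the "features" are simple 2x2 averages rather than learned convolutional features. The sketch only shows the structural point that features are computed once per image and then consumed by both a keypoint head and an object head.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]
Box = Tuple[int, int, int, int]


def toy_backbone(image: Grid) -> Grid:
    """Hypothetical stand-in for a shared CNN backbone: 2x2 average pooling."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def toy_keypoint_head(features: Grid, threshold: float = 0.5) -> List[Point]:
    """Hypothetical keypoint head: feature cells whose response exceeds the threshold."""
    return [
        (y, x)
        for y, row in enumerate(features)
        for x, value in enumerate(row)
        if value > threshold
    ]


def toy_object_head(features: Grid, threshold: float = 0.5) -> List[Box]:
    """Hypothetical object head: one box (y_min, x_min, y_max, x_max) around all active cells."""
    active = toy_keypoint_head(features, threshold)
    if not active:
        return []
    ys = [y for y, _ in active]
    xs = [x for _, x in active]
    return [(min(ys), min(xs), max(ys), max(xs))]


class JointDetector:
    """Runs the backbone once and feeds the same features to both heads."""

    def __init__(
        self,
        backbone: Callable[[Grid], Grid],
        keypoint_head: Callable[[Grid], List[Point]],
        object_head: Callable[[Grid], List[Box]],
    ) -> None:
        self.backbone = backbone
        self.keypoint_head = keypoint_head
        self.object_head = object_head
        self.backbone_calls = 0  # counts how often the shared features are computed

    def forward(self, image: Grid) -> Tuple[List[Point], List[Box]]:
        self.backbone_calls += 1
        features = self.backbone(image)  # computed once, shared by both heads
        return self.keypoint_head(features), self.object_head(features)


if __name__ == "__main__":
    image = [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    detector = JointDetector(toy_backbone, toy_keypoint_head, toy_object_head)
    keypoints, boxes = detector.forward(image)
    print(keypoints, boxes, detector.backbone_calls)
```

Running two separate networks would compute image features twice; sharing the backbone is what makes a single forward pass sufficient for both tasks, which is the property the abstract credits for real-time capability.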