LG CV RO

Nov 17, 2020

Modality-Buffet for Real-Time Object Detection

Nicolai Dorka, Johannes Meyer, Wolfram Burgard

arXiv:2011.08726v1

3.33 citations

Originality: Incremental advance

AI Analysis

This work is significant for robotics applications requiring real-time object detection on resource-constrained hardware, offering an incremental improvement in efficiency and accuracy.

This paper addresses real-time object detection on lightweight hardware by dynamically selecting from a portfolio of detectors. The method uses reinforcement learning to choose the best detector for each frame, achieving performance that exceeds any single detector on the Waymo Open Dataset.

Real-time object detection in videos using lightweight hardware is a crucial component of many robotic tasks. Detectors using different modalities and with varying computational complexities offer different trade-offs. One option is to have a very lightweight model that can predict from all modalities at once for each frame. However, in some situations (e.g., in static scenes) it might be better to have a more complex but more accurate model and to extrapolate from previous predictions for the frames that arrive while it is still processing. We formulate this task as a sequential decision-making problem and use reinforcement learning (RL) to generate a policy that decides from the RGB input which detector out of a portfolio of different object detectors to use for the next prediction. The objective of the RL agent is to maximize the accuracy of the predictions per image. We evaluate the approach on the Waymo Open Dataset and show that it exceeds the performance of each single detector.
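
The trade-off described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' implementation: the `Detector` class, its `latency_frames` field, the `run_schedule` function, and the hand-written `toy_policy` are all hypothetical names introduced here, and the toy rule merely stands in for the learned RL policy (which, per the abstract, decides from the RGB input). The sketch assumes that a detector occupying several frame intervals yields one fresh prediction, and that the frames arriving in the meantime are covered by extrapolation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Detector:
    """Hypothetical portfolio entry; latency is measured in frame intervals."""
    name: str
    latency_frames: int

    def __post_init__(self) -> None:
        if self.latency_frames < 1:
            raise ValueError("latency_frames must be at least 1")


def run_schedule(
    num_frames: int,
    detectors: Sequence[Detector],
    policy: Callable[[int], int],
) -> List[str]:
    """Label each frame with the detector covering it and how it is covered.

    `policy` maps the current frame index to an index into `detectors`.
    In the paper the decision comes from an RL agent observing the RGB
    input; a plain function of the frame index is an assumption made to
    keep the example self-contained.
    """
    log: List[str] = []
    t = 0
    while t < num_frames:
        det = detectors[policy(t)]
        # The frame on which the detector is invoked gets a fresh prediction.
        log.append(f"{det.name}:fresh")
        # Frames arriving while the detector is busy are extrapolated.
        for _ in range(1, det.latency_frames):
            if len(log) < num_frames:
                log.append(f"{det.name}:extrapolated")
        t += det.latency_frames
    return log


def toy_policy(t: int) -> int:
    """Hand-written stand-in for a learned policy (assumption):
    prefer the slow detector early on, then switch to the fast one."""
    return 1 if t < 6 else 0


portfolio = [Detector("fast", 1), Detector("accurate", 3)]
schedule = run_schedule(8, portfolio, toy_policy)
for frame_index, label in enumerate(schedule):
    print(frame_index, label)
```

A learned policy would replace `toy_policy`, and the reward would be the per-image prediction accuracy the abstract names as the agent's objective; neither is modeled in this sketch.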
