LG · CV · RO · Nov 17, 2020

Modality-Buffet for Real-Time Object Detection

arXiv:2011.08726v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work is significant for robotics applications requiring real-time object detection on resource-constrained hardware, offering an incremental improvement in efficiency and accuracy.

This paper addresses real-time object detection on lightweight hardware by dynamically selecting from a portfolio of detectors. The method uses reinforcement learning to choose the best detector for each frame, achieving performance that exceeds any single detector on the Waymo Open Dataset.

Real-time object detection in videos using lightweight hardware is a crucial component of many robotic tasks. Detectors that use different modalities and have different computational complexities offer different trade-offs. One option is a very lightweight model that predicts from all modalities at once for each frame. However, in some situations (e.g., in static scenes) it might be better to run a more complex but more accurate model and to extrapolate from previous predictions for the frames that arrive while it is processing. We formulate this task as a sequential decision-making problem and use reinforcement learning (RL) to generate a policy that decides, from the RGB input, which detector out of a portfolio of different object detectors to use for the next prediction. The objective of the RL agent is to maximize the accuracy of the predictions per image. We evaluate the approach on the Waymo Open Dataset and show that it exceeds the performance of each single detector.
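
The trade-off described in the abstract can be illustrated with a toy sketch. This is not the paper's method. It assumes a two-detector portfolio, a scene label ("static" or "dynamic") standing in for the RGB input, and made-up accuracy, latency, and decay numbers. A contextual epsilon-greedy bandit is swapped in for the RL agent to keep the example short. All names and values are hypothetical.

```python
import random

# Hypothetical portfolio: (name, latency in frames, accuracy on a fresh frame).
PORTFOLIO = [
    ("light_all_modalities", 1, 0.60),
    ("heavy_accurate", 4, 0.90),
]

# Hypothetical accuracy lost per frame when a prediction is extrapolated.
DECAY = {"static": 0.01, "dynamic": 0.25}


def reward(action, scene):
    """Mean per-frame accuracy over the frames covered by one detector call."""
    _, latency, acc = PORTFOLIO[action]
    per_frame = [max(0.0, acc - DECAY[scene] * i) for i in range(latency)]
    return sum(per_frame) / latency


def train(episodes=5000, epsilon=0.2, alpha=0.1, seed=0):
    """Learn a value estimate per (scene, detector) with epsilon-greedy updates."""
    rng = random.Random(seed)
    scenes = list(DECAY)
    q = {s: [0.0] * len(PORTFOLIO) for s in scenes}
    for _ in range(episodes):
        scene = rng.choice(scenes)
        if rng.random() < epsilon:
            action = rng.randrange(len(PORTFOLIO))  # explore
        else:
            action = max(range(len(PORTFOLIO)), key=lambda a: q[scene][a])
        # Move the estimate toward the observed reward.
        q[scene][action] += alpha * (reward(action, scene) - q[scene][action])
    return q


def policy(q, scene):
    """Name of the detector with the highest learned value for this scene."""
    best = max(range(len(PORTFOLIO)), key=lambda a: q[scene][a])
    return PORTFOLIO[best][0]


if __name__ == "__main__":
    q = train()
    for scene in DECAY:
        print(scene, "->", policy(q, scene))
```

Under these assumed numbers, the slow detector pays off when extrapolation is cheap (static scenes), while the fast detector wins when predictions go stale quickly. A full sequential formulation would also carry state across frames, which this sketch omits.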

