CV · Jan 16, 2022

YOLO -- You only look 10647 times

arXiv:2201.06159v2 · 7 citations
AI Analysis

This work offers a clearer understanding of YOLO's mechanism for researchers in computer vision, though it is incremental as it reframes an existing method without introducing new performance gains.

The paper reinterprets YOLO as a parallel classification of 10647 fixed region proposals, bridging the conceptual gap between single-stage, two-stage, and classification models, and provides interactive tools for visualizing YOLO's processing streams.

In this work, we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals. We support this view by showing that each of YOLO's output pixels is attentive to a specific sub-region of the previous layers, comparable to a local region proposal. This understanding reduces the conceptual gap between YOLO-like single-stage object detection models, RCNN-like two-stage region-proposal-based models, and ResNet-like image classification models. In addition, we created interactive exploration tools for a better visual understanding of the YOLO information processing streams: https://limchr.github.io/yolo_visualization
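The figure of 10647 can be reproduced with a short calculation. The sketch below assumes a YOLOv3-style configuration that the abstract does not spell out: a 416x416 input, three detection scales with strides 32, 16, and 8, and three anchor boxes per grid cell. The function name `count_predictions` and its parameters are illustrative only and do not come from the paper.

```python
def count_predictions(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Count the fixed box predictions of a YOLOv3-style detection head.

    Assumed (not stated in the abstract): square input, one output grid
    per stride, and the same number of anchors in every grid cell.
    """
    total = 0
    for stride in strides:
        cells_per_side = input_size // stride  # 13, 26, 52 for a 416 input
        total += cells_per_side * cells_per_side * anchors_per_cell
    return total


if __name__ == "__main__":
    # (13*13 + 26*26 + 52*52) * 3 = 3549 * 3
    print(count_predictions())  # 10647
```

Under these assumptions, each of the 3549 output cells across the three scales scores three anchor boxes, which is where the "10647 fixed region proposals" reading comes from. A different input resolution would change the count.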

