CV · Nov 12, 2015

Basic Level Categorization Facilitates Visual Object Recognition

arXiv:1511.04103v3 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving CNN performance on computer vision tasks by incorporating insights from human cognition. The contribution is incremental, since it builds on existing models such as AlexNet.

The paper tackles the problem of improving visual object recognition in deep CNNs by linking them to principles from the human visual cortex, specifically a training strategy inspired by basic level categorization, and raises top-5 accuracy on the ILSVRC 2012 dataset from 80.13% to 82.14%.
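To put the reported improvement in perspective, here is a small Python sketch. The helper names `top5_accuracy` and `top5_gain` are hypothetical. The first shows the conventional top-5 metric (a prediction counts as correct if the true class is among the five highest-scoring classes). The second converts the change from 80.13% to 82.14% into an absolute gain and a relative reduction in top-5 error.

```python
def top5_accuracy(scores, labels):
    """Fraction of examples whose true label is among the five highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the five largest scores for this example.
        top5 = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:5]
        hits += label in top5
    return hits / len(labels)


def top5_gain(acc_before, acc_after):
    """Absolute gain in percentage points and relative reduction in top-5 error."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return acc_after - acc_before, (err_before - err_after) / err_before


# Toy scores over six classes (illustrative only, not data from the paper).
toy_scores = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],   # true class 0 ranks 6th: miss
              [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]]   # true class 0 ranks 1st: hit
print(top5_accuracy(toy_scores, [0, 0]))        # 0.5

gain, rel = top5_gain(80.13, 82.14)
print(f"+{gain:.2f} points, {rel:.1%} fewer top-5 errors")  # +2.01 points, 10.1% fewer top-5 errors
```

A gain of about 2 percentage points therefore corresponds to roughly a tenth fewer top-5 errors (19.87% down to 17.86%).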

Recent advances in deep learning have led to significant progress in the computer vision field, especially for visual object recognition tasks. Feed-forward deep convolutional neural networks (CNNs) learn the features useful for object classification automatically, and these features have been shown to predict and decode neural representations in the ventral visual pathway of humans and monkeys. However, despite the large body of work on optimizing CNNs, little research has focused on linking CNNs with guiding principles from the human visual cortex. In this work, we propose a network optimization strategy inspired both by the developmental trajectory of children's visual object recognition capabilities and by Bar (2003), who hypothesized that basic level information is carried in the fast magnocellular pathway through the prefrontal cortex (PFC) and then projected back to inferior temporal cortex (IT), where subordinate level categorization is achieved. We instantiate this idea by first training a deep CNN to perform basic level object categorization and then training it on subordinate level categorization. Applying this idea to training AlexNet (Krizhevsky et al., 2012) on the ILSVRC 2012 dataset increases top-5 accuracy from 80.13% to 82.14%, demonstrating the effectiveness of the method. We also show that subsequent transfer learning on smaller datasets gives superior results.
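The two-stage schedule described in the abstract (basic level first, then subordinate level) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' AlexNet/ILSVRC pipeline. It assumes a toy 2-D dataset, a hypothetical `FINE_TO_BASIC` mapping from subordinate to basic-level labels, a linear "backbone" shared across both stages, and that the classifier head is replaced with a fresh one between stages. All function names here are invented for the sketch.

```python
import math
import random

# Hypothetical toy setup, NOT the paper's data or architecture:
# four subordinate classes, one per quadrant of the plane.
POINTS = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
FINE_LABELS = [0, 1, 2, 3]                 # e.g. beagle, terrier, tabby, siamese
FINE_TO_BASIC = {0: 0, 1: 0, 2: 1, 3: 1}   # assumed grouping: {0, 1} -> dog, {2, 3} -> cat


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_stage(backbone, head, xs, ys, lr=0.1, steps=1000):
    """Full-batch gradient descent on mean cross-entropy; updates backbone and head in place."""
    n = len(xs)
    for _ in range(steps):
        d_head = [[0.0] * len(head[0]) for _ in head]
        d_back = [[0.0] * len(backbone[0]) for _ in backbone]
        for x, y in zip(xs, ys):
            h = matvec(backbone, x)                  # shared features
            p = softmax(matvec(head, h))             # class probabilities
            g = [pc - (1.0 if c == y else 0.0) for c, pc in enumerate(p)]  # dLoss/dlogits
            for c, gc in enumerate(g):
                for j, hj in enumerate(h):
                    d_head[c][j] += gc * hj / n
            for j in range(len(h)):
                dh = sum(head[c][j] * g[c] for c in range(len(g)))  # backprop into features
                for i, xi in enumerate(x):
                    d_back[j][i] += dh * xi / n
        # Apply updates only after all gradients are accumulated.
        for m, d in ((head, d_head), (backbone, d_back)):
            for r in range(len(m)):
                for col in range(len(m[r])):
                    m[r][col] -= lr * d[r][col]


def accuracy(backbone, head, xs, ys):
    preds = []
    for x in xs:
        z = matvec(head, matvec(backbone, x))
        preds.append(max(range(len(z)), key=z.__getitem__))
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def basic_then_subordinate(hidden=4, seed=0):
    rng = random.Random(seed)
    backbone = rand_matrix(hidden, 2, rng)
    # Stage 1: train backbone plus a 2-way head on basic-level labels.
    basic_labels = [FINE_TO_BASIC[y] for y in FINE_LABELS]
    basic_head = rand_matrix(2, hidden, rng)
    train_stage(backbone, basic_head, POINTS, basic_labels)
    basic_acc = accuracy(backbone, basic_head, POINTS, basic_labels)
    # Stage 2: keep the trained backbone, attach a fresh 4-way head (assumption),
    # and continue training on subordinate labels.
    fine_head = rand_matrix(len(set(FINE_LABELS)), hidden, rng)
    train_stage(backbone, fine_head, POINTS, FINE_LABELS)
    fine_acc = accuracy(backbone, fine_head, POINTS, FINE_LABELS)
    return basic_acc, fine_acc


print(basic_then_subordinate())
```

Swapping the head rather than reusing it is one reasonable reading, since the basic and subordinate label sets differ in size. In this sketch the shared backbone weights are what carry the basic-level training into the second stage.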
