AI · NC · Oct 17, 2021

A model for full local image interpretation

arXiv:2110.08744v1 · 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental limitation in visual recognition that bears on both human vision and computer vision models, though the contribution appears incremental because it builds on existing ideas about top-down processing.

The paper tackles detailed local image interpretation, which current models cannot achieve. It proposes a two-stage model in which an initial activation of class candidates triggers class-specific interpretation and validation processes that recover a richer and more accurate interpretation of the scene.

We describe a computational model of humans' ability to provide a detailed interpretation of components in a scene. Humans can identify in an image meaningful components almost everywhere, and identifying these components is an essential part of the visual process, and of understanding the surrounding scene and its potential meaning to the viewer. Detailed interpretation is beyond the scope of current models of visual recognition. Our model suggests that this is a fundamental limitation, related to the fact that existing models rely on feed-forward but limited top-down processing. In our model, a first recognition stage leads to the initial activation of class candidates, which is incomplete and with limited accuracy. This stage then triggers the application of class-specific interpretation and validation processes, which recover richer and more accurate interpretation of the visible scene. We discuss implications of the model for visual interpretation by humans and by computer vision models.
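
The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `bottom_up`, `interpret`, `make_part_validator`, the `min_score` threshold, and the dictionary-based "region" are all hypothetical stand-ins chosen for illustration. The sketch only assumes that a first stage yields scored class candidates and that each candidate class has its own validation routine that either confirms the candidate (returning the parts it found) or rejects it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    label: str
    score: float  # bottom-up confidence (toy value)


@dataclass
class Interpretation:
    label: str
    score: float
    parts: List[str]  # components confirmed by the class-specific stage


# Hypothetical validator type: returns the confirmed parts, or [] on rejection.
Validator = Callable[[dict], List[str]]


def bottom_up(region: dict) -> List[Candidate]:
    """Stage 1 (toy stand-in): read precomputed class scores, best first."""
    ranked = sorted(region["scores"].items(), key=lambda kv: -kv[1])
    return [Candidate(label, score) for label, score in ranked]


def make_part_validator(required: List[str]) -> Validator:
    """Build a toy class-specific check: all required parts must be visible."""
    def validate(region: dict) -> List[str]:
        visible = region["visible_parts"]
        return list(required) if all(p in visible for p in required) else []
    return validate


def interpret(region: dict, validators: Dict[str, Validator],
              min_score: float = 0.2) -> List[Interpretation]:
    """Stage 2: each sufficiently active candidate triggers its own validator."""
    accepted = []
    for cand in bottom_up(region):
        validator = validators.get(cand.label)
        if cand.score < min_score or validator is None:
            continue  # too weak, or no class-specific process available
        parts = validator(region)
        if parts:  # candidate survives only if validation confirms it
            accepted.append(Interpretation(cand.label, cand.score, parts))
    return accepted


# Toy data: an ambiguous local region with two plausible classes.
region = {
    "scores": {"horse": 0.6, "dog": 0.5, "car": 0.1},
    "visible_parts": {"mane", "ear", "eye"},
}
validators = {
    "horse": make_part_validator(["mane", "ear"]),
    "dog": make_part_validator(["snout", "ear"]),
}
result = interpret(region, validators)
```

In this toy run the first stage alone leaves "horse" and "dog" nearly tied; only the class-specific checks separate them and attach the confirmed parts to the accepted candidate, which is the kind of refinement the abstract attributes to the second stage.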
