CV · Mar 5, 2019

Distinguishing mirror from glass: A 'big data' approach to material perception

arXiv:1903.01671v1 · 12 citations
Originality: Incremental advance
AI Analysis

This paper addresses material perception in vision science. It is an incremental advance that questions whether neural networks are generally applicable as models of human vision.

The study tackled the challenge of distinguishing mirrors from glass by training thousands of convolutional neural networks on over 750,000 simulated images. The networks reached high accuracy, but none correlated better than 0.6 with human judgments, which is below the correlation between human observers.

Visually identifying materials is crucial for many tasks, yet material perception remains poorly understood. Distinguishing mirror from glass is particularly challenging as both materials derive their appearance from their surroundings, yet we rarely experience difficulties telling them apart. Here we took a 'big data' approach to uncovering the underlying visual cues and processes, leveraging recent advances in neural network models of vision. We trained thousands of convolutional neural networks on >750,000 simulated mirror and glass objects, and compared their performance with human judgments, as well as alternative classifiers based on 'hand-engineered' image features. For randomly chosen images, all classifiers and humans performed with high accuracy, and therefore correlated highly with one another. To tease the models apart, we then painstakingly assembled a diagnostic image set for which humans make highly systematic errors, allowing us to decouple accuracy from human-like performance. A large-scale, systematic search through feedforward neural architectures revealed that relatively shallow networks predicted human judgments better than any other models. However, surprisingly, no network correlated better than 0.6 with humans (below inter-human correlations). Thus, although the model sets new standards for simulating human vision in a challenging material perception task, the results cast doubt on recent claims that such architectures are generally good models of human vision.
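The distinction the abstract draws between accuracy and human-like performance can be illustrated with a minimal sketch. It assumes Pearson correlation over per-image responses and a leave-one-out estimate of inter-human agreement; the abstract does not state the exact metric, and every name and number below (`pearson`, `accuracy`, `mean_inter_human_correlation`, the toy arrays) is hypothetical and not taken from the paper.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two equal-length response vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])


def accuracy(pred_prob, labels):
    """Fraction of images classified correctly at a 0.5 threshold."""
    pred = (np.asarray(pred_prob, dtype=float) >= 0.5).astype(int)
    return float(np.mean(pred == np.asarray(labels)))


def mean_inter_human_correlation(ratings):
    """Leave-one-out agreement: correlate each observer with the mean of the rest.

    ratings has shape (n_observers, n_images). This is one common choice of
    inter-human baseline, assumed here for illustration only.
    """
    ratings = np.asarray(ratings, dtype=float)
    scores = []
    for i in range(ratings.shape[0]):
        others = np.delete(ratings, i, axis=0).mean(axis=0)
        scores.append(pearson(ratings[i], others))
    return float(np.mean(scores))


# Made-up toy data: 0 = glass, 1 = mirror.
labels = [0, 0, 1, 1]
model_prob = [0.1, 0.2, 0.9, 0.8]   # hypothetical model P(mirror) per image
human_mean = [0.9, 0.1, 0.2, 0.8]   # hypothetical mean human "mirror" response

model_accuracy = accuracy(model_prob, labels)        # all four images correct
model_human_r = pearson(model_prob, human_mean)      # about -0.14 on this toy set

print(f"accuracy = {model_accuracy:.2f}, r(model, human) = {model_human_r:.2f}")
```

On this toy set the hypothetical model classifies every image correctly, yet it barely tracks the human responses, because the humans are systematically wrong on two of the four images. A diagnostic image set exploits the same effect: a model is judged human-like only if it reproduces the human errors, and its correlation with humans is then compared against a baseline such as `mean_inter_human_correlation`.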

