CV LG NC
Nov 20, 2019

The Origins and Prevalence of Texture Bias in Convolutional Neural Networks

Katherine L. Hermann, Ting Chen, Simon Kornblith

arXiv:1911.09071v331.189 citations

Originality: Incremental advance

AI Analysis

This work addresses a key limitation of computer vision models in applications that require human-like shape recognition, though its contribution is incremental: it modifies the training data rather than proposing new architectures.

The study investigates why ImageNet-trained CNNs favor texture over shape in classification. It finds that data augmentation (less aggressive random cropping plus naturalistic augmentations such as color distortion, noise, and blur) substantially reduces texture bias, yielding models that classify ambiguous images by shape most of the time and that outperform baselines on out-of-distribution test sets.

Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet? Different unsupervised training objectives and different architectures have small but significant and largely independent effects on the level of texture bias. However, all objectives and architectures still lead to models that make texture-based classification decisions a majority of the time, even if shape information is decodable from their hidden representations. The effect of data augmentation is much larger. By taking less aggressive random crops at training time and applying simple, naturalistic augmentation (color distortion, noise, and blur), we train models that classify ambiguous images by shape a majority of the time, and outperform baselines on out-of-distribution test sets. Our results indicate that apparent differences in the way humans and ImageNet-trained CNNs process images may arise not primarily from differences in their internal workings, but from differences in the data that they see.
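To make the augmentation recipe in the abstract concrete, here is a minimal NumPy sketch of the four ingredients it names: a less aggressive random crop, color distortion, noise, and blur. It is an illustration under stated assumptions, not the authors' pipeline. The function names, the minimum crop area of 0.64, the distortion strength, and the noise and blur scales are all hypothetical values chosen for the example; the abstract does not specify them.

```python
import numpy as np


def random_crop(img, rng, min_area_frac=0.64):
    """Crop a random window that keeps roughly min_area_frac or more of the image.

    A high min_area_frac is what "less aggressive" means here; 0.64 is an
    assumed value, not one taken from the paper.
    """
    h, w = img.shape[:2]
    scale = np.sqrt(rng.uniform(min_area_frac, 1.0))  # same scale on both axes
    ch = max(1, int(round(h * scale)))
    cw = max(1, int(round(w * scale)))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]


def color_distort(img, rng, strength=0.4):
    """Randomly jitter contrast and brightness (a simple stand-in for color distortion)."""
    contrast = rng.uniform(1 - strength, 1 + strength)
    brightness = rng.uniform(1 - strength, 1 + strength)
    mean = img.mean(axis=(0, 1), keepdims=True)
    out = (img - mean) * contrast + mean
    return np.clip(out * brightness, 0.0, 1.0)


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")

    def smooth(v):
        return np.convolve(v, kernel, mode="valid")

    out = np.apply_along_axis(smooth, 0, padded)  # blur along height
    return np.apply_along_axis(smooth, 1, out)    # blur along width


def add_noise(img, rng, sigma=0.05):
    """Add zero-mean Gaussian pixel noise and clip back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)


def augment(img, rng):
    """Apply crop, color distortion, blur, and noise to an HxWx3 image in [0, 1]."""
    out = random_crop(img, rng)
    out = color_distort(out, rng)
    out = gaussian_blur(out, sigma=1.0)
    return add_noise(out, rng)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # placeholder image with values in [0, 1]
augmented = augment(image, rng)
```

In a real training setup the crop would typically be resized back to a fixed input resolution, and each augmentation might be applied with some probability rather than always. Those details are left out here to keep the sketch short.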

