Configural processing as an optimized strategy for robust object recognition in neural networks
This study provides neurocomputational evidence for how configural processing emerges in neural networks to enhance the robustness of object recognition, although the contribution is incremental, as it builds on existing research in visual perception.
The study asks why configural processing (perceiving spatial relationships among an object's components) is crucial for object recognition. It tests neural network models on letter and face stimuli and finds that, in identification tasks, configural cues yield performance that is more robust to geometric transformations such as rotation or scaling.
Configural processing, the perception of spatial relationships among an object's components, is crucial for object recognition. However, the teleology and underlying neurocomputational mechanisms of such processing remain elusive, notwithstanding decades of research. We hypothesized that processing objects via configural cues provides a more robust means of recognizing them than processing via local featural cues. We evaluated this hypothesis by devising identification tasks with composite letter stimuli and comparing different neural network models trained with either only local or only configural cues available. We found that configural cues yielded performance that was more robust to geometric transformations such as rotation or scaling. Furthermore, when both types of cue were simultaneously available, configural cues were favored over local featural cues. Layerwise analysis revealed that sensitivity to configural cues emerged in later layers than sensitivity to local featural cues, possibly contributing to the robustness to pixel-level transformations. Notably, this configural processing occurred in a purely feedforward manner, without the need for recurrent computations. Our findings with letter stimuli were successfully extended to naturalistic face images. Thus, our study provides neurocomputational evidence that configural processing emerges in a naïve network based on task contingencies, and is beneficial for robust object processing under varying viewing conditions.
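To make the notion of a composite letter stimulus concrete, the following is a minimal sketch of how a large (configural) letter could be assembled from copies of a small (local) letter, and then rotated or scaled. Everything in it is an illustrative assumption rather than the study's actual pipeline: the 5x5 glyphs, the function names `composite`, `scale`, and `rotate90`, the spacing, and the restriction to 90-degree rotations and integer scaling are all hypothetical choices made to keep the example small.

```python
import numpy as np

# Hypothetical 5x5 binary glyphs (assumption: the study's real stimuli differ).
GLYPHS = {
    "H": np.array([[1, 0, 0, 0, 1],
                   [1, 0, 0, 0, 1],
                   [1, 1, 1, 1, 1],
                   [1, 0, 0, 0, 1],
                   [1, 0, 0, 0, 1]], dtype=np.uint8),
    "T": np.array([[1, 1, 1, 1, 1],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0]], dtype=np.uint8),
}


def composite(global_letter, local_letter, gap=1):
    """Arrange copies of the local glyph in the shape of the global glyph.

    The global letter carries the configural cue (where the parts sit);
    the local letter carries the featural cue (what each part looks like).
    """
    # Zero-pad each local glyph so neighbouring copies do not touch.
    local = np.pad(GLYPHS[local_letter], gap)
    # The Kronecker product places one local glyph at every "on" cell
    # of the global glyph and an empty block at every "off" cell.
    return np.kron(GLYPHS[global_letter], local)


def scale(img, factor):
    """Integer nearest-neighbour upscaling (a simple stand-in for scaling)."""
    return np.kron(img, np.ones((factor, factor), dtype=img.dtype))


def rotate90(img, k=1):
    """Rotate by k * 90 degrees counterclockwise (a simple stand-in for rotation)."""
    return np.rot90(img, k)


# A global "H" built from local "T"s, plus two transformed versions.
stim = composite("H", "T")
stim_scaled = scale(stim, 2)
stim_rotated = rotate90(stim)
```

In a design of this kind, a configural-only task would vary the global letter while holding the local letter fixed, and a local-only task would do the reverse; how the study itself separated the two cue types is not detailed in the abstract.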