27.7CVMay 20
Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-ExpertsGene Tangtartharakul, Katherine R. Storrs
Mixture-of-Experts (MoE) models are often interpreted by analysing which categories are routed to which experts. However, routing alone does not reveal what each expert actually encodes. We train sparsely-gated convolutional MoE models with a contrastive objective on natural images and characterise expert specialisation using tools from visual neuroscience. Extending from gating-level to expert-level analyses, we measure per-expert category separability, and per-expert tuning using the most exciting inputs. Extending from category-level to feature-level explanations, we interpret tuning via semantic dimensions derived from a dataset of human behavioural judgements (THINGS). Finally, we use tuning and representational similarity analysis to assess the stability of expertise-allocation across independent initialisations. We find that an animate-inanimate distinction dominates expert partitioning, apparent from gating through to expert readout, and is stable across independently trained models. Although routing statistics suggest relatively sparse, categorical preferences, expert analyses reveal broader tuning to continuous visual and semantic dimensions that extend beyond category boundaries. Experts exhibit similar category-separability to one another, despite distinct feature tuning, demonstrating the explanatory benefits of moving beyond category-level analyses. Together, these results show that expert specialisation in vision MoEs extends well beyond category routing and is better understood by probing fine-grained expert-level tuning and representational structure.
50.0NCMay 12
Human face perception reflects inverse-generative and naturalistic discriminative objectivesWenxuan Guo, Heiko H. Schütt, Kamila Maria Jozwik et al.
The perceptual representations supporting our ability to recognize faces remain a computational mystery. Deep neural networks offer mechanistic hypotheses for human face perception, but theoretically distinct models often make indistinguishable representational predictions for randomly sampled faces. To expose diagnostic differences among these hypotheses, we compared six neural network models sharing an architecture but trained on distinct tasks, using face pairs optimized to elicit contrasting model predictions ("controversial" pairs) alongside randomly sampled pairs. We tested model predictions against face-dissimilarity judgments from 864 human participants across stimulus sets differing in realism and pose variation. Models prioritizing high-level, invariant structures (trained via inverse rendering, face identification, or object classification) most robustly matched human judgments. Furthermore, models trained on natural images typically outperformed synthetic-trained counterparts. Together, these findings suggest that human face perception is shaped by mechanisms that infer latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.
GRMar 26, 2024
Predicting Perceived Gloss: Do Weak Labels Suffice?Julia Guerrero-Viu, J. Daniel Subias, Ana Serrano et al.
Estimating perceptual attributes of materials directly from images is a challenging task due to their complex, not fully-understood interactions with external factors, such as geometry and lighting. Supervised deep learning models have recently been shown to outperform traditional approaches, but rely on large datasets of human-annotated images for accurate perception predictions. Obtaining reliable annotations is a costly endeavor, aggravated by the limited ability of these models to generalise to different aspects of appearance. In this work, we show how a much smaller set of human annotations ("strong labels") can be effectively augmented with automatically derived "weak labels" in the context of learning a low-dimensional image-computable gloss metric. We evaluate three alternative weak labels for predicting human gloss perception from limited annotated data. Incorporating weak labels enhances our gloss prediction beyond the current state of the art. Moreover, it enables a substantial reduction in human annotation costs without sacrificing accuracy, whether working with rendered images or real photographs.
CVApr 15, 2025
Visual Language Models show widespread visual deficits on neuropsychological testsGene Tangtartharakul, Katherine R. Storrs
Visual Language Models (VLMs) show remarkable performance in visual reasoning tasks, successfully tackling college-level challenges that require high-level understanding of images. However, some recent reports of VLMs struggling to reason about elemental visual concepts like orientation, position, continuity, and occlusion suggest a potential gulf between human and VLM vision. Here we use the toolkit of neuropsychology to systematically assess the capabilities of three state-of-the-art VLMs across visual domains. Using 51 tests drawn from six clinical and experimental batteries, we characterise the visual abilities of leading VLMs relative to normative performance in healthy adults. While the models excel in straightforward object recognition tasks, we find widespread deficits in low- and mid-level visual abilities that would be considered clinically significant in humans. These selective deficits, profiled through validated test batteries, suggest that an artificial system can achieve complex object recognition without developing foundational visual concepts that in humans require no explicit training.
NCMar 4, 2019
Deep Learning for Cognitive NeuroscienceKatherine R. Storrs, Nikolaus Kriegeskorte
Neural network models can now recognise images, understand text, translate languages, and play many human games at human or superhuman levels. These systems are highly abstracted, but are inspired by biological brains and use only biologically plausible computations. In the coming years, neural networks are likely to become less reliant on learning from massive labelled datasets, and more robust and generalisable in their task performance. From their successes and failures, we can learn about the computational requirements of the different tasks at which brains excel. Deep learning also provides the tools for testing cognitive theories. In order to test a theory, we need to realise the proposed information-processing system at scale, so as to be able to assess its feasibility and emergent behaviours. Deep learning allows us to scale up from principles and circuit models to end-to-end trainable models capable of performing complex tasks. There are many levels at which cognitive neuroscientists can use deep learning in their work, from inspiring theories to serving as full computational models. Ongoing advances in deep learning bring us closer to understanding how cognition and perception may be implemented in the brain -- the grand challenge at the core of cognitive neuroscience.