AI | Sep 12, 2023
The Relational Bottleneck as an Inductive Bias for Efficient Abstraction
Taylor W. Webb, Steven M. Frankland, Awni Altabaa et al.
A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. In this approach, neural networks are constrained via their architecture to focus on relations between perceptual inputs, rather than the attributes of individual inputs. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.
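As a rough illustration of what "focusing on relations rather than attributes" can mean, the sketch below passes only a matrix of pairwise similarities between object embeddings to whatever comes downstream. The choice of normalized inner products as the relation, and the name `relation_matrix`, are assumptions made for this example; the abstract does not specify a particular operation.

```python
import numpy as np

def relation_matrix(z):
    """Return pairwise cosine similarities between object embeddings.

    z: array of shape (n_objects, n_features).
    Hypothetical helper: downstream processing would see only this matrix,
    never the raw embeddings themselves.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z @ z.T

rng = np.random.default_rng(0)
z = rng.normal(size=(3, 4))                    # three objects, four features

# Change every individual attribute with a random orthogonal transform
# while preserving the geometric relations between the objects.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
z_rot = z @ q

r1 = relation_matrix(z)
r2 = relation_matrix(z_rot)
same_relations = np.allclose(r1, r2)           # relations are unchanged
```

The two inputs differ in every feature value yet yield the same relation matrix, so a model restricted to that matrix treats them identically.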
AI | Oct 31, 2024
Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem
Declan Campbell, Sunayana Rane, Tyler Giallanza et al.
Recent work has documented striking heterogeneity in the performance of state-of-the-art vision language models (VLMs), including both multimodal language models and text-to-image models. These models are able to describe and generate a diverse array of complex, naturalistic images, yet they exhibit surprising failures on basic multi-object reasoning tasks -- such as counting, localization, and simple forms of visual analogy -- that humans perform with near perfect accuracy. To better understand this puzzling pattern of successes and failures, we turn to theoretical accounts of the binding problem in cognitive science and neuroscience, a fundamental problem that arises when a shared set of representational resources must be used to represent distinct entities (e.g., to represent multiple objects in an image), necessitating the use of serial processing to avoid interference. We find that many of the puzzling failures of state-of-the-art VLMs can be explained as arising due to the binding problem, and that these failure modes are strikingly similar to the limitations exhibited by rapid, feedforward processing in the human brain.
LG | Jun 1, 2025
Bound by semanticity: universal laws governing the generalization-identification tradeoff
Marco Nurisso, Jesseba Fernando, Raj Deshpande et al.
Intelligent systems must deploy internal representations that are simultaneously structured -- to support broad generalization -- and selective -- to preserve input identity. We expose a fundamental limit on this tradeoff. For any model whose representational similarity between inputs decays with finite semantic resolution $\varepsilon$, we derive closed-form expressions that pin its probability of correct generalization $p_S$ and identification $p_I$ to a universal Pareto front independent of input space geometry. Extending the analysis to noisy, heterogeneous spaces and to $n>2$ inputs predicts a sharp $1/n$ collapse of multi-input processing capacity and a non-monotonic optimum for $p_S$. A minimal ReLU network trained end-to-end reproduces these laws: during learning a resolution boundary self-organizes and empirical $(p_S,p_I)$ trajectories closely follow theoretical curves for linearly decaying similarity. Finally, we demonstrate that the same limits persist in two markedly more complex settings -- a convolutional neural network and state-of-the-art vision-language models -- confirming that finite-resolution similarity is a fundamental emergent informational constraint, not merely a toy-model artifact. Together, these results provide an exact theory of the generalization-identification trade-off and clarify how semantic resolution shapes the representational capacity of deep networks and brains alike.
CV | Jul 9, 2020
Learning Representations that Support Extrapolation
Taylor W. Webb, Zachary Dulberg, Steven M. Frankland et al.
Extrapolation -- the ability to make inferences that go beyond the scope of one's experiences -- is a hallmark of human intelligence. By contrast, the generalization exhibited by contemporary neural network algorithms is largely limited to interpolation between data points in their training corpora. In this paper, we consider the challenge of learning representations that support extrapolation. We introduce a novel visual analogy benchmark that allows the graded evaluation of extrapolation as a function of distance from the convex domain defined by the training data. We also introduce a simple technique, temporal context normalization, that encourages representations that emphasize the relations between objects. We find that this technique enables a significant improvement in the ability to extrapolate, considerably outperforming a number of competitive techniques.
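The abstract does not give a formula for temporal context normalization, so the following is only a minimal sketch under one assumption: each feature is z-scored across the items presented within a single task context. Any learned gain or shift parameters are omitted, and the function name and the `eps` constant are illustrative.

```python
import numpy as np

def temporal_context_norm(z, eps=1e-8):
    """Z-score each feature across the items of one context.

    z: array of shape (n_items, n_features) holding the embeddings of the
    objects presented together in a single task (the "temporal context").
    """
    mu = z.mean(axis=0, keepdims=True)
    sigma = z.std(axis=0, keepdims=True)
    return (z - mu) / (sigma + eps)

# The same relational pattern (evenly spaced values) at two very different
# scales and offsets, standing in for a training and an extrapolation regime.
base = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
shifted = base * 10.0 + 100.0

a = temporal_context_norm(base)
b = temporal_context_norm(shifted)
matches = np.allclose(a, b, atol=1e-6)   # both contexts map to the same code
```

Because the normalization removes each context's mean and scale, what remains encodes how the items differ from one another, which is one way to read "representations that emphasize the relations between objects".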