CL | AI | QUANT-PH | Jun 27, 2025

Towards a Comparative Framework for Compositional AI Models

arXiv:2507.02940v1 | 2 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of systematically comparing compositional AI models, and is aimed at researchers in natural language processing and AI. It is an incremental advance, since it builds on existing frameworks and tests.

The paper tackles the problem of evaluating compositional generalization and interpretability in AI models. It presents both notions in a framework-agnostic way using category theory and adapts existing tests for compositional generalization to that setting. It then compares quantum-circuit-based models and classical neural networks on an extended bAbI dataset, finding that both architectures score within 5% of one another on the productivity and substitutivity tasks but differ by at least 10% on the systematicity task, with the neural models more prone to overfitting.

The DisCoCirc framework for natural language processing allows the construction of compositional models of text by combining units for individual words according to the grammatical structure of the text. The compositional nature of a model can give rise to two things: compositional generalisation -- the ability of a model to generalise outside its training distribution by learning compositional rules underpinning the entire data distribution -- and compositional interpretability -- making sense of how the model works by inspecting its modular components in isolation, as well as the processes through which these components are combined. We present these notions in a framework-agnostic way using the language of category theory, and adapt a series of tests for compositional generalisation to this setting. Applying this to the DisCoCirc framework, we consider how well a selection of models can learn to compositionally generalise. We compare both quantum-circuit-based models and classical neural networks on a dataset derived from one of the bAbI tasks, extended to test a series of aspects of compositionality. Both architectures score within 5% of one another on the productivity and substitutivity tasks, but differ by at least 10% on the systematicity task, and exhibit different trends on the overgeneralisation tasks. Overall, we find the neural models are more prone to overfitting the Train data. Additionally, we demonstrate how to interpret a compositional model using one of the trained models. By considering how the model components interact with one another, we explain how the model behaves.
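The idea of building a text model by combining per-word units can be illustrated with a toy sketch. This is not the paper's DisCoCirc implementation and involves no quantum circuits or neural networks: the word units `goes_to` and `follows`, the dictionary-based state, and the length-based split are all hypothetical stand-ins chosen for illustration. The split assumes the common reading of productivity as generalising to texts longer than those seen in training.

```python
from typing import Callable, Dict, List, Tuple

# Toy state: maps each actor to a location (a stand-in for learned wire states).
State = Dict[str, str]
# A text is a sequence of (word, arguments) pairs in reading order.
Sentence = Tuple[str, Tuple[str, ...]]
Text = List[Sentence]


def goes_to(state: State, args: Tuple[str, ...]) -> State:
    """Hypothetical word unit: move an actor to a place."""
    actor, place = args
    new_state = dict(state)
    new_state[actor] = place
    return new_state


def follows(state: State, args: Tuple[str, ...]) -> State:
    """Hypothetical word unit: the follower takes the leader's current location."""
    follower, leader = args
    new_state = dict(state)
    new_state[follower] = state[leader]
    return new_state


# One modular component per word; each can be inspected in isolation.
LEXICON: Dict[str, Callable[[State, Tuple[str, ...]], State]] = {
    "goes_to": goes_to,
    "follows": follows,
}


def compose(text: Text, initial: State) -> State:
    """Apply the word units in order, following the structure of the text."""
    state = dict(initial)
    for word, args in text:
        state = LEXICON[word](state, args)
    return state


def split_by_length(texts: List[Text], max_train_len: int) -> Tuple[List[Text], List[Text]]:
    """Productivity-style split (assumption): train on short texts, test on longer ones."""
    train = [t for t in texts if len(t) <= max_train_len]
    test = [t for t in texts if len(t) > max_train_len]
    return train, test


story: Text = [
    ("goes_to", ("alice", "kitchen")),
    ("follows", ("bob", "alice")),
    ("goes_to", ("alice", "garden")),
]
final = compose(story, {"alice": "hall", "bob": "hall"})
train, test = split_by_length([story[:1], story[:2], story], max_train_len=2)
```

Because each word unit is a separate function, the behaviour of the whole text can be explained by reading off what each unit does to the state and in which order the units are applied, which mirrors the sense of compositional interpretability described above.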
