LG · Jan 31, 2025

A Compressive-Expressive Communication Framework for Compositional Representations

arXiv:2501.19182v3 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enabling AI systems to generalize compositionally, which is crucial for language and cognitive tasks. The advance appears incremental, since it builds on existing theories of language emergence.

The paper tackles the problem of compositional generalization in deep neural networks by introducing CELEBI, a self-supervised framework that uses a reconstruction-based communication game between a sender and a receiver to induce compositionality in learned representations. It reports significant improvements in the efficiency and compositionality of the learned messages on the Shapes3D and MPI3D datasets, surpassing prior discrete communication frameworks in both reconstruction accuracy and topographic similarity.

Compositional generalization--the ability to interpret novel combinations of familiar elements--is a hallmark of human cognition and language. Despite recent advances, deep neural networks still struggle to acquire this property reliably. In this work, we introduce CELEBI (Compressive-Expressive Language Emergence through a discrete Bottleneck and Iterated learning), a novel self-supervised framework for inducing compositionality in learned representations from pre-trained models, through a reconstruction-based communication game between a sender and a receiver. Building on theories of language emergence, we integrate three mechanisms that jointly promote compressibility, expressivity, and efficiency in the emergent language. First, interactive decoding incentivizes intermediate reasoning by requiring the receiver to produce partial reconstructions after each symbol. Second, a reconstruction-based imitation phase, inspired by iterated learning, trains successive generations of agents to imitate reconstructions rather than messages, enforcing a tighter communication bottleneck. Third, pairwise distance maximization regularizes message diversity by encouraging high distances between messages, with formal links to entropy maximization. Our method significantly improves both the efficiency and compositionality of the learned messages on the Shapes3D and MPI3D datasets, surpassing prior discrete communication frameworks in both reconstruction accuracy and topographic similarity. This work provides new theoretical and empirical evidence for the emergence of structured, generalizable communication protocols from simplicity-based inductive biases.
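To make the third mechanism concrete, the following is a minimal sketch of a pairwise-distance diversity term over discrete messages. It is an illustration only: the abstract does not specify the distance function or the loss form, so the use of Hamming distance and the names `hamming`, `mean_pairwise_distance`, and `diversity_penalty` are assumptions, not the authors' implementation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length messages differ."""
    if len(a) != len(b):
        raise ValueError("messages must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(messages):
    """Average Hamming distance over all unordered pairs of messages."""
    pairs = list(combinations(messages, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def diversity_penalty(messages):
    """Hypothetical regularizer: smaller when messages are further apart,
    so minimizing it pushes messages away from each other."""
    return -mean_pairwise_distance(messages)


# Toy batches of length-3 messages over a small symbol vocabulary.
collapsed = [(0, 0, 0), (0, 0, 0), (0, 0, 1)]  # nearly identical messages
diverse = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]    # differ at every position

print(mean_pairwise_distance(collapsed))
print(mean_pairwise_distance(diverse))
```

In this toy setting the diverse batch receives a lower penalty than the collapsed one, which is the behaviour a diversity regularizer of this kind is meant to reward. A trainable version would need a differentiable relaxation of the discrete distance, which this sketch does not attempt.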
