LG | Mar 28

Semantic Interaction Information mediates compositional generalization in latent space

arXiv:2603.27134 | 18.9 | h-index: 1

AI Analysis

For researchers in compositional generalization and meta-learning, this work provides a formal framework and metric (SII) to diagnose and address bottlenecks in learning latent variable interactions, though the results are demonstrated only in a synthetic gridworld environment.

The paper introduces Semantic Interaction Information (SII), a metric for the contribution of latent variable interactions to task performance. SII explains the accuracy gap between Echo State and Fully Trained RNNs, and the analysis reveals a theoretically predicted failure mode in which confidence decouples from accuracy. The authors then propose Representation Classification Chains (RCCs), a JEPA-style architecture that learns variable inference and variable embeddings in separate modules to address the circular dependence between learning interactions and inferring variables, and they show that RCCs facilitate compositional generalization to novel combinations of relevant variables.

Are there still barriers to generalization once all relevant variables are known? We address this question via a framework that casts compositional generalization as a variational inference problem over latent variables with parametric interactions. To explore this, we develop the Cognitive Gridworld, a stationary Partially Observable Markov Decision Process (POMDP) where observations are generated jointly by multiple latent variables, yet feedback is provided for only a single goal variable. This setting allows us to define Semantic Interaction Information (SII): a metric measuring the contribution of latent variable interactions to task performance. Using SII, we analyze Recurrent Neural Networks (RNNs) provided with these interactions, finding that SII explains the accuracy gap between Echo State and Fully Trained networks. Our analysis also uncovers a theoretically predicted failure mode where confidence decouples from accuracy, suggesting that utilizing interactions between relevant variables is a non-trivial capability. We then address a harder regime where the interactions must be learned by an embedding model. Learning how latent variables interact requires accurate inference, yet accurate inference depends on knowing those interactions. The Cognitive Gridworld reveals this circular dependence as a core challenge for continual meta-learning. We approach this dilemma via Representation Classification Chains (RCCs), a JEPA-style architecture that disentangles these processes: variable inference and variable embeddings are learned by separate modules through Reinforcement Learning and self-supervised learning, respectively. Lastly, we demonstrate that RCCs facilitate compositional generalization to novel combinations of relevant variables. Together, these results establish a grounded setting for evaluating goal-directed generalist agents.
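The claim that using interactions between known variables is a non-trivial capability can be illustrated with a toy calculation. The sketch below is not the paper's Cognitive Gridworld and does not compute SII; the likelihood table, the variable names `g` (goal) and `c` (context), and both inference functions are hypothetical, chosen only so that each observation is generated jointly by two stationary latent variables. It compares a posterior over the goal variable that keeps the interaction with one that averages the context out independently at every step.

```python
from math import prod

# Hypothetical toy table, NOT taken from the paper: LIKELIHOOD[g][c][o] is the
# probability of observation o given goal variable g and context variable c.
LIKELIHOOD = [
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],      # g = 0: observation depends on c
    [[0.45, 0.45, 0.1], [0.45, 0.45, 0.1]],  # g = 1: observation ignores c
]


def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def posterior_joint(observations):
    """Posterior over g that keeps the g-c interaction: c is treated as fixed
    over the whole sequence and marginalized once (uniform priors assumed)."""
    weights = []
    for per_context in LIKELIHOOD:
        seq_liks = [prod(table[o] for o in observations) for table in per_context]
        weights.append(sum(seq_liks) / len(seq_liks))
    return normalize(weights)


def posterior_factorized(observations):
    """Posterior over g that discards the interaction: c is averaged out
    independently at every step, so only per-step marginals are used."""
    weights = []
    for per_context in LIKELIHOOD:
        step_liks = [
            sum(table[o] for table in per_context) / len(per_context)
            for o in observations
        ]
        weights.append(prod(step_liks))
    return normalize(weights)


observations = [0, 0, 0, 0]
joint = posterior_joint(observations)
factorized = posterior_factorized(observations)
print("with interaction:   ", joint)
print("without interaction:", factorized)
```

In this constructed table the per-step marginals are identical for both goal values, so the factorized posterior stays at chance (0.5), while the joint posterior assigns roughly 0.83 to `g = 0` after four identical observations. The table was built to make this contrast exact; it says nothing about the size of the effect in the paper's experiments.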
