Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
This work addresses a foundational issue in AI by identifying a key weakness in current machine learning systems that affects their ability to generalize efficiently, with implications for improving data usage and adaptability in various applications.
The paper tackles the problem of poor generalization in parametric machine learning systems by proposing that latent learning, enabled through episodic memory and retrieval mechanisms, can complement these systems to improve data efficiency and flexibility. The results show that a system with an oracle retrieval mechanism achieves better generalization across challenges like the reversal curse in language modeling and agent-based navigation.
When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of parametric machine learning systems is their failure to exhibit latent learning -- learning information that is not relevant to the task at hand, but that might be useful in a future task. We show how this perspective links failures ranging from the reversal curse in language modeling to new findings on agent-based navigation. We then highlight how cognitive science points to episodic memory as a potential part of the solution to these issues. Correspondingly, we show that a system with an oracle retrieval mechanism can use learning experiences more flexibly to generalize better across many of these challenges. We also identify some of the essential components for effectively using retrieval, including the importance of within-example in-context learning for acquiring the ability to use information across retrieved examples. In summary, our results illustrate one possible contributor to the relative data inefficiency of current machine learning systems compared to natural intelligence, and help to understand how retrieval methods can complement parametric learning to improve generalization. We close by discussing some of the links between these findings and prior results in cognitive science and neuroscience, and the broader implications.