LG CL | May 7, 2024

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell

arXiv:2405.04669v2 | 25.435 citations | h-index: 13 | Has Code | NIPS

Originality: Incremental advance

AI Analysis

The paper gives a theoretical explanation for a specific failure mode in LLMs. The advance is incremental: it builds on a known issue but offers new insight into how training dynamics produce it.

The paper tackles the 'reversal curse' in auto-regressive LLMs, where models fail to infer logically equivalent reversed statements. It theoretically analyzes the training dynamics of simplified models, shows that the curse results from weight asymmetry, and validates the theory with experiments on multi-layer transformers.

Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on '$A \to B$' (e.g., 'Tom is the parent of John'), an LLM fails to directly conclude '$B \gets A$' (e.g., 'John is the child of Tom') during inference even if the two sentences are semantically identical, which is known as the 'reversal curse'. In this paper, we theoretically analyze the reversal curse via the training dynamics of (stochastic) gradient descent for two auto-regressive models: (1) a bilinear model that can be viewed as a simplification of a one-layer transformer; (2) one-layer transformers under certain assumptions. Our analysis reveals that for both models, the reversal curse is a consequence of the 'asymmetry' of the (effective) model weights, i.e., an increase of the weights from a token $A$ to a token $B$ during training does not necessarily cause an increase of the weights from $B$ to $A$. This asymmetry is caused by the training dynamics under certain choices of loss function and the optimization space of model parameters. Moreover, our analysis can be naturally applied to other logical reasoning tasks such as chain-of-thought (CoT), which provides a new perspective different from previous work that focuses on expressivity. Finally, we conduct experiments to validate our theory on multi-layer transformers under different settings. Our code is available at https://github.com/marlo-z/reversal_curse_analysis/.
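The weight asymmetry described in the abstract can be illustrated with a toy sketch. This is not the paper's setup or its released code: it assumes one-hot token embeddings, so the bilinear score from token $x$ to token $y$ reduces to a single matrix entry `theta[x, y]`, and it uses full-batch gradient descent on the cross-entropy loss. The names `softmax` and `train_bilinear`, the vocabulary size, and the training pairs are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_bilinear(pairs, vocab_size, steps=200, lr=1.0):
    # Assumption: one-hot embeddings, so logit(x -> y) = theta[x, y].
    theta = np.zeros((vocab_size, vocab_size))
    srcs = np.array([a for a, _ in pairs])
    tgts = np.array([b for _, b in pairs])
    n = len(pairs)
    for _ in range(steps):
        probs = softmax(theta[srcs])        # shape (n, vocab_size)
        grad_rows = probs.copy()
        grad_rows[np.arange(n), tgts] -= 1.0  # d(cross-entropy)/d(logits)
        # Only the rows indexed by source tokens ever receive a gradient.
        np.add.at(theta, srcs, -lr * grad_rows / n)
    return theta

# Hypothetical forward-only training data: tokens 0..3 map to tokens 4..7.
pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]
vocab_size = 8
theta = train_bilinear(pairs, vocab_size)

forward = softmax(theta[0])[4]   # P(next = 4 | current = 0), seen in training
reverse = softmax(theta[4])[0]   # P(next = 0 | current = 4), never trained
print(f"forward: {forward:.3f}, reverse: {reverse:.3f}")
```

In this sketch the gradient of the loss on a pair $(A, B)$ touches only row $A$ of `theta`, so the entry from $B$ to $A$ stays at its initial value and the reverse prediction stays at chance. The paper's analysis covers a more general setting; the sketch only shows the simplest case in which raising the weight from $A$ to $B$ leaves the weight from $B$ to $A$ unchanged.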

