Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs
This work addresses the need for interpretability in LLMs by providing a mathematical framework that disentangles memorization from reasoning. The contribution is incremental in that it builds on the existing understanding of model behaviors.
The study tackles the problem of quantifying memorization and in-context reasoning effects in large language models by proposing an axiomatic system that defines and decomposes these effects. The resulting method enables straightforward examination of the detailed inference patterns encoded by LLMs.
In this study, we propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects that a large language model (LLM) uses for language generation. These effects are formulated as non-linear interactions between tokens/words encoded by the LLM. Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects, and to further classify the in-context reasoning effects into enhanced inference patterns, eliminated inference patterns, and reversed inference patterns. In addition, the decomposed effects satisfy the sparsity property and the universal matching property, which mathematically guarantee that the LLM's confidence score can be faithfully decomposed into the memorization effects and in-context reasoning effects. Experiments show that the clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of detailed inference patterns encoded by LLMs.
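To make the idea of decomposing a confidence score into token interactions more concrete, the following is a minimal sketch on a toy example. It assumes that interactions take the form of Harsanyi dividends, which the text above does not state, and it uses a hypothetical `toy_score` function in place of a real LLM confidence score. The names `subsets`, `harsanyi_interactions`, and `toy_score` are illustrative only. The sketch checks a universal-matching-style identity: the score on any subset of unmasked tokens equals the sum of the interaction effects contained in that subset.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_interactions(v, n):
    """Return {S: I(S)} with I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T).

    Assumption: this Harsanyi-dividend form is used for illustration only.
    """
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def toy_score(present):
    """Hypothetical confidence score given the set of unmasked token indices."""
    score = 0.5                      # baseline when every token is masked
    if 0 in present:
        score += 1.0                 # token 0 contributes on its own
    if {1, 2} <= present:
        score += 2.0                 # tokens 1 and 2 contribute only jointly
    if {0, 1, 2} <= present:
        score -= 0.75                # all three together reduce the score
    return score


effects = harsanyi_interactions(toy_score, 3)

# Universal-matching-style check: v(S) equals the sum of I(T) over all T inside S.
matching_holds = all(
    abs(toy_score(S) - sum(effects[T] for T in subsets(S))) < 1e-9
    for S in subsets(range(3))
)

# Sparsity in this toy case: only a few of the 8 subsets carry a non-zero effect.
salient = {S: e for S, e in effects.items() if abs(e) > 1e-9}
```

In this toy case only four of the eight subsets carry a non-zero effect, which mirrors the sparsity property in miniature. How such effects would be assigned to memorization versus in-context reasoning is not shown here, since the definitions are not given in the text above.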