Negation-Induced Forgetting in LLMs
This work provides initial evidence of cognitive biases in LLMs, an incremental step toward understanding memory phenomena in AI systems.
The study investigated whether Large Language Models (LLMs) exhibit negation-induced forgetting, a human cognitive bias in which negating incorrect attributes of an object or event reduces later recall of it. ChatGPT-3.5 shows this effect, GPT-4o mini shows a marginal effect, and Llama3-70b-instruct does not.
This study explores whether Large Language Models (LLMs) exhibit negation-induced forgetting (NIF), a cognitive phenomenon observed in humans in which negating incorrect attributes of an object or event leads to diminished recall of that object or event compared with affirming correct attributes (Mayo et al., 2014; Zang et al., 2023). We adapted the experimental framework of Zang et al. (2023) to test this effect in ChatGPT-3.5, GPT-4o mini, and Llama3-70b-instruct. Our results show that ChatGPT-3.5 exhibits NIF, with negated information being less likely to be recalled than affirmed information. GPT-4o mini showed a marginally significant NIF effect, while Llama3-70b-instruct did not exhibit NIF. The findings provide initial evidence of negation-induced forgetting in some LLMs, suggesting that cognitive biases similar to those observed in humans may emerge in these models. This work is a preliminary step toward understanding how memory-related phenomena manifest in LLMs.