Memory Is All You Need: Testing How Model Memory Affects LLM Performance in Annotation Tasks
This work addresses the problem of inefficient and low-performance LLM-based annotation for applied researchers, offering incremental improvements through memory integration.
The study investigated whether allowing LLMs to retain memory of their own previous annotations improves performance in text annotation tasks, finding that it yields significant improvements of 5-25% over zero-shot and few-shot learning, with a novel memory reinforcement approach providing additional gains in most tests.
Generative Large Language Models (LLMs) have shown promising results in text annotation using zero-shot and few-shot learning. Yet these approaches do not allow the model to retain information from previous annotations, making each response independent from the preceding ones. This raises the question of whether model memory -- the LLM having knowledge about its own previous annotations in the same task -- affects performance. In this article, using OpenAI's GPT-4o and Meta's Llama 3.1 on two political science datasets, we demonstrate that allowing the model to retain information about its own previous classifications yields significant performance improvements: between 5 and 25\% when compared to zero-shot and few-shot learning. Moreover, memory reinforcement, a novel approach we propose that combines model memory and reinforcement learning, yields additional performance gains in three out of our four tests. These findings have important implications for applied researchers looking to improve performance and efficiency in LLM annotation tasks.