Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens
The work addresses a bottleneck in long-context modeling for LLMs and offers an incremental improvement for applications that process extended text.
The paper tackles the performance degradation of Transformer-based large language models on long-term contexts. It proposes inserting sentinel tokens that summarize the information in each text chunk, and reports experiments on language modeling and downstream tasks in support of the approach.
Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method that enables LLMs to take a deep breath, encouraging them to summarize the information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask to integrate each chunk's information into the corresponding <SR> token. This allows LLMs to interpret information not only from historical individual tokens but also from the <SR> tokens, which aggregate the semantic information of their chunks. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
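The chunking and masking steps described above can be illustrated with a minimal sketch. The abstract does not specify the chunk size or the exact form of the modified attention mask, so the following is one plausible reading rather than the authors' implementation: fixed-size chunks, each <SR> token attends only to positions in its own chunk, and ordinary tokens keep standard causal attention over all earlier positions, including earlier <SR> tokens. The function names `insert_sentinels` and `build_attention_mask` and the parameter `chunk_size` are hypothetical.

```python
from typing import List, Tuple

SR = "<SR>"  # sentinel token named in the abstract


def insert_sentinels(
    tokens: List[str], chunk_size: int
) -> Tuple[List[str], List[int], List[bool]]:
    """Split tokens into fixed-size chunks and append SR to each chunk.

    Fixed-size chunking is an assumption; the source only says the text
    is segmented into multiple chunks.
    Returns the new sequence, a chunk index per position, and a flag per
    position marking whether it holds a sentinel.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    seq: List[str] = []
    chunk_ids: List[int] = []
    is_sr: List[bool] = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        cid = start // chunk_size
        seq.extend(chunk)
        seq.append(SR)
        chunk_ids.extend([cid] * (len(chunk) + 1))
        is_sr.extend([False] * len(chunk) + [True])
    return seq, chunk_ids, is_sr


def build_attention_mask(
    chunk_ids: List[int], is_sr: List[bool]
) -> List[List[bool]]:
    """Build a boolean mask where mask[q][k] is True if q may attend to k.

    Assumed rule (not confirmed by the source):
      - a sentinel query sees only positions in its own chunk,
      - any other query sees every earlier position and itself.
    """
    n = len(chunk_ids)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(q + 1):  # causal: never look at later positions
            if is_sr[q]:
                mask[q][k] = chunk_ids[k] == chunk_ids[q]
            else:
                mask[q][k] = True
    return mask
```

For example, calling `insert_sentinels(list("abcde"), 2)` yields the sequence `a b <SR> c d <SR> e <SR>`. Under the assumed rule, the second <SR> attends only to `c`, `d`, and itself, while `d` attends to everything before it. In a real model the boolean mask would be converted to additive form (a large negative value where the entry is False) before the softmax.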