Joint Repetition Suppression and Content Moderation of Large Language Models
This work addresses content moderation and output quality issues in LLM applications such as writing assistance, though it appears incremental because it builds on existing unlikelihood training frameworks.
The paper tackles the problems of offensive content replication and repetitive outputs in large language models by applying non-exact repetition suppression and extending unlikelihood training to jointly avoid generating offensive words and phrases, demonstrating exceptional performance in controlling repetition and content quality.
Natural language generation (NLG) is one of the most impactful fields in NLP, and recent years have witnessed its evolution driven by large language models (LLMs). As the key instrument for writing assistance applications, LLMs are generally prone to replicating or extending offensive content provided in the input. In low-resource data regimes, they can also produce repetitive outputs. Usually, offensive content and repetitions are mitigated with post-hoc methods, including n-gram level blocklists, top-k sampling, and nucleus sampling. In this paper, we apply non-exact repetition suppression using token- and sequence-level unlikelihood loss, and further explore the unlikelihood training objective framework to jointly endow the model with the ability to avoid generating offensive words and phrases from the beginning. Finally, with comprehensive experiments, we demonstrate that our proposed methods work exceptionally well in controlling the repetition and content quality of LLM outputs.
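To make the idea of a joint token-level objective concrete, the following is a minimal sketch rather than the paper's actual loss. It assumes the commonly used token-level unlikelihood formulation: at each step the usual likelihood term for the target token is combined with a penalty, -log(1 - p(c)), for every negative candidate c. Here the negative candidates are the tokens already seen in the sequence (exact repetition only) plus a set of blocklisted token ids. The names `token_unlikelihood_loss`, `blocked_ids`, and `alpha` are hypothetical, and neither the non-exact repetition matching nor the sequence-level loss mentioned above is covered.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_unlikelihood_loss(step_logits, targets, blocked_ids=frozenset(),
                            alpha=1.0, eps=1e-12):
    """Average per-step loss: likelihood term + alpha * unlikelihood term.

    step_logits: one list of vocabulary logits per time step.
    targets:     gold token id per time step.
    blocked_ids: token ids to discourage at every step (assumed stand-in
                 for an offensive-word list; hypothetical parameter).
    alpha:       weight of the unlikelihood penalty (hypothetical name).
    """
    total = 0.0
    for t, (logits, target) in enumerate(zip(step_logits, targets)):
        probs = softmax(logits)
        # Negative candidates: previously seen tokens plus blocked tokens,
        # never including the current gold token.
        candidates = (set(targets[:t]) | set(blocked_ids)) - {target}
        # Standard likelihood term for the gold token.
        mle = -math.log(max(probs[target], eps))
        # Penalty grows as probability mass on a negative candidate grows.
        ul = -sum(math.log(max(1.0 - probs[c], eps)) for c in candidates)
        total += mle + alpha * ul
    return total / len(targets)


# Toy usage: vocabulary of 4 tokens, two steps, token 3 is blocklisted.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = token_unlikelihood_loss(uniform, targets=[0, 1], blocked_ids={3})
```

Setting `alpha=0` recovers plain maximum-likelihood training, so the penalty weight controls how strongly repetition and blocked tokens are discouraged relative to fitting the data.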