DIRECTOR: Generator-Classifiers For Supervised Language Modeling
This work addresses harmful or low-quality text generation in language models, a problem that affects both users and developers, and it proposes a novel method rather than an incremental improvement.
The paper tackles the problem of language models generating toxic, repetitive, or contradictory outputs by introducing DIRECTOR, a unified generator-classifier architecture that trains jointly on language modeling data and labeled sequence data. The resulting model trains and decodes at speeds competitive with standard language models, alleviates these issues while maintaining generation quality, and outperforms existing model guiding approaches in both accuracy and efficiency.
Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, DIRECTOR, that consists of a unified generator-classifier with both a language modeling and a classification head for each output token. Training is conducted jointly using both standard language modeling data, and data labeled with desirable and undesirable sequences. Experiments in several settings show that the model has competitive training and decoding speed compared to standard language models while yielding superior results, alleviating known issues while maintaining generation quality. It also outperforms existing model guiding approaches in terms of both accuracy and efficiency.
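
To make the two-head idea concrete, here is a minimal sketch of how a decoder might combine a language modeling head with a per-token classification head at a single decoding step. The abstract does not give the combination rule, so everything here is an assumption for illustration: the function name `combine_heads`, the mixing weight `gamma`, and the rule itself (add the weighted log-probability of "desirable" from the classifier to the language model log-probability, then renormalize).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def combine_heads(lm_logits, cls_logits, gamma=1.0):
    """Combine both heads for one decoding step (hypothetical rule).

    lm_logits:  one logit per vocabulary token from the language modeling head.
    cls_logits: one logit per vocabulary token from the classification head;
                sigmoid(logit) is read as P(desirable | token).
    gamma:      assumed mixing weight; 0 recovers the plain language model.
    Returns renormalized log-probabilities over the vocabulary.
    """
    if len(lm_logits) != len(cls_logits):
        raise ValueError("both heads must score the same vocabulary")
    lm_logp = log_softmax(lm_logits)
    scores = [lp + gamma * log_sigmoid(c) for lp, c in zip(lm_logp, cls_logits)]
    return log_softmax(scores)


# Toy vocabulary of three tokens: the language model prefers token 0,
# but the classifier marks token 0 as undesirable.
lm_logits = [2.0, 1.0, 0.0]
cls_logits = [-5.0, 3.0, 3.0]
guided = combine_heads(lm_logits, cls_logits, gamma=1.0)
best_token = max(range(len(guided)), key=lambda i: guided[i])
```

In this toy case the guided distribution shifts its mass away from token 0 even though the language modeling head alone ranks it first. Because both heads score every vocabulary token in the same forward pass, no separate classifier call per candidate token is needed, which is one plausible reading of the efficiency claim above.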