Classifiers are Better Experts for Controllable Text Generation
This paper addresses the problem of generating text with specific attributes (e.g., non-toxic content or a desired sentiment) in NLP applications, offering a simpler and more effective approach than existing methods.
The paper tackles controllable text generation by proposing CAIF sampling, a method that uses a free-form classifier to weight language model logits. It shows that CAIF sampling significantly outperforms the recent methods PPLM, GeDi, and DExperts on toxicity avoidance and sentiment control tasks, with improvements in PPL and task accuracy metrics.
This paper proposes a simple method for controllable text generation based on weighting logits with a free-form classifier, namely CAIF sampling. Using an arbitrary text classifier, we adjust a small part of a language model's logits and guide text generation towards or away from the classifier's prediction. We experimented with toxicity avoidance and sentiment control tasks and showed that the proposed method significantly outperforms the recent PPLM, GeDi, and DExperts methods on PPL and task accuracy metrics computed with an external classifier on the generated texts. In addition, compared to other approaches, it is easier to implement and tune and has significantly fewer restrictions and requirements.
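The idea of weighting a small part of the logits with a classifier can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the details here are assumptions for illustration only: the "small part" is taken to be the top-k candidate tokens, and each candidate's logit is shifted by `alpha * log p(attribute | prefix + token)`. The names `caif_reweight`, `toy_positive_prob`, `top_k`, and `alpha` are hypothetical, and the toy classifier stands in for an arbitrary text classifier.

```python
import math


def caif_reweight(logits, classifier_prob, prefix, top_k=3, alpha=1.0):
    """Shift the top_k logits by alpha * log p(attribute | prefix + token).

    logits: dict mapping candidate token -> language-model logit.
    classifier_prob: callable(text) -> probability in (0, 1] of the attribute.
    alpha > 0 steers towards the attribute, alpha < 0 steers away from it.
    (Assumed formulation; the abstract does not specify it.)
    """
    # Only a small part of the vocabulary is re-scored: the top_k candidates.
    top = sorted(logits, key=logits.get, reverse=True)[:top_k]
    adjusted = {}
    for token in top:
        p = classifier_prob(prefix + token)
        adjusted[token] = logits[token] + alpha * math.log(p)
    return adjusted


def softmax(scores):
    """Turn adjusted logits into a sampling distribution."""
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def toy_positive_prob(text):
    """Hypothetical stand-in for a sentiment classifier."""
    return 0.9 if text.endswith("great") else 0.1


# Made-up next-token logits for the prefix below.
logits = {" great": 1.0, " awful": 1.5, " okay": 0.2, " the": -2.0}
prefix = "The movie was"

towards = caif_reweight(logits, toy_positive_prob, prefix, alpha=1.0)
away = caif_reweight(logits, toy_positive_prob, prefix, alpha=-1.0)
probs = softmax(towards)
```

In this toy setup a positive `alpha` makes " great" the most likely continuation, while a negative `alpha` favours " awful". In practice the classifier would be called once per candidate token at every generation step, which is why restricting the re-scoring to a few candidates matters for cost.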