BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward
The work addresses the need for sequence evaluation metrics that scale independently of the number of references, offering an incremental improvement over existing n-gram-based methods.
The paper tackles the problem of evaluating generated sequences by proposing a scalable reward function based on BERT embeddings rather than n-gram counts. It shows that this reward provides a more effective learning signal than n-gram reward when unconditional generation is cast as a reinforcement learning problem.
Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination. Despite great advances in model architectures, metrics that scale independently of the number of references are still based on n-gram estimates. We show that the underlying operations, counting words and comparing counts, can be lifted to embedding words and comparing embeddings. An in-depth analysis of BERT embeddings shows empirically that contextual embeddings can be employed to capture the required dependencies while maintaining the necessary scalability through appropriate pruning and smoothing techniques. We cast unconditional generation as a reinforcement learning problem and show that our reward function indeed provides a more effective learning signal than n-gram reward in this challenging setting.
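The core idea, lifting "counting words and comparing counts" to "embedding words and comparing embeddings", can be illustrated with a minimal sketch. It assumes hand-made, static 3-D toy vectors in place of BERT's contextual embeddings, and the function names (`ngram_precision`, `soft_precision`) are hypothetical. The first function is a BLEU-style clipped n-gram precision. The second replaces exact matching with the best cosine similarity to any reference token, similar in spirit to greedy embedding matching. This is not the paper's reward function, which uses contextual BERT embeddings together with pruning and smoothing.

```python
from collections import Counter
import math


def ngram_precision(candidate, references, n=1):
    """BLEU-style clipped n-gram precision: count words, compare counts."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Collapse all references into one table of per-n-gram maximum counts.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each candidate count by the largest count seen in any reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def soft_precision(candidate, references, embed):
    """Embed words, compare embeddings: each candidate token is credited
    with its highest cosine similarity to any reference token."""
    # Pool deduplicated reference tokens; with contextual embeddings this
    # pool would grow with the references and would need pruning.
    ref_vecs = [embed[tok] for tok in {tok for ref in references for tok in ref}]
    if not candidate or not ref_vecs:
        return 0.0
    scores = [max(cosine(embed[tok], r) for r in ref_vecs) for tok in candidate]
    return sum(scores) / len(scores)


# Hypothetical toy vectors standing in for BERT embeddings.
EMBED = {
    "the": (1.0, 0.0, 0.0),
    "cat": (0.0, 1.0, 0.0),
    "kitten": (0.0, 0.8, 0.6),
    "sat": (0.0, 0.0, 1.0),
}

refs = [["the", "cat", "sat"]]
cand = ["the", "kitten", "sat"]
print(round(ngram_precision(cand, refs), 3))        # 0.667: "kitten" gets no credit
print(round(soft_precision(cand, refs, EMBED), 3))  # 0.933: "kitten" matches "cat" at 0.8
```

The example shows why the lifted version can give a denser signal: an exact-count metric assigns zero credit to a near-synonym, whereas embedding comparison gives partial credit. In a reinforcement learning setup, a score of this kind would serve as the sequence-level reward.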