LG · AI · CL · Oct 17, 2022

Teacher Forcing Recovers Reward Functions for Text Generation

arXiv:2210.08708v2 · 13.0 · 21 citations · h-index: 35 · Has Code

Originality: Highly original

AI Analysis

This work addresses the sparse, task-specific rewards that limit reinforcement learning (RL) for text generation. It offers a task-agnostic alternative that researchers and practitioners in natural language processing can apply across tasks.

The paper tackles the problem of designing effective reward functions for RL in text generation. It proposes a task-agnostic approach that derives step-wise rewards from a model trained with teacher forcing, and reports that this reward outperforms self-training and reward regression baselines on several tasks.

Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions are typically task-specific and sparse, restricting the use of RL. In our work, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with teacher forcing. We additionally propose a simple modification to stabilize the RL training on non-parallel datasets with our induced reward function. Empirical results show that our method outperforms self-training and reward regression methods on several text generation tasks, confirming the effectiveness of our reward function.
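The abstract does not give the reward formula, so the following is only a minimal sketch under an assumption of our own: that the teacher-forced model's per-step logits can be read as soft Q-values. Under that reading, a step-wise reward follows from the soft Bellman relation r_t = Q(s_t, a_t) - γ·V(s_{t+1}), with V(s) = log Σ_a exp Q(s, a). The function names (`induced_step_rewards`, `rewards_to_go`) and the choice to treat the final step as terminal (V = 0) are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def induced_step_rewards(
    logits: List[List[float]], tokens: List[int], gamma: float = 1.0
) -> List[float]:
    """Hypothetical reward induction from a teacher-forced model.

    Assumption: logits[t][a] plays the role of a soft Q-value Q(s_t, a).
    Reward at step t: Q(s_t, a_t) - gamma * V(s_{t+1}), where
    V(s) = logsumexp_a Q(s, a). The last step is treated as terminal (V = 0).
    """
    if len(logits) != len(tokens):
        raise ValueError("need one logit vector per generated token")
    rewards = []
    for t, token in enumerate(tokens):
        q_taken = logits[t][token]
        v_next = logsumexp(logits[t + 1]) if t + 1 < len(tokens) else 0.0
        rewards.append(q_taken - gamma * v_next)
    return rewards


def rewards_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return from each step onward, as used by policy-gradient RL."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Toy 3-token sequence over a 3-word vocabulary (made-up numbers).
    logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [1.0, 1.0, 1.0]]
    tokens = [0, 1, 2]
    r = induced_step_rewards(logits, tokens)
    print("step rewards:", r)
    print("rewards-to-go:", rewards_to_go(r))
```

With γ = 1 these rewards telescope: the return from step 0 equals the first chosen logit plus the model's log-probabilities for every later token. This shows why a dense reward of this form stays consistent with the teacher-forced model in this sketch.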

