CL · Jun 8, 2021

RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation

Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi

arXiv:2106.04080v1 · 31.57 · 11 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of training summarization models with objectives closer to their evaluation measures. It is an incremental advance, as it builds on existing reinforcement learning approaches.

The paper tackles the problem of improving abstractive summarization by proposing two new reward functions for reinforcement learning, which consistently outperform negative log-likelihood baselines across nine diverse datasets.

To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function used within the reinforcement learning approach can play a key role in performance and is still partially unexplored. For this reason, in this paper, we propose two reward functions for the task of abstractive summarisation. The first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update. The second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL-pretrained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.
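The abstract describes the two reward functions only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' formulation. It assumes that RwB-Hinge is a REINFORCE-with-baseline loss that keeps a sampled summary only when its reward beats a baseline reward, and that RISK is an expected negative reward over a small candidate pool with probabilities renormalised over that pool. The function names `rwb_hinge_loss` and `risk_loss` and all parameters are hypothetical; in practice the rewards would come from a metric such as ROUGE.

```python
import math


def rwb_hinge_loss(sample_logprob, sample_reward, baseline_reward):
    """Hinge-gated REINFORCE-with-baseline loss for one sampled summary.

    Assumption: the sample contributes to the gradient update only when
    its reward exceeds the baseline reward; otherwise it is dropped.
    """
    advantage = sample_reward - baseline_reward
    if advantage <= 0:
        return 0.0  # sample is excluded from the update
    # Minimising this raises the log-probability of above-baseline samples.
    return -advantage * sample_logprob


def risk_loss(candidate_logprobs, candidate_rewards):
    """Expected negative reward over a small pool of candidate summaries.

    Assumption: sequence probabilities are renormalised over the pool
    (a softmax over the candidates' log-probabilities).
    """
    top = max(candidate_logprobs)  # subtract the max for numerical stability
    weights = [math.exp(lp - top) for lp in candidate_logprobs]
    total = sum(weights)
    return sum((w / total) * -r for w, r in zip(weights, candidate_rewards))
```

Under these assumptions, a sample scoring below the baseline yields a zero loss and therefore no gradient, which is one way to read "dynamically selects the samples for the gradient update".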
