CL · LG · Jul 30, 2019

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

arXiv:1907.12894v1 · 27 citations
Originality: Highly original
AI Analysis

The paper addresses the high computational cost of reinforcement-learning-based summarisation, offering NLP researchers a more efficient alternative to existing methods.

The paper tackles the inefficiency of reinforcement learning in extractive document summarisation by proposing RELIS, a novel paradigm that learns input-specific policies using learned rewards, reducing training time by two orders of magnitude while matching state-of-the-art performance.

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
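
The two phases described in the abstract (learn a reward at training time, then train a policy per input at test time) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a pairwise ranking perceptron stands in for the L2R algorithm, an independent-Bernoulli REINFORCE policy stands in for the RL algorithm, and the two features (coverage, redundancy), the toy data and every function name are hypothetical. A summary length budget is omitted for brevity.

```python
import math
import random
from itertools import combinations

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def make_pairs(scored):
    """Turn (features, quality) items into (better, worse) feature pairs."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [(a[0], b[0]) for a, b in combinations(ranked, 2) if a[1] > b[1]]

def learn_reward(pairs, dim, epochs=20, lr=0.1):
    """Training time: pairwise ranking perceptron (a stand-in for L2R)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0.0:  # pair is misordered, nudge w towards diff
                w = [wi + lr * di for wi, di in zip(w, diff)]
    norm = math.sqrt(dot(w, w)) or 1.0
    return [wi / norm for wi in w]  # unit length keeps the reward scale stable

def train_input_specific_policy(sentences, w, episodes=4000, lr=0.05, seed=0):
    """Test time: REINFORCE on one input, rewarded only by the learned w."""
    rng = random.Random(seed)
    theta = [0.0] * len(sentences)  # one inclusion logit per sentence
    baseline = 0.0                  # running mean reward, reduces variance
    for _ in range(episodes):
        probs = [1.0 / (1.0 + math.exp(-t)) for t in theta]
        actions = [1 if rng.random() < p else 0 for p in probs]
        # Summary features are the sum of the chosen sentences' features.
        feats = [sum(a * s[d] for a, s in zip(actions, sentences))
                 for d in range(len(w))]
        reward = dot(w, feats)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log Bernoulli(a; sigmoid(theta)) w.r.t. theta is (a - p).
        theta = [t + lr * advantage * (a - p)
                 for t, a, p in zip(theta, actions, probs)]
    return [i for i, t in enumerate(theta) if t > 0.0]

# Hypothetical training summaries: ([coverage, redundancy], reference quality).
TRAIN = [([3.0, 0.0], 3.0), ([2.0, 0.0], 2.0), ([2.0, 1.0], 1.0),
         ([1.0, 2.0], -1.0), ([0.0, 3.0], -3.0)]

# Hypothetical unseen input: one [coverage, redundancy] vector per sentence.
SENTENCES = [[1.0, 0.0], [0.2, 1.0], [0.9, 0.1], [0.1, 0.8]]

if __name__ == "__main__":
    w = learn_reward(make_pairs(TRAIN), dim=2)
    print("learned reward weights:", w)
    print("selected sentences:", train_input_specific_policy(SENTENCES, w))
```

The point of the split is that no reference summary is needed at test time: the policy for the new input is trained against the learned reward alone, so the expensive supervision is spent once, on the reward function.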

Code Implementations: 1 repo