LG · CL · ML · Nov 22, 2019

Optimizing Data Usage via Differentiable Rewards

arXiv:1911.10088v3 · 73 citations
Originality: Incremental advance
AI Analysis

This paper addresses adaptive data selection for machine learning models, offering a method that could speed up training and improve performance. It appears incremental because it builds on existing reinforcement learning and gradient-based techniques.

The paper tackles the problem of efficiently selecting important training instances during model training. It proposes Differentiable Data Selection (DDS), in which a scorer network is updated with reinforcement learning, using as its reward the similarity between a training instance's gradient and the gradient on a dev set. The authors report strong improvements over baselines on machine translation and image classification.

To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems. Similarly, a machine learning model could potentially be trained better with a scorer that "adapts" to its current learning state and estimates the importance of each training data instance. Training such an adaptive scorer efficiently is a challenging problem; in order to precisely quantify the effect of a data instance at a given time during the training, it is typically necessary to first complete the entire training process. To efficiently optimize data usage, we propose a reinforcement learning approach called Differentiable Data Selection (DDS). In DDS, we formulate a scorer network as a learnable function of the training data, which can be efficiently updated along with the main model being trained. Specifically, DDS updates the scorer with an intuitive reward signal: it should up-weigh the data that has a similar gradient with a dev set upon which we would finally like to perform well. Without significant computing overhead, DDS delivers strong and consistent improvements over several strong baselines on two very different tasks of machine translation and image classification.
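The reward described in the abstract (up-weight data whose gradient is similar to the dev-set gradient) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear regression model, replaces the scorer network with a hypothetical table of one logit per training example, and uses a plain REINFORCE-style update with the dot product of gradients as reward. The names `dds_toy` and `per_example_grads` and all hyperparameters are made up for illustration.

```python
import numpy as np

def per_example_grads(theta, X, y):
    """Gradient of 0.5 * (x @ theta - y)**2 w.r.t. theta, one row per example."""
    residual = X @ theta - y
    return residual[:, None] * X

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def dds_toy(X_train, y_train, X_dev, y_dev,
            steps=300, lr_model=0.05, lr_scorer=0.1, seed=0):
    """Toy data-selection loop (assumed setup, not the paper's code).

    Scorer: one logit per training example (stand-in for a scorer network).
    Reward: dot product between the sampled example's gradient and the
    mean dev-set gradient evaluated after the model update.
    """
    rng = np.random.default_rng(seed)
    n, d = X_train.shape
    theta = np.zeros(d)   # main model parameters
    logits = np.zeros(n)  # scorer parameters

    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choice(n, p=probs)  # sample a training example from the scorer

        # Gradient of the sampled example at the current model, then an SGD step.
        g_train = per_example_grads(theta, X_train[i:i + 1], y_train[i:i + 1])[0]
        theta = theta - lr_model * g_train

        # Mean dev-set gradient at the updated model.
        g_dev = per_example_grads(theta, X_dev, y_dev).mean(axis=0)

        # Reward is high when the training gradient points the same way
        # as the dev gradient.
        reward = float(g_train @ g_dev)

        # REINFORCE: d log softmax(logits)[i] / d logits = onehot(i) - probs.
        grad_log_p = -probs
        grad_log_p[i] += 1.0
        logits = logits + lr_scorer * reward * grad_log_p

    return theta, softmax(logits)
```

With a training set that mixes correctly labeled examples and examples whose labels disagree with the dev set, one would expect the returned probabilities to shift toward the examples that agree with the dev set, since their gradients tend to align with the dev gradient.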

Code Implementations: 1 repo
