MLLGFeb 11, 2018

Differentiable Dynamic Programming for Structured Prediction and Attention

arXiv:1802.03676v2153 citations
AI Analysis

This addresses a bottleneck for researchers and practitioners in machine learning who need differentiable structured prediction layers, though it is an incremental advance on existing smoothing techniques.

The paper tackles the non-differentiability of dynamic programming algorithms, which limits their integration into neural networks, by smoothing the max operator with a strongly convex regularizer to make them differentiable. This approach is applied to tasks like sequence prediction and neural machine translation, achieving improvements such as a 0.5 BLEU score gain in translation.

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes