LG, ML · Oct 18, 2019

Differentiable Combinatorial Losses through Generalized Gradients of Linear Programs

arXiv:1910.08211v4 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses the problem of non-differentiable combinatorial losses in machine learning, which matters to both researchers and practitioners, though it builds incrementally on existing differentiable optimization techniques.

The paper tackles the mismatch between training objectives and inference goals in structured prediction problems by enabling gradient descent over combinatorial optimization algorithms expressed as linear programs. The authors demonstrate that this approach improves sequence-to-sequence modeling and weakly supervised image classification, achieving competitive performance on standard benchmarks.

When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated sentences, but training typically uses maximum likelihood at the word level. The natural training-time loss would involve a combinatorial problem (dynamic programming-based global sequence alignment), but solutions to combinatorial problems are not differentiable with respect to their input parameters, so surrogate, differentiable losses are used instead. Here, we show how to perform gradient descent over combinatorial optimization algorithms that involve continuous parameters, for example edge weights, and can be efficiently expressed as linear programs. We demonstrate the usefulness of gradient descent over combinatorial optimization in sequence-to-sequence modeling using a differentiable encoder-decoder architecture with softmax or Gumbel-softmax, and in image classification in a weakly supervised setting where, instead of the correct class for each photo, only groups of photos labeled with a correct but unordered set of classes are available during training.
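
To make the central idea concrete, here is a minimal sketch. It is not the authors' implementation. It relies on a standard fact about linear programs: for min_x c^T x over a fixed feasible set, the optimal value is concave and piecewise linear in c, and any optimal solution x* is a generalized gradient of that value with respect to c. The assignment problem, the 3x3 cost matrix, and the helper name `assignment_lp` are illustrative assumptions. SciPy's `linprog` stands in for whatever solver is used in practice.

```python
import numpy as np
from scipy.optimize import linprog


def assignment_lp(cost):
    """Solve the assignment problem as an LP (hypothetical helper).

    Returns the optimal value and the optimal solution x*, which serves as a
    generalized gradient of the optimal value with respect to the costs.
    """
    n = cost.shape[0]
    # Equality constraints on the flattened n*n variable vector:
    # every row of x sums to 1 and every column of x sums to 1.
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # row i
        A_eq[n + i, i::n] = 1.0           # column i
    b_eq = np.ones(2 * n)
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, n)


# Illustrative costs (an assumption); in a model these would be network outputs.
cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])

value, grad = assignment_lp(cost)

# Finite-difference check: perturb each cost entry and compare the change in
# the optimal value with the corresponding entry of x*.
eps = 1e-3
fd = np.zeros_like(cost)
for i in range(cost.shape[0]):
    for j in range(cost.shape[1]):
        bumped = cost.copy()
        bumped[i, j] += eps
        fd[i, j] = (assignment_lp(bumped)[0] - value) / eps

print("optimal value:", value)
print("generalized gradient (x*):\n", grad.round(3))
print("max |finite difference - x*|:", np.abs(fd - grad).max())
```

The finite-difference check matches x* here because the optimal assignment for these costs is unique. Where several solutions tie, the optimal value has a kink and x* is only one element of the generalized gradient. In a training loop, this gradient would be chained with the gradient of the costs with respect to the model parameters.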
