Neural Conditional Gradients
This work addresses the computational cost and possible non-differentiability of projection steps in constrained optimization for machine learning, offering a learning-based, projection-free approach.
The paper tackles the challenge of constrained optimization by learning projection-free algorithms through Frank-Wolfe Networks, which outperform hand-designed and unconstrained learned optimizers in tasks like training support vector machines and softmax classifiers.
The move from hand-designed to learned optimizers in machine learning has been quite successful for both gradient-based and gradient-free optimizers. When facing a constrained problem, however, maintaining feasibility typically requires a projection step, which may be computationally expensive and non-differentiable. We show how the design of projection-free convex optimization algorithms can be cast as a learning problem based on Frank-Wolfe Networks: recurrent networks implementing the Frank-Wolfe algorithm, also known as conditional gradients. This allows them to learn to exploit structure, e.g., when optimizing over rank-1 matrices. Our LSTM-learned optimizers outperform both hand-designed optimizers and learned but unconstrained ones. We demonstrate this for training support vector machines and softmax classifiers.
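To make "projection-free" concrete, here is a minimal sketch of the classical, hand-designed Frank-Wolfe (conditional gradient) method over the probability simplex. It is not the authors' Frank-Wolfe Network: the LSTM architecture is not reproduced, and the simplex constraint, the quadratic objective, and the names `lmo_simplex`, `default_step`, and `frank_wolfe` are illustrative assumptions. Each step calls a linear minimization oracle (LMO) instead of a projection, and the new iterate is a convex combination of feasible points, so it stays feasible automatically. The `step_rule` argument is a hypothetical hook marking where a learned component could replace the fixed schedule 2/(t+2).

```python
from typing import Callable, List

Vector = List[float]


def lmo_simplex(grad: Vector) -> Vector:
    """Linear minimization oracle for the probability simplex.

    argmin_{s in simplex} <grad, s> is attained at a vertex e_i,
    where i indexes the smallest gradient entry.
    """
    i = min(range(len(grad)), key=lambda k: grad[k])
    s = [0.0] * len(grad)
    s[i] = 1.0
    return s


def default_step(t: int) -> float:
    # Classical hand-designed open-loop step size gamma_t = 2 / (t + 2).
    return 2.0 / (t + 2.0)


def frank_wolfe(
    grad_f: Callable[[Vector], Vector],
    x0: Vector,
    num_steps: int,
    step_rule: Callable[[int], float] = default_step,
) -> Vector:
    """Projection-free Frank-Wolfe iterations over the simplex.

    x0 must be feasible. Each update mixes the current iterate with an
    LMO vertex using a weight in [0, 1], so no projection is needed.
    step_rule is a hypothetical hook where a learned rule could plug in.
    """
    x = list(x0)
    for t in range(num_steps):
        g = grad_f(x)
        s = lmo_simplex(g)
        gamma = step_rule(t)
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


if __name__ == "__main__":
    # Minimize 0.5 * ||x - c||^2 over the simplex; c is itself feasible.
    c = [0.2, 0.5, 0.3]
    grad = lambda x: [xi - ci for xi, ci in zip(x, c)]
    x = frank_wolfe(grad, [1.0, 0.0, 0.0], num_steps=2000)
    print([round(v, 3) for v in x])
```

The design choice worth noting is that feasibility comes from the LMO returning a vertex of the constraint set, not from a projection. For other sets only the oracle changes; for instance, over a nuclear-norm ball the oracle returns a rank-1 matrix, which is the kind of structure the abstract refers to.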