On Solution Functions of Optimization: Universal Approximation and Covering Number Bounds
This foundational work addresses the problem of theoretical guarantees for optimization-based machine learning models, with implications for algorithmic design, though it appears incremental in extending existing approximation theory to optimization functions.
The paper tackles the problem of understanding the expressibility and learnability of convex optimization solution functions, showing that linear and quadratic programming solution functions can universally approximate smooth model classes and achieve reduced rate-distortion with deep architectures, while providing the first rigorous analysis of their approximation and learning-theoretic properties.
We study the expressibility and learnability of convex optimization solution functions and their multi-layer architectural extension. The main results are: \emph{(1)} the class of solution functions of linear programming (LP) and quadratic programming (QP) is a universal approximant for the $C^k$ smooth model class or some restricted Sobolev space, and we characterize the rate-distortion, \emph{(2)} the approximation power is investigated through a viewpoint of regression error, where information about the target function is provided in terms of data observations, \emph{(3)} compositionality in the form of a deep architecture with optimization as a layer is shown to reconstruct some basic functions used in numerical analysis without error, which implies that \emph{(4)} a substantial reduction in rate-distortion can be achieved with a universal network architecture, and \emph{(5)} we discuss the statistical bounds of empirical covering numbers for LP/QP, as well as a generic optimization problem (possibly nonconvex) by exploiting tame geometry. Our results provide the \emph{first rigorous analysis of the approximation and learning-theoretic properties of solution functions} with implications for algorithmic design and performance guarantees.