COCCDMLGOCNov 5, 2024

Neural Networks and (Virtual) Extended Formulations

arXiv:2411.03006v38 citationsh-index: 9
Originality Incremental advance
AI Analysis

This work addresses foundational theoretical limitations in neural network design for combinatorial optimization, though it is incremental in extending known concepts from polyhedral geometry.

The paper tackles the problem of proving lower bounds on the size of neural networks with piecewise linear activations by linking their representational capabilities to extension complexity, showing that this quantity provides exponential lower bounds for certain problems like maximum weight matching. It introduces virtual extension complexity as a generalization to bound general neural networks and demonstrates efficient optimization using small virtual extended formulations.

Neural networks with piecewise linear activation functions, such as rectified linear units (ReLU) or maxout, are among the most fundamental models in modern machine learning. We make a step towards proving lower bounds on the size of such neural networks by linking their representative capabilities to the notion of the extension complexity $\mathrm{xc}(P)$ of a polytope $P$. This is a well-studied quantity in combinatorial optimization and polyhedral geometry describing the number of inequalities needed to model $P$ as a linear program. We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$. This implies exponential lower bounds on such neural networks for a variety of problems, including the polynomially solvable maximum weight matching problem. In an attempt to prove similar bounds also for general neural networks, we introduce the notion of virtual extension complexity $\mathrm{vxc}(P)$, which generalizes $\mathrm{xc}(P)$ and describes the number of inequalities needed to represent the linear optimization problem over $P$ as a difference of two linear programs. We prove that $\mathrm{vxc}(P)$ is a lower bound on the size of any neural network that optimizes over $P$. While it remains an open question to derive useful lower bounds on $\mathrm{vxc}(P)$, we argue that this quantity deserves to be studied independently from neural networks by proving that one can efficiently optimize over a polytope $P$ using a small virtual extended formulation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes