LG CC ML Nov 18, 2023

Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes

arXiv:2311.10972v15.34 citations h-index: 25

Originality Highly original

AI Analysis

This work addresses the computational hardness of neural network training, offering both theoretical limits and practical algorithms of interest to machine learning practitioners, though it builds incrementally on existing convex formulations and Max-Cut analogies.

The paper tackles the complexity of training two-layer ReLU neural networks with weight decay regularization. It proves that approximating the global optimum to within a relative error of about 0.006 or less is NP-hard, and it provides polynomial-time approximation algorithms with guarantees (for example, a relative error of about 0.253 when samples of different classes are negatively correlated).

We investigate the complexity of training a two-layer ReLU neural network with weight decay regularization. Previous research has shown that the optimal solution of this problem can be found by solving a standard cone-constrained convex program. Using this convex formulation, we prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it. In particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value. Moreover, we develop a randomized algorithm which mirrors the Goemans-Williamson rounding of semidefinite Max-Cut relaxations. To provide polynomial-time approximations, we classify training datasets into three categories: (i) For orthogonal separable datasets, a precise solution can be obtained in polynomial time. (ii) When there is a negative correlation between samples of different classes, we give a polynomial-time approximation with relative error $\sqrt{\pi/2}-1\approx 0.253$. (iii) For general datasets, the degree to which the problem can be approximated in polynomial time is governed by a geometric factor that controls the diameter of two zonotopes intrinsic to the dataset. To our knowledge, these results present the first polynomial-time approximation guarantees along with the first hardness of approximation results for regularized ReLU networks.
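
As a rough illustration only, the sketch below evaluates the two constants quoted in the abstract and shows the classical Goemans-Williamson hyperplane rounding step for Max-Cut, which the abstract says the randomized algorithm mirrors. This is not the paper's algorithm for ReLU networks. The names `hyperplane_round` and `cut_value`, the 4-cycle graph, and the hand-picked unit vectors (standing in for a solved semidefinite relaxation) are all assumptions made for the example.

```python
import math
import random

# Constants quoted in the abstract.
HARDNESS_EPS = math.sqrt(84 / 83) - 1    # about 0.006 (hardness threshold)
APPROX_EPS = math.sqrt(math.pi / 2) - 1  # about 0.253 (case (ii) guarantee)


def hyperplane_round(vectors, rng):
    """Classical Goemans-Williamson rounding (hypothetical helper).

    Draws a Gaussian direction g and labels each vector by the sign
    of its inner product with g.
    """
    dim = len(vectors[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [
        1 if sum(gi * vi for gi, vi in zip(g, v)) >= 0 else -1
        for v in vectors
    ]


def cut_value(edges, signs):
    """Count edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for i, j in edges if signs[i] != signs[j])


# Toy instance (an assumption for illustration): a 4-cycle, which is bipartite.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Hand-picked unit vectors playing the role of an SDP solution:
# neighbouring vertices are antipodal, so any hyperplane separates them.
vectors = [(1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]

signs = hyperplane_round(vectors, random.Random(0))
cut = cut_value(edges, signs)  # every edge of the 4-cycle is cut
```

In a full Max-Cut pipeline the vectors would come from solving the semidefinite relaxation rather than being chosen by hand. The toy instance is picked so that the rounding outcome does not depend on the random direction.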
