LGCCDSNEMLMay 18, 2021

The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality

arXiv:2105.08675v332 citations
Originality Incremental advance
AI Analysis

This work addresses the theoretical limits of neural network training efficiency for researchers in computational complexity and machine learning, but it is incremental as it builds on and extends existing literature.

The paper tackles the computational complexity of training two-layer ReLU networks, focusing on how data dimensionality affects it, and proves W[1]-hardness lower bounds and optimality of brute-force strategies under the Exponential Time Hypothesis, extending results to a broader range of loss functions including ℓ^p-loss.

Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension $d$ of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter $d$ and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including $\ell^p$-loss for all $p\in[0,\infty]$. In particular, we extend a known polynomial-time algorithm for constant $d$ and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes