LGDec 17, 2024

On the Hardness of Training Deep Neural Networks Discretely

arXiv:2412.13057v13 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses the theoretical complexity of neural network training for researchers in computational theory and machine learning, providing foundational insights into the hardness of discrete and deep network optimization.

The paper tackles the computational hardness of training deep neural networks with discrete parameters (D-NNT), showing that D-NNT is not in NP for deep architectures under standard complexity assumptions, and extends this to continuous NNT with structured instances, while also providing NP-hardness lower bounds for two-layer networks and a pseudo-polynomial algorithm for fixed dataset sizes.

We study neural network training (NNT): optimizing a neural network's parameters to minimize the training loss over a given dataset. NNT has been studied extensively under theoretic lenses, mainly on two-layer networks with linear or ReLU activation functions where the parameters can take any real value (here referred to as continuous NNT (C-NNT)). However, less is known about deeper neural networks, which exhibit substantially stronger capabilities in practice. In addition, the complexity of the discrete variant of the problem (D-NNT in short), in which the parameters are taken from a given finite set of options, has remained less explored despite its theoretical and practical significance. In this work, we show that the hardness of NNT is dramatically affected by the network depth. Specifically, we show that, under standard complexity assumptions, D-NNT is not in the complexity class NP even for instances with fixed dimensions and dataset size, having a deep architecture. This separates D-NNT from any NP-complete problem. Furthermore, using a polynomial reduction we show that the above result also holds for C-NNT, albeit with more structured instances. We complement these results with a comprehensive list of NP-hardness lower bounds for D-NNT on two-layer networks, showing that fixing the number of dimensions, the dataset size, or the number of neurons in the hidden layer leaves the problem challenging. Finally, we obtain a pseudo-polynomial algorithm for D-NNT on a two-layer network with a fixed dataset size.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes