LG OC MLOct 7, 2018

Principled Deep Neural Network Training through Linear Programming

Daniel Bienstock, Gonzalo Muñoz, Sebastian Pokutta

arXiv:1810.03218v39.826 citations

Originality Highly original

AI Analysis

This work provides a foundational theoretical framework for deep learning training, offering new insights into its structure and complexity, which is incremental in advancing theoretical understanding.

The authors tackled the theoretical understanding of deep neural network training by showing that all possible training problems for a given architecture can be encoded into a polyhedron with linear size dependency on sample-size, and used this to derive improved computational complexity results.

Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident in recent years. In this work, using a unified framework, we show that there exists a polyhedron which encodes simultaneously all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample-size. Notably, the size of the polyhedral representation depends only linearly on the sample-size, and a better dependency on several other network parameters is unlikely (assuming $P\neq NP$). Additionally, we use our polyhedral representation to obtain new and better computational complexity results for training problems of well-known neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal a strong structure arising from these problems.

View on arXiv PDF

Similar