LGOCMLOct 7, 2018

Principled Deep Neural Network Training through Linear Programming

arXiv:1810.03218v326 citations
Originality Highly original
AI Analysis

This work provides a foundational theoretical framework for deep learning training, offering new insights into its structure and complexity, which is incremental in advancing theoretical understanding.

The authors tackled the theoretical understanding of deep neural network training by showing that all possible training problems for a given architecture can be encoded into a polyhedron with linear size dependency on sample-size, and used this to derive improved computational complexity results.

Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident in recent years. In this work, using a unified framework, we show that there exists a polyhedron which encodes simultaneously all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample-size. Notably, the size of the polyhedral representation depends only linearly on the sample-size, and a better dependency on several other network parameters is unlikely (assuming $P\neq NP$). Additionally, we use our polyhedral representation to obtain new and better computational complexity results for training problems of well-known neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal a strong structure arising from these problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes