PAC-learning gains of Turing machines over circuits and neural networks
This work addresses the need for large-scale data in deep learning by exploring universal models, but it is incremental as it focuses on theoretical bounds without empirical validation.
The paper investigates the theoretical sample efficiency gains of using polynomial-time Turing machines over Boolean circuits (representing neural networks) under the minimum description length principle, providing lower and upper bounds that depend on input bit-size and linking these gains to open problems in circuit complexity.
A caveat to many applications of the current Deep Learning approach is the need for large-scale data. One improvement suggested by Kolmogorov Complexity results is to apply the minimum description length principle with computationally universal models. We study the potential gains in sample efficiency that this approach can bring in principle. We use polynomial-time Turing machines to represent computationally universal models and Boolean circuits to represent Artificial Neural Networks (ANNs) acting on finite-precision digits. Our analysis unravels direct links between our question and Computational Complexity results. We provide lower and upper bounds on the potential gains in sample efficiency between the MDL applied with Turing machines instead of ANNs. Our bounds depend on the bit-size of the input of the Boolean function to be learned. Furthermore, we highlight close relationships between classical open problems in Circuit Complexity and the tightness of these.