Recovering the Lowest Layer of Deep Networks with High Threshold Activations
This work addresses a core theoretical challenge in machine learning by extending parameter recovery guarantees to deeper networks. The contribution is incremental in that it builds on prior work for one-hidden-layer networks.
The paper tackles the problem of provably recovering the lowest layer of deep neural networks, showing that under specific assumptions (high threshold activations, polynomial upper layers, Gaussian inputs), such recovery is possible.
Giving provable guarantees for learning neural networks is a core challenge of machine learning theory. Most prior work gives parameter recovery guarantees for networks with one hidden layer; however, the networks used in practice have multiple non-linear layers. In this work, we show how such results can be strengthened to deeper networks. We address the problem of recovering the lowest layer of a deep neural network under three assumptions: the lowest layer applies a high threshold before the activation, the upper network can be modeled as a well-behaved polynomial, and the input distribution is Gaussian.
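
To make the three assumptions concrete, the following is a minimal NumPy sketch of the data-generating model they describe. It is an illustration only, not the paper's construction or its recovery algorithm. The indicator activation, the orthonormal weight rows, the threshold value t = 2.0, the function names, and the particular polynomial are all assumptions chosen for the example.

```python
import numpy as np

def sample_network_outputs(W, t, poly, n_samples, rng):
    """Draw Gaussian inputs and evaluate poly(threshold(W x)).

    W    : (k, d) lowest-layer weights (hypothetical; rows assumed unit norm)
    t    : threshold applied before the activation (assumed value)
    poly : callable standing in for the polynomial upper network
    """
    d = W.shape[1]
    X = rng.standard_normal((n_samples, d))   # Gaussian input distribution
    # Assumed activation: an indicator that fires only when w_i . x >= t.
    H = (X @ W.T >= t).astype(float)
    y = poly(H)                                # upper layers modeled as a polynomial
    return X, H, y

def example_poly(H):
    # Hypothetical upper network: a linear term plus one pairwise product.
    return H.sum(axis=1) + 0.5 * H[:, 0] * H[:, 1]

rng = np.random.default_rng(0)
d, k, t = 10, 3, 2.0
# k orthonormal rows, taken from the Q factor of a random Gaussian matrix.
W = np.linalg.qr(rng.standard_normal((d, k)))[0].T
X, H, y = sample_network_outputs(W, t, example_poly, 20000, rng)

# With a high threshold each unit is active on only a small fraction of inputs.
activation_rate = H.mean(axis=0)
```

In this sketch, each lowest-layer unit fires rarely because a standard Gaussian projection exceeds t = 2.0 on only a small fraction of samples, so the output is typically driven by at most one active unit at a time. This sparsity is the kind of structure a high threshold provides; how the paper exploits it is not shown here.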