The informativeness of the gradient revisited
For researchers in machine learning theory, this work addresses a theoretical bottleneck in understanding gradient-based optimization, though it is incremental in that it builds on existing variance-based frameworks.
The paper addresses the limited informativeness of gradients in deep learning by deriving a general bound on the gradient variance in terms of the pairwise independence of the target function class and the collision entropy of the input distribution; the bound scales as $ \tilde{\mathcal{O}}(\varepsilon+e^{-\frac{1}{2}\mathcal{E}_c}) $. It applies this bound to Learning with Errors (LWE) mappings and high-frequency functions, and reports experiments analyzing deep learning-based attacks on LWE.
In the past decade, gradient-based deep learning has revolutionized several applications. However, this rapid advancement has highlighted the need for a deeper theoretical understanding of its limitations. Research has shown that, in many practical learning tasks, the gradient carries so little information that gradient-based methods require an exceedingly large number of iterations to succeed. The informativeness of the gradient is typically measured by its variance with respect to the random selection of a target function from a hypothesis class. We use this framework and give a general bound on the variance in terms of a parameter related to the pairwise independence of the target function class and the collision entropy of the input distribution. Our bound scales as $ \tilde{\mathcal{O}}(\varepsilon+e^{-\frac{1}{2}\mathcal{E}_c}) $, where $ \tilde{\mathcal{O}} $ hides factors related to the regularity of the learning model and the loss function, $ \varepsilon $ measures the pairwise independence of the target function class, and $\mathcal{E}_c$ is the collision entropy of the input distribution. To demonstrate the practical utility of our bound, we apply it to the class of Learning with Errors (LWE) mappings and to high-frequency functions. In addition to the theoretical analysis, we present experiments to better understand the nature of recent deep learning-based attacks on LWE.
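To give a feel for the entropy term in the bound, the following minimal sketch computes $e^{-\frac{1}{2}\mathcal{E}_c}$ for a discrete input distribution. It assumes the standard definition of collision entropy (Rényi entropy of order 2) with the natural logarithm, $\mathcal{E}_c = -\ln \sum_x p(x)^2$; the paper's exact normalization may differ. The function names `collision_entropy` and `entropy_term` and the two example distributions are hypothetical, chosen only for illustration.

```python
import math


def collision_entropy(probs):
    """Collision (Renyi order-2) entropy in nats: -ln(sum_i p_i^2).

    Assumes the standard definition; not taken from the paper itself.
    """
    total = sum(probs)
    if not math.isclose(total, 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -math.log(sum(p * p for p in probs))


def entropy_term(probs):
    """The e^{-E_c/2} term of the bound; equals sqrt(sum_i p_i^2)."""
    return math.exp(-0.5 * collision_entropy(probs))


# Hypothetical input distributions over 1024 points.
uniform = [1.0 / 1024] * 1024             # maximal collision entropy
skewed = [0.9] + [0.1 / 1023] * 1023      # mass concentrated on one point

print("uniform:", entropy_term(uniform))
print("skewed: ", entropy_term(skewed))
```

Under this definition, a uniform distribution over $N$ points gives $e^{-\frac{1}{2}\mathcal{E}_c} = 1/\sqrt{N}$, so the term shrinks as the input distribution spreads out, whereas a distribution concentrated on a few points keeps the term close to 1 and the bound is then driven by that term rather than by $\varepsilon$.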