Noise Sensitivity and Learning Lower Bounds for Hierarchical Functions
This provides theoretical insights into the limitations of learning hierarchical structures, which is incremental as it builds on prior work in noise sensitivity and learning theory.
The paper tackles the problem of learning hierarchical functions by analyzing their noise sensitivity, showing that if functions are far from linear, noise stability decreases exponentially with hierarchy depth. This leads to super-polynomial lower bounds for agnostic learning in statistical query models and sample complexity lower bounds for neural networks.
Recent works explore deep learning's success by examining functions or data with hierarchical structure. To study the learning complexity of functions with hierarchical structure, we study the noise stability of functions with tree hierarchical structure on independent inputs. We show that if each function in the hierarchy is $\varepsilon$-far from linear, the noise stability is exponentially small in the depth of the hierarchy. Our results have immediate applications for agnostic learning. In the Boolean setting using the results of Dachman-Soled, Feldman, Tan, Wan and Wimmer (2014), our results provide Statistical Query super-polynomial lower bounds for agnostically learning classes that are based on hierarchical functions. We also derive similar SQ lower bounds based on the indicators of crossing events in critical site percolation. These crossing events are not formally hierarchical as we define but still have some hierarchical features as studied in mathematical physics. Using the results of Abbe, Bengio, Cornacchiam, Kleinberg, Lotfi, Raghu and Zhang (2022), our results imply sample complexity lower bounds for learning hierarchical functions with gradient descent on fully connected neural networks. Finally in the Gaussian setting, using the results of Diakonikolas, Kane, Pittas and Zarifis (2021), our results provide super-polynomial lower bounds for agnostic SQ learning.