Testing Stationarity Concepts for ReLU Networks: Hardness, Regularity, and Robust Algorithms
This addresses the challenge of verifying optimization convergence in nonsmooth neural networks, providing theoretical hardness results and practical algorithms for researchers and practitioners in machine learning.
The paper tackles the computational problem of testing stationarity for ReLU neural networks, showing that checking approximate stationarity is co-NP-hard, establishing a condition for subdifferential chain rules, and introducing a robust algorithm for near-approximate stationarity testing with no errors under certain conditions.
We study the computational problem of the stationarity test for the empirical loss of neural networks with ReLU activation functions. Our contributions are: Hardness: We show that checking a certain first-order approximate stationarity concept for a piecewise linear function is co-NP-hard. This implies that testing a certain stationarity concept for a modern nonsmooth neural network is in general computationally intractable. As a corollary, we prove that testing so-called first-order minimality for functions in abs-normal form is co-NP-complete, which was conjectured by Griewank and Walther (2019, SIAM J. Optim., vol. 29, p284). Regularity: We establish a necessary and sufficient condition for the validity of an equality-type subdifferential chain rule in terms of Clarke, Fréchet, and limiting subdifferentials of the empirical loss of two-layer ReLU networks. This new condition is simple and efficiently checkable. Robust algorithms: We introduce an algorithmic scheme to test near-approximate stationarity in terms of both Clarke and Fréchet subdifferentials. Our scheme makes no false positive or false negative error when the tested point is sufficiently close to a stationary one and a certain qualification is satisfied. This is the first practical and robust stationarity test approach for two-layer ReLU networks.